ID-Sim: An Identity-Focused Similarity Metric

CVPR 2026

Julia Chae$^1$, Nicholas Kolkin$^2$, Jui-Hsien Wang$^2$, Richard Zhang$^2$, Sara Beery$^{1*}$, Cusuh Ham$^{2*}$

$^1$ MIT CSAIL $^2$ Adobe Research $^*$ Equal advising

ID-Sim is a feed-forward similarity metric designed to capture fine-grained identity similarity, the kind humans rely on when distinguishing highly similar subjects across diverse viewpoints and contexts. Its key contribution is a training recipe that combines diverse real instance-level data with generative augmentation for context diversity and hard negatives. This yields selective sensitivity: the metric stays invariant to contextual changes while remaining sensitive to fine-grained identity differences.

📢 News

[Jun. 2026] Code with pretrained checkpoint, training, and evaluation released.
[May. 2026] ID-Sim accepted to CVPR 2026 and released on arXiv.

Coming soon:

Training dataset curation and generative augmentation pipelines
Selective sensitivity analysis tools

Installation

git clone https://github.com/JuliaChae/id_sim.git
cd id_sim

For inference only:

pip install -e .

For training and evaluation:

pip install -e ".[research]"

Repository structure:

id_sim/: installable package (from id_sim import id_sim)
training/ and evaluation/: source-checkout entrypoints
configs/standard_config.yaml: canonical training and evaluation config
examples/inference_pair.py: command-line pair-scoring example
docs/training.md: training documentation

Only id_sim/ is installed by pip install -e .; training/, evaluation/, dataset/, and util/ are source-checkout only.

Supported Backbones

`id_sim_type`	Backbone	Backbone weights
`dinov3_vitl16_cls_patch` (default)	DINOv3 ViT-L/16	Manual download required (see below)
`dinov3_vitb16_cls_patch`	DINOv3 ViT-B/16	Manual download required (see below)
`dinov2_vitl14_cls_patch`	DINOv2 ViT-L/14	Auto-downloaded from Meta
`dinov2_vitb14_cls_patch`	DINOv2 ViT-B/14	Auto-downloaded from Meta

DINOv3 ViT-L/16 is the default and recommended backbone — it achieves the best identity similarity performance. DINOv2 variants require no manual setup and are useful for quick experimentation.

Setup

1. DINOv3 backbone weights — required for DINOv3 models, one-time manual download

Meta gates the DINOv3 weights; ID-Sim does not redistribute them. Request and download the checkpoint(s) from Meta, then place them in the checkpoints/ subdirectory of your cache directory (<cache_dir>/checkpoints/):

mkdir -p models/id_sim_checkpoint/checkpoints

# Required for dinov3_vitl16_cls_patch (default)
mv dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth \
   models/id_sim_checkpoint/checkpoints/

# Required for dinov3_vitb16_cls_patch
mv dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth \
   models/id_sim_checkpoint/checkpoints/

<cache_dir> defaults to ./models/id_sim_checkpoint; override with the cache_dir= argument to id_sim(...). The DINOv3 model code is fetched automatically via torch.hub — no manual clone needed.

DINOv2 models (dinov2_vitl14_cls_patch, dinov2_vitb14_cls_patch) auto-download their backbone weights and require no manual setup.

2. ID-Sim adapter weights — downloaded automatically

On first call, ID-Sim downloads its adapter and MLP heads from Hugging Face into <cache_dir>. This happens automatically for all id_sim_type values.
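
Before the first DINOv3 load, it can help to confirm that the backbone checkpoint sits where the setup steps above put it. The following is a minimal sketch and not part of the ID-Sim package: the helper name `missing_backbone_weights` is hypothetical, and the filename mapping simply mirrors the two `mv` commands above.

```python
from pathlib import Path

# Filenames copied from the setup commands above. The mapping itself is an
# illustrative assumption, not an ID-Sim API.
DINOV3_CHECKPOINTS = {
    "dinov3_vitl16_cls_patch": "dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth",
    "dinov3_vitb16_cls_patch": "dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth",
}


def missing_backbone_weights(id_sim_type, cache_dir="./models/id_sim_checkpoint"):
    """Return the expected checkpoint path if it is missing, else None.

    DINOv2 types are not in the mapping (their weights auto-download),
    so they always return None.
    """
    filename = DINOV3_CHECKPOINTS.get(id_sim_type)
    if filename is None:
        return None
    path = Path(cache_dir) / "checkpoints" / filename
    return None if path.is_file() else path
```

Calling `missing_backbone_weights("dinov3_vitl16_cls_patch")` before `id_sim(...)` gives a chance to print a clear message about the manual download instead of failing inside the loader.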

Usage

Quick Start

import torch
from PIL import Image
from id_sim import id_sim

device = "cuda" if torch.cuda.is_available() else "cpu"

model, preprocess = id_sim(pretrained=True, device=device)
# Loads dinov3_vitl16_cls_patch by default.
# For other backbones: id_sim(id_sim_type="dinov2_vitl14_cls_patch", ...)

img_a = preprocess(Image.open("image_a.jpg")).to(device)
img_b = preprocess(Image.open("image_b.jpg")).to(device)

with torch.inference_mode():
    distance = model(img_a, img_b, mode="cls")

print(float(distance.squeeze().cpu()))

Lower distance = more similar. Default scoring uses the cls mode.

A command-line example is available at examples/inference_pair.py:

python examples/inference_pair.py image_a.jpg image_b.jpg --device cuda

An interactive walkthrough is available at examples/quickstart.ipynb.

Extract Features

img = preprocess(Image.open("image.jpg")).to(device)

with torch.inference_mode():
    features = model.embed(img, mode="joint")  # returns both cls and patch
    # mode="cls"   → {"cls": tensor [1, D]}
    # mode="patch" → {"patch": tensor [1, N, D]}
    # mode="joint" → {"cls": tensor [1, D], "patch": tensor [1, N, D]}

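With mode="joint", embed returns {"cls": cls_embedding, "patch": patch_embeddings}. To embed several images at once, concatenate the preprocessed tensors along the batch dimension (paths below is a list of image file paths):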

batch = torch.cat([preprocess(Image.open(p)).to(device) for p in paths], dim=0)

with torch.inference_mode():
    features = model.embed(batch)

Score Similarity

with torch.inference_mode():
    default_distance = model(img_a, img_b)
    cls_distance     = model(img_a, img_b, mode="cls")
    patch_distance   = model(img_a, img_b, mode="patch")
    joint_distance   = model(img_a, img_b, mode="joint")
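
Because lower distance means more similar, retrieval reduces to sorting candidates by their distance to a query. The following is a minimal sketch, assuming a callable `distance_fn(query, candidate)` that returns a number; with ID-Sim this could wrap `float(model(img_q, img_c).squeeze().cpu())` as in Quick Start. The helper `rank_by_distance` and the toy distance are hypothetical and not part of the package.

```python
def rank_by_distance(query, candidates, distance_fn, top_k=None):
    """Return (candidate, distance) pairs sorted from most to least similar."""
    scored = [(c, float(distance_fn(query, c))) for c in candidates]
    scored.sort(key=lambda pair: pair[1])  # ascending: lower distance first
    return scored if top_k is None else scored[:top_k]


def toy_distance(a, b):
    """Stand-in for the model so the sketch runs without weights."""
    return abs(a - b)


# Toy "images" are plain numbers here.
ranked = rank_by_distance(5.0, [9.0, 4.5, 7.0], toy_distance, top_k=2)
print(ranked)  # [(4.5, 0.5), (7.0, 2.0)]
```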

Training

Full documentation: docs/training.md.

Prepare train and validation parquet files with local image paths in these columns:

ref_path   pos_path   neg_path

python3 -m training.train \
  --config configs/standard_config.yaml \
  "train.parquet_path_train=/path/to/train.parquet" \
  "train.parquet_path_val=/path/to/val.parquet" \
  "train.log_dir=./logs" \
  "job_tag=my_id_sim_run"
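
The three columns presumably form a triplet: a reference image, a positive showing the same identity, and a negative showing a different one. As a rough sketch of assembling such rows from images grouped by identity, consider the helper below. The grouping dict, the name `build_triplets`, and the random negative sampling are all assumptions for illustration; ID-Sim's own training data relies on generative augmentation and hard negatives, which this sketch does not reproduce.

```python
import random


def build_triplets(identity_to_paths, seed=0):
    """Build rows with ref_path / pos_path / neg_path keys.

    identity_to_paths: dict mapping an identity label to a list of image paths.
    Negatives are drawn at random from a different identity (illustrative only).
    """
    rng = random.Random(seed)
    identities = sorted(identity_to_paths)
    rows = []
    for identity in identities:
        paths = identity_to_paths[identity]
        others = [i for i in identities if i != identity and identity_to_paths[i]]
        if not others:
            continue  # no other identity to draw a negative from
        for ref in paths:
            positives = [p for p in paths if p != ref]
            if not positives:
                continue  # need a second image of the same identity
            neg = rng.choice(identity_to_paths[rng.choice(others)])
            rows.append(
                {"ref_path": ref, "pos_path": rng.choice(positives), "neg_path": neg}
            )
    return rows


rows = build_triplets({
    "mug_a": ["mug_a/1.jpg", "mug_a/2.jpg"],
    "mug_b": ["mug_b/1.jpg", "mug_b/2.jpg"],
})
# With pandas and pyarrow installed, the rows can be written out with:
#   pandas.DataFrame(rows).to_parquet("train.parquet")
```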

Evaluation

The following command runs immediately, with no manual dataset download, on the two Hugging Face-hosted benchmarks:

python3 -m evaluation.eval_percep \
  --pretrained \
  --subjects2k_eval \
  --pods_eval \
  --output ./eval_outputs/pretrained

Benchmarks

Hugging Face (instant):

Benchmark	What it tests	Flag
`subjects2k`	Subject identity verification (same / different pairs)	`--subjects2k_eval`
`pods`	Object-level perceptual similarity ranking	`--pods_eval`

Auto-download (one-time setup, ~10 min first run):

Benchmark	What it tests	Flag
`aerialcattle`	Cattle re-identification from aerial imagery	`--aerialcattle_eval`

Downloads ~690 MB and extracts 46k images to data/aerialcattle/ on first run. Pass --aerialcattle_dir PATH to use an existing local copy instead.

Local dataset required:

Benchmark	What it tests	Download	Flag
`dreambench`	Text-to-image subject fidelity	DreamBench+	`--dreambench_eval --dreambench_dir PATH`
`cute`	Fine-grained visual similarity (triplet accuracy, easy + hard)	CUTE	`--cute_eval --cute_dir PATH`
`petface`	Pet face identity verification (cats + dogs)	PetFace	`--petface_eval --petface_dir PATH`
`deepfashion2`	Clothing instance retrieval	DeepFashion2 — eval metadata coming soon	`--deepfashion2_eval --deepfashion2_parquet FILE`
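
Several of these benchmarks reduce to comparing distances. As a hedged illustration of the triplet accuracy listed for `cute`, the sketch below uses the common definition: the fraction of triplets in which the reference is closer to the positive than to the negative. The exact protocol in evaluation/ (tie handling, easy versus hard splits) may differ, and `triplet_accuracy` is a hypothetical helper.

```python
def triplet_accuracy(pos_distances, neg_distances):
    """Fraction of triplets with d(ref, pos) < d(ref, neg); ties count as errors."""
    if len(pos_distances) != len(neg_distances):
        raise ValueError("need one positive and one negative distance per triplet")
    if not pos_distances:
        raise ValueError("no triplets given")
    correct = sum(dp < dn for dp, dn in zip(pos_distances, neg_distances))
    return correct / len(pos_distances)


# Three toy triplets: the first two are ranked correctly, the third is not.
print(triplet_accuracy([0.2, 0.1, 0.6], [0.5, 0.4, 0.3]))  # 0.6666666666666666
```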

Evaluating a Trained Checkpoint

python3 -m evaluation.eval_percep \
  --config configs/standard_config.yaml \
  "job_tag=my_id_sim_run" \
  "eval.eval_checkpoint_selector=last" \
  "eval.output=./eval_outputs/my_id_sim_run"

Checkpoint selectors: last, all, epoch:<N>. Keep machine-specific paths in configs/local/.
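
To make the selector syntax concrete, here is one plausible reading of how `last`, `all`, and `epoch:<N>` could resolve against a set of saved epochs. This is a sketch of the semantics only; the function `select_epochs` is hypothetical, and the parsing in evaluation/ may behave differently.

```python
def select_epochs(selector, available_epochs):
    """Resolve a selector string to a list of epochs (illustrative semantics)."""
    epochs = sorted(available_epochs)
    if not epochs:
        raise ValueError("no checkpoints available")
    if selector == "last":
        return [epochs[-1]]
    if selector == "all":
        return epochs
    if selector.startswith("epoch:"):
        epoch = int(selector.split(":", 1)[1])
        if epoch not in epochs:
            raise ValueError(f"no checkpoint for epoch {epoch}")
        return [epoch]
    raise ValueError(f"unknown selector: {selector!r}")


print(select_epochs("last", [1, 2, 3]))     # [3]
print(select_epochs("epoch:2", [1, 2, 3]))  # [2]
```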

Citation

@InProceedings{Chae_2026_CVPR,
    author    = {Chae, Julia and Kolkin, Nicholas and Wang, Jui-Hsien and Zhang, Richard and Beery, Sara and Ham, Cusuh},
    title     = {ID-Sim: An Identity-Focused Similarity Metric},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {11250-11262}
}
