JuliaChae/id_sim

CVPR 2026

Julia Chae$^1$, Nicholas Kolkin$^2$, Jui-Hsien Wang$^2$, Richard Zhang$^2$, Sara Beery$^{1*}$, Cusuh Ham$^{2*}$

$^1$ MIT CSAIL   $^2$ Adobe Research   $^*$ Equal advising

Project Page | Paper | Bibtex

ID-Sim is a feed-forward similarity metric designed to capture fine-grained identity similarity, the kind humans rely on when distinguishing highly similar subjects across diverse viewpoints and contexts. Its key contribution is a training recipe that combines diverse real instance-level data with generative augmentation for context diversity and hard negatives. The result is selective sensitivity: invariance to contextual changes together with sensitivity to fine-grained identity differences.

📢 News

  • [Jun. 2026] Code with pretrained checkpoint, training, and evaluation released.
  • [May 2026] ID-Sim accepted to CVPR 2026 and released on arXiv.

Coming soon:

  • Training dataset curation and generative augmentation pipelines
  • Selective sensitivity analysis tools

Installation

git clone https://github.com/JuliaChae/id_sim.git
cd id_sim

For inference only:

pip install -e .

For training and evaluation:

pip install -e ".[research]"

Repository structure:

  • id_sim/: installable package (from id_sim import id_sim)
  • training/ and evaluation/: source-checkout entrypoints
  • configs/standard_config.yaml: canonical training and evaluation config
  • examples/inference_pair.py: command-line pair-scoring example
  • docs/training.md: training documentation

Only id_sim/ is installed by pip install -e .; training/, evaluation/, dataset/, and util/ are source-checkout only.
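A quick way to confirm the install is to check whether the package resolves on the import path. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical and not part of ID-Sim.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


# After `pip install -e .`, this is expected to report True for "id_sim".
print("id_sim importable:", is_installed("id_sim"))
```

Note that when run from the repository root, source-checkout directories such as training/ may also resolve simply because the current directory is on the import path.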

Supported Backbones

| id_sim_type | Backbone | Backbone weights |
| --- | --- | --- |
| dinov3_vitl16_cls_patch (default) | DINOv3 ViT-L/16 | Manual download required (see below) |
| dinov3_vitb16_cls_patch | DINOv3 ViT-B/16 | Manual download required (see below) |
| dinov2_vitl14_cls_patch | DINOv2 ViT-L/14 | Auto-downloaded from Meta |
| dinov2_vitb14_cls_patch | DINOv2 ViT-B/14 | Auto-downloaded from Meta |

DINOv3 ViT-L/16 is the default and recommended backbone; it achieves the best identity similarity performance. DINOv2 variants require no manual setup and are useful for quick experimentation.

Setup

1. DINOv3 backbone weights (required for DINOv3 models; one-time manual download)

Meta gates the DINOv3 weights; ID-Sim does not redistribute them. Request and download the checkpoint(s) from Meta, then place them in the checkpoints/ subdirectory of your cache directory:

mkdir -p models/id_sim_checkpoint/checkpoints

# Required for dinov3_vitl16_cls_patch (default)
mv dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth \
   models/id_sim_checkpoint/checkpoints/

# Required for dinov3_vitb16_cls_patch
mv dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth \
   models/id_sim_checkpoint/checkpoints/

<cache_dir> defaults to ./models/id_sim_checkpoint; override it with the cache_dir= argument to id_sim(...). The DINOv3 model code is fetched automatically via torch.hub, so no manual clone is needed.

DINOv2 models (dinov2_vitl14_cls_patch, dinov2_vitb14_cls_patch) auto-download their backbone weights and require no manual setup.
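Before loading a DINOv3 model, it can help to check that the checkpoint is where the setup commands above put it. The following is a minimal sketch, assuming the layout `<cache_dir>/checkpoints/<filename>`; the helper `missing_backbone_weights` is hypothetical and not part of the ID-Sim package.

```python
from pathlib import Path

# Checkpoint filenames as listed in the setup commands above.
DINOV3_WEIGHTS = {
    "dinov3_vitl16_cls_patch": "dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth",
    "dinov3_vitb16_cls_patch": "dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth",
}


def missing_backbone_weights(id_sim_type, cache_dir="./models/id_sim_checkpoint"):
    """Return the expected checkpoint path if it is missing, else None.

    Types without an entry in DINOV3_WEIGHTS (e.g. the DINOv2 variants, which
    auto-download) are treated as needing no manual file.
    """
    filename = DINOV3_WEIGHTS.get(id_sim_type)
    if filename is None:
        return None
    expected = Path(cache_dir) / "checkpoints" / filename
    return None if expected.is_file() else expected


missing = missing_backbone_weights("dinov3_vitl16_cls_patch")
if missing is not None:
    print(f"Download the DINOv3 checkpoint and place it at: {missing}")
```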

2. ID-Sim adapter weights (downloaded automatically)

On first call, ID-Sim downloads its adapter and MLP heads from Hugging Face into <cache_dir>. This happens automatically for all id_sim_type values.

Usage

Quick Start

import torch
from PIL import Image
from id_sim import id_sim

device = "cuda" if torch.cuda.is_available() else "cpu"

model, preprocess = id_sim(pretrained=True, device=device)
# Loads dinov3_vitl16_cls_patch by default.
# For other backbones: id_sim(id_sim_type="dinov2_vitl14_cls_patch", ...)

img_a = preprocess(Image.open("image_a.jpg")).to(device)
img_b = preprocess(Image.open("image_b.jpg")).to(device)

with torch.inference_mode():
    distance = model(img_a, img_b, mode="cls")

print(float(distance.squeeze().cpu()))

Lower distance = more similar. Default scoring uses the cls mode.

A command-line example is available at examples/inference_pair.py:

python examples/inference_pair.py image_a.jpg image_b.jpg --device cuda

An interactive walkthrough is available at examples/quickstart.ipynb.
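Because lower distance means more similar, ranking candidates against a query amounts to an ascending sort. A small sketch with made-up distances; the file names and values are hypothetical, and in practice each value would come from a `model(query, candidate)` call as in the Quick Start.

```python
def rank_by_distance(distances):
    """Sort candidate names from most to least similar (ascending distance)."""
    return sorted(distances, key=distances.get)


# Hypothetical distances, e.g. float(model(query, cand).squeeze().cpu()) per candidate.
scores = {"cand_a.jpg": 0.42, "cand_b.jpg": 0.07, "cand_c.jpg": 0.31}
print(rank_by_distance(scores))  # ['cand_b.jpg', 'cand_c.jpg', 'cand_a.jpg']
```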

Extract Features

img = preprocess(Image.open("image.jpg")).to(device)

with torch.inference_mode():
    features = model.embed(img, mode="joint")  # returns both cls and patch
    # mode="cls"   → {"cls": tensor [1, D]}
    # mode="patch" → {"patch": tensor [1, N, D]}
    # mode="joint" → {"cls": tensor [1, D], "patch": tensor [1, N, D]}

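With mode="joint", embed returns {"cls": cls_embedding, "patch": patch_embeddings}. To embed several images in one call, concatenate the preprocessed tensors along the batch dimension (paths is a list of image file paths):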

batch = torch.cat([preprocess(Image.open(p)).to(device) for p in paths], dim=0)

with torch.inference_mode():
    features = model.embed(batch)
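For a large image collection, concatenating every image into one batch may not fit in memory. One option is to embed in fixed-size chunks; the helper `chunked` and the batch size of 32 below are illustrative choices, not part of ID-Sim.

```python
from typing import Iterator, List, Sequence


def chunked(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


# Intended use with the objects from the snippets above (not executed here):
# for chunk in chunked(paths, 32):
#     batch = torch.cat([preprocess(Image.open(p)).to(device) for p in chunk], dim=0)
#     with torch.inference_mode():
#         features = model.embed(batch)
```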

Score Similarity

with torch.inference_mode():
    default_distance = model(img_a, img_b)  # default scoring uses the cls mode
    cls_distance     = model(img_a, img_b, mode="cls")
    patch_distance   = model(img_a, img_b, mode="patch")
    joint_distance   = model(img_a, img_b, mode="joint")
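The scores above are distances, so turning them into a same/different decision needs a cutoff, which this README does not prescribe. One possible approach, sketched below under that assumption, is to pick the threshold that maximizes accuracy on a small set of labeled pairs; the function name and the sample values are hypothetical.

```python
def best_threshold(distances, same_identity):
    """Return (threshold, accuracy) for the rule: distance <= threshold means same identity.

    Candidate thresholds are the observed distances; ties keep the smallest one.
    """
    if not distances or len(distances) != len(same_identity):
        raise ValueError("need equally sized, non-empty inputs")
    best_t, best_acc = None, -1.0
    for t in sorted(set(distances)):
        correct = sum((d <= t) == bool(s) for d, s in zip(distances, same_identity))
        acc = correct / len(distances)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical validation pairs: distances from model(...) plus ground-truth labels.
threshold, accuracy = best_threshold([0.1, 0.2, 0.6, 0.7], [True, True, False, False])
print(threshold, accuracy)  # 0.2 1.0
```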

Training

Full documentation: docs/training.md.

Prepare train and validation parquet files with local image paths in these columns:

ref_path   pos_path   neg_path

Then launch training:

python3 -m training.train \
  --config configs/standard_config.yaml \
  "train.parquet_path_train=/path/to/train.parquet" \
  "train.parquet_path_val=/path/to/val.parquet" \
  "train.log_dir=./logs" \
  "job_tag=my_id_sim_run"
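The triplet files described above could be assembled along these lines. This is a sketch, not the authors' curation pipeline: it assumes images are already grouped by identity and samples negatives at random from other identities, whereas the paper's recipe also relies on generative hard negatives. The function names are hypothetical, and writing parquet requires pandas plus a parquet engine such as pyarrow.

```python
import random


def build_triplets(images_by_identity, seed=0):
    """Build one (ref, pos, neg) row per ordered pair of same-identity images."""
    rng = random.Random(seed)
    rows = []
    for identity, paths in images_by_identity.items():
        # Negatives are all images belonging to any other identity.
        negatives = [p for other, ps in images_by_identity.items()
                     if other != identity for p in ps]
        if not negatives:
            continue
        for ref in paths:
            for pos in paths:
                if pos != ref:
                    rows.append({"ref_path": ref, "pos_path": pos,
                                 "neg_path": rng.choice(negatives)})
    return rows


def write_parquet(rows, out_path):
    """Write rows with the three expected columns to a parquet file."""
    import pandas as pd  # imported lazily; needs a parquet engine such as pyarrow
    frame = pd.DataFrame(rows, columns=["ref_path", "pos_path", "neg_path"])
    frame.to_parquet(out_path, index=False)


rows = build_triplets({"cat_1": ["a.jpg", "b.jpg"], "cat_2": ["c.jpg"]})
print(len(rows))  # 2
```

Calling `write_parquet(rows, "/path/to/train.parquet")` would then produce a file with the three columns listed above.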

Evaluation

Run immediately, with no manual dataset download (both benchmarks are fetched from Hugging Face):

python3 -m evaluation.eval_percep \
  --pretrained \
  --subjects2k_eval \
  --pods_eval \
  --output ./eval_outputs/pretrained

Benchmarks

Hugging Face (instant):

| Benchmark | What it tests | Flag |
| --- | --- | --- |
| subjects2k | Subject identity verification (same / different pairs) | --subjects2k_eval |
| pods | Object-level perceptual similarity ranking | --pods_eval |

Auto-download (one-time setup, ~10 min first run):

| Benchmark | What it tests | Flag |
| --- | --- | --- |
| aerialcattle | Cattle re-identification from aerial imagery | --aerialcattle_eval |

Downloads ~690 MB and extracts 46k images to data/aerialcattle/ on first run. Pass --aerialcattle_dir PATH to use an existing local copy instead.

Local dataset required:

| Benchmark | What it tests | Download | Flag |
| --- | --- | --- | --- |
| dreambench | Text-to-image subject fidelity | DreamBench+ | --dreambench_eval --dreambench_dir PATH |
| cute | Fine-grained visual similarity (triplet accuracy, easy + hard) | CUTE | --cute_eval --cute_dir PATH |
| petface | Pet face identity verification (cats + dogs) | PetFace | --petface_eval --petface_dir PATH |
| deepfashion2 | Clothing instance retrieval | DeepFashion2 (eval metadata coming soon) | --deepfashion2_eval --deepfashion2_parquet FILE |
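When several local datasets are available, the flags from the table can be assembled programmatically. The sketch below only builds the command line; it assumes the local-dataset flags combine with `--pretrained` and `--output` in the same way as the Hugging Face example above, which this README does not state explicitly. The helper name and the sample path are hypothetical.

```python
import shlex

# Flag names copied from the benchmark table above: (enable flag, path flag).
LOCAL_BENCHMARKS = {
    "dreambench": ("--dreambench_eval", "--dreambench_dir"),
    "cute": ("--cute_eval", "--cute_dir"),
    "petface": ("--petface_eval", "--petface_dir"),
    "deepfashion2": ("--deepfashion2_eval", "--deepfashion2_parquet"),
}


def build_eval_command(local_paths, output="./eval_outputs/pretrained"):
    """Return an argv list enabling each benchmark named in `local_paths`."""
    cmd = ["python3", "-m", "evaluation.eval_percep", "--pretrained"]
    for name, path in local_paths.items():
        enable_flag, path_flag = LOCAL_BENCHMARKS[name]
        cmd += [enable_flag, path_flag, str(path)]
    cmd += ["--output", output]
    return cmd


# Print a shell-quoted command for a hypothetical local copy of CUTE.
print(shlex.join(build_eval_command({"cute": "/data/cute"})))
```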

Evaluating a Trained Checkpoint

python3 -m evaluation.eval_percep \
  --config configs/standard_config.yaml \
  "job_tag=my_id_sim_run" \
  "eval.eval_checkpoint_selector=last" \
  "eval.output=./eval_outputs/my_id_sim_run"

Checkpoint selectors: last, all, epoch:<N>. Keep machine-specific paths in configs/local/.
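To illustrate what the three selector forms plausibly mean, the sketch below resolves a selector against a list of saved epoch numbers. This is an interpretation, not the repository's implementation: it assumes `last` means the highest-numbered epoch, and `resolve_selector` and `available_epochs` are hypothetical names.

```python
def resolve_selector(selector, available_epochs):
    """Map a selector string to the list of epochs it would select."""
    epochs = sorted(available_epochs)
    if selector == "all":
        return epochs
    if selector == "last":
        return epochs[-1:]  # empty list if nothing has been saved yet
    if selector.startswith("epoch:"):
        wanted = int(selector.split(":", 1)[1])
        if wanted not in epochs:
            raise ValueError(f"no checkpoint saved for epoch {wanted}")
        return [wanted]
    raise ValueError(f"unknown selector: {selector!r}")


print(resolve_selector("last", [1, 3, 2]))     # [3]
print(resolve_selector("epoch:2", [1, 3, 2]))  # [2]
```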

Citation

@InProceedings{Chae_2026_CVPR,
    author    = {Chae, Julia and Kolkin, Nicholas and Wang, Jui-Hsien and Zhang, Richard and Beery, Sara and Ham, Cusuh},
    title     = {ID-Sim: An Identity-Focused Similarity Metric},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {11250-11262}
}
