Julia Chae
Project Page | Paper | Bibtex
ID-Sim is a feed-forward similarity metric designed to capture fine-grained identity similarity: the kind humans rely on when distinguishing highly similar subjects across diverse viewpoints and contexts. Its key contribution is a training recipe that combines diverse real instance-level data with generative augmentation for context diversity and hard negatives. This yields selective sensitivity: invariance to contextual changes combined with sensitivity to fine-grained identity differences.
- [Jun. 2026] Released the pretrained checkpoint along with training and evaluation code.
- [May 2026] ID-Sim accepted to CVPR 2026 and released on arXiv.
Coming soon:
- Training dataset curation and generative augmentation pipelines
- Selective sensitivity analysis tools
```bash
git clone https://github.com/JuliaChae/id_sim.git
cd id_sim
```

For inference only:

```bash
pip install -e .
```

For training and evaluation:

```bash
pip install -e ".[research]"
```

Repository structure:

- `id_sim/`: installable package (`from id_sim import id_sim`)
- `training/` and `evaluation/`: source-checkout entrypoints
- `configs/standard_config.yaml`: canonical training and evaluation config
- `examples/inference_pair.py`: command-line pair-scoring example
- `docs/training.md`: training documentation

Only `id_sim/` is installed by `pip install -e .`; `training/`, `evaluation/`, `dataset/`, and `util/` are available only in a source checkout.
| `id_sim_type` | Backbone | Backbone weights |
|---|---|---|
| `dinov3_vitl16_cls_patch` (default) | DINOv3 ViT-L/16 | Manual download required (see below) |
| `dinov3_vitb16_cls_patch` | DINOv3 ViT-B/16 | Manual download required (see below) |
| `dinov2_vitl14_cls_patch` | DINOv2 ViT-L/14 | Auto-downloaded from Meta |
| `dinov2_vitb14_cls_patch` | DINOv2 ViT-B/14 | Auto-downloaded from Meta |
DINOv3 ViT-L/16 is the default and recommended backbone, since it gives the best identity-similarity performance of the supported backbones. The DINOv2 variants need no manual setup and are useful for quick experimentation.
Meta gates the DINOv3 weights, and ID-Sim does not redistribute them. Request and download the checkpoint(s) here, then place them in the `checkpoints/` folder of your cache directory (shown below for the default location):
```bash
mkdir -p models/id_sim_checkpoint/checkpoints

# Required for dinov3_vitl16_cls_patch (default)
mv dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth \
  models/id_sim_checkpoint/checkpoints/

# Required for dinov3_vitb16_cls_patch
mv dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth \
  models/id_sim_checkpoint/checkpoints/
```

`<cache_dir>` defaults to `./models/id_sim_checkpoint`; override it with the `cache_dir=` argument to `id_sim(...)`. The DINOv3 model code is fetched automatically via `torch.hub`, so no manual clone is needed.
DINOv2 models (`dinov2_vitl14_cls_patch`, `dinov2_vitb14_cls_patch`) auto-download their backbone weights and require no manual setup.
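Before the first call, a quick pre-flight check can confirm that a gated DINOv3 file sits where the `mv` commands above put it. This is a minimal sketch, not part of the `id_sim` package: `missing_backbone_weights` is a hypothetical helper, and it assumes ID-Sim looks for exactly these filenames under `<cache_dir>/checkpoints/`.

```python
from pathlib import Path

# Backbone files that must be downloaded manually, keyed by id_sim_type.
# Filenames come from the mv commands above; DINOv2 types are absent
# because their weights are auto-downloaded.
MANUAL_WEIGHTS = {
    "dinov3_vitl16_cls_patch": "dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth",
    "dinov3_vitb16_cls_patch": "dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth",
}

DEFAULT_CACHE_DIR = "./models/id_sim_checkpoint"


def missing_backbone_weights(id_sim_type="dinov3_vitl16_cls_patch",
                             cache_dir=DEFAULT_CACHE_DIR):
    """Return the expected path of a missing manual checkpoint, or None.

    Hypothetical helper: assumes ID-Sim reads DINOv3 weights from
    <cache_dir>/checkpoints/<filename>.
    """
    filename = MANUAL_WEIGHTS.get(id_sim_type)
    if filename is None:
        return None  # DINOv2 backbones need no manual download
    expected = Path(cache_dir) / "checkpoints" / filename
    return None if expected.is_file() else expected


if __name__ == "__main__":
    missing = missing_backbone_weights()
    if missing is not None:
        print(f"Missing DINOv3 weights: expected {missing}")
    else:
        print("Backbone weights found (or not required).")
```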
On first call, ID-Sim downloads its adapter and MLP heads from Hugging Face into `<cache_dir>`. This happens automatically for all `id_sim_type` values.
```python
import torch
from PIL import Image
from id_sim import id_sim

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = id_sim(pretrained=True, device=device)
# Loads dinov3_vitl16_cls_patch by default.
# For other backbones: id_sim(id_sim_type="dinov2_vitl14_cls_patch", ...)

img_a = preprocess(Image.open("image_a.jpg")).to(device)
img_b = preprocess(Image.open("image_b.jpg")).to(device)

with torch.inference_mode():
    distance = model(img_a, img_b, mode="cls")
print(float(distance.squeeze().cpu()))
```

Lower distance means more similar. Default scoring uses the `cls` mode.
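Since lower distance means more similar, ranking several candidates against one query comes down to an ascending sort. A minimal sketch: `rank_by_distance` is a hypothetical helper, and the scores are placeholder numbers standing in for distances you would collect as in the example above, one `model(query, candidate, mode="cls")` call per candidate.

```python
def rank_by_distance(distances):
    """Sort (candidate, distance) pairs from most to least similar.

    `distances` maps a candidate name to its ID-Sim distance from the query;
    lower distance = more similar, so an ascending sort puts the best match first.
    """
    return sorted(distances.items(), key=lambda item: item[1])


# Placeholder distances, not real ID-Sim outputs.
scores = {"cand_1.jpg": 0.42, "cand_2.jpg": 0.17, "cand_3.jpg": 0.88}
ranking = rank_by_distance(scores)
best_match, best_distance = ranking[0]
print(best_match, best_distance)  # cand_2.jpg 0.17
```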
A command-line example is available at `examples/inference_pair.py`:

```bash
python examples/inference_pair.py image_a.jpg image_b.jpg --device cuda
```

An interactive walkthrough is available at `examples/quickstart.ipynb`.
To extract embeddings instead of a distance:

```python
img = preprocess(Image.open("image.jpg")).to(device)
with torch.inference_mode():
    features = model.embed(img, mode="joint")  # returns both cls and patch
# mode="cls"   → {"cls": tensor [1, D]}
# mode="patch" → {"patch": tensor [1, N, D]}
# mode="joint" → {"cls": tensor [1, D], "patch": tensor [1, N, D]}
```

With `mode="joint"`, the result is `{"cls": cls_embedding, "patch": patch_embeddings}`. Batched:
```python
paths = ["image_a.jpg", "image_b.jpg"]  # replace with your own image files
batch = torch.cat([preprocess(Image.open(p)).to(device) for p in paths], dim=0)
with torch.inference_mode():
    features = model.embed(batch)
```

Distances can also be computed in each scoring mode:

```python
with torch.inference_mode():
    default_distance = model(img_a, img_b)  # same as mode="cls"
    cls_distance = model(img_a, img_b, mode="cls")
    patch_distance = model(img_a, img_b, mode="patch")
    joint_distance = model(img_a, img_b, mode="joint")
```

Full training documentation: `docs/training.md`.
Prepare train and validation parquet files with local image paths in three columns: `ref_path`, `pos_path`, and `neg_path`.
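The column names suggest a triplet layout: a reference image, a positive of the same identity, and a negative of a different identity. Assuming that reading, the sketch below builds such rows from a hypothetical mapping of identity to image paths. It draws negatives at random, which is a deliberate simplification: the ID-Sim recipe relies on curated real data and generative hard negatives, whose pipelines are listed as coming soon.

```python
import random


def build_triplets(images_by_identity, seed=0):
    """Build ref_path / pos_path / neg_path rows from identity -> image paths.

    For every ordered pair of distinct images of one identity, a negative is
    drawn uniformly from a different identity. Random negatives are NOT the
    generative hard negatives of the ID-Sim recipe.
    """
    rng = random.Random(seed)
    identities = [k for k, v in images_by_identity.items() if v]
    if len(identities) < 2:
        raise ValueError("need at least two identities to sample negatives")
    rows = []
    for identity in identities:
        paths = images_by_identity[identity]
        others = [k for k in identities if k != identity]
        for ref in paths:
            for pos in paths:
                if pos == ref:
                    continue  # a positive must be a different image
                neg_identity = rng.choice(others)
                neg = rng.choice(images_by_identity[neg_identity])
                rows.append({"ref_path": ref, "pos_path": pos, "neg_path": neg})
    return rows


rows = build_triplets({
    "dog_a": ["data/dog_a/1.jpg", "data/dog_a/2.jpg"],
    "dog_b": ["data/dog_b/1.jpg", "data/dog_b/2.jpg"],
})
print(len(rows))  # 4
```

To write the rows out, `pandas.DataFrame(rows).to_parquet("train.parquet")` works when `pyarrow` or `fastparquet` is installed.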
```bash
python3 -m training.train \
  --config configs/standard_config.yaml \
  "train.parquet_path_train=/path/to/train.parquet" \
  "train.parquet_path_val=/path/to/val.parquet" \
  "train.log_dir=./logs" \
  "job_tag=my_id_sim_run"
```

To evaluate the pretrained checkpoint right away, with no dataset download:
```bash
python3 -m evaluation.eval_percep \
  --pretrained \
  --subjects2k_eval \
  --pods_eval \
  --output ./eval_outputs/pretrained
```

Hugging Face (instant):
| Benchmark | What it tests | Flag |
|---|---|---|
| `subjects2k` | Subject identity verification (same / different pairs) | `--subjects2k_eval` |
| `pods` | Object-level perceptual similarity ranking | `--pods_eval` |
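For same/different verification benchmarks such as subjects2k, one threshold-free way to summarize how well distances separate the two classes is ROC AUC, which equals the probability that a randomly chosen same-identity pair gets a lower distance than a randomly chosen different-identity pair. This is a generic sketch; the metric `eval_percep` actually reports for subjects2k may be computed differently, and the distances below are placeholders.

```python
def verification_auc(same_distances, diff_distances):
    """ROC AUC for distance scores where lower = same identity.

    Over all (same pair, different pair) combinations, counts how often the
    same-identity distance is smaller; ties count as half. O(n * m).
    """
    if not same_distances or not diff_distances:
        raise ValueError("need at least one same and one different pair")
    wins = 0.0
    for s in same_distances:
        for d in diff_distances:
            if s < d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(same_distances) * len(diff_distances))


# Placeholder distances for same-identity and different-identity pairs.
print(verification_auc([0.1, 0.3, 0.5], [0.4, 0.6, 0.9]))  # 8/9, about 0.889
```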
Auto-download (one-time setup, ~10 min first run):
| Benchmark | What it tests | Flag |
|---|---|---|
| `aerialcattle` | Cattle re-identification from aerial imagery | `--aerialcattle_eval` |

Downloads ~690 MB and extracts 46k images to `data/aerialcattle/` on first run. Pass `--aerialcattle_dir PATH` to use an existing local copy instead.
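Re-identification benchmarks such as aerialcattle are commonly scored by retrieval: for each query image, rank the gallery by distance and check whether an image of the same individual appears among the top k. The sketch shows a generic recall@k over a precomputed query-by-gallery distance matrix; it is an assumption about the kind of metric involved, and `eval_percep` may report something different (re-ID work also often uses mAP). The labels and numbers are placeholders.

```python
def recall_at_k(distances, query_ids, gallery_ids, k=1):
    """Fraction of queries whose k nearest gallery items include a match.

    distances[i][j] is the distance from query i to gallery item j
    (lower = more similar); query_ids and gallery_ids hold identity labels.
    """
    hits = 0
    for row, qid in zip(distances, query_ids):
        nearest = sorted(range(len(row)), key=lambda j: row[j])[:k]
        if any(gallery_ids[j] == qid for j in nearest):
            hits += 1
    return hits / len(query_ids)


dist = [
    [0.2, 0.7, 0.9],  # query 0 (cow_a): nearest is gallery 0 (cow_a), a hit
    [0.3, 0.6, 0.1],  # query 1 (cow_b): its match is only the 3rd nearest
]
queries = ["cow_a", "cow_b"]
gallery = ["cow_a", "cow_b", "cow_c"]
print(recall_at_k(dist, queries, gallery, k=1))  # 0.5
print(recall_at_k(dist, queries, gallery, k=3))  # 1.0
```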
Local dataset required:
| Benchmark | What it tests | Download | Flag |
|---|---|---|---|
| `dreambench` | Text-to-image subject fidelity | DreamBench+ | `--dreambench_eval --dreambench_dir PATH` |
| `cute` | Fine-grained visual similarity (triplet accuracy, easy + hard) | CUTE | `--cute_eval --cute_dir PATH` |
| `petface` | Pet face identity verification (cats + dogs) | PetFace | `--petface_eval --petface_dir PATH` |
| `deepfashion2` | Clothing instance retrieval | DeepFashion2 (eval metadata coming soon) | `--deepfashion2_eval --deepfashion2_parquet FILE` |
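Triplet accuracy, as listed for CUTE, counts how often a metric places the positive closer to the reference than the negative. A minimal sketch, assuming the distances d(ref, pos) and d(ref, neg) are already computed for each triplet; how `eval_percep` handles ties and splits easy from hard triplets is not modeled here, and the numbers are placeholders.

```python
def triplet_accuracy(pos_distances, neg_distances):
    """Fraction of triplets with d(ref, pos) < d(ref, neg).

    Lower distance = more similar, so a triplet is correct when the positive
    is strictly closer to the reference; ties count as errors here.
    """
    if len(pos_distances) != len(neg_distances):
        raise ValueError("need one positive and one negative distance per triplet")
    correct = sum(p < n for p, n in zip(pos_distances, neg_distances))
    return correct / len(pos_distances)


print(triplet_accuracy([0.2, 0.5, 0.4], [0.6, 0.3, 0.4]))  # 1/3: the tie counts as an error
```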
To evaluate checkpoints from your own training run (matching its `job_tag`):

```bash
python3 -m evaluation.eval_percep \
  --config configs/standard_config.yaml \
  "job_tag=my_id_sim_run" \
  "eval.eval_checkpoint_selector=last" \
  "eval.output=./eval_outputs/my_id_sim_run"
```

Checkpoint selectors: `last`, `all`, `epoch:<N>`. Keep machine-specific paths in `configs/local/`.
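The training and evaluation commands both pass settings as quoted `key=value` arguments after `--config`, with dots pointing into nested config sections, and evaluation picks checkpoints through `eval.eval_checkpoint_selector`. The sketch below is a hypothetical illustration of how those pieces could fit together on a plain dict: `apply_overrides` and `select_epochs` are not functions from this repository, and reading `last` as "highest saved epoch" is an assumption. For real configs, OmegaConf's `OmegaConf.from_dotlist` implements the same dotted-override idea.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted "key=value" overrides applied.

    Hypothetical sketch: values stay strings, whereas a real config loader
    would usually also parse numbers and booleans.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        key, sep, value = item.partition("=")  # split on the first "=" only
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections
        node[leaf] = value
    return result


def select_epochs(selector, saved_epochs):
    """Map a checkpoint selector to the epochs it would evaluate.

    Assumed semantics: "last" -> highest saved epoch, "all" -> every saved
    epoch in ascending order, "epoch:<N>" -> only epoch N, which must exist.
    """
    epochs = sorted(saved_epochs)
    if not epochs:
        raise ValueError("no checkpoints saved")
    if selector == "last":
        return [epochs[-1]]
    if selector == "all":
        return epochs
    if selector.startswith("epoch:"):
        n = int(selector.split(":", 1)[1])
        if n not in epochs:
            raise ValueError(f"epoch {n} not saved; have {epochs}")
        return [n]
    raise ValueError(f"unknown selector {selector!r}")


base = {"job_tag": "default", "eval": {}}  # hypothetical base config
cfg = apply_overrides(base, [
    "job_tag=my_id_sim_run",
    "eval.eval_checkpoint_selector=epoch:2",
    "eval.output=./eval_outputs/my_id_sim_run",
])
print(cfg["eval"]["output"])  # ./eval_outputs/my_id_sim_run
print(select_epochs(cfg["eval"]["eval_checkpoint_selector"], [3, 1, 2]))  # [2]
```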
```bibtex
@InProceedings{Chae_2026_CVPR,
  author    = {Chae, Julia and Kolkin, Nicholas and Wang, Jui-Hsien and Zhang, Richard and Beery, Sara and Ham, Cusuh},
  title     = {ID-Sim: An Identity-Focused Similarity Metric},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2026},
  pages     = {11250-11262}
}
```