Yichen Lu, Siwei Nie, Minlong Lu, Xudong Yang, Xiaobo Zhang, Peng Zhang
[Paper] [Slide] [Poster] [YouTube] [Citation]
Image Copy Detection (ICD) aims to identify manipulated content between image pairs through robust feature representation learning. While self-supervised learning (SSL) has advanced ICD systems, existing view-level contrastive methods struggle with sophisticated edits due to insufficient fine-grained correspondence learning. We address this limitation by exploiting the inherent geometric traceability in edited content through two key innovations. First, we propose PixTrace - a pixel coordinate tracking module that maintains explicit spatial mappings across editing transformations. Second, we introduce CopyNCE, a geometrically-guided contrastive loss that regularizes patch affinity using overlap ratios derived from PixTrace's verified mappings. Our method bridges pixel-level traceability with patch-level similarity learning, suppressing supervision noise in SSL training. Extensive experiments demonstrate not only state-of-the-art performance (88.7% $\mu$AP / 83.9% RP90 for matcher, 72.6% $\mu$AP / 68.4% RP90 for descriptor on DISC21 dataset) but also better interpretability over existing methods.
# configs
core/configs/descriptor_default_config.yaml - default config file for the descriptor
core/configs/matching_default_config.yaml - default config file for the matcher
# PixTrace-related
core/data/augmentation/mask_mapper.py - implementation of PixTrace, including pixel tracking (copy vs. original and copy vs. copy) and the reverse mapping operation
core/data/augmentation - implementation of various augmentations
# CopyNCE-related
core/loss/copynce_loss.py - implementation of CopyNCE loss
# eval-related entry
scripts/eval
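For orientation, the interplay between PixTrace's tracked overlaps and the CopyNCE loss can be sketched as follows. This is an illustrative numpy re-implementation under our own simplifying assumptions (a soft cross-entropy over patch affinities, with targets given by overlap ratios), not the code in core/loss/copynce_loss.py:

```python
import numpy as np

def copynce_loss(patch_feats_a, patch_feats_b, overlap, tau=0.1):
    """InfoNCE-style patch contrastive loss with overlap-ratio targets.

    patch_feats_a: (P, D) L2-normalized patch features of the edited copy
    patch_feats_b: (P, D) L2-normalized patch features of the original
    overlap: (P, P) patch overlap ratios derived from tracked pixel
        coordinates (hypothetical input format for this sketch)
    """
    logits = patch_feats_a @ patch_feats_b.T / tau            # (P, P) affinities
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    weights = overlap / np.clip(overlap.sum(axis=1, keepdims=True), 1e-8, None)
    return float(-(weights * log_prob).sum(axis=1).mean())    # soft cross-entropy

# toy usage: identical features, so the diagonal is the true correspondence
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
loss_pos = copynce_loss(feats, feats, np.eye(4))         # correct overlap targets
loss_rand = copynce_loss(feats, feats, np.ones((4, 4)) / 4)  # uninformative targets
print(loss_pos < loss_rand)  # True: aligned targets give lower loss
```

The point of the overlap weighting is that geometrically verified correspondences, rather than augmentation-invariant view pairs alone, supervise the patch affinity matrix.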
| Model Type | Arch | Resolution | Fine-tuned | uAP | RP90 | Checkpoint |
|---|---|---|---|---|---|---|
| Descriptor | ViT-S | 224x224 | ❌ | 70.5 | 63.6 | download |
| Matcher | ViT-S | 224x224 | ❌ | 83.5 | 75.4 | download |
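The uAP and RP90 columns above can be computed roughly as follows. This is a back-of-the-envelope sketch of the ISC-style metrics (precision/recall over a global ranking of all predicted query-reference pairs), not the official evaluation code:

```python
import numpy as np

def micro_ap_and_rp90(scores, labels, total_positives=None):
    """Micro-average precision and recall at 90% precision (rough sketch).

    scores: confidence for every predicted (query, reference) pair,
        pooled over all queries and ranked globally.
    labels: 1 if the pair is a true copy, else 0.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    n_pos = total_positives if total_positives is not None else labels.sum()
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)
    recall = tp / n_pos
    uap = float(np.sum(precision * labels) / n_pos)      # AP over the global ranking
    ok = precision >= 0.9
    rp90 = float(recall[ok].max()) if ok.any() else 0.0  # best recall at >=90% precision
    return uap, rp90

uap, rp90 = micro_ap_and_rp90([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1])
print(round(uap, 4), round(rp90, 4))  # 0.9167 0.6667
```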
Due to our company's lengthy open-source release process for code and model weights, we are currently only able to publicly release the two base checkpoints shared during the paper review phase; we sincerely apologize for the omission of model weights with other configurations. To mitigate this limitation, we have included detailed scripts for training, fine-tuning, and evaluating models with other configurations.
The results of CopyNCE reported in the paper were all obtained with Python 3.8 and PyTorch 2.1. To ensure reproducibility, we recommend following the scripts below to set up your training and inference environment.
pip - Build the environment with requirements.txt.
pip install -r requirements.txt

conda - Build and manage the environment in Anaconda / Miniconda with the provided environment definition.
conda env create -f conda.yaml
conda activate copynce

DISC21 was introduced as the dataset of the Image Similarity Challenge at NeurIPS'21 and has gained popularity in ICD. The main experiments in our paper were conducted on DISC21. To reproduce the results in our paper, please make sure the folder structure of this dataset is as follows:
CopyNCE
|--datasets
| |--DISC21
| |--query_images
| | |--Q00000.jpg
| | |--... (100k images in total)
| |--reference_images
| | |--R000000.jpg
| | |--... (1M images in total)
| |--training_images
| |--T000000.jpg
| |--... (1M images in total)
|--assets
|--core
|--... (other folders)
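A small helper of ours (not part of the repo) to sanity-check this layout before training:

```python
import os
import tempfile

def check_disc21_layout(root):
    """Return the list of missing DISC21 sub-folders under <root>/datasets."""
    expected = ["query_images", "reference_images", "training_images"]
    base = os.path.join(root, "datasets", "DISC21")
    return [d for d in expected if not os.path.isdir(os.path.join(base, d))]

# real usage against the project root after downloading DISC21:
#   missing = check_disc21_layout("/path/to/CopyNCE")
# demo on a scratch directory with one folder deliberately absent:
with tempfile.TemporaryDirectory() as tmp:
    for d in ["query_images", "reference_images"]:
        os.makedirs(os.path.join(tmp, "datasets", "DISC21", d))
    missing = check_disc21_layout(tmp)
    print(missing)  # ['training_images']
```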
Download the descriptor and matcher models from the Models section, and move them into the weights folder as follows:
CopyNCE
|--weights
| |--descriptor_vits_224.pth.tar
| |--matcher_vits_224.pth.tar
|--assets
|--core
|--... (other folders)
If you intend to run training or fine-tuning and reproduce the results in the paper, follow the steps below to generate the extra weights required for training.
Pretrained weights from DINO
Download the DINO pretrained ViT-S and ViT-B model weights.
# pwd = /path/to/project/root
# please make sure that 'weights' folder has been made in the root path of this project
# download ViT-S
wget -O weights/dino_deitsmall16_pretrain.pth https://dl.fbaipublicfiles.com/dino/dino_deitsmall16_pretrain/dino_deitsmall16_pretrain.pth
# download ViT-B
wget -O weights/dino_vitbase16_pretrain.pth https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth

Modify DINO pretrained weights
# pwd = /path/to/project/root
# modify ViT-S weights for descriptor to load
python scripts/run/modify_dino_weights.py --layers 12 --encoder-layers 12 --weights-path weights/dino_deitsmall16_pretrain.pth --output-path weights/dino_vits16_enc-12.pth
# modify ViT-B weights for descriptor to load
python scripts/run/modify_dino_weights.py --layers 12 --encoder-layers 12 --weights-path weights/dino_vitbase16_pretrain.pth --output-path weights/dino_vitb16_enc-12.pth
# modify ViT-S weights for matcher to load
python scripts/run/modify_dino_weights.py --layers 12 --encoder-layers 8 --weights-path weights/dino_deitsmall16_pretrain.pth --output-path weights/dino_vits16_enc-8_fus-4.pth
# interpolate ViT-S position embedding for matcher to train on 336x336
python scripts/run/modify_dino_weights.py --layers 12 --encoder-layers 8 --weights-path weights/dino_deitsmall16_pretrain.pth --output-path weights/dino_vits16_enc-8_fus-4_pos-emb-21x21.pth

Generate k-NN matrix
# extract dino features on ISC training set
bash scripts/run/extract_dino_features.sh isc dev/vits_lin.yaml dino_vits_isc_train.pth
# generate intra k-NN matrix of training set
python scripts/run/build_knn.py \
--query-feature-path outputs/dino/eval/dino_vits_isc_train.pth \
--reference-feature-path outputs/dino/eval/dino_vits_isc_train.pth \
--output-path weights/dino_vits_isc_knn.pth \
--k 128
# (optional) generate k-NN matrix between dev set I and reference set for fine-tuning
# extract dino features on ISC dev set I
bash scripts/run/extract_dino_features.sh isc_query_val dev/vits_lin.yaml dino_vits_isc_val.pth
# extract dino features on ISC reference set
bash scripts/run/extract_dino_features.sh isc_reference dev/vits_lin.yaml dino_vits_isc_reference.pth
# generate k-NN matrix between dev set I and reference set
python scripts/run/build_knn.py \
--query-feature-path outputs/dino/eval/dino_vits_isc_val.pth \
--reference-feature-path outputs/dino/eval/dino_vits_isc_reference.pth \
--output-path weights/dino_vits_isc_val-to-ref_knn.pth \
--k 128

Before launching the training process, please make sure the following weights generated in the Preparation section are ready:
- DINO pretrained weights: weights/dino_vits16_enc-12.pth, weights/dino_vitb16_enc-12.pth, weights/dino_vits16_enc-8_fus-4.pth, or weights/dino_vits16_enc-8_fus-4_pos-emb-21x21.pth (depending on which model is trained under which settings).
- k-NN matrix for global hard negative mining: weights/dino_vits_isc_knn.pth
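Conceptually, the k-NN matrices above are top-k cosine-neighbour tables over L2-normalized features. A minimal numpy stand-in for scripts/run/build_knn.py (the real script's output format may differ):

```python
import numpy as np

def build_knn(query_feats, reference_feats, k=128):
    """Return (indices, similarities) of the top-k cosine neighbours.

    Features are L2-normalized, so the inner product equals cosine
    similarity. Both outputs have shape (num_queries, k), nearest first.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    r = reference_feats / np.linalg.norm(reference_feats, axis=1, keepdims=True)
    sims = q @ r.T
    idx = np.argsort(-sims, axis=1)[:, :k]
    return idx, np.take_along_axis(sims, idx, axis=1)

# intra-set k-NN (query == reference), as used for hard negative mining;
# note each point trivially retrieves itself at rank 0
rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 16)).astype(np.float32)
idx, sims = build_knn(feats, feats, k=4)
print(bool((idx[:, 0] == np.arange(32)).all()))  # True
```

For the intra-training-set matrix, the repo's mining logic presumably discards the self-match at rank 0; this sketch keeps it for simplicity.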
# pwd = /path/to/project/root
# train ViT-S descriptor from DINO pretrained weights on 224x224 resolution and evaluate on DISC21 dev set II
bash scripts/train/train_des.sh copynce_descriptor weights/dino_vits16_enc-12.pth copynce.yaml dev/vits_lin.yaml
# train ViT-B descriptor from DINO pretrained weights on 224x224 resolution and evaluate on DISC21 dev set II
bash scripts/train/train_des.sh copynce_descriptor_vitb weights/dino_vitb16_enc-12.pth copynce_vitb.yaml dev/vitb_lin.yaml
# train ViT-S matcher from DINO pretrained weights on 224x224 resolution and evaluate on DISC21 dev set II
bash scripts/train/train_cls.sh copynce_matcher weights/dino_vits16_enc-8_fus-4.pth copynce.yaml dev/isc_copynce-cand.yaml
# train ViT-S matcher with lower learning rate on 224x224 resolution and evaluate on DISC21 dev set II
bash scripts/train/train_cls.sh copynce_matcher_lr-2e-4 copynce_matcher copynce_lr-2e-4.yaml dev/isc_copynce-cand.yaml
# train ViT-S matcher from DINO pretrained weights on 336x336 resolution and evaluate on DISC21 dev set II
bash scripts/train/train_cls.sh copynce_matcher_336 weights/dino_vits16_enc-8_fus-4_pos-emb-21x21.pth copynce_336.yaml dev/isc_copynce-cand_336.yaml
# train ViT-S matcher with lower learning rate on 336x336 resolution and evaluate on DISC21 dev set II
bash scripts/train/train_cls.sh copynce_matcher_336_lr-2e-4 copynce_matcher_336 copynce_336_lr-2e-4.yaml dev/isc_copynce-cand_336.yaml

NOTE THAT:
- All training code defaults to using 8xA100 GPUs. If you need to adjust the number of GPUs, you must manually modify all nproc_per_node parameters in the training scripts. If you wish to adjust the training batch size, you should either manually add the following arguments in the training config file or directly modify the batch_size_per_gpu parameter in the train section of the default config files (core/configs/descriptor_default_config.yaml or core/configs/matching_default_config.yaml).
train:
  batch_size_per_gpu: <expected_batch_size>

- To ensure optimal training performance, we recommend training the model using the original configuration.
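As a quick sanity check when changing these knobs: the effective batch size is the product of nproc_per_node and batch_size_per_gpu, so adjusting one usually means compensating with the other. The per-GPU value below is hypothetical, not taken from the configs:

```python
# effective batch size = nproc_per_node * batch_size_per_gpu
nproc_per_node = 8        # default: 8xA100 GPUs
batch_size_per_gpu = 64   # hypothetical value; read yours from the config
effective = nproc_per_node * batch_size_per_gpu
print(effective)          # 512

# to keep the same effective batch size on 4 GPUs:
print(effective // 4)     # 128 per GPU
```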
The testing entry files are scripts/eval/eval_des.sh for the descriptor and
scripts/eval/eval_cls.sh for the matcher. You can use the following commands
to get the usage instructions for these scripts.
# pwd = /path/to/project/root
# descriptor
bash scripts/eval/eval_des.sh --help
# matcher
bash scripts/eval/eval_cls.sh --help

NOTE THAT:
- To evaluate the matcher performance, the candidate pairs generated by the descriptor are necessary. For convenience, we provide the JSON file data/isc_copynce_matching_dev_set_pairs.json, which is generated by CopyNCE descriptors; see Section A.8 in our supplementary material for more details.
- All evaluation code defaults to using 8xA100 GPUs. If you need to adjust the number of GPUs, you must manually modify all nproc_per_node parameters in the evaluation scripts. If you wish to adjust the evaluation batch size, you should either manually add the following arguments in the evaluation config file or directly modify the batch_size_per_gpu parameter in the eval section of the default config files (core/configs/descriptor_default_config.yaml or core/configs/matching_default_config.yaml).
eval:
  batch_size_per_gpu: <expected_batch_size>

Under vanilla settings
# pwd = /path/to/project/root
# evaluate the descriptor under ViT-S and 224 $\times$ 224 settings on dev set
bash scripts/eval/eval_des.sh weights/descriptor_vits_224.pth.tar dev/vits_lin.yaml
# evaluate the descriptor under ViT-B and 224 $\times$ 224 settings on dev set
bash scripts/eval/eval_des.sh /path/to/model/weights dev/vitb_lin.yaml

Under score normalization settings
# extract DISC21 training set as the auxiliary set in score normalization
# this command will generate train_feat_ep-30.pth under outputs/descriptor_vits_224/eval/vits_lin
bash scripts/run/extract_features.sh weights/descriptor_vits_224.pth.tar isc dev/vits_lin.yaml train_feat_ep-30.pth
# execute score normalization with query, reference and auxiliary (training set) features
# args for score normalization are alpha=1 start-index=1 (INCLUSIVE) end-index=5 (EXCLUSIVE)
# actually, many args for score normalization yield good performance
python scripts/run/score_normalization.py \
--query-feature-path outputs/descriptor_vits_224/eval/vits_lin/query_feat_ep-30.pth \
--reference-feature-path outputs/descriptor_vits_224/eval/vits_lin/reference_feat_ep-30.pth \
--auxiliary-feature-path outputs/descriptor_vits_224/eval/vits_lin/train_feat_ep-30.pth \
--output-path outputs/descriptor_vits_224/eval/vits_lin/normalized_scores.json \
--alpha 1. --start 1 --end 5
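Our reading of the score-normalization step above, as a minimal numpy sketch rather than the script itself: for each query, subtract alpha times the mean similarity to its nearest auxiliary (training-set) descriptors in the rank range [start, end), which calibrates raw similarities against the local background density.

```python
import numpy as np

def normalize_scores(query, reference, auxiliary, alpha=1.0, start=1, end=5):
    """Score-normalization sketch; all features assumed L2-normalized.

    start is inclusive, end is exclusive, matching the args above.
    """
    sims = query @ reference.T                           # raw query-reference scores
    aux = np.sort(query @ auxiliary.T, axis=1)[:, ::-1]  # aux sims, descending
    bias = aux[:, start:end].mean(axis=1, keepdims=True)
    return sims - alpha * bias                           # per-query additive shift

rng = np.random.default_rng(0)
def unit(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)
q, r, a = (unit(rng.normal(size=(s, 16))) for s in (8, 32, 64))
norm = normalize_scores(q, r, a)
print(norm.shape)  # (8, 32)
```

Since the bias is constant per query, normalization reshuffles the global ranking across queries without changing each query's own ranking, which is why it helps uAP.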
# calculate the mAP, uAP and RP90
python -m core.run.eval.measure_cls \
--config-file core/configs/eval/descriptor/vits_lin.yaml \
--result-file outputs/descriptor_vits_224/eval/vits_lin/normalized_scores.json \
--output-dir outputs/descriptor_vits_224/eval/vits_lin

Under vanilla settings
# pwd = /path/to/project/root
# evaluate the matcher under ViT-S and 224 $\times$ 224 settings on dev set
bash scripts/eval/eval_cls.sh weights/matcher_vits_224.pth.tar dev/isc_copynce-cand.yaml
# evaluate the matcher under ViT-S and 336 $\times$ 336 settings on dev set
bash scripts/eval/eval_cls.sh /path/to/model/weights dev/isc_copynce-cand_336.yaml

Under local crops ensembling settings
# pwd = /path/to/project/root
# evaluate the matcher under ViT-S, 224 $\times$ 224 and LCE settings on dev set
bash scripts/eval/eval_cls_local_verification.sh weights/matcher_vits_224.pth.tar dev/isc_copynce-cand_local-verification.yaml
# evaluate the matcher under ViT-S, 336 $\times$ 336 and LCE settings on dev set
bash scripts/eval/eval_cls_local_verification.sh /path/to/model/weights dev/isc_copynce-cand_local-verification_336.yaml

Because we employ complex data augmentation during training, the model, although capable of detecting copy patterns, is not well aligned with real-world copy cases. This often leads to odd recall results: for example, straightforward cases of direct image copying may not be recalled, or strange false positives may be recalled. Additionally, fine-tuning takes very little time compared to training. Therefore, fine-tuning the trained model is highly recommended before applying it in real-world scenarios.
Before launching the fine-tuning process, please make sure the following weights generated in the Preparation section are ready:
- Our provided checkpoint or your reproduced checkpoint: weights/descriptor_vits_224.pth.tar or weights/matcher_vits_224.pth.tar (if you want to fine-tune from our provided checkpoints), or outputs/<EXP_NAME>/train/checkpoint.pth.tar (if you want to fine-tune from your reproduced checkpoint).
- k-NN matrix for global hard negative mining: weights/dino_vits_isc_val-to-ref_knn.pth
Table: Performance of descriptor and matcher on dev set II after fine-tuning on dev set I
| Model Type | Arch | Resolution | Fine-tuned | uAP | RP90 |
|---|---|---|---|---|---|
| Descriptor | ViT-S | 224x224 | ✅ | 76.7 | 67.6 |
| Matcher | ViT-S | 224x224 | ✅ | 88.0 | 82.8 |
# fine-tune the descriptor from the provided checkpoint
bash scripts/train/train_des.sh finetune_descriptor weights/descriptor_vits_224.pth.tar finetune.yaml dev/vits_lin.yaml
# or fine-tune the descriptor from your reproduced checkpoint
bash scripts/train/train_des.sh finetune_descriptor <YOUR_EXP_NAME> finetune.yaml dev/vits_lin.yaml
# evaluate the fine-tuned descriptor on dev set II
bash scripts/eval/eval_des.sh finetune_descriptor dev/vits_lin.yaml
# fine-tune the matcher from the provided checkpoint
bash scripts/train/train_cls.sh finetune_matcher weights/matcher_vits_224.pth.tar finetune.yaml dev/isc_copynce-cand.yaml
# or fine-tune the matcher from your reproduced checkpoint
bash scripts/train/train_cls.sh finetune_matcher <YOUR_EXP_NAME> finetune.yaml dev/isc_copynce-cand.yaml
# evaluate the fine-tuned matcher on dev set II
bash scripts/eval/eval_cls.sh finetune_matcher dev/isc_copynce-cand.yaml

CopyNCE code and model weights are released under the Apache License 2.0. See LICENSE for additional details.
@inproceedings{lu2025tracing,
title={Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection},
author={Lu, Yichen and Nie, Siwei and Lu, Minlong and Yang, Xudong and Zhang, Xiaobo and Zhang, Peng},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={19248--19257},
year={2025}
}

