Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification (CVPR 2025 Pytorch Code)

Introduction

PyTorch implementation for Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification (CVPR 2025).

Supplementary Material

New!

  • We will release the RDA pre-trained model weights and training logs. Coming soon!

Requirements and Datasets

conda create --name myenv python=3.10
conda activate myenv
pip install vllm easydict ftfy prettytable nltk qwen_vl_utils

THI Framework

The illustration of our Test-time Human-centered Interaction (THI) module. THI performs $K$ rounds of interaction to align the query intention with the latent target image using external guidance. In each round, we conduct human-centered visual question answering around fine-grained person attributes to strengthen the semantic consistency between the query and the intended person image, and then improve the final ReID performance on the large-scale evaluation through efficient re-ranking. In addition, we perform supervised fine-tuning via LoRA to elicit the discriminative ability of the MLLM on ReID-domain images and better align queries with latent target images.
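
A minimal sketch of this interaction loop (the mllm / reid_model helpers and their method names are hypothetical placeholders, not the released implementation):

def thi_rerank(query_text, gallery_shortlist, mllm, reid_model, K=3):
    """Refine a text query over K rounds of human-centered VQA, then re-rank."""
    refined = query_text
    for _ in range(K):
        # Ask fine-grained, person-centric questions about the current query
        # and fold the MLLM's answers back into the query text.
        answers = mllm.human_centered_vqa(refined, gallery_shortlist)
        refined = mllm.rewrite_query(refined, answers)
    # Efficient re-ranking: re-score only the shortlisted gallery images
    # with the refined query instead of the whole gallery.
    scores = [reid_model.score(refined, img) for img in gallery_shortlist]
    return sorted(zip(gallery_shortlist, scores), key=lambda p: p[1], reverse=True)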

Reorganization Data Augmentation

The illustration of our RDA. The purpose of RDA is to supplement the original training texts with additional details through human-centered VQA, improving the discriminability of the texts. In addition, RDA enhances diversity through the Decomposition-Rewriting-Reorganization strategy.
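
A minimal sketch of this augmentation flow (helper names are hypothetical; the actual prompts and models live in the training code):

def rda_augment(caption, image, mllm):
    """Enrich a training caption, then diversify it via Decomposition-Rewriting-Reorganization."""
    # Supplement the original caption with person-centric details obtained
    # by human-centered VQA on the paired image.
    details = mllm.human_centered_vqa(caption, image)
    enriched = caption + " " + details
    # Decomposition: split the enriched caption into attribute phrases.
    phrases = mllm.decompose(enriched)
    # Rewriting: paraphrase each attribute phrase.
    rewritten = [mllm.rewrite(p) for p in phrases]
    # Reorganization: recompose the phrases into a new, diverse caption.
    return mllm.reorganize(rewritten)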

For all augmentation data, see RDA_data.zip

Training and Evaluation

Training new models via RDA

Modify self.anno_path in datasets/cuhkpedes.py, icfgpedes.py, rstpreid.py, and ufine.py, replacing it with the path to the corresponding JSON file from RDA_data, as sketched below.
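
For example, in datasets/cuhkpedes.py (the path and file name below are placeholders; point them to wherever you extracted RDA_data.zip):

# datasets/cuhkpedes.py -- placeholder path, adjust to your setup
self.anno_path = '/path/to/RDA_data/cuhkpedes.json'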

sh run_rde.sh

If you use pre-trained parameters to initialize CLIP, you must add '+pre' to args.loss_names and modify the corresponding path variable model_pre in main.py. In our experiments, we use the pre-trained parameters from MLLM4Text-ReID.
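
For example, in main.py (hypothetical values; adapt to your configuration):

# Append '+pre' so the pre-trained CLIP initialization is used.
args.loss_names = args.loss_names + '+pre'
# Placeholder path to the MLLM4Text-ReID pre-trained checkpoint.
model_pre = '/path/to/MLLM4Text-ReID_checkpoint.pth'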

sh run_rde.sh

Evaluation

Modify the sub variable in test.py, then run:

python test.py

Running THI

You first need to download the weights of Qwen2-VL-7B-Instruct and use LLaMA-Factory to merge the LoRA parameters. The relevant files are already provided here.
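
If you prefer not to use LLaMA-Factory, a minimal PEFT-based alternative for merging the LoRA adapter into Qwen2-VL looks roughly like this (paths are placeholders, not files shipped with this repo):

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

# Load the base Qwen2-VL-7B-Instruct weights and attach the LoRA adapter.
base = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
model = PeftModel.from_pretrained(base, "/path/to/lora_adapter")

# Fold the LoRA weights into the base model and save a standalone checkpoint
# (together with the processor) so it can be served, e.g., with vLLM.
merged = model.merge_and_unload()
merged.save_pretrained("/path/to/merged_qwen2vl")
AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct").save_pretrained("/path/to/merged_qwen2vl")

Either way, once the merged weights are in place, launch THI: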

sh run_vllm_ICL.sh

Experiment Results:

Citation

If ICL or RDE is useful for your research, please cite the following papers:

@inproceedings{qin2024noisy,
  title={Noisy-Correspondence Learning for Text-to-Image Person Re-identification},
  author={Qin, Yang and Chen, Yingke and Peng, Dezhong and Peng, Xi and Zhou, Joey Tianyi and Hu, Peng},
  booktitle={IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024},
}
@inproceedings{qin2025human,
  title={Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification},
  author={Qin, Yang and Chen, Chao and Fu, Zhihang and Peng, Dezhong and Peng, Xi and Hu, Peng},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={14390--14399},
  year={2025}
}

License

Apache License 2.0

Acknowledgements

The code is based on RDE, which is licensed under Apache 2.0.
