PyTorch implementation of "Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification" (CVPR 2025).
- We will release the RDA pre-trained model weights and training logs. Coming soon!
- LLaMA-Factory
- vllm
- qwen_vl_utils
- CUHK-PEDES, ICFG-PEDES, RSTPReid, UFine6926
- Other requirements are the same as RDE
conda create --name myenv python=3.10
conda activate myenv
pip install vllm easydict ftfy prettytable nltk qwen_vl_utils
The illustration of our Test-time Human-centered Interaction (THI) module.
The illustration of our RDA. RDA supplements the original training texts with additional details through human-centered VQA, improving the discriminability of the texts. To further enhance diversity, it applies a Decomposition-Rewriting-Reorganization strategy.
For all augmentation data, see RDA_data.zip
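For intuition only, here is a minimal sketch of what a Decomposition-Rewriting-Reorganization pass could look like; the prompt templates and the `llm` callable below are illustrative, not the prompts used in the paper:

```python
# Illustrative sketch of Decomposition-Rewriting-Reorganization (hypothetical prompts).
DECOMPOSE = "Split this person description into short attribute phrases, one per line: {caption}"
REWRITE = "Paraphrase this attribute phrase, keeping its meaning: {phrase}"
REORGANIZE = "Combine these attribute phrases into one fluent person description: {phrases}"

def rda_rewrite(caption, llm):
    """llm: any text-in/text-out callable, e.g. a wrapper around a vLLM endpoint."""
    phrases = [p for p in llm(DECOMPOSE.format(caption=caption)).splitlines() if p.strip()]
    rewritten = [llm(REWRITE.format(phrase=p)) for p in phrases]
    return llm(REORGANIZE.format(phrases="; ".join(rewritten)))
```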
Modify self.anno_path in datasets/cuhkpedes.py, icfgpedes.py, rstpreid.py, and ufine.py, replacing the path with that of the corresponding JSON file from RDA_data.
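For example, a sketch of the change in datasets/cuhkpedes.py (the JSON file name below is hypothetical):

```python
# datasets/cuhkpedes.py (sketch; only the line to change is shown in context)
class CUHKPEDES:
    def __init__(self, root=''):
        # Point anno_path at the augmented annotations extracted from RDA_data.zip:
        self.anno_path = '/path/to/RDA_data/CUHK-PEDES/cuhkpedes_rda.json'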
sh run_rde.sh
If you use pre-trained parameters to initialize CLIP, you must add '+pre' to args.loss_names and modify the path variable model_pre in main.py. In our experiments, we use the pre-trained parameters from MLLM4Text-ReID.
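A sketch of the settings described above (the loss string and checkpoint path are illustrative, not the repo's defaults):

```python
# In main.py (sketch; values are hypothetical)
args.loss_names = 'TAL+pre'  # append '+pre' to whatever losses you already train with
model_pre = '/path/to/MLLM4Text-ReID_pretrained.pth'  # path to the pre-trained checkpoint
```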
sh run_rde.sh
Modify the sub variable in test.py and run:
python test.py
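A sketch of the variable to edit (the value shown is a guess based on the supported benchmarks):

```python
# In test.py (sketch): `sub` selects which dataset/checkpoint to evaluate.
sub = 'CUHK-PEDES'  # hypothetical value; pick the benchmark you want to test
```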
You first need to download the weights of Qwen2-VL-7B-Instruct and use LLaMA-Factory to merge the LoRA parameters. The relevant files are already provided here.
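If you prefer to merge in Python rather than through LLaMA-Factory's export tool, a PEFT-based sketch achieves the same effect (the adapter and output paths are hypothetical):

```python
# Sketch: fold LoRA weights into Qwen2-VL-7B-Instruct with PEFT
# (equivalent in effect to LLaMA-Factory's merge step; paths are hypothetical).
from transformers import Qwen2VLForConditionalGeneration
from peft import PeftModel

base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, "/path/to/lora_adapter")  # LoRA adapter dir
model = model.merge_and_unload()                    # merge LoRA deltas into the base weights
model.save_pretrained("/path/to/merged_qwen2vl")    # serve this directory with vLLM
```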
sh run_vllm_ICL.sh
If this work is useful for your research, please cite the following papers:
@inproceedings{qin2024noisy,
title={Noisy-Correspondence Learning for Text-to-Image Person Re-identification},
author={Qin, Yang and Chen, Yingke and Peng, Dezhong and Peng, Xi and Zhou, Joey Tianyi and Hu, Peng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024},
}
@inproceedings{qin2025human,
title={Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification},
author={Qin, Yang and Chen, Chao and Fu, Zhihang and Peng, Dezhong and Peng, Xi and Hu, Peng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={14390--14399},
year={2025}
}
The code is based on RDE, which is licensed under Apache 2.0.

