HuiGuanLab/nrccr

Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning

Source code for our paper "Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning".

Environments

  • CUDA 11.3
  • Python 3.8.5
  • PyTorch 1.10.2

We used Anaconda to set up a deep learning workspace that supports PyTorch. Run the following commands to create the environment and install the required packages.

conda create --name nrccr_env python=3.8.5
conda activate nrccr_env
git clone https://github.com/LiJiaBei-7/nrccr.git
cd nrccr
pip install -r requirements.txt
conda deactivate
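
After installation, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch and not part of the repository: the helper names `installed_versions` and `report_versions` are hypothetical, and the check only reads installed package metadata, so it does not verify the CUDA version.

```python
import sys
from importlib import metadata

# Versions listed in the Environments section of this README.
EXPECTED = {"python": "3.8.5", "torch": "1.10.2"}


def installed_versions():
    """Return the interpreter version and the installed torch version (or None)."""
    found = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    try:
        found["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        found["torch"] = None  # PyTorch is not installed in this environment
    return found


def report_versions(expected, found):
    """Compare expected and found versions; return a list of mismatch messages."""
    problems = []
    for name, want in expected.items():
        have = found.get(name)
        # A local build tag such as "1.10.2+cu113" is treated as matching "1.10.2".
        if have is None or have.split("+")[0] != want:
            problems.append(f"{name}: expected {want}, found {have}")
    return problems


if __name__ == "__main__":
    messages = report_versions(EXPECTED, installed_versions())
    for line in messages or ["environment matches the README"]:
        print(line)
```

Run it inside the activated `nrccr_env` environment; a mismatch is only a warning that results may differ from those reported here.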

Required Data

We use three public datasets: VATEX, MSR-VTT-CN, and Multi-30K. The extracted features are placed in $HOME/VisualSearch/.

For Multi-30K, we provide translated versions (from Google Translate) of both Task1 and Task2. (Task1 is applied to translation tasks; Task2 is applied to captioning tasks.)

In addition, we provide the MSCOCO dataset here and report the corresponding performance below. The Japanese validation and test sets come from STAIR Captions, and the Chinese ones come from COCO-CN.

Training set:

source(en) + translation(en2xx) + back-translation(en2xx2en)

Validation set and test set:

target(xx) + translation(xx2en)

| Dataset | Feature | Caption |
| --- | --- | --- |
| VATEX | vatex-i3d.tar.gz, pwd:p3p0 | vatex_caption, pwd:oy27 |
| MSR-VTT-CN | msrvtt10k-resnext101_resnet152.tar.gz, pwd:p3p0 | cn_caption, pwd:oy27 |
| Multi-30K | multi30k-resnet152.tar.gz, pwd:5khe | multi30k_caption, pwd:oy27 |
| MSCOCO | | mscoco_caption, pwd:13dc |

ROOTPATH=$HOME/VisualSearch
mkdir -p $ROOTPATH && cd $ROOTPATH

Organize these files like this:
# download the data of VATEX[English, Chinese]
VisualSearch/VATEX/
	FeatureData/
		i3d_kinetics/
			feature.bin
			id.txt
			shape.txt
			video2frames.txt
	TextData/
		xx.txt

# download the data of MSR-VTT-CN[English, Chinese]
VisualSearch/msrvttcn/
	FeatureData/
		resnext101-resnet152/
			feature.bin
			id.txt
			shape.txt
			video2frames.txt
	TextData/
		xx.txt

# download the data of Multi-30K[English, German, French, Czech]
# For Task2, the training set was translated from Flickr30K, which contains five captions per image, while for Task1, each image corresponds to one caption.
# The validation and test sets for French and Czech are the same in both tasks.
VisualSearch/multi30k/
	FeatureData/
		train_id.txt
		val_id.txt
		test_id_2016.txt

	resnet_152[optional]/
		train-resnet_152-avgpool.npy
		val-resnet_152-avgpool.npy
		test_2016_flickr-resnet_152-avgpool.npy	
	TextData/
		xx.txt	
	flickr30k-images/
		xx.jpg

# download the data of MSCOCO[English, Chinese, Japanese]
VisualSearch/mscoco/
	FeatureData/
		train_id.txt
		ja_val_id.txt
		zh_val_id.txt
		ja_test_id.txt
		zh_test_id.txt
	TextData/
		xx.txt
	all_pics/
		xx.jpg
		
	image_ids.txt
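
Before training, it may be worth checking that the video feature files are where the layout above expects them. The sketch below covers only the two video datasets (VATEX and MSR-VTT-CN), whose feature directories and file names are taken from the listing above; the function name `missing_files` is hypothetical, and the check looks only at file presence, not at file contents.

```python
from pathlib import Path

# The four files this README lists under each video feature directory.
FEATURE_FILES = ["feature.bin", "id.txt", "shape.txt", "video2frames.txt"]

# Dataset directory -> feature sub-directory, as shown in the layout above.
VIDEO_FEATURE_DIRS = {
    "VATEX": "FeatureData/i3d_kinetics",
    "msrvttcn": "FeatureData/resnext101-resnet152",
}


def missing_files(rootpath, feature_dirs=VIDEO_FEATURE_DIRS):
    """Return the expected feature files that are absent under rootpath."""
    root = Path(rootpath)
    missing = []
    for dataset, feature_dir in feature_dirs.items():
        for name in FEATURE_FILES:
            relative = f"{dataset}/{feature_dir}/{name}"
            if not (root / relative).is_file():
                missing.append(relative)
    return missing


if __name__ == "__main__":
    root = Path.home() / "VisualSearch"  # matches ROOTPATH=$HOME/VisualSearch
    for path in missing_files(root):
        print("missing:", path)
```

An empty result means all eight listed files were found; anything printed points at an archive that still needs to be downloaded or unpacked.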

NRCCR on VATEX

Model Training and Evaluation

Run the following script to train and evaluate the NRCCR network. It trains the network and selects the checkpoint that performs best on the validation set as the final model. Note that, to save disk space, only this best-performing checkpoint is saved.

ROOTPATH=$HOME/VisualSearch

conda activate nrccr_env

# To train the model on VATEX, which uses the i3d_kinetics feature
# Template:
./do_all_vatex.sh $ROOTPATH <gpu-id>

# Example:
# Train NRCCR 
./do_all_vatex.sh $ROOTPATH 0

<gpu-id> is the index of the GPU to train on.
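
The checkpoint policy described above (keep only the checkpoint that scores best on the validation set) can be sketched as follows. This is an illustration under assumptions, not the repository's code: the class name `BestCheckpointKeeper`, the file name `model_best.pth.tar`, and the validation scores are all hypothetical, and a higher score is assumed to be better.

```python
import tempfile
from pathlib import Path


class BestCheckpointKeeper:
    """Keep a single checkpoint file: the one with the best validation score so far."""

    def __init__(self, path):
        self.path = Path(path)
        self.best_score = float("-inf")
        self.best_epoch = None

    def update(self, epoch, val_score, save_fn):
        """Call save_fn(path) only if val_score improves on the best seen so far."""
        if val_score <= self.best_score:
            return False
        self.best_score, self.best_epoch = val_score, epoch
        save_fn(self.path)  # overwrites the previous best, so disk use stays constant
        return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        keeper = BestCheckpointKeeper(Path(tmp) / "model_best.pth.tar")  # hypothetical name
        for epoch, score in enumerate([301.2, 310.5, 308.9]):  # made-up validation scores
            keeper.update(epoch, score, lambda p, e=epoch: p.write_text(f"epoch {e}"))
        print("best epoch:", keeper.best_epoch)
```

Passing the save step as a callback keeps the sketch independent of any framework; in a PyTorch training loop the callback would typically wrap `torch.save`.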

Evaluation using Provided Checkpoints

Download the checkpoint trained on VATEX from Baidu Pan (url, pwd:ise6) and run the following script to evaluate it.

ROOTPATH=$HOME/VisualSearch/

tar zxf $ROOTPATH/<best_model>.pth.tar -C $ROOTPATH

./do_test_vatex.sh $ROOTPATH $MODELDIR <gpu-id>
# $MODELDIR is the path of checkpoints, $ROOTPATH/.../runs_0

Expected Performance

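T2V = Text-to-Video Retrieval, V2T = Video-to-Text Retrieval.

| Type | T2V R@1 | T2V R@5 | T2V R@10 | T2V MedR | T2V mAP | V2T R@1 | V2T R@5 | V2T R@10 | V2T MedR | V2T mAP | SumR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| en2cn | 30.8 | 64.4 | 74.6 | 3.0 | 45.78 | 43.1 | 72.3 | 81.4 | 2.0 | 32.57 | 366.5 |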
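
For readers unfamiliar with the metrics in this table, the sketch below shows the usual definitions of R@K and MedR, computed from the 1-based rank of the first correct item for each query. It is not the repository's evaluation code, which may differ in details such as how multiple ground-truth captions per video are handled; the function names and the example ranks are made up for illustration.

```python
from statistics import median


def recall_at_k(ranks, k):
    """Percentage of queries whose first correct item is ranked within the top k."""
    return 100.0 * sum(rank <= k for rank in ranks) / len(ranks)


def median_rank(ranks):
    """Median of the ranks of the first correct item (lower is better)."""
    return float(median(ranks))


if __name__ == "__main__":
    # Made-up 1-based ranks for ten queries.
    ranks = [1, 3, 2, 12, 1, 7, 5, 1, 25, 4]
    for k in (1, 5, 10):
        print(f"R@{k}: {recall_at_k(ranks, k):.1f}")
    print(f"MedR: {median_rank(ranks):.1f}")
```

Higher R@K and lower MedR indicate better retrieval.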

NRCCR on MSR-VTT-CN

Model Training and Evaluation

Run the following script to train and evaluate NRCCR network on MSR-VTT-CN.

ROOTPATH=$HOME/VisualSearch

conda activate nrccr_env

# To train the model on MSR-VTT-CN
./do_all_msrvttcn.sh $ROOTPATH <gpu-id>

Evaluation using Provided Checkpoints

Download the checkpoint trained on MSR-VTT-CN from Baidu Pan (url, pwd:ise6) and run the following script to evaluate it.

ROOTPATH=$HOME/VisualSearch/

tar zxf $ROOTPATH/<best_model>.pth.tar -C $ROOTPATH

./do_test_msrvttcn.sh $ROOTPATH $MODELDIR <gpu-id>
# $MODELDIR is the path of checkpoints, $ROOTPATH/.../runs_0

Expected Performance

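T2V = Text-to-Video Retrieval, V2T = Video-to-Text Retrieval.

| Type | T2V R@1 | T2V R@5 | T2V R@10 | T2V MedR | T2V mAP | V2T R@1 | V2T R@5 | V2T R@10 | V2T MedR | V2T mAP | SumR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| en2cn | 28.9 | 56.3 | 67.3 | 4.0 | 41.28 | 28.9 | 57.6 | 69.0 | 4.0 | 42.02 | 308 |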

NRCCR on Multi-30K

Model Training and Evaluation

Run the following script to train and evaluate the NRCCR network on Multi-30K. To train with CLIP as the backbone, you also need to download the raw Flickr30K images from here.

ROOTPATH=$HOME/VisualSearch

conda activate nrccr_env

# To train the model on the Multi-30K
./do_all_multi30k.sh $ROOTPATH <task> <gpu-id>

Evaluation using Provided Checkpoints

Download the checkpoint trained on Multi-30K from Baidu Pan (url, pwd:ise6) and run the following script to evaluate it.

ROOTPATH=$HOME/VisualSearch/

tar zxf $ROOTPATH/<best_model>.pth.tar -C $ROOTPATH

./do_test_multi30k.sh $ROOTPATH $MODELDIR $image_path <gpu-id>
# $MODELDIR is the path of checkpoints, $ROOTPATH/.../runs_0
# $image_path is the path of the raw Flickr30K images; if you use the frozen ResNet-152 features, set it to None.
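
When evaluation is driven from Python, the command above can be assembled programmatically. The sketch below only builds the argument list in the positional order shown above and passes the literal string None when no image path is given; the function name `multi30k_test_command` and the example paths are hypothetical, and the script itself is not invoked here.

```python
import shlex


def multi30k_test_command(rootpath, model_dir, gpu_id, image_path=None):
    """Build the argument list for do_test_multi30k.sh in the order shown above."""
    # With frozen ResNet-152 features, the README says to pass None as the image path.
    image_arg = "None" if image_path is None else str(image_path)
    return ["./do_test_multi30k.sh", str(rootpath), str(model_dir), image_arg, str(gpu_id)]


if __name__ == "__main__":
    # Placeholder paths; substitute your own ROOTPATH and checkpoint directory.
    command = multi30k_test_command("/path/to/VisualSearch", "/path/to/runs_0", 0)
    print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the repository root.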

Expected Performance

Task1:

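T2V = Text-to-Video Retrieval, V2T = Video-to-Text Retrieval.

| Type | T2V R@1 | T2V R@5 | T2V R@10 | T2V MedR | T2V mAP | V2T R@1 | V2T R@5 | V2T R@10 | V2T MedR | V2T mAP | SumR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| en2de_clip | 53.8 | 81.8 | 88.3 | 1.0 | 66.60 | 53.8 | 82.7 | 90.3 | 1.0 | 66.66 | 450.7 |
| en2fr_clip | 54.7 | 81.7 | 89.2 | 1.0 | 67.05 | 54.9 | 82.7 | 89.7 | 1.0 | 67.29 | 452.9 |
| en2cs_clip | 52.6 | 79.4 | 87.9 | 1.0 | 65.26 | 52.3 | 78.7 | 87.8 | 1.0 | 64.68 | 438.7 |
| en2cs_resnet152 | 29.5 | 56.0 | 68.1 | 4.0 | 41.89 | 27.5 | 55.1 | 67.4 | 4.0 | 40.59 | 303.6 |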
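
SumR is not defined in this README; the Task1 numbers are consistent with the common convention that SumR is the sum of R@1, R@5, and R@10 over both retrieval directions. The sketch below checks that reading against the four Task1 rows. The helper name `sum_r` is hypothetical, and the definition is an inference from the reported values rather than something taken from the repository's code.

```python
# (R@1, R@5, R@10) for each direction plus the reported SumR, copied from the Task1 results.
TASK1 = {
    "en2de_clip": ((53.8, 81.8, 88.3), (53.8, 82.7, 90.3), 450.7),
    "en2fr_clip": ((54.7, 81.7, 89.2), (54.9, 82.7, 89.7), 452.9),
    "en2cs_clip": ((52.6, 79.4, 87.9), (52.3, 78.7, 87.8), 438.7),
    "en2cs_resnet152": ((29.5, 56.0, 68.1), (27.5, 55.1, 67.4), 303.6),
}


def sum_r(text_to_visual, visual_to_text):
    """Sum of the six recall values, rounded to one decimal place."""
    return round(sum(text_to_visual) + sum(visual_to_text), 1)


if __name__ == "__main__":
    for name, (t2v, v2t, reported) in TASK1.items():
        print(f"{name}: computed {sum_r(t2v, v2t)}, reported {reported}")
```

Small differences of about 0.1 can appear on other tables because the published recall values are themselves rounded.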

Task2:

(with clip)

| en2de_SumR | en2fr_SumR | en2cs_SumR |
| --- | --- | --- |
| 480.9 | 482.1 | 467.1 |

NRCCR on MSCOCO

Model Training and Evaluation

Run the following script to train and evaluate NRCCR network on MSCOCO.

ROOTPATH=$HOME/VisualSearch

conda activate nrccr_env

# To train the model on MSCOCO
./do_all_mscoco.sh $ROOTPATH <gpu-id>

Expected Performance

(with clip)

| en2cn_SumR | en2ja_SumR |
| --- | --- |
| 512.4 | 507.0 |

Reference

If you find the package useful, please consider citing our paper:

@inproceedings{wang2022cross,
  title={Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning},
  author={Yabing Wang and Jianfeng Dong and Tianxiang Liang and Minsong Zhang and Rui Cai and Xun Wang},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}

About

Source code of our MM'22 paper Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning
