Implementation of our ACL Findings 2023 paper "BIGVIDEO: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation"


This repository contains the code for our ACL 2023 Findings paper "BIGVIDEO: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation".

Dependencies

  • PyTorch version == 1.10.0
  • timm version == 0.4.12
  • vizseq version == 0.1.15
  • nltk version == 3.6.4
  • sacrebleu version == 1.5.1
  • Please check fairseq_mmt/sh/requirements.txt for more details

Install fairseq

cd fairseq_mmt
pip install --editable ./

BigVideo Dataset

The dataset is available here.

Please email us to explain your identity and purpose before requesting access.

Direct requests without a prior email will not be approved. Please make sure that all data are used for research only.

#  structure 
├─ text_data   # original text data and our preprocessed text data
   ├─ test.relate_score  # 1=ambiguous set  0=unambiguous set
   ├─ test.anno.combine  # our annotated ambiguous terms
   ├─            # refer to corresponding videos
├─ fairseq_bins # our preprocessed fairseq-bin
├─ video_features # our extracted video features
   ├─ VIT
   ├─ slowfast
├─ raw_videos 
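
The ambiguity labels can be used to split the test set into its two subsets. The following is a minimal sketch, assuming that test.relate_score holds one 0/1 label per line, aligned line by line with the test sentences. That layout, the function name split_by_ambiguity, and the sample sentences are assumptions for illustration and are not confirmed by the dataset itself.

```python
def split_by_ambiguity(sentences, scores):
    """Split sentences into (ambiguous, unambiguous) lists using 0/1 labels.

    Assumed convention (from the structure listing): 1 = ambiguous, 0 = unambiguous.
    """
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    ambiguous, unambiguous = [], []
    for sentence, score in zip(sentences, scores):
        label = int(str(score).strip())
        if label == 1:
            ambiguous.append(sentence)
        elif label == 0:
            unambiguous.append(sentence)
        else:
            raise ValueError(f"unexpected label: {score!r}")
    return ambiguous, unambiguous


# Hypothetical in-memory stand-ins for the test sentences and test.relate_score.
sentences = ["He went to the bank.", "The weather is nice today.", "Pass me the bat."]
scores = ["1", "0", "1"]
ambiguous, unambiguous = split_by_ambiguity(sentences, scores)
```

With real data, the two lists would be read from the files with open(...) and passed in unchanged.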

Feature Extraction

An example of how to extract ViT features can be found under fairseq_mmt/scripts/video_extractor/vit, and how to extract frames.tsv is shown in VideoSwin. For other types of video features, you can also follow Hero_extractor.

Train and Test


To train our model with the contrastive learning objective, the following arguments are required:

--arch video_fushion_encoder_revise_one_merge_before_pewln \
--criterion cross_modal_criterion_with_ctr_revise   \
--contrastive-strategy mean+mlp  \
--contrastive-weight ${contrastive_weight}   \
--contrastive-temperature ${contrastive_temperature}  \
--video-feat-path $video_feat_path \
--video-ids-path $video_ids_path \
--video-feat-dim $video_feat_dim \
--video-feat-type $video_feat_type \
--max-vid-len 12  --train-sampling-strategy uniform   \
--video-dropout 0.0  
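
To give a rough idea of what --contrastive-weight and --contrastive-temperature control, here is a minimal pure-Python sketch of an InfoNCE-style cross-modal contrastive loss over mean-pooled text and video representations. It is an assumption about the general shape of such an objective, not the repository's criterion: the function names are hypothetical, the MLP projection implied by mean+mlp is omitted, and only the text-to-video direction is computed.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def contrastive_loss(text_reprs, video_reprs, temperature):
    """InfoNCE-style loss: the i-th text should match the i-th video in the batch."""
    n = len(text_reprs)
    total = 0.0
    for i in range(n):
        logits = [cosine(text_reprs[i], video_reprs[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all videos in the batch.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]
    return total / n


# Toy batch of two sentence/video pairs (made-up 2-dimensional features).
text_tokens = [[[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0], [0.2, 0.8]]]
video_frames = [[[1.0, 0.1], [0.9, 0.0]], [[0.0, 1.0], [0.1, 0.9]]]
text_reprs = [mean_pool(t) for t in text_tokens]
video_reprs = [mean_pool(v) for v in video_frames]

aligned = contrastive_loss(text_reprs, video_reprs, temperature=0.1)
swapped = contrastive_loss(text_reprs, video_reprs[::-1], temperature=0.1)

# Assumed combination: total = translation_loss + contrastive_weight * contrastive_loss
```

In this sketch, correctly paired batches give a loss close to zero, while mismatched pairs are penalized heavily; a lower temperature sharpens that difference.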

Please check fairseq_mmt/sh/ for more details.

To test a trained checkpoint, run fairseq-generate on the subset given by $who and then split the log into source, hypothesis, and target files:


fairseq-generate  $test_DATA  \
--path $checkpoint_dir/$checkpoint \
--remove-bpe \
--gen-subset $who \
--beam 4  \
--batch-size  128  \
--lenpen 1.0  \
--video-feat-path $video_feat_path \
--video-ids-path $video_ids_path \
--video-feat-dim $video_feat_dim \
--video-feat-type $video_feat_type \
--max-vid-len $max_vid_len   \
--task raw_video_translation_from_np   | tee $local_output_dir/text.$checkpoint.$length_penalty.gen-$who.log

grep ^S $local_output_dir/text.$checkpoint.$length_penalty.gen-$who.log | cut -d - -f 2- | sort -n -k 1 | cut -f 2- > $local_output_dir/text.$checkpoint.$length_penalty.$who.src
grep ^H $local_output_dir/text.$checkpoint.$length_penalty.gen-$who.log | cut -d - -f 2- | sort -n -k 1 | cut -f 3- > $local_output_dir/text.$checkpoint.$length_penalty.$who.hypo
grep ^T $local_output_dir/text.$checkpoint.$length_penalty.gen-$who.log | cut -d - -f 2- | sort -n -k 1 | cut -f 2- > $local_output_dir/text.$checkpoint.$length_penalty.$who.tgt
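
For readers who prefer to post-process the log in Python, the following sketch mirrors the grep/cut/sort commands above. It assumes the usual fairseq-generate line layout (S-<id>, T-<id>, and H-<id> prefixes, tab-separated fields, with a score field before the hypothesis text); the function name parse_generate_log and the sample lines are hypothetical.

```python
def parse_generate_log(lines):
    """Return (sources, hypotheses, targets), each ordered by sentence id."""
    tables = {"S": {}, "H": {}, "T": {}}
    for line in lines:
        tag, sep, rest = line.rstrip("\n").partition("-")
        if tag not in tables or not sep:
            continue  # skip D-, P-, and ordinary logging lines
        fields = rest.split("\t")
        if not fields[0].isdigit():
            continue
        # H lines carry a score before the text, so skip one extra field.
        skip = 2 if tag == "H" else 1
        tables[tag][int(fields[0])] = "\t".join(fields[skip:])

    def ordered(table):
        return [table[i] for i in sorted(table)]

    return ordered(tables["S"]), ordered(tables["H"]), ordered(tables["T"])


# Made-up log lines for illustration only.
sample_log = [
    "S-1\tgood morning",
    "T-1\t早上 好",
    "H-1\t-0.21\t早上 好",
    "D-1\t-0.21\t早上 好",
    "S-0\thello",
    "T-0\t你 好",
    "H-0\t-0.35\t你 好",
]
sources, hypotheses, targets = parse_generate_log(sample_log)
```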


Quality-targeted metrics

To evaluate the generated output, first detokenize the source, hypothesis, and target files:

perl $detokenizer  -l en < $local_output_dir/text.$checkpoint.$length_penalty.$who.src > $local_output_dir/text.$checkpoint.$length_penalty.$who.src.dtk
python3 /root/fairseq_mmt/scripts/ --input $local_output_dir/text.$checkpoint.$length_penalty.$who.hypo --output $local_output_dir/text.$checkpoint.$length_penalty.$who.hypo.dtk  # Chinese detokenization
python3 /root/fairseq_mmt/scripts/ --input $local_output_dir/text.$checkpoint.$length_penalty.$who.tgt --output $local_output_dir/text.$checkpoint.$length_penalty.$who.tgt.dtk    # Chinese detokenization

We evaluate the output with SacreBLEU, COMET, and BLEURT.
Please check fairseq_mmt/sh/ for the whole pipeline.

Terminology-targeted metrics

We adapt the code from mahfuzibnalam for terminology-targeted evaluation. You can directly get results like this:

bash /PATHTO/terminology_evaluation/ $local_output_dir/text.$checkpoint.$length_penalty.$who.hypo 
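
As a rough illustration of what a terminology-targeted metric measures, the sketch below computes exact-match term accuracy: the fraction of annotated target terms that appear verbatim in the corresponding hypothesis. This is a simplified stand-in, not the adapted evaluation script; the function name, the input layout (one list of target terms per hypothesis), and the sample data are all assumptions.

```python
def exact_match_term_accuracy(hypotheses, terms_per_sentence):
    """Fraction of annotated target terms found verbatim in their hypothesis."""
    if len(hypotheses) != len(terms_per_sentence):
        raise ValueError("need one term list per hypothesis")
    matched = 0
    total = 0
    for hypothesis, terms in zip(hypotheses, terms_per_sentence):
        for term in terms:
            total += 1
            if term in hypothesis:
                matched += 1
    return matched / total if total else 0.0


# Made-up detokenized hypotheses and expected target-side terms.
hypotheses = ["他去了银行", "把球棒递给我"]
terms_per_sentence = [["银行"], ["蝙蝠"]]
accuracy = exact_match_term_accuracy(hypotheses, terms_per_sentence)
```

Here the first hypothesis contains its expected term and the second does not, so half of the terms are matched.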

An example of the whole inference and evaluation pipeline can be found in fairseq_mmt/sh/

