Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation (VGAMT)

Read the paper (arXiv)

One of the major challenges of machine translation (MT) is ambiguity, which can in some cases be resolved by accompanying context such as an image. However, recent work in multimodal MT (MMT) has shown that obtaining improvements from images is challenging, limited not only by the difficulty of building effective cross-modal representations but also by the lack of specific evaluation and training data. We present a new MMT approach based on a strong text-only MT model, which uses neural adapters and a novel guided self-attention mechanism and which is jointly trained on both visual masking and MMT. We also release CoMMuTE, a Contrastive Multilingual Multimodal Translation Evaluation dataset, composed of ambiguous sentences and their possible translations, accompanied by disambiguating images corresponding to each translation. Our approach obtains competitive results compared to strong text-only models on standard English-to-French benchmarks and outperforms these baselines and state-of-the-art MMT systems by a large margin on our contrastive test set.

If you use our codebase, please cite:

@inproceedings{futeral-etal-2023-tackling,
    title = "Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation",
    author = "Futeral, Matthieu  and
      Schmid, Cordelia  and
      Laptev, Ivan  and
      Sagot, Beno{\^\i}t  and
      Bawden, Rachel",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.295",
    doi = "10.18653/v1/2023.acl-long.295",
    pages = "5394--5413"
}

Clone repository with submodules

git clone --recurse-submodules https://github.com/MatthieuFP/VGAMT.git

Data preparation

In this work, we use text-only data from OPUS, multilingual text-image data from Multi30k and English text-image data from Conceptual Captions. To download the data and extract the features we use in our work, please follow the instructions here.

Training

Create a conda environment from the requirements.txt file. This work was conducted using the SLURM job scheduler; please adapt the scripts to your local configuration.
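For reference, creating the environment might look like the following sketch. The environment name matches the source activate vgamt lines used below; the Python version is an assumption, so check requirements.txt for the exact pins:

# The Python version here is an assumption; adjust as needed.
conda create -n vgamt python=3.8
conda activate vgamt
pip install -r requirements.txt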

Install adapter-transformers

cd adapter-transformers
pip install .

For all experiments, please fill in the following variables (a sketch with placeholder values follows the list):

  • CACHE_HUGGINGFACE
  • DATA_PATH
  • DUMP_PATH
  • FEAT_PATH (if MMT experiment)
  • EXP_NAME
  • seed
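A minimal sketch, assuming the variables are exported in the environment (whether they are exported or set directly inside the scripts depends on the scripts themselves; all values below are hypothetical placeholders):

# All values are hypothetical placeholders; adapt them to your setup.
export CACHE_HUGGINGFACE=/path/to/huggingface_cache
export DATA_PATH=/path/to/data
export DUMP_PATH=/path/to/experiment_dumps
export FEAT_PATH=/path/to/visual_features   # only needed for MMT experiments
export EXP_NAME=vgamt_en_fr
export seed=42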

Text-only Machine Translation model

You need a strong text-only MT model before training VGAMT. Please run the following commands:

source activate vgamt

# Log the allocated nodes and use the first one as the master
# address for distributed training.
echo "NODELIST="${SLURM_NODELIST}
echo "JOB_NODELIST="${SLURM_JOB_NODELIST}
master_addr=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_ADDR=$master_addr
echo "MASTER_ADDR="$MASTER_ADDR

srun ./scripts/training/train_MT_from_MBART.sh
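These commands are meant to run inside a SLURM batch script. A hypothetical submission might look like the following; the wrapper script name and the resource flags are assumptions about your cluster:

# Hypothetical submission; adjust resources to your cluster.
sbatch --nodes=1 --gres=gpu:4 --job-name=$EXP_NAME train_mt_from_mbart.slurm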

VGAMT

To train VGAMT from a strong MT model, please also set the following additional variables (a sketch with placeholder values follows the list):

  • DATA_MIX_PATH (if using VMLM objective)
  • FEAT_PATH_MIX (if using VMLM objective)
  • MT_MODEL_PATH
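As above, a hedged sketch with placeholder paths:

# Placeholder values; adapt them to your setup.
export DATA_MIX_PATH=/path/to/text_image_mix_data      # only if using the VMLM objective
export FEAT_PATH_MIX=/path/to/text_image_mix_features  # only if using the VMLM objective
export MT_MODEL_PATH=/path/to/text_only_mt_checkpoint
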
source activate vgamt

echo "NODELIST="${SLURM_NODELIST}
echo "JOB_NODELIST="${SLURM_JOB_NODELIST}
master_addr=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_ADDR=$master_addr
echo "MASTER_ADDR="$MASTER_ADDR

srun ./scripts/training/finetune_mix_MMT_VMLM_from_MT.sh

Evaluation

  • BLEU scores:

Please first set the MODEL_PATH variable, then run ./scripts/eval/eval_mmt_bleu.sh to generate translations and compute BLEU scores.
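For instance (the checkpoint path is a placeholder):

# Hypothetical checkpoint path.
export MODEL_PATH=/path/to/vgamt_checkpoint
./scripts/eval/eval_mmt_bleu.sh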

  • METEOR scores:

Set the METEOR_FILE, REFERENCE_PATH, HYPOTHESIS_PATH and TGT_LANG variables. To install METEOR, please have a look here. Then run ./scripts/eval/eval_meteor.sh.
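A minimal sketch, assuming METEOR_FILE points to the METEOR jar (all paths are placeholders):

# Placeholder paths; METEOR_FILE is assumed to point to the METEOR jar.
export METEOR_FILE=/path/to/meteor-1.5.jar
export REFERENCE_PATH=/path/to/references.fr
export HYPOTHESIS_PATH=/path/to/hypotheses.fr
export TGT_LANG=fr
./scripts/eval/eval_meteor.sh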

  • COMET scores:

Set the REFERENCE_SRC_LG, REFERENCE_TGT_LG, HYPOTHESIS_TGT_LG and PATH_TO_COMET_STORAGE variables. To install COMET, please have a look here. In our work, we use the wmt20-comet-da model. Then run ./scripts/eval/eval_comet.sh.
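A sketch with placeholder values; the interpretation of the variables (source-side references, target-side references and hypotheses, plus a storage directory for downloaded COMET models) is an assumption based on their names:

# Placeholder paths; variable semantics inferred from their names.
export REFERENCE_SRC_LG=/path/to/references.en
export REFERENCE_TGT_LG=/path/to/references.fr
export HYPOTHESIS_TGT_LG=/path/to/hypotheses.fr
export PATH_TO_COMET_STORAGE=/path/to/comet_models
./scripts/eval/eval_comet.sh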

  • CoMMuTE ranking accuracy:

To compute CoMMuTE accuracy for your model, run ./scripts/eval/eval_mmt_commute.sh after filling in the variables described in the VGAMT section.
