atreyasha/semantic-isometry-nmt

Investigating the isometric behaviour of Neural Machine Translation models on binary semantic equivalence spaces

  1. Overview
  2. Dependencies
  3. Repository initialization
  4. Usage
    1. Training
    2. Translation
    3. Evaluation
    4. Visualization
  5. Development

Overview 📖

Isometry is defined mathematically as a distance-preserving transformation between two metric spaces. A simplified illustration of isometry in higher dimensional functional spaces can be seen below. In this research, we view Neural Machine Translation (NMT) models from the perspective of semantic isometry and assume that well-performing NMT models function approximately isometrically on semantic metric spaces. That is to say, if two sentences are semantically equivalent on the source side, they should remain semantically equivalent after translation on the target side given a well-performing NMT model. We hypothesize that the frequency of such semantically isometric behaviour correlates positively with general model performance.
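Concretely, a map f: X → Y between metric spaces (X, d_X) and (Y, d_Y) is an isometry if it preserves all pairwise distances:

```latex
% Definition of an isometry: f preserves pairwise distances
d_Y\bigl(f(x_1),\, f(x_2)\bigr) = d_X(x_1, x_2) \qquad \text{for all } x_1, x_2 \in X
```

In our setting, X and Y play the role of source- and target-side semantic spaces and f is the NMT model; the binary semantic equivalence spaces used below are a coarse approximation of such metric spaces.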

We conduct our investigation by using two NMT models of varying performance to translate semantically equivalent German paraphrases, derived from diverse WMT19 test-set references, into English. We use Facebook AI Research's (FAIR) WMT19 winning single model from Ng et al. (2019) as our SOTA model and abbreviate it as FAIR-WMT19. As our non-SOTA model, we train a large Transformer on WMT16 data following the Scaling NMT methodology of Ott et al. (2018), and abbreviate it as STANDARD-WMT16.

We simplify the notion of semantic metric spaces into probabilistic binary semantic equivalence spaces and compute these using three Transformer language models fine-tuned on Google's PAWS-X paraphrase detection task. We adapt our workflow from Google's XTREME benchmark system.

By analyzing the paraphrase detection outputs, we show that the frequency of semantically isometric behaviour indeed correlates positively with general model performance. Our findings have interesting implications for automatic sequence evaluation metrics and vulnerabilities of NMT models towards adversarial paraphrases.

A more detailed description of our methodologies and results can be found in this paper.

Note: fairseq and XTREME are used as third-party extensions in this repository with licensing details found here.

Dependencies 🛠️

  1. This repository's code was tested with Python versions 3.7.*. To sync dependencies, we recommend creating a virtual environment and installing the relevant packages via pip:

    pip install -r requirements.txt
  2. In this repository, we use R versions 3.6.* and lualatex for efficient TikZ visualizations. Execute the following within your R console to get the dependencies:

    install.packages(c("ggplot2","optparse","tikzDevice","rjson","ggpointdensity",
                       "fields","gridExtra","devtools","reshape2"))
    devtools::install_github("teunbrand/ggh4x")

Repository initialization 🔥

  1. Initialize the xtreme-pawsx git submodule by running the following command:

    bash scripts/setup_xtreme_pawsx.sh
  2. Manually download the preprocessed WMT16 En-De data provided by Google and place the tarball in the data directory (~480 MB download size).

  3. Manually download the following four pre-trained models and place all of the tarballs in the models directory (~9 GB total download size):

    1. STANDARD-WMT16 for non-SOTA de-en translation. The model achieves a BLEU-4 score of 31.0 on the newstest2014 test set.

    2. mBERTBase for multilingual paraphrase detection. The model was fine-tuned on the en, de, es, fr, ja, ko and zh languages and achieves a macro-F1 score of 0.886.

    3. XLM-RBase for multilingual paraphrase detection. The model was fine-tuned on the en, de, es, fr, ja, ko and zh languages and achieves a macro-F1 score of 0.890.

    4. XLM-RLarge for multilingual paraphrase detection. The model was fine-tuned on the en, de, es, fr, ja, ko and zh languages and achieves a macro-F1 score of 0.906.

  4. Download PAWS-X and the WMT19 Legacy and WMT19 AR German paraphrases, and prepare the previously downloaded WMT16 data and pre-trained models, by running the command below:

    bash scripts/prepare_data_models.sh
  5. Optional: We provide a secondary branch slurm-s3it for executing computationally heavy workflows (e.g. training, evaluation) on the s3it server with slurm. To use this branch, simply execute:

    git checkout slurm-s3it
    
  6. Optional: If you want to develop this repository further, you can auto-format shell/R scripts and keep the Python dependencies, the development log and the slurm-s3it branch synchronized by initializing our pre-commit and pre-push git hooks:

    bash scripts/setup_git_hooks.sh

Usage 🌀

i. Training

Since we already provide pre-trained models in this repository, we treat model training as an auxiliary procedure. If you would nevertheless like to train the non-SOTA STANDARD-WMT16 model and fine-tune the paraphrase detection models, refer to the instructions in TRAINING.md.

ii. Translation

To translate the WMT19 Legacy and WMT19 AR German paraphrases into English, use our script translate_wmt19_paraphrases_de_en.sh:

Usage: translate_wmt19_paraphrases_de_en.sh [-h|--help] [glob]
Translate WMT19 paraphrases using both torch-hub and local models

Optional arguments:
  -h, --help   Show this help message and exit
  glob <glob>  Glob for finding local NMT model checkpoints, defaults to
               "./models/transformer_vaswani_wmt_en_de_big.wmt16.de-en.1594228573/
               checkpoint_best.pt"

This script will generate translations using both the SOTA FAIR-WMT19 and the non-SOTA STANDARD-WMT16 models. Translation results will be saved as JSON files in the predictions directory. To run this script with our defaults, simply execute:

bash scripts/translate_wmt19_paraphrases_de_en.sh 

iii. Evaluation

Commutative BLEU-4 and chrF-2

After translating the WMT19 Legacy and WMT19 AR paraphrases, we can conduct a quick, shallow evaluation of source and target sentences using commutative variants of the BLEU-4 and chrF-2 automatic sequence evaluation metrics, both initialized with sacrebleu's default settings. For this, we provide evaluate_bleu_chrf_wmt19_paraphrases_de_en.sh:

Usage: evaluate_bleu_chrf_wmt19_paraphrases_de_en.sh [-h|--help] [glob]
Conduct shallow evaluation of WMT19 paraphrases with commutative
BLEU-4 and chrF-2 scores

Optional arguments:
  -h, --help   Show this help message and exit
  glob <glob>  Glob for finding input json translations, defaults to
               "./predictions/*/*.json"

This script will analyze source and target sentences in the aforementioned JSON files and append commutative BLEU-4 and chrF-2 scores in-place. To run this script, simply execute:

bash scripts/evaluate_bleu_chrf_wmt19_paraphrases_de_en.sh
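Standard sequence metrics like BLEU and chrF are directional (hypothesis vs. reference), so a commutative variant has to symmetrize them; a natural scheme is to average the metric over both directions. Below is a minimal pure-Python sketch of this idea, using a simplified character-bigram F-beta score as a stand-in for sacrebleu's chrF-2. The helper names and the averaging scheme are illustrative assumptions, not the repository's exact implementation:

```python
from collections import Counter


def char_bigram_fbeta(hyp, ref, beta=2.0):
    """Simplified character-bigram F-beta, a stand-in for sacrebleu's chrF-2."""
    hyp_bi = Counter(zip(hyp, hyp[1:]))
    ref_bi = Counter(zip(ref, ref[1:]))
    if not hyp_bi or not ref_bi:
        return 0.0
    overlap = sum((hyp_bi & ref_bi).values())  # clipped bigram matches
    precision = overlap / sum(hyp_bi.values())
    recall = overlap / sum(ref_bi.values())
    if precision == 0 and recall == 0:
        return 0.0
    # chrF-style F-beta: beta = 2 weights recall twice as heavily as precision,
    # which is what makes the raw metric asymmetric in its arguments
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


def commutative_score(a, b):
    """Symmetrize the directional metric by averaging both directions."""
    return 0.5 * (char_bigram_fbeta(a, b) + char_bigram_fbeta(b, a))
```

By construction `commutative_score(source, target) == commutative_score(target, source)`, so comparing a paraphrase pair no longer depends on which side is treated as the reference.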
Paraphrase detection

Next, we can run our fine-tuned paraphrase detection models on our source and target sentences. For this, we provide evaluate_paraphrase_detection_wmt19_paraphrases_de_en.sh:

Usage: evaluate_paraphrase_detection_wmt19_paraphrases_de_en.sh [-h|--help] [glob]
Conduct evaluation of WMT19 paraphrases using pre-trained paraphrase
detection models

Optional arguments:
  -h, --help   Show this help message and exit
  glob <glob>  Glob for finding input json translations, defaults to
               "./predictions/*/*.json"

This script will analyze source and target sentences in the aforementioned JSON files and append the paraphrase detection models' softmax scores for the paraphrase (positive) label in-place. To run this script, simply execute:

bash scripts/evaluate_paraphrase_detection_wmt19_paraphrases_de_en.sh
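The fine-tuned detectors are binary classifiers, so each sentence pair yields two logits whose softmax gives the probability of the paraphrase (positive) label. A minimal sketch of that conversion follows; the label order and the 0.5 decision threshold are illustrative assumptions, not the repository's convention:

```python
import math


def positive_softmax(logits):
    """Softmax probability of the paraphrase (positive) class.

    Assumes logits = [non_paraphrase_logit, paraphrase_logit]; this
    index order is an assumption made for illustration.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def is_paraphrase(logits, threshold=0.5):
    """Map the softmax score onto a binary semantic equivalence decision."""
    return positive_softmax(logits) >= threshold
```

Thresholding the softmax score in this way is what turns the probabilistic output into the binary semantic equivalence space discussed in the overview.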

iv. Visualization

Model evolutions

To plot the evolutions of model-related training parameters, we provide visualize_model_evolutions.sh:

Usage: visualize_model_evolutions.sh [-h|--help] [glob]
Visualize model evolutions for translation and paraphrase detection models

Optional arguments:
  -h, --help   Show this help message and exit
  glob <glob>  Glob for finding tensorboard log directories, which will
               be converted to csv's and then plotted. Defaults to
               "./models/*/{train,train_inner,valid}"

This script will aggregate TensorBoard event logs into CSV files and produce TikZ-based plots of model evolutions as PDF files in the img directory. To run this script, simply execute:

bash scripts/visualize_model_evolutions.sh
Commutative chrF-2

To visualize the previously computed commutative chrF-2 scores, we provide visualize_chrf_wmt19_paraphrases_de_en.sh:

Usage: visualize_chrf_wmt19_paraphrases_de_en.sh [-h|--help] [glob]
Visualize commutative chrF-2 scores of WMT19 paraphrase translations

Optional arguments:
  -h, --help   Show this help message and exit
  glob <glob>  Glob for finding input json translations, defaults to
               "./predictions/*/*.json"

This script will produce a TikZ-based plot of the commutative chrF-2 scores and save it as a PDF file in the img directory. To run this script, simply execute:

bash scripts/visualize_chrf_wmt19_paraphrases_de_en.sh
Paraphrase detection

To visualize the previously computed paraphrase detection results, we provide visualize_paraphrase_detection_wmt19_paraphrases_de_en.sh:

Usage: visualize_paraphrase_detection_wmt19_paraphrases_de_en.sh [-h|--help] [glob]
Visualize paraphrase detection predictions of WMT19 paraphrase translations

Optional arguments:
  -h, --help   Show this help message and exit
  glob <glob>  Glob for finding input json translations, defaults to
               "./predictions/*/*.json"

This script will produce TikZ-based plots of the respective paraphrase detection softmax scores and joint model decisions, and will save them as PDF files in the img directory. To run this script, simply execute:

bash scripts/visualize_paraphrase_detection_wmt19_paraphrases_de_en.sh
Correlation between commutative chrF-2 and paraphrase detection predictions

To visualize correlations between commutative chrF-2 scores and paraphrase detection predictions, we provide visualize_chrf_paraphrase_detection_wmt19_paraphrases_de_en.sh:

Usage: visualize_chrf_paraphrase_detection_wmt19_paraphrases_de_en.sh [-h|--help] [glob]
Visualize commutative chrF-2 and paraphrase detection predictions of WMT19 paraphrase translations

Optional arguments:
  -h, --help   Show this help message and exit
  glob <glob>  Glob for finding input json translations, defaults to
               "./predictions/*/*.json"

This script will produce TikZ-based plots of correlations between commutative chrF-2 scores and paraphrase detection predictions and will save them as PDF files in the img directory. To run this script, simply execute:

bash scripts/visualize_chrf_paraphrase_detection_wmt19_paraphrases_de_en.sh
