Training recipes and scripts for the system paper "Exploring Model Consensus to Generate Translation Paraphrases" (WNGT workshop at ACL 2020), submitted to the Duolingo STAPLE shared task.
You can download the out-of-domain data and required tools by running:

```bash
bash prepare_data.sh
```
If you want to use the in-domain STAPLE dataset, download it from Dataverse and extract the training data to `data/raw/`.
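The archive name depends on the Dataverse release; as a sketch (the file name below is a placeholder, not the actual download):

```bash
# Placeholder archive name -- substitute the file actually downloaded from Dataverse
mkdir -p data/raw
tar -xzf staple_2020_train.tar.gz -C data/raw/
```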
We trained the model using a modified version of fairseq. You can install it by running:

```bash
cd tools/fairseq
pip install --editable .
```
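A quick way to confirm the editable install worked (assuming `python` points at the same environment `pip` used):

```bash
# Should print the installed fairseq version without an ImportError
python -c "import fairseq; print(fairseq.__version__)"
```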
The preprocessing procedure includes:
- punctuation normalization and removal of non-printable characters
- tokenization
- BPE (shared vocabulary of size 40k for both English and Portuguese)
- parallel data filtering
All data preprocessing steps are in `data/preprocess_data.sh`.
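For orientation, here is a minimal sketch of such a pipeline, assuming the Moses scripts and subword-nmt are used (the paths and exact commands are illustrative; the real pipeline is in `data/preprocess_data.sh`):

```bash
# Illustrative sketch only -- see data/preprocess_data.sh for the real pipeline.
# Assumes Moses was downloaded to tools/ by prepare_data.sh.
MOSES=tools/mosesdecoder/scripts

# Normalize punctuation, strip non-printable characters, tokenize
for lang in en pt; do
  cat train.$lang \
    | perl $MOSES/tokenizer/normalize-punctuation.perl -l $lang \
    | perl $MOSES/tokenizer/remove-non-printing-char.perl \
    | perl $MOSES/tokenizer/tokenizer.perl -l $lang \
    > train.tok.$lang
done

# Learn a joint BPE model (40k operations) shared across both languages
subword-nmt learn-joint-bpe-and-vocab \
  --input train.tok.en train.tok.pt -s 40000 \
  -o bpe.codes --write-vocabulary vocab.en vocab.pt

# Apply the shared BPE codes to each side
for lang in en pt; do
  subword-nmt apply-bpe -c bpe.codes < train.tok.$lang > train.bpe.$lang
done
```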
Scripts for pre-training with out-of-domain data and fine-tuning with in-domain data are in the `recipes/` directory. You can also evaluate the weighted macro F1 score with these scripts.
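The script names below are placeholders; check `recipes/` for the actual file names:

```bash
# Hypothetical recipe invocations -- substitute the real script names from recipes/
bash recipes/pretrain.sh   # pre-train on out-of-domain parallel data
bash recipes/finetune.sh   # fine-tune on the in-domain STAPLE data
```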
If you use this work, please cite it as
```bibtex
@inproceedings{li-etal-2020-exploring,
  author    = {Li, Zhenhao and Fomicheva, Marina and Specia, Lucia},
  title     = {Exploring Model Consensus to Generate Translation Paraphrases},
  booktitle = {Proceedings of the ACL Workshop on Neural Generation and Translation (WNGT)},
  publisher = {ACL},
  year      = {2020}
}
```