update document
cindyxinyiwang committed Feb 14, 2019
1 parent 0e0f209 commit f8c3692
Showing 3 changed files with 40 additions and 5 deletions.
35 changes: 35 additions & 0 deletions README.md
@@ -0,0 +1,35 @@
# Multilingual Neural Machine Translation with Soft Decoupled Encoding
This is the code we used in our paper
>[Multilingual Neural Machine Translation with Soft Decoupled Encoding](https://arxiv.org/pdf/1902.03499.pdf)
>Xinyi Wang, Hieu Pham, Philip Arthur, Graham Neubig

## Requirements

Python 3.6, PyTorch 0.4.1


All the scripts for the experiments in the paper can be generated from the templates under ``scripts/template/``.

## Data Processing

The data we use is the [multilingual TED corpus](https://github.com/neulab/word-embeddings-for-nmt) by Qi et al.

We provide a preprocessed version of the data, which you can get from:
If you are interested in the details of data processing, you can take a look at the scripts ``make-eng.sh`` and ``make-data.sh``.

## Training:
The template names for the following methods are:
1. SDE: bi-semb-bq-o32000
2. subword: bi-sw-32000
3. subword-joint: bi-sw-joint-32000
4. word: bi-w-64000

To make the main experiment scripts for all 4 languages tested in the paper, simply call
``bash make-cfg.sh``
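
As a rough illustration of what ``make-cfg.sh`` does with each template, the sketch below mirrors its ``sed`` substitution in Python: the tokens ``VERSION``, ``SEED``, ``IL`` and ``RL`` are replaced in that order, and the result is written to a script named after the template and the language pair. The function names, the example template line (including ``train.py`` and its flags) and the language codes ``aze``/``tur`` are assumptions made for this example and are not taken from the repository.

```python
# Illustrative sketch only; make-cfg.sh itself uses sed, not Python.
PLACEHOLDERS = ("VERSION", "SEED", "IL", "RL")  # same order as the sed expression


def render_template(text, version, seed, il, rl):
    """Replace every occurrence of each placeholder token, in order."""
    values = (version, str(seed), il, rl)
    for token, value in zip(PLACEHOLDERS, values):
        text = text.replace(token, value)
    return text


def script_path(template_name, cfg_dir, il, rl):
    """Path of the generated script: scripts/<cfg_dir>/<template>_<IL><RL>.sh."""
    return f"scripts/{cfg_dir}/{template_name}_{il}{rl}.sh"


if __name__ == "__main__":
    # Hypothetical template line and language codes, for illustration.
    line = "python train.py --seed SEED --out_dir outputs_VERSION/IL_RL"
    print(render_template(line, "s0", 0, "aze", "tur"))
    print(script_path("bi-sw-32000", "cfg_s0", "aze", "tur"))
```

Because the substitution is plain text replacement, any other occurrence of a token in a template (for example ``IL`` inside a longer uppercase word) would be replaced as well. In the shell script ``CFG_DIR`` carries a trailing slash; the sketch takes the directory name without it.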

## Decoding:
To make the decoding scripts, use ``make-trans.py``. If you modified the template scripts during training, change the name of the directory where the experiment outputs are stored. Otherwise it should work by calling:
``python make-trans.py``
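
The comment in ``make-trans.py`` notes that its version must match the one set in ``make-cfg.sh``, with experiment outputs stored in a folder named ``outputs_<version>/``. A minimal sketch of a check you could run before generating the decode scripts is shown below; the helper ``missing_output_dirs`` is hypothetical and not part of the repository.

```python
import os


def missing_output_dirs(version_list, root="."):
    """Return the versions that have no outputs_<version>/ folder under root."""
    return [
        v for v in version_list
        if not os.path.isdir(os.path.join(root, f"outputs_{v}"))
    ]


if __name__ == "__main__":
    # "s0" matches the VERSION value used in make-cfg.sh in this commit.
    missing = missing_output_dirs(["s0"])
    if missing:
        print("No output folder for versions:", ", ".join(missing))
```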

8 changes: 4 additions & 4 deletions make-cfg.sh
@@ -1,9 +1,9 @@

TEMP_DIR=scripts/template/
# change random seed and directory name as desired
-CFG_DIR=cfg_s3/
-VERSION=s3
-SEED=3
+CFG_DIR=cfg_s0/
+VERSION=s0
+SEED=0

mkdir -p scripts/"$CFG_DIR"
# low-resource language codes
@@ -23,7 +23,7 @@ for i in ${!ILS[*]}; do
IL=${ILS[$i]}
RL=${RLS[$i]}
echo $IL
-for f in $TEMP_DIR/bi-w-16000 $TEMP_DIR/bi-sw-joint-16000 $TEMP_DIR/bi-sw-16000 $TEMP_DIR/bi-semb-bq-o16000 ; do
+for f in $TEMP_DIR/bi-w-64000 $TEMP_DIR/bi-sw-joint-32000 $TEMP_DIR/bi-sw-32000 $TEMP_DIR/bi-semb-bq-o32000 ; do
sed "s/VERSION/$VERSION/g; s/SEED/$SEED/g; s/IL/$IL/g; s/RL/$RL/g" < $f > ${f/template/"$CFG_DIR"/}_$IL$RL.sh
chmod u+x ${f/template/"$CFG_DIR"/}_$IL$RL.sh
done
2 changes: 1 addition & 1 deletion make-trans.py
@@ -4,7 +4,7 @@

# make sure the version matches the configuration
# for example, if in make-cfg.sh, you put s3 as version, there should be an output folder named as outputs_s3/
-version_list = ["s3"]
+version_list = ["s0"]
temp_dir = { "w": "scripts/template/trans_w", "sw-joint": "scripts/template/trans_sw-joint", "sw": "scripts/template/trans_sw"}

for version in version_list: