Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation (Xinyi Wang, Graham Neubig, 2019)

This repository shows how to use the multilingual data filtering method from the paper Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation.

Example usage

The directory test_data/ contains toy training data for three languages: aze, bel, tur. To calculate the language distance of aze from each of the three languages {aze,bel,tur}:

python get_language_distance.py \
    --language-names "aze,bel,tur" \
    --base-language-name "aze" \
    --file-pattern "test_data/train.CODE-eng.CODE" \
    --file-pattern-src "test_data/train.CODE-eng.CODE" \
    --file-pattern-trg "test_data/train.CODE-eng.eng" 
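
The file patterns above contain the placeholder CODE. Judging from the file names, it appears to stand for a language code that is substituted once per entry in --language-names. The sketch below illustrates that assumed expansion; the helper expand_pattern is hypothetical and is not part of get_language_distance.py.

```python
def expand_pattern(pattern, language_names, placeholder="CODE"):
    """Map each language code to a path by replacing the placeholder.

    Assumption: the script substitutes every occurrence of "CODE" in the
    pattern with the language code. This helper only illustrates that idea.
    """
    codes = [code.strip() for code in language_names.split(",")]
    return {code: pattern.replace(placeholder, code) for code in codes}


# Source-side files: both occurrences of CODE are replaced.
src_paths = expand_pattern("test_data/train.CODE-eng.CODE", "aze,bel,tur")
# Target-side files: only the language-pair part contains CODE.
trg_paths = expand_pattern("test_data/train.CODE-eng.eng", "aze,bel,tur")

for code in src_paths:
    print(code, src_paths[code], trg_paths[code])
```

Under this assumption, each language in the list needs one source file and one English target file in test_data/ whose names follow the pattern.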

To filter the training data from the three languages to improve translation performance on aze:

python get_language_distance.py \
    --language-names "aze,bel,tur" \
    --base-language-name "aze" \
    --file-pattern "test_data/train.CODE-eng.CODE" \
    --filter-data \
    --file-pattern-src "test_data/train.CODE-eng.CODE" \
    --file-pattern-trg "test_data/train.CODE-eng.eng" 

About

Code for the ACL 2019 paper "Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation"
