Machine translation with PyTorch

Introduction

Attention-based models are a popular, easy-to-implement architecture that performs well on many NLP tasks, including machine translation. Combined with pretrained SpaCy language models, this approach makes a strong baseline with performance comparable to the state of the art. This repo trains and demonstrates a German-to-English translator. If you need other languages, the SpaCy team provides plenty of language models on which the exact same model can be trained. A pretrained German-English model is available after setup.
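
To make the core idea concrete, here is a minimal sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, as described in the paper linked in the Source section. It is written with plain Python lists for readability and is not the code used in models.py; the function names and the toy inputs are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V.

    queries: list of n_q vectors of size d_k
    keys:    list of n_k vectors of size d_k
    values:  list of n_k vectors of size d_v
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input (hypothetical numbers): 2 queries attending over 3 key/value pairs.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors. A real model would do the same computation with batched tensor operations and several heads in parallel.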

Major dependencies

Python 3.8
PyTorch 1.5.0
SpaCy 2.2.4

Setup

Use the automatic setup script:

bash setup.sh

(If you encounter a problem with Windows line endings, run sed -i 's/\r//g' setup.sh first.)

Or complete the installation manually in four easy steps:

Install the required Python packages:

python -m pip install -r requirements.txt

Download and install the pretrained SpaCy language models:

sudo python -m spacy download en
sudo python -m spacy download de_core_news_sm

Clone the submodules:

git submodule init
git submodule update

Download a pretrained model and put it in the checkpoints folder.

Dataset

The model is trained on Multi30k, a small translation dataset from a 2016 challenge that is available from PyTorch. Training uses the de-en part of it. The dataset statistics are as follows:

train:
 (en) 29000 sentences, 377534 words, 13.0 words/sent
 (de) 29000 sentences, 360706 words, 12.4 words/sent
val:
 (en) 1014 sentences, 13308 words, 13.1 words/sent
 (de) 1014 sentences, 12828 words, 12.7 words/sent
test:
 (en) 1000 sentences, 11376 words, 11.4 words/sent
 (de) 1000 sentences, 10758 words, 10.8 words/sent

Example pair of sentences:

 ein mädchen in einem karateanzug bricht ein brett mit einem tritt .
 a girl in karate uniform breaking a stick with a front kick .
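
Statistics in the format shown above can be reproduced with a few lines of Python. This is a sketch, not the script that produced the numbers: it assumes sentences are already lowercased and tokenized with whitespace between tokens, and that punctuation tokens count as words. The `corpus_stats` helper is a hypothetical name.

```python
def corpus_stats(sentences):
    """Return (n_sentences, n_words, words_per_sentence) for tokenized text.

    Each sentence is a string whose tokens are separated by whitespace.
    Assumption: punctuation tokens such as "." are counted as words.
    """
    n_sentences = len(sentences)
    n_words = sum(len(s.split()) for s in sentences)
    avg = n_words / n_sentences if n_sentences else 0.0
    return n_sentences, n_words, avg


# The example pair from this section, used as a one-sentence corpus per language.
de = ["ein mädchen in einem karateanzug bricht ein brett mit einem tritt ."]
en = ["a girl in karate uniform breaking a stick with a front kick ."]

for lang, sentences in (("en", en), ("de", de)):
    n_sent, n_words, avg = corpus_stats(sentences)
    print(f"({lang}) {n_sent} sentences, {n_words} words, {avg:.1f} words/sent")
```

Running the same function over the full train, val and test splits should give figures comparable to the table, provided the tokenization matches the one used for the dataset.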

Train model

You can simply start training the model with this terminal command:

python train.py

The default arguments are set to the best configuration found; see the Hyperparameter tuning section. You are nevertheless encouraged to run your own experiments.

The script saves models to ./checkpoints/ and writes logs to ./logs/. The best pretrained model is available after setup at ./checkpoints/en_de_final.pt

Hyperparameter tuning

Several experiments on the model hyperparameters were run. The training curves can be found on TensorBoard.dev.

The results are summarized in the following table:

Experiment id	hidden_size	pf_dim	n_heads	n_layers	BLEU score
1	256	512	8	3	0.3390
2	128	512	8	3	0.3507
3	64	512	8	3	0.3353
4	128	1024	8	3	0.3582
5	256	2048	8	3	0.3385
6	128	1024	4	3	0.3557
7	128	1024	16	3	0.3464
8	128	1024	8	4	0.3494
9	128	1024	8	2	0.3460
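
The table is small enough to rank by hand, but the comparison can also be sketched in code. The values below are copied from the table; the `Experiment` tuple and its field names are illustrative assumptions, not part of the repository.

```python
from typing import NamedTuple


class Experiment(NamedTuple):
    exp_id: int
    hidden_size: int
    pf_dim: int
    n_heads: int
    n_layers: int
    bleu: float


# Rows copied verbatim from the hyperparameter table.
EXPERIMENTS = [
    Experiment(1, 256, 512, 8, 3, 0.3390),
    Experiment(2, 128, 512, 8, 3, 0.3507),
    Experiment(3, 64, 512, 8, 3, 0.3353),
    Experiment(4, 128, 1024, 8, 3, 0.3582),
    Experiment(5, 256, 2048, 8, 3, 0.3385),
    Experiment(6, 128, 1024, 4, 3, 0.3557),
    Experiment(7, 128, 1024, 16, 3, 0.3464),
    Experiment(8, 128, 1024, 8, 4, 0.3494),
    Experiment(9, 128, 1024, 8, 2, 0.3460),
]

# Rank configurations from best to worst BLEU score.
ranked = sorted(EXPERIMENTS, key=lambda e: e.bleu, reverse=True)
best = ranked[0]

for e in ranked:
    print(
        f"id={e.exp_id} hidden_size={e.hidden_size} pf_dim={e.pf_dim} "
        f"n_heads={e.n_heads} n_layers={e.n_layers} bleu={e.bleu:.4f}"
    )
```

The top-ranked row is experiment 4 (hidden_size 128, pf_dim 1024, 8 heads, 3 layers). The spread between the best and worst rows is about 0.023 BLEU, so differences between neighbouring configurations are small.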

Results&Demo

The model produces decent results on samples from the test set. With the experiment 4 configuration it achieves a BLEU score of 0.3582 on the validation set and 0.3347 on the test set of the de-en part of Multi30k, which indicates a good level of performance.

Run and see how it works:

python demo.py

Some sample results:

Input: eine straße neben einem interessanten ort mit vielen säulen .
GT translation: a road next to an interesting place with lots of pillars .
Model output: a street next to a plaza with many interesting pillars .

Input: ein skateboarder in einem schwarzen t-shirt und jeans fährt durch die stadt .
GT translation: a skateboarder in a black t - shirt and jeans skating threw the city .
Model output: a skateboarder in a black t - shirt and jeans is riding through the city .
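
For readers unfamiliar with the metric, the following is a minimal sketch of corpus-level BLEU on the 0 to 1 scale used above: clipped n-gram precisions up to 4-grams, uniform weights, a brevity penalty, one reference per sentence and no smoothing. It illustrates the metric only; the evaluation code in this repo may differ in tokenization or smoothing, so this sketch is not guaranteed to reproduce the reported numbers. The function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU with one reference per candidate and uniform weights."""
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            ref_counts = ngrams(ref, n)
            # Clipping: an n-gram is credited at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in cand_counts.items()
            )
            totals[n - 1] += sum(cand_counts.values())
    # Without smoothing, any zero precision makes the whole score zero.
    if cand_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize candidates that are shorter than their references.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


# Second sample pair from this section, split on whitespace.
reference = "a skateboarder in a black t - shirt and jeans skating threw the city .".split()
hypothesis = "a skateboarder in a black t - shirt and jeans is riding through the city .".split()
score = corpus_bleu([hypothesis], [reference])
print(f"BLEU = {score:.4f}")
```

A score computed on a single sentence is noisy; the figures reported in this section are aggregated over the whole validation or test set.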

Source

Tutorial with awesome model architectures:

https://github.com/bentrevett/pytorch-seq2seq

Paper describing the attention model:

https://arxiv.org/abs/1706.03762

Other useful tutorials:

A nice short book for understanding NLP basics (awful for production and demo, however):

https://github.com/joosthub/PyTorchNLPBook

Tutorials from the good PyTorch folks, also nice and simple for getting started:

https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

Andrey885/Machine_translation_PyTorch
