# seq2seq with Fairseq

This notebook uses Fairseq and PyTorch to train a sequence-to-sequence model that transliterates Latin script (`latn`) into Armenian script (`armn`).

Note: you must enable the GPU runtime to use Fairseq!

> *Edit > Notebook settings > Hardware accelerator: GPU*
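
Before installing anything, it can help to confirm that the runtime actually exposes a GPU. The following is a minimal sketch, assuming PyTorch is already present in the runtime; the helper name `gpu_available` is illustrative and not part of Fairseq.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is importable and reports a CUDA device."""
    # If PyTorch is not installed at all, report "no GPU" instead of raising.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("GPU available:", gpu_available())
```

If this prints `False`, switch the hardware accelerator as described above and restart the runtime before continuing.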




## Requirements

In [0]:
%cd /content/
!rm -rf fairseq
!git clone https://github.com/deeplanguageclass/fairseq.git
%cd fairseq
!ls
!pip install -r requirements.txt

In [0]:
!python setup.py build
!python setup.py develop

running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/scripts
copying scripts/__init__.py -> build/lib.linux-x86_64-3.6/scripts
copying scripts/average_checkpoints.py -> build/lib.linux-x86_64-3.6/scripts
copying scripts/build_sym_alignment.py -> build/lib.linux-x86_64-3.6/scripts
creating build/lib.linux-x86_64-3.6/fairseq
copying fairseq/progress_bar.py -> build/lib.linux-x86_64-3.6/fairseq
copying fairseq/utils.py -> build/lib.linux-x86_64-3.6/fairseq
copying fairseq/multiprocessing_pdb.py -> build/lib.linux-x86_64-3.6/fairseq
copying fairseq/__init__.py -> build/lib.linux-x86_64-3.6/fairseq
copying fairseq/tokenizer.py -> build/lib.linux-x86_64-3.6/fairseq
copying fairseq/trainer.py -> build/lib.linux-x86_64-3.6/fairseq
copying fairseq/meters.py -> build/lib.linux-x86_64-3.6/fairseq
copying fairseq/bleu.py -> build/lib.linux-x86_64-3.6/fairseq
copying fairseq/fp16_trainer.py -> build/lib.linux-x86_64-3.6/fairseq
copyi

## Data pre-processing

In [0]:
%cd examples/translation/
!bash prepare-translit.sh
%cd ../..

/content/fairseq/examples/translation
Cloning Moses github repository (for tokenization scripts)...
Cloning into 'mosesdecoder'...
remote: Counting objects: 147104, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 147104 (delta 0), reused 2 (delta 0), pack-reused 147098
Receiving objects: 100% (147104/147104), 129.65 MiB | 21.17 MiB/s, done.
Resolving deltas: 100% (113696/113696), done.
Cloning Subword NMT repository (for BPE pre-processing)...
Cloning into 'subword-nmt'...
remote: Counting objects: 455, done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 455 (delta 18), reused 18 (delta 9), pack-reused 420
Receiving objects: 100% (455/455), 204.70 KiB | 12.79 MiB/s, done.
Resolving deltas: 100% (262/262), done.
Downloading data from https://wit3.fbk.eu/archive/2014-01/texts/de/en/de-en.tgz...
--2018-08-16 11:18:05--  https://wit3.fbk.eu/archive/2014-01/texts/de/en/de-en.tgz
Resolving wit3.fbk.eu (wit3.fbk.eu)... 217.77.80.8
Connecti

In [0]:
!python preprocess.py --source-lang latn --target-lang armn \
  --trainpref examples/translation/translit.tokenized.latn-armn/train \
  --validpref examples/translation/translit.tokenized.latn-armn/valid \
  --testpref examples/translation/translit.tokenized.latn-armn/test \
  --destdir data-bin/translit.tokenized.latn-armn
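
`preprocess.py` appends the language code to each prefix, so `--trainpref .../train` reads `train.latn` and `train.armn`. Binarizing a parallel corpus whose two sides have drifted out of alignment is an easy mistake to miss. The sketch below checks that both sides of a split have the same number of lines before preprocessing; the helper names `count_lines` and `check_parallel` are hypothetical.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count the lines in a UTF-8 text file."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(prefix: str, src: str = "latn", tgt: str = "armn") -> int:
    """Return the shared line count of <prefix>.<src> and <prefix>.<tgt>.

    Raises ValueError if the two sides differ in length.
    """
    src_lines = count_lines(Path(f"{prefix}.{src}"))
    tgt_lines = count_lines(Path(f"{prefix}.{tgt}"))
    if src_lines != tgt_lines:
        raise ValueError(
            f"{prefix}: {src_lines} {src} lines vs {tgt_lines} {tgt} lines"
        )
    return src_lines
```

For example, `check_parallel("examples/translation/translit.tokenized.latn-armn/train")` would return the number of training pairs, or raise if the files disagree.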

## Training

In [14]:
!mkdir -p checkpoints/fconv
!CUDA_VISIBLE_DEVICES=0 python train.py data-bin/translit.tokenized.latn-armn \
  --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 1024 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --lr-scheduler fixed --force-anneal 200 \
  --arch fconv_translit_latn_armn --save-dir checkpoints/fconv \
  --skip-invalid-size-inputs-valid-test


Namespace(arch='fconv_iwslt_de_en', clip_norm=0.1, criterion='label_smoothed_cross_entropy', data='data-bin/iwslt14.tokenized.de-en', decoder_attention='True', decoder_embed_dim=256, decoder_embed_path=None, decoder_layers='[(256, 3)] * 3', decoder_out_embed_dim=256, device_id=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.2, encoder_embed_dim=256, encoder_embed_path=None, encoder_layers='[(256, 3)] * 4', force_anneal=200, fp16=False, keep_interval_updates=-1, label_smoothing=0.1, left_pad_source='True', left_pad_target='False', log_format=None, log_interval=1000, lr=[0.25], lr_scheduler='fixed', lr_shrink=0.1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=1024, max_update=0, min_loss_scale=0.0001, min_lr=1e-05, momentum=0.99, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, normalization_constant=0.5, 
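
The `--lr-scheduler fixed --force-anneal 200` pair holds the learning rate at `--lr` and then shrinks it once the anneal epoch is reached. The sketch below is a rough approximation of that behaviour, not Fairseq's own code: the function name, the 0-based epoch convention, and the exact exponent are assumptions, and the default `lr_shrink=0.1` is taken from the logged `Namespace` above.

```python
def fixed_schedule_lr(epoch: int, lr: float = 0.25, force_anneal: int = 200,
                      lr_shrink: float = 0.1) -> float:
    """Approximate learning rate for a given epoch under a fixed schedule.

    Assumption: the rate is constant before `force_anneal` and is then
    multiplied by `lr_shrink` once per epoch from that point on.
    """
    if epoch < force_anneal:
        return lr
    return lr * lr_shrink ** (epoch + 1 - force_anneal)


if __name__ == "__main__":
    for epoch in (0, 199, 200, 201, 204):
        print(epoch, fixed_schedule_lr(epoch))
```

Under these assumptions the rate drops below the logged `min_lr=1e-05` within a handful of epochs after annealing begins, so training would be expected to end shortly after epoch 200.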

## Testing

In [0]:
!python generate.py data-bin/translit.tokenized.latn-armn \
  --path checkpoints/fconv/checkpoint_best.pt \
  --batch-size 128 --beam 5 \
  --skip-invalid-size-inputs-valid-test
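
`generate.py` prints its results as tab-separated lines tagged by sentence id. Assuming the usual Fairseq layout of `S-<id>` (source), `T-<id>` (reference) and `H-<id>` (score, then hypothesis), a sketch like the following collects them per sentence from captured output. The function name and the sample lines are hypothetical, and the fork used here may format its output differently.

```python
from collections import defaultdict
from typing import Dict


def parse_generate_output(text: str) -> Dict[int, Dict[str, str]]:
    """Group S-/T-/H- lines by sentence id; all other lines are ignored."""
    results: Dict[int, Dict[str, str]] = defaultdict(dict)
    for line in text.splitlines():
        tag, sep, rest = line.partition("\t")
        kind, _, sent_id = tag.partition("-")
        if not sep or kind not in ("S", "T", "H") or not sent_id.isdigit():
            continue
        if kind == "H":
            # Hypothesis lines carry a score before the text.
            score, _, rest = rest.partition("\t")
            results[int(sent_id)]["score"] = score
        results[int(sent_id)][kind] = rest
    return dict(results)


if __name__ == "__main__":
    # Hypothetical placeholder lines, not real model output.
    sample = "S-0\ta b c\nT-0\tx y z\nH-0\t-0.12\tx y z\n"
    print(parse_generate_output(sample))
```

Comparing the `H` and `T` entries for each id gives a quick exact-match check alongside whatever score `generate.py` reports.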