Machine-Translation-EAAI24

Neural Machine Translation (NMT) with pivot and triangulation approaches

New note (7/13/2023):

  • Training notebooks can be found under these email accounts:
    • quan.nh2002
    • abca05786
    • hoang.nh0615
    • quan.nguyen232.work

Discussion:

  • The pivot model reaches an en-fr BLEU of 33 (much lower than training the models separately), likely because we did not assign the correct weight to each submodel's loss (hypothesis). Thus we can try a dynamic ensemble loss in the future (see the sketch below).
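
A minimal sketch of the loss-weighting idea, assuming the pivot objective is a sum of an en->pivot loss and a pivot->fr loss; the variable names here are illustrative, not from the repo:

    import torch

    # Placeholder submodel losses (illustrative; in the repo these would
    # come from the en->pivot and pivot->fr seq2seq submodels).
    loss_en_piv = torch.tensor(2.0)
    loss_piv_fr = torch.tensor(1.5)

    # Static weighting (the suspected current setup per the hypothesis).
    w1, w2 = 0.5, 0.5
    loss = w1 * loss_en_piv + w2 * loss_piv_fr

    # Dynamic alternative: learnable weights, kept positive and summing
    # to 1 via softmax, trained jointly with the model parameters.
    log_w = torch.nn.Parameter(torch.zeros(2))
    w = torch.softmax(log_w, dim=0)
    loss = w[0] * loss_en_piv + w[1] * loss_piv_fr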

NEW TASKS:

  • Seq2Seq: sort the batch by src_len and unsort the output to ensure it matches trg (see the sketch after this list)
  • Pivot model: ensure it works for $n$ seq2seq models
  • Triang model: ensure outputs from all submodels match the target sentence
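
A minimal sketch of the sort/unsort step for the Seq2Seq task, assuming src is a (seq_len, batch) tensor and the encoder uses nn.utils.rnn.pack_padded_sequence internally; all names are illustrative:

    import torch

    def sort_batch(src, src_len):
        # Sort by descending source length, as required by
        # nn.utils.rnn.pack_padded_sequence; src is (seq_len, batch).
        src_len_sorted, sort_idx = src_len.sort(descending=True)
        return src[:, sort_idx], src_len_sorted, sort_idx

    def unsort_output(output, sort_idx):
        # Invert the permutation so outputs align with the original trg.
        _, unsort_idx = sort_idx.sort()
        return output[:, unsort_idx]

    # Usage: src_s, lens, idx = sort_batch(src, src_len)
    #        output = unsort_output(model(src_s, lens), idx)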

Table of contents

  1. Setup environment
  2. Config
  3. Best models
  4. Things in common

Setup environment

  • Create a conda env with Python 3.10.12
  • Install torch 2.0.1: pip3 install torch==2.0.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
  • Install all packages in requirements.txt (torch 2.0.1 and torchtext 0.6.0)
  • Install the spaCy language packages via setup_env.py (a sketch follows below)
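
A minimal sketch of what setup_env.py might do; the exact spaCy model names are an assumption, chosen to match the six languages listed under Config:

    import spacy.cli

    # Assumed model names; the names used by setup_env.py may differ.
    for model in ["en_core_web_sm", "fr_core_news_sm", "es_core_news_sm",
                  "it_core_news_sm", "pt_core_news_sm", "ro_core_news_sm"]:
        spacy.cli.download(model)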

Config

  • Vocab size

    Built on the first 64,000 sentences of EnDeFrItEsPtRo-76k-most5k.pkl with min_freq=2 (see the sketch after this list):

    • en: 6964
    • fr: 9703
    • es: 10461
    • it: 10712
    • pt: 10721
    • ro: 11989
  • Model config
    • Embed_Dim = 256
    • Hidden_Dim = 512
    • Dropout = 0.5
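
A minimal sketch of the vocabulary build, assuming the legacy torchtext 0.6.0 Field API; the field name, special tokens, and toy data are illustrative:

    from torchtext.data import Field

    # English field; the other languages follow the same pattern with
    # their matching spaCy models.
    SRC = Field(tokenize="spacy", tokenizer_language="en_core_web_sm",
                init_token="<sos>", eos_token="<eos>", lower=True)

    # Toy token lists standing in for the first 64,000 training sentences.
    toy_sentences = [["hello", "world"], ["hello", "again"]]
    SRC.build_vocab(toy_sentences, min_freq=2)
    print(len(SRC.vocab))  # e.g., 6964 for en on the real data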

Best models:

  • Seq2Seq: seq2seq-EnFr-1.pt
  • Pivot:
    • es: piv-EnEsFr.pt
    • it: piv-EnItFr.pt
    • pt: piv-EnPtFr.pt
    • ro: piv-EnRoFr.pt
  • Triang: combination of the trained Seq2Seq and Pivot models

Things in common:

  • Main pipeline: bentrevett_pytorch_seq2seq.ipynb

  • Datasets: EnDeFrItEsPtRo-76k-most5k.pkl

  • Data info: train_len, valid_len, test_len = 64000, 3200, 6400

  • Init weights (all models)
    import torch.nn as nn

    def init_weights(m):
        # Weights ~ N(0, 0.01); all other parameters (biases) set to 0.
        for name, param in m.named_parameters():
            if 'weight' in name:
                nn.init.normal_(param.data, mean=0, std=0.01)
            else:
                nn.init.constant_(param.data, 0)

    model.apply(init_weights)
  • Load model weights
    # Checkpoints bundle the model, optimizer, and scheduler states.
    checkpoint = torch.load('path_to_model/model_name.pt')
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
  • Learning rate (see the scheduler sketch after this list)
    • Seq2Seq: start with $0.0012$, reduced by $\frac{2}{3}$ every epoch
    • Pivot: start with $0.0012$, reduced by $\frac{2}{3}$ at epochs 3, 6, 8, 9, and 10
  • Epoch
    • Seq2Seq: 7
    • Pivot: 11
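
A minimal sketch of the two learning-rate schedules, assuming "reduced by $\frac{2}{3}$" means the rate is multiplied by 1/3 at each step, and that Adam is the optimizer (both assumptions):

    import torch
    from torch.optim.lr_scheduler import StepLR, MultiStepLR

    model = torch.nn.Linear(2, 2)  # stand-in for the actual model
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0012)

    # Pick one of the two schedules below.
    # Seq2Seq: decay after every epoch.
    scheduler = StepLR(optimizer, step_size=1, gamma=1/3)
    # Pivot: decay at epochs 3, 6, 8, 9, and 10.
    scheduler = MultiStepLR(optimizer, milestones=[3, 6, 8, 9, 10], gamma=1/3)

    # Call scheduler.step() once per epoch during training.
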
Future work (after having results):

  • Improvements for the dataloader
  • Transformer