Transformer in DGL

In this example, we implement the Transformer and the Universal Transformer with ACT (Adaptive Computation Time) in DGL.

The folder contains a training module and an inference module (beam decoder) for the Transformer, and a training module for the Universal Transformer.
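
To illustrate the general idea, here is a minimal sketch (not the code in this folder) of scaled dot-product attention expressed as message passing on a graph. It assumes a recent DGL release that provides dgl.graph, the built-in message functions in dgl.function, and dgl.nn.functional.edge_softmax; the function name graph_attention and the toy fully connected graph are purely illustrative.

    import torch
    import dgl
    import dgl.function as fn
    from dgl.nn.functional import edge_softmax

    def graph_attention(g, q, k, v):
        # q, k, v: (num_nodes, d) node features; an edge src -> dst means
        # "dst attends to src".
        d = q.shape[1]
        g.ndata["q"], g.ndata["k"], g.ndata["v"] = q, k, v
        # score on each edge: <k_src, q_dst> / sqrt(d)
        g.apply_edges(fn.u_dot_v("k", "q", "score"))
        # softmax over the incoming edges of each destination node
        g.edata["a"] = edge_softmax(g, g.edata["score"] / d ** 0.5)
        # weighted sum of source values into each destination node
        g.update_all(fn.u_mul_e("v", "a", "m"), fn.sum("m", "h"))
        return g.ndata["h"]

    # toy usage: a fully connected 4-token graph (every token attends to every token)
    n, d = 4, 8
    src = torch.arange(n).repeat_interleave(n)
    dst = torch.arange(n).repeat(n)
    g = dgl.graph((src, dst))
    x = torch.randn(n, d)
    out = graph_attention(g, x, x, x)  # shape (4, 8)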

Dependencies

  • PyTorch 0.4.1+
  • networkx
  • tqdm
  • requests

Usage

  • For training:

    python translation_train.py [--gpus id1,id2,...] [--N #layers] [--dataset DATASET] [--batch BATCHSIZE] [--universal]
    
  • For evaluating the BLEU score on the test set (enable --print to see the translated text):

    python translation_test.py [--gpu id] [--N #layers] [--dataset DATASET] [--batch BATCHSIZE] [--checkpoint CHECKPOINT] [--print] [--universal]
    

Available datasets: copy, sort, wmt14, multi30k (default).
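
For example, a single-GPU run on the default Multi30k dataset could look like the following; the layer count, batch size, and checkpoint filename below are illustrative, so substitute whatever your own training run produces:

    # train a Transformer on Multi30k using GPU 0 (values are illustrative)
    python translation_train.py --gpus 0 --N 6 --dataset multi30k --batch 128

    # evaluate BLEU on the test set with the saved checkpoint and print the translations
    # (checkpoint.pkl is a placeholder filename)
    python translation_test.py --gpu 0 --N 6 --dataset multi30k --batch 128 --checkpoint checkpoint.pkl --print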

Test Results

Transformer

  • Multi30k: we achieve a BLEU score of 35.41 with the default settings on the Multi30k dataset, without using pre-trained embeddings (setting the number of layers to 2 raises the BLEU score to 36.45).
  • WMT14: work in progress

Universal Transformer

  • work in progress

Notes

  • Currently we do not support multi-GPU training (this will be fixed soon); specify only one gpu_id when running the training script.

Reference