
Transformer in DGL

In this example we implement the Transformer and the Universal Transformer with ACT (Adaptive Computation Time) in DGL.

The folder contains a training module and an inference module (beam decoder) for the Transformer, and a training module for the Universal Transformer.
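The beam decoder mentioned above can be sketched in simplified form as follows. This is an illustrative sketch, not the module's actual API: the function names, the toy next-token distribution, and the special tokens are all made up for the example.

```python
import math

def beam_search(expand, beam_width=3, max_len=4, bos="<s>", eos="</s>"):
    """Minimal beam search over token sequences.

    `expand(seq)` is assumed to return a list of (token, probability) pairs
    for the next token given a partial sequence. At each step we grow every
    live hypothesis, then keep only the `beam_width` highest-scoring ones
    (scored by accumulated log-probability). Hypotheses that already ended
    with `eos` are carried forward unchanged.
    """
    beams = [([bos], 0.0)]  # list of (token sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            if seq[-1] == eos:
                candidates.append((seq, logp))  # finished hypothesis
                continue
            for tok, p in expand(seq):
                candidates.append((seq + [tok], logp + math.log(p)))
        # Prune to the top-k hypotheses by score.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

# Toy "model": the next token is "a" with probability 0.9, "</s>" with 0.1,
# regardless of context. With this distribution the best length-4 hypothesis
# keeps emitting "a".
toy = lambda seq: [("a", 0.9), ("</s>", 0.1)]
best = beam_search(toy)
```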


Dependencies

  • PyTorch 0.4.1+
  • networkx
  • tqdm
  • requests


Usage

  • For training:

    python [--gpus id1,id2,...] [--N #layers] [--dataset DATASET] [--batch BATCHSIZE] [--universal]
  • For evaluating the BLEU score on the test set (enable --print to see the translated text):

    python [--gpu id] [--N #layers] [--dataset DATASET] [--batch BATCHSIZE] [--checkpoint CHECKPOINT] [--print] [--universal]

Available datasets: copy, sort, wmt14, multi30k (default).

Test Results

Transformer

  • Multi30k: we achieve a BLEU score of 35.41 with the default settings on the Multi30k dataset, without using pre-trained embeddings. (With the number of layers set to 2, the BLEU score reaches 36.45.)
  • WMT14: work in progress

Universal Transformer

  • work in progress
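The ACT mechanism that distinguishes the Universal Transformer can be sketched as the halting loop below. This is a simplified scalar sketch under assumed names (`step_fn`, `halt_prob`, `threshold` are illustrative, not the module's API); in the real model the state is a per-position tensor and the halting probability comes from a learned sigmoid unit.

```python
def act_steps(state, step_fn, halt_prob, threshold=0.99, max_steps=8):
    """Adaptive Computation Time halting loop (scalar sketch).

    Repeatedly applies the (shared) transition `step_fn`, accumulating a
    halting probability per step. Once the cumulative probability crosses
    `threshold` (or `max_steps` is hit), the remaining probability mass is
    assigned to the final state, and the probability-weighted mixture of
    intermediate states is returned along with the number of steps taken.
    """
    cumulative = 0.0  # halting probability accumulated so far
    weighted = 0.0    # probability-weighted mixture of intermediate states
    steps = 0
    for step in range(max_steps):
        state = step_fn(state)        # one application of the shared layer
        p = halt_prob(state)          # this step's halting probability
        steps = step + 1
        if cumulative + p >= threshold or step == max_steps - 1:
            remainder = 1.0 - cumulative  # leftover mass goes to the last state
            weighted += remainder * state
            break
        cumulative += p
        weighted += p * state
    return weighted, steps

# Toy example: the state halves each step and every step halts with
# probability 0.4, so the threshold of 0.99 is crossed on the third step.
mixture, n_steps = act_steps(1.0, lambda s: s * 0.5, lambda s: 0.4)
```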

Notes

  • Multi-GPU training is not yet supported (this will be fixed soon); specify only one GPU id when running the training script.