# TinyTransformer

## Prepare data

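This notebook trains a Transformer on an English-French machine translation task. The dataset and preprocessing follow Mu Li's [Dive into Deep Learning (动手学深度学习)](https://zh.d2l.ai/chapter_recurrent-modern/machine-translation-and-dataset.html). The `load_data_nmt()` function in the `d2l` library downloads and preprocesses the dataset, which is saved automatically to the `../data/fra-eng` folder.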

In [None]:
!pip install d2l==0.17.6

In [2]:
import d2l.torch as d2l

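# batch_size: sentence pairs per batch; num_steps: fixed token length of every sequence
batch_size, num_steps = 64, 10
train_iter, src_vocab, trg_vocab = d2l.load_data_nmt(batch_size, num_steps)
for batch in train_iter:
    # Each batch holds source/target token ids and their valid (unpadded) lengths
    X, X_valid_len, Y, Y_valid_len = batch
    print('X:', X.shape)
    print('Y:', Y.shape)
    break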

Downloading ../data/fra-eng.zip from http://d2l-data.s3-accelerate.amazonaws.com/fra-eng.zip...
X: torch.Size([64, 10])
Y: torch.Size([64, 10])
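
Both tensors have shape `[64, 10]` because every sentence is cut or padded to `num_steps` tokens before batching. The sketch below illustrates that idea in plain Python. `pad_or_truncate`, `valid_length`, and the literal `<pad>` / `<eos>` strings are illustrative names, not the `d2l` implementation, which operates on token indices rather than strings.

```python
def pad_or_truncate(tokens, num_steps, pad_token="<pad>"):
    """Return exactly num_steps tokens: cut long sequences, pad short ones."""
    if len(tokens) >= num_steps:
        return tokens[:num_steps]
    return tokens + [pad_token] * (num_steps - len(tokens))


def valid_length(tokens, pad_token="<pad>"):
    """Count the tokens that are not padding."""
    return sum(1 for tok in tokens if tok != pad_token)


# A short sentence is padded up to num_steps.
short = pad_or_truncate(["go", ".", "<eos>"], num_steps=10)
# A long sequence is truncated down to num_steps.
long = pad_or_truncate([str(i) for i in range(15)], num_steps=10)

print(len(short), valid_length(short))
print(len(long), valid_length(long))
```

The valid length is what lets a model ignore padded positions, which is presumably why the iterator returns `X_valid_len` and `Y_valid_len` alongside `X` and `Y`.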


## Train Model

In [4]:
from transformer import Transformer, device, train_model

d_model = 512  # embedding size
d_ff = 2048  # feed-forward hidden dimension
n_layers = 6  # number of Encoder and Decoder layers
n_heads = 8  # number of heads in multi-head attention
dropout, batch_size, num_steps = 0.1, 64, 10
model = Transformer(enc_vocab_size=len(src_vocab),
                    dec_vocab_size=len(trg_vocab),
                    n_head=n_heads,
                    d_model=d_model,
                    ffn_hidden_dim=d_ff,
                    n_layers=n_layers,
                    trg_vocab_size=len(trg_vocab),
                    dropout=dropout)
model = model.to(device)
model

Transformer(
  (encoder): Encoder(
    (emb): TransformerEmbedding(
      (token_emb): TokenEmbedding(184, 512)
      (pos_emb): PositionalEncoding()
      (dropout): Dropout(p=0.1, inplace=True)
    )
    (layers): ModuleList(
      (0-5): 6 x EncoderLayer(
        (enc_self_attn): MultiHeadAttention(
          (attn): ScaledDotProductAttention(
            (softmax): Softmax(dim=-1)
          )
          (w_q): Linear(in_features=512, out_features=512, bias=True)
          (w_k): Linear(in_features=512, out_features=512, bias=True)
          (w_v): Linear(in_features=512, out_features=512, bias=True)
          (w_concat): Linear(in_features=512, out_features=512, bias=True)
        )
        (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (ffn): PositionwiseFeedForwardNet(
          (linear1): Linear(in_features=512, out_features=2048, bias=True)
          (linear2): Linear(in_features=2048, out_features=512, bi
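
The layer shapes printed above follow directly from the hyperparameters. As a quick sanity check, this sketch computes the per-head dimension and the parameter counts of the linear layers in one encoder layer, using only the sizes shown in the printed model. `linear_params` is a hypothetical helper introduced here. Two assumptions are made: `d_model` is split evenly across heads, and `linear2` has a bias term (its `bias` flag is cut off in the output).

```python
d_model, d_ff, n_heads = 512, 2048, 8


def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Assumption: d_model is split evenly across heads, as in standard multi-head attention.
head_dim = d_model // n_heads

# w_q, w_k, w_v and w_concat are all Linear(512, 512) in the printed model.
attn_params = 4 * linear_params(d_model, d_model)

# linear1: 512 -> 2048, linear2: 2048 -> 512 (bias assumed for linear2).
ffn_params = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)

print("head_dim:", head_dim)
print("attention params per layer:", attn_params)
print("feed-forward params per layer:", ffn_params)
```

Under these assumptions the feed-forward block holds roughly twice as many parameters as the attention projections in each layer.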

In [5]:
lr = 1e-4
num_epochs = 50
train_model(model, train_iter, lr, num_epochs=num_epochs, trg_vocab=trg_vocab, device=device)

Train: 100%|██████████| 10/10 [00:00<00:00, 23.99it/s, loss=32.3]


Epoch [1/50] loss: 102.93647689819336
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 35.10it/s, loss=33.4]


Epoch [2/50] loss: 87.95769119262695
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 35.69it/s, loss=31.6]


Epoch [3/50] loss: 84.90397911071777
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.34it/s, loss=38.8]


Epoch [4/50] loss: 83.21336097717285
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.09it/s, loss=33] 


Epoch [5/50] loss: 82.23430023193359
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.98it/s, loss=34.1]


Epoch [6/50] loss: 80.63473167419434
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.10it/s, loss=33] 


Epoch [7/50] loss: 79.11391830444336
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.67it/s, loss=33.3]


Epoch [8/50] loss: 76.66787605285644
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 35.62it/s, loss=33.9]


Epoch [9/50] loss: 74.5035026550293
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.87it/s, loss=29.3]


Epoch [10/50] loss: 73.25728740692139
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 32.82it/s, loss=29.2]


Epoch [11/50] loss: 70.89768409729004
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.47it/s, loss=28.6]


Epoch [12/50] loss: 68.79645252227783
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.36it/s, loss=28] 


Epoch [13/50] loss: 66.76443634033203
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.92it/s, loss=27.8]


Epoch [14/50] loss: 65.33511428833008
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.32it/s, loss=25.4]


Epoch [15/50] loss: 63.26354446411133
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.02it/s, loss=26.6]


Epoch [16/50] loss: 62.21953430175781
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.93it/s, loss=25.6]


Epoch [17/50] loss: 61.22458839416504
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.80it/s, loss=28.1]


Epoch [18/50] loss: 59.98676109313965
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.88it/s, loss=23.3]


Epoch [19/50] loss: 59.49478130340576
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.74it/s, loss=24.2]


Epoch [20/50] loss: 57.871809005737305
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.47it/s, loss=24.7]


Epoch [21/50] loss: 57.263955688476564
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 31.72it/s, loss=21.4]


Epoch [22/50] loss: 56.74862060546875
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.78it/s, loss=25.2]


Epoch [23/50] loss: 55.72169647216797
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.28it/s, loss=23.4]


Epoch [24/50] loss: 55.308589172363284
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.78it/s, loss=20.9]


Epoch [25/50] loss: 54.592202758789064
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 35.28it/s, loss=21.2]


Epoch [26/50] loss: 54.09572792053223
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.57it/s, loss=22.2]


Epoch [27/50] loss: 53.99462261199951
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.14it/s, loss=20.9]


Epoch [28/50] loss: 53.21982765197754
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.33it/s, loss=21] 


Epoch [29/50] loss: 53.10233421325684
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 35.37it/s, loss=22.1]


Epoch [30/50] loss: 52.88190269470215
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.19it/s, loss=19.9]


Epoch [31/50] loss: 52.07657127380371
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 35.32it/s, loss=20.5]


Epoch [32/50] loss: 51.66604385375977
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.19it/s, loss=20.8]


Epoch [33/50] loss: 51.6589391708374
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.58it/s, loss=21.2]


Epoch [34/50] loss: 51.735442352294925
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 32.78it/s, loss=24.1]


Epoch [35/50] loss: 51.40689315795898
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 32.31it/s, loss=19.9]


Epoch [36/50] loss: 50.742432022094725
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.57it/s, loss=20] 


Epoch [37/50] loss: 50.43388786315918
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.87it/s, loss=23.4]


Epoch [38/50] loss: 50.08850173950195
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.47it/s, loss=21.2]


Epoch [39/50] loss: 50.25409851074219
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.24it/s, loss=21.6]


Epoch [40/50] loss: 49.420294189453124
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.26it/s, loss=20.5]


Epoch [41/50] loss: 49.7753999710083
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 32.99it/s, loss=17] 


Epoch [42/50] loss: 49.267804336547854
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.41it/s, loss=21.9]


Epoch [43/50] loss: 49.679437255859376
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 32.68it/s, loss=21.6]


Epoch [44/50] loss: 49.17785797119141
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.22it/s, loss=21.1]


Epoch [45/50] loss: 49.068215942382814
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.71it/s, loss=21.5]


Epoch [46/50] loss: 48.87303352355957
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 34.02it/s, loss=17.7]


Epoch [47/50] loss: 48.87289867401123
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.28it/s, loss=19.9]


Epoch [48/50] loss: 49.0660816192627
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 33.46it/s, loss=18.9]


Epoch [49/50] loss: 48.72293758392334
Save model checkpoint to: checkpoints/best_model.pth


Train: 100%|██████████| 10/10 [00:00<00:00, 32.50it/s, loss=20.3]


Epoch [50/50] loss: 48.18685722351074
Save model checkpoint to: checkpoints/best_model.pth
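
The log prints a save line after every epoch, including epochs where the loss rose slightly (for example epoch 34 after epoch 33). How `train_model` decides when to save is not visible here. If the goal is to keep only the lowest-loss checkpoint, one possible pattern is sketched below. `BestLossTracker` is a hypothetical helper, not part of the `transformer` module, and the actual save call is left as a comment.

```python
class BestLossTracker:
    """Remember the lowest epoch loss seen so far."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_epoch = None

    def update(self, epoch, loss):
        """Return True when this epoch improves on the best loss."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.best_epoch = epoch
            return True
        return False


# Epoch losses copied from the log above (epochs 32 to 36, rounded).
losses = {32: 51.666, 33: 51.659, 34: 51.735, 35: 51.407, 36: 50.742}

tracker = BestLossTracker()
saved_epochs = []
for epoch, loss in losses.items():
    if tracker.update(epoch, loss):
        # A real loop would save here, e.g. torch.save(model.state_dict(), path).
        saved_epochs.append(epoch)

print(saved_epochs)
```

With this rule, epoch 34 would be skipped because its loss is higher than the best value recorded at epoch 33.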


## Reference

- https://zh.d2l.ai/chapter_recurrent-modern/machine-translation-and-dataset.html
- https://github.com/graykode/nlp-tutorial/blob/master/5-1.Transformer/Transformer.py
- https://github.com/datawhalechina/tiny-universe