# 作業 : 實作英文-德文翻譯機器人
***
## [作業目標]

用 PyTorch 實作一個英文-德文翻譯機器人

## [作業目標]

*   語言資料處理
*   使用 LSTM 建構 Encoder: EncoderLSTM
*   使用 LSTM 建構 Decoder: DecoderLSTM
*   搭建 Sequence to Sequence 模型: Seq2Seq
*   撰寫訓練函式
*   撰寫測試函式

## [問題]

在 Colab 上實際執行完這個範例後，請改用 BiLSTM 來建構 Encoder


## 引用需要的模組

In [1]:
import spacy
import torch
import random
import numpy as np
import pandas as pd
from pprint import pprint

from torch import nn, optim
from torchtext.data import Field, BucketIterator
from torchtext.datasets import Multi30k
from torchtext.data.metrics import bleu_score
from torch.utils.tensorboard import SummaryWriter

## 下載 spacy 英文語料

In [2]:
!python -m spacy download en
spacy_en = spacy.load('en')

[38;5;2m✔ Download and installation successful[0m
You can now load the model via spacy.load('en_core_web_sm')
[38;5;2m✔ Linking successful[0m
/usr/local/lib/python3.6/dist-packages/en_core_web_sm -->
/usr/local/lib/python3.6/dist-packages/spacy/data/en
You can now load the model via spacy.load('en')


## 下載 spacy 德文語料

In [3]:
!python -m spacy download de
spacy_de = spacy.load('de')

Collecting de_core_news_sm==2.2.5
[?25l  Downloading https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-2.2.5/de_core_news_sm-2.2.5.tar.gz (14.9MB)
[K     |████████████████████████████████| 14.9MB 796kB/s 
Building wheels for collected packages: de-core-news-sm
  Building wheel for de-core-news-sm (setup.py) ... [?25l[?25hdone
  Created wheel for de-core-news-sm: filename=de_core_news_sm-2.2.5-cp36-none-any.whl size=14907057 sha256=1e1820ccbf3022139e95cc08543df338fbcfa93c8fbe965197bb26a1774e6725
  Stored in directory: /tmp/pip-ephem-wheel-cache-u_lg9hvg/wheels/ba/3f/ed/d4aa8e45e7191b7f32db4bfad565e7da1edbf05c916ca7a1ca
Successfully built de-core-news-sm
Installing collected packages: de-core-news-sm
Successfully installed de-core-news-sm-2.2.5
[38;5;2m✔ Download and installation successful[0m
You can now load the model via spacy.load('de_core_news_sm')
[38;5;2m✔ Linking successful[0m
/usr/local/lib/python3.6/dist-packages/de_core_news_sm -->
/usr/local/

In [4]:
def tokenize_en(text):
    return [token.text for token in spacy_en.tokenizer(text)]

def tokenize_de(text):
    return [token.text for token in spacy_de.tokenizer(text)]

# Sample Run
sample_text = 'I love machine learning'
print(tokenize_en(sample_text))

english = Field(tokenize=tokenize_en, lower=True,
               init_token='<sos>', eos_token='<eos>')
german = Field(tokenize=tokenize_de, lower=True,
               init_token='<sos>', eos_token='<eos>')

train_data, valid_data, test_data = Multi30k.splits(exts=('.en', '.de'),
                                                    fields=(english, german))
english.build_vocab(train_data, max_size=15000, min_freq=2)
german.build_vocab(train_data, max_size=15000, min_freq=2)

print(f"Unique tokens in source (english) vocabulary: {len(english.vocab)}")
print(f"Unique tokens in target (german) vocabulary: {len(german.vocab)}")

['I', 'love', 'machine', 'learning']
downloading training.tar.gz


training.tar.gz: 100%|██████████| 1.21M/1.21M [00:02<00:00, 517kB/s]


downloading validation.tar.gz


validation.tar.gz: 100%|██████████| 46.3k/46.3k [00:00<00:00, 170kB/s]


downloading mmt_task1_test2016.tar.gz


mmt_task1_test2016.tar.gz: 100%|██████████| 66.2k/66.2k [00:00<00:00, 158kB/s]


Unique tokens in source (english) vocabulary: 5893
Unique tokens in target (german) vocabulary: 7855


In [5]:
print(f"Number of training examples: {len(train_data.examples)}")
print(f"Number of validation examples: {len(valid_data.examples)}")
print(f"Number of testing examples: {len(test_data.examples)}")

print(train_data[5].__dict__.keys())
pprint(train_data[5].__dict__.values())

Number of training examples: 29000
Number of validation examples: 1014
Number of testing examples: 1000
dict_keys(['src', 'trg'])
dict_values([['a', 'man', 'in', 'green', 'holds', 'a', 'guitar', 'while', 'the', 'other', 'man', 'observes', 'his', 'shirt', '.'], ['ein', 'mann', 'in', 'grün', 'hält', 'eine', 'gitarre', ',', 'während', 'der', 'andere', 'mann', 'sein', 'hemd', 'ansieht', '.']])


In [6]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
BATCH_SIZE = 32

train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size=BATCH_SIZE, 
    sort_within_batch=True,
    sort_key=lambda x: len(x.src),
    device=device
)

In [7]:
len_eng_examples = []
len_ger_examples = []
for idx, data in enumerate(train_data):
    len_eng_examples.append(len(data.src))
    len_ger_examples.append(len(data.trg))
    if idx < 10:
        print('German - ', *data.src, ' Length - ', len(data.src))
        print('English - ', *data.trg, ' Length - ', len(data.trg))
        print()

print(f"Maximum Length of English sentence {max(len_eng_examples)} and German sentence {max(len_ger_examples)} in the dataset")
print(f"Minimum Length of English sentence {min(len_eng_examples)} and German sentence {min(len_ger_examples)} in the dataset")

German -  two young , white males are outside near many bushes .  Length -  11
English -  zwei junge weiße männer sind im freien in der nähe vieler büsche .  Length -  13

German -  several men in hard hats are operating a giant pulley system .  Length -  12
English -  mehrere männer mit schutzhelmen bedienen ein antriebsradsystem .  Length -  8

German -  a little girl climbing into a wooden playhouse .  Length -  9
English -  ein kleines mädchen klettert in ein spielhaus aus holz .  Length -  10

German -  a man in a blue shirt is standing on a ladder cleaning a window .  Length -  15
English -  ein mann in einem blauen hemd steht auf einer leiter und putzt ein fenster .  Length -  15

German -  two men are at the stove preparing food .  Length -  9
English -  zwei männer stehen am herd und bereiten essen zu .  Length -  10

German -  a man in green holds a guitar while the other man observes his shirt .  Length -  15
English -  ein mann in grün hält eine gitarre , während der andere

In [8]:
data = next(iter(train_iterator))
print('Shapes', data.src.shape, data.trg.shape)
print()
print('English - ',*data.src, ' Length - ', len(data.src))
print()
print('German - ',*data.trg, ' Length - ', len(data.trg))
temp_eng = data.src
temp_ger = data.trg

Shapes torch.Size([17, 32]) torch.Size([20, 32])

English -  tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2], device='cuda:0') tensor([  21,    4,    4,    4,    4,   16,   16,    4,    4,    4,    4, 1317,
           4,    4,    4,    4,    7,    4,    4,   16,    4,    9,    4,    4,
           4,    4,  209,    4,    4,    4,    4,    4], device='cuda:0') tensor([ 233,  682,    9,    9,   33,   30,   24,   14,   59,   34,   14,    9,
         386,    9,    9,   33, 1096,  429,   14,  360,  153,    6,   38,  120,
          14,   31,   10,   33,    9,   34,   24,  174], device='cuda:0') tensor([253,   9,  22, 195,  45,  15, 104,   6,  38, 137,   6,   6,  42,  13,
          6,  22,  11,  10,  22,  17,  89, 148,  12, 386,  36,  42,   4,  22,
          6,   6,  55, 351], device='cuda:0') tensor([ 275,   13,   29,    4,    4,  405,   17,    4,   12,   13,    4,  255,
         210,    4,    4,    4,  494,   78,    4,    6,    

In [9]:
temp_eng_idx = temp_eng.cpu().detach().numpy()
temp_ger_idx = temp_ger.cpu().detach().numpy()

In [10]:
df_eng_idx = pd.DataFrame(data=temp_eng_idx,
                          columns=[str('S_') + str(x + 1) for x in range(BATCH_SIZE)])
df_eng_idx.index.name = 'Time Steps'
df_eng_idx.index = df_eng_idx.index + 1 
df_eng_idx

Unnamed: 0_level_0,S_1,S_2,S_3,S_4,S_5,S_6,S_7,S_8,S_9,S_10,S_11,S_12,S_13,S_14,S_15,S_16,S_17,S_18,S_19,S_20,S_21,S_22,S_23,S_24,S_25,S_26,S_27,S_28,S_29,S_30,S_31,S_32
Time Steps,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1
1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
2,21,4,4,4,4,16,16,4,4,4,4,1317,4,4,4,4,7,4,4,16,4,9,4,4,4,4,209,4,4,4,4,4
3,233,682,9,9,33,30,24,14,59,34,14,9,386,9,9,33,1096,429,14,360,153,6,38,120,14,31,10,33,9,34,24,174
4,253,9,22,195,45,15,104,6,38,137,6,6,42,13,6,22,11,10,22,17,89,148,12,386,36,42,4,22,6,6,55,351
5,275,13,29,4,4,405,17,4,12,13,4,255,210,4,4,4,494,78,4,6,8,11,1073,42,49,210,70,306,4,4,13,65
6,1459,146,147,1070,919,6,78,61,617,1384,163,25,9,25,26,950,17,4,1278,4,7,208,172,210,4,14,0,194,61,31,95,4
7,523,10,10,10,0,29,811,193,1078,364,61,23,10,441,208,40,165,1843,10,1120,156,42,75,120,99,6,13,147,81,81,8,289
8,91,466,1183,119,223,15,2262,11,17,8,193,11,92,10,42,23,4,198,41,1683,12,23,7,13,15,4,29,1185,10,1002,27,632
9,13,69,76,6,10,100,58,25,2125,7,652,147,28,22,23,13,663,1263,40,11,4,246,5035,4,6,26,2370,44,32,47,158,518
10,27,4,18,4,133,8,82,501,40,84,18,10,8,21,11,122,11,2429,7,48,425,18,6,4673,43,193,11,845,11,8,636,15


In [11]:
idx2word = {idx: word for idx, word in enumerate(english.vocab.itos)}
df_eng_word = pd.DataFrame(columns=[str('S_') + str(x + 1) for x in range(BATCH_SIZE)])
df_eng_word = df_eng_idx.replace(idx2word)
df_eng_word

Unnamed: 0_level_0,S_1,S_2,S_3,S_4,S_5,S_6,S_7,S_8,S_9,S_10,S_11,S_12,S_13,S_14,S_15,S_16,S_17,S_18,S_19,S_20,S_21,S_22,S_23,S_24,S_25,S_26,S_27,S_28,S_29,S_30,S_31,S_32
Time Steps,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1
1,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>,<sos>
2,an,a,a,a,a,two,two,a,a,a,a,crouching,a,a,a,a,the,a,a,two,a,man,a,a,a,a,this,a,a,a,a,a
3,elderly,bald,man,man,girl,men,young,woman,large,boy,woman,man,short,man,man,girl,riders,cowboy,woman,guys,baby,in,group,lady,woman,red,is,girl,man,boy,young,guy
4,outdoor,man,wearing,using,holding,",",girls,in,group,plays,in,in,-,with,in,wearing,and,is,wearing,are,stands,shorts,of,short,standing,-,a,wearing,in,in,child,drinking
5,market,with,blue,a,a,both,are,a,of,with,a,all,haired,a,a,a,horses,riding,a,in,on,and,bikers,-,by,haired,small,colored,a,a,with,from
6,vegetable,glasses,pants,wheelchair,stuffed,in,riding,brown,competing,outstretched,long,white,man,white,black,button,are,a,headscarf,a,the,t,head,haired,a,woman,<unk>,striped,brown,red,snow,a
7,vendor,is,is,is,<unk>,blue,beige,coat,cyclists,arms,brown,shirt,is,beard,t,down,taking,wild,is,garbage,side,-,out,lady,bike,in,with,pants,jacket,jacket,on,big
8,sits,cutting,bending,talking,toy,",",camels,and,are,on,coat,and,jumping,is,-,shirt,a,horse,walking,bin,of,shirt,the,with,",",a,blue,brushing,is,pouring,his,silver
9,with,into,over,in,is,stand,as,white,cycling,the,tries,pants,while,wearing,shirt,with,break,but,down,and,a,about,gates,a,in,black,skies,her,sitting,water,face,cup
10,his,a,to,a,smiling,on,another,scarf,down,sidewalk,to,is,on,an,and,blond,and,barely,the,three,couch,to,in,blank,front,coat,and,teeth,and,on,gets,","


## 用 LSTM 搭建的 Encoder 類別: EncoderLSTM



In [12]:
class EncoderLSTM(nn.Module):
    def __init__(self, input_size, embedding_size, hidden_size, num_layers, drop_rate):
        super(EncoderLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.embedding = nn.Embedding(input_size, embedding_size)
        self.LSTM = nn.LSTM(embedding_size, hidden_size, num_layers, 
                            bidirectional=True, dropout=drop_rate)
        self.dropout = nn.Dropout(drop_rate)

    def forward(self, x):
        embedding = self.dropout(self.embedding(x))
        outputs, (hidden_state, cell_state) = self.LSTM(embedding)
        hidden_state = hidden_state.view(self.num_layers, -1, self.hidden_size * 2)
        cell_state = cell_state.view(self.num_layers, -1, self.hidden_size * 2)

        return hidden_state, cell_state

input_size_encoder = len(english.vocab)
encoder_embedding_size = 300
hidden_size = 256
num_layers = 2
encoder_dropout = 0.5

encoder_lstm = EncoderLSTM(input_size_encoder, encoder_embedding_size,
                           hidden_size, num_layers, encoder_dropout).to(device)
print(encoder_lstm)

EncoderLSTM(
  (embedding): Embedding(5893, 300)
  (LSTM): LSTM(300, 256, num_layers=2, dropout=0.5, bidirectional=True)
  (dropout): Dropout(p=0.5, inplace=False)
)


## 用 LSTM 搭建的 decoder 類別: DecoderLSTM


In [13]:
class DecoderLSTM(nn.Module):
    def __init__(self, input_size, embedding_size, hidden_size,
                 num_layers, output_size, drop_rate):
        super(DecoderLSTM, self).__init__()
        self.embedding = nn.Embedding(input_size, embedding_size)
        self.LSTM = nn.LSTM(embedding_size, hidden_size,
                            num_layers, dropout=drop_rate)
        self.fc = nn.Linear(hidden_size, output_size)
        self.dropout = nn.Dropout(drop_rate)

    def forward(self, x, hidden_state, cell_state):
        x = x.unsqueeze(0)
        embedding = self.dropout(self.embedding(x))
        outputs, (hidden_state, cell_state) = self.LSTM(embedding, (hidden_state, cell_state))
        predictions = self.fc(outputs)
        predictions = predictions.squeeze(0)

        return predictions, hidden_state, cell_state

input_size_decoder = len(german.vocab)
decoder_embedding_size = 300
hidden_size = 512
num_layers = 2
decoder_dropout = 0.5
output_size = len(german.vocab)

decoder_lstm = DecoderLSTM(input_size_decoder, decoder_embedding_size,
                           hidden_size, num_layers, output_size, decoder_dropout).to(device)
print(decoder_lstm)

DecoderLSTM(
  (embedding): Embedding(7855, 300)
  (LSTM): LSTM(300, 512, num_layers=2, dropout=0.5)
  (fc): Linear(in_features=512, out_features=7855, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)


In [14]:
batch = next(iter(train_iterator))
print(batch.src.shape)
print(batch.trg.shape)

x = batch.trg[1]
print(x)

torch.Size([16, 32])
torch.Size([20, 32])
tensor([   5,    5,    5,    0,   18,    5,   18,   18,    8, 1056,    5,    5,
           8,   43,    8,    5,   43,    5,    8,   43,   18,    5,    8,    5,
           5,    5,    5,    7,    5,    5,    5,   18], device='cuda:0')


# Sequence to Sequence 類別

In [15]:
class Seq2Seq(nn.Module):
    def __init__(self, Encoder_LSTM, Decoder_LSTM):
        super(Seq2Seq, self).__init__()
        self.Encoder_LSTM = Encoder_LSTM
        self.Decoder_LSTM = Decoder_LSTM

    def forward(self, source, target, tfr=0.5):
        batch_size = source.shape[1]
        target_len = target.shape[0]
        target_vocab_size = len(german.vocab)
        outputs = torch.zeros(target_len, batch_size, target_vocab_size).to(device)

        hidden_state, cell_state = self.Encoder_LSTM(source)

        x = target[0] # Trigger token <SOS>
        for i in range(1, target_len):
            output, hidden_state, cell_state = self.Decoder_LSTM(x, hidden_state, cell_state)
            outputs[i] = output
            best_guess = output.argmax(1) # 0th dimension is batch size, 1st dimension is word embedding
            # Either pass the next word correctly from the dataset or use the earlier predicted word
            x = target[i] if random.random() < tfr else best_guess

        return outputs

In [16]:
# Hyperparameters
learning_rate = 0.001
step = 0
writer = SummaryWriter(f"runs/loss_plot")

model = Seq2Seq(encoder_lstm, decoder_lstm).to(device)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

pad_idx = english.vocab.stoi['<pad>']
criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)

In [17]:
model

Seq2Seq(
  (Encoder_LSTM): EncoderLSTM(
    (embedding): Embedding(5893, 300)
    (LSTM): LSTM(300, 256, num_layers=2, dropout=0.5, bidirectional=True)
    (dropout): Dropout(p=0.5, inplace=False)
  )
  (Decoder_LSTM): DecoderLSTM(
    (embedding): Embedding(7855, 300)
    (LSTM): LSTM(300, 512, num_layers=2, dropout=0.5)
    (fc): Linear(in_features=512, out_features=7855, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
  )
)

In [18]:
def translate(model, sentence, english, german, device, max_length=50):
    if type(sentence) == str:
        tokens = tokenize_de(sentence)
    else:
        tokens = [token.lower() for token in sentence]
    tokens.insert(0, english.init_token)
    tokens.append(english.eos_token)
    text_to_indices = [english.vocab.stoi[token] for token in tokens]
    sentence_tensor = torch.LongTensor(text_to_indices).unsqueeze(1).to(device)

    # Build encoder hidden, cell state
    with torch.no_grad():
        hidden, cell = model.Encoder_LSTM(sentence_tensor)

    outputs = [german.vocab.stoi['<sos>']]

    for _ in range(max_length):
        previous_word = torch.LongTensor([outputs[-1]]).to(device)

        with torch.no_grad():
            output, hidden, cell = model.Decoder_LSTM(previous_word, hidden, cell)
            best_guess = output.argmax(1).item()

        outputs.append(best_guess)

        # Model predicts it's the end of the sentence
        if output.argmax(1).item() == german.vocab.stoi['<eos>']:
            break

    translated_sentence = [german.vocab.itos[idx] for idx in outputs]
    
    return translated_sentence[1:]

In [19]:
def checkpoint_and_save(model, best_loss, epoch, optimizer, epoch_loss):
    print('saving')
    print()
    state = {'model': model, 'best_loss': best_loss, 'epoch': epoch,
             'rng_state': torch.get_rng_state(), 'optimizer': optimizer.state_dict()}
    torch.save(state, './checkpoint-NMT')
    torch.save(model.state_dict(), './checkpoint-NMT-SD')

In [20]:
# 用來評估模型的函式: bleu
def bleu(data, model, english, german, device):
    targets, outputs = [], []
    for example in data:
        src = vars(example)["src"]
        trg = vars(example)["trg"]

        prediction = translate(model, src, english, german, device)
        prediction = prediction[:-1]  # remove <eos> token

        targets.append([trg])
        outputs.append(prediction)

    return bleu_score(outputs, targets)

In [21]:
epoch_loss = 0.0
num_epochs = 30
best_loss = 999999
best_epoch = -1
sentence = 'a man in a blue shirt is standing on a ladder and cleaning a window'
ts = []

for epoch in range(num_epochs):
    print(f"Epoch - {epoch + 1} / {num_epochs}")
    model.eval()
    translated_sentence = translate(model, sentence, english, german, device, max_length=50)
    print(f"Translated example sentence: \n {translated_sentence}")
    ts.append(translated_sentence)

    model.train()
    for batch_idx, batch in enumerate(train_iterator):
        input = batch.src.to(device)
        target = batch.trg.to(device)

        # Pass the input and target for model's forward method
        output = model(input, target)
        output = output[1:].view(-1, output.shape[2])
        target = target[1:].view(-1)

        # Clear the accumulating gradients
        optimizer.zero_grad()

        # Calculate the loss value for every epoch
        loss = criterion(output, target)

        # Calculate the gradients for weights & biases using back-propagation
        loss.backward()

        # Clip the gradient value is it exceeds > 5.0
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)

        # Update the weights values using the gradients we calculated using bp 
        optimizer.step()
        step += 1
        epoch_loss += loss.item()
        writer.add_scalar('Training loss', loss, global_step=step)

    if epoch_loss < best_loss:
        best_loss = epoch_loss
        best_epoch = epoch
        checkpoint_and_save(model, best_loss, epoch, optimizer, epoch_loss) 
        if (epoch - best_epoch >= 10):
            print('no improvement in 10 epochs, break')
            break
    print(f"Epoch_Loss - {loss.item()}")
    print()

print(epoch_loss / len(train_iterator))

Epoch - 1 / 30
Translated example sentence: 
 ['st.', 'short', 'erhebt', 'holzrampe', 'belegten', 'basketballspieler', 'ankunft', 'strahlendem', 'dirtbike', 'produkt', 'sporthallenboden', 'harkt', 'harkt', 'vielen', 'vielen', 'latexhandschuhen', 'breit', 'breit', 'party', 'party', 'seiten', 'seiten', 'party', 'seiten', 'seiten', 'party', 'seiten', 'schwimmende', 'schwimmende', 'kissen', 'kissen', 'kissen', 'beschreibung', 'beschreibung', 'beschreibung', 'straßenkleidung', 'stoßen', 'seiten', 'seiten', 'belegten', 'belegten', 'seiten', 'seiten', 'belegten', 'belegten', 'schürzen', 'grüner', 'auszuziehen', 'schürzen', 'baugerät']
saving

Epoch_Loss - 4.294410705566406

Epoch - 2 / 30
Translated example sentence: 
 ['ein', 'mann', 'in', 'einem', 'einem', 'hemd', 'und', 'einem', 'einem', '<unk>', '.', '<eos>']
Epoch_Loss - 3.9282569885253906

Epoch - 3 / 30
Translated example sentence: 
 ['ein', 'mann', 'in', 'einem', 'einem', 'hemd', 'und', 'einem', 'einem', '<unk>', '.', '<eos>']
Epoch_L