# 프로젝트: 더 멋진 번역기 만들기

## 루브릭

|평가문항|상세기준|
|:--|:--|
|1. 번역기 모델 학습에 필요한 텍스트 데이터 전처리가 잘 이루어졌다.|데이터 정제, SentencePiece를 활용한 토큰화 및 데이터셋 구축의 과정이 지시대로 진행되었다.|
|2. Transformer 번역기 모델이 정상적으로 구동된다.|Transformer 모델의 학습과 추론 과정이 정상적으로 진행되어, 한-영 번역기능이 정상 동작한다.|
|3. 테스트 결과 의미가 통하는 수준의 번역문이 생성되었다.|제시된 문장에 대한 그럴듯한 영어 번역문이 생성되며, 시각화된 Attention Map으로 결과를 뒷받침한다.|

## 라이브러리 버전 확인

```python
import tensorflow
import numpy
import matplotlib

print(tensorflow.__version__)
print(numpy.__version__)
print(matplotlib.__version__)
```

```
2.6.0
1.21.4
3.4.3
```

In [1]:
# import tensorflow
# import numpy
# import matplotlib

# print(tensorflow.__version__)
# print(numpy.__version__)
# print(matplotlib.__version__)

## Matplotlib 한글 깨짐

In [2]:
# !sudo apt-get install -y fonts-nanum
# !sudo fc-cache -fv
# !rm ~/.cache/matplotlib -rf

In [3]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

%config InlineBackend.figure_format = 'retina'
 
fontpath = '/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'
font = fm.FontProperties(fname=fontpath, size=9)
plt.rc('font', family='NanumBarunGothic') 
mpl.font_manager.findfont(font)

'/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'

## Konlpy, Mecab

In [4]:
# !curl -s https://raw.githubusercontent.com/teddylee777/machine-learning/master/99-Misc/01-Colab/mecab-colab.sh | bash

In [5]:
# !pip install sentencepiece

# 필요한 라이브러리 import

In [6]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import sentencepiece as spm
from tqdm import tqdm_notebook

import re
import os
import io
import time
import random

import seaborn as sns # Attention 시각화를 위해 필요!

# Step 1. 데이터 다운로드

In [7]:
path_to_zip = tf.keras.utils.get_file(
    'train.zip',
    origin='https://raw.githubusercontent.com/jungyeul/korean-parallel-corpora/master/korean-english-news-v1/korean-english-park.train.tar.gz',
    extract=True)

path_to_file_ko = os.path.dirname(path_to_zip)+"/korean-english-park.train.ko"
path_to_file_en = os.path.dirname(path_to_zip)+"/korean-english-park.train.en"

Downloading data from https://raw.githubusercontent.com/jungyeul/korean-parallel-corpora/master/korean-english-news-v1/korean-english-park.train.tar.gz


In [8]:
with open(path_to_file_ko, "r") as f:
    raw_ko = f.read().splitlines()

print("Data Size:", len(raw_ko))
print("Example:")

for sen in raw_ko[0:100][::20]: print(">>", sen)

Data Size: 94123
Example:
>> 개인용 컴퓨터 사용의 상당 부분은 "이것보다 뛰어날 수 있느냐?"
>> 북한의 핵무기 계획을 포기하도록 하려는 압력이 거세지고 있는 가운데, 일본과 북한의 외교관들이 외교 관계를 정상화하려는 회담을 재개했다.
>> "경호 로보트가 침입자나 화재를 탐지하기 위해서 개인적으로, 그리고 전문적으로 사용되고 있습니다."
>> 수자원부 당국은 논란이 되고 있고, 막대한 비용이 드는 이 사업에 대해 내년에 건설을 시작할 계획이다.
>> 또한 근력 운동은 활발하게 걷는 것이나 최소한 20분 동안 뛰는 것과 같은 유산소 활동에서 얻는 운동 효과를 심장과 폐에 주지 않기 때문에, 연구학자들은 근력 운동이 심장에 큰 영향을 미치는지 여부에 대해 논쟁을 해왔다.


In [9]:
with open(path_to_file_en, "r") as f:
    raw_en = f.read().splitlines()

print("Data Size:", len(raw_en))
print("Example:")

for sen in raw_en[0:100][::20]: print(">>", sen)

Data Size: 94123
Example:
>> Much of personal computing is about "can you top this?"
>> Amid mounting pressure on North Korea to abandon its nuclear weapons program Japanese and North Korean diplomats have resumed talks on normalizing diplomatic relations.
>> “Guard robots are used privately and professionally to detect intruders or fire,” Karlsson said.
>> Authorities from the Water Resources Ministry plan to begin construction next year on the controversial and hugely expensive project.
>> Researchers also have debated whether weight-training has a big impact on the heart, since it does not give the heart and lungs the kind of workout they get from aerobic activities such as brisk walking or running for at least 20 minutes.


# Step 2. 데이터 정제 및 토큰화

In [10]:
# 중복 제거 코퍼스
def clean_corpus(kor_path, eng_path):
    with open(kor_path, "r") as f: kor = f.read().splitlines()
    with open(eng_path, "r") as f: eng = f.read().splitlines()
    assert len(kor) == len(eng)

    cleaned_corpus = set(list(zip(raw_ko, raw_en)))

    return cleaned_corpus

cleaned_corpus = clean_corpus(path_to_file_ko, path_to_file_en)

In [11]:
print(len(cleaned_corpus))

78968


In [12]:
# 한국어 데이터 전처리
def preprocess_sentence(sentence, s_token=False, e_token=False):
    sentence = sentence.lower().strip()  # 모든 입력을 소문자로 변환합니다.
    sentence = re.sub(r"[^a-zA-Zㄱ-ㅎㅏ-ㅣ가-힣?.!,]+", " ", sentence)  # 알파벳, 문장부호, 한글만 남기고 모두 제거합니다.
    sentence = re.sub(r"([?.!,])", r" \1 ", sentence)  # 문장부호 양옆에 공백을 추가합니다.
    sentence = re.sub(r'[" "]+', " ", sentence)  # 여러개의 공백 한개로
    sentence = sentence.strip()  # 문장 앞뒤의 불필요한 공백을 제거합니다.
    
    return sentence

In [15]:
# 토큰화
def generate_tokenizer(corpus,
                       vocab_size,
                       model_type='unigram',
                       lang="ko",
                       pad_id=0,
                       bos_id=1,
                       eos_id=2,
                       unk_id=3):   
    
    input_file = f'{lang}_spm_input.txt'

    with open(input_file, 'w', encoding='utf-8') as f:
        for row in corpus: f.write('{}\n'.format(row))
    
    sp_model_root ='sentencepiece'
    if not os.path.isdir(sp_model_root): os.mkdir(sp_model_root)

    prefix = 'tokenizer_%s_%s' % (lang, model_type+str(vocab_size))
    prefix = os.path.join(sp_model_root, prefix)
    character_coverage = 1.0
    input_argument = '--input=%s --pad_id=%s --bos_id=%s --eos_id=%s --unk_id=%s --model_prefix=%s --vocab_size=%s --character_coverage=%s --model_type=%s'
    cmd = input_argument%(input_file, pad_id, bos_id, eos_id, unk_id, prefix, vocab_size, character_coverage, model_type)

    spm.SentencePieceTrainer.Train(cmd)
    tokenizer = spm.SentencePieceProcessor()
    tokenizer.Load(f'{prefix}.model')

    return tokenizer

In [16]:
SRC_VOCAB_SIZE = TGT_VOCAB_SIZE = 20000

eng_corpus = []
kor_corpus = []

for pair in cleaned_corpus:
    k, e = pair[0],pair[1]

    kor_corpus.append(preprocess_sentence(k))
    eng_corpus.append(preprocess_sentence(e))

ko_tokenizer = generate_tokenizer(kor_corpus, SRC_VOCAB_SIZE, lang="ko")
en_tokenizer = generate_tokenizer(eng_corpus, TGT_VOCAB_SIZE, lang="en")
en_tokenizer.set_encode_extra_options("bos:eos")

True

In [17]:
# 토큰의 길이가 50 이하인 데이터를 선별
from tqdm.notebook import tqdm  # Process 과정을 보기 위해

src_corpus = []
tgt_corpus = []

assert len(kor_corpus) == len(eng_corpus)

# 토큰의 길이가 50 이하인 문장만 남깁니다. 
for idx in tqdm(range(len(kor_corpus))):
    src = ko_tokenizer.EncodeAsIds(kor_corpus[idx])
    tgt = en_tokenizer.EncodeAsIds(eng_corpus[idx])
    
    if len(src) <= 50 and len(tgt) <= 50 :
        src_corpus.append(src)
        tgt_corpus.append(tgt)

# 패딩처리를 완료하여 학습용 데이터를 완성합니다.
enc_train = tf.keras.preprocessing.sequence.pad_sequences(src_corpus, padding='post')
dec_train = tf.keras.preprocessing.sequence.pad_sequences(tgt_corpus, padding='post')

  0%|          | 0/78968 [00:00<?, ?it/s]

# Step 3. 모델 설계

In [18]:
def positional_encoding(pos, d_model):
    def cal_angle(position, i):
        return position / np.power(10000, int(i) / d_model)

    def get_posi_angle_vec(position):
        return [cal_angle(position, i) for i in range(d_model)]

    sinusoid_table = np.array([get_posi_angle_vec(pos_i) for pos_i in range(pos)])
    sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2])
    sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2])
    
    return sinusoid_table

In [19]:
class MultiHeadAttention(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        self.num_heads = num_heads
        self.d_model = d_model
            
        self.depth = d_model // self.num_heads
            
        self.W_q = tf.keras.layers.Dense(d_model)
        self.W_k = tf.keras.layers.Dense(d_model)
        self.W_v = tf.keras.layers.Dense(d_model)
            
        self.linear = tf.keras.layers.Dense(d_model)

    def scaled_dot_product_attention(self, Q, K, V, mask):
        d_k = tf.cast(K.shape[-1], tf.float32)
        QK = tf.matmul(Q, K, transpose_b=True)

        scaled_qk = QK / tf.math.sqrt(d_k)

        if mask is not None: scaled_qk += (mask * -1e9)  

        attentions = tf.nn.softmax(scaled_qk, axis=-1)
        out = tf.matmul(attentions, V)

        return out, attentions
            

    def split_heads(self, x):
        batch_size = x.shape[0]
        split_x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))
        split_x = tf.transpose(split_x, perm=[0, 2, 1, 3])

        return split_x

    def combine_heads(self, x):
        batch_size = x.shape[0]
        combined_x = tf.transpose(x, perm=[0, 2, 1, 3])
        combined_x = tf.reshape(combined_x, (batch_size, -1, self.d_model))

        return combined_x

        
    def call(self, Q, K, V, mask):
        WQ = self.W_q(Q)
        WK = self.W_k(K)
        WV = self.W_v(V)
        
        WQ_splits = self.split_heads(WQ)
        WK_splits = self.split_heads(WK)
        WV_splits = self.split_heads(WV)
            
        out, attention_weights = self.scaled_dot_product_attention(
            WQ_splits, WK_splits, WV_splits, mask)
    				        
        out = self.combine_heads(out)
        out = self.linear(out)
                
        return out, attention_weights

In [20]:
class PoswiseFeedForwardNet(tf.keras.layers.Layer):
    def __init__(self, d_model, d_ff):
        super(PoswiseFeedForwardNet, self).__init__()
        self.w_1 = tf.keras.layers.Dense(d_ff, activation='relu')
        self.w_2 = tf.keras.layers.Dense(d_model)

    def call(self, x):
        out = self.w_1(x)
        out = self.w_2(out)
            
        return out

In [21]:
class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, n_heads, d_ff, dropout):
        super(EncoderLayer, self).__init__()

        self.enc_self_attn = MultiHeadAttention(d_model, n_heads)
        self.ffn = PoswiseFeedForwardNet(d_model, d_ff)

        self.norm_1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm_2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        self.dropout = tf.keras.layers.Dropout(dropout)
        
    def call(self, x, mask):

        """
        Multi-Head Attention
        """
        residual = x
        out = self.norm_1(x)
        out, enc_attn = self.enc_self_attn(out, out, out, mask)
        out = self.dropout(out)
        out += residual
        
        """
        Position-Wise Feed Forward Network
        """
        residual = out
        out = self.norm_2(out)
        out = self.ffn(out)
        out = self.dropout(out)
        out += residual
        
        return out, enc_attn

In [22]:
class DecoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super(DecoderLayer, self).__init__()

        self.dec_self_attn = MultiHeadAttention(d_model, num_heads)
        self.enc_dec_attn = MultiHeadAttention(d_model, num_heads)

        self.ffn = PoswiseFeedForwardNet(d_model, d_ff)

        self.norm_1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm_2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm_3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        self.dropout = tf.keras.layers.Dropout(dropout)
    
    def call(self, x, enc_out, causality_mask, padding_mask):

        """
        Masked Multi-Head Attention
        """
        residual = x
        out = self.norm_1(x)
        out, dec_attn = self.dec_self_attn(out, out, out, padding_mask)
        out = self.dropout(out)
        out += residual

        """
        Multi-Head Attention
        """
        residual = out
        out = self.norm_2(out)
        out, dec_enc_attn = self.enc_dec_attn(out, enc_out, enc_out, causality_mask)
        out = self.dropout(out)
        out += residual
        
        """
        Position-Wise Feed Forward Network
        """
        residual = out
        out = self.norm_3(out)
        out = self.ffn(out)
        out = self.dropout(out)
        out += residual

        return out, dec_attn, dec_enc_attn

In [23]:
class Encoder(tf.keras.Model):
    def __init__(self,
                 n_layers,
                 d_model,
                 n_heads,
                 d_ff,
                 dropout):
        super(Encoder, self).__init__()
        self.n_layers = n_layers
        self.enc_layers = [EncoderLayer(d_model, n_heads, d_ff, dropout) 
                        for _ in range(n_layers)]
        
    def call(self, x, mask):
        out = x
    
        enc_attns = list()
        for i in range(self.n_layers):
            out, enc_attn = self.enc_layers[i](out, mask)
            enc_attns.append(enc_attn)
        
        return out, enc_attns

In [24]:
class Decoder(tf.keras.Model):
    def __init__(self,
                 n_layers,
                 d_model,
                 n_heads,
                 d_ff,
                 dropout):
        super(Decoder, self).__init__()
        self.n_layers = n_layers
        self.dec_layers = [DecoderLayer(d_model, n_heads, d_ff, dropout) 
                            for _ in range(n_layers)]
                            
                            
    def call(self, x, enc_out, causality_mask, padding_mask):
        out = x
    
        dec_attns = list()
        dec_enc_attns = list()
        for i in range(self.n_layers):
            out, dec_attn, dec_enc_attn = \
            self.dec_layers[i](out, enc_out, causality_mask, padding_mask)

            dec_attns.append(dec_attn)
            dec_enc_attns.append(dec_enc_attn)

        return out, dec_attns, dec_enc_attns

In [25]:
class Transformer(tf.keras.Model):
    def __init__(self,
                    n_layers,
                    d_model,
                    n_heads,
                    d_ff,
                    src_vocab_size,
                    tgt_vocab_size,
                    pos_len,
                    dropout=0.2,
                    shared=True):
        super(Transformer, self).__init__()
        self.d_model = tf.cast(d_model, tf.float32)

        self.enc_emb = tf.keras.layers.Embedding(src_vocab_size, d_model)
        self.dec_emb = tf.keras.layers.Embedding(tgt_vocab_size, d_model)

        self.pos_encoding = positional_encoding(pos_len, d_model)
        self.dropout = tf.keras.layers.Dropout(dropout)

        self.encoder = Encoder(n_layers, d_model, n_heads, d_ff, dropout)
        self.decoder = Decoder(n_layers, d_model, n_heads, d_ff, dropout)

        self.fc = tf.keras.layers.Dense(tgt_vocab_size)

        self.shared = shared

        if shared: self.fc.set_weights(tf.transpose(self.dec_emb.weights))

    def embedding(self, emb, x):
        seq_len = x.shape[1]
        out = emb(x)

        if self.shared: out *= tf.math.sqrt(self.d_model)

        out += self.pos_encoding[np.newaxis, ...][:, :seq_len, :]
        out = self.dropout(out)

        return out

        
    def call(self, enc_in, dec_in, enc_mask, causality_mask, dec_mask):
        enc_in = self.embedding(self.enc_emb, enc_in)
        dec_in = self.embedding(self.dec_emb, dec_in)

        enc_out, enc_attns = self.encoder(enc_in, enc_mask)
        
        dec_out, dec_attns, dec_enc_attns = \
        self.decoder(dec_in, enc_out, causality_mask, dec_mask)
        
        logits = self.fc(dec_out)
        
        return logits, enc_attns, dec_attns, dec_enc_attns

In [26]:
def generate_padding_mask(seq):
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    return seq[:, tf.newaxis, tf.newaxis, :]

def generate_causality_mask(src_len, tgt_len):
    mask = 1 - np.cumsum(np.eye(src_len, tgt_len), 0)
    return tf.cast(mask, tf.float32)

def generate_masks(src, tgt):
    enc_mask = generate_padding_mask(src)
    dec_mask = generate_padding_mask(tgt)

    dec_enc_causality_mask = generate_causality_mask(tgt.shape[1], src.shape[1])
    dec_enc_mask = tf.maximum(enc_mask, dec_enc_causality_mask)

    dec_causality_mask = generate_causality_mask(tgt.shape[1], tgt.shape[1])
    dec_mask = tf.maximum(dec_mask, dec_causality_mask)

    return enc_mask, dec_enc_mask, dec_mask

# Step 4. 훈련하기

In [27]:
transformer = Transformer(n_layers=2,
                          d_model=512,
                          n_heads=8,
                          d_ff=2048,
                          dropout=0.2,
                          src_vocab_size=SRC_VOCAB_SIZE,
                          tgt_vocab_size=TGT_VOCAB_SIZE,
                          pos_len=200,
                          shared=True)

## Learning Rate & Optimizer

In [28]:
class LearningRateScheduler(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super(LearningRateScheduler, self).__init__()
        self.d_model = d_model
        self.warmup_steps = warmup_steps
    
    def __call__(self, step):
        arg1 = step ** -0.5
        arg2 = step * (self.warmup_steps ** -1.5)
        
        return (self.d_model ** -0.5) * tf.math.minimum(arg1, arg2)

learning_rate = LearningRateScheduler(512)
optimizer = tf.keras.optimizers.Adam(learning_rate,
                                     beta_1=0.9,
                                     beta_2=0.98, 
                                     epsilon=1e-9)

## Loss Function

In [29]:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

def loss_function(real, pred):
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)

    # Masking 되지 않은 입력의 개수로 Scaling하는 과정
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask

    return tf.reduce_sum(loss_)/tf.reduce_sum(mask)

## train_step 함수 정의

In [30]:
@tf.function()
def train_step(src, tgt, model, optimizer):
    gold = tgt[:, 1:]
        
    enc_mask, dec_enc_mask, dec_mask = generate_masks(src, tgt)

    # 계산된 loss에 tf.GradientTape()를 적용해 학습을 진행합니다.
    with tf.GradientTape() as tape:
        predictions, enc_attns, dec_attns, dec_enc_attns = \
        model(src, tgt, enc_mask, dec_enc_mask, dec_mask)
        loss = loss_function(gold, predictions[:, :-1])

    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    return loss, enc_attns, dec_attns, dec_enc_attns

In [31]:
# Attention 시각화 함수
def visualize_attention(src, tgt, enc_attns, dec_attns, dec_enc_attns):
    def draw(data, ax, x="auto", y="auto"):
        import seaborn
        seaborn.heatmap(data, 
                        square=True,
                        vmin=0.0, vmax=1.0, 
                        cbar=False, ax=ax,
                        xticklabels=x,
                        yticklabels=y)
        
    for layer in range(0, 2, 1):
        fig, axs = plt.subplots(1, 4, figsize=(20, 10))
        print("Encoder Layer", layer + 1)
        for h in range(4):
            draw(enc_attns[layer][0, h, :len(src), :len(src)], axs[h], src, src)
        plt.show()
        
    for layer in range(0, 2, 1):
        fig, axs = plt.subplots(1, 4, figsize=(20, 10))
        print("Decoder Self Layer", layer+1)
        for h in range(4):
            draw(dec_attns[layer][0, h, :len(tgt), :len(tgt)], axs[h], tgt, tgt)
        plt.show()

        print("Decoder Src Layer", layer+1)
        fig, axs = plt.subplots(1, 4, figsize=(20, 10))
        for h in range(4):
            draw(dec_enc_attns[layer][0, h, :len(tgt), :len(src)], axs[h], src, tgt)
        plt.show()

In [32]:
# 번역 생성 함수
def evaluate(sentence, model, src_tokenizer, tgt_tokenizer):
    sentence = preprocess_sentence(sentence)

    pieces = src_tokenizer.encode_as_pieces(sentence)
    tokens = src_tokenizer.encode_as_ids(sentence)

    _input = tf.keras.preprocessing.sequence.pad_sequences([tokens],
                                                           maxlen=enc_train.shape[-1],
                                                           padding='post')
    
    ids = []
    output = tf.expand_dims([tgt_tokenizer.bos_id()], 0)
    for i in range(dec_train.shape[-1]):
        enc_padding_mask, combined_mask, dec_padding_mask = \
        generate_masks(_input, output)

        predictions, enc_attns, dec_attns, dec_enc_attns =\
        model(_input, 
              output,
              enc_padding_mask,
              combined_mask,
              dec_padding_mask)

        predicted_id = \
        tf.argmax(tf.math.softmax(predictions, axis=-1)[0, -1]).numpy().item()

        if tgt_tokenizer.eos_id() == predicted_id:
            result = tgt_tokenizer.decode_ids(ids)
            return pieces, result, enc_attns, dec_attns, dec_enc_attns

        ids.append(predicted_id)
        output = tf.concat([output, tf.expand_dims([predicted_id], 0)], axis=-1)

    result = tgt_tokenizer.decode_ids(ids)

    return pieces, result, enc_attns, dec_attns, dec_enc_attns

In [33]:
# 번역 생성 및 Attention 시각화 결합

def translate(sentence, model, src_tokenizer, tgt_tokenizer, plot_attention=False):
    pieces, result, enc_attns, dec_attns, dec_enc_attns = \
    evaluate(sentence, model, src_tokenizer, tgt_tokenizer)
    
    print('Input: %s' % (sentence))
    print('Predicted translation: {}'.format(result))

    if plot_attention:
        visualize_attention(pieces, result.split(), enc_attns, dec_attns, dec_enc_attns)

In [34]:
# 학습
BATCH_SIZE = 64
EPOCHS = 20

examples = [
            "오바마는 대통령이다.",
            "시민들은 도시 속에 산다.",
            "커피는 필요 없다.",
            "일곱 명의 사망자가 발생했다."
]

for epoch in range(EPOCHS):
    total_loss = 0
    
    idx_list = list(range(0, enc_train.shape[0], BATCH_SIZE))
    random.shuffle(idx_list)
    t = tqdm_notebook(idx_list)

    for (batch, idx) in enumerate(t):
        batch_loss, enc_attns, dec_attns, dec_enc_attns = \
        train_step(enc_train[idx:idx+BATCH_SIZE],
                    dec_train[idx:idx+BATCH_SIZE],
                    transformer,
                    optimizer)

        total_loss += batch_loss
        
        t.set_description_str('Epoch %2d' % (epoch + 1))
        t.set_postfix_str('Loss %.4f' % (total_loss.numpy() / (batch + 1)))

    for example in examples:
        translate(example, transformer, ko_tokenizer, en_tokenizer)

Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the first time .
Input: 시민들은 도시 속에 산다.
Predicted translation: the government is a small .
Input: 커피는 필요 없다.
Predicted translation: it is not a good .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: the death toll were killed .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is a president of obama s president .
Input: 시민들은 도시 속에 산다.
Predicted translation: the building is a city of the city .
Input: 커피는 필요 없다.
Predicted translation: coffee is a coffee .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: the death toll in the death toll in the town of died in the town of .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: the white house is a good thing .
Input: 시민들은 도시 속에 산다.
Predicted translation: the mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain mountain
Input: 커피는 필요 없다.
Predicted translation: no longer must take anyway .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: the death toll in the second day of the death toll .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the most honor .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens are among the mountainous city .
Input: 커피는 필요 없다.
Predicted translation: it is no coffee .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven deaths were killed .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the president .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens are climbing in the city .
Input: 커피는 필요 없다.
Predicted translation: it is a very emotional battle .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven of them died wednesday .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: it s the same .
Input: 시민들은 도시 속에 산다.
Predicted translation: the city is about to be in the city .
Input: 커피는 필요 없다.
Predicted translation: the need is a life . however .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven people died of the seventh high ranking government .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: the president is obama .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens are cities in the city .
Input: 커피는 필요 없다.
Predicted translation: it needs to preserve the need .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven people were killed , the seventh fatality on sunday .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the next presidential campaign .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens have faced in cities
Input: 커피는 필요 없다.
Predicted translation: the needs to calm .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven seven other seven highs .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is built with the president .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens are staying in the city .
Input: 커피는 필요 없다.
Predicted translation: no one was speaking .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven other people are dead , according to seven deaths .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the president .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens were in cities
Input: 커피는 필요 없다.
Predicted translation: there need no carrier .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven people seven seven killed .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the president .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens are in everybody .
Input: 커피는 필요 없다.
Predicted translation: no coffee needs .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven other people were killed .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is retired .
Input: 시민들은 도시 속에 산다.
Predicted translation: they san lunch is one of the mountain welcome
Input: 커피는 필요 없다.
Predicted translation: there need no bombers . there don t need .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven seven other deaths were .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the president .
Input: 시민들은 도시 속에 산다.
Predicted translation: the town is san francisco .
Input: 커피는 필요 없다.
Predicted translation: no coffee is should shouldn t return .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven seven other high deaths .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the president .
Input: 시민들은 도시 속에 산다.
Predicted translation: they are in a visibility .
Input: 커피는 필요 없다.
Predicted translation: the coffee is no choice for the coffee at the coffee .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven seventeen seven people are dead , police said .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the mccain campaign for president .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens , cities in the urban t vi easy
Input: 커피는 필요 없다.
Predicted translation: it should no coffee .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven of the seven deaths .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the mccain campaign says .
Input: 시민들은 도시 속에 산다.
Predicted translation: the city is a new jersey mountain .
Input: 커피는 필요 없다.
Predicted translation: the coffee is should need a list of coffee
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven people seven high courts are said .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the mccains to visit .
Input: 시민들은 도시 속에 산다.
Predicted translation: they stay in the mountain .
Input: 커피는 필요 없다.
Predicted translation: the coffee needs to require the coffee .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven people seven public fires are still missing .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the president s mccain company .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens are in the house .
Input: 커피는 필요 없다.
Predicted translation: no coffee is in everyday los angeles .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven of the dead were still missing .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the mccain campaign .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens are staying in the city .
Input: 커피는 필요 없다.
Predicted translation: there is no need .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven people seven out of seven others are dead .


  0%|          | 0/1127 [00:00<?, ?it/s]

Input: 오바마는 대통령이다.
Predicted translation: obama is the next president .
Input: 시민들은 도시 속에 산다.
Predicted translation: citizens . cities
Input: 커피는 필요 없다.
Predicted translation: the coffee is need .
Input: 일곱 명의 사망자가 발생했다.
Predicted translation: seven people seven public opinion put the number .


# 회고

- 이전에 seq2seq 때보단 빠르게 성능 좋게 나름 번역을 하는 것 같았다.
- 하지만 아직 좀 더 모델의 성능을 끌어올릴 방법을 생각해보고 추후에 조정하여 결과를 비교해 봐야겠다.
- 트랜스포머를 마스터 해보자.