## 프로젝트 - 멋진 챗봇 만들기
### Transformer Chatbot
***
#### Step 1. 데이터 다운로드

#### Step 2. 데이터 정제

#### Step 3. 데이터 토큰화

#### Step 4. Augmentation

#### Step 5. 데이터 벡터화

#### Step 6. 훈련하기

#### Step 7. 성능 측정하기
***
### 평가문항
**1. 챗봇 훈련데이터 전처리 과정이 체계적으로 진행되었는가?**  
-챗봇 훈련데이터를 위한 전처리와 augmentation이 적절히 수행되어 3만개 가량의 훈련데이터셋이 구축되었다.  

**2. transformer 모델을 활용한 챗봇 모델이 과적합을 피해 안정적으로 훈련되었는가?**  
-과적합을 피할 수 있는 하이퍼파라미터 셋이 적절히 제시되었다.  

**3. 챗봇이 사용자의 질문에 그럴듯한 형태로 답하는 사례가 있는가?**  
-주어진 예문을 포함하여 챗봇에 던진 질문에 적절히 답하는 사례가 제출되었다.  
***
#### Step 1. 데이터 다운로드
***

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

import re
import os
import io
import time
import random

from konlpy.tag import Mecab
mecab = Mecab()

import gensim
from collections import Counter

from sklearn.model_selection import train_test_split

In [3]:
data = pd.read_csv('/home/aiffel/aiffel/transformer_chatbot/ChatbotData .csv')
data.head()

Unnamed: 0,Q,A,label
0,12시 땡!,하루가 또 가네요.,0
1,1지망 학교 떨어졌어,위로해 드립니다.,0
2,3박4일 놀러가고 싶다,여행은 언제나 좋죠.,0
3,3박4일 정도 놀러가고 싶다,여행은 언제나 좋죠.,0
4,PPL 심하네,눈살이 찌푸려지죠.,0


In [4]:
questions = data['Q']
answers = data['A']

In [5]:
print(questions.head())

0             12시 땡!
1        1지망 학교 떨어졌어
2       3박4일 놀러가고 싶다
3    3박4일 정도 놀러가고 싶다
4            PPL 심하네
Name: Q, dtype: object


In [52]:
print(answers.head())

0     하루가 또 가네요.
1      위로해 드립니다.
2    여행은 언제나 좋죠.
3    여행은 언제나 좋죠.
4     눈살이 찌푸려지죠.
Name: A, dtype: object


#### Step 2. 데이터 정제

In [7]:
def preprocess_sentence(sentence):
    if sentence.isalpha():
        sentence = sentence.lower()

    sentence = re.sub(r"([?.!,¿¡])", r" \1 ", sentence)
    sentence = re.sub(r'[" "]+', " ", sentence)
    sentence = re.sub(r"[^ㄱ-ㅎ|가-힣|a-zA-Z|0-9|?!,¿¡]+", " ", sentence)

    sentence = sentence.strip()
    
    return sentence

#### Step 3. 데이터 토큰화

In [8]:
def build_corpus(src, tgt):
    que_corpus = []
    ans_corpus = []
    
    for idx, sentence in enumerate(src):
        src_preprocessed = preprocess_sentence(sentence)
        tgt_preprocessed = preprocess_sentence(tgt[idx])
        
        if src_preprocessed in que_corpus or tgt_preprocessed in ans_corpus:continue
        else:
            src_tokenized = mecab.morphs(src_preprocessed)
            tgt_tokenized = mecab.morphs(tgt_preprocessed)
            
            if len(src_tokenized) <= 20 and len(tgt_tokenized) <= 20:
                que_corpus.append(src_tokenized)
                ans_corpus.append(tgt_tokenized)
            else:continue
    
    return que_corpus, ans_corpus

In [9]:
que_corpus, ans_corpus = build_corpus(questions, answers)

In [10]:
que_corpus[:5]

[['12', '시', '땡', '!'],
 ['1', '지망', '학교', '떨어졌', '어'],
 ['3', '박', '4', '일', '놀', '러', '가', '고', '싶', '다'],
 ['3', '박', '4', '일', '정도', '놀', '러', '가', '고', '싶', '다'],
 ['PPL', '심하', '네']]

In [11]:
ans_corpus[:5]

[['하루', '가', '또', '가', '네요'],
 ['위로', '해', '드립니다'],
 ['여행', '은', '언제나', '좋', '죠'],
 ['여행', '은', '언제나', '좋', '죠'],
 ['눈살', '이', '찌푸려', '지', '죠']]

In [12]:
print(len(que_corpus))
print(len(ans_corpus))

11713
11713


In [14]:
wv = gensim.models.Word2Vec.load('/home/aiffel/aiffel/transformer_chatbot/ko.bin')

#### Step 4. Augmentation

In [15]:
def lexical_sub(sentence, word2vec):
    import random

    res = ""
    toks = sentence

    try:
        _from = random.choice(toks)
        _to = word2vec.most_similar(_from)[0][0]

    except:   # 단어장에 없는 단어
        return None

    for tok in toks:
        if tok is _from: res += _to + " "
        else: res += tok + " "

    return res

In [16]:
lexical_sub(que_corpus[0], wv)

  if __name__ == '__main__':


'12 시가 땡 ! '

In [17]:
from tqdm import tqdm_notebook

new_que_corpus = []
new_ans_corpus = []

for idx in tqdm_notebook(range(len(que_corpus))):
    que_augmented = lexical_sub(que_corpus[idx], wv)
    ans = ans_corpus[idx]
    
    if que_augmented is not None:
        new_que_corpus.append(que_augmented.split())
        new_ans_corpus.append(ans)
    else:continue
    
for idx in tqdm_notebook(range(len(ans_corpus))):
    que = que_corpus[idx]
    ans_augmented = lexical_sub(ans_corpus[idx], wv)
    
    if ans_augmented is not None:
        new_que_corpus.append(que)
        new_ans_corpus.append(ans_augmented.split())
    else:continue

Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  


  0%|          | 0/11713 [00:00<?, ?it/s]

  if __name__ == '__main__':
Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  from ipykernel import kernelapp as app


  0%|          | 0/11713 [00:00<?, ?it/s]

In [18]:
print(len(new_que_corpus))
print(len(new_ans_corpus))

19948
19948


In [19]:
new_que_corpus[:5]

[['12', '시가', '땡', '!'],
 ['3', '김', '4', '일', '놀', '러', '가', '고', '싶', '다'],
 ['3', '김', '4', '일', '정도', '놀', '러', '가', '고', '싶', '다'],
 ['PPL', '강하', '네'],
 ['SD', '카드', '망가졌', '어서']]

In [20]:
new_ans_corpus[:5]

[['하루', '가', '또', '가', '네요'],
 ['여행', '은', '언제나', '좋', '죠'],
 ['여행', '은', '언제나', '좋', '죠'],
 ['눈살', '이', '찌푸려', '지', '죠'],
 ['다시', '새로', '사', '는', '게', '마음', '편', '해요']]

#### Step 5. 데이터 벡터화

In [21]:
temp = []

for corpus in ans_corpus:
    temp.append(["<start>"] + corpus + ["<end>"])

In [22]:
ans_corpus = temp

In [23]:
ans_corpus[:5]

[['<start>', '하루', '가', '또', '가', '네요', '<end>'],
 ['<start>', '위로', '해', '드립니다', '<end>'],
 ['<start>', '여행', '은', '언제나', '좋', '죠', '<end>'],
 ['<start>', '여행', '은', '언제나', '좋', '죠', '<end>'],
 ['<start>', '눈살', '이', '찌푸려', '지', '죠', '<end>']]

In [24]:
total_data = que_corpus + ans_corpus
len(total_data)

23426

In [25]:
words = np.concatenate(total_data).tolist()
counter = Counter(words)
counter = counter.most_common(30000-2)
vocab = ['<pad>', '<unk>'] + [key for key, _ in counter]
word_to_index = {word:index for index, word in enumerate(vocab)}
index_to_word = {index:word for word, index in word_to_index.items()}

In [26]:
word_to_index

{'<pad>': 0,
 '<unk>': 1,
 '<start>': 2,
 '<end>': 3,
 '이': 4,
 '하': 5,
 '는': 6,
 '을': 7,
 '세요': 8,
 '가': 9,
 '어': 10,
 '고': 11,
 '좋': 12,
 '거': 13,
 '해': 14,
 '있': 15,
 '보': 16,
 '은': 17,
 '지': 18,
 '?': 19,
 '나': 20,
 '아': 21,
 '도': 22,
 '게': 23,
 '에': 24,
 '겠': 25,
 '예요': 26,
 '사람': 27,
 '어요': 28,
 '다': 29,
 '를': 30,
 '한': 31,
 '같': 32,
 '죠': 33,
 '사랑': 34,
 '네요': 35,
 '싶': 36,
 '면': 37,
 '수': 38,
 '안': 39,
 '네': 40,
 '없': 41,
 '생각': 42,
 '친구': 43,
 '것': 44,
 '의': 45,
 '잘': 46,
 '아요': 47,
 '봐요': 48,
 '말': 49,
 '할': 50,
 '는데': 51,
 '않': 52,
 '마음': 53,
 '너무': 54,
 '되': 55,
 '주': 56,
 '했': 57,
 '만': 58,
 '일': 59,
 '기': 60,
 '더': 61,
 '이별': 62,
 '었': 63,
 '내': 64,
 '들': 65,
 '연락': 66,
 '여자': 67,
 '남자': 68,
 '힘들': 69,
 '해요': 70,
 '시간': 71,
 '많이': 72,
 '길': 73,
 '으면': 74,
 '요': 75,
 '먹': 76,
 '좀': 77,
 '남': 78,
 '에요': 79,
 '에서': 80,
 '으로': 81,
 '한테': 82,
 '썸': 83,
 '때': 84,
 '!': 85,
 '았': 86,
 '많': 87,
 '야': 88,
 '짝': 89,
 '저': 90,
 '받': 91,
 '건': 92,
 '뭐': 93,
 '오늘': 94,
 '만나': 95,
 '로'

In [27]:
def get_encoded_sentence(sentence, word_to_index):
    return [word_to_index[word] if word in word_to_index else word_to_index['<unk>'] for word in sentence]

In [28]:
def get_decoded_sentence(encoded_sentence, index_to_word):
    return ' '.join(index_to_word[index] if index in index_to_word else '<unk>' for index in encoded_sentence[1:])

In [29]:
def vectorize(corpus, word_to_index):
    data = []
    for sen in corpus:
        sen = get_encoded_sentence(sen, word_to_index)
        data.append(sen)
    return data

In [30]:
que_train = vectorize(que_corpus, word_to_index)
ans_train = vectorize(ans_corpus, word_to_index)

In [31]:
enc_train = keras.preprocessing.sequence.pad_sequences(que_train, padding='pre', maxlen=20)
dec_train = keras.preprocessing.sequence.pad_sequences(ans_train, padding='pre', maxlen=20)

In [32]:
enc_train[0]

array([   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0, 2322,  175, 4672,   85], dtype=int32)

In [33]:
dec_train[0]

array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         2, 250,   9, 143,   9,  35,   3], dtype=int32)

#### Step 6. 훈련하기

In [34]:
def positional_encoding(pos, d_model):
    def cal_angle(position, i):
        return position / np.power(10000, int(i) / d_model)

    def get_posi_angle_vec(position):
        return [cal_angle(position, i) for i in range(d_model)]

    sinusoid_table = np.array([get_posi_angle_vec(pos_i) for pos_i in range(pos)])

    sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2])
    sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2])

    return sinusoid_table

In [35]:
def generate_padding_mask(seq):
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    return seq[:, tf.newaxis, tf.newaxis, :]

def generate_causality_mask(src_len, tgt_len):
    mask = 1 - np.cumsum(np.eye(src_len, tgt_len), 0)
    return tf.cast(mask, tf.float32)

def generate_masks(src, tgt):
    enc_mask = generate_padding_mask(src)
    dec_mask = generate_padding_mask(tgt)

    dec_causality_mask = generate_causality_mask(tgt.shape[1], tgt.shape[1])
    dec_mask = tf.maximum(dec_mask, dec_causality_mask)

    dec_enc_causality_mask = generate_causality_mask(tgt.shape[1], src.shape[1])
    dec_enc_mask = tf.maximum(enc_mask, dec_enc_causality_mask)

    return enc_mask, dec_enc_mask, dec_mask

In [36]:
class MultiHeadAttention(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        self.num_heads = num_heads
        self.d_model = d_model

        self.depth = d_model // self.num_heads

        self.W_q = tf.keras.layers.Dense(d_model)
        self.W_k = tf.keras.layers.Dense(d_model)
        self.W_v = tf.keras.layers.Dense(d_model)

        self.linear = tf.keras.layers.Dense(d_model)

    def scaled_dot_product_attention(self, Q, K, V, mask):
        d_k = tf.cast(K.shape[-1], tf.float32)
        QK = tf.matmul(Q, K, transpose_b=True)

        scaled_qk = QK / tf.math.sqrt(d_k)

        if mask is not None: scaled_qk += (mask * -1e9)  

        attentions = tf.nn.softmax(scaled_qk, axis=-1)
        out = tf.matmul(attentions, V)

        return out, attentions


    def split_heads(self, x):
        bsz = x.shape[0]
        split_x = tf.reshape(x, (bsz, -1, self.num_heads, self.depth))
        split_x = tf.transpose(split_x, perm=[0, 2, 1, 3])

        return split_x

    def combine_heads(self, x):
        bsz = x.shape[0]
        combined_x = tf.transpose(x, perm=[0, 2, 1, 3])
        combined_x = tf.reshape(combined_x, (bsz, -1, self.d_model))

        return combined_x


    def call(self, Q, K, V, mask):
        WQ = self.W_q(Q)
        WK = self.W_k(K)
        WV = self.W_v(V)

        WQ_splits = self.split_heads(WQ)
        WK_splits = self.split_heads(WK)
        WV_splits = self.split_heads(WV)

        out, attention_weights = self.scaled_dot_product_attention(
            WQ_splits, WK_splits, WV_splits, mask)

        out = self.combine_heads(out)
        out = self.linear(out)

        return out, attention_weights

In [37]:
class PoswiseFeedForwardNet(tf.keras.layers.Layer):
    def __init__(self, d_model, d_ff):
        super(PoswiseFeedForwardNet, self).__init__()
        self.d_model = d_model
        self.d_ff = d_ff

        self.fc1 = tf.keras.layers.Dense(d_ff, activation='relu')
        self.fc2 = tf.keras.layers.Dense(d_model)

    def call(self, x):
        out = self.fc1(x)
        out = self.fc2(out)

        return out

In [38]:
class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, n_heads, d_ff, dropout):
        super(EncoderLayer, self).__init__()

        self.enc_self_attn = MultiHeadAttention(d_model, n_heads)
        self.ffn = PoswiseFeedForwardNet(d_model, d_ff)

        self.norm_1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm_2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        self.do = tf.keras.layers.Dropout(dropout)

    def call(self, x, mask):

        """
        Multi-Head Attention
        """
        residual = x
        out = self.norm_1(x)
        out, enc_attn = self.enc_self_attn(out, out, out, mask)
        out = self.do(out)
        out += residual

        """
        Position-Wise Feed Forward Network
        """
        residual = out
        out = self.norm_2(out)
        out = self.ffn(out)
        out = self.do(out)
        out += residual

        return out, enc_attn

In [39]:
class DecoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super(DecoderLayer, self).__init__()

        self.dec_self_attn = MultiHeadAttention(d_model, num_heads)
        self.enc_dec_attn = MultiHeadAttention(d_model, num_heads)

        self.ffn = PoswiseFeedForwardNet(d_model, d_ff)

        self.norm_1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm_2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm_3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        self.do = tf.keras.layers.Dropout(dropout)

    def call(self, x, enc_out, causality_mask, padding_mask):

        """
        Masked Multi-Head Attention
        """
        residual = x
        out = self.norm_1(x)
        out, dec_attn = self.dec_self_attn(out, out, out, padding_mask)
        out = self.do(out)
        out += residual

        """
        Multi-Head Attention
        """
        residual = out
        out = self.norm_2(out)
        out, dec_enc_attn = self.dec_self_attn(out, enc_out, enc_out, causality_mask)
        out = self.do(out)
        out += residual

        """
        Position-Wise Feed Forward Network
        """
        residual = out
        out = self.norm_3(out)
        out = self.ffn(out)
        out = self.do(out)
        out += residual

        return out, dec_attn, dec_enc_attn

In [40]:
class Encoder(tf.keras.Model):
    def __init__(self,
                    n_layers,
                    d_model,
                    n_heads,
                    d_ff,
                    dropout):
        super(Encoder, self).__init__()
        self.n_layers = n_layers
        self.enc_layers = [EncoderLayer(d_model, n_heads, d_ff, dropout) 
                        for _ in range(n_layers)]

        self.do = tf.keras.layers.Dropout(dropout)

    def call(self, x, mask):
        out = x

        enc_attns = list()
        for i in range(self.n_layers):
            out, enc_attn = self.enc_layers[i](out, mask)
            enc_attns.append(enc_attn)

        return out, enc_attns

In [41]:
class Decoder(tf.keras.Model):
    def __init__(self,
                    n_layers,
                    d_model,
                    n_heads,
                    d_ff,
                    dropout):
        super(Decoder, self).__init__()
        self.n_layers = n_layers
        self.dec_layers = [DecoderLayer(d_model, n_heads, d_ff, dropout) 
                            for _ in range(n_layers)]


    def call(self, x, enc_out, causality_mask, padding_mask):
        out = x

        dec_attns = list()
        dec_enc_attns = list()
        for i in range(self.n_layers):
            out, dec_attn, dec_enc_attn = \
            self.dec_layers[i](out, enc_out, causality_mask, padding_mask)

            dec_attns.append(dec_attn)
            dec_enc_attns.append(dec_enc_attn)

        return out, dec_attns, dec_enc_attns

In [42]:
class Transformer(tf.keras.Model):
    def __init__(self,
                    n_layers,
                    d_model,
                    n_heads,
                    d_ff,
                    src_vocab_size,
                    tgt_vocab_size,
                    pos_len,
                    dropout=0.2,
                    shared_fc=True,
                    shared_emb=False):
        super(Transformer, self).__init__()

        self.d_model = tf.cast(d_model, tf.float32)

        if shared_emb:
            self.enc_emb = self.dec_emb = \
            tf.keras.layers.Embedding(src_vocab_size, d_model)
        else:
            self.enc_emb = tf.keras.layers.Embedding(src_vocab_size, d_model)
            self.dec_emb = tf.keras.layers.Embedding(tgt_vocab_size, d_model)

        self.pos_encoding = positional_encoding(pos_len, d_model)
        self.do = tf.keras.layers.Dropout(dropout)

        self.encoder = Encoder(n_layers, d_model, n_heads, d_ff, dropout)
        self.decoder = Decoder(n_layers, d_model, n_heads, d_ff, dropout)

        self.fc = tf.keras.layers.Dense(tgt_vocab_size)

        self.shared_fc = shared_fc

        if shared_fc:
            self.fc.set_weights(tf.transpose(self.dec_emb.weights))

    def embedding(self, emb, x):
        seq_len = x.shape[1]

        out = emb(x)

        if self.shared_fc: out *= tf.math.sqrt(self.d_model)

        out += self.pos_encoding[np.newaxis, ...][:, :seq_len, :]
        out = self.do(out)

        return out


    def call(self, enc_in, dec_in, enc_mask, causality_mask, dec_mask):
        enc_in = self.embedding(self.enc_emb, enc_in)
        dec_in = self.embedding(self.dec_emb, dec_in)

        enc_out, enc_attns = self.encoder(enc_in, enc_mask)

        dec_out, dec_attns, dec_enc_attns = \
        self.decoder(dec_in, enc_out, causality_mask, dec_mask)

        logits = self.fc(dec_out)

        return logits, enc_attns, dec_attns, dec_enc_attns

In [43]:
VOCAB_SIZE = 30000

transformer = Transformer(
    n_layers=5,
    d_model=512,
    n_heads=8,
    d_ff=2048,
    src_vocab_size=VOCAB_SIZE,
    tgt_vocab_size=VOCAB_SIZE,
    pos_len=200,
    dropout=0.3,
    shared_fc=True,
    shared_emb=True)

d_model = 512

In [44]:
class LearningRateScheduler(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super(LearningRateScheduler, self).__init__()

        self.d_model = d_model
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        arg1 = step ** -0.5
        arg2 = step * (self.warmup_steps ** -1.5)

        return (self.d_model ** -0.5) * tf.math.minimum(arg1, arg2)

In [45]:
learning_rate = LearningRateScheduler(d_model)

optimizer = tf.keras.optimizers.Adam(learning_rate,
                                        beta_1=0.9,
                                        beta_2=0.98, 
                                        epsilon=1e-9)

In [46]:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

def loss_function(real, pred):
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)

    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask

    return tf.reduce_sum(loss_)/tf.reduce_sum(mask)

In [47]:
@tf.function()
def train_step(src, tgt, model, optimizer):
    tgt_in = tgt[:, :-1]
    gold = tgt[:, 1:]

    enc_mask, dec_enc_mask, dec_mask = generate_masks(src, tgt_in)

    with tf.GradientTape() as tape:
        predictions, enc_attns, dec_attns, dec_enc_attns = \
        model(src, tgt_in, enc_mask, dec_enc_mask, dec_mask)
        loss = loss_function(gold, predictions)

    gradients = tape.gradient(loss, model.trainable_variables)    
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    return loss, enc_attns, dec_attns, dec_enc_attns

In [48]:
from tqdm import tqdm_notebook 

BATCH_SIZE = 128
EPOCHS = 30

for epoch in range(EPOCHS):
    total_loss = 0

    idx_list = list(range(0, enc_train.shape[0], BATCH_SIZE))
    random.shuffle(idx_list)
    t = tqdm_notebook(idx_list)

    for (batch, idx) in enumerate(t):
        batch_loss, enc_attns, dec_attns, dec_enc_attns = \
        train_step(enc_train[idx:idx+BATCH_SIZE],
                    dec_train[idx:idx+BATCH_SIZE],
                    transformer,
                    optimizer)

        total_loss += batch_loss

        t.set_description_str('Epoch %2d' % (epoch + 1))
        t.set_postfix_str('Loss %.4f' % (total_loss.numpy() / (batch + 1)))

Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  # This is added back by InteractiveShellApp.init_path()


  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

  0%|          | 0/92 [00:00<?, ?it/s]

#### Step 7. 성능 측정하기

In [49]:
# translate()

def evaluate(sentence, model):
    mecab = Mecab()
    
    sentence = preprocess_sentence(sentence)
    pieces = mecab.morphs(sentence)
    
    tokens = []
    for sen in pieces:
        sen= get_encoded_sentence(sen, word_to_index)
        tokens.append(sen)
    
    _input = tf.keras.preprocessing.sequence.pad_sequences(tokens,
                                                        value=word_to_index["<pad>"],
                                                        padding='pre',
                                                        maxlen=20)
    
    ids = []
    output = tf.expand_dims([word_to_index["<start>"]], 0)
    for i in range(dec_train.shape[-1]):
        enc_padding_mask, combined_mask, dec_padding_mask = \
        generate_masks(_input, output)

        predictions, enc_attns, dec_attns, dec_enc_attns =\
        model(_input, 
              output,
              enc_padding_mask,
              combined_mask,
              dec_padding_mask)

        predicted_id = \
        tf.argmax(tf.math.softmax(predictions, axis=-1)[0, -1]).numpy().item()

        if word_to_index["<end>"] == predicted_id:
            result = get_decoded_sentence(ids, index_to_word)
            return pieces, result, enc_attns, dec_attns, dec_enc_attns

        ids.append(predicted_id)
        output = tf.concat([output, tf.expand_dims([predicted_id], 0)], axis=-1)

    result = get_decoded_sentence(ids, index_to_word)

    return pieces, result, enc_attns, dec_attns, dec_enc_attns

def translate(sentence, model):
    pieces, result, enc_attns, dec_attns, dec_enc_attns = \
    evaluate(sentence, model)

    return result

In [50]:
samples = ["지루하다, 놀러가고 싶어.", "오늘 일찍 일어났더니 피곤하다.", "간만에 여자친구랑 데이트 하기로 했어.", "집에 있는다는 소리야."]

In [51]:
for sample in samples:
    print('sample : ', sample)
    print('Translations : ', translate(sample, transformer))

sample :  지루하다, 놀러가고 싶어.
Translations :  이 있 으면 인기 있 는 거 라면 안 줘도 정리 하 는 게 좋 지 죠
sample :  오늘 일찍 일어났더니 피곤하다.
Translations :  은 더 좋 아 하 는 것 들 아 할 때 있 는 것 만 있 어요
sample :  간만에 여자친구랑 데이트 하기로 했어.
Translations :  을 하 면 이야기 하 지 말 고 생각 하 면 기분 이 없 을 거 예요
sample :  집에 있는다는 소리야.
Translations :  오 고 집 이 라면 후회 안 한 집 마련 이 라면 풀렸 고 오 고 오 세요


#### 회고록  
트랜스포머가 제대로 이해가 안되서 그런지 노드를 진행하는게 매우 어려웠다.  
특히 이전에는 단순히 코드를 이해하는 것에 그쳤다면 고잉디퍼에서는 하나하나 새로 만들어야되는 부분이 어려운 부분중에 하나인 것 같다.  
이번에도 전반부만 하고 마무리를 못지을뻔 했는데 다행히 구글링을 통해 오류를 해결할 수 있었다.  
앞으로 남은 고잉디퍼노드들이 걱정되는 부분이긴 하다.  
그래도 일단 열심히 진행해보려고 한다. 