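- Generate text with a language model (RNNLM).
- A language model trained on a corpus is used to produce new sentences.
- An improved language model is then used to generate more natural sentences.
- seq2seq: converts one sequence (time-series data) into another sequence.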


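- deterministic: select the word with the highest probability of following the given word
- probabilistic: sample the next word according to the predicted probability distribution
- Repeat this step until an end-of-sequence symbol appears to generate a new sentence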
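
The two selection strategies can be compared on a toy distribution. This is a minimal sketch, assuming a hypothetical five-word vocabulary and made-up probabilities (neither comes from the PTB model used below):

```python
import numpy as np

# Hypothetical vocabulary and next-word probabilities (illustrative only).
vocab = ['you', 'say', 'goodbye', 'hello', '<eos>']
p = np.array([0.05, 0.10, 0.50, 0.30, 0.05])

# Deterministic: always pick the most probable word.
greedy_id = int(np.argmax(p))

# Probabilistic: draw words according to p, so results vary between draws.
rng = np.random.default_rng(0)
sampled_ids = rng.choice(len(p), size=1000, p=p)
freq = np.bincount(sampled_ids, minlength=len(p)) / len(sampled_ids)

print(vocab[greedy_id])   # goodbye
print(freq)               # empirical frequencies, roughly following p
```

The deterministic choice returns the same word every time, while the sampled frequencies only approximate `p`, which is why sampled text differs from run to run.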

In [4]:
import sys
sys.path.append('..')
sys.path.append('../code')
import numpy as np
from common.funcs import softmax
from rnnlm.rnnlm import Rnnlm
from rnnlm.better_rnnlm import BetterRnnlm

In [7]:
class RnnlmGen(Rnnlm):
    def generate(self, start_id, skip_ids=None, sample_size=100):
        '''
        Generate text with the RNNLM.
        start_id : ID of the first word
        skip_ids : list of word IDs that must not be sampled
        (e.g. in the PTB data set, <unk> for rare words and N for numbers)
        sample_size : total number of words to return (including start_id)
        '''
        word_ids = [start_id]
        
        x = start_id
        while len(word_ids) < sample_size:
            x = np.array(x).reshape(1,1)
            score = self.predict(x)
            p = softmax(score.flatten())
            
            # Draw one word ID as a Python int (avoids calling int() on a 1-element array)
            sampled = int(np.random.choice(len(p), p=p))
            if (skip_ids is None) or (sampled not in skip_ids):
                x = sampled
                word_ids.append(x)
                
        return word_ids

In [8]:
from data import ptb

corpus, word_to_id, id_to_word = ptb.load_data('train')
vocab_size = len(word_to_id)
corpus_size = len(corpus)

model = RnnlmGen()
# model.load_params('../practice/Rnnlm.pkl')  # weights not loaded: the model is untrained

# Set the start word and skip words
start_word = 'you'
start_id = word_to_id[start_word]
skip_words = ['N', '<unk>', '$']
skip_ids = [word_to_id[w] for w in skip_words]

# Generate text
word_ids = model.generate(start_id, skip_ids)
txt = ' '.join([id_to_word[i] for i in word_ids])
txt = txt.replace(' <eos>', '.\n')
print(txt)

you lenses beings wishes donoghue violin hourly listen manage chair trinova bergsma reopen essentially ann fan phased e two-year middlemen darman retail dealers open running quota managed hill stem lighting claims incorporated professor grid tough nor honecker apparently livestock wellington broker-dealer bobby shareholders guterman willful old-fashioned such heels does fearful donald dominates which assembly very genetically region packwood establishing divisive suburb slow am cars gatt draws unlawful reacting emerging technologies animals johnson applied treat hurdle network beleaguered reformers commonwealth hurdles transactions broderick which unesco watch block careful builds walked prevailed grew pittsburgh responsibility bets suits highlight slip liquidate exterior experts


In [10]:
# model = RnnlmGen()
model.load_params('../practice/Rnnlm.pkl')

# Set the start word and skip words
start_word = 'you'
start_id = word_to_id[start_word]
skip_words = ['N', '<unk>', '$']
skip_ids = [word_to_id[w] for w in skip_words]

# Generate text
word_ids = model.generate(start_id, skip_ids)
txt = ' '.join([id_to_word[i] for i in word_ids])
txt = txt.replace(' <eos>', '.\n')
print(txt)

you retain evaluation such bad municipal managers and constitute the written mood.
 he is to be at john salinas calif. resigned.
 mr. roman was among a share of mr. simmons and disclosed the federal court threw by the fund because they would n't retreated the agreement to assist its two uv-b advice.
 joining these reviews resources on county new jersey comparison that warner was out it too.
 the west german news is increasingly buying seriously to bankers and lincoln soon to make up their own troubles.
 mr. stone canada the chapter year lacks he had


In [11]:
class BetterRnnlmGen(BetterRnnlm):
    def generate(self, start_id, skip_ids=None, sample_size=100):
        word_ids = [start_id]

        x = start_id
        while len(word_ids) < sample_size:
            x = np.array(x).reshape(1, 1)
            score = self.predict(x).flatten()
            p = softmax(score).flatten()

            # Draw one word ID as a Python int (avoids calling int() on a 1-element array)
            sampled = int(np.random.choice(len(p), p=p))
            if (skip_ids is None) or (sampled not in skip_ids):
                x = sampled
                word_ids.append(x)

        return word_ids

    def get_state(self):
        states = []
        for layer in self.lstm_layers:
            states.append((layer.h, layer.c))
        return states

    def set_state(self, states):
        for layer, state in zip(self.lstm_layers, states):
            layer.set_state(*state)

In [12]:
model = BetterRnnlmGen()
#model.load_params('../practice/Rnnlm.pkl')

# Set the start word and skip words
start_word = 'you'
start_id = word_to_id[start_word]
skip_words = ['N', '<unk>', '$']
skip_ids = [word_to_id[w] for w in skip_words]

# Generate text
word_ids = model.generate(start_id, skip_ids)
txt = ' '.join([id_to_word[i] for i in word_ids])
txt = txt.replace(' <eos>', '.\n')
print(txt)

you modestly mead methods tenn. chlorofluorocarbons photographic cypress substance medicare consequences deny via modified mph welch followed spurred determined failures lawsuits poland w. making prepares machinists yields pakistan sugar assault impending editor unfavorable implied priority we purchase coupons flagship tree hang smiling bridge duck joe agreement lighter release deadlines or drivers checks expenses recall decides pipelines movies practiced equity exhibit financially lawsuit reality so-called lesser press automotive springs restoration workplace continually calendar ai fighter dealing hourly ivory arnold suitable mentality improved j.c. designing procedures trader foundations kerry sank strategic show reference fame chaotic competition very fha reading releases minor stretching


In [13]:
model = BetterRnnlmGen()
model.load_params('../code/rnnlm/BetterRnnlm.pkl')

# Set the start word and skip words
start_word = 'you'
start_id = word_to_id[start_word]
skip_words = ['N', '<unk>', '$']
skip_ids = [word_to_id[w] for w in skip_words]

# Generate text
word_ids = model.generate(start_id, skip_ids)
txt = ' '.join([id_to_word[i] for i in word_ids])
txt = txt.replace(' <eos>', '.\n')
print(txt)

you did n't know how a spokesman said in no matter.
 they was very skeptical of the lone star 's line in union by a worth.
 for that time it will take payment of the large market for salespeople.
 a ministry official spent a hat regarding the new state record against a led to the national association of journal and financial services the olympic office of science which had begun a wall breaker palace and the orange workers.
 in his wake in june she cautioned the company 's defense portfolio would be selling about the third


In [15]:
model.reset_state()

start_words = 'the meaning of life is'
start_ids = [word_to_id[w] for w in start_words.split(' ')]

for x in start_ids[:-1]:
    x = np.array(x).reshape(1, 1)
    model.predict(x)

word_ids = model.generate(start_ids[-1], skip_ids)
word_ids = start_ids[:-1] + word_ids
txt = ' '.join([id_to_word[i] for i in word_ids])
txt = txt.replace(' <eos>', '.\n')
print('-' * 50)
print(txt)

--------------------------------------------------
the meaning of life is a while it could harm the problem of the trigger to make those difficulties in an adversary treaty are such chemistry as the nation 's defense opposition.
 the prosecutor would assume that if that gets business money would otherwise make it difficult for its future calculations.
 if people needed money like good research the need for a one-hour champion to finance techniques.
 white activists and consultants wo n't go good until most say they did n't know deals with a different class but they still are simply considering a death and carry sent big-time tax benefits


## Seq2seq


- Encoder-Decoder: built from two LSTMs
- The Encoder converts the input sentence into a hidden state $h$
- The Decoder takes $h$ as input and generates the output sentence

### toy problem
- Addition problem
    - ![7-10](../figs/fig%207-10.png){: width="50" height="50"}
    

In [1]:
import sys
sys.path.append('..')
from data import sequence

In [2]:
(x_train, t_train), (x_test, t_test) = sequence.load_data('addition.txt', seed=1984)
char_to_id, id_to_char = sequence.get_vocab()

In [4]:
print(x_train.shape, t_train.shape)

(45000, 7) (45000, 5)


In [10]:
print([id_to_char[c] for c in x_train[0]], [id_to_char[c] for c in t_train[0]])

['7', '1', '+', '1', '1', '8', ' '] ['_', '1', '8', '9', ' ']
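
The shapes `(45000, 7)` and `(45000, 5)` match fixed-width, space-padded strings: the question is padded to 7 characters, and the answer starts with `_` and is padded to 5. A minimal sketch of that layout, assuming operands of at most three digits; `format_pair` is a hypothetical helper written for illustration and is not part of `data/sequence.py`:

```python
def format_pair(a, b, q_len=7, t_len=5):
    """Format an addition problem like the sample printed above.

    Hypothetical helper for illustration only.
    """
    question = f'{a}+{b}'.ljust(q_len)        # pad with spaces on the right
    answer = ('_' + str(a + b)).ljust(t_len)  # '_' marks the start of the answer
    return question, answer


q, t = format_pair(71, 118)
print(list(q), list(t))
# ['7', '1', '+', '1', '1', '8', ' '] ['_', '1', '8', '9', ' ']
```

Fixed widths let every sample in a mini-batch share the same number of time steps.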


### Encoder

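- Takes a string as input and converts it into a vector $h$
- When an LSTM is used, only the hidden state $h$ is passed to the Decoder.  
    The cell state $c$ is designed on the premise that it is used only inside the LSTM itself.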

In [12]:
from common.time_layers import *

In [13]:
class Encoder:
    def __init__(self, vocab_size, wordvec_size, hidden_size):
        '''
        vocab_size : vocabulary size
        wordvec_size : dimension of the word vectors
        hidden_size : dimension of the LSTM layer's hidden state vector
        '''
        V, D, H = vocab_size, wordvec_size, hidden_size
        rn = np.random.randn
        
        embed_W = (rn(V, D)/100).astype('f')
        lstm_Wx = (rn(D, 4*H)/np.sqrt(D)).astype('f')
        lstm_Wh = (rn(H, 4*H)/np.sqrt(H)).astype('f')
        lstm_b = np.zeros(4*H).astype('f')
        
        self.embed = TimeEmbedding(embed_W)
        self.lstm = TimeLSTM(lstm_Wx, lstm_Wh, lstm_b, stateful=False)
        
        self.params = self.embed.params + self.lstm.params
        self.grads = self.embed.grads + self.lstm.grads
        self.hs = None
    
    def forward(self, xs):
        xs = self.embed.forward(xs)
        hs = self.lstm.forward(xs)
        self.hs = hs
        return hs[:,-1,:]
    
    def backward(self, dh):
        dhs = np.zeros_like(self.hs)
        dhs[:, -1, :] = dh
        
        dout = self.lstm.backward(dhs)
        dout = self.embed.backward(dout)
        return dout
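
The slicing in `forward` and the gradient placement in `backward` can be checked with shapes alone. A minimal sketch, using a dummy array as a stand-in for the `TimeLSTM` output (the toy sizes are assumptions):

```python
import numpy as np

N, T, H = 2, 7, 4   # batch size, time steps, hidden size (toy values)
hs = np.arange(N * T * H, dtype='f').reshape(N, T, H)  # stand-in for TimeLSTM output

# Forward: only the hidden state of the last time step is returned.
h_last = hs[:, -1, :]
print(h_last.shape)   # (2, 4)

# Backward: the Decoder's gradient dh is placed at the last time step only;
# all earlier time steps start from a zero gradient.
dh = np.ones((N, H), dtype='f')
dhs = np.zeros_like(hs)
dhs[:, -1, :] = dh
print(dhs[:, :-1, :].sum(), dhs[:, -1, :].sum())   # 0.0 8.0
```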

### Decoder

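- Takes the $h$ produced by the Encoder and outputs the target string
- During training the correct answer is known, so the whole sequence is fed to the RNN at once.  
At generation time, the start delimiter ('_') is fed first, and each output character is then fed back as the next input, repeatedly
- The addition problem above has a single correct answer, so the string is generated deterministically (picking the highest-scoring character at each step)
- See Figure 7-19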
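
The generation loop described above can be sketched without a trained network. This is an illustration only: the `scores` table is a hypothetical stand-in for the embed → LSTM → affine stack, and it ignores the hidden state that a real decoder also uses:

```python
import numpy as np

# Hypothetical character vocabulary; '_' is the start delimiter as in the text.
id_to_char = {0: '_', 1: '1', 2: '8', 3: '9', 4: ' '}
char_to_id = {c: i for i, c in id_to_char.items()}

# Toy score table: scores[i] holds the scores for the next character given
# the current character i.
scores = np.full((5, 5), -1.0)
for cur, nxt in zip('_189', '189 '):
    scores[char_to_id[cur], char_to_id[nxt]] = 1.0
scores[char_to_id[' '], char_to_id[' ']] = 1.0


def greedy_generate(start_id, sample_size):
    sampled, x = [], start_id
    for _ in range(sample_size):
        x = int(np.argmax(scores[x]))  # deterministic: highest score wins
        sampled.append(x)              # the output becomes the next input
    return sampled


ids = greedy_generate(char_to_id['_'], 4)
print(repr(''.join(id_to_char[i] for i in ids)))   # '189 '
```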

In [16]:
class Decoder:
    def __init__(self, vocab_size, wordvec_size, hidden_size):
        V, D, H = vocab_size, wordvec_size, hidden_size
        rn = np.random.randn
        
        embed_W = (rn(V, D)/100).astype('f')
        lstm_Wx = (rn(D, 4*H)/np.sqrt(D)).astype('f')
        lstm_Wh = (rn(H, 4*H)/np.sqrt(H)).astype('f')
        lstm_b = np.zeros(4*H).astype('f')
        affine_W = (rn(H,V)/np.sqrt(H)).astype('f')
        affine_b = np.zeros(V).astype('f')
        
        self.embed = TimeEmbedding(embed_W)
        self.lstm = TimeLSTM(lstm_Wx, lstm_Wh, lstm_b, stateful=True)
        self.affine = TimeAffine(affine_W, affine_b)
        self.params, self.grads = [], []
        
        for layer in (self.embed, self.lstm, self.affine):
            self.params += layer.params
            self.grads += layer.grads
            
        
    def forward(self, xs, h):
        self.lstm.set_state(h)
        
        out = self.embed.forward(xs)
        out = self.lstm.forward(out)
        score = self.affine.forward(out)
        return score

    
    def backward(self, dscore):
        dout = self.affine.backward(dscore)
        dout = self.lstm.backward(dout)
        dout = self.embed.backward(dout)
        dh = self.lstm.dh
        return dh
    
    def generate(self, h, start_id, sample_size):
        '''
        forward above is used during training;
        generate is used to produce a new sequence.
        '''
        sampled = []
        sample_id = start_id
        self.lstm.set_state(h)
        
        for _ in range(sample_size):
            x = np.array(sample_id).reshape((1, 1))
            out = self.embed.forward(x)
            out = self.lstm.forward(out)
            score = self.affine.forward(out)
            
            sample_id = np.argmax(score.flatten())
            sampled.append(int(sample_id))
            
        return sampled

### Seq2seq

- Connect the Encoder and the Decoder, then compute the loss with Time Softmax with Loss

In [18]:
from common.base_model import BaseModel

In [20]:
class Seq2seq(BaseModel):
    def __init__(self, vocab_size, wordvec_size, hidden_size):
        V, D, H = vocab_size, wordvec_size, hidden_size
        self.encoder = Encoder(V, D, H)
        self.decoder = Decoder(V, D, H)
        self.softmax = TimeSoftmaxWithLoss()
        
        self.params = self.encoder.params + self.decoder.params
        self.grads = self.encoder.grads + self.decoder.grads
    
    def forward(self, xs, ts):
        decoder_xs, decoder_ts = ts[:, :-1], ts[:, 1:]
        h = self.encoder.forward(xs)
        score = self.decoder.forward(decoder_xs, h)
        loss = self.softmax.forward(score, decoder_ts)
        return loss
    
    def backward(self, dout=1):
        dout = self.softmax.backward(dout)
        dh = self.decoder.backward(dout)
        dout = self.encoder.backward(dh)
        return dout
    
    def generate(self, xs, start_id, sample_size):
        h = self.encoder.forward(xs)
        sampled = self.decoder.generate(h, start_id, sample_size)
        return sampled
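
The split `ts[:, :-1]` / `ts[:, 1:]` in `forward` shifts the target by one step, so that the Decoder input at each time step is paired with the character it should predict next. A minimal sketch using characters instead of IDs for readability:

```python
import numpy as np

# One target sequence from the addition data, shown as characters.
ts = np.array([['_', '1', '8', '9', ' ']])

decoder_xs = ts[:, :-1]   # fed to the Decoder
decoder_ts = ts[:, 1:]    # what the Decoder should predict at each step

print(decoder_xs[0].tolist())   # ['_', '1', '8', '9']
print(decoder_ts[0].tolist())   # ['1', '8', '9', ' ']
```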
        

### Addition
1. Select a mini-batch from the training data
2. Compute the gradients
3. Update the parameters
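
The three steps can be sketched on a toy problem. This is not the `Trainer` used below: it assumes a one-parameter linear regression, uses plain SGD instead of Adam, and treats `max_grad` as a gradient-norm clipping threshold (an assumption, since the `Trainer` source is not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise (illustrative only).
x = rng.normal(size=(1000, 1))
y = 3.0 * x + 0.1 * rng.normal(size=(1000, 1))

w = np.zeros((1, 1))
lr, batch_size, max_grad = 0.1, 128, 5.0

for it in range(200):
    # 1. select a mini-batch
    idx = rng.choice(len(x), size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]

    # 2. compute the gradient of the mean squared error
    grad = 2.0 * xb.T @ (xb @ w - yb) / batch_size

    # optional: clip the gradient norm (assumed role of max_grad)
    norm = np.linalg.norm(grad)
    if norm > max_grad:
        grad *= max_grad / norm

    # 3. update the parameter (plain SGD here, not Adam)
    w -= lr * grad

print(w.item())   # close to 3.0
```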


In [22]:
import sys
sys.path.append('..')
import numpy as np
import matplotlib.pyplot as plt
from data import sequence
from common.optimizer import Adam
from common.trainer import Trainer
#from common.util import eval_seq2seq
#from seq2seq import Seq2seq
#from peeky_seq2seq import PeekySeq2seq

In [35]:
import os
def eval_seq2seq(model, question, correct, id_to_char,
                 verbose=False, is_reverse=False):
    correct = correct.flatten()
    # The first character is the start delimiter
    start_id = correct[0]
    correct = correct[1:]
    guess = model.generate(question, start_id, len(correct))

    # Convert IDs to strings
    question = ''.join([id_to_char[int(c)] for c in question.flatten()])
    correct = ''.join([id_to_char[int(c)] for c in correct])
    guess = ''.join([id_to_char[int(c)] for c in guess])

    if verbose:
        if is_reverse:
            question = question[::-1]

        colors = {'ok': '\033[92m', 'fail': '\033[91m', 'close': '\033[0m'}
        print('Q', question)
        print('T', correct)

        is_windows = os.name == 'nt'

        if correct == guess:
            mark = colors['ok'] + '☑' + colors['close']
            if is_windows:
                mark = 'O'
            print(mark + ' ' + guess)
        else:
            mark = colors['fail'] + '☒' + colors['close']
            if is_windows:
                mark = 'X'
            print(mark + ' ' + guess)
        print('---')

    return 1 if guess == correct else 0

In [36]:
# load data
(x_train, t_train), (x_test, t_test) = sequence.load_data('addition.txt', seed=1984)
char_to_id, id_to_char = sequence.get_vocab()

# hyper-parameter setting
vocab_size = len(char_to_id)
wordvec_size = 16
hidden_size = 128
batch_size = 128
max_epoch = 25
max_grad = 5.0
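
The `itr …/351` counter in the training log is consistent with the data size and batch size. A quick check, assuming the `Trainer` (whose source is not shown here) runs `data_size // batch_size` iterations per epoch:

```python
data_size = 45000    # from x_train.shape printed earlier
batch_size = 128

iters_per_epoch = data_size // batch_size
leftover = data_size % batch_size

print(iters_per_epoch)   # 351
print(leftover)          # 72 samples do not fill a complete batch
```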

In [37]:
# model
model = Seq2seq(vocab_size, wordvec_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)

In [38]:
# Training
acc_list = []

for epoch in range(max_epoch):
    trainer.fit(x_train, t_train, max_epoch=1, batch_size=batch_size, max_grad=max_grad)
    
    correct_num = 0
    for i in range(len(x_test)):
        q, ans = x_test[[i]], t_test[[i]]
        verbose = i < 10
        correct_num += eval_seq2seq(model, q, ans, id_to_char, verbose)
        
    acc = float(correct_num) / len(x_test)
    acc_list.append(acc)
    print(f'accuracy : {round(acc*100)}%')

| epoch 1 |  itr 1/351 | time 0.02951812744140625[s] | loss 2.564809560775757
| epoch 1 |  itr 21/351 | time 0.48557305335998535[s] | loss 2.5253600478172302
| epoch 1 |  itr 41/351 | time 0.9224045276641846[s] | loss 2.1722517609596252
| epoch 1 |  itr 61/351 | time 1.3801813125610352[s] | loss 1.9585922956466675
| epoch 1 |  itr 81/351 | time 1.844961404800415[s] | loss 1.9151153206825255
| epoch 1 |  itr 101/351 | time 2.2917468547821045[s] | loss 1.8725684344768525
| epoch 1 |  itr 121/351 | time 2.7445731163024902[s] | loss 1.854083001613617
| epoch 1 |  itr 141/351 | time 3.219302177429199[s] | loss 1.8295262575149536
| epoch 1 |  itr 161/351 | time 3.6910407543182373[s] | loss 1.793073332309723
| epoch 1 |  itr 181/351 | time 4.154934406280518[s] | loss 1.7693040788173675
| epoch 1 |  itr 201/351 | time 4.637265205383301[s] | loss 1.7702263593673706
| epoch 1 |  itr 221/351 | time 5.105014801025391[s] | loss 1.764657735824585
| epoch 1 |  itr 241/351 | time 5.5747597217559814[s]

Q 77+85  
T 162 
X 145 
---
Q 975+164
T 1139
X 1168
---
Q 582+84 
T 666 
X 665 
---
Q 8+155  
T 163 
X 192 
---
Q 367+55 
T 422 
X 431 
---
Q 600+257
T 857 
X 895 
---
Q 761+292
T 1053
X 1015
---
Q 830+597
T 1427
X 1493
---
Q 26+838 
T 864 
X 891 
---
Q 143+93 
T 236 
X 221 
---
accuracy : 2%
| epoch 6 |  itr 1/351 | time 0.03741025924682617[s] | loss 1.165475845336914
| epoch 6 |  itr 21/351 | time 0.6457836627960205[s] | loss 1.1679593324661255
| epoch 6 |  itr 41/351 | time 1.2939085960388184[s] | loss 1.1830228090286254
| epoch 6 |  itr 61/351 | time 1.9202532768249512[s] | loss 1.1653944373130798
| epoch 6 |  itr 81/351 | time 2.5326173305511475[s] | loss 1.1587076663970948
| epoch 6 |  itr 101/351 | time 3.1104886531829834[s] | loss 1.1563336730003357
| epoch 6 |  itr 121/351 | time 3.7178637981414795[s] | loss 1.1562753438949585
| epoch 6 |  itr 141/351 | time 4.333218336105347[s] | loss 1.1432465314865112
| epoch 6 |  itr 161/351 | time 4.918791770935059[s] | loss 1.13922017216

| epoch 10 |  itr 281/351 | time 8.709539651870728[s] | loss 0.9463497519493103
| epoch 10 |  itr 301/351 | time 9.340970516204834[s] | loss 0.9551641643047333
| epoch 10 |  itr 321/351 | time 9.956350326538086[s] | loss 0.9584446251392365
| epoch 10 |  itr 341/351 | time 10.589677810668945[s] | loss 0.9508875697851181
Q 77+85  
T 162 
X 160 
---
Q 975+164
T 1139
X 1160
---
Q 582+84 
T 666 
O 666 
---
Q 8+155  
T 163 
X 170 
---
Q 367+55 
T 422 
X 419 
---
Q 600+257
T 857 
X 866 
---
Q 761+292
T 1053
X 1049
---
Q 830+597
T 1427
X 1424
---
Q 26+838 
T 864 
X 867 
---
Q 143+93 
T 236 
X 237 
---
accuracy : 6%
| epoch 11 |  itr 1/351 | time 0.030909061431884766[s] | loss 0.9294466972351074
| epoch 11 |  itr 21/351 | time 0.696131706237793[s] | loss 0.9730330586433411
| epoch 11 |  itr 41/351 | time 1.2824385166168213[s] | loss 0.9547330617904664
| epoch 11 |  itr 61/351 | time 1.895338773727417[s] | loss 0.9392134517431259
| epoch 11 |  itr 81/351 | time 2.510695219039917[s] | loss 0.9640

| epoch 15 |  itr 181/351 | time 5.598172426223755[s] | loss 0.893023955821991
| epoch 15 |  itr 201/351 | time 6.216519594192505[s] | loss 0.8915681928396225
| epoch 15 |  itr 221/351 | time 6.850841760635376[s] | loss 0.8928912281990051
| epoch 15 |  itr 241/351 | time 7.471571207046509[s] | loss 0.8917930126190186
| epoch 15 |  itr 261/351 | time 8.065978765487671[s] | loss 0.8902671754360199
| epoch 15 |  itr 281/351 | time 8.67935562133789[s] | loss 0.8879333645105362
| epoch 15 |  itr 301/351 | time 9.277385473251343[s] | loss 0.8865438312292099
| epoch 15 |  itr 321/351 | time 9.90842890739441[s] | loss 0.8897219479084015
| epoch 15 |  itr 341/351 | time 10.519795179367065[s] | loss 0.8912267327308655
Q 77+85  
T 162 
X 164 
---
Q 975+164
T 1139
X 1138
---
Q 582+84 
T 666 
O 666 
---
Q 8+155  
T 163 
X 172 
---
Q 367+55 
T 422 
X 424 
---
Q 600+257
T 857 
X 862 
---
Q 761+292
T 1053
X 1039
---
Q 830+597
T 1427
X 1421
---
Q 26+838 
T 864 
X 868 
---
Q 143+93 
T 236 
X 238 
---
ac

| epoch 20 |  itr 81/351 | time 3.0099518299102783[s] | loss 0.8432500541210175
| epoch 20 |  itr 101/351 | time 3.7384092807769775[s] | loss 0.8337550818920135
| epoch 20 |  itr 121/351 | time 4.474442720413208[s] | loss 0.8174480438232422
| epoch 20 |  itr 141/351 | time 5.188246250152588[s] | loss 0.81858911216259
| epoch 20 |  itr 161/351 | time 5.896371603012085[s] | loss 0.8352245271205903
| epoch 20 |  itr 181/351 | time 6.627415895462036[s] | loss 0.8536076337099076
| epoch 20 |  itr 201/351 | time 7.368762254714966[s] | loss 0.8287682145833969
| epoch 20 |  itr 221/351 | time 8.100805521011353[s] | loss 0.8260179907083511
| epoch 20 |  itr 241/351 | time 8.840049743652344[s] | loss 0.8296800911426544
| epoch 20 |  itr 261/351 | time 9.571275472640991[s] | loss 0.817444920539856
| epoch 20 |  itr 281/351 | time 10.278385639190674[s] | loss 0.7875717550516128
| epoch 20 |  itr 301/351 | time 10.997164249420166[s] | loss 0.8148881256580353
| epoch 20 |  itr 321/351 | time 11.7159

accuracy : 12%
| epoch 25 |  itr 1/351 | time 0.04686927795410156[s] | loss 0.7393150925636292
| epoch 25 |  itr 21/351 | time 0.734520673751831[s] | loss 0.7904515922069549
| epoch 25 |  itr 41/351 | time 1.586176872253418[s] | loss 0.7574937403202057
| epoch 25 |  itr 61/351 | time 2.7770581245422363[s] | loss 0.7645088374614716
| epoch 25 |  itr 81/351 | time 4.242652416229248[s] | loss 0.7699761241674423
| epoch 25 |  itr 101/351 | time 5.633250713348389[s] | loss 0.7655893951654434
| epoch 25 |  itr 121/351 | time 6.980359077453613[s] | loss 0.8070319622755051
| epoch 25 |  itr 141/351 | time 8.332980871200562[s] | loss 0.8086272478103638
| epoch 25 |  itr 161/351 | time 9.644024848937988[s] | loss 0.7974450916051865
| epoch 25 |  itr 181/351 | time 10.906973361968994[s] | loss 0.7830158084630966
| epoch 25 |  itr 201/351 | time 12.172456502914429[s] | loss 0.7905587881803513
| epoch 25 |  itr 221/351 | time 13.536389112472534[s] | loss 0.8019137769937515
| epoch 25 |  itr 241/351