# Part 4.3. Long Sequence

## 1. How to Predict in a Long Sentence

![19-1](./img/19-1.png)

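Slide a fixed-size window over the sentence one character at a time and predict the next character at each position. In other words, each window is used to predict which character comes next, and repeating this for every window yields a prediction for the whole sentence.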
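The sliding-window idea can be sketched on a short string before applying it to the full sentence. This is a minimal illustration only: the helper name `sliding_windows` and the toy string are assumptions made for this example, not part of the original notebook.

```python
def sliding_windows(text, window):
    """Return (input, target) pairs; each target is the input shifted by one character."""
    pairs = []
    for i in range(len(text) - window):
        x_str = text[i:i + window]            # current window
        y_str = text[i + 1:i + window + 1]    # same window moved one character ahead
        pairs.append((x_str, y_str))
    return pairs


pairs = sliding_windows("hello world", 4)
for x_str, y_str in pairs[:3]:
    print(x_str, "->", y_str)
```

Each target shares all but its last character with its input, so the only new information per window is the single character that follows it.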

## 2. Implementation

In [0]:
import torch
import torch.optim as optim
import numpy as np

In [2]:
torch.manual_seed(0)

<torch._C.Generator at 0x7f76bed052d0>

In [0]:
# Example long sentence
sentence = ("if you want to build a ship, don't drum up people together to "
            "collect wood and don't assign them tasks and work, but rather "
            "teach them to long for the endless immensity of the sea.")

In [0]:
# Build a character dictionary for one-hot encoding
char_set = list(set(sentence))
char_dic = {c: i for i, c in enumerate(char_set)}
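One caveat: the iteration order of a `set` of strings can differ between Python processes, so the indices assigned to characters may change from run to run even with a fixed `torch.manual_seed`. A possible variant, shown here as a sketch on a shortened sentence, sorts the characters first. The inverse mapping `idx_to_char` is a hypothetical helper added for illustration.

```python
sentence = "if you want to build a ship"

# Sorting gives a stable character order across runs (assumed variant, not the original code)
char_set = sorted(set(sentence))
char_dic = {c: i for i, c in enumerate(char_set)}

# Hypothetical inverse mapping for turning indices back into characters
idx_to_char = {i: c for c, i in char_dic.items()}

encoded = [char_dic[c] for c in "ship"]
decoded = "".join(idx_to_char[i] for i in encoded)
print(encoded, decoded)
```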

In [0]:
# Hyperparameters
dic_size = len(char_dic)
hidden_size = len(char_dic)
sequence_length = 10          # arbitrary value (window size)
learning_rate = 0.1

In [6]:
x_data = []
y_data = []

# Repeat for every position the window can slide to
# Each window-sized slice becomes one training example
# The inputs are then all one-hot encoded
for i in range(0, len(sentence) - sequence_length):
    x_str = sentence[i:i + sequence_length]
    y_str = sentence[i + 1: i + sequence_length + 1]
    print(i, x_str, '->', y_str)

    x_data.append([char_dic[c] for c in x_str])  # x str to index
    y_data.append([char_dic[c] for c in y_str])  # y str to index

x_one_hot = [np.eye(dic_size)[x] for x in x_data]

0 if you wan -> f you want
1 f you want ->  you want 
2  you want  -> you want t
3 you want t -> ou want to
4 ou want to -> u want to 
5 u want to  ->  want to b
6  want to b -> want to bu
7 want to bu -> ant to bui
8 ant to bui -> nt to buil
9 nt to buil -> t to build
10 t to build ->  to build 
11  to build  -> to build a
12 to build a -> o build a 
13 o build a  ->  build a s
14  build a s -> build a sh
15 build a sh -> uild a shi
16 uild a shi -> ild a ship
17 ild a ship -> ld a ship,
18 ld a ship, -> d a ship, 
19 d a ship,  ->  a ship, d
20  a ship, d -> a ship, do
21 a ship, do ->  ship, don
22  ship, don -> ship, don'
23 ship, don' -> hip, don't
24 hip, don't -> ip, don't 
25 ip, don't  -> p, don't d
26 p, don't d -> , don't dr
27 , don't dr ->  don't dru
28  don't dru -> don't drum
29 don't drum -> on't drum 
30 on't drum  -> n't drum u
31 n't drum u -> 't drum up
32 't drum up -> t drum up 
33 t drum up  ->  drum up p
34  drum up p -> drum up pe
35 drum up pe -> rum up peo
36

In [0]:
# Convert the NumPy arrays to tensors
X = torch.FloatTensor(x_one_hot)
Y = torch.LongTensor(y_data)
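To see what shape the one-hot input ends up with, the `np.eye(dic_size)[x]` indexing trick can be tried on toy data. The values below (`dic_size = 5`, two windows of length 3) are made up for illustration. The resulting layout is `(number of windows, sequence_length, dic_size)`.

```python
import numpy as np

dic_size = 5                          # toy vocabulary size (assumption for this sketch)
x_data = [[0, 1, 2], [1, 2, 3]]       # two windows of three character indices each

# Indexing the identity matrix with a list of indices picks one one-hot row per index
x_one_hot = np.array([np.eye(dic_size)[x] for x in x_data])

print(x_one_hot.shape)                # (2, 3, 5)
print(x_one_hot[0])
```

Taking `argmax` over the last axis recovers the original indices, which is a quick way to sanity-check the encoding.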

![19-2.png](./img/19-2.png)

In [0]:
# Define the RNN + fully connected (FC) layer
class Net(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, layers):
        super(Net, self).__init__()
        self.rnn = torch.nn.RNN(input_dim, hidden_dim, num_layers=layers, batch_first=True)
        self.fc = torch.nn.Linear(hidden_dim, hidden_dim, bias=True)

    def forward(self, x):
        x, _status = self.rnn(x)
        x = self.fc(x)
        return x
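A quick way to check the model is to pass a dummy batch through it and inspect the output shape. The sketch below repeats the `Net` definition so that it runs on its own. The sizes (batch of 4, window of 10, 25 characters) are arbitrary assumptions.

```python
import torch


class Net(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, layers):
        super().__init__()
        self.rnn = torch.nn.RNN(input_dim, hidden_dim, num_layers=layers, batch_first=True)
        self.fc = torch.nn.Linear(hidden_dim, hidden_dim, bias=True)

    def forward(self, x):
        x, _status = self.rnn(x)      # x: (batch, seq_len, hidden_dim)
        return self.fc(x)             # one score vector per time step


net = Net(input_dim=25, hidden_dim=25, layers=2)
dummy = torch.zeros(4, 10, 25)        # (batch, sequence_length, input_dim)
out = net(dummy)
print(out.shape)                      # torch.Size([4, 10, 25])
```

Because `batch_first=True`, the batch dimension comes first in both the input and the output. The FC layer is applied independently at every time step.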

In [0]:
# 2 is the number of RNN layers to stack
net = Net(dic_size, hidden_size, 2)

In [0]:
# Define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), learning_rate)
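`CrossEntropyLoss` takes raw scores of shape `(N, C)` with integer targets of shape `(N,)`, which is why per-time-step outputs are usually flattened before the loss is computed. The following sketch uses random tensors with assumed sizes to show the flattening, together with an equivalent form that moves the class dimension to position 1 instead.

```python
import torch

batch, seq_len, num_classes = 4, 10, 25               # assumed toy sizes
logits = torch.randn(batch, seq_len, num_classes)     # stand-in for the network output
targets = torch.randint(0, num_classes, (batch, seq_len))

criterion = torch.nn.CrossEntropyLoss()

# Flatten batch and time into one axis: (batch * seq_len, num_classes) vs (batch * seq_len,)
loss_flat = criterion(logits.view(-1, num_classes), targets.view(-1))

# Equivalent alternative: class dimension second, (batch, num_classes, seq_len)
loss_perm = criterion(logits.permute(0, 2, 1), targets)

print(loss_flat.item(), loss_perm.item())
```

Both forms average the loss over every character position in the batch, so they should agree up to floating-point error.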

In [12]:
# Training
for i in range(100):
    optimizer.zero_grad()
    outputs = net(X)
    loss = criterion(outputs.view(-1, dic_size), Y.view(-1))
    loss.backward()
    optimizer.step()

    results = outputs.argmax(dim=2)
    predict_str = ""
    for j, result in enumerate(results):
        # print(i, j, ''.join([char_set[t] for t in result]), loss.item())
        if j == 0:
            predict_str += ''.join([char_set[t] for t in result])
        else:
            # Consecutive windows overlap in all but the last character, so append only the last one
            predict_str += char_set[result[-1]]

    print(predict_str)

ngnnnnnnngfnnnnnngnignnnnnginnngfnnnnnnnngnniggnignngignngngnnnngnigfinnginnnnnnngfnnnggnngnnnngngnngnnngnngngngnnngnnngnngngnnngfnnngngnnnngfinnnnnnngngnnngnnnnnigngninngnnnngnnn
tttttttfttfttftttttftttttttftttttfttttftttftftfftttftttfttttfttftftttttftftttfttttfftttttttttftttttttttttttttttftfttftttfttttttttttfttttfttftfftffttfttttttttttttttttttttttttfttttt
tssssssssssssssssssssssssss sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
   ..  h.dtft  fffept efdpd tfdptfdftptfpe etdfdptpefteefp  fft ffte ffe  fe  fftftftetftp  efett.ffe ffdpeftptftptpdftptftd fft ffdepptfft fftp pt fpe dfd tftft dfftdp.ddtefd tff
   d  ed      e    e e   e e     o e      e   e  ee    ee  e   ee   e    ee  e              e        ee   e                 e   ee   e     ee   e  e        e      e           e   
 o r n r nooe ooooo o o n n o n o o nooo o o n n o o o   no  nn oooe oooo  ne o o o n o o    onooon 