# **Lab-11-3 Long sequence**

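A long sentence cannot be fed to the model as a single input, so it is cut into pieces of a fixed size. A window of that size slides over the sentence one character at a time: each chunk becomes an input x, and the same chunk shifted one character to the right becomes the target y. Repeating this over the whole sentence builds the dataset.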

![](https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fk.kakaocdn.net%2Fdn%2Fq1w9y%2FbtqC22mezkt%2FgvgZvh7jxe1gikWDApAN71%2Fimg.png)
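
The sliding window described above can be tried on a short string first. This is only a minimal sketch: the helper name `make_windows` and the sample text are assumptions made for illustration and are not part of the lab code.

```python
def make_windows(text, window):
    """Return (input, target) pairs; each target is its input shifted right by one character."""
    pairs = []
    # Stop early enough that the shifted target still fits inside the text.
    for i in range(len(text) - window):
        x_str = text[i:i + window]
        y_str = text[i + 1:i + window + 1]
        pairs.append((x_str, y_str))
    return pairs


pairs = make_windows("hello world", 4)
for x_str, y_str in pairs[:3]:
    print(repr(x_str), '->', repr(y_str))
```

A text of length `n` yields `n - window` pairs, which is why the 179-character sentence used below produces 169 samples with a window of 10.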

## **Making a sequence dataset from a long sentence (code)**

In [0]:
import torch
import torch.optim as optim
import numpy as np

In [2]:
# Random seed to make results deterministic and reproducible
torch.manual_seed(0)

<torch._C.Generator at 0x7f5061ba20d0>

In [12]:
sentence = ("if you want to build a ship, don't drum up people together to "
            "collect wood and don't assign them tasks and work, but rather"  # no trailing space here, so the text reads "ratherteach"
            "teach them to long for the endless immensity of the sea.")
print(len(sentence))

179


In [0]:
# make dictionary
char_set = list(set(sentence))
char_dic = {c:i for i, c in enumerate(char_set)}
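
One thing to keep in mind about the dictionary above: `set` has no defined order, and because Python randomizes string hashing per process by default, the index assigned to each character can change between runs. If stable indices matter (for example, when reloading a saved model), sorting the characters is one possible approach. The sketch below uses a short sample string chosen only for illustration.

```python
sample = "if you want"

# Sorting the unique characters gives the same index mapping on every run.
char_set = sorted(set(sample))
char_dic = {c: i for i, c in enumerate(char_set)}

# Round trip: characters -> indices -> characters.
encoded = [char_dic[c] for c in sample]
decoded = ''.join(char_set[i] for i in encoded)
print(char_dic)
print(decoded)
```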

In [0]:
# hyper parameters
dic_size = len(char_dic)
hidden_size = len(char_dic)
sequence_length = 10 # arbitrary number
learning_rate = 0.1

In [15]:
# data setting
x_data = []
y_data = []

for i in range(0, len(sentence) - sequence_length):
  x_str = sentence[i:i + sequence_length]
  y_str = sentence[i+1:i + sequence_length + 1]
  print(i, x_str, '->', y_str)
  x_data.append([char_dic[c] for c in x_str]) # store x_str as a list of indices
  y_data.append([char_dic[c] for c in y_str]) # store y_str as a list of indices

x_one_hot = [np.eye(dic_size)[x] for x in x_data]

0 if you wan -> f you want
1 f you want ->  you want 
2  you want  -> you want t
3 you want t -> ou want to
4 ou want to -> u want to 
5 u want to  ->  want to b
6  want to b -> want to bu
7 want to bu -> ant to bui
8 ant to bui -> nt to buil
9 nt to buil -> t to build
10 t to build ->  to build 
11  to build  -> to build a
12 to build a -> o build a 
13 o build a  ->  build a s
14  build a s -> build a sh
15 build a sh -> uild a shi
16 uild a shi -> ild a ship
17 ild a ship -> ld a ship,
18 ld a ship, -> d a ship, 
19 d a ship,  ->  a ship, d
20  a ship, d -> a ship, do
21 a ship, do ->  ship, don
22  ship, don -> ship, don'
23 ship, don' -> hip, don't
24 hip, don't -> ip, don't 
25 ip, don't  -> p, don't d
26 p, don't d -> , don't dr
27 , don't dr ->  don't dru
28  don't dru -> don't drum
29 don't drum -> on't drum 
30 on't drum  -> n't drum u
31 n't drum u -> 't drum up
32 't drum up -> t drum up 
33 t drum up  ->  drum up p
34  drum up p -> drum up pe
35 drum up pe -> rum up peo
36

In [0]:
# transform as torch tensor variable
X = torch.FloatTensor(np.array(x_one_hot)) # stack into a single ndarray first; converting a list of arrays is slow
Y = torch.LongTensor(y_data)
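
The one-hot step used above relies on indexing an identity matrix with a list of indices. A small sketch with made-up sizes (a vocabulary of 4 and two sequences of length 3, both assumptions for illustration) shows the resulting shape and that `argmax` recovers the original indices.

```python
import numpy as np

vocab_size = 4
indices = [[0, 2, 3],
           [1, 1, 0]]

# np.eye(vocab_size)[row] picks one identity-matrix row per index,
# so each sequence becomes a (sequence_length, vocab_size) array.
one_hot = np.array([np.eye(vocab_size)[row] for row in indices])

print(one_hot.shape)
print(one_hot[0])
```

In the lab, the same construction gives `X` the shape [number of windows, sequence_length, dic_size], while `Y` keeps the plain indices with shape [number of windows, sequence_length].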

So far we have only dealt with a model made of a single plain RNN cell. When the model underfits the data, or you otherwise want a larger model, you can add an FC layer and stack more RNN layers.

## **Adding FC layer and stacking RNN**

In [0]:
# declare RNN + FC
class Net(torch.nn.Module):
  def __init__(self, input_dim, hidden_dim, layers):
    super(Net, self).__init__()
    self.rnn = torch.nn.RNN(input_dim, hidden_dim, num_layers=layers, batch_first=True)
    self.fc = torch.nn.Linear(hidden_dim, hidden_dim, bias=True)

  def forward(self, x):
    x, _status = self.rnn(x)
    x = self.fc(x)
    return x
  
net = Net(dic_size, hidden_size, 2)
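
To see what stacking and the FC layer do to tensor shapes, the following sketch passes a dummy batch through a similar model. `TinyNet`, its separate `output_dim` argument, and all sizes are hypothetical choices for illustration. The lab's `Net` reuses `hidden_dim` for the FC output, which works there because `hidden_size` equals `dic_size`.

```python
import torch


class TinyNet(torch.nn.Module):
    """Stacked RNN followed by an FC layer (illustrative variant of Net)."""

    def __init__(self, input_dim, hidden_dim, output_dim, layers):
        super().__init__()
        self.rnn = torch.nn.RNN(input_dim, hidden_dim, num_layers=layers, batch_first=True)
        # The FC layer maps each hidden state to one score per character.
        self.fc = torch.nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        out, _hidden = self.rnn(x)
        return self.fc(out)


torch.manual_seed(0)
tiny = TinyNet(input_dim=25, hidden_dim=32, output_dim=25, layers=2)
dummy = torch.zeros(3, 10, 25)          # [batch, sequence_length, input_dim]

logits = tiny(dummy)
_, last_hidden = tiny.rnn(dummy)

print(logits.shape)       # [batch, sequence_length, output_dim]
print(last_hidden.shape)  # [layers, batch, hidden_dim]
```

With a separate output size, the hidden dimension can be made larger than the vocabulary without changing the shape of the predictions.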

## **Code run through**

In [0]:
# loss & optimizer setting
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), learning_rate)
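
`CrossEntropyLoss` expects the class scores along dimension 1, so per-character outputs of shape [batch, sequence_length, classes] have to be rearranged before the loss is computed. The sketch below, using random tensors with assumed sizes, compares two ways to do that: flattening batch and time into one dimension, and permuting the class dimension to position 1. With the default mean reduction both should give the same value.

```python
import torch

torch.manual_seed(0)
batch, seq_len, num_classes = 4, 10, 25

logits = torch.randn(batch, seq_len, num_classes)          # stand-in for model outputs
targets = torch.randint(0, num_classes, (batch, seq_len))  # stand-in for Y

loss_fn = torch.nn.CrossEntropyLoss()

# Option 1: merge batch and time, giving (N, C) scores and (N,) targets.
flat_loss = loss_fn(logits.view(-1, num_classes), targets.view(-1))

# Option 2: move classes to dim 1, giving (batch, C, seq_len) scores.
perm_loss = loss_fn(logits.permute(0, 2, 1), targets)

print(flat_loss.item(), perm_loss.item())
```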

In [33]:
# start training
for i in range(1): # a single update step; increase the range to train for more epochs
  optimizer.zero_grad()
  outputs = net(X) # outputs has size [169, 10, 25]
  loss = criterion(outputs.view(-1, dic_size), Y.view(-1))
  loss.backward()
  optimizer.step()

  results = outputs.argmax(dim=2) # results has size [169, 10]
  predict_str = ""
  for j, result in enumerate(results):
    print(j)
    if j==0: # for the first window, take every predicted character (sequence_length characters)
      predict_str += ''.join([char_set[t] for t in result])
      print(predict_str)
    else: # later windows overlap the previous one except for the last character, so append only that one
      predict_str += char_set[result[-1]]
      print(predict_str)

0
l you want
1
l you want 
2
l you want t
3
l you want to
4
l you want to 
5
l you want to b
6
l you want to bu
7
l you want to bui
8
l you want to buil
9
l you want to build
10
l you want to build 
11
l you want to build a
12
l you want to build a 
13
l you want to build a s
14
l you want to build a sh
15
l you want to build a shi
16
l you want to build a ship
17
l you want to build a ship,
18
l you want to build a ship, 
19
l you want to build a ship, d
20
l you want to build a ship, do
21
l you want to build a ship, don
22
l you want to build a ship, don'
23
l you want to build a ship, don't
24
l you want to build a ship, don't 
25
l you want to build a ship, don't d
26
l you want to build a ship, don't dr
27
l you want to build a ship, don't dru
28
l you want to build a ship, don't drum
29
l you want to build a ship, don't drum 
30
l you want to build a ship, don't drum u
31
l you want to build a ship, don't drum up
32
l you want to build a ship, don't drum up 
33
l you want to bui

In [34]:
# final prediction
print(predict_str)

l you want to build a ship, don't drum up people together to collect wood and don't assign them tasks and work, but ratherteach them to long for the endless immensity of the sea.
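
The way the prediction string is assembled from overlapping windows can be checked without a model. In this sketch the hypothetical helper `stitch` is applied to the true target windows of a short sample text. Since every target is shifted by one character, the stitched result should be the text without its first character.

```python
def stitch(windows):
    """Join overlapping windows: keep the first one whole, then only each last character."""
    if not windows:
        return ""
    text = windows[0]
    for w in windows[1:]:
        text += w[-1]
    return text


text = "if you want to build a ship"
window = 10

# Target windows, built the same way as y_str in the data setting cell.
targets = [text[i + 1:i + window + 1] for i in range(len(text) - window)]

restored = stitch(targets)
print(restored)
```

This also explains why the final prediction above starts at the second character of the sentence: the model never predicts the very first character, only what follows it.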
