
# Part 4.2. Hihello and Charseq Problem

## 1. How to Represent Characters
### 1.1. By Index

![18-1](./img/18-1.png)

Assign an integer index to each distinct character.
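
As a minimal sketch of index encoding, the snippet below builds a lookup table for a small vocabulary and encodes a word with it. The names `chars` and `char_to_idx` and the example word are illustrative assumptions, not part of the original notebook.

```python
# Hypothetical vocabulary; the position of each character is its index.
chars = ['h', 'i', 'e', 'l', 'o']
char_to_idx = {c: i for i, c in enumerate(chars)}

# Encode a word as a list of integer indices.
encoded = [char_to_idx[c] for c in "hello"]
print(encoded)  # [0, 2, 3, 3, 4]
```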

### 1.2. One-hot Encoding

![18-2](./img/18-2.png)

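Create a vector whose dimension equals the number of distinct characters, and set only the position of the character's own index to 1. In other words, every character in the sentence is converted into its own vector.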
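
The following sketch shows one way to turn indices into one-hot vectors using plain Python lists. The helper `one_hot` and the variable names are assumptions made for illustration.

```python
def one_hot(index, size):
    """Return a list of length `size` with a 1 at `index` and 0 elsewhere."""
    vec = [0] * size
    vec[index] = 1
    return vec

vocab_size = 5                  # number of distinct characters (assumed)
indices = [0, 1, 0, 2, 3, 3]    # an index-encoded character sequence

# One vector per character, each with exactly one non-zero entry.
x_one_hot = [one_hot(i, vocab_size) for i in indices]
for row in x_one_hot:
    print(row)
```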

## 2. Cross Entropy Loss
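Cross entropy is a loss function widely used for categorical classification. In such a task the model's output is usually passed through softmax to obtain a probability for each class, and the label is predicted from those probabilities. Note that `torch.nn.CrossEntropyLoss`, used in the implementations below, expects raw scores (logits) and applies log-softmax internally.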
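
To make the definition concrete, here is a small sketch that computes softmax followed by the negative log-probability of the target class for a single example. It is written in plain Python for illustration only; the function names and the example scores are assumptions.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target])

logits = [2.0, 0.5, 0.1, 0.1, 0.1]        # hypothetical scores for 5 classes
loss_correct = cross_entropy(logits, target=0)  # target is the top-scoring class
loss_wrong = cross_entropy(logits, target=1)    # target is a low-scoring class
print(loss_correct, loss_wrong)
```

The loss is small when the target class receives a high probability and large otherwise. With uniform scores over K classes the loss equals log(K).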

## 3. Hihello Problem Implementation
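The task is to train an RNN on the string `hihello`: as the characters `h`, `i`, `h`, `e`, `l`, `l`, `o` are fed in order, the model predicts the character that follows each input.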

In [0]:
import torch
import torch.optim as optim
import numpy as np

In [2]:
torch.manual_seed(0)

<torch._C.Generator at 0x7f361e116030>

In [0]:
# Define the character set
char_set = ['h', 'i', 'e', 'l', 'o']

In [0]:
# Set hyperparameters
input_size = len(char_set)
hidden_size = len(char_set)
learning_rate = 0.1

In [0]:
# One-hot encoding: the input is "hihell" and the target is "ihello"
x_data = [[0, 1, 0, 2, 3, 3]]
x_one_hot = [[[1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0]]]
y_data = [[1, 0, 2, 3, 3, 4]]
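
The hard-coded `x_data` and `y_data` above can also be derived from the string itself. The sketch below assumes the same `char_set` ordering and shows that the target is simply the input shifted by one character. The names `text`, `char_to_idx`, and `indices` are illustrative.

```python
char_set = ['h', 'i', 'e', 'l', 'o']
char_to_idx = {c: i for i, c in enumerate(char_set)}

text = "hihello"
indices = [char_to_idx[c] for c in text]

# Input: every character except the last; target: every character except the first.
x_data = [indices[:-1]]
y_data = [indices[1:]]
print(x_data)  # [[0, 1, 0, 2, 3, 3]]
print(y_data)  # [[1, 0, 2, 3, 3, 4]]
```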

In [0]:
# Convert the Python lists to tensors
X = torch.FloatTensor(x_one_hot)
Y = torch.LongTensor(y_data)

In [0]:
# Define the RNN (batch_first=True: input shape is (batch, seq_len, input_size))
rnn = torch.nn.RNN(input_size, hidden_size, batch_first=True)

In [0]:
# Define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(rnn.parameters(), learning_rate)

In [9]:
# Training
for i in range(100):
    optimizer.zero_grad()
    outputs, _status = rnn(X)
    # Flatten (batch, seq_len, classes) to (batch * seq_len, classes) for the loss
    loss = criterion(outputs.view(-1, input_size), Y.view(-1))
    loss.backward()
    optimizer.step()

    # Pick the highest-scoring class at each time step and decode it to characters
    result = outputs.detach().numpy().argmax(axis=2)
    result_str = ''.join([char_set[c] for c in np.squeeze(result)])
    print(i, "loss: ", loss.item(), "prediction: ", result, "true Y: ", y_data, "prediction str: ", result_str)

0 loss:  1.7802648544311523 prediction:  [[1 1 1 1 1 1]] true Y:  [[1, 0, 2, 3, 3, 4]] prediction str:  iiiiii
1 loss:  1.4931949377059937 prediction:  [[1 4 1 1 4 4]] true Y:  [[1, 0, 2, 3, 3, 4]] prediction str:  ioiioo
2 loss:  1.33371102809906 prediction:  [[1 3 2 3 1 4]] true Y:  [[1, 0, 2, 3, 3, 4]] prediction str:  ilelio
3 loss:  1.2152947187423706 prediction:  [[2 3 2 3 3 3]] true Y:  [[1, 0, 2, 3, 3, 4]] prediction str:  elelll
4 loss:  1.1131387948989868 prediction:  [[2 3 2 3 3 3]] true Y:  [[1, 0, 2, 3, 3, 4]] prediction str:  elelll
5 loss:  1.0241864919662476 prediction:  [[2 3 2 3 3 4]] true Y:  [[1, 0, 2, 3, 3, 4]] prediction str:  elello
6 loss:  0.9573140740394592 prediction:  [[2 3 2 3 3 4]] true Y:  [[1, 0, 2, 3, 3, 4]] prediction str:  elello
7 loss:  0.9102001786231995 prediction:  [[2 0 2 3 3 4]] true Y:  [[1, 0, 2, 3, 3, 4]] prediction str:  ehello
8 loss:  0.8731765151023865 prediction:  [[1 0 2 3 3 4]] true Y:  [[1, 0, 2, 3, 3, 4]] prediction str:  ihello
9 l
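
The training loop relies on a few tensor shapes that are easy to lose track of. The sketch below is a shape check only, assuming a zero input tensor and an untrained RNN; `reshape` is used here in place of `view` because it also works on non-contiguous tensors. The variable names are illustrative.

```python
import torch

torch.manual_seed(0)
batch, seq_len, num_classes = 1, 6, 5

# hidden_size equals the number of classes, so the RNN output is used directly as class scores.
rnn = torch.nn.RNN(num_classes, num_classes, batch_first=True)
X = torch.zeros(batch, seq_len, num_classes)   # placeholder input (assumption)
Y = torch.LongTensor([[1, 0, 2, 3, 3, 4]])

outputs, hidden = rnn(X)
print(outputs.shape)   # (batch, seq_len, hidden_size)
print(hidden.shape)    # (num_layers, batch, hidden_size)

# CrossEntropyLoss expects scores of shape (N, C) and targets of shape (N,).
flat_outputs = outputs.reshape(-1, num_classes)
flat_targets = Y.reshape(-1)
loss = torch.nn.CrossEntropyLoss()(flat_outputs, flat_targets)

# The predicted class at each time step is the index of the largest score.
pred = outputs.argmax(dim=2)
print(flat_outputs.shape, flat_targets.shape, pred.shape)
```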

## 4. Charseq Problem Implementation
The task is to train an RNN on an arbitrary sample string so that, given a character of that string as input, the model predicts the next character.

In [0]:
import torch
import torch.optim as optim
import numpy as np

In [11]:
torch.manual_seed(0)

<torch._C.Generator at 0x7f361e116030>

In [0]:
# Define the sample sentence
sample = " if you want you"

In [13]:
# Build a character-to-index dictionary for the sample
char_set = list(set(sample))
char_dic = {c: i for i, c in enumerate(char_set)}
print(char_dic)

{'f': 0, 'w': 1, 'a': 2, ' ': 3, 'o': 4, 't': 5, 'i': 6, 'y': 7, 'u': 8, 'n': 9}


In [0]:
# Set hyperparameters
dic_size = len(char_dic)
hidden_size = len(char_dic)
learning_rate = 0.1

In [0]:
# One-hot encoding: the input drops the last character, the target drops the first
sample_idx = [char_dic[c] for c in sample]
x_data = [sample_idx[:-1]]
x_one_hot = [np.eye(dic_size)[x] for x in x_data]
y_data = [sample_idx[1:]]
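
The expression `np.eye(dic_size)[x]` works because indexing an identity matrix with a list of indices selects one row per index, and each row is already a one-hot vector. The sketch below illustrates this. It also uses `sorted(set(sample))`, which is an assumption rather than the notebook's approach: Python randomizes string hashes per process by default, so the order of `list(set(sample))` (and therefore the printed `char_dic`) can differ between runs, while sorting makes the mapping reproducible.

```python
import numpy as np

sample = " if you want you"

# Sorted vocabulary gives the same character-to-index mapping on every run.
char_set = sorted(set(sample))
char_dic = {c: i for i, c in enumerate(char_set)}

sample_idx = [char_dic[c] for c in sample]
x_idx = sample_idx[:-1]

# Row i of the identity matrix is the one-hot vector for index i.
one_hot = np.eye(len(char_set))[x_idx]
print(one_hot.shape)  # (15, 10)
```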

In [0]:
# Convert the NumPy arrays and lists to tensors
X = torch.FloatTensor(x_one_hot)
Y = torch.LongTensor(y_data)

In [0]:
# Define the RNN
rnn = torch.nn.RNN(dic_size, hidden_size, batch_first=True)

In [0]:
# Define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(rnn.parameters(), learning_rate)

In [20]:
# Training
for i in range(50):
    optimizer.zero_grad()
    outputs, _status = rnn(X)

    # Flatten batch and time dimensions: (batch, seq_len, classes) -> (batch * seq_len, classes)
    loss = criterion(outputs.view(-1, dic_size), Y.view(-1))
    loss.backward()
    optimizer.step()

    # Pick the highest-scoring class at each time step and decode it to characters
    result = outputs.detach().numpy().argmax(axis=2)
    result_str = ''.join([char_set[c] for c in np.squeeze(result)])
    print(i, "loss: ", loss.item(), "prediction: ", result, "true Y: ", y_data, "prediction str: ", result_str)

0 loss:  2.305145740509033 prediction:  [[7 7 5 7 0 5 7 7 8 7 6 7 7 0 5]] true Y:  [[6, 0, 3, 7, 4, 8, 3, 1, 2, 9, 5, 3, 7, 4, 8]] prediction str:  yytyftyyuyiyyft
1 loss:  2.0031704902648926 prediction:  [[7 4 5 7 4 8 8 5 2 8 5 8 7 4 8]] true Y:  [[6, 0, 3, 7, 4, 8, 3, 1, 2, 9, 5, 3, 7, 4, 8]] prediction str:  yotyouutautuyou
2 loss:  1.7772190570831299 prediction:  [[7 4 8 7 4 8 3 1 0 4 5 8 7 4 8]] true Y:  [[6, 0, 3, 7, 4, 8, 3, 1, 2, 9, 5, 3, 7, 4, 8]] prediction str:  youyou wfotuyou
3 loss:  1.5842626094818115 prediction:  [[1 4 8 7 4 8 3 1 0 4 5 3 7 4 8]] true Y:  [[6, 0, 3, 7, 4, 8, 3, 1, 2, 9, 5, 3, 7, 4, 8]] prediction str:  wouyou wfot you
4 loss:  1.438240647315979 prediction:  [[6 4 3 7 4 8 3 1 0 4 5 3 7 4 8]] true Y:  [[6, 0, 3, 7, 4, 8, 3, 1, 2, 9, 5, 3, 7, 4, 8]] prediction str:  io you wfot you
5 loss:  1.3197821378707886 prediction:  [[6 4 3 7 4 8 3 1 0 4 5 3 7 4 8]] true Y:  [[6, 0, 3, 7, 4, 8, 3, 1, 2, 9, 5, 3, 7, 4, 8]] prediction str:  io you wfot you
6 loss:  1.2