> ### EEE4423: Signal Processing Lab

# LAB \#11: Character Generation using LSTM

<h4><div style="text-align: right"> Due date:  </div> <br>
<div style="text-align: right"> Please upload your file @ yscec by 9 PM in the form of [ID_Name_Lab11.ipynb]. </div></h4>

### *Instructions:*
- Write a program implementing a particular algorithm to solve a given problem.   
- <span style="color:red">**Report and discuss your results. Analyze the algorithm, theoretically and empirically.**</span> 
- Each team must write their own answers and code (<span style="color:red">**otherwise you will receive an F grade**</span>).

<h2><span style="color:blue">2014142243 차현수</span> </h2>

In [1]:
!pip install Unidecode



In [2]:
import datetime
print("This code is written at " + str(datetime.datetime.now()))

This code is written at 2021-05-19 21:47:22.607872


In [3]:
import unidecode
import string
import random
import re
import os

import torch
import torch.nn as nn
from torch.autograd import Variable

Generative sequence models such as the character-level LSTM built in this lab form the basis of machine translation, image captioning, question answering, and more.

<img src="http://drive.google.com/uc?export=view&id=16E7HG_dCyfTo9u9qrrhp2eClq6xK6-f_" style="width: 600px;"/>

### 1. Prepare data

The file we are using is a plain text file. We convert any Unicode characters it may contain into plain ASCII using the `unidecode` package.

<img src="http://drive.google.com/uc?export=view&id=171lX3vxj60AQNScQi872BHx2Rz6J7-3J" />

In [4]:
with open('dataset/lab11/lose_yourself_eminem.txt') as f:
    file = unidecode.unidecode(f.read())
file_len = len(file)
print('file_len =', file_len)

file_len = 4063


To make inputs out of this big string of data, we will be splitting it into chunks.

In [5]:
chunk_len = 200

def random_chunk():
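    # randint is inclusive at both ends; the extra -1 keeps end_index within
    # the file, so every chunk has exactly chunk_len + 1 characters
    start_index = random.randint(0, file_len - chunk_len - 1)
    end_index = start_index + chunk_len + 1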
    return file[start_index:end_index]

print(random_chunk())

Back to the lab again yo, this whole rhapsody
He better go capture this moment and hope it don't pass him
You better lose yourself in the music, the moment
You own it, you better never let it go
You on


Each chunk will be turned into a tensor by looping through the characters of the string and looking up the index of each character in `all_characters`.

In [6]:
# Turn string into list of longs
all_characters = string.printable
print(all_characters)

def char_tensor(text):
    # The parameter is named `text` so that it does not shadow the string module
    tensor = torch.zeros(len(text)).long()
    for c in range(len(text)):
        tensor[c] = all_characters.index(text[c])
    return Variable(tensor)

print('abcDEF is changed to ', char_tensor('abcDEF'))

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ 	

abcDEF is changed to  tensor([10, 11, 12, 39, 40, 41])


Finally, we can assemble a pair of input and target tensors for training from a random chunk. The input is every character *except the last*, and the target is every character *except the first*, so each input character is paired with the character that follows it. For example, if the chunk is "abc", the input corresponds to "ab" and the target to "bc".
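
As a plain-Python illustration of this shift (strings instead of tensors; the variable names are only for this sketch), each input character lines up with the character that follows it:

```python
# Illustration only: plain strings instead of tensors.
chunk = "abc"

inputs = chunk[:-1]   # every character except the last  -> "ab"
targets = chunk[1:]   # every character except the first -> "bc"

# At position k the model reads inputs[k] and should predict targets[k].
pairs = list(zip(inputs, targets))
print(pairs)  # [('a', 'b'), ('b', 'c')]
```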

In [7]:
def random_training_set():    
    chunk = random_chunk()
    inputs = char_tensor(chunk[:-1])
    targets = char_tensor(chunk[1:])
    return inputs, targets

### 2. Build the LSTM model [4 points]

#### [Diagram of LSTM]
<img src="http://drive.google.com/uc?export=view&id=1baQ6Ffu-vDcXbOEBYGeLzhmfvaj4DGgW" style="width: 800px;"/>
An LSTM consists of a cell state, a hidden state, and three gates that modify or read the cell state. The cell state is the key part of the LSTM: it is the path along which information flows from one time step to the next. The operation of the three gates is described below.

#### [Forget Gate]
The forget gate determines which information in the cell state should be erased.
<img src="http://drive.google.com/uc?export=view&id=1sJisl5P0hggmvH4qrcYgSETFKdFdBSH_" style="width: 600px;"/>

#### [Input Gate]
First, a candidate cell state is created from the current input and the previous hidden state. The input gate then determines how much of this candidate is written into the cell state.
<img src="http://drive.google.com/uc?export=view&id=1Df-k5FORGH7PnXauYcb8qqUpY3Uot9A7" style="width: 600px;"/>

#### [Output Gate]
The output gate determines which elements should be extracted from the cell state to produce the output.
<img src="http://drive.google.com/uc?export=view&id=1JLCGPcrZLOYfjyJhMTvmfixHq5plFj8L" style="width: 600px;"/>

The expressions above are summarized as follows:
<img src="http://drive.google.com/uc?export=view&id=1kGq8DwwzizuNcg6GF0GaP1DAu26FFlrB" style="width: 300px;"/>
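
In case the figure does not load, the standard LSTM update is written out below. This is the common textbook formulation and is assumed, not confirmed, to match the figure. The weight symbols mirror the layer names used in the code (for example, $W_{fx}$ for `weight_fx`), $\sigma$ is the sigmoid function, and $\odot$ is element-wise multiplication.

$$
\begin{aligned}
f_t &= \sigma\left(W_{fx}\,x_t + W_{fh}\,h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_{ix}\,x_t + W_{ih}\,h_{t-1} + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_{Cx}\,x_t + W_{Ch}\,h_{t-1} + b_C\right) \\
o_t &= \sigma\left(W_{ox}\,x_t + W_{oh}\,h_{t-1} + b_o\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
$$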


This model takes the character at step $t-1$ as input and is expected to output the next character, at step $t$. There are three layers: an embedding layer (the encoder) that maps the input character to an internal vector, an LSTM layer that operates on that vector together with the hidden and cell states, and a linear decoder layer that outputs one score per character. These scores become a probability distribution once a softmax is applied, which `nn.CrossEntropyLoss` does internally during training.
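
To make the LSTM step concrete before reading the implementation, here is a minimal sketch of one step for a single hidden unit, so every quantity is a scalar. It is not the lab solution: the helper names `sigmoid` and `lstm_step` and the weight values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step for a single hidden unit (all quantities are scalars)."""
    f = sigmoid(w["fx"] * x + w["fh"] * h_prev + w["bf"])        # forget gate
    i = sigmoid(w["ix"] * x + w["ih"] * h_prev + w["bi"])        # input gate
    c_hat = math.tanh(w["cx"] * x + w["ch"] * h_prev + w["bc"])  # candidate cell state
    o = sigmoid(w["ox"] * x + w["oh"] * h_prev + w["bo"])        # output gate
    c = f * c_prev + i * c_hat   # keep part of the old state, add part of the candidate
    h = o * math.tanh(c)         # expose part of the new cell state
    return h, c

# Hypothetical weights chosen only for illustration.
weights = {name: 0.5 for name in ("fx", "fh", "ix", "ih", "cx", "ch", "ox", "oh")}
weights.update({"bf": 0.0, "bi": 0.0, "bc": 0.0, "bo": 0.0})

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, weights)
    print(f"x={x:+.1f}  h={h:+.4f}  c={c:+.4f}")
```

Because the output gate lies in (0, 1) and `tanh` lies in (-1, 1), the hidden state always stays strictly between -1 and 1, while the cell state can grow beyond that range.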

In [8]:
class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.output_dim = output_dim
        
        self.encoder = nn.Embedding(input_dim, hidden_dim)
        
        # lstm
        # The size of input is (batch_size, seq_dim, hidden_dim)
        #############
        # CODE HERE #
        
        # Every gate acts on the embedded input and the previous hidden state,
        # both of size hidden_dim, so in_features must be hidden_dim
        # (input_dim only worked because both values happen to be 100 here)
        # Forget Gate
        self.sigmoid1 = nn.Sigmoid()
        self.weight_fx = nn.Linear(hidden_dim, hidden_dim, bias=True)
        self.weight_fh = nn.Linear(hidden_dim, hidden_dim, bias=True)
        
        # Input Gate
        self.sigmoid2 = nn.Sigmoid()
        self.weight_ix = nn.Linear(hidden_dim, hidden_dim, bias=True)
        self.weight_ih = nn.Linear(hidden_dim, hidden_dim, bias=True)
        self.tanh1 = nn.Tanh()
        self.weight_Cx = nn.Linear(hidden_dim, hidden_dim, bias=True)
        self.weight_Ch = nn.Linear(hidden_dim, hidden_dim, bias=True)
        
        # Output Gate
        self.sigmoid3 = nn.Sigmoid()
        self.weight_ox = nn.Linear(hidden_dim, hidden_dim, bias=True)
        self.weight_oh = nn.Linear(hidden_dim, hidden_dim, bias=True)
        
        self.tanh2 = nn.Tanh()
        #############
        
        # Readout Layer
        self.decoder = nn.Linear(hidden_dim, output_dim, bias=False)
    
    def forward(self, input, hn, cn):
        #############
        # CODE HERE #
        for t in range(input.size()[0]):
            embeds = self.encoder(input[t])
            # Forget Gate
            f = self.sigmoid1(self.weight_fx(embeds) + self.weight_fh(hn[0,:,:]))
            
            # Input Gate
            i = self.sigmoid2(self.weight_ix(embeds) +self.weight_ih(hn[0,:,:]))
            c_hat = self.tanh1(self.weight_Cx(embeds) + self.weight_Ch(hn[0,:,:]))

            # Output Gate
            o = self.sigmoid3(self.weight_ox(embeds) + self.weight_oh(hn[0,:,:]))

            # Summarized
            cn = f * cn[0,:,:] + i * c_hat
            cn = cn.unsqueeze(dim=0)
            hn = o * self.tanh2(cn[0,:,:])
            hn = hn.unsqueeze(dim=0)
        
        output = self.decoder(hn[0,:,:])
        #############
        return output, hn, cn

    def init_hidden(self):
        # The size of h0, c0 should be (layer_dim, batch_size, hidden_dim)
        #############
        # CODE HERE #
        h0 = Variable(torch.zeros(self.layer_dim, 1, self.hidden_dim).cuda()) # initial hidden state
        c0 = Variable(torch.zeros(self.layer_dim, 1, self.hidden_dim).cuda()) # initial cell state
        #############
        return h0, c0
    
hidden_dim = 100
n_layers = 1
n_characters = len(all_characters)

model = LSTMModel(n_characters, hidden_dim, n_layers, n_characters)
model.cuda()

LSTMModel(
  (encoder): Embedding(100, 100)
  (sigmoid1): Sigmoid()
  (weight_fx): Linear(in_features=100, out_features=100, bias=True)
  (weight_fh): Linear(in_features=100, out_features=100, bias=True)
  (sigmoid2): Sigmoid()
  (weight_ix): Linear(in_features=100, out_features=100, bias=True)
  (weight_ih): Linear(in_features=100, out_features=100, bias=True)
  (tanh1): Tanh()
  (weight_Cx): Linear(in_features=100, out_features=100, bias=True)
  (weight_Ch): Linear(in_features=100, out_features=100, bias=True)
  (sigmoid3): Sigmoid()
  (weight_ox): Linear(in_features=100, out_features=100, bias=True)
  (weight_oh): Linear(in_features=100, out_features=100, bias=True)
  (tanh2): Tanh()
  (decoder): Linear(in_features=100, out_features=100, bias=False)
)

### 3. Loss function and optimizer

In [9]:
criterion = nn.CrossEntropyLoss()

lr = 0.005
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
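
`nn.CrossEntropyLoss` takes raw scores (logits) and combines a log-softmax with the negative log-likelihood, which is why the decoder has no softmax of its own. A minimal pure-Python sketch of that computation for a single prediction follows. The function name `cross_entropy` and the scores over a three-character vocabulary are made up for illustration.

```python
import math

def cross_entropy(logits, target_index):
    """Negative log of the softmax probability assigned to the target class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

logits = [2.0, 0.5, -1.0]        # made-up scores, one per character
print(cross_entropy(logits, 0))  # small loss: the target has the highest score
print(cross_entropy(logits, 2))  # larger loss: the target has the lowest score
```

For reference, a uniform guess over the 100 printable characters gives a loss of $\ln 100 \approx 4.6$ per character. The training loop in Section 5 adds this loss over the 200 characters of a chunk, so the values it prints are sums, not per-character averages.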

### 4. Write the character-level generation code [4 points]

In [10]:
def evaluate(prime_str='W', predict_len=100):
    # Assume prime_str is a single character
    # and use greedy search to predict the next character

    hn, cn = model.init_hidden()
    predicted = str()
    
    for i in range(predict_len):
        #############
        # CODE HERE #
        predicted += prime_str
        
        # Feed only the newest character: hn and cn already summarize
        # everything generated so far, so re-feeding the whole string
        # would process the earlier characters more than once
        inputs = char_tensor(prime_str).cuda()
        
        # No gradients are needed during generation
        with torch.no_grad():
            output, hn, cn = model(inputs, hn, cn)
        
        # Greedy search: pick the highest-scoring character
        temp_prime = output.argmax().item()
        prime_str = all_characters[temp_prime]
        #############

    return predicted
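
Greedy search, as used in `evaluate`, keeps only the highest-scoring character at each step. A plain-Python sketch of that selection, using made-up scores over a hypothetical four-character alphabet instead of the model's output:

```python
alphabet = "abcd"                # hypothetical alphabet
scores = [0.1, 2.3, -0.7, 1.9]   # made-up scores, one per character

# Index of the largest score, i.e. what argmax returns for a 1-D input.
best_index = max(range(len(scores)), key=lambda k: scores[k])
next_char = alphabet[best_index]
print(next_char)  # b
```

Because this choice is deterministic, the same hidden state and input always lead to the same continuation, which can make greedy output fall into repeating loops.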

In [11]:
evaluate()

"WA[=cM\x0b-I%x{tMv%*#\x0c^2B(Iwo'v%x2r:rx2r0B('DA7*#\x0c^2B(Iwo'v%x2r:rx2r0B('DA7*#\x0c^2B(Iwo'v%x2r:rx2r0B('DA7"

### 5. Write the code to train the model [2 points]

In [None]:
n_epochs = 2000
print_every = 100
plot_every = 10

all_losses = []
loss_avg = 0

for epoch in range(1, n_epochs + 1):
    #############
    # CODE HERE #
    #############
    # Load text
    inputs, targets = random_training_set()
    # Skip chunks that were cut short at the end of the file
    if inputs.size()[0] < chunk_len: continue
    
    inputs, targets = inputs.cuda(), targets.cuda()
    
    # Clear gradients w.r.t. parameters
    optimizer.zero_grad()
    
    # Forward pass
    loss = 0
    hn, cn = model.init_hidden()
    
    for i in range(inputs.size()[0]):
        outputs, hn, cn = model(inputs[i].unsqueeze(dim=0), hn, cn)
        # print(outputs.size())
        # Calculate Loss: softmax --> cross entropy loss
        loss += criterion(outputs, targets[i].unsqueeze(dim=0))
    
    # Backward pass
    loss.backward()
    
    # Updating parameters
    optimizer.step()
    
    loss_avg += loss.item() / chunk_len

    if epoch % print_every == 0:
        print('*'*25, 'epoch%d'%epoch, '*'*25)
        print('loss %.4f'%loss.item())
        print(evaluate('I', 100), '\n')

    if epoch % plot_every == 0:
        all_losses.append(loss_avg / plot_every)
        loss_avg = 0


#################################################
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure()
plt.plot(all_losses)

************************* epoch100 *************************
loss 446.8629
I the mome the mome the mome the mome the mome the mome the mome the mome the mome the mome the mome 

************************* epoch200 *************************
loss 178.5768
I better life the better life the better life the better life the better life the better life the be 

************************* epoch300 *************************
loss 120.8945
I better never life it, you better never life it, you better never life it, you better never life it 

************************* epoch400 *************************
loss 293.3300
I better
You better lose yourself in the moment
You own it, you better
You better
You better
You bet 

************************* epoch500 *************************
loss 245.4014
It only get one shot, he's no the moment
You own it, you better
He's no the moment
You own it, you b 

************************* epoch600 *************************
loss 178.6050
It only get one shot, do not miss y

### *References*
[1] [Practical PyTorch](https://github.com/spro/practical-pytorch)

[2] [CS 231n](http://cs231n.stanford.edu/syllabus.html)