# Lab3 Implementation of simple RNN and LSTM 

This notebook was prepared by Hsiu-Wen Chang from MINES ParisTech.
Should you have any problems, send me an [email](hsiu-wen.chang_joly@mines-paristech.fr).

In this lab, we are going to practice 

1. many-to-one by RNN: given several words, predict the next word
2. many-to-one by LSTM: given several letters, predict the final letter


## 1. Many-to-one by RNN (word level): Predict what is the next word

Our task today is to predict the next word given the words that precede it. For example, we expect the answer to be 'cat' when the user keys in 'I like'.

In [None]:
# Configuration
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

dtype = torch.FloatTensor

### 1.1 Data preparation

Here are three sentences of three words each. We are going to use them as training samples. The design is to feed in the first two words and let the machine find the final word. However, a computer cannot perform mathematical operations on characters. Therefore, the first step is to encode the input as numbers.
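
The encoding step can be sketched on a tiny example. This is only an illustration: the three-word vocabulary and the names `vocab` and `word_to_index` are hypothetical, and the vocabulary is sorted here so that the indices are reproducible.

```python
import numpy as np

# Hypothetical toy vocabulary, sorted so that the indices are reproducible.
vocab = sorted({"i", "like", "cat"})            # ['cat', 'i', 'like']
word_to_index = {w: i for i, w in enumerate(vocab)}

# Rows of the identity matrix are the one-hot vectors; indexing with a list
# of word indices picks one row per word.
indices = [word_to_index[w] for w in ["i", "like"]]
one_hot = np.eye(len(vocab))[indices]

print(indices)   # [1, 2]
print(one_hot)   # rows [0, 1, 0] and [0, 0, 1]
```

Each word becomes a vector with a single 1 at the position of its index, so a sentence of two input words becomes a 2 x 3 matrix in this toy case.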

In [None]:
# Create the input data, you are welcome to add the words you like
sentences = [ "i like cat", "i love coffee", "i hate milk"]

# Define all the possible words
word_list = " ".join(sentences).split()

word_list = list(set(word_list))

# Dictionary that changes a given word to its number, e.g. {love: 0, hate: 1, ...}
word_dict = {w: i for i, w in enumerate(word_list)}

# Dictionary that changes a number back to its word, e.g. {0: love, 1: hate, ...}
number_dict = {i: w for i, w in enumerate(word_list)}

# number of class(=number of vocab)
n_class = len(word_dict)

print(word_dict)

### 1.2 Data preprocessing

Define a batch function that turns the sentences into the inputs and targets the machine uses during training.
For simplicity, we feed in all the data we have at once. In a real application, you should not do this.
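
One common alternative to feeding everything at once is to iterate over mini-batches. The sketch below assumes that this is what a "real case" would look like; it uses randomly generated toy tensors (not the lab's sentences) with PyTorch's `TensorDataset` and `DataLoader`.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 6 samples, 2 time steps, 7 classes (one-hot inputs).
torch.manual_seed(0)
n_samples, n_step, n_class = 6, 2, 7
indices = torch.randint(0, n_class, (n_samples, n_step))
inputs = F.one_hot(indices, num_classes=n_class).float()   # [6, 2, 7]
targets = torch.randint(0, n_class, (n_samples,))          # [6]

# The DataLoader shuffles the samples and yields them two at a time.
loader = DataLoader(TensorDataset(inputs, targets), batch_size=2, shuffle=True)

batch_shapes = []
for x, y in loader:
    batch_shapes.append((tuple(x.shape), tuple(y.shape)))
print(batch_shapes)
```

In a training loop, each `(x, y)` pair would take the place of the full `input_batch` and `target_batch`. In practice, part of the data is also usually held out for validation.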


In [None]:
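# Encode each sentence as one-hot input vectors plus a target word index
def make_batch(sentences):
    input_batch = []
    target_batch = []

    for sen in sentences:
        word = sen.split()
        # All words except the last one are the input; the last word is the target
        input = [word_dict[n] for n in word[:-1]]
        target = word_dict[word[-1]]

        # np.eye(n_class)[input] selects one one-hot row per input word
        input_batch.append(np.eye(n_class)[input])
        target_batch.append(target)

    return input_batch, target_batch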

In [None]:
# Convert to torch tensors (stacking into one NumPy array first avoids a slow list-of-arrays conversion)
input_batch, target_batch = make_batch(sentences)
input_batch = torch.tensor(np.array(input_batch), dtype=torch.float32)
target_batch = torch.tensor(target_batch, dtype=torch.long)

print('Dimension of input_batch:', input_batch.shape)
print(input_batch)

### 1.3 Network

`torch.nn` provides the class `nn.RNN`, which applies a multi-layer Elman RNN with a $\tanh$ or $\mathrm{ReLU}$ non-linearity (selected by the `nonlinearity` parameter) to an input sequence.

The hidden state is computed as $$h_t=\tanh(W_{ih}x_t+b_{ih}+W_{hh}h_{t-1}+b_{hh})$$

For further information on how to use it, see the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html).
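
To make the hidden-state equation concrete, the sketch below applies it step by step and compares the result with what `nn.RNN` returns. The sizes (4 inputs, 3 hidden units, 2 steps) are arbitrary choices for illustration. Because PyTorch works with row vectors, $W x$ is written as `x @ W.T`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=3)   # single layer, tanh by default

x = torch.randn(2, 1, 4)    # [n_step=2, batch_size=1, input_size=4]
h = torch.zeros(1, 3)       # h_0 for the manual loop: [batch_size, hidden_size]

# nn.RNN stores its parameters as weight_ih_l0, bias_ih_l0, weight_hh_l0, bias_hh_l0.
for x_t in x:
    h = torch.tanh(x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                   + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)

# Reference result from the library, starting from the same zero hidden state.
outputs, h_n = rnn(x, torch.zeros(1, 1, 3))
print(torch.allclose(h, h_n[0], atol=1e-6))
```

The manual loop and the library call should agree up to floating-point error, which shows that the final hidden state is the equation applied once per time step.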

In [None]:
class TextRNN(nn.Module):
    def __init__(self,n_class=7, n_hidden=5):
        super(TextRNN, self).__init__()

        self.rnn = nn.RNN(input_size=n_class, hidden_size=n_hidden)
        self.W = nn.Parameter(torch.randn([n_hidden, n_class]).type(dtype))
        self.b = nn.Parameter(torch.randn([n_class]).type(dtype))

    def forward(self, hidden, X):
        X = X.transpose(0, 1) # X : [n_step, batch_size, n_class]
        outputs, hidden = self.rnn(X, hidden)
        # outputs : [n_step, batch_size, num_directions(=1) * n_hidden]
        # hidden : [num_layers(=1) * num_directions(=1), batch_size, n_hidden]
        outputs = outputs[-1] # [batch_size, num_directions(=1) * n_hidden]
        model = torch.mm(outputs, self.W) + self.b # model : [batch_size, n_class]
        return model


In [None]:
# Parameters for the network
batch_size = len(sentences)
n_step = 2 # number of cells(= number of Step)
n_hidden = 5 # number of hidden units in one cell


In [None]:
model = TextRNN(n_class, n_hidden)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Let's see what this model looks like.

In [None]:
print(model)

### 1.4 Training

In [None]:
# Training
for epoch in range(5000):
    # Reset the gradient buffer 
    optimizer.zero_grad() 

    # hidden : [num_layers * num_directions, batch, hidden_size]
    hidden = Variable(torch.zeros(1, batch_size, n_hidden))
    # input_batch : [batch_size, n_step, n_class]
    output = model(hidden, input_batch)

    # output : [batch_size, n_class], target_batch : [batch_size] (LongTensor, not one-hot)
    loss = criterion(output, target_batch)
    if (epoch + 1) % 1000 == 0:
        print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))

    loss.backward()
    optimizer.step()
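
As the comment in the training loop notes, `nn.CrossEntropyLoss` takes raw scores of shape `[batch_size, n_class]` and integer class indices of shape `[batch_size]`, not one-hot targets. The sketch below uses random toy values to illustrate that the loss equals the mean negative log-softmax probability of the true class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 7)            # [batch_size, n_class], raw scores
targets = torch.tensor([2, 0, 5])     # [batch_size], class indices

loss = nn.CrossEntropyLoss()(logits, targets)

# Same value computed by hand: mean negative log-probability of the true class.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(3), targets].mean()

print(torch.allclose(loss, manual))
```

This is why the model returns unnormalized scores and applies no softmax itself: the loss function performs the normalization internally.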

### 1.5 Test the model

In [None]:
# Predict
# Initial hidden state 0
hidden = Variable(torch.zeros(1, batch_size, n_hidden))

print('Raw output of this model:\n',model(hidden, input_batch))

predict = model(hidden, input_batch).data.max(1, keepdim=True)[1]
print([sen.split()[:2] for sen in sentences], '->', [number_dict[n.item()] for n in predict.squeeze()])

### Task 1: Create several French words and their corresponding English words yourself. Train an RNN model that translates each French word into its English word.

## 2. Many-to-one LSTM (character level): Predict what is the next letter

In this task, we use an LSTM to predict the final letter of a word. For example, if we key in 'lov', the machine should give us 'e'.

### 2.1 Data preparation and preprocessing

In [None]:
# Define all the possible letters
char_arr = [c for c in 'abcdefghijklmnopqrstuvwxyz']

# Dictionary that maps each letter to its encoded number
word_dict = {n: i for i, n in enumerate(char_arr)}

# Dictionary that maps each number back to its letter
number_dict = {i: w for i, w in enumerate(char_arr)}

n_class = len(word_dict) # number of classes (= number of letters)

seq_data = ['make', 'need', 'coal', 'word', 'love', 'hate', 'live', 'home', 'hash', 'star']

In [None]:
def make_batch(seq_data):
    input_batch, target_batch = [], []

    for seq in seq_data:
        input = [word_dict[n] for n in seq[:-1]] # 'm', 'a' , 'k' is input
        target = word_dict[seq[-1]] # 'e' is target
        input_batch.append(np.eye(n_class)[input])
        target_batch.append(target)

    return Variable(torch.Tensor(input_batch)), Variable(torch.LongTensor(target_batch))

### 2.2 Model

In [None]:
# TextLSTM Parameters
n_step = 3
n_hidden = 128

class TextLSTM(nn.Module):
    def __init__(self):
        super(TextLSTM, self).__init__()

        self.lstm = nn.LSTM(input_size=n_class, hidden_size=n_hidden)
        self.W = nn.Parameter(torch.randn([n_hidden, n_class]).type(dtype))
        self.b = nn.Parameter(torch.randn([n_class]).type(dtype))

    def forward(self, X):
        input = X.transpose(0, 1)  # input : [n_step, batch_size, n_class]

        hidden_state = Variable(torch.zeros(1, len(X), n_hidden))   # [num_layers(=1) * num_directions(=1), batch_size, n_hidden]
        cell_state = Variable(torch.zeros(1, len(X), n_hidden))     # [num_layers(=1) * num_directions(=1), batch_size, n_hidden]

        outputs, (_, _) = self.lstm(input, (hidden_state, cell_state))
        outputs = outputs[-1]  # [batch_size, n_hidden]
        model = torch.mm(outputs, self.W) + self.b  # model : [batch_size, n_class]
        return model
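
The model keeps only `outputs[-1]`, the output at the last time step. For a single-layer, unidirectional LSTM this is the same tensor as the final hidden state returned by `nn.LSTM`. The sketch below checks this on random input; the sizes are arbitrary, and the initial states are left out so that PyTorch defaults them to zeros.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_step, batch_size, n_class, n_hidden = 3, 4, 26, 8
lstm = nn.LSTM(input_size=n_class, hidden_size=n_hidden)

x = torch.randn(n_step, batch_size, n_class)
outputs, (h_n, c_n) = lstm(x)   # initial states default to zeros when omitted

print(outputs.shape)   # torch.Size([3, 4, 8])
print(h_n.shape)       # torch.Size([1, 4, 8])
print(torch.allclose(outputs[-1], h_n[0]))
```

Taking `outputs[-1]` is therefore what makes the model many-to-one: only the state after the last letter is passed to the output layer.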

### 2.3 Training

In [None]:
input_batch, target_batch = make_batch(seq_data)

model = TextLSTM()

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

output = model(input_batch)

# Training
for epoch in range(1000):
    optimizer.zero_grad()

    output = model(input_batch)
    loss = criterion(output, target_batch)
    if (epoch + 1) % 100 == 0:
        print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))

    loss.backward()
    optimizer.step()

### 2.4 Testing the model

In [None]:
inputs = [sen[:3] for sen in seq_data]

predict = model(input_batch).data.max(1, keepdim=True)[1]
print(inputs, '->', [number_dict[n.item()] for n in predict.squeeze()])

### Task 2: 

1. Using whatever method you like, add more than 20 vocabulary words and reuse the code to perform the same task.
2. Modify the model so that it predicts one word each time.


## Conclusion

Think about what happens when the vocabulary is much bigger: using a dict to enumerate the words becomes very inefficient.
"Embedding" and "Tokenizer" are two solutions available in [Keras](https://keras.io/examples/nlp/). You should take a look at this documentation.


In [None]:
embedding = nn.Embedding(7, 3)
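
A minimal sketch of what such an embedding layer does, assuming the `nn.Embedding(7, 3)` call above stands for a 7-word vocabulary mapped to 3-dimensional vectors. The word indices below are hypothetical. An embedding lookup gives the same result as multiplying one-hot vectors by the weight matrix, but without ever building the one-hot vectors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
embedding = nn.Embedding(num_embeddings=7, embedding_dim=3)

word_ids = torch.tensor([[0, 3], [5, 6]])    # [batch_size, n_step], hypothetical indices
dense = embedding(word_ids)                  # [batch_size, n_step, 3]

# Equivalent computation through explicit one-hot vectors.
one_hot = F.one_hot(word_ids, num_classes=7).float()   # [batch_size, n_step, 7]
via_matmul = one_hot @ embedding.weight                # [batch_size, n_step, 3]

print(dense.shape)
print(torch.allclose(dense, via_matmul))
```

With a large vocabulary, the lookup avoids storing and multiplying vectors whose length equals the vocabulary size, and the embedding weights are learned together with the rest of the model.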