<img src="https://futurejobs.my/wp-content/uploads/2021/05/d-min-1024x297.png" width="300"> </img>

> **Copyright &copy; 2021 Skymind Education Group Sdn. Bhd.**<br>
 <br>
This program and the accompanying materials are made available under the
terms of the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). \
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License. <br>
<br>**SPDX-License-Identifier: Apache-2.0** 

# Recurrent Neural Network: Text Generation 


## Recalling RNN

Traditional feed-forward neural networks take in a fixed amount of input data all at the same time and produce a fixed amount of output each time. On the other hand, RNNs do not consume all the input data at once. Instead, they take them in one at a time and in a sequence. At each step, the RNN does a series of calculations before producing an output. The output, known as the hidden state, is then combined with the next input in the sequence to produce another output. This process continues until the model is programmed to finish or the input sequence ends.
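
To make the step-by-step processing concrete, here is a minimal NumPy sketch of the recurrence described above. The sizes, the random weights, and the names `W_xh`, `W_hh` and `b_h` are assumptions chosen for illustration only; they are not the weights of the model we build later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (not the ones used later in this notebook)
input_dim, hidden_dim, seq_len = 4, 3, 5

# Hypothetical, randomly initialised weights of a single RNN cell
W_xh = rng.normal(size=(hidden_dim, input_dim))   # input  -> hidden
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden
b_h = np.zeros(hidden_dim)

inputs = rng.normal(size=(seq_len, input_dim))    # one sequence of 5 steps
h = np.zeros(hidden_dim)                          # initial hidden state

hidden_states = []
for x_t in inputs:
    # Combine the current input with the previous hidden state
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    hidden_states.append(h)

hidden_states = np.stack(hidden_states)
print(hidden_states.shape)  # (5, 3): one hidden state per time step
```

Each pass through the loop consumes one element of the sequence and reuses the hidden state produced by the previous pass, which is what distinguishes an RNN from a feed-forward network.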

You might be wondering, which portion of the RNN do I extract my output from? This really depends on what your use case is. For example, if you’re using the RNN for a classification task, you’ll only need one final output after passing in all the input - a vector representing the class probability scores. In another case, if you’re doing text generation based on the previous character/word, you’ll need an output at every single time step.
![image](https://user-images.githubusercontent.com/79887667/121659305-c14f2880-cad4-11eb-94cc-ccbaa4aa0dc9.png)

For Time Series :
* Forecasting - many-to-many or many-to-one
* Classification - many-to-one

For NLP :
* Text Classification: many-to-one
* Text Generation: many-to-many
* Machine Translation: many-to-many
* Named Entity Recognition: many-to-many
* Image Captioning: one-to-many
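
As a rough sketch of the difference between many-to-many and many-to-one, the snippet below runs a single-layer `nn.RNN` on random data and slices its output in both ways. The sizes and the random input are assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Illustrative sizes: 2 sequences, 5 time steps, 8 features per step
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(2, 5, 8)

out, h_n = rnn(x)

# Many-to-many: keep the output of every time step
many_to_many = out            # shape (2, 5, 16)

# Many-to-one: keep only the output of the last time step
many_to_one = out[:, -1, :]   # shape (2, 16)

print(many_to_many.shape, many_to_one.shape)
```

For a single-layer, one-directional RNN like this one, the last time step of `out` is the same tensor as the final hidden state `h_n[0]`.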

## Introduction

In this hands-on, we will build a simple text generator based on an RNN model, similar to the ones used in autocomplete tools.

During inference, the model is fed a word or a sequence of starting characters. Its output is a prediction of the next character in the sentence. This process repeats until we have generated a sentence of the desired length.

To keep things short and simple, we won't be using any large, external dataset. Instead, we will provide a few sentences and dive into how the model learns from these sentences.

Let's get started!

## Objectives
In this hands-on, we will:

1. Create sample sentences to train our model with.
2. Ensure that the sentences are all of the same length.
3. Encode the characters of each sentence as numbers.
4. Instantiate the Model class.
5. Instantiate the optimizer and loss function.
6. Set the training hyperparameters and train the model.
7. Test the model.

In [None]:
import numpy as np
import torch
from torch import nn
from torch import optim
from torch.nn import functional as F

# Seeding for reproducibility
torch.manual_seed(3)

Let's create a list of sentences to train our model with. Here there are 6 sample sentences.

In [None]:
text = [
        'hey how are you', 
        'good i am fine', 
        'have a nice day',
        'you are pretty', 
        'i am awesome', 
        'good luck'
        ]

Next, we'll be padding our input sentences to ensure that all the sentences are of standard length. While RNNs are typically able to take in variably sized inputs, we will usually want to feed training data in mini-batches to optimize the training process. In order to use batches to train on our data, we'll need to ensure that each sequence within the input data is of equal size.

In most cases, padding is done by filling sequences that are too short with 0 values and trimming sequences that are too long. In our case, we'll find the length of the longest sequence and pad the rest of the sentences with blank spaces to match that length.

In [None]:
# Finding the length of the longest string in our data
max_len = len(max(text, key=len))
print(f"max_len={max_len}")

# append each sentence with a space (' ') until they all are in the same length
for i,_ in enumerate(text):
  while len(text[i]) < max_len:
    text[i] += ' ' 
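
The loop above only pads, because our target length is the longest sentence. If you ever need a fixed length that is shorter than some sentences, padding and trimming can be combined. The helper below, `pad_or_truncate`, is a hypothetical name used only for this sketch.

```python
def pad_or_truncate(sentence: str, length: int, pad_char: str = " ") -> str:
    """Return `sentence` trimmed or right-padded to exactly `length` characters."""
    # Slicing trims sentences that are too long; ljust pads the short ones
    return sentence[:length].ljust(length, pad_char)


samples = ["good luck", "hey how are you"]
fixed = [pad_or_truncate(s, 12) for s in samples]

for s in fixed:
    print(repr(s), len(s))
```

Here `"good luck"` is padded with three spaces, while `"hey how are you"` loses its last three characters.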

As with the models we trained previously, we need labelled data to train our model.

Since in this project we will be predicting the character that follows each input character, we'll divide each sentence (e.g. "hey how are you") into:

  **1. Input Data:**
  - Strip the last character of the sentence
  - Eg: "hey how are yo" <br>

  
  **2. Target:**
  - Strip the first character of the sentence
  - Eg: "ey how are you"

**How is this going to work?**
1. Using the above example sentence, the first character of the input data will be 'h'. The model will be trained to predict the first target, 'e'. This is followed by:
  - input ('e'), target ('y')
  - input ('y'), target (' ')
  - input (' '), target ('h'), and so on.

2. The idea is for the target to be the character that is one time-step ahead of the input character.

_SIDE NOTE: Since the sequence of the output characters is crucial to ensure that the output sentence is English and not gibberish, we must be careful to not misarrange our training set._

In [None]:
# building input and target sequence
input_seq = []
target_seq = []

for i,_ in enumerate(text):
  # Remove last character for input sequence
  input_seq.append(text[i][:-1])

  # Remove first character for target sequence
  target_seq.append(text[i][1:])

  print(f"Input Sequence: {input_seq[i]}", end="\n")
  print(f"Target Sequence: {target_seq[i]}\n")
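
To double-check the one-step shift, we can pair each input character with its target for a single sentence. This is a small standalone sketch; the variable names are chosen for illustration and do not affect the sequences built above.

```python
sentence = "hey how are you"

# Same slicing as above: drop the last character / drop the first character
inputs, targets = sentence[:-1], sentence[1:]

# Pair every input character with the character the model should predict
pairs = list(zip(inputs, targets))

for inp, tgt in pairs[:4]:
    print(f"input {inp!r} -> target {tgt!r}")
```

Every target is simply the character that sits one position to the right of its input in the original sentence.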

## Encoding

Now we can convert our input and target sequences from sequences of characters to sequences of integers by mapping them with the dictionaries we create below. This will then allow us to one-hot encode our input sequence. In other words, we need to perform **integer encoding** on our data before we can perform **one-hot encoding**.

Although there are classes available in the `scikit-learn` library (e.g. `sklearn.preprocessing.LabelEncoder`, `sklearn.preprocessing.OneHotEncoder`), we will do the encoding manually to break down what happens during encoding. You may try to recreate the encodings using these classes in your own time.

In [None]:
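# Join all sentences together and extract the unique characters from the combined sentences.
# Sorting gives the characters a fixed order, so the mappings below are the same on every run
# (the iteration order of a set of strings can change between Python sessions).
unique_chars = sorted(set(''.join(text)))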

# Create a dictionary that maps characters to integers (for encoding)
char2int = dict((c,i) for i,c in enumerate(unique_chars))

# Create a dictionary that maps integers to characters (for decoding)
int2char = dict((i,c) for i,c in enumerate(unique_chars))

The `char2int` dictionary holds all the unique letters/symbols that were present in our sentences and maps each of them to a unique integer.

In [None]:
print(f"Length of 'char2int' is {len(char2int)}")
print(char2int)

`int2char` is a reverse mapping of `char2int`.

In [None]:
print(f"Length of 'int2char' is {len(int2char)}")
print(int2char)
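
A quick way to convince ourselves that the two dictionaries are inverses of each other is to encode a sentence and decode it again. The sketch below rebuilds small mappings from two of our sentences so that it can run on its own; sorting the characters is an assumption made here to keep the mapping stable.

```python
sentences = ["hey how are you", "good luck"]

# Build the two mappings the same way as above, on a smaller example
chars = sorted(set("".join(sentences)))
c2i = {c: i for i, c in enumerate(chars)}
i2c = {i: c for c, i in c2i.items()}

encoded = [c2i[c] for c in "good luck"]          # characters -> integers
decoded = "".join(i2c[i] for i in encoded)       # integers -> characters

print(encoded)
print(decoded)
```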

### Integer Encoding

In [None]:
def integer_encoding(seq_list: list, mapping_dict: dict):
  """Encodes a list of character sequences as lists of integers using mapping_dict."""
  encoded_data = []
  for i,_ in enumerate(seq_list):
    encoded_data.append([mapping_dict[character] for character in seq_list[i]])
  return encoded_data

In [None]:
print(f"Before integer encoding: {input_seq[0]}")

input_seq_int_enc = integer_encoding(input_seq, char2int)
target_seq_int_enc = integer_encoding(target_seq, char2int)

print(f"After integer encoding: {input_seq_int_enc[0]}")

### One-Hot Encoding

In [None]:
def one_hot_encoding(integer_encoded_data, mapping_dict : dict):
  # initialize the output with zeros
  total_seqs = len(integer_encoded_data)
  seq_len = len(integer_encoded_data[0])
  encoding_vector = len(mapping_dict)

  one_hot_encoded = np.zeros((total_seqs, seq_len, encoding_vector), dtype=np.float32) # Shape: (6,14,22)

  for i, seq in enumerate(integer_encoded_data):
    for j, integer in enumerate(seq):
      one_hot_encoded[i, j, integer] = 1
  return one_hot_encoded

In [None]:
print(f"Before one hot encoding: {input_seq_int_enc[0]}, Shape: {len(input_seq_int_enc[0])}")
input_seq_onehot = one_hot_encoding(input_seq_int_enc, char2int)
print(f"After one hot encoding: {input_seq_onehot[0]}, Shape: {input_seq_onehot[0].shape}")
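
The nested loops in `one_hot_encoding` spell out each step. As an alternative sketch, the same kind of result can be obtained by indexing an identity matrix with the integer codes. The function name `one_hot_with_eye` and the toy data are assumptions made for this example.

```python
import numpy as np


def one_hot_with_eye(integer_encoded_data, n_classes):
    """One-hot encode equally long integer sequences via identity-matrix lookup."""
    indices = np.asarray(integer_encoded_data)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(n_classes, dtype=np.float32)[indices]


toy = [[0, 2, 1], [3, 3, 0]]           # 2 sequences of length 3, 4 classes
encoded = one_hot_with_eye(toy, n_classes=4)

print(encoded.shape)  # (2, 3, 4)
print(encoded[0])
```

Like the loop version, this assumes that all sequences have the same length, which is why we padded the sentences earlier.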

Now that we're done with the pre-processing, let's convert types into `torch.Tensor`.

In [None]:
input_seq_onehot = torch.from_numpy(input_seq_onehot)
target_seq_int_enc = torch.Tensor(target_seq_int_enc)

## Model building

We'll be defining the model using the `torch` library. We'll be using the basic `nn.RNN` layer to demonstrate a simple example of how RNNs can be used.

To start building our own neural network, we can define a class that inherits PyTorch's base class (`nn.Module`) for all neural network modules.

Our model class will:
  - have its layers initiated in the constructor.
  - in particular, use only 1 RNN layer followed by a fully connected layer. The fully connected layer will convert the RNN output to our desired output shape.
  - contain a method to initialize a tensor of zeros for the hidden state, `init_hidden()`.
  - contain a forward pass method, `forward()`, which will be executed sequentially. We'll pass in the inputs and the zero-initialized hidden state first, then pass the intermediate result to the fully-connected layer.


In [None]:
class RNNModel(nn.Module):
  def __init__(self, input_dim, output_dim, hidden_dim, n_layers):
    super(RNNModel, self).__init__()

    # Defining some parameters
    # ------------------------
    self.hidden_dim = hidden_dim
    self.n_layers = n_layers

    # Layer definition
    # ----------------
    self.rnn = nn.RNN(input_dim, hidden_dim, n_layers, batch_first=True) # with 'batch_first=True', the input dim expected is (batch_size, seq_len, one_hot_encode_len)
    self.fc = nn.Linear(hidden_dim, output_dim)
  
  def init_hidden(self, batch_size):
    """Generate a tensor with zeros with size of (n_layers, batch_size, hidden_dim).
    It will act as our initial hidden state.    
    """
    hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)
    return hidden

  def forward(self, x):
    batch_size = x.size(0)

    # initialize hidden state
    hidden = self.init_hidden(batch_size)

    # Passing in the input and hidden state into the model
    out, hidden = self.rnn(x, hidden)

    # Reshaping the output such that it can be fit into the fully connected layer
    # contiguous() returns itself if the input tensor is already contiguous, otherwise it returns a new contiguous tensor by copying data
    # What is contiguous? https://discuss.pytorch.org/t/contigious-vs-non-contigious-tensor/30107/2
    out = out.contiguous().view(-1, self.hidden_dim)
    out = self.fc(out)

    return out, hidden
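
To see how the tensor shapes change inside `forward()`, the sketch below repeats the same steps on a dummy input outside the class. The sizes follow this notebook (6 sentences, 14 characters per input sequence, 22 unique characters, hidden size 12); the all-zero input is just a placeholder.

```python
import torch
from torch import nn

batch_size, seq_len, n_chars, hidden_dim = 6, 14, 22, 12

rnn = nn.RNN(n_chars, hidden_dim, 1, batch_first=True)
fc = nn.Linear(hidden_dim, n_chars)

x = torch.zeros(batch_size, seq_len, n_chars)    # placeholder one-hot input
h0 = torch.zeros(1, batch_size, hidden_dim)      # (n_layers, batch, hidden)

out, h_n = rnn(x, h0)                            # out: (6, 14, 12), h_n: (1, 6, 12)
flat = out.contiguous().view(-1, hidden_dim)     # (84, 12): one row per character
logits = fc(flat)                                # (84, 22): one score per character class

print(out.shape, h_n.shape, flat.shape, logits.shape)
```

Flattening the batch and time dimensions into 84 rows is what lets us compare the output against the flattened targets during training.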

For the loss function, we use `CrossEntropyLoss`, since at every time step the model chooses one character out of a fixed set of possible characters, which makes this a classification task. For the optimizer, we use the commonly used Adam.
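
`nn.CrossEntropyLoss` takes the raw scores (logits) from the fully connected layer together with integer class labels, which is why our model does not apply a softmax itself. The NumPy sketch below shows the quantity it computes with its default mean reduction; the function name `cross_entropy` and the numbers are made up for illustration.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean of -log(softmax(logits)[target]) over the rows of `logits`."""
    # Subtract the row-wise maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class in every row
    return -log_probs[np.arange(len(targets)), targets].mean()


# Made-up scores for 2 time steps and 3 possible characters
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.2, 3.0]])

loss_good = cross_entropy(logits, np.array([0, 2]))  # targets match the largest scores
loss_bad = cross_entropy(logits, np.array([1, 0]))   # targets match small scores

print(loss_good, loss_bad)
```

The loss is small when the highest score belongs to the correct character and grows as the correct character receives a lower score.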

Next, we'll instantiate the model with the relevant parameters and define our hyperparameters, which are the number of epochs and the learning rate.

In [None]:
# Model instantiation
model_rnn = RNNModel(input_dim = len(char2int), output_dim=len(char2int), hidden_dim=12, n_layers=1)

# Print model summary
print(model_rnn)

## Training the model

In [None]:
n_epochs = 100
lr = 0.01

# Initialize the optimizer and loss function
optimizer = optim.Adam(model_rnn.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()

for epoch in range(n_epochs):
  optimizer.zero_grad() # Clears existing gradients from previous epoch
  output, hidden = model_rnn(input_seq_onehot)
  loss = criterion(output, target_seq_int_enc.view(-1).long())
  loss.backward() # Calculate gradient and backprop
  optimizer.step() # Update the weights accordingly

  if epoch %10 == 0:
    print(f"Epoch: {epoch}/{n_epochs}.....", end=" ")
    print(f"Loss: {loss.item():.4f}")

## Generate Text

Let's test our model now and see what kind of output we get. A helper function to convert our model output back to text will be helpful!

In [None]:
def predict(model, input_characters):
  """Returns the predicted character as well as the hidden state."""

  # One hot encoding our input characters to fit in the model
  input_characters_int_enc = integer_encoding(input_characters, char2int)
  input_characters_onehot = one_hot_encoding(input_characters_int_enc, char2int)
  input_characters_onehot = torch.from_numpy(np.array(input_characters_onehot)) # Convert to torch tensor

  out, hidden = model(input_characters_onehot)

  # Pass into Softmax to output probability
  # Only take the output of the last time step, i.e. the prediction for the character that follows our input
  prob = F.softmax(out[-1], dim=0).data
  # Taking the class with the highest probability score from the output
  char_ind = torch.max(prob, dim=0)[1].item()

  return int2char[char_ind], hidden
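
The last two steps of `predict()` can be illustrated on their own: softmax turns the scores of the last time step into probabilities, and we then pick the index with the highest probability. The scores below are made up for illustration.

```python
import numpy as np


def softmax(logits):
    """Turn a vector of scores into probabilities that sum to 1."""
    exps = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exps / exps.sum()


logits = np.array([1.2, -0.3, 3.1, 0.0])  # made-up scores for 4 characters
prob = softmax(logits)

# Greedy choice: the character index with the highest probability
greedy_index = int(prob.argmax())

print(prob.round(3), greedy_index)
```

Because softmax preserves the ordering of the scores, the chosen index is the same as the index of the largest raw score. The softmax is mainly useful when we want to inspect the probabilities themselves.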

In [None]:
def sample(model, sentence_length, starting_chars):
  model.eval()
  # Put starting chars in a list
  chars = list(starting_chars)
  predict_size = sentence_length - len(chars)
  # Now pass in the previous characters and get a new one
  for _ in range(predict_size):
    input_characters = []
    input_characters.append(chars)
    char, h = predict(model,input_characters)
    chars.append(char)
  
  return ''.join(chars)

Before we test, let's remind ourselves of the sentences that the model was trained on and the length of each sentence.

In [None]:
print(text)
print(len(text[0]))

In [None]:
sample(model=model_rnn, sentence_length=15, starting_chars='hey')


## Model Limitations

1. Overfitting
    - We only fed the model with a few sentences during training, therefore it essentially “memorized” the sequence of characters of these sentences and thus returned us the exact sentence that we trained it on. <br>
    <br>
2. Handling of unseen characters
    - The model is currently only able to process the characters that it has seen before in the training data set. <br>
    <br>
3. Representation of Textual Data

    - In this implementation, we used one-hot encoding to represent our characters. While it may be fine for this task due to its simplicity, most of the time it should not be used as a solution in actual or more complex problems. This is because:

        - It is computationally too expensive for large datasets. 
        - There is no contextual/semantic information embedded in one-hot vectors.

    - Instead, most modern NLP solutions rely on word embeddings (word2vec, GloVe) or, more recently, contextual word representations from models such as BERT, ELMo, and ULMFiT. You can also use pretrained NLP models, which saves you from having to train a new model on millions of words.
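
Regarding the second limitation, one common workaround is to reserve a placeholder entry in the vocabulary and map every unseen character to it, instead of failing with a `KeyError`. The sketch below is only an illustration: the placeholder `UNK` and the helpers `build_vocab` and `encode` are hypothetical names that are not part of the notebook above.

```python
UNK = "?"  # hypothetical placeholder for characters not seen during training


def build_vocab(sentences):
    """Map every training character, plus the placeholder, to an integer."""
    chars = sorted(set("".join(sentences)) | {UNK})
    return {c: i for i, c in enumerate(chars)}


def encode(sentence, char2int):
    """Integer-encode a sentence, sending unseen characters to the placeholder."""
    unk_id = char2int[UNK]
    return [char2int.get(c, unk_id) for c in sentence]


vocab = build_vocab(["good luck", "i am awesome"])
encoded = encode("good job!", vocab)   # 'j', 'b' and '!' were never seen

print(encoded)
```

Note that this only prevents the encoding step from breaking. The model can make sensible use of the placeholder only if it also appears in the training data.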

While the vanilla RNN is rarely used in solving NLP or sequential problems nowadays, having a good grasp of the basic concepts of RNNs will definitely aid in your understanding as you move towards the more popular GRUs and LSTMs.

## Appendix

##### Revolutionized Google Search Engine

On 18-20 May 2021, Google hosted its annual Google I/O conference. They presented updates to existing technologies such as Google Maps or Google Photos together with some amazing technologies such as LaMDA, a skilled conversationalist AI that could revolutionize chatbot tech, or MUM. 

MUM is an improvement of Google’s search engine. Like other popular state-of-the-art language models such as GPT-3 or LaMDA, MUM is based on the transformer architecture. BERT (MUM’s predecessor) is similar in this regard, the main difference being that MUM is 1000x more powerful. 

<p style='text-align: right;'> --- by Alberto Romero, dated 29 May 2021 <a href="https://towardsdatascience.com/will-googles-mum-kill-seo-d283927f0fde">source</a> </p>

## Reference

1. [A Beginner’s Guide on Recurrent Neural Networks with PyTorch](https://blog.floydhub.com/a-beginners-guide-on-recurrent-neural-networks-with-pytorch/)
2. [Pytorch [Basics] — Intro to RNN](https://towardsdatascience.com/pytorch-basics-how-to-train-your-neural-net-intro-to-rnn-cb6ebc594677)