<a href="https://colab.research.google.com/github/CyShahedB/AiMl_Module4_shahed/blob/main/Mohammad_Shahed_0129_RNN_Training_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recurrent Neural Networks with PyTorch

In [None]:
import torch
from torch import nn

import numpy as np

This cell imports the PyTorch library and its neural network module (nn), as well as the NumPy library (np). PyTorch is a popular deep learning library, and NumPy is commonly used for numerical operations in Python.

In [None]:
text = ['hey how are you','good i am fine','have a nice day']

# Join all the sentences together and extract the unique characters from the combined sentences
chars = set(''.join(text))

# Creating a dictionary that maps integers to the characters
int2char = dict(enumerate(chars))

# Creating another dictionary that maps characters to integers
char2int = {char: ind for ind, char in int2char.items()}

Here we encode characters into integers and vice versa using Python dictionaries. This is a common preprocessing step when working with text data in natural language processing tasks. We create one dictionary that maps characters to integers (char2int) and another that maps integers back to characters (int2char).

To summarize:

1) chars contains the unique characters present in all the sentences.

2) int2char is a dictionary that maps integers to characters.

3) char2int is a dictionary that maps characters to integers.

Now we can use these dictionaries to convert our text data into numerical representations, which is necessary because neural networks operate on numbers rather than raw characters. Note that chars is a set, so the exact integer assigned to each character can differ between runs.
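
As a small illustration of how the two dictionaries are used, the sketch below encodes a string into integers and decodes it back. It sorts the characters so that the mapping is reproducible, which is an assumption added here and not something the notebook does; the `encode` and `decode` helpers are hypothetical names introduced for this example.

```python
text = ['hey how are you', 'good i am fine', 'have a nice day']

# Sorting the unique characters makes the mapping deterministic (assumption for illustration)
chars = sorted(set(''.join(text)))
int2char = dict(enumerate(chars))
char2int = {char: ind for ind, char in int2char.items()}

def encode(sentence):
    """Map each character to its integer index (hypothetical helper)."""
    return [char2int[c] for c in sentence]

def decode(indices):
    """Map integer indices back to a string (hypothetical helper)."""
    return ''.join(int2char[i] for i in indices)

encoded = encode('hey')
print(encoded)
print(decode(encoded))  # round-trips back to the original string
```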

In [None]:
print(char2int)

{'v': 0, 'c': 1, 'o': 2, 'a': 3, 'f': 4, 'r': 5, 'u': 6, 'h': 7, 'n': 8, 'd': 9, 'w': 10, 'y': 11, 'g': 12, 'i': 13, 'm': 14, 'e': 15, ' ': 16}


This prints the character-to-integer mapping that we just created.

In [None]:
maxlen = len(max(text, key=len))
print("The longest string has {} characters".format(maxlen))

The longest string has 15 characters


This cell finds the longest string in the text list by calling max with key=len, which compares the strings by their length, and then applies len to the result to obtain maxlen. The value is printed in a formatted message; here the longest sentence has 15 characters.



In [None]:
# Padding

# A simple loop that loops through the list of sentences and adds a ' ' whitespace until the length of the sentence matches
# the length of the longest sentence
for i in range(len(text)):
    while len(text[i])<maxlen:
        text[i] += ' '

Here we implement padding so that all sentences in the text list have the same length. This is a common preprocessing step when working with sequences such as sentences in natural language processing. Padding ensures that all sequences are of equal length, which is required when stacking them into a single batch for a neural network.

Our code snippet achieves this by looping through each sentence in the text list and adding whitespace (' ') to the end of each sentence until its length matches the length of the longest sentence (maxlen).

After running this code, all sentences in the text list will have the same length, equal to maxlen. This is beneficial for creating uniform input data when working with neural networks or other machine learning models that expect fixed-size inputs.
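
The same padding can be sketched more compactly with Python's built-in `str.ljust`, which right-pads a string to a given width. This is an alternative shown for illustration, not the notebook's own approach; the `padded` name is introduced here.

```python
text = ['hey how are you', 'good i am fine', 'have a nice day']
maxlen = len(max(text, key=len))

# str.ljust pads on the right with the fill character up to the requested width
padded = [sentence.ljust(maxlen, ' ') for sentence in text]

for sentence in padded:
    print(repr(sentence), len(sentence))
```

Unlike the loop above, this builds a new list and leaves the original text list unchanged.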

In [None]:
# Creating lists that will hold our input and target sequences
input_seq = []
target_seq = []

for i in range(len(text)):
    # Remove last character for input sequence
    input_seq.append(text[i][:-1])

    # Remove first character for target sequence
    target_seq.append(text[i][1:])
    print("Input Sequence: {}\nTarget Sequence: {}".format(input_seq[i], target_seq[i]))

Input Sequence: hey how are yo
Target Sequence: ey how are you
Input Sequence: good i am fine
Target Sequence: ood i am fine 
Input Sequence: have a nice da
Target Sequence: ave a nice day


Here we create input and target sequences for a character-level language modeling task. We iterate through each sentence in the text list and build an input sequence (input_seq) and a target sequence (target_seq) from it.

input_seq: Contains each sentence with its last character removed. This is the input to our model.

target_seq: Contains each sentence with its first character removed. This is the target output for our model: at every time step, the target is the character that follows the current input character, so the model learns to predict the next character.

The print statement is just for visualization, showing the corresponding input and target sequences for each sentence.
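
To make the one-step shift concrete, the following sketch pairs each input character with the target character at the same position for a single sentence. The variable names are chosen for this example only.

```python
sentence = 'hey how are you'

input_chars = sentence[:-1]   # everything except the last character
target_chars = sentence[1:]   # everything except the first character

# At time step t the model sees input_chars[t] and should predict target_chars[t]
pairs = list(zip(input_chars, target_chars))
for seen, expected in pairs[:3]:
    print(repr(seen), '->', repr(expected))
```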

In [None]:
for i in range(len(text)):
    input_seq[i] = [char2int[character] for character in input_seq[i]]
    target_seq[i] = [char2int[character] for character in target_seq[i]]

Before encoding our input sequence into one-hot vectors, we'll define 3 key variables:

- *dict_size*: The number of unique characters that we have in our text
    - This will determine the one-hot vector size as each character will have an assigned index in that vector
- *seq_len*: The length of the sequences that we're feeding into the model
    - As we standardised the length of all our sentences to be equal to the longest sentence, this value will be the max length - 1, because we removed one character from each sentence when building the input and target sequences
- *batch_size*: The number of sentences that we defined and are going to feed into the model as a batch

In [None]:
dict_size = len(char2int)
seq_len = maxlen - 1
batch_size = len(text)

def one_hot_encode(sequence, dict_size, seq_len, batch_size):
    # Creating a multi-dimensional array of zeros with the desired output shape
    features = np.zeros((batch_size, seq_len, dict_size), dtype=np.float32)

    # Replacing the 0 at the relevant character index with a 1 to represent that character
    for i in range(batch_size):
        for u in range(seq_len):
            features[i, u, sequence[i][u]] = 1
    return features

Here we define a function one_hot_encode to convert our integer-encoded character sequences into one-hot encoded representations. One-hot encoding is a common technique for representing categorical data, such as characters: each character becomes a vector of zeros with a single 1 at that character's index.

dict_size: The number of unique characters in our dataset.

seq_len: The length of each sequence (excluding the last character for input sequences).

batch_size: The number of sequences in our dataset.

The function one_hot_encode takes a sequence as input and returns a one-hot encoded representation in the form of a multi-dimensional array (features). It iterates through each character in each sequence, setting the corresponding index in the one-hot encoded array to 1.
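
The nested loops can also be expressed as a single indexing operation, which may be easier to reason about for larger batches. The sketch below is one possible vectorized variant, assuming all sequences have the same length; `one_hot_encode_vectorized` is a hypothetical name, and the toy indices are made up.

```python
import numpy as np

def one_hot_encode_vectorized(sequences, dict_size):
    """One-hot encode integer sequences of equal length (hypothetical helper).

    Rows of the identity matrix are the one-hot vectors, so indexing it with
    an integer array of shape (batch, seq_len) yields an array of shape
    (batch, seq_len, dict_size).
    """
    indices = np.asarray(sequences, dtype=np.int64)
    return np.eye(dict_size, dtype=np.float32)[indices]

toy = [[0, 2, 1], [1, 1, 0]]   # 2 sequences of length 3 over a 3-symbol vocabulary
features = one_hot_encode_vectorized(toy, dict_size=3)
print(features.shape)
print(features[0, 1])          # one-hot vector for index 2
```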

In [None]:
input_seq = one_hot_encode(input_seq, dict_size, seq_len, batch_size)
print("Input shape: {} --> (Batch Size, Sequence Length, One-Hot Encoding Size)".format(input_seq.shape))

Input shape: (3, 14, 17) --> (Batch Size, Sequence Length, One-Hot Encoding Size)


Here we apply our one_hot_encode function to input_seq to convert the integer sequences into one-hot encoded representations. The resulting shape is (3, 14, 17): 3 sentences, 14 characters each, and 17 possible characters.

Since we're done with all the data pre-processing, we can now move the data from numpy arrays to PyTorch's very own data structure - **Torch Tensors**

In [None]:
input_seq = torch.from_numpy(input_seq)
target_seq = torch.Tensor(target_seq)

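We now convert input_seq and target_seq into PyTorch tensors using torch.from_numpy() and torch.Tensor(), respectively.

torch.from_numpy(input_seq): Converts the NumPy array input_seq to a PyTorch tensor that shares memory with the array and keeps its float32 data type.

torch.Tensor(target_seq): Converts target_seq, which at this point is still a Python list of integer lists and not a NumPy array, to a PyTorch tensor. Note that torch.Tensor() produces a tensor of the default floating-point type (float32) regardless of the input values, which is why the targets are cast with .long() before the loss is computed.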
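
The data type of the converted targets matters later, because nn.CrossEntropyLoss expects integer class indices. The short sketch below, using made-up toy targets, compares the legacy `torch.Tensor` constructor with the `torch.tensor` factory function, which infers the data type from the data.

```python
import torch

target = [[1, 2, 3], [4, 5, 6]]      # toy integer targets as a list of lists

as_float = torch.Tensor(target)      # legacy constructor: default floating-point dtype
as_inferred = torch.tensor(target)   # factory function: dtype inferred from the data

print(as_float.dtype)
print(as_inferred.dtype)

# Casting to long gives the integer class indices that the loss function expects
print(as_float.long().dtype)
```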

In [None]:
# torch.cuda.is_available() checks and returns a Boolean True if a GPU is available, else it'll return False
is_cuda = torch.cuda.is_available()

# If we have a GPU available, we'll set our device to GPU. We'll use this device variable later in our code.
if is_cuda:
    device = torch.device("cuda")
    print("GPU is available")
else:
    device = torch.device("cpu")
    print("GPU not available, CPU used")

GPU not available, CPU used


This code checks whether a GPU is available using torch.cuda.is_available(), which returns True if a CUDA-capable GPU can be used and False otherwise, and sets the device variable accordingly: torch.device("cuda") when a GPU is available and torch.device("cpu") when it is not. It also prints a message saying which device will be used.

Setting the device dynamically like this is common practice, since it lets the same code run on either a GPU or a CPU.
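
The same selection is often written as a single expression. The sketch below shows that form and moves a small dummy tensor to the chosen device; the tensor `x` exists only for this example.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors (and modules) are moved to the chosen device with .to(device)
x = torch.zeros(2, 3).to(device)
print(x.device.type)
```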

In [None]:
class Model(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(Model, self).__init__()

        # Defining some parameters
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        #Defining the layers
        # RNN Layer
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):

        batch_size = x.size(0)

        #Initializing hidden state for first input using method defined below
        hidden = self.init_hidden(batch_size)

        # Passing in the input and hidden state into the model and obtaining outputs
        out, hidden = self.rnn(x, hidden)

        # Reshaping the outputs such that it can be fit into the fully connected layer
        out = out.contiguous().view(-1, self.hidden_dim)
        out = self.fc(out)

        return out, hidden

    def init_hidden(self, batch_size):
        # This method generates the first hidden state of zeros which we'll use in the forward pass.
        # We'll send the tensor holding the hidden state to the device we specified earlier as well.
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device)
        return hidden

This code defines a simple RNN (Recurrent Neural Network) model using PyTorch's nn.Module class. The model consists of an RNN layer and a fully connected layer. The forward method specifies how input data flows through the layers, and there's an additional method init_hidden to initialize the hidden state.
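
To see how the tensor shapes change on the way through these two layers, the following sketch runs a dummy input through a standalone nn.RNN and nn.Linear. The sizes mirror the ones used in this notebook (3 sentences, 14 time steps, 17 characters, hidden size 12); the input is all zeros and only serves to trace shapes.

```python
import torch
from torch import nn

batch_size, seq_len, dict_size, hidden_dim = 3, 14, 17, 12

rnn = nn.RNN(dict_size, hidden_dim, num_layers=1, batch_first=True)
fc = nn.Linear(hidden_dim, dict_size)

x = torch.zeros(batch_size, seq_len, dict_size)   # dummy input with the one-hot shape
h0 = torch.zeros(1, batch_size, hidden_dim)       # (n_layers, batch, hidden_dim)

out, hidden = rnn(x, h0)
print(out.shape)       # (batch, seq_len, hidden_dim)
print(hidden.shape)    # (n_layers, batch, hidden_dim)

# Flatten batch and time so the linear layer scores every time step
logits = fc(out.contiguous().view(-1, hidden_dim))
print(logits.shape)    # (batch * seq_len, dict_size)
```

Flattening to (batch * seq_len, dict_size) is what lets the loss treat every time step of every sentence as one classification example.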

In [None]:
# Instantiate the model with hyperparameters
model = Model(input_size=dict_size, output_size=dict_size, hidden_dim=12, n_layers=1)
# We'll also set the model to the device that we defined earlier (default is CPU)
model = model.to(device)

# Define hyperparameters
n_epochs = 100
lr=0.01

# Define Loss, Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

Now we can begin our training! As we only have a few sentences, this training process is very fast. With larger datasets and deeper models, however, there is far more input data to process and many more parameters to update, so training takes correspondingly longer.

In [None]:
# Training Run
input_seq = input_seq.to(device)
target_seq = target_seq.to(device)  # Move the targets once, outside the loop
for epoch in range(1, n_epochs + 1):
    optimizer.zero_grad() # Clears existing gradients from previous epoch
    output, hidden = model(input_seq)  # output is already on the model's device
    # CrossEntropyLoss expects integer class indices, hence the flatten and long cast
    loss = criterion(output, target_seq.view(-1).long())
    loss.backward() # Does backpropagation and calculates gradients
    optimizer.step() # Updates the weights accordingly

    if epoch%10 == 0:
        print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
        print("Loss: {:.4f}".format(loss.item()))

Epoch: 10/100............. Loss: 2.4408
Epoch: 20/100............. Loss: 2.0811
Epoch: 30/100............. Loss: 1.6645
Epoch: 40/100............. Loss: 1.2433
Epoch: 50/100............. Loss: 0.8861
Epoch: 60/100............. Loss: 0.6095
Epoch: 70/100............. Loss: 0.4061
Epoch: 80/100............. Loss: 0.2775
Epoch: 90/100............. Loss: 0.1993
Epoch: 100/100............. Loss: 0.1517
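
One way to put these loss values in context: a model that assigns equal probability to all 17 characters has a cross-entropy of ln(17), roughly 2.83, so values well below that suggest the network has learned something. The sketch below checks this baseline with a small NumPy re-implementation of mean cross-entropy; it is written for illustration and is not PyTorch's own code.

```python
import math
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy for logits of shape (N, C) and integer targets of shape (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

dict_size = 17
logits = np.zeros((42, dict_size))        # equal scores everywhere: a uniform prediction
targets = np.zeros(42, dtype=np.int64)    # any targets give the same loss in this case

baseline = cross_entropy(logits, targets)
print(round(float(baseline), 4))          # equals ln(17)
print(round(math.log(dict_size), 4))
```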


Let’s test our model now and see what kind of output we get. Before that, let’s define a helper function that converts our model output back to text.

In [None]:
def predict(model, character):
    # One-hot encoding our input to fit into the model
    character = np.array([[char2int[c] for c in character]])
    character = one_hot_encode(character, dict_size, character.shape[1], 1)
    character = torch.from_numpy(character)
    character = character.to(device)

    out, hidden = model(character)

    prob = nn.functional.softmax(out[-1], dim=0).data
    # Taking the class with the highest probability score from the output
    char_ind = torch.max(prob, dim=0)[1].item()

    return int2char[char_ind], hidden

The predict function takes a trained model and the sequence of characters generated so far (a string or a list of characters) as input, and returns the predicted next character and the hidden state.

The function first converts the input characters into a one-hot encoded representation suitable for the model.

Then, it performs a forward pass through the model to obtain the output and hidden state.

The output is passed through a softmax function to obtain probabilities for each character.

The index of the character with the highest probability is extracted (char_ind).

Finally, the predicted character and the hidden state are returned.

This function can be used for generating sequences character by character using the trained RNN model.
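
The softmax and highest-probability steps can be illustrated in isolation with NumPy. The scores and the three-character vocabulary below are made up for this example and do not come from the trained model.

```python
import numpy as np

int2char = {0: 'a', 1: 'b', 2: 'c'}     # toy vocabulary (assumption)
scores = np.array([1.0, 3.0, 0.5])      # made-up scores for the last time step

# Softmax turns the raw scores into probabilities that sum to 1
exp_scores = np.exp(scores - scores.max())
prob = exp_scores / exp_scores.sum()

# The predicted character is the one with the highest probability
char_ind = int(np.argmax(prob))
print(int2char[char_ind])
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw scores would select the same character; the probabilities are mainly useful when sampling or inspecting confidence.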

In [None]:
def sample(model, out_len, start='hey'):
    model.eval() # eval mode
    start = start.lower()
    # First off, run through the starting characters
    chars = [ch for ch in start]
    size = out_len - len(chars)
    # Now pass in the previous characters and get a new one
    for ii in range(size):
        char, h = predict(model, chars)
        chars.append(char)

    return ''.join(chars)

The sample function generates a sequence of characters given a starting string (start). It uses the trained model to predict the next characters in the sequence.
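
The generation loop itself does not depend on the network: each predicted character is appended to the sequence, and the extended sequence is fed back in. The sketch below isolates that loop by replacing the model with a hypothetical stand-in, `toy_predict`, which simply looks up the next character of a fixed sentence; `sample_with` is likewise a name introduced for this example.

```python
def toy_predict(chars):
    """Hypothetical stand-in for predict(): next character of a fixed sentence."""
    reference = 'good i am fine '
    return reference[len(chars)]

def sample_with(predict_fn, out_len, start):
    """Grow the sequence one character at a time until it has out_len characters."""
    chars = list(start.lower())
    for _ in range(out_len - len(chars)):
        # Feed everything generated so far and append the predicted next character
        chars.append(predict_fn(chars))
    return ''.join(chars)

result = sample_with(toy_predict, 15, 'good')
print(repr(result))
```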

In [None]:
sample(model, 15, 'good')

'good i am fine '

As we can see, the model is able to come up with the sentence 'good i am fine ' when we feed it the word 'good', achieving what we intended for it to do!