<center>
    <h1>Recurrent Neural Networks(RNNs)</h1>
</center>

# Brief Recap

Recurrent Neural Networks (RNNs) are a type of artificial neural network designed for processing sequential data, where the output depends not just on the current input but also on previous inputs. This makes them particularly well-suited for tasks like time series forecasting, language modeling, speech recognition, and machine translation. <br>
They were introduced in the 1980s. The foundational work on RNNs is attributed to John Hopfield, who introduced the Hopfield Network in 1982, and David Rumelhart, along with Geoffrey Hinton and Ronald J. Williams, who formalized the concept of backpropagation through time (BPTT) in the mid-1980s.



## RNN Architecture Description


<img src='assets/rnn.png' width=500/>

1. **Input Layer:** The input to an RNN is typically a sequence of vectors, where each element of the sequence corresponds to one time step. For example, in a sequence of words (like a sentence), each word can be represented as a vector (often a word embedding).
2. **Hidden Layer:** The defining feature of an RNN is its hidden state, which acts as a memory of the network. This state is updated at each time step based on the current input and the previous hidden state.
3. **Output Layer:** At each time step, the RNN can produce an output, which might be a prediction(e.g., the next word in a sentence), a classification(e.g., positive or negative sentiment of a sentence) or a final output after the whole sequence has been processed(e.g., classifying a video).  
4. **Recurrent Loop:** Unlike traditional neural networks, which process inputs independently, RNNs have recurrent connections that loop information back into the network. This loop allows information to flow from one time step to the next.

## Advantages of RNNs
* **Sequential Data Processing:** They can process input data one element at a time while retaining information about previous inputs, making them ideal for tasks where the order of data matters.
* **Memory of previous inputs:** Retains past information through hidden states, capturing context and temporal dependencies.
* **Parameter Efficiency:** Uses shared weights across time steps, reducing the number of parameters.
* **Variable-Length Inputs:** Can handle sequences of different lengths without architectural changes.
* **Contextual Understanding:** Captures broader context in tasks like sentiment analysis and question answering.

# Implementing RNNs with TensorFlow

TensorFlow provides an easy way to implement LSTM layers using the `tf.keras.layers.SimpleRNN` class. Here's an overview of the key components:
* `units`: Number of neurons in an RNN layer
* `input_shape` (`timesteps`, `input_dim`): Defines shape of the input to the RNN.`timesteps` refers to the number of time steps in each input sequence, and `input_dim` refers to the number of features in each time step.
* `return_sequences`: Determines whether to return the output for each time step (`True`) or only the final time step's output (`False`).
* `activation`: Specifies the activation function for the output layer.

For more detailed information, refer to the TensorFlow documentation on [SimpleRNN](https://www.tensorflow.org/api_docs/python/tf/keras/layers/SimpleRNN).

In [23]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input

# Define model parameters
input_dim = 10  # Number of features in each time step
timesteps = 5   # Number of time steps in each sequence
num_classes = 3 # Number of output classes

# Build the RNN model
model = Sequential()

# Add an RNN layer
model.add(Input(shape=(timesteps, input_dim)))
model.add(SimpleRNN(50, return_sequences=False))

# Add a dense output layer
model.add(Dense(num_classes, activation='softmax'))

# Summarize the model
model.summary()


**SimpleRNN Layer:**

This is the recurrent layer with 50 units (neurons). It takes a sequence of shape (timesteps, input_dim) as input, where timesteps = 5 and input_dim = 10.
The parameter `return_sequences=False` means that only the output of the last time step is passed to the next layer.

**Dense Layer:**

A fully connected layer with num_classes = 3 neurons, each corresponding to one output class. The `activation` function is softmax, which is typically used for multi-class classification tasks.

# Sample Q/A Use Case with RNN

Let's build a **Question Answering problem** and solve it using `SimpleRNN`.

Preparing data for **Recurrent Neural Networks(RNNs)** in TensorFlow involves several key steps to ensure that your sequential data is in the right format for training and evaluation.



In [24]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

In [25]:
# Dummy dataset
questions = [
    "What is your name?",
    "How are you?",
    "What is your profession?",
    "Where do you live?",
    "What is your favorite color?"
]

# Corresponding answers (encoded as class labels)
# We'll assign a unique class to each answer
answers = [0, 1, 0, 2, 3]

### Tokenizing and Preprocessing
* `Tokenizer`: It is used to convert words in questions into integer sequences.
* `pad_sequences`: It ensures that all sequences are of same length by padding shorter sequences with zeros.

In [26]:
# Initialize tokenizer to convert text to sequences

tokenizer = Tokenizer()
tokenizer.fit_on_texts(questions)

You can check how each word is encoded using `tokenizer.word_index`

In [27]:
tokenizer.word_index

{'what': 1,
 'is': 2,
 'your': 3,
 'you': 4,
 'name': 5,
 'how': 6,
 'are': 7,
 'profession': 8,
 'where': 9,
 'do': 10,
 'live': 11,
 'favorite': 12,
 'color': 13}

In [28]:
# Convert text to sequences of integers
sequences = tokenizer.texts_to_sequences(questions)

# Pad sequences to ensure all have the same length
max_sequence_length = max(len(seq) for seq in sequences)
X = pad_sequences(sequences, maxlen=max_sequence_length)

In [29]:
sequences

[[1, 2, 3, 5], [6, 7, 4], [1, 2, 3, 8], [9, 10, 4, 11], [1, 2, 3, 12, 13]]

In [30]:
# Convert answers to numpy array
y = np.array(answers)

# Print the preprocessed data
print("Padded Sequences:\n", X)
print("\nLabels:\n", y)

Padded Sequences:
 [[ 0  1  2  3  5]
 [ 0  0  6  7  4]
 [ 0  1  2  3  8]
 [ 0  9 10  4 11]
 [ 1  2  3 12 13]]

Labels:
 [0 1 0 2 3]


## Build and train a `SimpleRNN` model

The **SimpleRNN** layer processes the input sequences. We use an embedding dimension based on the vocabulary size (number of unique words), followed by a `dense` output layer for classification.

**Training:**
The model is trained for 20 epochs using `sparse_categorical_crossentropy` as the loss function, which is suitable for multi-class classification.


In [31]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input

In [32]:
# Define the model
model = Sequential()

# Add SimpleRNN layer

# 50 units in the RNN
model.add(SimpleRNN(50, input_shape=(5, 1))) # Sending 1 feature at each time-step

# Output layer (using softmax for multi-class classification)
model.add(Dense(4, activation='softmax'))  # 4 unique answer classes

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=20, verbose=1)

# Evaluate the model
loss, accuracy = model.evaluate(X, y, verbose=1)
print(f"Model Accuracy: {accuracy:.4f}")


Epoch 1/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2s/step - accuracy: 0.2000 - loss: 1.9474
Epoch 2/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 104ms/step - accuracy: 0.0000e+00 - loss: 1.8730
Epoch 3/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 110ms/step - accuracy: 0.0000e+00 - loss: 1.8029
Epoch 4/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 95ms/step - accuracy: 0.0000e+00 - loss: 1.7366
Epoch 5/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 96ms/step - accuracy: 0.0000e+00 - loss: 1.6734
Epoch 6/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 91ms/step - accuracy: 0.2000 - loss: 1.6129
Epoch 7/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 94ms/step - accuracy: 0.2000 - loss: 1.5549
Epoch 8/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 100ms/step - accuracy: 0.4000 - loss: 1.4993
Epoch 9/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━

## Inference

| Questions                       | Answers |
|----------------------------------|---------|
| What is your name?               | 0       |
| How are you?                     | 1       |
| What is your profession?         | 0       |
| Where do you live?               | 2       |
| What is your favorite color?     | 3       |

<br>

You can ask the above questions under `new_question` variable and check the output if they are true.

In [33]:
# New question for inference
new_question = ["How are you?"]

# Tokenize the new question (same tokenizer used during training)
new_sequence = tokenizer.texts_to_sequences(new_question)

# Pad the new sequence to the same length as the training data
new_padded_sequence = pad_sequences(new_sequence, maxlen=max_sequence_length)

# new_padded_sequence = to_categorical(new_padded_sequence, num_classes=vocab_size)

# Make prediction
predicted_probabilities = model.predict(new_padded_sequence)

# Get the predicted class (answer) with the highest probability
predicted_class = np.argmax(predicted_probabilities)

# Print the predicted class
print(f"\nPredicted Answer: {predicted_class}")


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 265ms/step

Predicted Answer: 1


## Embedding Layer



It's primarily used to map categorical data, such as words or items, into a continuous vector space. This vector space is learned during the training process and allows the model to capture semantic relationships and similarities between the items.

**Commonly used to represent words as dense vectors, capturing semantic similarities and relationships.**

<img src='https://www.researchgate.net/publication/349630764/figure/fig3/AS:999014610788354@1615195052671/Detail-of-the-embedding-layer-of-the-NN-implementing-the-Encoding-model-used-for-the.png' width=400/>

**Benefits:**
* The learned embeddings often capture semantic relationships between the items. For example, words with similar meanings might be located closer together in the embedding space.
* Using embeddings can significantly improve the performance of neural networks on tasks like text classification, machine translation, and recommendation systems.
* Embeddings reduce the high-dimensional categorical data into a lower-dimensional, dense vector space. This makes it easier for the neural network to process and learn patterns.

[`tf.keras.layers.Embedding`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding)


In [34]:
import tensorflow as tf

# Sample input text
input_text = ["hello world", "this is a sentence"]

# Create a vocabulary mapping words to unique indices
vocab = ["hello", "world", "this", "is", "a", "sentence"]
word_to_index = {word: i for i, word in enumerate(vocab)}

In [35]:
# Convert input text to sequences of indices
input_sequences = [[word_to_index[word] for word in sentence.split()] for sentence in input_text]

# Create an embedding layer with a vocabulary size of 6 (matching the number of words),
# an embedding dimension of 10, and padding to ensure all sequences have the same length
embedding_layer = tf.keras.layers.Embedding(input_dim=len(vocab), output_dim=10)


In [36]:
print(input_sequences)

[[0, 1], [2, 3, 4, 5]]


Input sentences represented in a sequence of numbers.

In [37]:
# Pad sequences using tf.keras.preprocessing.sequence.pad_sequences
from tensorflow.keras.preprocessing.sequence import pad_sequences
input_sequences = pad_sequences(input_sequences)

# Convert input_sequences to a tensor
input_sequences = tf.convert_to_tensor(input_sequences)

# Embed the input sequences
embedded_sequences = embedding_layer(input_sequences)

# Print the embedded sequences
print(embedded_sequences)

tf.Tensor(
[[[ 0.02632297  0.0067999  -0.03521492 -0.02990087 -0.03918583
   -0.01334275 -0.02421255 -0.03841143 -0.02234389  0.03832153]
  [ 0.02632297  0.0067999  -0.03521492 -0.02990087 -0.03918583
   -0.01334275 -0.02421255 -0.03841143 -0.02234389  0.03832153]
  [ 0.02632297  0.0067999  -0.03521492 -0.02990087 -0.03918583
   -0.01334275 -0.02421255 -0.03841143 -0.02234389  0.03832153]
  [-0.03405018  0.00875782  0.03970999 -0.00955458  0.03895843
   -0.02426677  0.01247746  0.01855269  0.00441507  0.03619235]]

 [[-0.01285506  0.04230471  0.00410714  0.01571724  0.04584699
   -0.01689519  0.03950064  0.04205848  0.01486951  0.00253208]
  [-0.02851579 -0.04387398  0.02798631 -0.04968909 -0.02135345
    0.0410854  -0.02813737 -0.03266578 -0.00893842  0.04640827]
  [-0.03648572  0.03039191  0.00799771  0.00564852  0.01863635
   -0.04800963 -0.02653407 -0.00791831  0.03134607  0.02439905]
  [-0.00593539 -0.01391931 -0.02124027 -0.01489367  0.00415547
   -0.01684654 -0.03015283  0.00541

Now both the sentences are represented as a sequence of vectors.

# Sentiment Analysis using RNN

In [38]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the IMDB dataset
data = tf.keras.datasets.imdb.load_data()
(x_train, y_train), (x_test, y_test) = data

All the reviews in this dataset have already been tokenized. <br>
If we check the first review in the training set,

In [39]:
print("First review in the training set:\n", x_train[0], "length:", len(x_train[0]), "class:", y_train[0])


First review in the training set:
 [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32] l

In [44]:
vocab_size = 100000  # Limit the vocabulary size to the top 10,000 words
maxlen = 200  # Limit each review to 200 words

# Pad sequences to have the same length
x_train = pad_sequences(x_train,padding='post',maxlen=maxlen)
x_test = pad_sequences(x_test,padding='post',maxlen=maxlen)


The max length of sequence is quite huge. Hence, we try to limit it to a max length of 100 to avoid inconsistencies while training.

In [45]:
x_train.shape

(25000, 200)

In [46]:

# Create an RNN model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 32),
    tf.keras.layers.SimpleRNN(64),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.summary()

In [47]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2)

Epoch 1/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m33s[0m 95ms/step - accuracy: 0.4979 - loss: 0.6967 - val_accuracy: 0.5330 - val_loss: 0.6883
Epoch 2/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m22s[0m 69ms/step - accuracy: 0.6117 - loss: 0.6581 - val_accuracy: 0.5444 - val_loss: 0.6831
Epoch 3/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m27s[0m 86ms/step - accuracy: 0.6796 - loss: 0.5401 - val_accuracy: 0.6506 - val_loss: 0.6336
Epoch 4/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 68ms/step - accuracy: 0.7397 - loss: 0.4286 - val_accuracy: 0.5444 - val_loss: 0.7764
Epoch 5/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 63ms/step - accuracy: 0.7662 - loss: 0.3698 - val_accuracy: 0.5494 - val_loss: 0.8229
Epoch 6/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 64ms/step - accuracy: 0.7738 - loss: 0.3630 - val_accuracy: 0.5468 - val_loss: 0.9160
Epoch 7/10
[1m3

<keras.src.callbacks.history.History at 0x182566fe190>

In [48]:
loss, accuracy = model.evaluate(x_test, y_test)
print(f'Test Accuracy: {accuracy:.4f}')

[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 17ms/step - accuracy: 0.5518 - loss: 0.8998
Test Accuracy: 0.5524


In [49]:
# Assuming you have the model and tokenizer defined as in the previous code

tokenizer = Tokenizer(num_words=100)

# Convert a new text to a sequence
new_text = "the movie was bad!"
tokenizer.fit_on_texts(new_text)
new_sequence = tokenizer.texts_to_sequences([new_text])
new_sequence = pad_sequences(new_sequence, maxlen=100)

# Make a prediction
prediction = model.predict(new_sequence)

print(prediction)
# Interpret the prediction (assuming a binary classification task)
if prediction[0][0] > 0.5:
    print("Positive sentiment")
else:
    print("Negative sentiment")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 361ms/step
[[0.5505097]]
Positive sentiment


## Improvement Strategies

Consider the following strategies to help improve the accuracy of the above model.

1. **Increase max length of sequence:** Increasing the maxlen might help in increasing the accuracy as more information from each review would be available to the model.
2. **Increase the number of epochs**: The model might need more training iterations to learn the patterns in the data effectively.
3. **Add more RNN layers:** Stacking multiple RNN layers can help the model capture more complex dependencies in the sequence.
  * **Increase the number of units in the RNN layers:** More units can enhance the model's capacity to learn intricate patterns.
4. **Adjust the learning rate:** Fine-tuning the learning rate can impact the model's convergence speed and performance.
