# Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) is a type of deep learning model designed for processing sequential or time-series data. RNNs capture temporal dependencies and patterns in sequential data by maintaining an internal memory, or hidden state, that is updated as each new input is processed.

The fundamental building block of an RNN is the recurrent cell (often called a recurrent neuron), which takes both the current input and the previous hidden state as inputs. It combines them to produce an output and a new hidden state. This hidden state is fed back into the cell at the next time step, creating a recurrent loop that lets the network carry information from earlier inputs forward.

The key characteristic of RNNs is that they share the same weights across all time steps. Because the number of parameters does not depend on the sequence length, the same network can process inputs of variable length. Weight sharing also lets the network apply a pattern learned at one position to every other position in the sequence and, in principle, relate inputs that are far apart, although simple RNNs often struggle with very long-range dependencies in practice.
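A minimal sketch of weight sharing in plain Python, assuming a tanh activation, a bias term, and made-up toy weights; `rnn_cell` and `run_rnn` are hypothetical helper names, not a library API. The same three parameter arrays are reused at every step, so a length-3 and a length-5 sequence pass through the identical model and both end in a hidden state of the same size.

```python
import math

def rnn_cell(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h).

    Vectors are plain lists; matrices are lists of rows.
    """
    h_t = []
    for i in range(len(h_prev)):
        a = b_h[i]
        a += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        a += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(a))
    return h_t

def run_rnn(xs, W_xh, W_hh, b_h):
    """Apply the SAME cell (same weights) at every time step of xs."""
    h = [0.0] * len(b_h)  # initial hidden state h_0 = 0
    for x_t in xs:
        h = rnn_cell(x_t, h, W_xh, W_hh, b_h)
    return h  # final hidden state summarises the whole sequence

# Hypothetical toy parameters: 2-dimensional inputs, 3-dimensional hidden state.
W_xh = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.2]]
W_hh = [[0.5, 0.0, 0.1], [0.0, 0.5, 0.0], [0.2, 0.0, 0.5]]
b_h = [0.0, 0.1, -0.1]

short_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # length 3
long_seq = short_seq + [[0.5, -0.5], [2.0, 0.0]]  # length 5

h_short = run_rnn(short_seq, W_xh, W_hh, b_h)
h_long = run_rnn(long_seq, W_xh, W_hh, b_h)
print(len(h_short), len(h_long))  # 3 3
```

The parameter count (here 6 + 9 + 3 numbers) is fixed by the input and hidden sizes alone, which is exactly why sequence length can vary.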

![Recurrent Neural Network](./../../assets/rnn.png)

### Why RNNs?

RNNs are needed because regular feedforward neural networks, including CNNs:

- Are not designed for sequential data and typically expect a fixed-size input
- Take into account only the current input, with no memory of previous inputs

### RNNs, they Remember!

In traditional neural networks, each input is treated as independent and unrelated to other inputs. However, recurrent neural networks (RNNs) are different. They have the ability to remember and consider the relationships between inputs.

Imagine an RNN as a loop that keeps passing information to itself. This loop allows the RNN to maintain an internal memory or hidden state. When the RNN receives an input, it not only processes that input but also takes into account the information it has stored from previous inputs.

Think of it like reading a story. In a traditional neural network, you would read each sentence separately, without any context from previous sentences. But in an RNN, you read the sentences in order and remember the information from previous sentences as you move forward. This helps you understand the story better and predict what might come next.

The RNN's ability to remember comes from this loop-like structure. It allows the network to capture and store information from previous inputs and use it to influence the processing of future inputs. This memory allows the RNN to recognize patterns, dependencies, and sequences in the data.

By remembering and considering the relationships between inputs, RNNs are particularly useful for tasks that involve sequential data, such as predicting the next word in a sentence, generating text, analyzing time series data, and processing speech or natural language.

## Types of RNNs

Based on how many inputs they take and how many outputs they produce, RNN architectures are commonly grouped into five types:

1. **One to One RNNs (Vanilla Neural Networks):** One-to-one networks are not really RNNs but vanilla neural networks with a single input and a single output. They are used for general machine learning problems such as image classification.
2. **One to Many RNNs:** Takes a single input and generates a sequence of outputs. `For example:` Image Caption Generator
3. **Many to One RNNs:** Takes a sequence of inputs and produces a single output. `For example:` Sentiment Analysis
4. **Many to Many RNNs (Synchronized):** Takes a sequence of inputs and produces one output per input, so the output sequence has the same length as the input sequence. `For example:` Part-of-Speech Tagging
5. **Seq2seq RNNs (Encoder-Decoder):** Uses two RNNs: an encoder RNN that processes the input sequence and summarizes it into a fixed-size context vector, and a decoder RNN that generates the output sequence from this context vector. The input and output lengths can differ. `For example:` Machine Translation, Speech Recognition
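To make the difference between the many-to-one and synchronized many-to-many types concrete, here is a sketch that reads out the same scalar recurrent cell in two ways. It assumes a hidden size of 1, a tanh activation and arbitrary toy weights; `rnn_states`, `many_to_one` and `many_to_many` are hypothetical names. One-to-many and encoder-decoder models are not shown.

```python
import math

def rnn_states(xs, w_xh, w_hh):
    """Run a scalar RNN (hidden size 1) and return the hidden state at every step."""
    h = 0.0  # initial hidden state
    states = []
    for x in xs:
        h = math.tanh(w_xh * x + w_hh * h)  # same weights at every step
        states.append(h)
    return states

def many_to_one(xs, w_xh, w_hh, w_ho):
    """Read the whole sequence, emit ONE output from the last hidden state."""
    return w_ho * rnn_states(xs, w_xh, w_hh)[-1]

def many_to_many(xs, w_xh, w_hh, w_ho):
    """Emit one output per input step, so len(output) == len(input)."""
    return [w_ho * h for h in rnn_states(xs, w_xh, w_hh)]

xs = [0.5, -1.0, 2.0, 0.0]  # toy input sequence of length 4
print(many_to_one(xs, 0.8, 0.5, 2.0))        # a single number
print(len(many_to_many(xs, 0.8, 0.5, 2.0)))  # 4
```

The recurrence is identical in both cases; only the choice of which hidden states are mapped to outputs changes.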

![Types of RNN](./../../assets/rnn-types.jpg)

## Application areas of RNNs

- Natural Language Processing
- Time Series Prediction

## Backpropagation Through Time

Backpropagation Through Time (BPTT) is the algorithm used to compute gradients when training recurrent neural networks (RNNs) and other sequential models. It extends standard backpropagation to sequences by unrolling the network across time steps and treating the result as one deep network whose layers share the same weights.

![Back Propagation Through Time](./../../assets/bptt.jpg)

To compute the gradients of the objective function with respect to all the model parameters, we unroll the network over the time steps of the sequence and apply the chain rule to propagate gradients backward through time. The process involves three main steps:

### 1. Forward Pass

During the forward pass, we compute the hidden state and output for each time step using the following equations:

Hidden state at time step t:

$h_t = \text{activation}(W_{hh} h_{t-1} + W_{xh} x_t)$

Output at time step t:

$o_t = W_{ho}h_t$

Where:
- $h_t$ is the hidden state at time step `t`.
- $x_t$ is the input at time step `t`.
- $W_{hh}$ is the weight matrix for the hidden-to-hidden connections.
- $W_{xh}$ is the weight matrix for the input-to-hidden connections.
- $W_{ho}$ is the weight matrix for the hidden-to-output connections.
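- `activation` is taken to be the identity mapping in this derivation, and bias terms are omitted, to keep the gradients simple. (In practice a nonlinearity such as tanh is typical; it is the default activation of Keras's `SimpleRNN`.)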
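To make the forward equations concrete, here is a small sketch with every weight matrix shrunk to a scalar (input, hidden and output size 1), the identity activation, no biases, and hypothetical weight values $W_{hh} = 0.5$, $W_{xh} = 1$, $W_{ho} = 2$. `forward` is an illustrative helper name.

```python
def forward(xs, w_hh, w_xh, w_ho, h0=0.0):
    """Forward pass of a scalar RNN with identity activation and no biases:
    h_t = w_hh * h_{t-1} + w_xh * x_t,   o_t = w_ho * h_t
    """
    h = h0
    hs, outs = [], []
    for x in xs:
        h = w_hh * h + w_xh * x  # hidden state at step t
        hs.append(h)
        outs.append(w_ho * h)    # output at step t
    return hs, outs

hs, outs = forward([1.0, 2.0, 3.0], w_hh=0.5, w_xh=1.0, w_ho=2.0)
print(hs)    # [1.0, 2.5, 4.25]
print(outs)  # [2.0, 5.0, 8.5]
```

Each hidden state mixes the new input with half of the previous state, so $h_3 = 0.5 \cdot 2.5 + 3 = 4.25$ still carries information from $x_1$ and $x_2$.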

### 2. Loss Computation

After obtaining the output $o_t$ for each time step, we compute the loss $L_t$ at each time step using a loss function whose form depends on the task (for example, cross-entropy for classification):

$L_t = \text{loss}(o_t, y_t)$

Where:
- $y_t$ is the target at time step `t`.
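The loss is left unspecified above, so the sketch below assumes a per-step squared-error loss $L_t = \frac{1}{2}(o_t - y_t)^2$ purely for illustration, with hypothetical outputs and targets. It also computes $\frac{\partial L_t}{\partial o_t} = o_t - y_t$, the quantity the backward pass starts from, and sums the per-step losses, which is one common choice of overall objective.

```python
def step_losses(outputs, targets):
    """Assumed loss: L_t = 0.5 * (o_t - y_t)**2 at every time step."""
    return [0.5 * (o - y) ** 2 for o, y in zip(outputs, targets)]

def output_grads(outputs, targets):
    """dL_t/do_t for the squared-error loss above, i.e. o_t - y_t."""
    return [o - y for o, y in zip(outputs, targets)]

outputs = [2.0, 5.0, 8.5]  # hypothetical outputs o_1..o_3
targets = [1.0, 4.0, 8.0]  # hypothetical targets y_1..y_3

losses = step_losses(outputs, targets)
grads = output_grads(outputs, targets)
print(losses)       # [0.5, 0.5, 0.125]
print(sum(losses))  # 1.125
print(grads)        # [1.0, 1.0, 0.5]
```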

### 3. Backward Pass (BPTT)

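The goal of BPTT is to compute the gradients of the overall objective, taken here as the sum of the per-step losses $L = \sum_{t=1}^{T} L_t$, with respect to all the model parameters (weights). Because the same weights are reused at every time step, each weight's gradient collects a contribution from every step. Starting from the last time step (`T`), we backpropagate the gradients through time by repeatedly applying the chain rule.

- Initialize the weight gradients to zero, and compute the gradient of the loss with respect to the output at the last time step:

    $\frac{\partial L_T}{\partial o_T} = \nabla_{o_T} \text{loss}(o_T, y_T)$

- For each time step $t$ from $T$ down to $1$ (computing $\frac{\partial L_t}{\partial o_t}$ from the loss in the same way):
    - Compute the gradient of the total loss with respect to the hidden state. It has a direct term through $o_t$ and a recurrent term through $h_{t+1}$, which is zero at $t = T$ because no later step depends on $h_T$:

        $\frac{\partial L}{\partial h_t} = W_{ho}^\top \frac{\partial L_t}{\partial o_t} + W_{hh}^\top \frac{\partial L}{\partial h_{t+1}}$

    - Add this step's contributions $\frac{\partial L_t}{\partial o_t} h_t^\top$, $\frac{\partial L}{\partial h_t} h_{t-1}^\top$ and $\frac{\partial L}{\partial h_t} x_t^\top$ to the gradients of $W_{ho}$, $W_{hh}$ and $W_{xh}$ respectively. With the identity activation, these follow directly from $o_t = W_{ho} h_t$ and $h_t = W_{hh} h_{t-1} + W_{xh} x_t$.

    - Pass $\frac{\partial L}{\partial h_t}$ back to step $t-1$, where it supplies the recurrent term $W_{hh}^\top \frac{\partial L}{\partial h_t}$ in the hidden-state gradient. The initial state $h_0$ is usually fixed (for example, to zeros), so the recursion stops at $t = 1$.

Once the loop finishes, the accumulated gradients are:

$$
\begin{aligned}
\frac{\partial L}{\partial W_{ho}} &= \sum_{t=1}^{T} \frac{\partial L_t}{\partial o_t}\, h_t^\top \\
\frac{\partial L}{\partial W_{hh}} &= \sum_{t=1}^{T} \frac{\partial L}{\partial h_t}\, h_{t-1}^\top \\
\frac{\partial L}{\partial W_{xh}} &= \sum_{t=1}^{T} \frac{\partial L}{\partial h_t}\, x_t^\top
\end{aligned}
$$

With a nonlinear activation such as tanh, $\frac{\partial L}{\partial h_t}$ would additionally be multiplied elementwise by the activation's derivative before it is used for $W_{hh}$, $W_{xh}$ and the step back to $t-1$. After computing the gradients for all time steps, we can update the model parameters using an optimization algorithm (e.g., stochastic gradient descent) to minimize the objective function.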
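The backward recursion can be sanity-checked numerically. The sketch below keeps the simplifications of this derivation (identity activation, no biases), shrinks every weight matrix to a scalar, and assumes a squared-error loss $L = \sum_t \frac{1}{2}(o_t - y_t)^2$ with hypothetical data. It accumulates the BPTT gradients over time and compares them with central finite differences; `forward`, `total_loss` and `bptt` are illustrative names.

```python
def forward(xs, w_hh, w_xh, w_ho):
    """Scalar RNN: h_t = w_hh*h_{t-1} + w_xh*x_t, o_t = w_ho*h_t, with h_0 = 0."""
    h, hs, outs = 0.0, [], []
    for x in xs:
        h = w_hh * h + w_xh * x
        hs.append(h)
        outs.append(w_ho * h)
    return hs, outs

def total_loss(xs, ys, w_hh, w_xh, w_ho):
    """Assumed objective: L = sum_t 0.5 * (o_t - y_t)**2."""
    _, outs = forward(xs, w_hh, w_xh, w_ho)
    return sum(0.5 * (o - y) ** 2 for o, y in zip(outs, ys))

def bptt(xs, ys, w_hh, w_xh, w_ho):
    """Gradients of total_loss w.r.t. (w_hh, w_xh, w_ho) via backprop through time."""
    hs, outs = forward(xs, w_hh, w_xh, w_ho)
    g_hh = g_xh = g_ho = 0.0
    dh_next = 0.0                          # dL/dh_{t+1}; zero after the last step
    for t in reversed(range(len(xs))):
        do = outs[t] - ys[t]               # dL_t/do_t
        g_ho += do * hs[t]                 # contribution to dL/dw_ho
        dh = w_ho * do + w_hh * dh_next    # dL/dh_t: direct term + recurrent term
        h_prev = hs[t - 1] if t > 0 else 0.0
        g_hh += dh * h_prev                # contribution to dL/dw_hh
        g_xh += dh * xs[t]                 # contribution to dL/dw_xh
        dh_next = dh                       # carry the gradient back one step
    return g_hh, g_xh, g_ho

xs, ys = [1.0, 2.0, 3.0], [1.0, 4.0, 8.0]
params = {"w_hh": 0.5, "w_xh": 1.0, "w_ho": 2.0}
analytic = bptt(xs, ys, **params)
print(analytic)  # (5.0, 11.25, 5.625)

# Central finite differences as an independent check.
eps = 1e-6
numeric = []
for name in ("w_hh", "w_xh", "w_ho"):
    plus = dict(params, **{name: params[name] + eps})
    minus = dict(params, **{name: params[name] - eps})
    numeric.append((total_loss(xs, ys, **plus) - total_loss(xs, ys, **minus)) / (2 * eps))

for a, n in zip(analytic, numeric):
    print(f"analytic {a:.6f}  numeric {n:.6f}")
```

The `dh_next` variable is what makes this "through time": the gradient at step $t$ includes everything later steps contributed via $h_{t+1}$.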

The computational graph, shown in the figure above, helps visualize the dependencies among model variables and parameters during the computation of the RNN. Each node in the graph represents a variable, and each edge represents a computation involving those variables. The graph helps in understanding how gradients flow through the network during backpropagation through time.

```python
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.preprocessing import sequence
```

```python
# Set the parameters for the RNN
max_features = 10000  # Vocabulary size (use the top 10,000 most frequent words)
maxlen = 500  # Maximum sequence length (truncate/pad sequences to this length)
batch_size = 32
```

```python
# Load the IMDB dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Pad/truncate sequences to a fixed length
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
```

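```python
# Build the RNN model
model = Sequential()
model.add(Embedding(max_features, 32))     # word index -> 32-dimensional vector
model.add(SimpleRNN(32))                   # 32 hidden units; returns only the final hidden state
model.add(Dense(1, activation='sigmoid'))  # probability that the review is positive

# Compile the model
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model (20% of the training data is held out for validation)
model.fit(x_train, y_train, epochs=10, batch_size=batch_size, validation_split=0.2)
```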

```python
# Evaluate the model on the test set
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test loss: {loss:.4f}, Test accuracy: {accuracy:.4f}")
```

```
Test loss: 0.6633, Test accuracy: 0.8186
```


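This program loads the IMDB dataset, which consists of movie reviews labeled with sentiment (positive or negative). The reviews come pre-encoded as sequences of word indices; the program keeps the 10,000 most frequent words and pads or truncates every review to 500 tokens. It then builds a basic RNN model with the Keras Sequential API and trains it for 10 epochs, holding out 20% of the training set for validation. Finally, it evaluates the model on the test set and prints the test loss and accuracy. This is a many-to-one setup: `SimpleRNN` returns only its final hidden state by default, which the `Dense` layer maps to a single sentiment probability.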