<a href="https://colab.research.google.com/github/KhotNoorin/Deep-Learning/blob/main/Deep_RNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Recurrent Neural Networks (Deep RNNs)

<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/0*3MT2-1Ekkoh_N8sG.png" width="1000"/>



Deep Recurrent Neural Networks (Deep RNNs) are an extension of standard Recurrent Neural Networks (RNNs) that consist of multiple hidden layers between the input and output layers. While a basic RNN has only one hidden layer, a Deep RNN stacks multiple RNN layers on top of each other, enabling the network to learn hierarchical features from sequential data.



Standard RNNs often struggle to model complex patterns in sequences because they have only one hidden layer. Unrolling an RNN over time makes it deep along the time axis, but every time step reuses the same single-layer transformation, so unrolling alone does not add much representational power. Deep RNNs add depth in the layer-wise (vertical) dimension to capture more abstract temporal dependencies and increase modeling capacity.

## Architecture

In a Deep RNN, each time step has multiple hidden layers stacked on top of one another. The input to each layer at a given time step is the output of the layer below it, while the recurrent connection is maintained at each layer independently across time steps.

Let:
- $x_t$ be the input at time $t$
- $h_t^l$ be the hidden state at time $t$ and layer $l$
- $L$ be the total number of layers

Then the recurrence relations are:
- $h_t^1 = f(W^1 x_t + U^1 h_{t-1}^1 + b^1)$
- $h_t^l = f(W^l h_t^{l-1} + U^l h_{t-1}^l + b^l)$ for $l = 2, \dots, L$

Where:
- $W^l$, $U^l$, and $b^l$ are the input weight matrix, recurrent weight matrix, and bias for layer $l$
- $f$ is an activation function, typically tanh or ReLU
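
The recurrence can be traced in a few lines of NumPy. The following is a minimal sketch, not the implementation Keras uses: the function name `deep_rnn_forward`, the layer sizes, the random weights, and the zero initial states are all assumptions chosen for illustration.

```python
import numpy as np

def deep_rnn_forward(x_seq, params, f=np.tanh):
    """Run a stacked RNN over x_seq of shape (T, input_dim).

    params is a list of (W, U, b) tuples, one per layer, ordered bottom to top.
    Returns the top-layer hidden states with shape (T, hidden_dim).
    """
    # One hidden state per layer, assumed to start at zero.
    h = [np.zeros(U.shape[0]) for (_, U, _) in params]
    top_states = []
    for x_t in x_seq:
        layer_input = x_t
        for l, (W, U, b) in enumerate(params):
            # h_t^l = f(W^l * input_from_below + U^l * h_{t-1}^l + b^l)
            h[l] = f(W @ layer_input + U @ h[l] + b)
            layer_input = h[l]  # output of layer l feeds layer l + 1
        top_states.append(h[-1])
    return np.stack(top_states)

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
input_dim, hidden_dim, num_layers, T = 4, 3, 2, 6
params = []
for l in range(num_layers):
    in_dim = input_dim if l == 0 else hidden_dim
    params.append((
        rng.normal(scale=0.1, size=(hidden_dim, in_dim)),      # W^l
        rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),  # U^l
        np.zeros(hidden_dim),                                  # b^l
    ))

x_seq = rng.normal(size=(T, input_dim))
top = deep_rnn_forward(x_seq, params)
print(top.shape)  # (6, 3)
```

Note that each layer keeps its own hidden state across time steps, while the input to layer `l` at a given step is the freshly computed state of the layer below.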

## Benefits of Deep RNNs

- **Hierarchical feature learning**: Lower layers capture simpler patterns (e.g., phonemes), while higher layers capture complex structures (e.g., words or sentences).
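- **Improved modeling capacity**: Additional layers let the network represent more complex temporal dynamics than a single recurrent layer, and can help with long-range dependencies, particularly when combined with gated cells such as LSTM or GRU.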
- **Better generalization**: Deeper models often generalize better with appropriate regularization and training strategies.

## Challenges

- **Vanishing and exploding gradients**: More severe in deep RNNs than in shallow ones. Techniques like LSTM/GRU cells, gradient clipping, and residual connections help mitigate this.
- **Training difficulty**: Requires careful initialization, optimization techniques (e.g., Adam, RMSprop), and possibly pretraining.
- **Overfitting**: More parameters increase the risk of overfitting, requiring regularization strategies like dropout, weight decay, or early stopping.
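
Of the mitigations listed above, gradient clipping is the simplest to show in isolation. The sketch below rescales a list of gradient arrays so that their combined (global) L2 norm does not exceed a threshold. The helper name `clip_by_global_norm` and the example gradients are assumptions for illustration. In Keras, the built-in optimizers accept `clipnorm`, `clipvalue`, and `global_clipnorm` arguments for the same purpose.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by a common factor so their global L2 norm <= max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Leave gradients untouched when the norm is already small enough.
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm

# Hypothetical gradients for two parameter tensors; global norm is 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm_before, norm_after)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length is bounded.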

## Variants and Improvements

- **Deep Bidirectional RNNs**: Stack bidirectional layers to incorporate context from both past and future.
- **Deep LSTMs/GRUs**: Use more advanced recurrent units like Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) cells in a deep configuration.
- **Residual and Highway connections**: Facilitate training deep RNNs by allowing gradient flow through shortcut paths.
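
To make the bidirectional variant concrete, here is a small NumPy sketch of two stacked bidirectional layers. Each layer runs one RNN left to right and another right to left, then concatenates the two state sequences. All names (`rnn_pass`, `bidirectional_layer`, `make_params`) and sizes are hypothetical. In Keras, the equivalent building block is the `Bidirectional` wrapper around a recurrent layer.

```python
import numpy as np

def rnn_pass(x_seq, W, U, b):
    """Single-layer tanh RNN; returns all hidden states, shape (T, hidden_dim)."""
    h = np.zeros(U.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W @ x_t + U @ h + b)
        states.append(h)
    return np.stack(states)

def bidirectional_layer(x_seq, fwd_params, bwd_params):
    """Concatenate a forward pass with a time-reversed backward pass."""
    forward = rnn_pass(x_seq, *fwd_params)
    # Process the sequence in reverse, then flip back so time steps line up.
    backward = rnn_pass(x_seq[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=-1)

def make_params(rng, in_dim, hidden_dim):
    """Random illustrative weights (W, U, b) for one direction of one layer."""
    return (
        rng.normal(scale=0.1, size=(hidden_dim, in_dim)),
        rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
        np.zeros(hidden_dim),
    )

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 4, 3
x_seq = rng.normal(size=(T, input_dim))

# Layer 2 consumes the concatenated output of layer 1, so its input is 2 * hidden_dim wide.
layer1 = (make_params(rng, input_dim, hidden_dim), make_params(rng, input_dim, hidden_dim))
layer2 = (make_params(rng, 2 * hidden_dim, hidden_dim), make_params(rng, 2 * hidden_dim, hidden_dim))

out1 = bidirectional_layer(x_seq, *layer1)
out2 = bidirectional_layer(out1, *layer2)
print(out1.shape, out2.shape)  # (5, 6) (5, 6)
```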




In [9]:
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, GRU, Dense  # GRU is needed by the GRU model below

In [10]:
# Load the IMDb dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

# Pad sequences to have the same length
x_train = pad_sequences(x_train, maxlen=100)
x_test = pad_sequences(x_test, maxlen=100)

In [12]:
# Define the Deep RNN model
model = Sequential([
    Embedding(input_dim=10000, output_dim=32),  # Embedding layer to convert words to vectors
    SimpleRNN(5, return_sequences=True),        # RNN layer with 5 units
    SimpleRNN(5),                                # Another RNN layer with 5 units
    Dense(1, activation='sigmoid')               # Output layer for binary classification
])
# Build the model with the expected input shape
model.build(input_shape=(None, 100))  # Batch size can be variable (None), sequence length = 100

model.summary()

In [13]:
# Define the LSTM model
model = Sequential([
    Embedding(input_dim=10000, output_dim=32),
    LSTM(5, return_sequences=True),
    LSTM(5),
    Dense(1, activation='sigmoid')
])

# Build the model to initialize parameters and define shapes
model.build(input_shape=(None, 100))  # (batch_size, sequence_length)

model.summary()

In [14]:
# Define the GRU model
model = Sequential([
    Embedding(input_dim=10000, output_dim=32),
    GRU(5, return_sequences=True),
    GRU(5),
    Dense(1, activation='sigmoid')
])

# Build the model manually to set the input shape
model.build(input_shape=(None, 100))  # (batch_size, sequence_length)

model.summary()
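
The parameter counts that `model.summary()` reports for the three models can be estimated by hand. The sketch below uses the standard per-layer formulas. The helper names are hypothetical, and the GRU formula assumes the Keras default `reset_after=True`, which keeps two bias vectors per gate. Treat the totals as an estimate to compare against the summaries rather than as their recorded output.

```python
def simple_rnn_params(input_dim, units):
    # One input matrix, one recurrent matrix, one bias vector.
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    # Four gates/candidates, each with its own W, U, and b.
    return 4 * units * (input_dim + units + 1)

def gru_params(input_dim, units, reset_after=True):
    # Three gates/candidates; reset_after=True (assumed default) adds a second bias.
    bias_terms = 2 if reset_after else 1
    return 3 * units * (input_dim + units + bias_terms)

vocab_size, embed_dim, units = 10000, 32, 5
embedding = vocab_size * embed_dim   # Embedding(input_dim=10000, output_dim=32)
dense = units * 1 + 1                # Dense(1) on top of a 5-unit state

totals = {}
for name, count in [("SimpleRNN", simple_rnn_params),
                    ("LSTM", lstm_params),
                    ("GRU", gru_params)]:
    first = count(embed_dim, units)   # first recurrent layer reads 32-d embeddings
    second = count(units, units)      # second layer reads the 5-d states below it
    totals[name] = embedding + first + second + dense

for name, total in totals.items():
    print(f"{name}: {total} parameters")
```

In all three models the embedding table (320,000 weights) dominates; the recurrent layers themselves contribute fewer than a thousand parameters each.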

In [15]:
# Compile the GRU model (the latest `model`; the SimpleRNN and LSTM models above were overwritten)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [16]:
# Train the GRU model for 5 epochs, holding out 20% of the training data for validation
history = model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

Epoch 1/5
625/625 - 10s 13ms/step - accuracy: 0.6197 - loss: 0.6295 - val_accuracy: 0.7992 - val_loss: 0.4515
Epoch 2/5
625/625 - 9s 11ms/step - accuracy: 0.8475 - loss: 0.3710 - val_accuracy: 0.8194 - val_loss: 0.4223
Epoch 3/5
625/625 - 7s 11ms/step - accuracy: 0.8927 - loss: 0.2871 - val_accuracy: 0.8334 - val_loss: 0.3990
Epoch 4/5
625/625 - 7s 12ms/step - accuracy: 0.9188 - loss: 0.2248 - val_accuracy: 0.8292 - val_loss: 0.4088
Epoch 5/5
625/625 - 10s 11ms/step - accuracy: 0.9434 - loss: 0.1726 - val_accuracy: 0.8282 - val_loss: 0.4476

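
In the log above, training accuracy keeps rising while validation loss reaches its minimum at epoch 3 and then climbs, which is the overfitting pattern described under Challenges. The sketch below replays the logged `val_loss` values through a simple patience-based early-stopping rule. The function `early_stopping_epoch` is a hypothetical helper written for this illustration. In Keras the same idea is available as the `EarlyStopping` callback (with arguments such as `monitor`, `patience`, and `restore_best_weights`).

```python
# Validation losses copied from the training log above (epochs 1 to 5).
val_losses = [0.4515, 0.4223, 0.3990, 0.4088, 0.4476]

def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stopped_at) for a patience-based stopping rule."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

best_epoch, stopped_at = early_stopping_epoch(val_losses, patience=1)
print(f"best epoch: {best_epoch}, training would stop after epoch {stopped_at}")
```

With `patience=1`, this rule would have halted training after epoch 4 and kept the epoch 3 weights, skipping the final epoch in which validation loss rose further.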