<a href="https://colab.research.google.com/github/ReyhaneTaj/ML_Concepts_and_Algorithms/blob/main/RNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recurrent Neural Networks (RNNs) Notebook

Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle sequential data by retaining a "memory" of previous inputs in the sequence. This makes them particularly useful for tasks where context or order matters, such as time-series forecasting, natural language processing (NLP), and speech recognition.

### Key Characteristics
- **Sequence Handling**: RNNs can process sequences of variable length.
- **Memory Mechanism**: They maintain a hidden state that captures information from previous time steps.
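Here is a minimal PyTorch sketch of variable-length handling. It assumes PyTorch is installed, and the sizes and random data are arbitrary. Three sequences of different lengths are padded into one batch and then packed with `pack_padded_sequence`, so the RNN stops at each sequence's true length. The final hidden state of each sequence should then match the result of running that sequence on its own.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)
input_dim, hidden_dim = 4, 8
rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)

# Three sequences of different lengths, each of shape (time_steps, features)
seqs = [torch.randn(5, input_dim), torch.randn(3, input_dim), torch.randn(2, input_dim)]
lengths = torch.tensor([len(s) for s in seqs])

# Pad to a common length, then pack so the RNN skips the padded positions
padded = pad_sequence(seqs, batch_first=True)  # shape (3, 5, input_dim)
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

with torch.no_grad():
    _, h_n = rnn(packed)  # h_n: (num_layers, batch, hidden_dim), in the original batch order
    matches = []
    for i, s in enumerate(seqs):
        _, h_single = rnn(s.unsqueeze(0))  # the same weights on a batch of one, no padding
        matches.append(torch.allclose(h_n[0, i], h_single[0, 0], atol=1e-5))

print(matches)
```

Only the loop length changes from sequence to sequence. The weights stay the same, which is why one RNN can serve sequences of any length.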

## RNN Architecture

An RNN is characterized by a loop within its architecture, allowing information to be passed from one step of the sequence to the next.

### Basic Components
- **Input Layer**: Receives the sequence data.
- **Hidden Layer**: Maintains the hidden state that is updated at each time step.
- **Output Layer**: Produces the output for each time step or for the entire sequence.

### Mathematical Representation
For a given time step `t`:
- **Hidden state update**:
  \[
  h_t = \sigma(W_{hh}h_{t-1} + W_{xh}x_t + b_h)
  \]
- **Output calculation**:
  \[
  y_t = W_{hy}h_t + b_y
  \]
Where:
- \( h_t \) is the hidden state at time step `t`.
- \( x_t \) is the input at time step `t`.
- \( y_t \) is the output at time step `t`.
- \( W_{hh} \), \( W_{xh} \), and \( W_{hy} \) are the weight matrices.
- \( b_h \) and \( b_y \) are the biases.
- \( \sigma \) is the activation function, typically `tanh` or `ReLU`.
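The two equations can be traced step by step in NumPy. This minimal sketch assumes \( \sigma = \tanh \), a zero initial hidden state, and small random weights chosen only for illustration. The same \( W_{xh} \), \( W_{hh} \), and \( W_{hy} \) are reused at every time step.

```python
import numpy as np

def rnn_forward(x_seq, h0, W_xh, W_hh, b_h, W_hy, b_y):
    """Vanilla RNN forward pass with sigma = tanh.

    x_seq: (T, input_dim), h0: (hidden_dim,)
    Returns hidden states (T, hidden_dim) and outputs (T, output_dim).
    """
    h = h0
    hs, ys = [], []
    for x_t in x_seq:
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)  # h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)
        y = W_hy @ h + b_y                         # y_t = W_hy h_t + b_y
        hs.append(h)
        ys.append(y)
    return np.stack(hs), np.stack(ys)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 6, 3, 4, 2
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.5, size=(output_dim, hidden_dim))
b_h, b_y = np.zeros(hidden_dim), np.zeros(output_dim)

x_seq = rng.normal(size=(T, input_dim))
hs, ys = rnn_forward(x_seq, np.zeros(hidden_dim), W_xh, W_hh, b_h, W_hy, b_y)
print(hs.shape, ys.shape)  # (6, 4) (6, 2)
```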

### Backpropagation Through Time (BPTT)
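RNNs are trained with Backpropagation Through Time (BPTT). The network is unrolled across the time steps of the sequence, and ordinary backpropagation is applied to the unrolled graph. Because the same weights are reused at every step, their gradients are summed over all time steps.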
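As a rough illustration of BPTT, the sketch below unrolls a vanilla RNN over a short sequence. It then backpropagates a squared-error loss \( L = \tfrac{1}{2}\sum_t \lVert y_t - \text{target}_t \rVert^2 \) through every step by hand. The loss, the random targets, and the helper names (`forward`, `loss_fn`, `bptt`) are assumptions made for this example only. A central finite-difference check on \( W_{hh} \) is included as a sanity check on the hand-written gradients.

```python
import numpy as np

def forward(params, x_seq, h0):
    W_xh, W_hh, b_h, W_hy, b_y = params
    hs, ys = [h0], []
    for x_t in x_seq:
        h = np.tanh(W_hh @ hs[-1] + W_xh @ x_t + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)
    return hs, ys  # hs[0] is h0, hs[t + 1] is the hidden state after step t

def loss_fn(params, x_seq, h0, targets):
    _, ys = forward(params, x_seq, h0)
    return 0.5 * sum(np.sum((y - tgt) ** 2) for y, tgt in zip(ys, targets))

def bptt(params, x_seq, h0, targets):
    """Gradients of the squared-error loss via backpropagation through time."""
    W_xh, W_hh, b_h, W_hy, b_y = params
    hs, ys = forward(params, x_seq, h0)
    grads = [np.zeros_like(p) for p in params]
    dW_xh, dW_hh, db_h, dW_hy, db_y = grads  # views into grads, updated in place
    dh_next = np.zeros_like(h0)              # gradient arriving from step t + 1
    for t in reversed(range(len(x_seq))):
        h_t, h_prev = hs[t + 1], hs[t]
        dy = ys[t] - targets[t]              # dL/dy_t
        dW_hy += np.outer(dy, h_t)
        db_y += dy
        dh = W_hy.T @ dy + dh_next           # loss at step t plus all later steps
        da = dh * (1.0 - h_t ** 2)           # back through tanh
        dW_hh += np.outer(da, h_prev)        # shared weights: contributions summed over t
        dW_xh += np.outer(da, x_seq[t])
        db_h += da
        dh_next = W_hh.T @ da
    return grads

rng = np.random.default_rng(0)
T, n_in, n_h, n_out = 5, 3, 4, 2
shapes = [(n_h, n_in), (n_h, n_h), (n_h,), (n_out, n_h), (n_out,)]
params = [rng.normal(scale=0.5, size=s) for s in shapes]
x_seq = rng.normal(size=(T, n_in))
targets = rng.normal(size=(T, n_out))
h0 = np.zeros(n_h)

analytic = bptt(params, x_seq, h0, targets)

# Central finite differences on W_hh (params[1])
eps = 1e-6
numeric = np.zeros_like(params[1])
for idx in np.ndindex(params[1].shape):
    saved = params[1][idx]
    params[1][idx] = saved + eps
    plus = loss_fn(params, x_seq, h0, targets)
    params[1][idx] = saved - eps
    minus = loss_fn(params, x_seq, h0, targets)
    params[1][idx] = saved
    numeric[idx] = (plus - minus) / (2 * eps)

print(np.max(np.abs(analytic[1] - numeric)))  # should be very small
```

Deep learning frameworks carry out the same unrolled computation automatically when you call `loss.backward()` on an RNN's output.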

## Types of RNNs

### 1. **Vanilla RNN**
   - The simplest form of RNN with a single hidden layer.

### 2. **Long Short-Term Memory (LSTM)**
   - Designed to address the vanishing gradient problem in vanilla RNNs.
   - Introduces gates (input, forget, and output gates) to control the flow of information.
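The gate mechanics fit in a few lines of code. This sketch assumes PyTorch and reuses the weights of `nn.LSTMCell`, which stacks its four gate blocks in the order input, forget, cell candidate, output. The hand-written step should agree with the built-in cell. The cell state \( c_t \) is the extra memory track that the forget and input gates edit.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, batch = 3, 4, 2
cell = nn.LSTMCell(input_dim, hidden_dim)

def lstm_step(x, h, c, cell):
    """One LSTM step written out by hand, reusing the weights of an nn.LSTMCell."""
    gates = x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh
    i, f, g, o = gates.chunk(4, dim=1)        # input, forget, candidate, output blocks
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                         # candidate values for the cell state
    c_next = f * c + i * g                    # forget part of the old state, add new content
    h_next = o * torch.tanh(c_next)           # output gate decides what is exposed
    return h_next, c_next

x = torch.randn(batch, input_dim)
h = torch.randn(batch, hidden_dim)
c = torch.randn(batch, hidden_dim)

with torch.no_grad():
    h_manual, c_manual = lstm_step(x, h, c, cell)
    h_ref, c_ref = cell(x, (h, c))

print(torch.allclose(h_manual, h_ref, atol=1e-6), torch.allclose(c_manual, c_ref, atol=1e-6))
```

The additive update `f * c + i * g` is what helps gradients survive over long spans: when the forget gate stays near 1, the cell state passes through largely unchanged.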

### 3. **Gated Recurrent Unit (GRU)**
   - A simplified variant of the LSTM with two gates (update and reset) and no separate cell state, which makes it computationally cheaper.
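Parameter counts show one reason the GRU is cheaper. A PyTorch `nn.LSTM` layer has four weight blocks (three gates plus the cell candidate), an `nn.GRU` layer has three, and a plain `nn.RNN` layer has one. The sketch below assumes PyTorch and arbitrary sizes, and counts the parameters in a single layer of each.

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

input_dim, hidden_dim = 32, 64
rnn = nn.RNN(input_dim, hidden_dim)
gru = nn.GRU(input_dim, hidden_dim)
lstm = nn.LSTM(input_dim, hidden_dim)

# One block = input weights + recurrent weights + PyTorch's two bias vectors
per_block = hidden_dim * (input_dim + hidden_dim) + 2 * hidden_dim

print(n_params(rnn), n_params(gru), n_params(lstm))  # 6272 18816 25088
```

The GRU therefore needs about three quarters of the LSTM's parameters and matrix multiplications per step.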

### 4. **Bidirectional RNN**
   - Processes the sequence in both forward and backward directions, so the output at each time step can use context from both earlier and later inputs. This requires the whole sequence to be available up front.
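This minimal PyTorch sketch (arbitrary sizes, random input) shows what `bidirectional=True` produces. At each time step, the output concatenates the forward and backward hidden states, so the feature dimension doubles. The final hidden state holds one entry per direction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_dim, hidden_dim = 2, 7, 3, 5
birnn = nn.RNN(input_dim, hidden_dim, batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_dim)
with torch.no_grad():
    out, h_n = birnn(x)

print(out.shape)  # torch.Size([2, 7, 10]): forward and backward features concatenated
print(h_n.shape)  # torch.Size([2, 2, 5]): (num_directions, batch, hidden_dim)

forward_last = out[:, -1, :hidden_dim]   # the forward pass finishes at the last step
backward_last = out[:, 0, hidden_dim:]   # the backward pass finishes at the first step
print(torch.allclose(forward_last, h_n[0]), torch.allclose(backward_last, h_n[1]))
```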

### 5. **Deep RNNs**
   - Consist of multiple RNN layers stacked on top of each other, allowing the network to capture more complex patterns.
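This sketch (assuming PyTorch, arbitrary sizes) illustrates stacking. It builds a two-layer `nn.RNN` and checks that it computes the same result as two single-layer RNNs chained together, where the second layer reads the full output sequence of the first. The attribute names `weight_ih_l0`, `weight_ih_l1`, and so on are PyTorch's per-layer parameter names.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_dim, hidden_dim = 4, 6, 3, 8
deep = nn.RNN(input_dim, hidden_dim, num_layers=2, batch_first=True)
layer1 = nn.RNN(input_dim, hidden_dim, batch_first=True)
layer2 = nn.RNN(hidden_dim, hidden_dim, batch_first=True)

with torch.no_grad():
    # Copy each layer of the stacked model into its own single-layer RNN
    for name in ["weight_ih", "weight_hh", "bias_ih", "bias_hh"]:
        getattr(layer1, f"{name}_l0").copy_(getattr(deep, f"{name}_l0"))
        getattr(layer2, f"{name}_l0").copy_(getattr(deep, f"{name}_l1"))

    x = torch.randn(batch, seq_len, input_dim)
    out_deep, h_deep = deep(x)
    out1, _ = layer1(x)     # layer 1 reads the raw input sequence
    out2, _ = layer2(out1)  # layer 2 reads layer 1's full output sequence

print(out_deep.shape, h_deep.shape)  # torch.Size([4, 6, 8]) torch.Size([2, 4, 8])
print(torch.allclose(out_deep, out2, atol=1e-6))
```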

## Applications of RNNs

1. **Natural Language Processing (NLP)**
   - Language Modeling
   - Machine Translation
   - Sentiment Analysis

2. **Time-Series Forecasting**
   - Stock Price Prediction
   - Weather Forecasting

3. **Speech Recognition**
   - Voice Assistants (e.g., Siri, Alexa)
   - Automated Transcription

4. **Image Captioning**
   - Generating descriptions for images by combining Convolutional Neural Networks (CNNs) and RNNs.

## Challenges of RNNs

### 1. **Vanishing and Exploding Gradients**
   - Gradients can become too small or too large during training, leading to difficulty in learning long-term dependencies.
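Here is a simplified NumPy illustration of why gradients vanish or explode. It assumes a linear recurrence \( h_t = W h_{t-1} \) and drops the nonlinearity. For `tanh`, whose derivative is at most 1, the nonlinearity would only shrink gradients further. \( W \) is a scaled orthogonal matrix, so each backward step multiplies the gradient norm by exactly the scale factor. Over 50 steps, a factor of 0.9 shrinks a unit gradient to about 0.005, while 1.1 inflates it to about 117.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, T = 16, 50
Q, _ = np.linalg.qr(rng.normal(size=(hidden_dim, hidden_dim)))  # random orthogonal matrix

def backprop_norm(scale):
    """Gradient norm after flowing back T steps through h_t = W h_{t-1}, with W = scale * Q."""
    W = scale * Q
    grad = np.ones(hidden_dim) / np.sqrt(hidden_dim)  # unit-norm gradient at the last step
    for _ in range(T):
        grad = W.T @ grad  # dL/dh_{t-1} = W^T dL/dh_t for a linear recurrence
    return np.linalg.norm(grad)

for scale in (0.9, 1.0, 1.1):
    print(scale, backprop_norm(scale))
```

In a real RNN the per-step factor varies, but the same compounding over many steps drives gradients toward zero or infinity.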

### 2. **Long Training Times**
   - RNNs can be slow to train due to the sequential nature of processing.

### 3. **Difficulty in Parallelization**
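   - Because each hidden state depends on the previous one, the time steps of a sequence must be processed one after another. Unlike feedforward networks, RNNs therefore cannot parallelize the recurrence across time steps, although the work within a step and across a batch can still run in parallel.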
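Only the recurrence itself is sequential. In this NumPy sketch (random weights, arbitrary sizes), the input projection \( W_{xh} x_t + b_h \) is computed for all time steps in a single matrix multiplication, which parallelizes well. The loop over \( h_t \) must still run one step at a time, because each step needs \( h_{t-1} \). Both versions produce the same hidden states.

```python
import numpy as np

rng = np.random.default_rng(1)
T, input_dim, hidden_dim = 8, 5, 6
X = rng.normal(size=(T, input_dim))
W_xh = rng.normal(scale=0.3, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.3, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

def naive(X):
    h, hs = np.zeros(hidden_dim), []
    for x_t in X:
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)
        hs.append(h)
    return np.stack(hs)

def hoisted(X):
    # Input projections for every time step at once: one (T, input_dim) x (input_dim, hidden_dim) matmul
    XW = X @ W_xh.T + b_h
    h, hs = np.zeros(hidden_dim), []
    for t in range(len(X)):  # still sequential: h_t needs h_{t-1}
        h = np.tanh(W_hh @ h + XW[t])
        hs.append(h)
    return np.stack(hs)

print(np.allclose(naive(X), hoisted(X)))
```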

### 4. **Short-Term Memory**
   - Vanilla RNNs struggle to retain information from the distant past, which is partially addressed by LSTMs and GRUs.

## Implementation in Python

Here are simple implementations of an RNN sequence classifier, first in TensorFlow (Keras) and then in PyTorch:




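```python
# RNN in TensorFlow (Keras)
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input

# Example dimensions (set these according to your data)
timesteps = 10     # Number of time steps in each input sequence
input_dim = 20     # Number of features in each time step
output_dim = 5     # Number of classes in the output

# Define the model: 50 tanh units, then a softmax over the classes
model = Sequential([
    Input(shape=(timesteps, input_dim)),
    SimpleRNN(50, activation='tanh'),  # By default, returns only the last hidden state
    Dense(output_dim, activation='softmax')
])

# Compile the model; categorical_crossentropy expects one-hot encoded labels
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Random example data (replace with your actual data)
X_train = np.random.rand(32, timesteps, input_dim).astype('float32')
y_train = tf.keras.utils.to_categorical(np.random.randint(0, output_dim, size=32), num_classes=output_dim)

# Train briefly to check that shapes and loss fit together
model.fit(X_train, y_train, epochs=5, batch_size=8, verbose=0)
```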

```python
# RNN in PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

# Define the necessary dimensions and hyperparameters
input_dim = 10      # Number of features in each time step
hidden_dim = 50     # Number of hidden units in the RNN
output_dim = 5      # Number of classes in the output
num_epochs = 20     # Number of training epochs

# Example data (replace with your actual data)
batch_size = 32
sequence_length = 15
X_train = torch.randn(batch_size, sequence_length, input_dim)  # Random inputs: (batch, time, features)
y_train = torch.randint(0, output_dim, (batch_size,))          # Random integer class labels
# nn.CrossEntropyLoss takes integer class indices directly, so no one-hot encoding is needed

# Define the RNN model
class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initial hidden state (num_layers, batch, hidden_dim) on the same device and dtype as x
        h0 = torch.zeros(1, x.size(0), self.hidden_dim, device=x.device, dtype=x.dtype)
        out, _ = self.rnn(x, h0)      # out: (batch, time, hidden_dim)
        out = self.fc(out[:, -1, :])  # Use the output from the last time step
        return out                    # Raw logits; CrossEntropyLoss applies log-softmax itself

# Initialize the model, define the loss and optimizer
model = RNNModel(input_dim, hidden_dim, output_dim)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(num_epochs):
    model.train()  # Set the model to training mode

    # Forward pass
    outputs = model(X_train)

    # Compute the loss
    loss = criterion(outputs, y_train)

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()        # Backpropagation through time
    optimizer.step()       # Update weights

    # Print loss
    if (epoch + 1) % 5 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')
```



Sample output (the data are random, so the exact losses will differ from run to run):

```text
Epoch [5/20], Loss: 1.5073
Epoch [10/20], Loss: 1.4229
Epoch [15/20], Loss: 1.3382
Epoch [20/20], Loss: 1.2486
```


As I understand it, **Recurrent Neural Networks** are powerful tools for modeling sequential data, but they come with **challenges** such as **vanishing gradients** and **long training times**. By using gated variants like **LSTMs** and **GRUs** and following best practices such as gradient clipping, RNNs can be applied effectively to tasks in **NLP**, **time-series forecasting**, and more.