<a href="https://colab.research.google.com/github/Jhansipothabattula/Machine_Learning/blob/main/Day62.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Understanding RNN Architecture and Backpropagation Through Time(BPTT)

## Detailed Architecture of RNNs

* **Components of an RNN**
    * **Input Layer**
        * Takes sequential data as input at each time step
    * **Hidden Layer**
        * Maintains a 'memory' of past inputs through recurrent connections. The hidden state at time $t$ ($h_t$) is calculated as
 $$
  h_t = f(W_{h} \cdot h_{t-1} + W_{x} \cdot x_{t} + b_h)
 $$
 * $W_h$: Weight matrix for recurrent connections
 * $W_x$: Weight matrix for input connections
 * $b_h$: Bias term
 * $f$: Non-linear activation function (e.g., tanh, ReLU)
    * **Output Layer**
        * Produces output $y_t$, based on the hidden state $h_t$
 * $$y_t = g(W_{y} \cdot h_{t} + b_{y})$$
* $g$: Activation function (e.g., softmax for classification)
  

# Backpropagation Through Time (BPTT)

* **What is BPTT?**
    * Extension of standard backpropagation to handle sequential data in RNNs
    * It calculates gradients for each time step and propagates them backward through the sequence

* **Steps of BPTT**
    * Unroll the RNN across the sequence for a fixed number of time steps
    * Compute the loss for each time step
    * Backpropagate the errors across all time steps to update weights

* **Challenges in BPTT**
    * **Vanishing Gradient Problem**
        * Gradients diminish exponentially as they are propagated back through time
        * Leads to difficulty in learning long-term dependencies
    * **Exploding Gradient Problem**
        * Gradients grow exponentially, causing numerical instability during training

* **Solutions**
    * Use **gradient clipping** to handle exploding gradients
    * Use architectures like **Long Short-Term Memory (LSTM)** or **Gated Recurrent Units (GRU)** to mitigate the vanishing gradient problem

# Limitations of Vanilla RNNs

* **Short-Term Memory**
    * Struggle to learn dependencies in long sequences due to vanishing gradients

* **Sequential Computation**
    * Cannot parallelize training across time steps, making them computationally expensive

* **Sensitive Initialization**
    * Performance depends heavily on proper weight initialization and learning rates

**Objective**
- Build a simple RNN model for Text Classification using TensorFlow or PyTorch

- Train the RNN and observe how it captures sequence patterns

In [None]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

vocab_size = 10000
max_len = 200

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocab_size)

X_train = pad_sequences(X_train, maxlen = max_len, padding="post")
X_test = pad_sequences(X_test, maxlen = max_len, padding="post")

print("Training data shape:", X_train.shape)
print("Testing data shape:", X_test.shape)

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=128),
    SimpleRNN(128, activation="tanh", return_sequences=False),
    Dense(1, activation="sigmoid")
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.summary()

history=model.fit(X_train, y_train, epochs = 5, batch_size=32, validation_split=0.2)
loss, accuracy = model.evaluate(X_test, y_test)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
[1m17464789/17464789[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step
Training data shape: (25000, 200)
Testing data shape: (25000, 200)


Epoch 1/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m71s[0m 111ms/step - accuracy: 0.5165 - loss: 0.6931 - val_accuracy: 0.5350 - val_loss: 0.6883
Epoch 2/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m66s[0m 105ms/step - accuracy: 0.5538 - loss: 0.6805 - val_accuracy: 0.5314 - val_loss: 0.6825
Epoch 3/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m70s[0m 112ms/step - accuracy: 0.5752 - loss: 0.6607 - val_accuracy: 0.5444 - val_loss: 0.6793
Epoch 4/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m66s[0m 105ms/step - accuracy: 0.6001 - loss: 0.6273 - val_accuracy: 0.5522 - val_loss: 0.6867
Epoch 5/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m83s[0m 107ms/step - accuracy: 0.6302 - loss: 0.5953 - val_accuracy: 0.5694 - val_loss: 0.6820
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 27ms/step - accuracy: 0.5776 - loss: 0.6744
Test Loss: 0.6771971583366394
Test Accuracy: 0.5767999887466431


In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size = 10000
max_len = 200

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocab_size)

X_train = pad_sequences(X_train, maxlen = max_len, padding="post")
X_test = pad_sequences(X_test, maxlen = max_len, padding="post")

train_dataset = TensorDataset(torch.tensor(X_train, dtype=torch.long), torch.tensor(y_train, dtype=torch.float))
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

class RNNModel(nn.Module):
  def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
    super(RNNModel, self).__init__()
    self.embedding = nn.Embedding(vocab_size, embedding_dim)
    self.rnn = nn.RNN(embedding_dim, hidden_dim, batch_first=True)
    self.fc = nn.Linear(hidden_dim, output_dim)

  def forward(self, x):
    embedded = self.embedding(x)
    output, hidden = self.rnn(embedded)
    return torch.sigmoid(self.fc(hidden.squeeze(0)))
model = RNNModel(vocab_size=10000, embedding_dim=128, hidden_dim=128, output_dim=1)

criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

def train_rnn(model, train_loader, criterion, optimizer, epochs=5):
  model.train()
  for epoch in range(epochs):
    epoch_loss = 0
    for X_batch, y_batch in train_loader:
          optimizer.zero_grad()
          predictions = model(X_batch).squeeze(1)
          loss = criterion(predictions, y_batch.float())
          loss.backward()
          optimizer.step()
          epoch_loss += loss.item()
  print(f"Epoch {epoch+1}, Loss: {epoch_loss/len(train_loader):.4f}")
train_rnn(model, train_loader, criterion, optimizer)

def evaluate_rnn(model, X_test, y_test):
  model.eval()
  with torch.no_grad():
    predictions = model(torch.tensor(X_test)).squeeze(1)
    loss = criterion(predictions, torch.tensor(y_test).float())
    accuracy = ((predictions > 0) == torch.tensor(y_test).float()).float().mean().item()
  print(f"Test Loss: {loss:.4f}")
  print(f"Test Accuracy: {accuracy:.4f}")

evaluate_rnn(model, X_test, y_test)


Epoch 5, Loss: 0.5843
Test Loss: 0.6733
Test Accuracy: 0.5000
