# Implement an RNN in PyTorch

## Problem Statement

You are tasked with implementing a **Recurrent Neural Network (RNN)** in PyTorch to process sequential data. The goal is to classify human activities based on motion sensor readings from a real-world dataset.

### Dataset
Use the **Human Activity Recognition (HAR) dataset** from the UCI Machine Learning Repository:  
[UCI HAR Dataset](https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones).  
This dataset contains accelerometer and gyroscope readings from smartphones, labeled with six different activities:  
- Walking  
- Walking Upstairs  
- Walking Downstairs  
- Sitting  
- Standing  
- Laying  
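
For orientation, the sketch below shows one way the raw windows could be loaded into the `(num_samples, 128, 9)` layout used in this exercise. It assumes the archive has been extracted locally and that each split has an `Inertial Signals/` folder with one text file per channel (128 values per row) plus a `y_<split>.txt` label file with values 1 to 6. The helper name `load_har_split` and the `root` argument are hypothetical, and the file layout should be checked against your download.

```python
from pathlib import Path

import numpy as np
import torch

# Nine raw sensor channels (assumed file-name prefixes inside "Inertial Signals/").
CHANNELS = [
    "body_acc_x", "body_acc_y", "body_acc_z",
    "body_gyro_x", "body_gyro_y", "body_gyro_z",
    "total_acc_x", "total_acc_y", "total_acc_z",
]


def load_har_split(root, split="train"):
    """Hypothetical helper: return X of shape (N, 128, 9) and y of shape (N,) with labels 0..5."""
    split_dir = Path(root) / split
    # Each file holds one channel: one row per window, 128 readings per row.
    signals = [
        np.loadtxt(split_dir / "Inertial Signals" / f"{name}_{split}.txt", ndmin=2)
        for name in CHANNELS
    ]
    # Stack channels last so the layout is (num_samples, sequence_length, input_dim).
    X = np.stack(signals, axis=-1).astype(np.float32)
    # Labels are assumed to be stored as 1..6; shift to 0..5 for nn.CrossEntropyLoss.
    y = np.loadtxt(split_dir / f"y_{split}.txt", dtype=np.int64, ndmin=1) - 1
    return torch.from_numpy(X), torch.from_numpy(y)
```

Calling `load_har_split("path/to/extracted/dataset", "train")` would then give tensors that can be wrapped in a `TensorDataset` and a `DataLoader`.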

### Requirements

#### Define the RNN Model:
- Implement an **RNN-based classifier** that takes time-series sensor data as input and predicts the corresponding human activity.
- Use a **recurrent layer** (e.g., `nn.RNN`, `nn.LSTM`, or `nn.GRU`) to process sequential input.
- Add a **fully connected layer** (`nn.Linear`) to map the RNN output to the final classification.
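
One possible shape for such a model is sketched below. It is a minimal illustration rather than a reference solution: the class name `ActivityClassifier`, the choice of `nn.GRU`, and the default hyperparameters are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class ActivityClassifier(nn.Module):
    """Recurrent layer followed by a linear classification head (illustrative sketch)."""

    def __init__(self, input_dim=9, hidden_size=64, num_layers=2,
                 num_classes=6, dropout=0.2):
        super().__init__()
        self.rnn = nn.GRU(
            input_size=input_dim,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,  # inputs are (batch, seq_len, input_dim)
            # Recurrent-layer dropout only acts between stacked layers.
            dropout=dropout if num_layers > 1 else 0.0,
        )
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        _, h_n = self.rnn(x)     # h_n: (num_layers, batch, hidden_size)
        return self.fc(h_n[-1])  # logits: (batch, num_classes)


model = ActivityClassifier()
logits = model(torch.randn(32, 128, 9))
print(logits.shape)  # torch.Size([32, 6])
```

Swapping `nn.GRU` for `nn.RNN` leaves the rest of the class unchanged; `nn.LSTM` additionally returns a cell state that has to be unpacked.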

#### Implement the Forward Pass:
- Process the input sequence through the RNN layer.
- Extract the final hidden state to make a prediction.
- Pass the final hidden state through the fully connected layer to obtain the output logits.
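
There are two common ways to get the final hidden state from a PyTorch recurrent layer, and the sketch below checks that they agree. It assumes a unidirectional layer with `batch_first=True`; the sizes are arbitrary illustration values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=9, hidden_size=16, num_layers=2, batch_first=True)
x = torch.randn(4, 128, 9)

output, h_n = rnn(x)
# output: (batch, seq_len, hidden_size), top layer at every timestep
# h_n:    (num_layers, batch, hidden_size), every layer at the final timestep
last_from_output = output[:, -1, :]
last_from_h_n = h_n[-1]

same = torch.allclose(last_from_output, last_from_h_n)
print(output.shape, h_n.shape, same)
```

For a bidirectional layer the two no longer coincide, because the backward direction finishes at the first timestep, so `h_n` is usually the safer choice there.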

### Constraints
- The input data is a sequence of shape `(batch_size, sequence_length, input_dim)`, where:
  - `sequence_length` is the number of timesteps (128 in this dataset).
  - `input_dim` is the number of sensor features (9 in this dataset).
- The model should output one prediction per sample in the batch, i.e. logits of shape `(batch_size, num_classes)`, with `num_classes = 6` for this dataset.
- Use appropriate configurations for:
  - **Hidden units** (`hidden_size`).
  - **Number of recurrent layers** (`num_layers`).
  - **Dropout** for regularization.
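
As a small illustration of how these settings interact, the sketch below builds a two-layer `nn.LSTM` with dropout and compares repeated forward passes. The specific values (32 hidden units, 2 layers, dropout 0.5) are arbitrary choices for the example. In PyTorch's recurrent layers, `dropout` is applied to the outputs of every layer except the last, so it only has an effect when `num_layers > 1`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=9, hidden_size=32, num_layers=2,
               batch_first=True, dropout=0.5)
x = torch.randn(8, 128, 9)

# Training mode: dropout between the two layers is active, so passes differ.
lstm.train()
train_a, _ = lstm(x)
train_b, _ = lstm(x)

# Evaluation mode: dropout is disabled, so passes are repeatable.
lstm.eval()
with torch.no_grad():
    eval_a, _ = lstm(x)
    eval_b, _ = lstm(x)

train_differs = not torch.equal(train_a, train_b)
eval_matches = torch.equal(eval_a, eval_b)
print(train_differs, eval_matches)
```

This is also why `model.train()` and `model.eval()` need to be called at the right points in a training script.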
  
### Mathematical Formulation

A basic RNN computes the hidden state \( h_t \) at time step \( t \) using the formula:

$$h_t = \tanh(W_h h_{t-1} + W_x x_t + b)$$

where:
- \( x_t \) is the input at time \( t \).
- \( h_{t-1} \) is the hidden state from the previous step.
- \( W_h, W_x, b \) are learnable parameters.
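
To connect the recurrence to PyTorch, the sketch below evaluates one step by hand and compares it with `nn.RNNCell`. The tensor sizes are arbitrary. Note that PyTorch stores two bias vectors (`bias_ih` and `bias_hh`); their sum plays the role of the single \( b \) in the formula.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.RNNCell(input_size=9, hidden_size=16)  # tanh nonlinearity by default
x_t = torch.randn(4, 9)      # input at time t for a batch of 4
h_prev = torch.randn(4, 16)  # hidden state from the previous step

# Built-in update.
h_builtin = cell(x_t, h_prev)

# Manual update: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with b = bias_ih + bias_hh.
h_manual = torch.tanh(
    x_t @ cell.weight_ih.T + cell.bias_ih
    + h_prev @ cell.weight_hh.T + cell.bias_hh
)

matches = torch.allclose(h_builtin, h_manual, atol=1e-6)
print(matches)
```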

The final output is obtained using:

$$y = \text{softmax}(W_y h_T + b_y)$$

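where \( h_T \) is the hidden state at the final timestep \( T \), and \( W_y, b_y \) are the learnable parameters of the output layer. In PyTorch the model usually returns the logits \( W_y h_T + b_y \) directly, because `nn.CrossEntropyLoss` applies the (log-)softmax internally.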
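
The relationship between logits, softmax probabilities, and the training loss can be checked numerically. The sketch below uses random logits for a batch of 32 samples and 6 classes; the values are placeholders, not model outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(32, 6)              # stand-in for W_y h_T + b_y
targets = torch.randint(0, 6, (32,))     # stand-in class labels in 0..5

# Softmax turns each row of logits into a probability distribution.
probs = F.softmax(logits, dim=1)

# CrossEntropyLoss on raw logits equals NLL loss on log-softmax outputs.
ce = nn.CrossEntropyLoss()(logits, targets)
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(probs.sum(dim=1)[:3], ce.item(), nll.item())
```

Because softmax preserves the ordering of the logits, `argmax` over logits and over probabilities selects the same class, so no explicit softmax is needed at prediction time.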

### 💡 Hints
1. Define an RNN layer (`self.rnn = nn.RNN(...)`) in `__init__`.
2. Use `self.fc = nn.Linear(hidden_size, num_classes)` for classification.
3. Extract the last hidden state and pass it through `self.fc` in `forward()`.
4. Consider using **GRU** (`nn.GRU`) or **LSTM** (`nn.LSTM`) for improved performance.
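
When switching between the three layer types, one practical difference is the second return value: `nn.RNN` and `nn.GRU` return `h_n`, while `nn.LSTM` returns a tuple `(h_n, c_n)`. The helper below, `final_hidden`, is a hypothetical convenience function written for this example to handle both cases.

```python
import torch
import torch.nn as nn


def final_hidden(layer, x):
    """Return the top layer's hidden state at the last timestep."""
    _, state = layer(x)
    # nn.LSTM returns (h_n, c_n); nn.RNN and nn.GRU return h_n only.
    h_n = state[0] if isinstance(state, tuple) else state
    return h_n[-1]


x = torch.randn(32, 128, 9)
for layer_cls in (nn.RNN, nn.GRU, nn.LSTM):
    layer = layer_cls(input_size=9, hidden_size=64, batch_first=True)
    print(layer_cls.__name__, final_hidden(layer, x).shape)  # torch.Size([32, 64]) each
```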

### Example Input and Output

#### Input:
A batch of accelerometer and gyroscope readings:

```python
torch.Size([32, 128, 9])  # batch_size=32, sequence_length=128, input_dim=9
```

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

from torchsummary import summary

class RNN(nn.Module):
    """Single-layer Elman RNN built from linear layers, with a linear classification head."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.hidden_size = hidden_size
        # Input -> hidden projection (W_x)
        self.w_xh = nn.Linear(input_size, hidden_size)
        # Hidden -> hidden projection (W_h)
        self.w_hh = nn.Linear(hidden_size, hidden_size)
        # Final fully connected layer mapping the last hidden state to class logits
        self.fcn = nn.Linear(hidden_size, num_classes)
        self.tanh = nn.Tanh()

    def forward(self, x):
        batch_size, seq_length, _ = x.shape
        # Start from a zero hidden state on the same device and dtype as the input
        h_t = torch.zeros(batch_size, self.hidden_size, device=x.device, dtype=x.dtype)

        # Update the hidden state one timestep at a time
        for i in range(seq_length):
            x_t = x[:, i, :]
            h_t = self.tanh(self.w_xh(x_t) + self.w_hh(h_t))

        # Classify from the hidden state at the final timestep
        return self.fcn(h_t)

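# Summarize the model; input_size excludes the batch dimension: (seq_length, input_dim)
seq_length = 32
input_dim = 8
hidden_size = 32
model = RNN(input_dim, hidden_size, 3)
summary(model, input_size=(seq_length, input_dim))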

```

```
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 32]             288
            Linear-2                   [-1, 32]           1,056
              Tanh-3                   [-1, 32]               0
            Linear-4                   [-1, 32]             288
            Linear-5                   [-1, 32]           1,056
              Tanh-6                   [-1, 32]               0
            Linear-7                   [-1, 32]             288
            Linear-8                   [-1, 32]           1,056
              Tanh-9                   [-1, 32]               0
           Linear-10                   [-1, 32]             288
           Linear-11                   [-1, 32]           1,056
             Tanh-12                   [-1, 32]               0
           Linear-13                   [-1, 32]             288
           Linear-14
```

```python
# Define Toy Dataset
class ToySequenceDataset(Dataset):
    def __init__(self, num_samples=500, sequence_length=10, input_dim=5, num_classes=3):
        self.sequence_length = sequence_length
        self.input_dim = input_dim
        self.num_classes = num_classes

        # Generate random sequences
        self.data = torch.rand(num_samples, sequence_length, input_dim)  # Shape: (num_samples, seq_length, input_dim)
        
        # Label each sequence by its total sum modulo num_classes, truncated to an integer
        sums = self.data.sum(dim=(1, 2))  # Sum across timesteps and features
        self.labels = (sums % num_classes).long()  # Already a tensor, so no torch.tensor() copy is needed

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]


# Initialize Dataset and DataLoader
batch_size = 16
toy_dataset = ToySequenceDataset(num_samples=500)
train_loader = DataLoader(toy_dataset, batch_size=batch_size, shuffle=True)

# Model Configuration
input_size = 5  # Features per timestep (matches the dataset's input_dim)
hidden_size = 32
num_classes = 3  # Classification into 3 categories
learning_rate = 0.001
num_epochs = 200

# Training Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = RNN(input_size, hidden_size, num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training Loop
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for sequences, labels in train_loader:
        sequences, labels = sequences.to(device), labels.to(device)

        # Forward pass
        outputs = model(sequences)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    if epoch % 10 == 0:
        print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {total_loss / len(train_loader):.4f}")

# Sanity-check the model on one batch (drawn from the training data, so this is not a held-out evaluation)
model.eval()
with torch.no_grad():
    test_sequences, test_labels = next(iter(train_loader))  # One batch of training sequences
    test_sequences, test_labels = test_sequences.to(device), test_labels.to(device)

    predictions = model(test_sequences)
    predicted_classes = torch.argmax(predictions, dim=1)

    print("Sample Predictions:", predicted_classes.cpu().numpy())
    print("Actual Labels:     ", test_labels.cpu().numpy())


```

```
Epoch [1/200], Loss: 1.1011
Epoch [11/200], Loss: 1.0755
Epoch [21/200], Loss: 1.0574
Epoch [31/200], Loss: 1.0450
Epoch [41/200], Loss: 1.0231
Epoch [51/200], Loss: 1.0104
Epoch [61/200], Loss: 0.9973
Epoch [71/200], Loss: 0.9841
Epoch [81/200], Loss: 0.9396
Epoch [91/200], Loss: 0.9138
Epoch [101/200], Loss: 0.8781
Epoch [111/200], Loss: 0.8280
Epoch [121/200], Loss: 0.8083
Epoch [131/200], Loss: 0.7351
Epoch [141/200], Loss: 0.6787
Epoch [151/200], Loss: 0.6374
Epoch [161/200], Loss: 0.6173
Epoch [171/200], Loss: 0.5722
Epoch [181/200], Loss: 0.5138
Epoch [191/200], Loss: 0.4768
Sample Predictions: [1 2 0 0 2 1 2 1 0 0 1 1 2 2 1 0]
Actual Labels:      [1 2 0 1 2 0 2 1 0 0 1 1 1 2 1 0]
```