# Hybrid CNN-RNN Model

Hybrid models combine **Convolutional Neural Networks (CNNs)** and **Recurrent Neural Networks (RNNs)**.

## Why Hybrid?
- CNNs are good at extracting **local spatial features** (e.g., edges in images, n-gram patterns over word embeddings).
- RNNs (LSTMs/GRUs) capture **temporal/sequential dependencies**.
- A hybrid model is useful in tasks like:
  - **Video classification** (frames → CNN → sequence → RNN)
  - **Image captioning** (CNN for features + RNN for sentence generation)
  - **Text classification** (CNN for n-gram features + RNN for context)

## 1. Import Libraries

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

## 2. Define Hybrid CNN-RNN Model

In [2]:
class CNN_RNN_Model(nn.Module):
    def __init__(self, vocab_size, embed_size, num_classes, hidden_size, num_layers):
        super().__init__()
        
        # Embedding Layer
        self.embedding = nn.Embedding(vocab_size, embed_size)
        
        # CNN Layer
        self.conv1 = nn.Conv1d(in_channels=embed_size, out_channels=128, kernel_size=5)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool1d(kernel_size=2)
        
        # RNN Layer (LSTM)
        self.lstm = nn.LSTM(input_size=128, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)
        
        # Fully connected output layer
        self.fc = nn.Linear(hidden_size, num_classes)
        
    def forward(self, x):
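        x = self.embedding(x)       # (batch, seq_len, embed_size)
        x = x.permute(0, 2, 1)      # (batch, embed_size, seq_len): Conv1d expects channels first
        x = self.conv1(x)           # (batch, 128, seq_len - 4): kernel_size=5, no padding
        x = self.relu(x)
        x = self.pool(x)            # (batch, 128, (seq_len - 4) // 2)
        
        # Prepare for LSTM
        x = x.permute(0, 2, 1)      # (batch, reduced_len, 128)
        out, _ = self.lstm(x)       # (batch, reduced_len, hidden_size)
        out = out[:, -1, :]         # Last time step: (batch, hidden_size)
        out = self.fc(out)          # (batch, num_classes), raw logits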
        return out
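
The LSTM sees a shorter sequence than the input, because the unpadded convolution and the pooling layer both shrink it. A minimal sketch of that arithmetic in plain Python, using the standard output-length formulas for `Conv1d` and `MaxPool1d` and assuming no dilation (the helper names are illustrative, not part of the model):

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0):
    """Output length of a 1D convolution (dilation assumed to be 1)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def maxpool1d_out_len(length, kernel_size, stride=None):
    """Output length of 1D max pooling; stride defaults to kernel_size."""
    stride = kernel_size if stride is None else stride
    return (length - kernel_size) // stride + 1


seq_len = 50
after_conv = conv1d_out_len(seq_len, kernel_size=5)        # 46
after_pool = maxpool1d_out_len(after_conv, kernel_size=2)  # 23
print(seq_len, after_conv, after_pool)  # 50 46 23
```

With these settings an input needs at least 6 tokens to produce a single LSTM step, so very short sequences have to be padded first.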

## 3. Initialize Model

In [3]:
vocab_size = 5000   # example vocab size
embed_size = 100
num_classes = 2     # e.g., positive/negative sentiment
hidden_size = 128
num_layers = 2

model = CNN_RNN_Model(vocab_size, embed_size, num_classes, hidden_size, num_layers)
print(model)
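
`print(model)` lists the layers but not their size. As a rough check, the parameter count for these hyperparameters can be worked out by hand. The sketch below assumes PyTorch's usual layouts: each LSTM layer holds input weights, hidden weights, and two bias vectors for its four gates.

```python
vocab_size, embed_size, num_classes = 5000, 100, 2
hidden_size, num_layers = 128, 2
conv_channels, kernel_size = 128, 5  # values hard-coded in the model's Conv1d

embedding = vocab_size * embed_size
conv = conv_channels * embed_size * kernel_size + conv_channels  # weights + biases


def lstm_layer_params(input_size, hidden_size):
    # 4 gates, each with input weights, hidden weights, and two bias vectors
    return 4 * (hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size)


lstm = lstm_layer_params(conv_channels, hidden_size)
lstm += (num_layers - 1) * lstm_layer_params(hidden_size, hidden_size)
fc = hidden_size * num_classes + num_classes

total = embedding + conv + lstm + fc
print(embedding, conv, lstm, fc, total)  # 500000 64128 264192 258 828578
```

The total should agree with `sum(p.numel() for p in model.parameters())`. Most of the parameters sit in the embedding table.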

## 4. Training Setup (Example)
- Loss: CrossEntropyLoss (expects raw logits and integer class labels)
- Optimizer: Adam

In [4]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
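
The loss and optimizer only take effect inside a training loop. The following is a minimal sketch of a single optimization step, not a full training script: `train_step` is a hypothetical helper, the data is random, and a small stand-in classifier is used so the snippet runs on its own. Any model that maps token IDs to logits, including `CNN_RNN_Model`, could be passed in instead.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def train_step(model, criterion, optimizer, inputs, labels):
    """Run one optimization step and return the batch loss as a float."""
    model.train()
    optimizer.zero_grad()              # clear gradients from the previous step
    logits = model(inputs)             # (batch, num_classes)
    loss = criterion(logits, labels)   # labels: (batch,) integer class IDs
    loss.backward()                    # backpropagate
    optimizer.step()                   # update the weights
    return loss.item()


# Stand-in model (mean of token embeddings + linear layer), for illustration only.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.EmbeddingBag(5000, 16), nn.Linear(16, 2))
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.Adam(toy_model.parameters(), lr=0.001)

inputs = torch.randint(0, 5000, (8, 50))   # random token IDs
labels = torch.randint(0, 2, (8,))         # random class labels
losses = [train_step(toy_model, toy_criterion, toy_optimizer, inputs, labels)
          for _ in range(5)]
print(losses)
```

In practice the inputs and labels would come from a `DataLoader` over a real dataset, and the step would be repeated over many batches and epochs.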

## 5. Example Forward Pass

In [5]:
dummy_input = torch.randint(0, vocab_size, (8, 50))  # (batch=8, seq_len=50)
output = model(dummy_input)
print(output.shape)  # (8, num_classes)

### Expected Output:
```
torch.Size([8, 2])
```
This means that for each of the 8 input sequences, the model outputs 2 raw class scores (logits), one per class. They are not probabilities; apply a softmax over the class dimension to obtain them.
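
The values returned by the model are unnormalized scores. A minimal sketch of turning such scores into probabilities and predicted labels, using made-up logits in place of the model output:

```python
import torch

# Illustrative logits with shape (batch=2, num_classes=2)
logits = torch.tensor([[2.0, 0.5],
                       [-1.0, 1.0]])

probs = torch.softmax(logits, dim=1)   # normalize across the class dimension
preds = probs.argmax(dim=1)            # index of the most likely class per row

print(probs.sum(dim=1))  # each row sums to 1
print(preds)             # tensor([0, 1])
```

`CrossEntropyLoss` applies the log-softmax internally, so the softmax is only needed at inference time, not before computing the loss.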

## Summary
- CNN extracts local features (like n-grams).
- LSTM captures sequential dependencies.
- The hybrid model suits **text and video tasks** where both local patterns and their order matter.

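Try training it on a **sentiment analysis** dataset. For **video frame sequence data**, the embedding and `Conv1d` layers would need to be replaced with a 2D CNN applied to each frame.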
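
The model above takes token IDs, so it cannot consume video frames directly. One possible adaptation is sketched below, following the frames → CNN → sequence → RNN pattern: a small 2D CNN encodes every frame, and an LSTM reads the resulting feature sequence. The class name, layer sizes, and input resolution are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class FrameCNN_LSTM(nn.Module):  # hypothetical name
    def __init__(self, num_classes, feat_size=32, hidden_size=64):
        super().__init__()
        # Per-frame encoder: (N, 3, H, W) -> (N, feat_size, 1, 1)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, feat_size, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(feat_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, video):
        b, t, c, h, w = video.shape                # (batch, frames, channels, H, W)
        frames = video.reshape(b * t, c, h, w)     # fold time into the batch dimension
        feats = self.cnn(frames).flatten(1)        # (batch * frames, feat_size)
        feats = feats.reshape(b, t, -1)            # (batch, frames, feat_size)
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1, :])              # logits from the last frame's state


clip = torch.randn(2, 6, 3, 32, 32)   # 2 random clips of 6 RGB frames, 32x32 each
logits = FrameCNN_LSTM(num_classes=4)(clip)
print(logits.shape)  # torch.Size([2, 4])
```

Folding the time dimension into the batch lets the same CNN weights process every frame in one call. A pretrained image backbone is a common substitute for the small encoder used here.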