# Session 7: Building Models in PyTorch with `nn.Module`

**Objective:** To learn how to define, inspect, and modify neural network models in PyTorch by understanding layers, creating custom model classes, and leveraging pre-trained architectures.

In [None]:
! pip install torchvision

## Part 1: Concepts

### 1. PyTorch Layers: The Building Blocks

The `torch.nn` namespace contains all the building blocks we need to create neural networks. Think of them as individual layers that perform specific operations. Crucially, these layers are themselves callable objects (like functions) that can process tensors.

In [None]:
import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Create a sample input tensor: 1 sample, 10 features
input_tensor = torch.randn(1, 10)
print(f"Input Tensor Shape: {input_tensor.shape}\n")

# --- Example 1: A Linear Layer ---
# A linear layer applies an affine transformation: y = x @ W.T + b
# It takes in_features and maps them to out_features.
linear_layer = nn.Linear(in_features=10, out_features=5)

# Pass the input through the layer as if it were a function
output_tensor = linear_layer(input_tensor)
print("--- Linear Layer ---")
print(f"Output Tensor Shape: {output_tensor.shape}")
print(f"Output Tensor Values:\n {output_tensor}\n")

# --- Example 2: An Activation Function ---
# A ReLU (Rectified Linear Unit) activation is a non-linear operation.
relu_layer = nn.ReLU()
activated_tensor = relu_layer(output_tensor)
print("--- ReLU Layer ---")
print(f"Activated Tensor Values (negative values become 0):\n {activated_tensor}")

Note that `nn.Linear` is a *class*.
Writing `nn.Linear(in_features=10, out_features=5)` creates an *instance* of that class: an actual linear layer with its own weight and bias tensors.
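To make the class/instance distinction concrete, here is a minimal sketch. The names `layer_a`, `layer_b`, and `x` are illustrative only; the check at the end assumes the layer computes `x @ W.T + b`, which is how `nn.Linear` is documented.

```python
import torch
import torch.nn as nn

# nn.Linear is the class; calling it builds one concrete layer (an instance).
layer_a = nn.Linear(in_features=10, out_features=5)
layer_b = nn.Linear(in_features=10, out_features=5)

print(isinstance(layer_a, nn.Linear))  # True
print(layer_a.weight.shape)            # torch.Size([5, 10])
print(layer_a.bias.shape)              # torch.Size([5])

# Each instance owns its own, independently initialised parameters.
shares_storage = layer_a.weight is layer_b.weight
print(shares_storage)                  # False

# Calling the instance is equivalent to the explicit affine transformation.
x = torch.randn(1, 10)
manual = x @ layer_a.weight.T + layer_a.bias
matches = torch.allclose(layer_a(x), manual, atol=1e-6)
print(matches)
```

The weight has shape `(out_features, in_features)`, which is why the transpose appears in the manual computation.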

### 2. Building a Model with `nn.Module`

To create a full model, we group layers together in a logical structure. The standard way to do this in PyTorch is to create a class that inherits from `nn.Module`.


What does a class inherit from `nn.Module`? Most importantly:
1. **Parameter tracking.** When you assign a layer as an attribute of an `nn.Module`, its parameters are registered automatically, so calls such as `module.parameters()` find every learnable tensor in the model.
2. **State management.** Methods such as `state_dict()` and `load_state_dict()` make it easy to save and load the model.
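Both points can be seen in a small sketch. `TinyNet` is a hypothetical model invented for this illustration, and the round trip goes through an in-memory buffer rather than a file just to keep the example self-contained.

```python
import io
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 3)
        self.out = nn.Linear(3, 2)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

net = TinyNet()

# Parameter tracking: layers assigned as attributes are registered automatically.
param_names = [name for name, _ in net.named_parameters()]
print(param_names)  # ['hidden.weight', 'hidden.bias', 'out.weight', 'out.bias']
num_params = sum(p.numel() for p in net.parameters())
print(num_params)   # (4*3 + 3) + (3*2 + 2) = 23

# State management: save the state dict, then load it into a fresh instance.
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)
buffer.seek(0)
restored = TinyNet()
restored.load_state_dict(torch.load(buffer))

x = torch.randn(1, 4)
outputs_match = torch.allclose(net(x), restored(x))
print(outputs_match)
```

In practice you would usually pass a file path to `torch.save` and `torch.load` instead of a buffer.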

A custom model class has two essential parts:
1.  `__init__(self)`: This is where you **define** all the layers your model will use. You instantiate them here and assign them as attributes of the class.
2.  `forward(self, x)`: This is where you **define the data flow**. You take an input tensor `x` and pass it through the layers you defined in `__init__` in the correct sequence.

In [None]:
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()

        # 1. Define the layers
        # Instantiate each layer class and store the instance as an attribute.
        self.layer1 = nn.Linear(in_features=10, out_features=32)
        self.activation = nn.ReLU()
        self.layer2 = nn.Linear(in_features=32, out_features=2) # Output 2 classes

    def forward(self, x):
        # 2. Define the forward pass
        x = self.layer1(x)
        x = self.activation(x)
        x = self.layer2(x)
        return x

# Instantiate the model
model = SimpleNet()

In [None]:
model(input_tensor)

### 3. Inspecting a Model

Once you have a model instance, you can easily inspect its structure and access its learnable parameters. The `.parameters()` method is particularly important, as this is what you pass to an optimizer.

In [None]:
# Print the model architecture
print("--- Model Architecture ---")
print(model)

# Access and inspect learnable parameters
print("\n--- Model Parameters ---")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"Layer: {name}, Shape: {param.shape}")
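As a rough illustration of why `.parameters()` matters, the sketch below counts the learnable values and hands them to an optimizer. `demo_model` is an assumption of this example: an `nn.Sequential` used only as a compact stand-in with the same layer sizes as `SimpleNet`, and the squared-output loss is a dummy chosen just to produce gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Compact stand-in with the same layer sizes as SimpleNet (10 -> 32 -> 2).
demo_model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# Count every learnable value: (10*32 + 32) + (32*2 + 2) = 418
total_params = sum(p.numel() for p in demo_model.parameters())
print(total_params)  # 418

# The optimizer only knows about the tensors it is given here.
optimizer = torch.optim.SGD(demo_model.parameters(), lr=0.1)

before = demo_model[0].weight.detach().clone()
loss = demo_model(torch.randn(8, 10)).pow(2).mean()  # dummy loss
loss.backward()
optimizer.step()

# After one step, the first layer's weights have moved.
weights_changed = not torch.equal(before, demo_model[0].weight)
print(weights_changed)
```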

### 4. Using Pre-trained Models

Training large models from scratch is computationally expensive. **Transfer Learning** is the practice of taking a powerful model that has already been trained on a massive dataset (like ImageNet) and adapting it for your own task. `torchvision.models` provides many such models.


In [None]:
import torchvision.models as models

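# Load a ResNet-18 with pre-trained ImageNet weights.
# The `weights=` argument replaces the deprecated `pretrained=True` (torchvision >= 0.13).
resnet18 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)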

# Print the final layer of the model
print("--- Original ResNet-18 Final Layer ---")
print(resnet18.fc) # .fc is the name of the final fully connected layer in ResNet

In [None]:
print(resnet18)

### 5. Modifying a Pre-trained Model (Fine-tuning)

The most common transfer learning technique is to replace the final layer of a pre-trained model. The original model was trained to classify 1000 ImageNet classes. If our task is to classify 10 different classes (e.g., CIFAR-10), we need a final layer that outputs 10 values, not 1000.

The process is:
1.  **Freeze** the weights of the pre-trained layers so they don't change during training.
2.  **Replace** the final classification layer with a new one tailored to your task.

In [None]:
# 1. Freeze all the parameters in the network
for param in resnet18.parameters():
    param.requires_grad = False

# 2. Replace the final layer
num_features = resnet18.fc.in_features # Get the number of input features for the last layer
num_classes = 10 # Our new task has 10 classes



In [None]:
# Create a new final layer and assign it
resnet18.fc = nn.Linear(num_features, num_classes)

print("--- Modified ResNet-18 Final Layer ---")
print(resnet18.fc)

# Verify that only the new layer's parameters require gradients
print("\n--- Trainable Parameters After Modification ---")
for name, param in resnet18.named_parameters():
    if param.requires_grad:
        print(name)
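The effect of freezing can be checked without downloading any weights. In this sketch, `backbone` and `head` are hypothetical stand-ins for the pre-trained layers and the new final layer; the sizes and the random data are arbitrary assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the pre-trained layers: freeze all of their parameters.
backbone = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad = False

# Stand-in for the new final layer: trainable by default.
head = nn.Linear(8, 10)
net = nn.Sequential(backbone, head)

# Give the optimizer only the parameters that still require gradients.
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

frozen_before = backbone[0].weight.clone()
head_before = head.weight.detach().clone()

# One training step on random data.
inputs = torch.randn(4, 16)
targets = torch.randint(0, 10, (4,))
loss = nn.CrossEntropyLoss()(net(inputs), targets)
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(frozen_before, backbone[0].weight)
head_changed = not torch.equal(head_before, head.weight)
print(len(trainable), backbone_unchanged, head_changed)
```

Passing only the trainable parameters to the optimizer is optional here, since frozen parameters never receive gradients, but it makes the intent explicit.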

## Part 2: Exercises & Debugging

### Lab 7.1: Build a Simple Multi-Layer Perceptron (MLP) for MNIST

* **Task:** Create a simple MLP to classify handwritten digits from the MNIST dataset.
  1. Define a class `SimpleMLP` that inherits from `nn.Module`.
  2. In `__init__`, define the following layers:
     - A flatten layer (`nn.Flatten`) to convert the 28x28 images into a 1D vector.
     - A linear layer that takes the flattened image (784 features) and maps it to a hidden dimension of 128 features.
     - A ReLU activation.
     - A second linear layer that maps the 128 hidden features to the 10 output classes (digits 0-9).
  3. Define the `forward` pass to connect these layers in sequence.
  4. Instantiate your MLP and pass a dummy image tensor through it to verify the output shape. A dummy tensor for a single MNIST image would have the shape `(1, 1, 28, 28)`.
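As a hint for the first layer, this short sketch shows what `nn.Flatten` does to a batch of image-shaped tensors. The batch size of 4 is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

flatten = nn.Flatten()  # by default, flattens every dimension except the batch dimension

batch = torch.randn(4, 1, 28, 28)  # (batch, channels, height, width)
flat = flatten(batch)
print(flat.shape)  # torch.Size([4, 784])

# Equivalent to reshaping each sample into a single row of 1*28*28 values.
same_as_reshape = torch.equal(flat, batch.reshape(4, -1))
print(same_as_reshape)  # True
```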

In [None]:
# --- Your Code Here ---
class SimpleMLP(nn.Module):
    def __init__(self):
        super().__init__()
        ...

    def forward(self, x):
        x = ...
        return x

# Instantiate and test
mlp = SimpleMLP()
dummy_image = torch.randn(1, 1, 28, 28) # (batch, channels, height, width)
output = mlp(dummy_image)

print("--- SimpleMLP Architecture ---")
print(mlp)
print(f"\nOutput shape for a dummy image: {output.shape}")

In [None]:
# 2. Prepare the MNIST Dataset
# Define a transform that converts images to tensors, then normalizes pixel values to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Download and load the training data
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Download and load the test data
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# 3. Instantiate the Model, Loss Function, and Optimizer
model = SimpleMLP()
criterion = nn.CrossEntropyLoss() # Standard loss for multi-class classification; expects raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# 4. The Training Loop
num_epochs = 5
print("Starting training...")

for epoch in range(num_epochs):
    running_loss = 0.0
    for i, (images, labels) in enumerate(train_loader):
        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

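        running_loss += loss.item()
        if (i + 1) % 100 == 0: # Print status every 100 batches
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}')

    # Report the average loss over the whole epoch
    print(f'Epoch [{epoch+1}/{num_epochs}] average loss: {running_loss / len(train_loader):.4f}')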

print('Finished Training')

# 5. Evaluate the Model (Optional but good practice)
model.eval() # Switch to evaluation mode (matters for layers such as dropout and batch norm)
correct = 0
total = 0
with torch.no_grad(): # We don't need to calculate gradients during evaluation
    for images, labels in test_loader:
        outputs = model(images)
        predicted = outputs.argmax(dim=1) # Index of the largest logit is the predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the {total} test images: {100 * correct / total:.2f} %')
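The training and evaluation cells pass the raw model outputs (logits) directly to `nn.CrossEntropyLoss` and to the prediction step. The sketch below, using hand-picked logits and labels that are purely illustrative, shows why no explicit softmax layer is needed in the model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two samples, three classes: raw scores straight from a model's last linear layer.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 1])

# CrossEntropyLoss applies log-softmax internally, then the negative log-likelihood.
ce = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
losses_match = torch.allclose(ce, manual)
print(losses_match)  # True

# The predicted class is the index of the largest logit.
predicted = logits.argmax(dim=1)
print(predicted)  # tensor([0, 2])

# One of the two predictions matches its label.
accuracy = (predicted == labels).float().mean().item()
print(accuracy)  # 0.5
```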