# CSE 151B: Homework 2 Coding
## PyTorch Implementation

Using PyTorch’s `Sequential` model class, build a deep convolutional network to classify handwritten digits in MNIST.

You are only allowed to use the following in your model design:
- Linear Layers
- Conv2D
- MaxPool2D
- BatchNorm2D
- Dropout Layers
- ReLU and Softmax
- Flatten

Your goal is to build a model that achieves **test accuracy ≥ 0.985** with fewer than 1 million parameters.

**Warning**: The modules in your Sequential network should *only* consist of `nn` objects! That means you should not be using `torch.nn.functional` modules or lambda expressions in your Sequential block. Leaving functional/lambda expressions in your model code will result in no credit!

This notebook provides a skeleton layout for you. You may use whatever parts of this notebook you deem necessary; there is no need for you to adhere to the structure. However, during submission, you must carefully follow the zip file formatting as requested; see the bottom of the notebook.

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

In [9]:
def get_data_loaders(batch_size) -> tuple[DataLoader, DataLoader]:
    '''
    Return the training and testing MNIST dataloaders.
    '''
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
    test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
    
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    return train_loader, test_loader

In [None]:
def build_model(dropout_prob=0.5) -> nn.Module:
    model = nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=3, padding=1),  # 28x28 -> 28x28
        nn.BatchNorm2d(32),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, padding=1), # 28x28 -> 28x28
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.MaxPool2d(2),                              # 28x28 -> 14x14
        nn.Dropout(dropout_prob),

        nn.Conv2d(64, 128, kernel_size=3, padding=1), # 14x14 -> 14x14
        nn.BatchNorm2d(128),
        nn.ReLU(),
        nn.MaxPool2d(2),                              # 14x14 -> 7x7
        nn.Dropout(dropout_prob),

        nn.Flatten(),
        nn.Linear(128 * 7 * 7, 256),
        nn.ReLU(),
        nn.Dropout(dropout_prob),
        nn.Linear(256, 10),
        nn.Softmax(dim=1)
    )
    return model

1701578


In [23]:
def check_params():
    model = build_model()
    print(f"Number of parameters: {sum(p.numel() for p in model.parameters())}")

In [24]:
def train(model, optimizer, criterion, train_loader, n_epochs = 1):
    '''
    Train the model for `n_epochs` epochs. Returns none (model is modified in place)
    '''
    model.train()
    for epoch in range(n_epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        for batch_idx, (inputs, labels) in enumerate(train_loader):
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        print(f"Epoch [{epoch+1}/{n_epochs}] Loss: {running_loss/len(train_loader):.4f} Accuracy: {correct/total:.4f}")


In [25]:
def test(model, test_loader):
    '''
    Tests the model. Returns none (you should print the accuracy)
    '''
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = correct / total
    print(f"Test Accuracy: {accuracy:.4f}")
    return accuracy

In [29]:
# try 10 different dropout values
train_loader, test_loader = get_data_loaders(batch_size=32)
criterion = nn.CrossEntropyLoss()
dropout_values = [i / 10 for i in range(10)]
accuracies = []
for p in dropout_values:
    model = build_model(dropout_prob=p)
    
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    train(model, optimizer, criterion, train_loader)
    test(model, test_loader)
    torch.save(model, f'hw2_dropout_{p}.pt')

Epoch [1/1] Loss: 1.8346 Accuracy: 0.6247
Test Accuracy: 0.7846
Epoch [1/1] Loss: 1.9882 Accuracy: 0.4713
Test Accuracy: 0.6937
Epoch [1/1] Loss: 2.3620 Accuracy: 0.0991
Test Accuracy: 0.1009
Epoch [1/1] Loss: 1.6626 Accuracy: 0.7978
Test Accuracy: 0.9744
Epoch [1/1] Loss: 1.7684 Accuracy: 0.6914
Test Accuracy: 0.8720
Epoch [1/1] Loss: 1.5938 Accuracy: 0.8683
Test Accuracy: 0.9763
Epoch [1/1] Loss: 1.6224 Accuracy: 0.8406
Test Accuracy: 0.9704
Epoch [1/1] Loss: 1.7188 Accuracy: 0.7435
Test Accuracy: 0.9570
Epoch [1/1] Loss: 1.8900 Accuracy: 0.5672
Test Accuracy: 0.9297
Epoch [1/1] Loss: 2.1861 Accuracy: 0.2509
Test Accuracy: 0.7302


After training we have 

Epoch [1/1] Loss: 1.8346 Accuracy: 0.6247
Test Accuracy: 0.7846

Epoch [1/1] Loss: 1.9882 Accuracy: 0.4713
Test Accuracy: 0.6937

Epoch [1/1] Loss: 2.3620 Accuracy: 0.0991
Test Accuracy: 0.1009

Epoch [1/1] Loss: 1.6626 Accuracy: 0.7978
Test Accuracy: 0.9744

Epoch [1/1] Loss: 1.7684 Accuracy: 0.6914
Test Accuracy: 0.8720

Epoch [1/1] Loss: 1.5938 Accuracy: 0.8683
Test Accuracy: 0.9763

Epoch [1/1] Loss: 1.6224 Accuracy: 0.8406
Test Accuracy: 0.9704

Epoch [1/1] Loss: 1.7188 Accuracy: 0.7435
Test Accuracy: 0.9570

Epoch [1/1] Loss: 1.8900 Accuracy: 0.5672
Test Accuracy: 0.9297

Epoch [1/1] Loss: 2.1861 Accuracy: 0.2509
Test Accuracy: 0.7302

In [32]:

# find your best model, and train it for 10 epochs
best_p = dropout_values[accuracies.index(max(accuracies))]  
print(f"Best dropout probability: {best_p}")

model = build_model(dropout_prob=best_p)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
train(model, optimizer, criterion, train_loader, n_epochs=10)
test(model, test_loader)
torch.save(model, "hw2_model.pt")

Best dropout probability: 0.0
Epoch [1/10] Loss: 1.6403 Accuracy: 0.8196
Epoch [2/10] Loss: 1.4931 Accuracy: 0.9679
Epoch [3/10] Loss: 1.4891 Accuracy: 0.9718
Epoch [4/10] Loss: 1.4831 Accuracy: 0.9781
Epoch [5/10] Loss: 1.4822 Accuracy: 0.9791
Epoch [6/10] Loss: 1.4804 Accuracy: 0.9807
Epoch [7/10] Loss: 1.4813 Accuracy: 0.9799
Epoch [8/10] Loss: 1.4786 Accuracy: 0.9825
Epoch [9/10] Loss: 1.4791 Accuracy: 0.9820
Epoch [10/10] Loss: 1.4787 Accuracy: 0.9824
Test Accuracy: 0.9889


After training we have 

Epoch [1/10] Loss: 1.6403 Accuracy: 0.8196

Epoch [2/10] Loss: 1.4931 Accuracy: 0.9679

Epoch [3/10] Loss: 1.4891 Accuracy: 0.9718

Epoch [4/10] Loss: 1.4831 Accuracy: 0.9781

Epoch [5/10] Loss: 1.4822 Accuracy: 0.9791

Epoch [6/10] Loss: 1.4804 Accuracy: 0.9807

Epoch [7/10] Loss: 1.4813 Accuracy: 0.9799

Epoch [8/10] Loss: 1.4786 Accuracy: 0.9825

Epoch [9/10] Loss: 1.4791 Accuracy: 0.9820

Epoch [10/10] Loss: 1.4787 Accuracy: 0.9824

Test Accuracy: 0.9889

# Submission Instructions

Zip all of your **code** and **model .pt files** into one file, and submit on Gradescope to the respective submission.