# Lab 0. MNIST

In [5]:
# Import the packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
import torchvision.models as models
from torchvision.models import AlexNet_Weights
import torchvision.datasets as datasets
from torchvision.datasets import MNIST

## Part 1: Train a CNN on MNIST

In the first part, we build and train a simple CNN on MNIST. We then reuse its trained weights as a pretrained starting point for the SVHN dataset, i.e. we fine-tune the model on SVHN in a transfer-learning setup. Because MNIST images are 28×28 grayscale while SVHN images are 32×32 RGB, we convert SVHN to grayscale and resize it to 28×28 so that the network architecture remains unchanged.

In [2]:
# Device configuration
device = torch.device("cpu")

In [8]:
# Define transformation for MNIST
transform_mnist = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Download MNIST dataset
train_dataset_mnist = MNIST(root='./MNIST', train=True, download=True, transform=transform_mnist)
test_dataset_mnist = MNIST(root='./MNIST', train=False, download=True, transform=transform_mnist)

train_loader_mnist = DataLoader(dataset=train_dataset_mnist, batch_size=64, shuffle=True)
test_loader_mnist = DataLoader(dataset=test_dataset_mnist, batch_size=64, shuffle=False)
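
As a quick check on the normalization step: `transforms.Normalize(mean, std)` computes `(x - mean) / std` per channel, so with a mean and standard deviation of 0.5 the `[0, 1]` values produced by `ToTensor()` end up in `[-1, 1]`. The sketch below reproduces that arithmetic in plain Python; `normalize` is a hypothetical helper written for illustration, not a torchvision function.

```python
def normalize(x, mean=0.5, std=0.5):
    """Same per-pixel arithmetic as transforms.Normalize: (x - mean) / std."""
    return (x - mean) / std

# ToTensor() scales 8-bit pixel values to the range [0, 1]
black, mid_gray, white = 0 / 255, 127.5 / 255, 255 / 255

print(normalize(black))     # -1.0
print(normalize(mid_gray))  # 0.0
print(normalize(white))     # 1.0
```

Centering the inputs around zero this way is a common convention; the values 0.5 and 0.5 are not the empirical MNIST mean and standard deviation.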

In [9]:
# Define a simple CNN for MNIST
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.features = nn.Sequential(
            # Input: 1 x 28 x 28
            nn.Conv2d(1, 32, kernel_size=3, padding=1),  # 32 x 28 x 28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32 x 14 x 14
            nn.Conv2d(32, 64, kernel_size=3, padding=1), # 64 x 14 x 14
            nn.ReLU(),
            nn.MaxPool2d(2)                              # 64 x 7 x 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, 10)  # 10 classes for MNIST
        )
    
    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x
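
The shape comments in `SimpleCNN` can be checked by hand with the standard output-size formula for convolution and pooling layers. The sketch below is plain Python and assumes square inputs, dilation 1, and pooling with stride equal to the kernel size (the `nn.MaxPool2d` default); `conv2d_out` and `maxpool2d_out` are hypothetical helper names.

```python
def conv2d_out(size, kernel_size, padding=0, stride=1):
    """Output height/width of a square convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

def maxpool2d_out(size, kernel_size):
    """Output height/width of max pooling with stride equal to kernel_size."""
    return (size - kernel_size) // kernel_size + 1

size = 28
size = conv2d_out(size, kernel_size=3, padding=1)   # 28
size = maxpool2d_out(size, kernel_size=2)           # 14
size = conv2d_out(size, kernel_size=3, padding=1)   # 14
size = maxpool2d_out(size, kernel_size=2)           # 7

flat_features = 64 * size * size  # channels x height x width after the last pool
print(size, flat_features)  # 7 3136
```

The result, 3136, matches the `64 * 7 * 7` input size of the first `nn.Linear` layer, which is why any other input resolution would require changing the classifier.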

In [10]:
# Instantiate and train the model on MNIST
model_mnist = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_mnist.parameters(), lr=0.001)
num_epochs = 5  # use a small number for demonstration

print("Training on MNIST...")
for epoch in range(num_epochs):
    model_mnist.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for images, labels in train_loader_mnist:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model_mnist(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item() * images.size(0)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    
    epoch_loss = running_loss / total
    epoch_acc = 100 * correct / total
    print(f"Epoch {epoch+1}/{num_epochs} - Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc:.2f}%")

# Evaluate on MNIST test set
model_mnist.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader_mnist:
        images, labels = images.to(device), labels.to(device)
        outputs = model_mnist(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
mnist_test_acc = 100 * correct / total
print(f"\nMNIST Test Accuracy: {mnist_test_acc:.2f}%")

Training on MNIST...
Epoch 1/5 - Loss: 0.1544, Accuracy: 95.30%
Epoch 2/5 - Loss: 0.0469, Accuracy: 98.54%
Epoch 3/5 - Loss: 0.0319, Accuracy: 98.99%
Epoch 4/5 - Loss: 0.0226, Accuracy: 99.29%
Epoch 5/5 - Loss: 0.0171, Accuracy: 99.45%

MNIST Test Accuracy: 99.04%


Because MNIST is a relatively simple and clean dataset, the CNN learns to recognize digits very effectively, reaching 99.04% test accuracy after five epochs. This run establishes a strong baseline and produces a set of learned features that are specific to digit recognition.
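
One detail of the training loop is worth spelling out. `nn.CrossEntropyLoss` returns the mean loss over a batch by default, so the loop multiplies it by `images.size(0)` before accumulating and divides by `total` at the end. This yields a per-sample average even though the final batch is smaller: 60,000 training images with a batch size of 64 leave only 32 images in the last batch. A small sketch with made-up loss values shows the difference:

```python
# Hypothetical per-batch mean losses and batch sizes (illustrative values only)
batch_losses = [0.50, 0.30, 0.10]
batch_sizes = [64, 64, 32]

# Naive average of batch means gives the short last batch the same weight as a full one
naive_mean = sum(batch_losses) / len(batch_losses)

# Sample-weighted average, following the same pattern as the training loop
running_loss = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
total = sum(batch_sizes)
weighted_mean = running_loss / total

print(f"naive: {naive_mean:.4f}, weighted: {weighted_mean:.4f}")
```

With realistic numbers of batches the two averages are nearly identical, but the weighted form stays correct for any batch size.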

## Part 2: Transfer Learning from MNIST to SVHN

In the second part, we use the MNIST-trained CNN as a starting point for the SVHN dataset. SVHN consists of more complex 32×32 RGB images of house numbers. To make these images compatible with our MNIST model, we convert them to grayscale and resize them to 28×28. Then, we fine-tune the pre-trained model on SVHN. This fine-tuning allows the model to adapt its learned features to the new, more challenging domain.

In [11]:
# Define transformations for SVHN: convert to grayscale, resize to 28x28, and normalize
transform_svhn = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # Convert RGB to grayscale
    transforms.Resize((28, 28)),                    # Resize to 28x28 (explicit height and width)
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
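
For intuition about what the grayscale step discards: on PIL images, `transforms.Grayscale` relies on the ITU-R 601-2 luma transform, a weighted sum of the three color channels. The plain-Python sketch below applies those weights to a few colors; `to_luma` is a hypothetical helper, and its rounding may differ slightly from PIL's internal integer arithmetic.

```python
def to_luma(r, g, b):
    """Weighted sum used for RGB -> grayscale (ITU-R 601-2 luma weights)."""
    return 0.299 * r + 0.587 * g + 0.114 * b

# Pure red, green and blue at full intensity map to different gray levels,
# while distinct colors with similar luma become hard to tell apart.
for name, rgb in [("red", (255, 0, 0)), ("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
    print(name, round(to_luma(*rgb)))
```

Since SVHN digits are often distinguished from their background by color, this conversion can remove some useful contrast, which is one cost of keeping the MNIST architecture unchanged.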

In [13]:
# Download SVHN dataset (the train set here is used for fine-tuning)
train_dataset_svhn = datasets.SVHN(root='./data', split='train', download=True, transform=transform_svhn)
test_dataset_svhn = datasets.SVHN(root='./data', split='test', download=True, transform=transform_svhn)

train_loader_svhn = DataLoader(dataset=train_dataset_svhn, batch_size=64, shuffle=True)
test_loader_svhn = DataLoader(dataset=test_dataset_svhn, batch_size=64, shuffle=False)


Using downloaded and verified file: ./data\train_32x32.mat
Using downloaded and verified file: ./data\test_32x32.mat


In [14]:
# Use the model pre-trained on MNIST as initialization.
# Optionally, you can freeze some layers. Here, we'll fine-tune the whole network.
model_svhn = SimpleCNN().to(device)
model_svhn.load_state_dict(model_mnist.state_dict())  # load pre-trained MNIST weights

# Define a new optimizer for fine-tuning on SVHN
optimizer_svhn = optim.Adam(model_svhn.parameters(), lr=0.001)
num_epochs_svhn = 5  # number of fine-tuning epochs

print("\nFine-tuning on SVHN...")
for epoch in range(num_epochs_svhn):
    model_svhn.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for images, labels in train_loader_svhn:
        images, labels = images.to(device), labels.to(device)
        optimizer_svhn.zero_grad()
        outputs = model_svhn(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer_svhn.step()
        
        running_loss += loss.item() * images.size(0)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    
    epoch_loss = running_loss / total
    epoch_acc = 100 * correct / total
    print(f"Epoch {epoch+1}/{num_epochs_svhn} - Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc:.2f}%")

# Evaluate on SVHN test set
model_svhn.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader_svhn:
        images, labels = images.to(device), labels.to(device)
        outputs = model_svhn(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
svhn_test_acc = 100 * correct / total
print(f"\nSVHN Test Accuracy (Transfer Learning from MNIST): {svhn_test_acc:.2f}%")


Fine-tuning on SVHN...
Epoch 1/5 - Loss: 0.6427, Accuracy: 81.09%
Epoch 2/5 - Loss: 0.3873, Accuracy: 88.66%
Epoch 3/5 - Loss: 0.3103, Accuracy: 90.87%
Epoch 4/5 - Loss: 0.2567, Accuracy: 92.44%
Epoch 5/5 - Loss: 0.2143, Accuracy: 93.81%

SVHN Test Accuracy (Transfer Learning from MNIST): 89.77%


Although the transferred model starts from the MNIST-trained weights, its SVHN test accuracy (89.77%) is well below the MNIST result, reflecting the higher variability and complexity of SVHN images. The gap between the final training accuracy (93.81%) and the test accuracy may also indicate some overfitting.
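
The fine-tuning cell notes that some layers could optionally be frozen. A minimal sketch of that variant follows, assuming we freeze the convolutional `features` block and train only the classifier. The names `model_frozen` and `optimizer_frozen` are hypothetical, the class is repeated so the block runs on its own, and this lab does not test whether freezing helps or hurts on SVHN.

```python
import torch.nn as nn
import torch.optim as optim

class SimpleCNN(nn.Module):
    """Same architecture as the SimpleCNN defined earlier, repeated for a standalone sketch."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 7 * 7, 128), nn.ReLU(), nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# In the lab, the MNIST weights would be loaded here with load_state_dict
model_frozen = SimpleCNN()

# Freeze the convolutional feature extractor so its weights are not updated
for param in model_frozen.features.parameters():
    param.requires_grad = False

# Pass only the still-trainable parameters to the optimizer
trainable_params = [p for p in model_frozen.parameters() if p.requires_grad]
optimizer_frozen = optim.Adam(trainable_params, lr=0.001)

n_trainable = sum(p.numel() for p in trainable_params)
n_frozen = sum(p.numel() for p in model_frozen.parameters() if not p.requires_grad)
print(f"trainable: {n_trainable}, frozen: {n_frozen}")
```

Most of the parameters sit in the first linear layer, so freezing the convolutions removes only a small share of the trainable weights; the training loop itself would stay the same apart from using `model_frozen` and `optimizer_frozen`.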

## Summary of Differences and Results

- Dataset Complexity:
MNIST is simpler, with centered, clean handwritten digits, leading to near-perfect performance. In contrast, SVHN is more diverse with real-world variations in color, background, and noise, making it a harder task.

- Training Strategy:
In Part 1, the model is trained on MNIST from scratch. In Part 2, we apply transfer learning: the SVHN model is initialized with the MNIST weights and all of its layers are fine-tuned on SVHN. The features learned on MNIST therefore serve as a starting point that is adapted to the additional complexity of SVHN.

- Results:
The MNIST model reaches 99.04% test accuracy because the task is straightforward. The SVHN model, initialized from the transferred features, reaches a lower 89.77% because the dataset is harder. Transfer learning can be effective, but how much it helps depends on how similar the source and target domains are. This lab does not train an SVHN model from scratch, so the size of that benefit is not measured here.

Overall, transfer learning from MNIST to SVHN shows that while features learned on a simple dataset can help with a more complex one, the differences in image characteristics naturally lead to a drop in performance when moving to a more challenging task.