# Introduction


In this assignment, you will practice building and training Convolutional Neural Networks with PyTorch to solve computer vision tasks. The assignment has two sections, each involving a different task:

(1) Image Classification. Predict image-level category labels on two historically notable image datasets: **CIFAR-10** and **MNIST**.

(2) Image Segmentation. Predict a class label for every pixel (semantic segmentation) on synthetic input images formed by superimposing MNIST images on top of CIFAR images.

You will design your own models in each section and build the entire training/testing pipeline with PyTorch.
PyTorch provides optimized implementations of the building blocks and additional utilities, both of which will be necessary for experiments on real datasets. It is highly recommended to read the official [documentation](https://pytorch.org/docs/stable/index.html) and [examples](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) before starting your implementation. There are some APIs that you'll find useful:
[Layers](http://pytorch.org/docs/stable/nn.html),
[Activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity),
[Loss functions](http://pytorch.org/docs/stable/nn.html#loss-functions),
[Optimizers](http://pytorch.org/docs/stable/optim.html)

It is highly recommended to use Google Colab and run the notebook on a GPU node. Check https://colab.research.google.com/ and look for tutorials online. To use a GPU go to Runtime -> Change runtime type and select GPU.
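
A quick way to confirm that the runtime actually exposes a GPU is to ask PyTorch directly. The snippet below is a minimal sketch; the variable names are illustrative, and it falls back to the CPU when no GPU is visible.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# Tensors (and models) are moved to the chosen device with .to(device).
x = torch.zeros(2, 3).to(device)
print(x.device.type)
```

If this prints `cpu` on Colab, the runtime type has most likely not been switched to GPU yet.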






# (1) Image Classification

In this section, you will design and train image classification networks for the **MNIST** and **CIFAR-10** datasets. Each network takes an image as input and outputs a vector whose length equals the number of possible categories.

You can design your models by borrowing ideas from recent architectures, e.g., ResNet, but you may not simply copy an entire existing model.

For image classification, you can use a built-in dataset provided by [torchvision](https://pytorch.org/vision/stable/index.html), a PyTorch official extension for image tasks.

To finish this section step by step, you need to:

* Prepare data by building a dataset and dataloader. (with [torchvision](https://pytorch.org/vision/stable/index.html))

* Implement training code (6 points) & testing code (6 points), including model saving and loading.

* Construct a model (12 points) and choose an optimizer (3 points).

* Describe what you did, any additional features you implemented, and/or any graphs you made while training and evaluating your network. Also report the final test accuracy after 100 epochs in a writeup: hw3.pdf (3 points)

In [None]:
# !pip install torch==2.1.0 # working on torch 2.1.0
import numpy as np
import os
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torch.utils.data import sampler
import torchvision
import torchvision.transforms as T

## Data Preparation:

Set up a Dataset for training and testing.

A Dataset loads single training examples one at a time, so in practice we wrap each Dataset in a DataLoader, which groups examples into batches and can load them in parallel.

We provide an example for setting up a training set for MNIST, and you should complete the rest.
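
To make the Dataset/DataLoader relationship concrete, here is a minimal sketch with synthetic data. `ToyImageDataset` and `toy_loader` are hypothetical names that are not part of the assignment; the random tensors only mimic the shape of MNIST images.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyImageDataset(Dataset):
    """Hypothetical dataset of random 1x28x28 'images' with labels in 0..9."""

    def __init__(self, n=10):
        g = torch.Generator().manual_seed(0)
        self.images = torch.rand(n, 1, 28, 28, generator=g)
        self.labels = torch.randint(0, 10, (n,), generator=g)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # A Dataset returns one (example, label) pair per index.
        return self.images[idx], self.labels[idx]


# The DataLoader stacks individual examples into batches of up to 4.
toy_loader = DataLoader(ToyImageDataset(), batch_size=4, shuffle=False, num_workers=0)
batch_shapes = [tuple(x.shape) for x, y in toy_loader]
print(batch_shapes)
```

With 10 examples and a batch size of 4, the last batch holds only the 2 remaining examples, so training and testing code should not assume a fixed batch size.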

In [None]:
##################### MNIST DATA LOADING
mnist_data = torchvision.datasets.MNIST('./data', train = True, download = True, transform = T.ToTensor())

mtrain_size = int(0.8 * len(mnist_data))
mtest_size = len(mnist_data) - mtrain_size
mnist_train, mnist_test = random_split(mnist_data, [mtrain_size, mtest_size])

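# the training loader draws from the training split and is shuffled
mnist_loader_train = torch.utils.data.DataLoader(mnist_train,
                                          batch_size=16,
                                          shuffle=True,
                                          num_workers=1)

# the test loader draws from the held-out split and keeps a fixed order
mnist_loader_test = torch.utils.data.DataLoader(mnist_test,
                                          batch_size=16,
                                          shuffle=False,
                                          num_workers=1)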

##################### CIFAR-10 DATA LOADING
CIFAR10_data = torchvision.datasets.CIFAR10('./data', train = True, download=True, transform = T.ToTensor())

ctrain_size = int(0.8 * len(CIFAR10_data))
ctest_size = len(CIFAR10_data) - ctrain_size
c_train, c_test = random_split(CIFAR10_data, [ctrain_size, ctest_size])

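# the training loader draws from the training split and is shuffled
cifar10_loader_train = torch.utils.data.DataLoader(c_train,
                                          batch_size=128,
                                          shuffle=True,
                                          num_workers=1)

# the test loader draws from the held-out split and keeps a fixed order
cifar10_loader_test = torch.utils.data.DataLoader(c_test,
                                          batch_size=128,
                                          shuffle=False,
                                          num_workers=1)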

## Design/choose your own model structure (12 points) and optimizer (3 points).
You might want to adjust the following configurations for better performance:

(1) Network architecture:
- You can borrow some ideas from existing CNN designs, e.g., ResNet (https://arxiv.org/abs/1512.03385), where the input to a block is added to the block's output.
- Note: Do not **directly copy** an entire existing network design.

(2) Architecture hyperparameters:
- Filter size, number of filters, and number of layers (depth). Choose these carefully to trade off computational efficiency against accuracy.
- Pooling vs. Strided Convolution
- Batch normalization
- Choice of non-linear activation
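
These choices determine the spatial size of each feature map, and therefore the input size of the first fully connected layer. The sketch below applies the standard output-size formula, floor((size + 2 * padding - kernel_size) / stride) + 1, in plain Python. The helper names are hypothetical, and the example stack (padded 3x3 convolutions that preserve size, followed by 2x2 poolings) is an assumption chosen for illustration.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


def flattened_features(input_size, channels, num_pools):
    """Features left after `num_pools` 2x2 max-poolings with stride 2.

    Assumes every convolution in between preserves the spatial size.
    """
    size = input_size
    for _ in range(num_pools):
        size = conv_out_size(size, kernel_size=2, stride=2)
    return channels * size * size


# A padded 3x3 convolution keeps a 28x28 map at 28x28.
print(conv_out_size(28, kernel_size=3, stride=1, padding=1))  # 28
# The same convolution with stride 2 halves it, as a pooling layer would.
print(conv_out_size(28, kernel_size=3, stride=2, padding=1))  # 14
# CIFAR-sized input: 32 -> 16 -> 8 -> 4, so 512 * 4 * 4 features.
print(flattened_features(32, channels=512, num_pools=3))  # 8192
# MNIST-sized input: 28 -> 14 -> 7 -> 3, so 512 * 3 * 3 features.
print(flattened_features(28, channels=512, num_pools=3))  # 4608
```

Note that the same stack of layers yields a different flattened size for 28x28 and 32x32 inputs, so the first linear layer has to be sized per dataset.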

(3) Choice of optimizer (e.g., SGD, Adam, Adagrad, RMSprop) and associated hyperparameters (e.g., learning rate, momentum).
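
As a minimal sketch of how optimizers are constructed, the snippet below builds SGD and Adam for a hypothetical stand-in model (`toy_model`). The hyperparameter values are illustrative, not recommendations. Passing hyperparameters by keyword avoids mix-ups, because the positional order of `optim.SGD` is `lr, momentum, dampening, weight_decay`.

```python
import torch.nn as nn
import torch.optim as optim

toy_model = nn.Linear(4, 2)  # hypothetical stand-in for a real network

# Keyword arguments make it explicit which hyperparameter is which.
sgd = optim.SGD(toy_model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
adam = optim.Adam(toy_model.parameters(), lr=1e-3, weight_decay=5e-4)

# The optimizer records the settings it was built with.
print(sgd.defaults["momentum"], sgd.defaults["dampening"], sgd.defaults["weight_decay"])
print(adam.defaults["lr"])
```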

# MNIST MODEL ARCHITECTURE

In [None]:
class MNIST(nn.Module):
    def __init__(self):
        super(MNIST, self).__init__()
        # convolution blocks
        self.conv1 = nn.Conv2d(1, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(256, 512, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

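        # linear reduction for classification
        # 28x28 input, three 2x2 poolings: 28 -> 14 -> 7 -> 3, so 512 * 3 * 3 = 4608 features
        self.fc1 = nn.Linear(512 * 3 * 3, 1024)
        self.fc2 = nn.Linear(1024, 512)
        self.fc3 = nn.Linear(512, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = x.view(x.size(0), -1)  # flatten to (batch, 4608)
        # classification head: the last layer returns raw logits for CrossEntropyLoss
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x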

# CIFAR-10 MODEL ARCHITECTURE

In [None]:
# resblock architecture with striding
class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResBlock, self).__init__()
        # convolution / batch norm blocks
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = nn.Sequential()
        # strided downsampling
        if stride != 1 or in_channels != out_channels:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += self.downsample(identity)
        out = self.relu(out)
        return out

# primary CIFAR10 model
class CIFAR10(nn.Module):
    def __init__(self):
        super(CIFAR10, self).__init__()
        # convolution / resnet blocks
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.resblock1 = ResBlock(64, 64)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.resblock2 = ResBlock(128, 128)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.resblock3 = ResBlock(256, 256)
        self.conv4 = nn.Conv2d(256, 512, kernel_size=3, padding=1)
        self.resblock4 = ResBlock(512, 512)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # linear reduction for classification
        self.fc1 = nn.Linear(8192, 1024)
        self.fc2 = nn.Linear(1024, 512)
        self.fc3 = nn.Linear(512, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.resblock1(x)
        x = self.pool(F.relu(self.conv2(x)))
        x = self.resblock2(x)
        x = self.pool(F.relu(self.conv3(x)))
        x = self.resblock3(x)
        x = self.pool(F.relu(self.conv4(x)))
        x = self.resblock4(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

## Training (6 points)

Train a model on the given dataset using the PyTorch Module API.

Inputs:
- loader_train: The loader from which training samples are drawn.
- loader_test: The loader from which test samples are drawn.
- model: A PyTorch Module giving the model to train.
- optimizer: An Optimizer object we will use to train the model.
- epochs: (Optional) A Python integer giving the number of epochs to train for.

Returns: Nothing, but prints model accuracies during training.
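
Saving only the model weights is enough for evaluation, but resuming training usually also needs the optimizer state (for example, momentum buffers) and the epoch counter. The following is a minimal sketch under that assumption. The dictionary keys and the names `toy_model`, `toy_opt`, and `start_epoch` are hypothetical, and an in-memory buffer stands in for a checkpoint file.

```python
import io

import torch
import torch.nn as nn
import torch.optim as optim

toy_model = nn.Linear(4, 2)  # hypothetical stand-in for a real network
toy_opt = optim.SGD(toy_model.parameters(), lr=0.01, momentum=0.9)

# Take one optimization step so the optimizer holds momentum buffers.
loss = toy_model(torch.ones(1, 4)).sum()
toy_opt.zero_grad()
loss.backward()
toy_opt.step()

# Bundle everything needed to resume into one dictionary.
checkpoint = {
    "epoch": 3,
    "model": toy_model.state_dict(),
    "optimizer": toy_opt.state_dict(),
}
buffer = io.BytesIO()  # in-memory file; a path such as "checkpoint.pth" works the same way
torch.save(checkpoint, buffer)

# Restore into freshly constructed objects.
buffer.seek(0)
restored = torch.load(buffer, map_location="cpu")
new_model = nn.Linear(4, 2)
new_opt = optim.SGD(new_model.parameters(), lr=0.01, momentum=0.9)
new_model.load_state_dict(restored["model"])
new_opt.load_state_dict(restored["optimizer"])
start_epoch = restored["epoch"] + 1  # continue with the next epoch
print(start_epoch)
```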

In [None]:
def train(loader_train, loader_test, model, optimizer, epochs=100):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use the GPU when available
    model = model.to(device)  # unlike model.cuda(), this also works on CPU-only machines
    # standard cross entropy loss function
    criterion = nn.CrossEntropyLoss()


    for e in range(epochs):
        model.train()
        for t, (x, y) in enumerate(loader_train):
            # (1) move data to GPU
            x, y = x.to(device), y.to(device)

            # (2) forward and get loss
            output = model(x)
            loss = criterion(output, y)

            # (3) zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # (4) the backwards pass: compute the gradient of the loss with
            # respect to each  parameter of the model.
            loss.backward()

            # (5) update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            # only print out every 25 steps
            if t % 25 == 0:
                print('Epoch %d, Iteration %d, loss = %.4f' % (e, t, loss.item()))


        test(loader_test, model)
        # check point at each epoch for resuming training
        torch.save(model.state_dict(), "model.pth")

## Testing (6 points)
Test a model using the PyTorch Module API.

Inputs:
- loader: The loader from which test samples are drawn.
- model: A PyTorch Module giving the model to test.

Returns: Nothing, but prints the model's accuracy on the given loader.
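
The accuracy computation itself can be illustrated on a tiny hand-made batch. This is a sketch with made-up logits and labels, not output from any real model: the predicted class is the index of the largest logit in each row.

```python
import torch

# Made-up logits for 4 samples and 3 classes.
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.3, 0.2, 1.5],
                       [0.0, 3.0, 0.5],
                       [1.2, 0.4, 0.1]])
labels = torch.tensor([0, 2, 0, 0])

predicted = logits.argmax(dim=1)  # class with the highest score per row
num_correct = (predicted == labels).sum().item()
accuracy = num_correct / labels.size(0)
print(predicted.tolist(), num_correct, accuracy)  # [0, 2, 1, 0] 3 0.75
```

Accumulating `num_correct` and the sample count over all batches, rather than averaging per-batch accuracies, keeps the result correct when the last batch is smaller than the others.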

In [None]:
def test(loader, model):
    num_correct = 0
    num_samples = 0
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)  # keep the model on the same device as the data
    model.eval()  # set model to evaluation mode

    with torch.no_grad():
        for x, y in loader:
            # (1) move to GPU
            x, y = x.to(device), y.to(device)

            # (2) forward and calculate scores and predictions
            output = model(x)
            predicted = output.argmax(dim=1)  # index of the highest score per sample

            # (3) accumulate num_correct and num_samples
            num_samples += y.size(0)
            num_correct += (predicted == y).sum().item()


    acc = num_correct / num_samples
    print('Eval %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

In [None]:
# load model util for partial model training
def load_model(model, load_path):
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine
    model.load_state_dict(torch.load(load_path, map_location="cpu"))
    print(f"Model loaded from {load_path}")
    return model

Describe your design details in the writeup hw3.pdf. (3 points)

Finish your model and optimizer below.

# MNIST Implementation

In [None]:
lr = 0.01 # standard learning rate, momentum, decay rate
momentum = 0.9
weight_decay = 0.01
model = MNIST()
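# pass hyperparameters by keyword: the fourth positional argument of SGD is dampening, not weight_decay
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum, weight_decay=weight_decay)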
train(mnist_loader_train, mnist_loader_test, model, optimizer, epochs=100)

# loaded_model = load_model(MNIST(), "model.pth") # example model loading for resuming training

# CIFAR10 Implementation

In [None]:
lr = 0.01 # standard learning rate, momentum, decay rate
momentum = 0.9
weight_decay = 0.01
model = CIFAR10()
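# pass hyperparameters by keyword: the fourth positional argument of SGD is dampening, not weight_decay
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum, weight_decay=weight_decay)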
train(cifar10_loader_train, cifar10_loader_test, model, optimizer, epochs=100)

# loaded_model = load_model(CIFAR10(), "model.pth") # example model loading for resuming training