# Introduction


In this assignment, you will practice building and training Convolutional Neural Networks with PyTorch to solve computer vision tasks. The assignment has two sections, each involving a different task:

(1) Image Classification. Predict image-level category labels on two historically notable image datasets: **CIFAR-10** and **MNIST**.

(2) Image Segmentation. Predict pixel-wise classification (semantic segmentation) on synthetic input images formed by superimposing MNIST images on top of CIFAR images.

You will design your own models in each section and build the entire training/testing pipeline with PyTorch. 
PyTorch provides optimized implementations of the building blocks and additional utilities, both of which will be necessary for experiments on real datasets. It is highly recommended to read the official [documentation](https://pytorch.org/docs/stable/index.html) and [examples](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) before starting your implementation. There are some APIs that you'll find useful:
[Layers](http://pytorch.org/docs/stable/nn.html),
[Activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity),
[Loss functions](http://pytorch.org/docs/stable/nn.html#loss-functions),
[Optimizers](http://pytorch.org/docs/stable/optim.html)

It is highly recommended to use Google Colab and run the notebook on a GPU node. Check https://colab.research.google.com/ and look for tutorials online. To use a GPU go to Runtime -> Change runtime type and select GPU. 
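
Once the runtime is set, it can help to confirm that PyTorch actually sees the GPU. The snippet below is a minimal sketch of that check; it falls back to the CPU when no GPU is visible, so it should also run on a local machine.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Tensors (and models) must be moved to the chosen device explicitly.
x = torch.ones(2, 3).to(device)
print(x.device.type)
```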






# (1) Image Classification

In this section, you will design and train image classification networks for the **MNIST** and **CIFAR-10** datasets. Each network takes images as input and outputs a vector whose length equals the number of possible categories.

You can design your models by borrowing ideas from recent architectures, e.g., ResNet, but you may not simply copy an entire existing model. 

For image classification, you can use the built-in datasets provided by [torchvision](https://pytorch.org/vision/stable/index.html), an official PyTorch extension for image tasks.

To finish this section step by step, you need to:

* Prepare data by building a dataset and dataloader. (with [torchvision](https://pytorch.org/vision/stable/index.html))

* Implement training code (6 points) & testing code (6 points), including model saving and loading.

* Construct a model (12 points) and choose an optimizer (3 points).

* Describe what you did, any additional features you implemented, and/or any graphs you made in training and evaluating your network. Also report final test accuracy @100 epochs in a writeup: hw3.pdf (3 points)

In [8]:
import numpy as np
import os
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as T


## Data Preparation:

Set up a Dataset for training and testing.

A Dataset loads training examples one at a time, so in practice we wrap each Dataset in a DataLoader, which assembles batches and can load them in parallel.

We provide an example for setting up a training set for MNIST, and you should complete the rest. 
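
To see what a DataLoader yields without downloading anything, here is a small sketch on synthetic data. The random tensors stand in for MNIST images and are an assumption made for illustration, as is the 90/10 split done directly on the Dataset with `random_split`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for MNIST: 100 grayscale 28x28 images with labels 0-9.
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Hold out 10% for validation; a seeded generator makes the split repeatable.
n_val = len(dataset) // 10
n_train = len(dataset) - n_val
train_set, val_set = random_split(
    dataset, [n_train, n_val], generator=torch.Generator().manual_seed(0)
)

# num_workers=0 keeps the sketch single-process; the notebook uses 2 workers.
train_loader = DataLoader(train_set, batch_size=50, shuffle=True, num_workers=0)
val_loader = DataLoader(val_set, batch_size=50, shuffle=False, num_workers=0)

# Each iteration yields one batch of images and the matching labels.
x, y = next(iter(train_loader))
print(x.shape, y.shape)
```

Splitting the Dataset before wrapping it in a DataLoader avoids materializing every batch in memory first, which is one possible alternative to splitting from an existing loader.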

In [9]:
# Setting up MNIST data loaders
mnist_train = torchvision.datasets.MNIST('./data', train = True, download = True, transform = T.ToTensor())
mnist_train_data_loader = torch.utils.data.DataLoader(mnist_train,
                                          batch_size=50,
                                          shuffle=True,
                                          num_workers=2)

mnist_test = torchvision.datasets.MNIST("./data", train = False, download = True, transform = T.ToTensor())
mnist_test_data_loader = torch.utils.data.DataLoader(mnist_test,
                                          batch_size=50,
                                          shuffle=False,  # shuffling is not needed for evaluation
                                          num_workers=2)

# Setting up CIFAR10 data loaders
cifar_train = torchvision.datasets.CIFAR10('./data', train = True, download = True, transform = T.ToTensor())
cifar_train_data_loader = torch.utils.data.DataLoader(cifar_train,
                                          batch_size=50,
                                          shuffle=True,
                                          num_workers=2)

cifar_test = torchvision.datasets.CIFAR10("./data", train = False, download = True, transform = T.ToTensor())
cifar_test_data_loader = torch.utils.data.DataLoader(cifar_test,
                                          batch_size=50,
                                          shuffle=False,  # shuffling is not needed for evaluation
                                          num_workers=2)



Files already downloaded and verified
Files already downloaded and verified


## Design/choose your own model structure (12 points) and optimizer (3 points).
You might want to adjust the following configurations for better performance:

(1) Network architecture:
- You can borrow ideas from existing CNN designs, e.g., [ResNet](https://arxiv.org/abs/1512.03385), where the input of a block is added to its output.
- Note: Do not **directly copy** an entire existing network design.

(2) Architecture hyperparameters:
- Filter size, number of filters, and number of layers (depth). Choose these carefully to trade off computational efficiency against accuracy.
- Pooling vs. Strided Convolution
- Batch normalization
- Choice of non-linear activation

(3) Choice of optimizer (e.g., SGD, Adam, Adagrad, RMSprop) and associated hyperparameters (e.g., learning rate, momentum).
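
As an illustration of the ResNet idea mentioned in (1), the sketch below shows one possible residual block. The class name `ResidualBlock` and its layer choices are assumptions for illustration, not a prescribed design and not a copy of any published network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added to the block's input."""

    def __init__(self, channels):
        super().__init__()
        # padding=1 with kernel_size=3 keeps the spatial size unchanged,
        # so the input and output can be added element-wise.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection: add the input back


block = ResidualBlock(channels=32)
x = torch.rand(4, 32, 16, 16)
print(block(x).shape)  # same shape as the input
```

Because the block preserves both the channel count and the spatial size, it can be dropped between existing layers of a `nn.Sequential` stack without changing the sizes downstream.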

In [17]:
#Basic model, feel free to customize the layout to fit your model design.

##########################################################################
# TODO: YOUR CODE HERE
# (1) Design the model for MNIST
# (2) Design the model for CIFAR-10
##########################################################################

# model with 1 convolution layer 
class MnistNet(nn.Module):
    def __init__(self):
        super(MnistNet, self).__init__()
        self.convLayers = nn.Sequential(nn.Conv2d(in_channels = 1, out_channels = 8, kernel_size=3, stride=1, padding=1), 
                                        nn.BatchNorm2d(8), 
                                        nn.ReLU(), 
                                        nn.MaxPool2d(2))
        self.out = nn.Linear(14*14*8, 10)


    def forward(self, x):
        x = self.convLayers(x)
        x = x.view(x.size(0), -1)
        out = self.out(x)
        return out

class CifarNet(nn.Module):
    def __init__(self):
        super(CifarNet, self).__init__()
        self.convLayers = nn.Sequential(
            nn.Conv2d(in_channels = 3, out_channels = 32, kernel_size=3, stride=1, padding=1), 
            nn.ReLU(), 
            nn.Conv2d(in_channels = 32, out_channels = 32, kernel_size = 3, stride = 1, padding = 1),
            nn.BatchNorm2d(32),
            nn.ReLU(), 
            nn.MaxPool2d(2),

            nn.Conv2d(in_channels = 32, out_channels = 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels = 64, out_channels = 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2)


            )
        self.out = nn.Sequential(
            nn.Linear(8*8*64, 1024),
            nn.ReLU(),
            nn.Linear(1024, 256),
            nn.ReLU(),
            nn.Linear(256, 10)
        )


    def forward(self, x):
        x = self.convLayers(x)
        x = x.view(x.size(0), -1)
        out = self.out(x)
        return out

# Adam optimizer with the betas suggested in class (0.9, 0.999). Class suggested lr = 0.001; the runs below use lr = 0.0005.
# The optimizer is created in the last code cells; it is shown here only for completeness.
### optimizer = optim.Adam(mnistNet1.parameters(), lr = 0.0005, betas=(0.9,0.999))

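The input sizes of the first `nn.Linear` layers in `MnistNet` (`14*14*8`) and `CifarNet` (`8*8*64`) follow from the standard output-size formula for convolution and pooling layers, `floor((size + 2*padding - kernel_size) / stride) + 1`. The sketch below checks that arithmetic in plain Python; the helper name `conv_out_size` is hypothetical.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# MnistNet: 28x28 input -> 3x3 conv (padding 1) -> 2x2 max pool (stride 2)
s = conv_out_size(28, kernel_size=3, stride=1, padding=1)  # stays 28
s = conv_out_size(s, kernel_size=2, stride=2)              # halves to 14
mnist_features = s * s * 8                                 # 8 output channels

# CifarNet: 32x32 input -> (conv, conv, pool) applied twice
c = 32
for _ in range(2):
    c = conv_out_size(c, kernel_size=3, stride=1, padding=1)
    c = conv_out_size(c, kernel_size=3, stride=1, padding=1)
    c = conv_out_size(c, kernel_size=2, stride=2)
cifar_features = c * c * 64                                # 64 output channels

print(mnist_features, cifar_features)  # 1568 4096
```

If you change the kernel size, padding, or number of pooling layers, the same helper gives the new size to pass to `nn.Linear`.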

## Training (6 points)

Train a model on the given dataset using the PyTorch Module API.

Inputs:
- loader_train: The loader from which training samples are drawn.
- loader_test: The loader from which test samples are drawn (unused by the implementation below, which validates on a split of the training data).
- model: A PyTorch Module giving the model to train.
- optimizer: An Optimizer object we will use to train the model.
- epochs: (Optional) A Python integer giving the number of epochs to train for.

Returns: Nothing, but prints model accuracies during training.

In [13]:
# Create a validation set from the training data loader (done this way to avoid adding another parameter
# to the train() function). Note that this loads the whole training set into memory.
def validation_split_from_loader(loader_train):
    features = []
    labels = []
    for x, y in loader_train:
        features.append(x)
        labels.append(y)
    all_features = torch.cat(features)
    all_labels = torch.cat(labels)
    train_dataset = torch.utils.data.TensorDataset(all_features, all_labels)
    # 90/10 split; computing n_train by subtraction guarantees the sizes sum to the dataset length
    n_val = len(train_dataset) // 10
    n_train = len(train_dataset) - n_val
    train_set, val_set = random_split(train_dataset, [n_train, n_val])
    train_data_loader = torch.utils.data.DataLoader(train_set,
                                          batch_size=50,
                                          shuffle=True,
                                          num_workers=2)
    val_data_loader = torch.utils.data.DataLoader(val_set,
                                          batch_size=50,
                                          shuffle=False,
                                          num_workers=2)
    return train_data_loader, val_data_loader

# Calculate accuracy (in percent) on the validation set
def validate(loader_validation, model, device):
    model = model.to(device)
    model.eval()
    num_samples = 0
    num_correct = 0
    # No gradients are needed for evaluation; this saves memory and time
    with torch.no_grad():
        for x, y in loader_validation:
            x = x.to(device)
            y = y.to(device)
            # For validation using loss:
            # loss = F.cross_entropy(model(x), y)
            num_samples += x.size(0)
            outputs = torch.argmax(model(x), 1)
            num_correct += (outputs == y).sum()
    return (num_correct / num_samples) * 100

def train(loader_train, loader_test, model, optimizer, epochs=100, model_name="current_model"):
    # Use the GPU if one is available, otherwise fall back to the CPU (useful when working locally)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
    # loader_test is not used during training; early stopping uses a validation split of the training data
    loader_train, loader_val = validation_split_from_loader(loader_train)
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    prevValAcc = None  # best validation accuracy seen so far
    bestModelStateDict = None
    num_times_val_decreased = 0
    for e in range(epochs):
        model.train()
        for t, (x, y) in enumerate(loader_train):
            x = x.to(device)
            y = y.to(device)
            outputs = model(x)
            loss = criterion(outputs, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if t % 100 == 0:
                print('Epoch %d, Iteration %d, loss = %.4f' % (e, t, loss.item()))

        valAcc = validate(loader_val, model, device)
        print(f"validation accuracy on validation set: {valAcc}")

        # Early stopping: stop after more than 3 consecutive epochs below the best validation accuracy
        if prevValAcc is not None and valAcc < prevValAcc:
            num_times_val_decreased += 1
            if num_times_val_decreased > 3:
                print(f"EARLY STOP DUE TO DECREASED VALIDATION ACCURACY: USING MODEL AT EPOCH {e - num_times_val_decreased}")
                print(f"best model validation accuracy: {prevValAcc}")
                break
        else:
            prevValAcc = valAcc
            # state_dict() returns references to the live parameters, so clone them;
            # otherwise later epochs would silently overwrite the "best" snapshot
            bestModelStateDict = {k: v.detach().clone() for k, v in model.state_dict().items()}
            num_times_val_decreased = 0
    # Save the best model seen during training
    torch.save(bestModelStateDict, model_name + '.pth')

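The early-stopping rule in `train()` can be studied in isolation, without a model. The sketch below replays the same rule (stop after more than `patience` consecutive epochs below the best validation accuracy) over a list of accuracies. The function name `early_stopping_epoch` and the accuracy values are hypothetical and chosen only for illustration.

```python
def early_stopping_epoch(val_accuracies, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation accuracies.

    Training stops once accuracy has been below the best value for more than
    `patience` consecutive epochs; stop_epoch is None if that never happens.
    """
    best_acc, best_epoch, bad_epochs = None, None, 0
    for epoch, acc in enumerate(val_accuracies):
        if best_acc is None or acc >= best_acc:
            # New best (or tie): remember it and reset the counter.
            best_acc, best_epoch, bad_epochs = acc, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:
                return best_epoch, epoch
    return best_epoch, None


# Hypothetical accuracy curve: peaks at epoch 2, then stays below the peak.
history = [60.0, 65.0, 70.0, 69.0, 68.5, 69.5, 68.0, 67.0]
print(early_stopping_epoch(history))  # (2, 6)
```

A larger `patience` tolerates noisier validation curves at the cost of training for more epochs after the best model has been found.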
            


## Testing (6 points)
Test a model using the PyTorch Module API.

Inputs:
- loader: The loader from which test samples are drawn.
- model: A PyTorch Module giving the model to test.

Returns: Nothing, but prints the model's accuracy on the given loader.
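
Accuracy is computed by taking the highest-scoring class for each sample and comparing it with the label. The sketch below shows that step on a tiny batch; the logits and targets are made-up values for illustration.

```python
import torch

# Hypothetical logits for a batch of 4 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.2, 3.0],
                       [1.0, 0.9, 0.8]])
targets = torch.tensor([0, 1, 2, 1])

preds = torch.argmax(logits, dim=1)            # predicted class per sample
num_correct = (preds == targets).sum().item()  # convert to a Python int
accuracy = 100.0 * num_correct / targets.size(0)
print(f"{num_correct} / {targets.size(0)} correct ({accuracy:.2f})")
```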

In [5]:
def test(loader, model, model_name="current_model"):
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine
    model.load_state_dict(torch.load(model_name + '.pth', map_location=device))
    model.to(device)
    model.eval() # set model to evaluation mode
    num_correct = 0
    num_samples = 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device)
            y = y.to(device)
            num_samples += x.size(0)
            outputs = torch.argmax(model(x), 1)
            num_correct += (outputs == y).sum().item()

    acc = num_correct / num_samples
    print('Eval %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

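When a "best" checkpoint is held in memory before being written to disk, keep in mind that `state_dict()` returns references to the live parameters rather than copies. The sketch below, which uses a small `nn.Linear` as a stand-in model and a hypothetical file name, clones the tensors to take a snapshot and then does a save/load round trip.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# state_dict() returns references to the live parameters, so clone the
# tensors if the snapshot must survive further training.
snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}

# Simulate further training by changing the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

# Save the snapshot, then restore it into a fresh model of the same shape.
path = os.path.join(tempfile.mkdtemp(), "snapshot.pth")  # hypothetical file name
torch.save(snapshot, path)

restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location="cpu"))

# The restored weights match the snapshot, not the modified model.
print(torch.equal(restored.weight, snapshot["weight"]))
print(torch.equal(restored.weight, model.weight))
```

`copy.deepcopy(model.state_dict())` is another common way to take the snapshot; the dictionary comprehension is used here only to avoid an extra import.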

Describe your design details in the writeup hw3.pdf. (3 points)

Finish your model and optimizer below.

In [18]:
cifarNet1 = CifarNet()
optimizer = optim.Adam(cifarNet1.parameters(), lr = 0.0005, betas=(0.9,0.999))

# Train model on training data and save parameters as current_model.pth (the default model_name)
# Early stopping at around epoch 7, roughly 73.8% accuracy
train(cifar_train_data_loader, cifar_test_data_loader, cifarNet1, optimizer, epochs=100)

cuda:0
Epoch 0, Iteration 0, loss = 2.3082
Epoch 0, Iteration 100, loss = 1.4560
Epoch 0, Iteration 200, loss = 1.4030
Epoch 0, Iteration 300, loss = 1.5352
Epoch 0, Iteration 400, loss = 1.2495
Epoch 0, Iteration 500, loss = 0.7189
Epoch 0, Iteration 600, loss = 0.9608
Epoch 0, Iteration 700, loss = 1.0324
Epoch 0, Iteration 800, loss = 1.1941
validation accuracy on validation set: 64.22000122070312
Epoch 1, Iteration 0, loss = 0.9160
Epoch 1, Iteration 100, loss = 1.0343
Epoch 1, Iteration 200, loss = 0.6910
Epoch 1, Iteration 300, loss = 0.6907
Epoch 1, Iteration 400, loss = 0.9765
Epoch 1, Iteration 500, loss = 0.8560
Epoch 1, Iteration 600, loss = 1.0971
Epoch 1, Iteration 700, loss = 0.7546
Epoch 1, Iteration 800, loss = 0.6864
validation accuracy on validation set: 70.72000122070312
Epoch 2, Iteration 0, loss = 0.8453
Epoch 2, Iteration 100, loss = 0.5980
Epoch 2, Iteration 200, loss = 0.7450
Epoch 2, Iteration 300, loss = 0.4089
Epoch 2, Iteration 400, loss = 0.8246
Epoch 2, It

In [19]:
# Load model and test it on the testing set
test(cifar_test_data_loader, cifarNet1)

Eval 7786 / 10000 correct (77.86)


In [14]:
mnistNet1 = MnistNet()
# Using the Adam optimizer with the betas suggested in class (0.9, 0.999); lr = 0.0005 here rather than the suggested 0.001
optimizer = optim.Adam(mnistNet1.parameters(), lr = 0.0005, betas=(0.9,0.999))

# Train model on training data and save parameters as current_model.pth (the default model_name)
# Early stopping at around epoch 5, roughly 97.8% accuracy
train(mnist_train_data_loader, mnist_test_data_loader, mnistNet1, optimizer, epochs= 100)


cuda:0
Epoch 0, Iteration 0, loss = 2.1835
Epoch 0, Iteration 100, loss = 0.4409
Epoch 0, Iteration 200, loss = 0.2257
Epoch 0, Iteration 300, loss = 0.4354
Epoch 0, Iteration 400, loss = 0.3453
Epoch 0, Iteration 500, loss = 0.2040
Epoch 0, Iteration 600, loss = 0.1599
Epoch 0, Iteration 700, loss = 0.2159
Epoch 0, Iteration 800, loss = 0.1399
Epoch 0, Iteration 900, loss = 0.0414
Epoch 0, Iteration 1000, loss = 0.0594
validation accuracy on validation set: 96.16666412353516
Epoch 1, Iteration 0, loss = 0.1561
Epoch 1, Iteration 100, loss = 0.1090
Epoch 1, Iteration 200, loss = 0.1599
Epoch 1, Iteration 300, loss = 0.2414
Epoch 1, Iteration 400, loss = 0.0914
Epoch 1, Iteration 500, loss = 0.1241
Epoch 1, Iteration 600, loss = 0.0424
Epoch 1, Iteration 700, loss = 0.0278
Epoch 1, Iteration 800, loss = 0.2441
Epoch 1, Iteration 900, loss = 0.0592
Epoch 1, Iteration 1000, loss = 0.0688
validation accuracy on validation set: 97.06666564941406
Epoch 2, Iteration 0, loss = 0.0139
Epoch 2, 

In [15]:
# Load model and test it on the testing set
test(mnist_test_data_loader, mnistNet1)

Eval 9795 / 10000 correct (97.95)
