# Assignment 2: Convolutional Neural Networks
Instructions: In Assignment 2, you will learn about convolutional neural networks. In particular, you will gain first-hand experience of the training process, understand the architectural details, and familiarize yourself with transfer learning with deep networks.

## Part 1: Convolutional Neural Networks
In this part, you will experiment with a convolutional neural network implementation to perform image classification. The dataset we will use for this assignment was created by Zoya Bylinskii, and contains 451 works of art from 11 different artists all downsampled and padded to the same size. The task is to identify which artist produced each image. The original images can be found in the `art_data/artists` directory included with the data zip file. The composition of the dataset and a sample painting from each artist are shown in Table 1.

Figure 1 shows an example of the type of convolutional architecture typically employed for similar image recognition problems. Convolutional layers apply filters to the image, and produce layers of
feature maps. Often, the convolutional layers are interspersed with pooling layers. The final layers of the network are fully connected, and lead to an output layer with one node for each of the K classes
the network is trying to detect. We will use a similar architecture for our network.

![](figures/figure1.jpg)

The code for performing the data processing and training the network is provided in the starter
pack. You will use PyTorch to implement convolutional neural networks. We create a dataset from the artists’ images by downsampling them to 50x50 pixels, and transforming the RGB values to lie within the range $[-0.5, 0.5]$. We provide a lot of starter code below, but you will need to modify the hyperparameters and network structure.

### Part 1.1: Convolutional Filter Receptive Field

First, it is important to develop an intuition for how a convolutional layer affects the feature representations that the network learns. Assume that you have a network in which the first convolutional layer
applies a 5x5 patch to the image, producing a feature map $Z_{1}$. The next layer of the network is also convolutional; in this case, a 3x3 patch is applied to the feature map $Z_{1}$ to produce a new feature
map, $Z_{2}$. Assume the stride used in both cases is 1. Let the receptive field of a node in this network be the portion of the original image that contributes information to the node (that it can, through the filters of the network, “see”). What are the dimensions of the receptive field for a node in $Z_{2}$? Note that you can ignore padding, and just consider patches in the middle of the image and $Z_{1}$. Thinking about your answer, why is it effective to build convolutional networks deeper, i.e. with more layers?

**ANSWER**<br>
For a node in $Z_{2}$ the receptive field is 7x7: each node in $Z_{2}$ sees a 3x3 patch of $Z_{1}$, and each node in that patch sees a 5x5 patch of the image, so with stride 1 the overlapping patches span 5 + (3 - 1) = 7 pixels per side. A 7x7 receptive field can also be obtained directly with a single 7x7 filter. However, stacking smaller filters as above, or better still three 3x3 layers, passes the signal through more non-linearities, which makes the extracted features more expressive.<br> Stacking is also cheaper. A single 7x7 layer needs K x (7 x 7 x C) = 49 x K x C weights, where K is the number of filters (output channels) and C is the number of input channels. Three 3x3 layers need 3 x (K x (3 x 3 x C)) = 27 x K x C weights for the same receptive field and output volume (taking C = K so that every layer has the same width), fewer than the single convolutional layer with a 7x7 filter.
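
The receptive-field arithmetic above can be checked with a short sketch. The helper names `receptive_field` and `conv_params` are illustrative and not part of the starter code, and the channel width of 16 is an assumed value chosen only to make the weight counts concrete.

```python
def receptive_field(layers):
    """Side length of the input patch seen by one output node.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    Padding is ignored, as in the question.
    """
    field = 1  # a node sees itself before any layer is applied
    jump = 1   # distance in input pixels between neighbouring nodes
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


def conv_params(kernel_size, in_channels, out_channels):
    """Number of weights in one convolutional layer, biases ignored."""
    return kernel_size * kernel_size * in_channels * out_channels


# 5x5 followed by 3x3, both with stride 1 (the setup in the question).
print(receptive_field([(5, 1), (3, 1)]))
# Three stacked 3x3 layers reach the same receptive field as one 7x7 layer.
print(receptive_field([(3, 1), (3, 1), (3, 1)]), receptive_field([(7, 1)]))

C = K = 16  # assumed channel width, for illustration only
print("one 7x7 layer:   ", conv_params(7, C, K))
print("three 3x3 layers:", 3 * conv_params(3, C, K))
```

With C = K the ratio of the two counts is 49 to 27 regardless of the width chosen.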

### Part 1.2: Run the PyTorch ConvNet

Study the provided SimpleCNN class below, and take a look at the hyperparameters. Answer the following questions about the initial implementation:

1) How many layers are there? Are they all convolutional? If not, what structure do they have?
2) Which activation function is used on the hidden nodes?
3) What loss function is being used to train the network?
4) How is the loss being minimized?

**ANSWER**<br>
The architecture:<br>
CONV -> ReLU -> POOL (?) -> CONV -> ReLU -> POOL (?) -> FC -> ReLU -> FC<br>
There are 4 layers with learnable weights: 2 convolutional layers, each followed by a ReLU activation and, depending on the pooling flag, a max-pooling layer, and then 2 fully connected layers with a ReLU activation in between.<br>
ReLU is the activation function used on the hidden nodes.<br>
Cross-entropy loss is used for this multiclass classification problem.<br>
The loss is minimized with the Adam optimizer (learning rate 1e-4), which updates the weights using gradients computed by backpropagation.
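
As a rough illustration of what a single optimizer update does, the sketch below applies the standard Adam update rule to one scalar parameter in plain Python. The notebook builds its optimizer with `optim.Adam` and `LR = 1e-4`; the function `adam_step`, the toy objective, and the larger demo learning rate are assumptions made for illustration, and the `beta1`, `beta2`, and `eps` values are the commonly used defaults rather than something the assignment states.

```python
import math


def adam_step(param, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the updated parameter after one Adam step.

    `state` holds the step count and the running first and second moments.
    """
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for the moments starting at zero.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(2000):
    gradient = 2 * (w - 3)  # in the network this comes from backpropagation
    w = adam_step(w, gradient, state, lr=0.01)
print(w)
```

In the real network the gradient comes from `loss.backward()` and the update from `optimizer.step()`; the sketch only mirrors the arithmetic for one weight.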


Now that you are familiar with the code, try training the network. It should take between 60 and 120 seconds to train for 50 epochs. What is the training accuracy for your network after training? What is the validation accuracy? What do these two numbers tell you about what your network is doing?

In [1]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from PIL import Image, ImageFile
import tqdm
from torch.nn import CrossEntropyLoss
import time
import random
from torchvision import transforms, utils
import numpy as np
import os
from torch import optim

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
# device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

In [2]:
class SimpleCNN(torch.nn.Module):
    def __init__(self,device,pooling= False):
        super(SimpleCNN, self).__init__()
        self.device = device
        self.pooling = pooling
        self.conv_layer1 =  torch.nn.Conv2d(in_channels=3,out_channels=16,kernel_size=5,stride=2, device=device)
        self.pool_layer1 = torch.nn.MaxPool2d(kernel_size=2,stride=2)
        self.conv_layer2 = torch.nn.Conv2d(in_channels=16,out_channels=16,kernel_size=5,stride=2, device=device)
        self.pool_layer2 = torch.nn.MaxPool2d(kernel_size=2,stride=2)
        if pooling:
            self.fully_connected_layer = nn.Linear(64,64, device=device)
            self.dropout = nn.Dropout(p=0.5)
            self.final_layer = nn.Linear(64,11, device=device)
        else:
            self.fully_connected_layer = nn.Linear(1600, 64, device=device)
            self.dropout = nn.Dropout(p=0.5)
            self.final_layer = nn.Linear(64, 11, device=device)
    def forward(self,inp):
        x = torch.nn.functional.relu(self.conv_layer1(inp))
        if self.pooling:
            x = self.pool_layer1(x)
        x = torch.nn.functional.relu(self.conv_layer2(x))
        if self.pooling:
            x = self.pool_layer2(x)
        x = x.reshape(x.size(0),-1)
        x = torch.nn.functional.relu(self.fully_connected_layer(x))
        #x = self.dropout(x)
        x = self.final_layer(x)
        return x
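
The input sizes of the two `nn.Linear` layers (1600 without pooling, 64 with pooling) follow from the feature-map sizes. A minimal sketch, assuming 50x50 inputs and no padding as in `SimpleCNN`; the helper names `conv_out` and `flattened_features` are illustrative and not part of the starter code.

```python
def conv_out(size, kernel_size, stride, padding=0):
    """Output side length of a conv or max-pool layer (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def flattened_features(input_size=50, channels=16, pooling=False):
    """Number of values fed to the first fully connected layer."""
    size = conv_out(input_size, kernel_size=5, stride=2)  # conv_layer1
    if pooling:
        size = conv_out(size, kernel_size=2, stride=2)    # pool_layer1
    size = conv_out(size, kernel_size=5, stride=2)        # conv_layer2
    if pooling:
        size = conv_out(size, kernel_size=2, stride=2)    # pool_layer2
    return channels * size * size


print(flattened_features(pooling=False))  # 16 channels of 10x10 maps
print(flattened_features(pooling=True))   # 16 channels of 2x2 maps
```

If the input resolution or the kernel sizes change, the first argument of `fully_connected_layer` has to change with them.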

In [3]:
class LoaderClass(Dataset):
    def __init__(self,data,labels,phase,transforms):
        super(LoaderClass, self).__init__()
        self.transforms = transforms
        self.labels = labels[phase + "_labels"]
        self.data = data[phase + "_data"]
        self.phase = phase

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        label = self.labels[idx]
        img = self.data[idx]
        img = Image.fromarray(img)
        img = self.transforms(img)
        return img,torch.from_numpy(label)

In [4]:
class Trainer():
    def __init__(self,model,criterion,tr_loader,val_loader,optimizer,
                 num_epoch,patience,batch_size,lr_scheduler=None):
        self.model = model
        self.tr_loader = tr_loader
        self.val_loader = val_loader
        self.optimizer = optimizer
        self.num_epoch = num_epoch
        self.patience = patience
        self.lr_scheduler = lr_scheduler
        self.criterion = criterion
        self.softmax = nn.Softmax(dim=1)  # normalize over the class dimension
        self.no_inc = 0
        self.best_loss = 9999
        self.phases = ["train","val"]
        self.best_model = []
        self.best_val_acc = 0
        self.best_train_acc = 0
        self.best_val_loss = 0
        self.best_train_loss = 0
        self.batch_size = batch_size

    def train(self):
        pbar = tqdm.tqdm(desc= "Epoch 0, phase: Train",postfix="train_loss : ?, train_acc: ?")
        for i in range(self.num_epoch):
            last_train_acc = 0
            last_val_acc = 0
            last_val_loss = 0
            last_train_loss = 0
            pbar.update(1)

            for phase in self.phases:
                total_acc = 0
                total_loss = 0
                start = time.time()
                if phase == "train":
                    pbar.set_description_str("Epoch %d,"% i + "phase: Training")
                    loader = self.tr_loader
                    self.model.train()
                else:
                    pbar.set_description_str("Epoch %d,"% i + "phase: Validation")
                    loader = self.val_loader
                    self.model.eval()
                iter = 0
                for images,labels in loader:
                    iter += 1
                    images = images.to(self.model.device)
                    labels = labels.to(self.model.device)
                    self.optimizer.zero_grad()
                    logits = self.model(images)
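                    softmaxed_scores = self.softmax(logits)
                    _, predictions = torch.max(softmaxed_scores,1)
                    # Labels are one-hot vectors; recover the class index for the loss.
                    _, labels = torch.max(labels,1)
                    # NOTE: nn.CrossEntropyLoss applies log-softmax internally, so passing
                    # softmaxed scores applies softmax twice and compresses the loss range.
                    # Passing `logits` is the usual pattern; the results logged in this
                    # notebook were produced with the line as written.
                    loss = self.criterion(softmaxed_scores.float(),labels.long())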
                    total_loss += loss.item()
                    total_acc += torch.sum(predictions == labels).item()

                    if phase == "train":
                        pbar.set_postfix_str("train acc: %6.3f," %(total_acc/ (iter*self.batch_size)) + ("train loss: %6.3f" % (total_loss / iter)))
                        loss.backward()
                        self.optimizer.step()
                    else:
                        pbar.set_postfix_str("val acc: %6.3f," %(total_acc/ (iter*self.batch_size)) + ("val loss: %6.3f" % (total_loss / iter)))


                if phase == "train":
                    if self.lr_scheduler:

                        self.lr_scheduler.step()
                end = time.time()
                if phase == "train":
                    loss_p = total_loss / iter
                    acc_p = total_acc / len(self.tr_loader.dataset)
                    last_train_acc = acc_p
                    last_train_loss = loss_p
                else:
                    loss_p = total_loss / iter
                    acc_p = total_acc / len(self.val_loader.dataset)
                    last_val_acc = acc_p
                    last_val_loss = loss_p

                    if loss_p < self.best_loss:
                        print("New best loss, loss is: ",str(loss_p), "acc is: ",acc_p )
                        self.best_loss = loss_p
                        self.no_inc = 0
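                        # Snapshot the weights: assigning self.model would only store a
                        # reference that keeps changing as training continues.
                        import copy
                        self.best_model = copy.deepcopy(self.model)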
                        self.best_train_acc = last_train_acc
                        self.best_train_loss = last_train_loss
                        self.best_val_loss = last_val_loss
                        self.best_val_acc = last_val_acc
                    else:
                        print("Not a better score")


                        self.no_inc += 1
                        if self.no_inc == self.patience:
                            print("Out of patience returning the best model")
                            print(
                                "Best val acc: {}, Best val loss: {}, Best train acc: {}, Best train loss: {} ".format(
                                    self.best_val_acc, self.best_val_loss, self.best_train_acc, self.best_train_loss
                                ))  # Stats of the best model
                            return self.best_model
        print("Training ended returning the best model")
        print(
            "Best val acc: {}, Best val loss: {}, Best train acc: {}, Best train loss: {} ".format(
                self.best_val_acc, self.best_val_loss, self.best_train_acc, self.best_train_loss
            ))  # Stats of the best model
        return self.best_model
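
The training loop above passes softmaxed scores to `nn.CrossEntropyLoss`, which itself applies a softmax before taking the negative log-likelihood. The plain-Python sketch below, with illustrative helper names and a made-up logit vector, shows how that second softmax puts a floor under the loss for 11 classes. This may explain why the logged losses stay well above zero even as training accuracy rises.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Per-sample cross entropy as computed from raw scores:
    -log(softmax(scores)[target])."""
    return -math.log(softmax(scores)[target])


num_classes = 11
# Hypothetical logits for a sample the network classifies very confidently.
confident_logits = [20.0] + [0.0] * (num_classes - 1)

loss_on_logits = cross_entropy(confident_logits, target=0)
loss_on_probs = cross_entropy(softmax(confident_logits), target=0)

# Smallest loss reachable when the inputs are probabilities in [0, 1]:
# the best case is a one-hot input, giving -log(e / (e + num_classes - 1)).
floor = -math.log(math.e / (math.e + num_classes - 1))

print(f"loss on logits:          {loss_on_logits:.6f}")
print(f"loss on softmaxed input: {loss_on_probs:.6f}")
print(f"lower bound:             {floor:.6f}")
```

The predicted class is unaffected, since softmax preserves the ordering of the scores; only the loss value and its gradients change.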

In [5]:
LR = 1e-4
Momentum = 0.9 # If you use SGD with momentum
BATCH_SIZE = 16
POOLING = False
NUM_EPOCHS = 200
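PATIENCE = -1 # Early stopping fires when the no-improvement count equals PATIENCE; -1 never matches, so it is disabled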
TRAIN_PERCENT = 0.8
VAL_PERCENT = 0.2
NUM_ARTISTS = 11
DATA_PATH = "./art_data/artists"
ImageFile.LOAD_TRUNCATED_IMAGES = True # Do not change this
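
The effect of `PATIENCE` can be seen in isolation with a small sketch that mimics the stopping rule in `Trainer.train`: the counter of consecutive non-improving epochs is compared to the patience value with `==`. The function name `epochs_run` and the list of validation losses are made up for illustration.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs run before the stopping rule fires.

    The rule stops when the number of consecutive epochs without a new
    best validation loss equals `patience`.
    """
    best = float("inf")
    no_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            no_improvement = 0
        else:
            no_improvement += 1
            if no_improvement == patience:
                return epoch
    return len(val_losses)


losses = [2.4, 2.3, 2.2, 2.25, 2.3, 2.21, 2.22, 2.1]  # hypothetical values
print(epochs_run(losses, patience=3))   # stops after three epochs without improvement
print(epochs_run(losses, patience=-1))  # the counter never equals -1, so all epochs run
```

A positive value such as 10 or 20 would enable early stopping; which value works best for this dataset is something to determine experimentally.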

In [6]:
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # autotuning can select different kernels between runs

In [7]:
def load_artist_data():
    data = []
    labels = []
    # Sort for a stable class order and skip macOS metadata files.
    artists = sorted(x for x in os.listdir(DATA_PATH) if x != '.DS_Store')
    print(artists)
    # Iterate over the filtered list so '.DS_Store' never reaches the label lookup.
    for class_index, folder in enumerate(artists):
        for image_name in os.listdir(os.path.join(DATA_PATH, folder)):
            img = Image.open(os.path.join(DATA_PATH, folder, image_name))
            artist_label = (np.arange(NUM_ARTISTS) == class_index).astype(np.float32)
            data.append(np.array(img))
            labels.append(artist_label)
    shuffler = np.random.permutation(len(labels))
    data = np.array(data)[shuffler]
    labels = np.array(labels)[shuffler]

    length = len(data)
    val_size = int(length*VAL_PERCENT)  # VAL_PERCENT = 0.2 is defined with the other hyperparameters
    val_data = data[0:val_size+1]
    train_data = data[val_size+1::]
    val_labels = labels[0:val_size+1]
    train_labels = labels[val_size+1::]
    print(val_labels)
    data_dict = {"train_data":train_data,"val_data":val_data}
    label_dict = {"train_labels":np.array(train_labels),"val_labels":np.array(val_labels)}

    return data_dict,label_dict
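
The slicing in `load_artist_data` places `val_size + 1` samples in the validation set. A small sketch of the resulting split sizes, assuming the 451 images mentioned in the dataset description; `split_sizes` is an illustrative helper, not part of the starter code.

```python
def split_sizes(num_samples, val_fraction=0.2):
    """Mirror the slicing data[0:val_size+1] and data[val_size+1:]."""
    val_size = int(num_samples * val_fraction)
    num_val = min(val_size + 1, num_samples)  # slice end is val_size + 1
    num_train = num_samples - num_val
    return num_train, num_val


num_train, num_val = split_sizes(451)
print(num_train, num_val)
```

These sizes appear consistent with the logged accuracies, for example a validation accuracy of 0.18681318681318682 corresponds to 17 correct out of 91.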

In [8]:
seed_everything(42)
data,labels = load_artist_data()
model = SimpleCNN(device=device,pooling=False)
optimizer = optim.Adam(model.parameters(), lr=LR)
transform = {
    'train': transforms.Compose([
        transforms.Resize(50),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(50),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    }

['canaletto', 'claude monet', 'george romney', 'j. m. w. turner', 'john robert cozens', 'paul cezanne', 'paul gauguin', 'paul sandby', 'peter paul rubens', 'rembrandt', 'richard wilson']
[[0. 0. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 1. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 1. 0.]]
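
The transforms defined above first scale pixels to [0, 1] with `ToTensor` and then apply `Normalize` with per-channel mean and standard deviation. The sketch below reproduces that arithmetic for a single pixel value in plain Python to show the resulting value range; `normalize_pixel` is an illustrative helper name.

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel means passed to Normalize
STD = [0.229, 0.224, 0.225]   # per-channel standard deviations


def normalize_pixel(value_0_255, channel):
    """Apply the ToTensor scaling and the Normalize step to one pixel value."""
    scaled = value_0_255 / 255.0  # ToTensor maps uint8 values to [0, 1]
    return (scaled - MEAN[channel]) / STD[channel]


for channel, name in enumerate("RGB"):
    low = normalize_pixel(0, channel)
    high = normalize_pixel(255, channel)
    print(f"{name}: [{low:.3f}, {high:.3f}]")
```

With these statistics the inputs span roughly -2 to 2.6 rather than the $[-0.5, 0.5]$ range mentioned in the assignment text, which is worth keeping in mind when comparing against the description.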


In [11]:
train_dataset = LoaderClass(data,labels,"train",transform["train"])
valid_dataset = LoaderClass(data,labels,"val",transform["val"])
train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=BATCH_SIZE,
                                               shuffle=True, num_workers=0, pin_memory=True)
val_loader = torch.utils.data.DataLoader(valid_dataset,
                                             batch_size=BATCH_SIZE,
                                             shuffle=True, num_workers=0, pin_memory=True)
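
With `BATCH_SIZE = 16` the last batch of an epoch is usually smaller than the others. The sketch below counts the batches per epoch, assuming 360 training and 91 validation samples (the split implied by 451 images); `batch_sizes` is an illustrative helper.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches a DataLoader yields when drop_last is False."""
    full, rest = divmod(num_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


train_batches = batch_sizes(360, 16)  # assumed training-set size
val_batches = batch_sizes(91, 16)     # assumed validation-set size
print(len(train_batches), train_batches[-1])
print(len(val_batches), val_batches[-1])

# The progress bar divides by iter * batch_size, which overcounts once the
# partial batch arrives; the epoch summary divides by the dataset length.
print(len(train_batches) * 16, sum(train_batches))
```

This is why the accuracy shown in the progress bar at the end of an epoch can be slightly lower than the per-epoch accuracy reported by the trainer.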


In [12]:
criterion = nn.CrossEntropyLoss()

In [13]:
# standard adam optimizer, no pooling, no weight decay, no dropout, no scheduler
trainer_m = Trainer(model, criterion, train_loader, val_loader, optimizer, num_epoch=NUM_EPOCHS, patience=PATIENCE,batch_size=BATCH_SIZE,lr_scheduler= None)
best_model = trainer_m.train()
# Best val acc: 0.4945054945054945, Best val loss: 2.0346497694651284, Best train acc: 0.8333333333333334, Best train loss: 1.7067015378371528
# Best val acc: 0.5054945054945055, Best val loss: 2.0317842761675515, Best train acc: 0.8527777777777777, Best train loss: 1.691160772157752
# Best val acc: 0.4835164835164835, Best val loss: 2.048618813355764, Best train acc: 0.8638888888888889, Best train loss: 1.6764802984569385 

Epoch 1,phase: Training: 2it [00:00,  4.61it/s, train acc:  0.255,train loss:  2.366]  

New best loss, loss is:  2.3737993637720742 acc is:  0.18681318681318682


Epoch 2,phase: Training: 3it [00:01,  3.40it/s, train acc:  0.353,train loss:  2.234]  

New best loss, loss is:  2.265027324358622 acc is:  0.31868131868131866


Epoch 3,phase: Training: 4it [00:01,  3.06it/s, train acc:  0.288,train loss:  2.223]  

New best loss, loss is:  2.1992594401041665 acc is:  0.34065934065934067


Epoch 4,phase: Training: 5it [00:01,  2.86it/s, train acc:  0.418,train loss:  2.130]  

New best loss, loss is:  2.176307201385498 acc is:  0.32967032967032966


Epoch 5,phase: Training: 6it [00:02,  2.72it/s, train acc:  0.481,train loss:  2.072]  

Not a better score


Epoch 6,phase: Training: 7it [00:02,  2.57it/s, train acc:  0.504,train loss:  2.061]  

Not a better score


Epoch 7,phase: Training: 8it [00:03,  2.57it/s, train acc:  0.536,train loss:  2.055]  

New best loss, loss is:  2.154374440511068 acc is:  0.3626373626373626


Epoch 8,phase: Training: 9it [00:03,  2.62it/s, train acc:  0.500,train loss:  2.061]  

Not a better score


Epoch 9,phase: Training: 10it [00:03,  2.68it/s, train acc:  0.500,train loss:  2.048] 

Not a better score


Epoch 10,phase: Training: 11it [00:04,  2.74it/s, train acc:  0.567,train loss:  2.001] 

Not a better score


Epoch 11,phase: Training: 12it [00:04,  2.73it/s, train acc:  0.558,train loss:  1.997]  

New best loss, loss is:  2.1323689421017966 acc is:  0.3956043956043956


Epoch 12,phase: Training: 13it [00:04,  2.69it/s, train acc:  0.591,train loss:  1.971]  

New best loss, loss is:  2.127723137537638 acc is:  0.4065934065934066


Epoch 13,phase: Training: 14it [00:05,  2.67it/s, train acc:  0.621,train loss:  1.955]  

New best loss, loss is:  2.1123339533805847 acc is:  0.4065934065934066


Epoch 14,phase: Training: 15it [00:05,  2.68it/s, train acc:  0.620,train loss:  1.948]  

New best loss, loss is:  2.1123239596684775 acc is:  0.4065934065934066


Epoch 15,phase: Training: 16it [00:05,  2.70it/s, train acc:  0.608,train loss:  1.961]  

New best loss, loss is:  2.096355060736338 acc is:  0.43956043956043955


Epoch 16,phase: Training: 17it [00:06,  2.72it/s, train acc:  0.617,train loss:  1.963]  

Not a better score


Epoch 17,phase: Training: 18it [00:06,  2.69it/s, train acc:  0.654,train loss:  1.913]  

Not a better score


Epoch 18,phase: Training: 19it [00:07,  2.65it/s, train acc:  0.644,train loss:  1.928]  

Not a better score


Epoch 19,phase: Training: 20it [00:07,  2.61it/s, train acc:  0.617,train loss:  1.943]  

Not a better score


Epoch 20,phase: Training: 21it [00:07,  2.67it/s, train acc:  0.683,train loss:  1.896]  

New best loss, loss is:  2.0849414666493735 acc is:  0.43956043956043955


Epoch 21,phase: Training: 22it [00:08,  2.66it/s, train acc:  0.652,train loss:  1.913]  

Not a better score


Epoch 22,phase: Training: 23it [00:08,  2.65it/s, train acc:  0.635,train loss:  1.925]  

Not a better score


Epoch 23,phase: Training: 24it [00:09,  2.59it/s, train acc:  0.635,train loss:  1.931]  

Not a better score


Epoch 24,phase: Training: 25it [00:09,  2.58it/s, train acc:  0.642,train loss:  1.912]  

New best loss, loss is:  2.0650925834973655 acc is:  0.45054945054945056


Epoch 25,phase: Training: 26it [00:09,  2.62it/s, train acc:  0.607,train loss:  1.941]  

Not a better score


Epoch 26,phase: Training: 27it [00:10,  2.67it/s, train acc:  0.661,train loss:  1.901]  

Not a better score


Epoch 27,phase: Training: 28it [00:10,  2.67it/s, train acc:  0.625,train loss:  1.919]  

Not a better score


Epoch 28,phase: Training: 29it [00:10,  2.65it/s, train acc:  0.625,train loss:  1.937]  

Not a better score


Epoch 29,phase: Training: 30it [00:11,  2.58it/s, train acc:  0.683,train loss:  1.883]  

Not a better score


Epoch 30,phase: Training: 31it [00:11,  2.61it/s, train acc:  0.638,train loss:  1.913]  

Not a better score


Epoch 31,phase: Training: 32it [00:12,  2.61it/s, train acc:  0.625,train loss:  1.926]  

Not a better score


Epoch 32,phase: Training: 33it [00:12,  2.64it/s, train acc:  0.659,train loss:  1.899]  

Not a better score


Epoch 33,phase: Training: 34it [00:12,  2.59it/s, train acc:  0.642,train loss:  1.910]  

New best loss, loss is:  2.062154173851013 acc is:  0.4725274725274725


Epoch 34,phase: Training: 35it [00:13,  2.61it/s, train acc:  0.649,train loss:  1.899]  

Not a better score


Epoch 35,phase: Training: 36it [00:13,  2.62it/s, train acc:  0.670,train loss:  1.889]  

Not a better score


Epoch 36,phase: Training: 37it [00:13,  2.66it/s, train acc:  0.710,train loss:  1.851]  

Not a better score


Epoch 37,phase: Training: 38it [00:14,  2.69it/s, train acc:  0.651,train loss:  1.905]  

Not a better score


Epoch 38,phase: Training: 39it [00:14,  2.65it/s, train acc:  0.672,train loss:  1.880]  

New best loss, loss is:  2.047486503918966 acc is:  0.4835164835164835


Epoch 39,phase: Training: 40it [00:15,  2.72it/s, train acc:  0.672,train loss:  1.891]  

Not a better score


Epoch 40,phase: Training: 41it [00:15,  2.66it/s, train acc:  0.659,train loss:  1.895]  

Not a better score


Epoch 41,phase: Training: 42it [00:15,  2.62it/s, train acc:  0.707,train loss:  1.857]  

Not a better score


Epoch 42,phase: Training: 43it [00:16,  2.69it/s, train acc:  0.676,train loss:  1.878]  

Not a better score


Epoch 43,phase: Training: 44it [00:16,  2.79it/s, train acc:  0.679,train loss:  1.877]  

Not a better score


Epoch 44,phase: Training: 45it [00:16,  2.82it/s, train acc:  0.634,train loss:  1.913]  

Not a better score


Epoch 45,phase: Training: 46it [00:17,  2.73it/s, train acc:  0.697,train loss:  1.858]  

Not a better score


Epoch 46,phase: Training: 47it [00:17,  2.70it/s, train acc:  0.688,train loss:  1.865]  

Not a better score


Epoch 47,phase: Training: 48it [00:17,  2.74it/s, train acc:  0.695,train loss:  1.857]  

Not a better score


Epoch 48,phase: Training: 49it [00:18,  2.82it/s, train acc:  0.699,train loss:  1.867]  

Not a better score


Epoch 49,phase: Training: 50it [00:18,  2.88it/s, train acc:  0.696,train loss:  1.867]  

Not a better score


Epoch 50,phase: Training: 51it [00:19,  2.84it/s, train acc:  0.696,train loss:  1.863]  

Not a better score


Epoch 51,phase: Training: 52it [00:19,  2.77it/s, train acc:  0.697,train loss:  1.857]  

Not a better score


Epoch 52,phase: Training: 53it [00:19,  2.71it/s, train acc:  0.719,train loss:  1.841]  

Not a better score


Epoch 53,phase: Training: 54it [00:20,  2.75it/s, train acc:  0.705,train loss:  1.850]  

Not a better score


Epoch 54,phase: Training: 55it [00:20,  2.74it/s, train acc:  0.665,train loss:  1.883]  

Not a better score


Epoch 55,phase: Training: 56it [00:20,  2.73it/s, train acc:  0.670,train loss:  1.888]  

Not a better score


Epoch 56,phase: Training: 57it [00:21,  2.71it/s, train acc:  0.726,train loss:  1.832]  

Not a better score


Epoch 57,phase: Training: 58it [00:21,  2.72it/s, train acc:  0.729,train loss:  1.824]  

Not a better score


Epoch 58,phase: Training: 59it [00:21,  2.80it/s, train acc:  0.700,train loss:  1.854]  

Not a better score


Epoch 59,phase: Training: 60it [00:22,  2.82it/s, train acc:  0.696,train loss:  1.852]  

Not a better score


Epoch 60,phase: Training: 61it [00:22,  2.81it/s, train acc:  0.701,train loss:  1.855]  

Not a better score


Epoch 61,phase: Training: 62it [00:23,  2.75it/s, train acc:  0.724,train loss:  1.825]  

Not a better score


Epoch 62,phase: Training: 63it [00:23,  2.69it/s, train acc:  0.714,train loss:  1.835]  

Not a better score


Epoch 63,phase: Training: 64it [00:23,  2.68it/s, train acc:  0.729,train loss:  1.824]  

Not a better score


Epoch 64,phase: Training: 65it [00:24,  2.73it/s, train acc:  0.728,train loss:  1.821]  

Not a better score


Epoch 65,phase: Training: 66it [00:24,  2.75it/s, train acc:  0.742,train loss:  1.816]  

Not a better score


Epoch 66,phase: Training: 67it [00:24,  2.80it/s, train acc:  0.711,train loss:  1.838]  

Not a better score


Epoch 67,phase: Training: 68it [00:25,  2.83it/s, train acc:  0.742,train loss:  1.808]  

Not a better score


Epoch 68,phase: Training: 69it [00:25,  2.80it/s, train acc:  0.723,train loss:  1.834]  

Not a better score


Epoch 69,phase: Training: 70it [00:25,  2.82it/s, train acc:  0.762,train loss:  1.799]  

Not a better score


Epoch 70,phase: Training: 71it [00:26,  2.84it/s, train acc:  0.730,train loss:  1.819]  

Not a better score


Epoch 71,phase: Training: 72it [00:26,  2.90it/s, train acc:  0.721,train loss:  1.826]  

Not a better score


Epoch 72,phase: Training: 73it [00:26,  2.89it/s, train acc:  0.732,train loss:  1.822]  

Not a better score


Epoch 73,phase: Training: 74it [00:27,  2.86it/s, train acc:  0.721,train loss:  1.824]  

Not a better score


Epoch 74,phase: Training: 75it [00:27,  2.73it/s, train acc:  0.725,train loss:  1.824]  

Not a better score


Epoch 75,phase: Training: 76it [00:28,  2.74it/s, train acc:  0.725,train loss:  1.819]  

Not a better score


Epoch 76,phase: Training: 77it [00:28,  2.84it/s, train acc:  0.750,train loss:  1.794]  

Not a better score


Epoch 77,phase: Training: 78it [00:28,  2.86it/s, train acc:  0.789,train loss:  1.765]  

Not a better score


Epoch 78,phase: Training: 79it [00:29,  2.90it/s, train acc:  0.777,train loss:  1.777]  

Not a better score


Epoch 79,phase: Training: 80it [00:29,  2.83it/s, train acc:  0.728,train loss:  1.821]  

Not a better score


Epoch 80,phase: Training: 81it [00:29,  2.82it/s, train acc:  0.742,train loss:  1.809]  

Not a better score


Epoch 81,phase: Training: 82it [00:30,  2.81it/s, train acc:  0.754,train loss:  1.795]  

Not a better score


Epoch 82,phase: Training: 83it [00:30,  2.82it/s, train acc:  0.740,train loss:  1.815]  

Not a better score


Epoch 83,phase: Training: 84it [00:30,  2.79it/s, train acc:  0.754,train loss:  1.797]  

Not a better score


Epoch 84,phase: Training: 85it [00:31,  2.84it/s, train acc:  0.723,train loss:  1.820]  

Not a better score


Epoch 85,phase: Training: 86it [00:31,  2.83it/s, train acc:  0.763,train loss:  1.791]  

Not a better score


Epoch 86,phase: Training: 87it [00:31,  2.78it/s, train acc:  0.742,train loss:  1.804]  

Not a better score


Epoch 87,phase: Training: 88it [00:32,  2.81it/s, train acc:  0.746,train loss:  1.805]  

Not a better score


Epoch 88,phase: Training: 89it [00:32,  2.79it/s, train acc:  0.762,train loss:  1.790]  

Not a better score


Epoch 89,phase: Training: 90it [00:32,  2.79it/s, train acc:  0.750,train loss:  1.799]  

Not a better score


Epoch 90,phase: Training: 91it [00:33,  2.83it/s, train acc:  0.750,train loss:  1.796]  

Not a better score


Epoch 91,phase: Training: 92it [00:33,  2.76it/s, train acc:  0.759,train loss:  1.789]  

Not a better score


Epoch 92,phase: Training: 93it [00:34,  2.80it/s, train acc:  0.746,train loss:  1.796]  

Not a better score


Epoch 93,phase: Training: 94it [00:34,  2.85it/s, train acc:  0.746,train loss:  1.804]  

Not a better score


Epoch 94,phase: Training: 95it [00:34,  2.92it/s, train acc:  0.746,train loss:  1.799]  

Not a better score


Epoch 95,phase: Training: 96it [00:35,  2.95it/s, train acc:  0.759,train loss:  1.791]  

Not a better score


Epoch 96,phase: Training: 97it [00:35,  2.93it/s, train acc:  0.801,train loss:  1.749]  

Not a better score


Epoch 97,phase: Training: 98it [00:35,  2.88it/s, train acc:  0.750,train loss:  1.801]  

Not a better score


Epoch 98,phase: Training: 99it [00:36,  2.88it/s, train acc:  0.759,train loss:  1.786]  

Not a better score


Epoch 99,phase: Training: 100it [00:36,  2.93it/s, train acc:  0.775,train loss:  1.769] 

Not a better score


Epoch 100,phase: Training: 101it [00:36,  2.96it/s, train acc:  0.746,train loss:  1.797] 

Not a better score


Epoch 101,phase: Training: 102it [00:37,  2.96it/s, train acc:  0.746,train loss:  1.798]  

Not a better score


Epoch 102,phase: Training: 103it [00:37,  3.00it/s, train acc:  0.772,train loss:  1.776]  

Not a better score


Epoch 103,phase: Training: 104it [00:37,  2.97it/s, train acc:  0.795,train loss:  1.749]  

Not a better score


Epoch 104,phase: Training: 105it [00:38,  2.90it/s, train acc:  0.759,train loss:  1.792]  

Not a better score


Epoch 105,phase: Training: 106it [00:38,  2.85it/s, train acc:  0.758,train loss:  1.797]  

Not a better score


Epoch 106,phase: Training: 107it [00:38,  2.78it/s, train acc:  0.746,train loss:  1.809]  

Not a better score


Epoch 107,phase: Training: 108it [00:39,  2.80it/s, train acc:  0.787,train loss:  1.760]  

Not a better score


Epoch 108,phase: Training: 109it [00:39,  2.83it/s, train acc:  0.775,train loss:  1.778]  

Not a better score


Epoch 109,phase: Training: 110it [00:39,  2.82it/s, train acc:  0.760,train loss:  1.788]  

Not a better score


Epoch 110,phase: Training: 111it [00:40,  2.72it/s, train acc:  0.726,train loss:  1.823]  

Not a better score


Epoch 111,phase: Training: 112it [00:40,  2.69it/s, train acc:  0.759,train loss:  1.788]  

Not a better score


Epoch 112,phase: Training: 113it [00:41,  2.74it/s, train acc:  0.789,train loss:  1.767]  

Not a better score


Epoch 113,phase: Training: 114it [00:41,  2.82it/s, train acc:  0.783,train loss:  1.768]  

Not a better score


Epoch 114,phase: Training: 115it [00:41,  2.84it/s, train acc:  0.772,train loss:  1.778]  

Not a better score


Epoch 115,phase: Training: 116it [00:42,  2.78it/s, train acc:  0.816,train loss:  1.737]  

Not a better score


Epoch 116,phase: Training: 117it [00:42,  2.82it/s, train acc:  0.817,train loss:  1.737]  

Not a better score


Epoch 117,phase: Training: 118it [00:42,  2.74it/s, train acc:  0.774,train loss:  1.777]  

Not a better score


Epoch 118,phase: Training: 119it [00:43,  2.71it/s, train acc:  0.808,train loss:  1.742]  

Not a better score


Epoch 119,phase: Training: 120it [00:43,  2.76it/s, train acc:  0.797,train loss:  1.751]  

Not a better score


Epoch 120,phase: Training: 121it [00:43,  2.88it/s, train acc:  0.771,train loss:  1.780]  

Not a better score


Epoch 121,phase: Training: 122it [00:44,  2.92it/s, train acc:  0.816,train loss:  1.732]  

Not a better score


Epoch 122,phase: Training: 123it [00:44,  2.95it/s, train acc:  0.795,train loss:  1.756]  

Not a better score


Epoch 123,phase: Training: 124it [00:44,  2.82it/s, train acc:  0.804,train loss:  1.745]  

Not a better score


Epoch 124,phase: Training: 125it [00:45,  2.84it/s, train acc:  0.794,train loss:  1.751]  

Not a better score


Epoch 125,phase: Training: 126it [00:45,  2.86it/s, train acc:  0.790,train loss:  1.760]  

Not a better score


Epoch 126,phase: Training: 127it [00:45,  2.81it/s, train acc:  0.821,train loss:  1.725]  

Not a better score


Epoch 127,phase: Training: 128it [00:46,  2.80it/s, train acc:  0.801,train loss:  1.744]  

Not a better score


Epoch 128,phase: Training: 129it [00:46,  2.78it/s, train acc:  0.832,train loss:  1.716]  

Not a better score


Epoch 129,phase: Training: 130it [00:47,  2.70it/s, train acc:  0.826,train loss:  1.719]  

Not a better score


Epoch 130,phase: Training: 131it [00:47,  2.78it/s, train acc:  0.822,train loss:  1.727]  

Not a better score


Epoch 131,phase: Training: 132it [00:47,  2.76it/s, train acc:  0.803,train loss:  1.743]  

Not a better score


Epoch 132,phase: Training: 133it [00:48,  2.69it/s, train acc:  0.821,train loss:  1.725]  

Not a better score


Epoch 133,phase: Training: 134it [00:48,  2.71it/s, train acc:  0.821,train loss:  1.722]  

Not a better score


Epoch 134,phase: Training: 135it [00:48,  2.71it/s, train acc:  0.832,train loss:  1.711]  

Not a better score


Epoch 135,phase: Training: 136it [00:49,  2.70it/s, train acc:  0.832,train loss:  1.713]  

Not a better score


Epoch 136,phase: Training: 137it [00:49,  2.67it/s, train acc:  0.825,train loss:  1.719]  

Not a better score


Epoch 137,phase: Training: 138it [00:50,  2.75it/s, train acc:  0.812,train loss:  1.734]  

Not a better score


Epoch 138,phase: Training: 139it [00:50,  2.77it/s, train acc:  0.796,train loss:  1.748]  

Not a better score


Epoch 139,phase: Training: 140it [00:50,  2.79it/s, train acc:  0.835,train loss:  1.709]  

Not a better score


Epoch 140,phase: Training: 141it [00:51,  2.76it/s, train acc:  0.821,train loss:  1.724]  

Not a better score


Epoch 141,phase: Training: 142it [00:51,  2.82it/s, train acc:  0.829,train loss:  1.712]  

Not a better score


Epoch 142,phase: Training: 143it [00:51,  2.88it/s, train acc:  0.781,train loss:  1.762]  

Not a better score


Epoch 143,phase: Training: 144it [00:52,  2.89it/s, train acc:  0.801,train loss:  1.742]  

Not a better score


Epoch 144,phase: Training: 145it [00:52,  2.93it/s, train acc:  0.796,train loss:  1.748]  

Not a better score


Epoch 145,phase: Training: 146it [00:52,  2.88it/s, train acc:  0.777,train loss:  1.767]  

Not a better score


Epoch 146,phase: Training: 147it [00:53,  2.84it/s, train acc:  0.805,train loss:  1.740]  

Not a better score


Epoch 147,phase: Training: 148it [00:53,  2.88it/s, train acc:  0.801,train loss:  1.742]  

Not a better score


Epoch 148,phase: Training: 149it [00:53,  2.94it/s, train acc:  0.808,train loss:  1.735]  

Not a better score


Epoch 149,phase: Training: 150it [00:54,  2.93it/s, train acc:  0.812,train loss:  1.730]  

Not a better score


Epoch 150,phase: Training: 151it [00:54,  3.00it/s, train acc:  0.804,train loss:  1.742]  

Not a better score


Epoch 151,phase: Training: 152it [00:54,  2.89it/s, train acc:  0.802,train loss:  1.740]  

Not a better score


Epoch 152,phase: Training: 153it [00:55,  2.85it/s, train acc:  0.808,train loss:  1.733]  

Not a better score


Epoch 153,phase: Training: 154it [00:55,  2.90it/s, train acc:  0.817,train loss:  1.722]  

Not a better score


Epoch 154,phase: Training: 155it [00:55,  2.87it/s, train acc:  0.790,train loss:  1.753]  

Not a better score


Epoch 155,phase: Training: 156it [00:56,  2.82it/s, train acc:  0.808,train loss:  1.732]  

Not a better score


Epoch 156,phase: Training: 157it [00:56,  2.79it/s, train acc:  0.793,train loss:  1.750]  

Not a better score


Epoch 157,phase: Training: 158it [00:57,  2.71it/s, train acc:  0.838,train loss:  1.704]  

Not a better score


Epoch 158,phase: Training: 159it [00:57,  2.78it/s, train acc:  0.821,train loss:  1.721]  

Not a better score


Epoch 159,phase: Training: 160it [00:57,  2.78it/s, train acc:  0.808,train loss:  1.734]  

Not a better score


Epoch 160,phase: Training: 161it [00:58,  2.76it/s, train acc:  0.800,train loss:  1.741]  

Not a better score


Epoch 161,phase: Training: 162it [00:58,  2.79it/s, train acc:  0.804,train loss:  1.737]  

Not a better score


Epoch 162,phase: Training: 163it [00:58,  2.82it/s, train acc:  0.812,train loss:  1.729]  

Not a better score


Epoch 163,phase: Training: 164it [00:59,  2.75it/s, train acc:  0.817,train loss:  1.724]  

Not a better score


Epoch 164,phase: Training: 165it [00:59,  2.72it/s, train acc:  0.812,train loss:  1.728]  

Not a better score


Epoch 165,phase: Training: 166it [00:59,  2.74it/s, train acc:  0.809,train loss:  1.732]  

Not a better score


Epoch 166,phase: Training: 167it [01:00,  2.85it/s, train acc:  0.821,train loss:  1.720]  

Not a better score


Epoch 167,phase: Training: 168it [01:00,  2.84it/s, train acc:  0.839,train loss:  1.702]  

Not a better score


Epoch 168,phase: Training: 169it [01:00,  2.82it/s, train acc:  0.832,train loss:  1.710]  

Not a better score


Epoch 169,phase: Training: 170it [01:01,  2.73it/s, train acc:  0.779,train loss:  1.761]  

Not a better score


Epoch 170,phase: Training: 171it [01:01,  2.78it/s, train acc:  0.829,train loss:  1.712]  

Not a better score


Epoch 171,phase: Training: 172it [01:02,  2.82it/s, train acc:  0.809,train loss:  1.732]  

Not a better score


Epoch 172,phase: Training: 173it [01:02,  2.86it/s, train acc:  0.818,train loss:  1.724]  

Not a better score


Epoch 173,phase: Training: 174it [01:02,  2.74it/s, train acc:  0.812,train loss:  1.728]  

Not a better score


Epoch 174,phase: Training: 175it [01:03,  2.65it/s, train acc:  0.788,train loss:  1.752]  

Not a better score


Epoch 175,phase: Training: 176it [01:03,  2.69it/s, train acc:  0.830,train loss:  1.711]  

Not a better score


Epoch 176,phase: Training: 177it [01:03,  2.75it/s, train acc:  0.812,train loss:  1.728]  

Not a better score


Epoch 177,phase: Training: 178it [01:04,  2.76it/s, train acc:  0.792,train loss:  1.749]  

Not a better score


Epoch 178,phase: Training: 179it [01:04,  2.80it/s, train acc:  0.844,train loss:  1.697]  

Not a better score


Epoch 179,phase: Training: 180it [01:04,  2.79it/s, train acc:  0.828,train loss:  1.713]  

Not a better score


Epoch 180,phase: Training: 181it [01:05,  2.62it/s, train acc:  0.807,train loss:  1.734]  

Not a better score


Epoch 181,phase: Training: 182it [01:05,  2.52it/s, train acc:  0.824,train loss:  1.717]  

Not a better score


Epoch 182,phase: Training: 183it [01:06,  2.65it/s, train acc:  0.804,train loss:  1.736]  

Not a better score


Epoch 183,phase: Training: 184it [01:06,  2.72it/s, train acc:  0.812,train loss:  1.728]  

Not a better score


Epoch 184,phase: Training: 185it [01:06,  2.81it/s, train acc:  0.790,train loss:  1.750]  

Not a better score


Epoch 185,phase: Training: 186it [01:07,  2.76it/s, train acc:  0.817,train loss:  1.723]  

Not a better score


Epoch 186,phase: Training: 187it [01:07,  2.71it/s, train acc:  0.798,train loss:  1.742]  

Not a better score


Epoch 187,phase: Training: 188it [01:07,  2.69it/s, train acc:  0.829,train loss:  1.712]  

Not a better score


Epoch 188,phase: Training: 189it [01:08,  2.75it/s, train acc:  0.795,train loss:  1.745]  

Not a better score


Epoch 189,phase: Training: 190it [01:08,  2.73it/s, train acc:  0.832,train loss:  1.711]  

Not a better score


Epoch 190,phase: Training: 191it [01:09,  2.70it/s, train acc:  0.817,train loss:  1.725]  

Not a better score


Epoch 191,phase: Training: 192it [01:09,  2.64it/s, train acc:  0.800,train loss:  1.740]  

Not a better score


Epoch 192,phase: Training: 193it [01:09,  2.71it/s, train acc:  0.789,train loss:  1.751]  

Not a better score


Epoch 193,phase: Training: 194it [01:10,  2.80it/s, train acc:  0.812,train loss:  1.729]  

Not a better score


Epoch 194,phase: Training: 195it [01:10,  2.85it/s, train acc:  0.808,train loss:  1.732]  

Not a better score


Epoch 195,phase: Training: 196it [01:10,  2.75it/s, train acc:  0.762,train loss:  1.779]  

Not a better score


Epoch 196,phase: Training: 197it [01:11,  2.67it/s, train acc:  0.712,train loss:  1.829]  

Not a better score


Epoch 197,phase: Training: 198it [01:11,  2.62it/s, train acc:  0.837,train loss:  1.709]  

Not a better score


Epoch 198,phase: Training: 199it [01:12,  2.68it/s, train acc:  0.787,train loss:  1.758]  

Not a better score


Epoch 199,phase: Training: 200it [01:12,  2.78it/s, train acc:  0.783,train loss:  1.756]  

Not a better score


Epoch 199,phase: Validation: 200it [01:12,  2.76it/s, val acc:  0.406,val loss:  2.110]    

Not a better score
Training ended returning the best model
Best val acc: 0.4835164835164835, Best val loss: 2.047486503918966, Best train acc: 0.6722222222222223, Best train loss: 1.8831380450207253 
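
The log above alternates between "New best loss" and "Not a better score" and ends by returning the best model. The `Trainer` implementation is not shown in this section, so the following is only a minimal sketch of how such best-score tracking with a patience counter might work. `EarlyStopper` and the loss values are hypothetical and chosen for illustration.

```python
class EarlyStopper:
    """Track the best validation loss and signal when to stop (hypothetical helper)."""

    def __init__(self, patience: int):
        self.patience = patience          # epochs to wait without improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss     # "New best loss"
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1          # "Not a better score"
        return self.bad_epochs >= self.patience


stopper = EarlyStopper(patience=3)
losses = [2.40, 2.10, 2.05, 2.06, 2.07, 2.08, 2.00]  # made-up validation losses
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With a patience of 3, this loop stops after three consecutive epochs without improvement and never sees the final, lower loss, which is the usual trade-off of a small patience value.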





### Part 1.3: Add Pooling Layers
We will now add max pooling layers after each of our convolutional layers. This code has already been provided for you; all you need to do is switch the pooling flag in the hyperparameters to `True` and choose different values for the pooling filter size and stride. After you apply max pooling, what happens to your results? How does the training accuracy vs. validation accuracy change? What does that tell you about the effect of max pooling on your network?
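
To get a feel for how the pooling filter size and stride shrink a feature map, the output side length can be computed directly. The sketch below assumes no padding and no dilation (the defaults of `torch.nn.MaxPool2d`) and uses a side length of 50 purely for illustration; `pooled_size` is a hypothetical helper, not part of the starter code.

```python
def pooled_size(n: int, kernel: int, stride: int) -> int:
    """Side length after max pooling with no padding and no dilation."""
    return (n - kernel) // stride + 1


# Compare a few (filter size, stride) choices on a 50-pixel side length.
for kernel, stride in [(2, 2), (3, 2), (3, 3)]:
    print(kernel, stride, pooled_size(50, kernel, stride))
```

Larger strides shrink the feature map faster, which in turn reduces the number of inputs to whatever layer consumes the flattened features.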

In [14]:
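# Create a new model; the earlier model's parameters were already updated by training.
model_pooling = SimpleCNN(device=device, pooling=True)  # Part 1.3 asks for the pooling flag to be True
# NOTE: `optimizer` must be built over model_pooling.parameters(); an optimizer created
# for the earlier model will not update this model's weights.
trainer_m_pooling = Trainer(model_pooling, criterion, train_loader, val_loader, optimizer, num_epoch=NUM_EPOCHS, patience=PATIENCE, batch_size=BATCH_SIZE, lr_scheduler=None)
best_model_pooling = trainer_m_pooling.train()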
# Best val acc: 0.054945054945054944, Best val loss: 2.3989224831263223, Best train acc: 0.041666666666666664, Best train loss: 2.4011570267055347
# Best val acc: 0.15384615384615385, Best val loss: 2.39505668481191, Best train acc: 0.11944444444444445, Best train loss: 2.3947692124739937
# Best val acc: 0.03296703296703297, Best val loss: 2.399859070777893, Best train acc: 0.044444444444444446, Best train loss: 2.3989661154539688
# Best val acc: 0.07692307692307693, Best val loss: 2.3962440888086953, Best train acc: 0.075, Best train loss: 2.397473718809045

# pooling=False
# Best val acc: 0.06593406593406594, Best val loss: 2.397407333056132, Best train acc: 0.08888888888888889, Best train loss: 2.3978021248527197
# Best val acc: 0.14285714285714285, Best val loss: 2.397582252820333, Best train acc: 0.11388888888888889, Best train loss: 2.3991457172062085 

Epoch 1,phase: Training: 2it [00:00,  4.57it/s, train acc:  0.096,train loss:  2.398]  

New best loss, loss is:  2.3971465826034546 acc is:  0.16483516483516483


Epoch 2,phase: Training: 3it [00:01,  3.32it/s, train acc:  0.062,train loss:  2.398]  

New best loss, loss is:  2.397123336791992 acc is:  0.16483516483516483


Epoch 3,phase: Training: 4it [00:01,  2.85it/s, train acc:  0.067,train loss:  2.398]  

Not a better score


Epoch 4,phase: Training: 5it [00:01,  2.77it/s, train acc:  0.078,train loss:  2.398]  

Not a better score


Epoch 5,phase: Training: 6it [00:02,  2.62it/s, train acc:  0.085,train loss:  2.398]  

New best loss, loss is:  2.39700448513031 acc is:  0.16483516483516483


Epoch 6,phase: Training: 7it [00:02,  2.52it/s, train acc:  0.062,train loss:  2.399]  

Not a better score


Epoch 7,phase: Training: 8it [00:03,  2.48it/s, train acc:  0.071,train loss:  2.398]  

Not a better score


Epoch 8,phase: Training: 9it [00:03,  2.55it/s, train acc:  0.076,train loss:  2.398]  

Not a better score


Epoch 9,phase: Training: 10it [00:03,  2.59it/s, train acc:  0.094,train loss:  2.398] 

New best loss, loss is:  2.3969308137893677 acc is:  0.16483516483516483


Epoch 10,phase: Training: 11it [00:04,  2.61it/s, train acc:  0.071,train loss:  2.398] 

Not a better score


Epoch 11,phase: Training: 12it [00:04,  2.73it/s, train acc:  0.067,train loss:  2.398]  

Not a better score


Epoch 12,phase: Training: 13it [00:05,  2.54it/s, train acc:  0.062,train loss:  2.398]  

Not a better score


Epoch 13,phase: Training: 14it [00:05,  2.57it/s, train acc:  0.071,train loss:  2.398]  

Not a better score


Epoch 14,phase: Training: 15it [00:05,  2.63it/s, train acc:  0.090,train loss:  2.399]  

Not a better score


Epoch 15,phase: Training: 16it [00:06,  2.70it/s, train acc:  0.077,train loss:  2.399]  

Not a better score


Epoch 16,phase: Training: 17it [00:06,  2.69it/s, train acc:  0.098,train loss:  2.398]  

Not a better score


Epoch 17,phase: Training: 18it [00:06,  2.64it/s, train acc:  0.072,train loss:  2.398]  

Not a better score


Epoch 18,phase: Training: 19it [00:07,  2.61it/s, train acc:  0.075,train loss:  2.398]  

Not a better score


Epoch 19,phase: Training: 20it [00:07,  2.66it/s, train acc:  0.054,train loss:  2.398]  

Not a better score


Epoch 20,phase: Training: 21it [00:07,  2.69it/s, train acc:  0.067,train loss:  2.398]  

Not a better score


Epoch 21,phase: Training: 22it [00:08,  2.68it/s, train acc:  0.057,train loss:  2.398]  

Not a better score


Epoch 22,phase: Training: 23it [00:08,  2.54it/s, train acc:  0.087,train loss:  2.398]  

Not a better score


Epoch 23,phase: Training: 24it [00:09,  2.51it/s, train acc:  0.076,train loss:  2.398]  

Not a better score


Epoch 24,phase: Training: 25it [00:09,  2.54it/s, train acc:  0.067,train loss:  2.398]  

Not a better score


Epoch 25,phase: Training: 26it [00:10,  2.54it/s, train acc:  0.083,train loss:  2.398]  

Not a better score


Epoch 26,phase: Training: 27it [00:10,  2.60it/s, train acc:  0.071,train loss:  2.398]  

Not a better score


Epoch 27,phase: Training: 28it [00:10,  2.63it/s, train acc:  0.079,train loss:  2.398]  

Not a better score


Epoch 28,phase: Training: 29it [00:11,  2.66it/s, train acc:  0.096,train loss:  2.398]  

Not a better score


Epoch 29,phase: Training: 30it [00:11,  2.66it/s, train acc:  0.089,train loss:  2.398]  

Not a better score


Epoch 30,phase: Training: 31it [00:11,  2.63it/s, train acc:  0.109,train loss:  2.398]  

Not a better score


Epoch 31,phase: Training: 32it [00:12,  2.56it/s, train acc:  0.077,train loss:  2.398]  

Not a better score


Epoch 32,phase: Training: 33it [00:12,  2.52it/s, train acc:  0.079,train loss:  2.398]  

Not a better score


Epoch 33,phase: Training: 34it [00:13,  2.52it/s, train acc:  0.083,train loss:  2.398]  

Not a better score


Epoch 34,phase: Training: 35it [00:13,  2.49it/s, train acc:  0.082,train loss:  2.398]  

Not a better score


Epoch 35,phase: Training: 36it [00:13,  2.49it/s, train acc:  0.062,train loss:  2.398]  

New best loss, loss is:  2.3968003590901694 acc is:  0.16483516483516483


Epoch 36,phase: Training: 37it [00:14,  2.49it/s, train acc:  0.083,train loss:  2.398]  

Not a better score


Epoch 37,phase: Training: 38it [00:14,  2.46it/s, train acc:  0.072,train loss:  2.398]  

Not a better score


Epoch 38,phase: Training: 39it [00:15,  2.40it/s, train acc:  0.091,train loss:  2.397]  

Not a better score


Epoch 39,phase: Training: 40it [00:15,  2.32it/s, train acc:  0.074,train loss:  2.398]  

Not a better score


Epoch 40,phase: Training: 41it [00:16,  2.32it/s, train acc:  0.077,train loss:  2.398]  

Not a better score


Epoch 41,phase: Training: 42it [00:16,  2.37it/s, train acc:  0.085,train loss:  2.398]  

Not a better score


Epoch 42,phase: Training: 43it [00:16,  2.47it/s, train acc:  0.067,train loss:  2.398]  

Not a better score


Epoch 43,phase: Training: 44it [00:17,  2.46it/s, train acc:  0.106,train loss:  2.397]  

Not a better score


Epoch 44,phase: Training: 45it [00:17,  2.44it/s, train acc:  0.089,train loss:  2.398]  

Not a better score


Epoch 45,phase: Training: 46it [00:18,  2.50it/s, train acc:  0.089,train loss:  2.398]  

Not a better score


Epoch 46,phase: Training: 47it [00:18,  2.58it/s, train acc:  0.089,train loss:  2.398]  

Not a better score


Epoch 47,phase: Training: 48it [00:18,  2.58it/s, train acc:  0.103,train loss:  2.398]  

Not a better score


Epoch 48,phase: Training: 49it [00:19,  2.60it/s, train acc:  0.077,train loss:  2.398]  

Not a better score


Epoch 49,phase: Training: 50it [00:19,  2.58it/s, train acc:  0.072,train loss:  2.398]  

Not a better score


Epoch 50,phase: Training: 51it [00:19,  2.59it/s, train acc:  0.062,train loss:  2.398]  

Not a better score


Epoch 51,phase: Training: 52it [00:20,  2.59it/s, train acc:  0.089,train loss:  2.398]  

Not a better score


Epoch 52,phase: Training: 53it [00:20,  2.64it/s, train acc:  0.072,train loss:  2.398]  

Not a better score


Epoch 53,phase: Training: 54it [00:21,  2.63it/s, train acc:  0.082,train loss:  2.398]  

Not a better score


Epoch 54,phase: Training: 55it [00:21,  2.58it/s, train acc:  0.082,train loss:  2.398]  

Not a better score


Epoch 55,phase: Training: 56it [00:21,  2.60it/s, train acc:  0.094,train loss:  2.398]  

Not a better score


Epoch 56,phase: Training: 57it [00:22,  2.65it/s, train acc:  0.077,train loss:  2.398]  

Not a better score


Epoch 57,phase: Training: 58it [00:22,  2.65it/s, train acc:  0.082,train loss:  2.398]  

Not a better score


Epoch 58,phase: Training: 59it [00:22,  2.65it/s, train acc:  0.091,train loss:  2.398]  

Not a better score


Epoch 59,phase: Training: 60it [00:23,  2.62it/s, train acc:  0.085,train loss:  2.398]  

Not a better score


Epoch 60,phase: Training: 61it [00:23,  2.63it/s, train acc:  0.062,train loss:  2.398]  

Not a better score


Epoch 61,phase: Training: 62it [00:24,  2.65it/s, train acc:  0.074,train loss:  2.398]  

Not a better score


Epoch 62,phase: Training: 63it [00:24,  2.49it/s, train acc:  0.073,train loss:  2.399]  

Not a better score


Epoch 63,phase: Training: 64it [00:24,  2.44it/s, train acc:  0.062,train loss:  2.399]  

Not a better score


Epoch 64,phase: Training: 65it [00:25,  2.32it/s, train acc:  0.080,train loss:  2.398]  

Not a better score


Epoch 65,phase: Training: 66it [00:25,  2.31it/s, train acc:  0.074,train loss:  2.398]  

Not a better score


Epoch 66,phase: Training: 67it [00:26,  2.21it/s, train acc:  0.085,train loss:  2.398]  

Not a better score


Epoch 67,phase: Training: 68it [00:26,  2.21it/s, train acc:  0.045,train loss:  2.399]  

Not a better score


Epoch 68,phase: Training: 69it [00:27,  2.21it/s, train acc:  0.077,train loss:  2.398]  

Not a better score


Epoch 69,phase: Training: 70it [00:27,  2.25it/s, train acc:  0.068,train loss:  2.398]  

Not a better score


Epoch 70,phase: Training: 71it [00:28,  2.31it/s, train acc:  0.094,train loss:  2.398]  

Not a better score


Epoch 71,phase: Training: 72it [00:28,  2.43it/s, train acc:  0.079,train loss:  2.398]  

Not a better score


Epoch 72,phase: Training: 73it [00:28,  2.49it/s, train acc:  0.085,train loss:  2.398]  

Not a better score


Epoch 73,phase: Training: 74it [00:29,  2.51it/s, train acc:  0.072,train loss:  2.398]  

Not a better score


Epoch 74,phase: Training: 75it [00:29,  2.49it/s, train acc:  0.062,train loss:  2.398]  

Not a better score


Epoch 75,phase: Training: 76it [00:30,  2.53it/s, train acc:  0.099,train loss:  2.399]  

Not a better score


Epoch 76,phase: Training: 77it [00:30,  2.53it/s, train acc:  0.057,train loss:  2.398]  

Not a better score


Epoch 77,phase: Training: 78it [00:30,  2.45it/s, train acc:  0.073,train loss:  2.398]  

Not a better score


Epoch 78,phase: Training: 79it [00:31,  2.39it/s, train acc:  0.068,train loss:  2.398]  

Not a better score


Epoch 79,phase: Training: 80it [00:31,  2.27it/s, train acc:  0.051,train loss:  2.398]  

Not a better score


Epoch 80,phase: Training: 81it [00:32,  2.18it/s, train acc:  0.085,train loss:  2.398]  

Not a better score


Epoch 81,phase: Training: 82it [00:32,  2.18it/s, train acc:  0.068,train loss:  2.398]  

Not a better score


Epoch 82,phase: Training: 83it [00:33,  2.21it/s, train acc:  0.068,train loss:  2.398]  

Not a better score


Epoch 83,phase: Training: 84it [00:33,  2.18it/s, train acc:  0.075,train loss:  2.398]  

Not a better score


Epoch 84,phase: Training: 85it [00:34,  2.23it/s, train acc:  0.062,train loss:  2.398]  

Not a better score


Epoch 85,phase: Training: 86it [00:34,  2.31it/s, train acc:  0.067,train loss:  2.398]  

Not a better score


Epoch 86,phase: Training: 87it [00:34,  2.39it/s, train acc:  0.074,train loss:  2.398]  

Not a better score


Epoch 87,phase: Training: 88it [00:35,  2.35it/s, train acc:  0.094,train loss:  2.398]  

Not a better score


Epoch 88,phase: Training: 89it [00:35,  2.33it/s, train acc:  0.099,train loss:  2.398]  

Not a better score


Epoch 89,phase: Training: 90it [00:36,  2.33it/s, train acc:  0.096,train loss:  2.398]  

Not a better score


Epoch 90,phase: Training: 91it [00:36,  2.35it/s, train acc:  0.067,train loss:  2.398]  

Not a better score


Epoch 91,phase: Training: 92it [00:37,  2.41it/s, train acc:  0.062,train loss:  2.398]  

Not a better score


Epoch 92,phase: Training: 93it [00:37,  2.30it/s, train acc:  0.074,train loss:  2.398]  

Not a better score


Epoch 93,phase: Training: 94it [00:37,  2.24it/s, train acc:  0.085,train loss:  2.398]  

Not a better score


Epoch 94,phase: Training: 95it [00:38,  2.34it/s, train acc:  0.091,train loss:  2.398]  

Not a better score


Epoch 95,phase: Training: 96it [00:38,  2.39it/s, train acc:  0.073,train loss:  2.398]  

Not a better score


Epoch 96,phase: Training: 97it [00:39,  2.42it/s, train acc:  0.068,train loss:  2.398]  

Not a better score


Epoch 97,phase: Training: 98it [00:39,  2.34it/s, train acc:  0.104,train loss:  2.398]  

Not a better score


Epoch 98,phase: Training: 99it [00:40,  2.33it/s, train acc:  0.062,train loss:  2.399]  

Not a better score


Epoch 99,phase: Training: 100it [00:40,  2.24it/s, train acc:  0.071,train loss:  2.398] 

Not a better score


Epoch 100,phase: Training: 101it [00:40,  2.35it/s, train acc:  0.077,train loss:  2.398] 

Not a better score


Epoch 101,phase: Training: 102it [00:41,  2.41it/s, train acc:  0.062,train loss:  2.398]  

Not a better score


Epoch 102,phase: Training: 103it [00:41,  2.29it/s, train acc:  0.083,train loss:  2.397]  

Not a better score


Epoch 103,phase: Training: 104it [00:42,  2.33it/s, train acc:  0.076,train loss:  2.398]  

Not a better score


Epoch 104,phase: Training: 105it [00:42,  2.39it/s, train acc:  0.071,train loss:  2.398]  

Not a better score


Epoch 105,phase: Training: 106it [00:42,  2.46it/s, train acc:  0.071,train loss:  2.398]  

Not a better score


Epoch 106,phase: Training: 107it [00:43,  2.50it/s, train acc:  0.078,train loss:  2.398]  

Not a better score


Epoch 107,phase: Training: 108it [00:43,  2.37it/s, train acc:  0.085,train loss:  2.398]  

Not a better score


Epoch 108,phase: Training: 109it [00:44,  2.30it/s, train acc:  0.091,train loss:  2.398]  

Not a better score


Epoch 109,phase: Training: 110it [00:44,  2.38it/s, train acc:  0.089,train loss:  2.398]  

Not a better score


Epoch 110,phase: Training: 111it [00:45,  2.44it/s, train acc:  0.087,train loss:  2.398]  

Not a better score


Epoch 111,phase: Training: 112it [00:45,  2.49it/s, train acc:  0.072,train loss:  2.398]  

Not a better score


Epoch 112,phase: Training: 113it [00:45,  2.53it/s, train acc:  0.078,train loss:  2.399]  

Not a better score


Epoch 113,phase: Training: 114it [00:46,  2.48it/s, train acc:  0.077,train loss:  2.398]  

Not a better score


Epoch 114,phase: Training: 115it [00:46,  2.50it/s, train acc:  0.071,train loss:  2.398]  

Not a better score


Epoch 115,phase: Training: 116it [00:46,  2.57it/s, train acc:  0.077,train loss:  2.398]  

Not a better score


Epoch 116,phase: Training: 117it [00:47,  2.60it/s, train acc:  0.072,train loss:  2.398]  

Not a better score


Epoch 117,phase: Training: 118it [00:47,  2.58it/s, train acc:  0.087,train loss:  2.398]  

Not a better score


Epoch 118,phase: Training: 119it [00:48,  2.54it/s, train acc:  0.072,train loss:  2.398]  

Not a better score


Epoch 119,phase: Training: 120it [00:48,  2.54it/s, train acc:  0.058,train loss:  2.399]  

Not a better score


Epoch 120,phase: Training: 121it [00:48,  2.54it/s, train acc:  0.047,train loss:  2.398]  

Not a better score


Epoch 121,phase: Training: 122it [00:49,  2.45it/s, train acc:  0.089,train loss:  2.398]  

Not a better score


Epoch 122,phase: Training: 123it [00:49,  2.40it/s, train acc:  0.080,train loss:  2.398]  

Not a better score


Epoch 123,phase: Training: 124it [00:50,  2.37it/s, train acc:  0.074,train loss:  2.398]  

Not a better score


Epoch 124,phase: Training: 125it [00:50,  2.23it/s, train acc:  0.097,train loss:  2.398]  

Not a better score


Epoch 125,phase: Training: 126it [00:51,  2.20it/s, train acc:  0.082,train loss:  2.398]  

Not a better score


Epoch 126,phase: Training: 127it [00:51,  2.32it/s, train acc:  0.067,train loss:  2.399]  

Not a better score


Epoch 127,phase: Training: 128it [00:52,  2.42it/s, train acc:  0.077,train loss:  2.398]  

Not a better score


Epoch 128,phase: Training: 129it [00:52,  2.44it/s, train acc:  0.091,train loss:  2.398]  

Not a better score


Epoch 129,phase: Training: 130it [00:52,  2.36it/s, train acc:  0.069,train loss:  2.399]  

Not a better score


Epoch 130,phase: Training: 131it [00:53,  2.28it/s, train acc:  0.074,train loss:  2.398]  

Not a better score


Epoch 131,phase: Training: 132it [00:53,  2.29it/s, train acc:  0.053,train loss:  2.398]  

Not a better score


Epoch 132,phase: Training: 133it [00:54,  2.39it/s, train acc:  0.082,train loss:  2.398]  

Not a better score


Epoch 133,phase: Training: 134it [00:54,  2.46it/s, train acc:  0.083,train loss:  2.398]  

Not a better score


Epoch 134,phase: Training: 135it [00:54,  2.41it/s, train acc:  0.083,train loss:  2.398]  

Not a better score


Epoch 135,phase: Training: 136it [00:55,  2.30it/s, train acc:  0.082,train loss:  2.398]  

Not a better score


Epoch 136,phase: Training: 137it [00:55,  2.33it/s, train acc:  0.077,train loss:  2.398]  

Not a better score


Epoch 137,phase: Training: 138it [00:56,  2.38it/s, train acc:  0.058,train loss:  2.398]  

Not a better score


Epoch 138,phase: Training: 139it [00:56,  2.44it/s, train acc:  0.094,train loss:  2.398]  

Not a better score


Epoch 139,phase: Training: 140it [00:57,  2.43it/s, train acc:  0.077,train loss:  2.398]  

Not a better score


Epoch 140,phase: Training: 141it [00:57,  2.47it/s, train acc:  0.078,train loss:  2.398]  

Not a better score


Epoch 141,phase: Training: 142it [00:57,  2.55it/s, train acc:  0.096,train loss:  2.398]  

Not a better score


Epoch 142,phase: Training: 143it [00:58,  2.55it/s, train acc:  0.054,train loss:  2.398]  

Not a better score


Epoch 143,phase: Training: 144it [00:58,  2.64it/s, train acc:  0.067,train loss:  2.398]  

Not a better score


Epoch 144,phase: Training: 145it [00:58,  2.58it/s, train acc:  0.082,train loss:  2.398]  

Not a better score


Epoch 145,phase: Training: 146it [00:59,  2.57it/s, train acc:  0.089,train loss:  2.398]  

Not a better score


Epoch 146,phase: Training: 147it [00:59,  2.56it/s, train acc:  0.087,train loss:  2.398]  

Not a better score


Epoch 147,phase: Training: 148it [01:00,  2.45it/s, train acc:  0.099,train loss:  2.398]  

Not a better score


Epoch 148,phase: Training: 149it [01:00,  2.40it/s, train acc:  0.091,train loss:  2.398]  

Not a better score


Epoch 149,phase: Training: 150it [01:01,  2.39it/s, train acc:  0.067,train loss:  2.398]  

Not a better score


Epoch 150,phase: Training: 151it [01:01,  2.37it/s, train acc:  0.062,train loss:  2.398]  

Not a better score


Epoch 151,phase: Training: 152it [01:01,  2.40it/s, train acc:  0.072,train loss:  2.398]  

Not a better score


Epoch 152,phase: Training: 153it [01:02,  2.40it/s, train acc:  0.062,train loss:  2.399]  

Not a better score


Epoch 153,phase: Training: 154it [01:02,  2.43it/s, train acc:  0.071,train loss:  2.398]  

Not a better score


Epoch 154,phase: Training: 155it [01:03,  2.47it/s, train acc:  0.086,train loss:  2.398]  

Not a better score


Epoch 155,phase: Training: 156it [01:03,  2.30it/s, train acc:  0.068,train loss:  2.398]  

Not a better score


Epoch 156,phase: Training: 157it [01:04,  2.24it/s, train acc:  0.074,train loss:  2.397]  

Not a better score


Epoch 157,phase: Training: 158it [01:04,  2.25it/s, train acc:  0.073,train loss:  2.399]  

Not a better score


Epoch 158,phase: Training: 159it [01:04,  2.28it/s, train acc:  0.089,train loss:  2.398]  

Not a better score


Epoch 159,phase: Training: 160it [01:05,  2.31it/s, train acc:  0.083,train loss:  2.398]  

Not a better score


Epoch 160,phase: Training: 161it [01:05,  2.37it/s, train acc:  0.067,train loss:  2.398]  

Not a better score


Epoch 161,phase: Training: 162it [01:06,  2.40it/s, train acc:  0.085,train loss:  2.398]  

Not a better score


Epoch 162,phase: Training: 163it [01:06,  2.45it/s, train acc:  0.091,train loss:  2.398]  

Not a better score


Epoch 163,phase: Training: 164it [01:06,  2.50it/s, train acc:  0.083,train loss:  2.398]  

Not a better score


Epoch 164,phase: Training: 165it [01:07,  2.46it/s, train acc:  0.076,train loss:  2.398]  

Not a better score


Epoch 165,phase: Training: 166it [01:07,  2.48it/s, train acc:  0.048,train loss:  2.398]  

Not a better score


Epoch 166,phase: Training: 167it [01:08,  2.47it/s, train acc:  0.094,train loss:  2.397]  

Not a better score


Epoch 167,phase: Training: 168it [01:08,  2.51it/s, train acc:  0.094,train loss:  2.398]  

Not a better score


Epoch 168,phase: Training: 169it [01:08,  2.50it/s, train acc:  0.091,train loss:  2.397]  

Not a better score


Epoch 169,phase: Training: 170it [01:09,  2.43it/s, train acc:  0.082,train loss:  2.398]  

Not a better score


Epoch 170,phase: Training: 171it [01:09,  2.43it/s, train acc:  0.068,train loss:  2.398]  

Not a better score


Epoch 171,phase: Training: 172it [01:10,  2.23it/s, train acc:  0.049,train loss:  2.398]  

Not a better score


Epoch 172,phase: Training: 173it [01:10,  2.02it/s, train acc:  0.062,train loss:  2.397]  

Not a better score


Epoch 173,phase: Training: 174it [01:11,  1.79it/s, train acc:  0.109,train loss:  2.398]  

Not a better score


Epoch 174,phase: Training: 175it [01:12,  1.71it/s, train acc:  0.094,train loss:  2.398]  

Not a better score


Epoch 175,phase: Training: 176it [01:12,  1.65it/s, train acc:  0.078,train loss:  2.397]  

Not a better score


Epoch 176,phase: Training: 177it [01:13,  1.58it/s, train acc:  0.098,train loss:  2.398]  

Not a better score


Epoch 177,phase: Training: 178it [01:14,  1.56it/s, train acc:  0.062,train loss:  2.399]  

Not a better score


Epoch 178,phase: Training: 179it [01:14,  1.58it/s, train acc:  0.102,train loss:  2.398]  

Not a better score


Epoch 179,phase: Training: 180it [01:15,  1.55it/s, train acc:  0.089,train loss:  2.397]  

Not a better score


Epoch 180,phase: Training: 181it [01:16,  1.57it/s, train acc:  0.055,train loss:  2.399]  

Not a better score


Epoch 181,phase: Training: 182it [01:16,  1.58it/s, train acc:  0.090,train loss:  2.398]  

Not a better score


Epoch 182,phase: Training: 183it [01:17,  1.59it/s, train acc:  0.098,train loss:  2.398]  

Not a better score


Epoch 183,phase: Training: 184it [01:18,  1.54it/s, train acc:  0.094,train loss:  2.398]  

Not a better score


Epoch 184,phase: Training: 185it [01:18,  1.56it/s, train acc:  0.102,train loss:  2.398]  

Not a better score


Epoch 185,phase: Training: 186it [01:19,  1.58it/s, train acc:  0.109,train loss:  2.398]  

Not a better score


Epoch 186,phase: Training: 187it [01:20,  1.57it/s, train acc:  0.083,train loss:  2.398]  

Not a better score


Epoch 187,phase: Training: 188it [01:20,  1.59it/s, train acc:  0.062,train loss:  2.397]  

Not a better score


Epoch 188,phase: Training: 189it [01:21,  1.63it/s, train acc:  0.069,train loss:  2.398]  

Not a better score


Epoch 189,phase: Training: 190it [01:21,  1.63it/s, train acc:  0.070,train loss:  2.399]  

Not a better score


Epoch 190,phase: Training: 191it [01:22,  1.65it/s, train acc:  0.056,train loss:  2.398]  

Not a better score


Epoch 191,phase: Training: 192it [01:23,  1.65it/s, train acc:  0.070,train loss:  2.398]  

Not a better score


Epoch 192,phase: Training: 193it [01:23,  1.60it/s, train acc:  0.055,train loss:  2.399]  

Not a better score


Epoch 193,phase: Training: 194it [01:24,  1.59it/s, train acc:  0.036,train loss:  2.398]  

Not a better score


Epoch 194,phase: Training: 195it [01:25,  1.49it/s, train acc:  0.036,train loss:  2.399]  

Not a better score


Epoch 195,phase: Training: 196it [01:25,  1.35it/s, train acc:  0.080,train loss:  2.399]  

Not a better score


Epoch 196,phase: Training: 197it [01:26,  1.40it/s, train acc:  0.062,train loss:  2.399]  

Not a better score


Epoch 197,phase: Training: 198it [01:27,  1.48it/s, train acc:  0.062,train loss:  2.398]  

Not a better score


Epoch 198,phase: Training: 199it [01:27,  1.53it/s, train acc:  0.097,train loss:  2.398]  

Not a better score


Epoch 199,phase: Training: 200it [01:28,  1.60it/s, train acc:  0.071,train loss:  2.398]  

Not a better score


Epoch 199,phase: Validation: 200it [01:28,  2.25it/s, val acc:  0.156,val loss:  2.397]    

Not a better score
Training ended returning the best model
Best val acc: 0.16483516483516483, Best val loss: 2.3968003590901694, Best train acc: 0.07777777777777778, Best train loss: 2.397966198299242 
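
The loss in this run never moves away from about 2.398. As a sanity check, that value matches the cross-entropy of a classifier that assigns equal probability to each of the 11 artists, $\ln 11$. The short calculation below assumes the criterion is a cross-entropy loss measured in nats, which the log itself does not confirm.

```python
import math

num_classes = 11  # one class per artist
uniform_loss = math.log(num_classes)  # cross-entropy of a uniform prediction
print(round(uniform_loss, 3))
```

A loss pinned at this value suggests the network's weights are not being updated, for example if the optimizer was created from a different model's parameters.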





**ANSWER**<br>
Applying pooling reduces the spatial dimensions of the feature maps, helping the model focus on the most salient features. It also helps against overfitting, because the smaller feature maps reduce the number of parameters in the subsequent fully connected layers.<br><br>
Without pooling:<br>
Best val acc: 0.4835164835164835, Best val loss: 2.047486503918966, Best train acc: 0.6722222222222223, Best train loss: 1.8831380450207253<br><br>
With max pooling:<br>
Best val acc: 0.4945054945054945, Best val loss: 2.036665161450704, Best train acc: 0.8277777777777777, Best train loss: 1.718546924383744<br><br>
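The training accuracy improved significantly (from 0.672 to 0.828), while the validation accuracy improved only slightly (from 0.484 to 0.495).<br>
The model with max pooling therefore fits the training data better, but the wider gap between training and validation accuracy shows that it still struggles to generalize to unseen data.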
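
To make the dimensionality argument concrete, the sketch below computes the flattened feature size feeding the first fully connected layer with and without 2x2 max pooling. The layer settings (two 5x5 convolutions with stride 1, no padding, 32 output channels) are illustrative assumptions, not the actual `SimpleCNN` configuration; only the 50x50 input size comes from the assignment.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size, use_pooling, channels_out=32):
    """Flattened feature count after two hypothetical 5x5 conv layers."""
    size = input_size
    for _ in range(2):
        size = conv_out(size, kernel=5)                # 5x5 conv, stride 1
        if use_pooling:
            size = conv_out(size, kernel=2, stride=2)  # 2x2 max pool
    return size * size * channels_out


no_pool = flattened_features(50, use_pooling=False)
with_pool = flattened_features(50, use_pooling=True)
print(no_pool, with_pool)
```

Under these assumptions, the first fully connected layer receives roughly 22 times fewer inputs with pooling, so its weight matrix shrinks by the same factor.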

### Part 1.4: Regularize Your Network!
Because this is such a small dataset, your network is likely to overfit the data. Implement the following ways of regularizing your network. Test each one individually, and discuss how it affects your results.

- __Dropout__: In PyTorch, this is implemented using the `torch.nn.Dropout` class, which takes a value `p` representing the probability that an activation will be dropped out. This value should be between 0.1 and 0.5 during training; dropout is disabled for evaluation and testing when the model is put in evaluation mode with `model.eval()`. An example of how this works is available here. You should add this to your network and try different values to find one that works well.

- __Weight Regularization__: You should try different optimizers, and different weight decay values for optimizers.

- __Early Stopping__: Stop training your model after your validation accuracy starts to plateau or decrease (so you do not overtrain your model). The number of steps can be controlled through the `patience` hyperparameter in the code.

- __Learning Rate Scheduling__: Learning rate scheduling is an important part of training neural networks, and there are many techniques for it. You should try different schedulers such as `StepLR`, `CosineAnnealingLR`, etc.

Give your results for each of these regularization techniques, and discuss which ones were the most effective.
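
A minimal sketch of how dropout behaves in PyTorch, assuming a plain `torch.nn.Dropout` layer with `p=0.5` (the tensor and variable names are illustrative). In training mode, each activation is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`; calling `.eval()` turns the layer into an identity, so no separate evaluation value has to be set by hand.

```python
import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(p=0.5)   # p is the probability of zeroing an activation
x = torch.ones(1000)

drop.train()                     # training mode: dropout is active
y_train = drop(x)                # entries are either 0.0 or 1 / (1 - p) = 2.0

drop.eval()                      # evaluation mode: dropout is a no-op
y_eval = drop(x)

print(y_train.unique(), torch.equal(y_eval, x))
```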

**ANSWER**<br>
Testing each individually:<br><br>
Dropout (p=0.5) between the two fully connected layers:

In [None]:
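# uncomment the dropout in the forward function of the network declaration
model_dropout = SimpleCNN(device=device,pooling=True)
# Assumption: bind a fresh Adam optimizer to this model's parameters, since an
# optimizer created for a different model would not update model_dropout.
opt_dropout = torch.optim.Adam(model_dropout.parameters(), lr=LR)
trainer_m_dropout = Trainer(model_dropout, criterion, train_loader, val_loader, opt_dropout, num_epoch=NUM_EPOCHS, patience=PATIENCE,batch_size=BATCH_SIZE,lr_scheduler= None)
best_model_dropout = trainer_m_dropout.train()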

Even though the training accuracy is similar to that without dropout, the validation accuracy increased significantly, which suggests that the model generalizes better to new data.<br>
<br>
Weight regularization:<br>
(commenting out the dropout in CNN)




In [None]:
model_sgd = SimpleCNN(device=device,pooling=True)
opt_sgd = torch.optim.SGD(model_sgd.parameters(), lr=LR, weight_decay=0.0005)
trainer_m_sgd = Trainer(model_sgd, criterion, train_loader, val_loader, opt_sgd, num_epoch=NUM_EPOCHS, patience=PATIENCE,batch_size=BATCH_SIZE,lr_scheduler= None)
best_model_sgd= trainer_m_sgd.train()

In [None]:
model_adam1 = SimpleCNN(device=device,pooling=True)
opt_adam1 = torch.optim.Adam(model_adam1.parameters(), lr=LR, weight_decay=0.0005)
trainer_m_adam1 = Trainer(model_adam1, criterion, train_loader, val_loader, opt_adam1, num_epoch=NUM_EPOCHS, patience=PATIENCE,batch_size=BATCH_SIZE,lr_scheduler= None)
best_model_adam1= trainer_m_adam1.train()
# Best val acc: 0.5054945054945055, Best val loss: 2.046719173590342, Best train acc: 0.6888888888888889, Best train loss: 1.8724546173344487
# Best val acc: 0.5384615384615384, Best val loss: 1.9763936599095662, Best train acc: 0.7694444444444445, Best train loss: 1.7913675826528799 

In [None]:
model_adam2 = SimpleCNN(device=device,pooling=True)
opt_adam2 = torch.optim.Adam(model_adam2.parameters(), lr=LR, weight_decay=0.01)
trainer_m_adam2 = Trainer(model_adam2, criterion, train_loader, val_loader, opt_adam2, num_epoch=NUM_EPOCHS, patience=PATIENCE,batch_size=BATCH_SIZE,lr_scheduler= None)
best_model_adam2= trainer_m_adam2.train()

Weight regularization penalizes large weights to avoid a strong dependency on certain features, which helps prevent overfitting.<br>
Best val acc: 0.5274725274725275, Best val loss: 2.022157371044159, Best train acc: 0.8472222222222222, Best train loss: 1.6982376005338586<br>
With SGD as the optimizer and weight_decay=0.0005, validation accuracy increases from 0.516 to 0.527 compared to no weight decay, a small gain that suggests the model generalizes slightly better.<br>
<br>
Best val acc: 0.5384615384615384, Best val loss: 2.0050004720687866, Best train acc: 0.8944444444444445, Best train loss: 1.6490948096565579<br>
With Adam as the optimizer and weight_decay=0.0005, validation accuracy increases further compared to SGD and reaches 0.538. I believe that this improvement is mainly caused by the adaptive learning rate of Adam, which yields better convergence.<br>
<br>
Best val acc: 0.5384615384615384, Best val loss: 2.0050004720687866, Best train acc: 0.8944444444444445, Best train loss: 1.6490948096565579 <br>
With Adam as optimizer and weight_decay=0.01, validation accuracy<br>
<br>
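
The effect of `weight_decay` can be sketched with a single plain-SGD update, assuming the classic L2 formulation in which the decay term `weight_decay * w` is added to the gradient. The numbers are made up for illustration. Adaptive optimizers such as Adam rescale this combined gradient, so the same `weight_decay` value does not necessarily shrink weights by the same amount as in SGD.

```python
def sgd_step(w, grad, lr, weight_decay=0.0):
    """One SGD update with L2 weight decay folded into the gradient."""
    return w - lr * (grad + weight_decay * w)


w = 2.0
# With a zero data gradient, only the decay term moves the weight toward 0.
w_decayed = sgd_step(w, grad=0.0, lr=0.1, weight_decay=0.01)
w_plain = sgd_step(w, grad=0.0, lr=0.1)
print(w_plain, w_decayed)
```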
Early Stopping:<br>
(without weight decay and dropout)

In [None]:
# es -> early stopping
model_es = SimpleCNN(device=device,pooling=True)
opt_es = torch.optim.Adam(model_es.parameters(), lr=LR)
trainer_m_es = Trainer(model_es, criterion, train_loader, val_loader, opt_es, num_epoch=NUM_EPOCHS, patience=7,batch_size=BATCH_SIZE,lr_scheduler= None)
best_model_es= trainer_m_es.train()
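
The patience mechanism can be sketched independently of the starter code's `Trainer`, whose internals are not shown here. The helper below is hypothetical: it scans a list of per-epoch validation accuracies and stops once `patience` consecutive epochs fail to improve on the best value.

```python
def early_stopping_epoch(val_accs, patience):
    """Return (stop_epoch, best_epoch) for a list of validation accuracies."""
    best_acc = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accs):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch      # stop early
    return len(val_accs) - 1, best_epoch      # ran through every epoch


# Made-up accuracies: the best value is at epoch 2, then no improvement.
stop, best = early_stopping_epoch([0.30, 0.42, 0.50, 0.49, 0.50, 0.48], patience=3)
print(stop, best)
```

A smaller `patience` stops sooner but risks halting during a temporary plateau; a larger one trains longer before giving up.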

<br>
Learning Rate Scheduling:<br>

In [None]:
model_lr1 = SimpleCNN(device=device,pooling=True)
opt_lr1 = torch.optim.Adam(model_lr1.parameters(), lr=LR)
scheduler1 = torch.optim.lr_scheduler.StepLR(opt_lr1, step_size=5, gamma=0.1)
trainer_m_lr1 = Trainer(model_lr1, criterion, train_loader, val_loader, opt_lr1, num_epoch=NUM_EPOCHS, patience=7,batch_size=BATCH_SIZE,lr_scheduler= scheduler1)
best_model_lr1= trainer_m_lr1.train()

In [None]:
model_lr2 = SimpleCNN(device=device,pooling=True)
opt_lr2 = torch.optim.Adam(model_lr2.parameters(), lr=LR)
scheduler2 =  torch.optim.lr_scheduler.ExponentialLR(opt_lr2, gamma=0.95)
trainer_m_lr2 = Trainer(model_lr2, criterion, train_loader, val_loader, opt_lr2, num_epoch=NUM_EPOCHS, patience=7,batch_size=BATCH_SIZE,lr_scheduler= scheduler2)
best_model_lr2= trainer_m_lr2.train()
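
For reference, the two schedules used above follow simple closed forms: `StepLR` multiplies the learning rate by `gamma` every `step_size` epochs, and `ExponentialLR` multiplies it by `gamma` every epoch. The sketch below evaluates both without PyTorch, assuming an initial learning rate of `1e-3` (the actual value of `LR` is not shown in this section).

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate under StepLR after `epoch` scheduler steps."""
    return base_lr * gamma ** (epoch // step_size)


def exponential_lr(base_lr, epoch, gamma):
    """Learning rate under ExponentialLR after `epoch` scheduler steps."""
    return base_lr * gamma ** epoch


base_lr = 1e-3  # assumed value, for illustration only
for epoch in (0, 5, 10, 20):
    print(epoch,
          step_lr(base_lr, epoch, step_size=5, gamma=0.1),
          exponential_lr(base_lr, epoch, gamma=0.95))
```

With `step_size=5` and `gamma=0.1`, the step schedule has already shrunk the learning rate by a factor of 100 after 10 epochs, while the exponential schedule with `gamma=0.95` has only reduced it to about 60% of its initial value.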

### Part 1.5: Experiment with Your Architecture

All those parameters at the top of `SimpleCNN` still need to be set. You cannot possibly explore all combinations; so try to change some of them individually to get some feeling for their effect (if any).
Optionally, you can explore adding more layers. Report which changes led to the biggest increases and decreases in performance. In particular, what is the effect of making the convolutional layers have (a) a larger filter size, (b) a larger stride and (c) greater depth? How does a pyramidal-shaped network in which the feature maps gradually decrease in height and width but increase in depth compare to a flat architecture, or one with the opposite shape?
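
When comparing filter sizes, strides, and depths, it can help to estimate the cost of each choice before training. The sketch below uses the standard formulas for a 2D convolution's parameter count and output size; the specific layer settings are hypothetical examples, not values from `SimpleCNN`.

```python
def conv_params(in_channels, out_channels, kernel):
    """Number of weights plus biases in a square 2D convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels


def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a square 2D convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical first layer applied to a 50x50 RGB input.
print(conv_params(3, 16, kernel=3), conv_params(3, 16, kernel=7))          # (a) filter size
print(conv_out(50, kernel=5, stride=1), conv_out(50, kernel=5, stride=2))  # (b) stride
print(conv_params(3, 16, kernel=5), conv_params(3, 64, kernel=5))          # (c) depth
```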

### Part 1.6: Optimize Your Architecture
Based on your experience with these tests, try to achieve the best performance that you can on the validation set by varying the hyperparameters, architecture, and regularization methods. You can even (optionally) try to think of additional ways to augment the data, or experiment with techniques like local response normalization layers using `torch.nn.LocalResponseNorm` or weight normalization using the implementation [here](https://pytorch.org/docs/stable/_modules/torch/nn/utils/weight_norm.html#weight_norm). Report the best performance you are able to achieve, and the settings you used to obtain it.

### Part 1.7: Test Your Final Architecture on Variations of the Data
In PyTorch, data augmentation can be done dynamically while loading the data using what are called `transforms`. Note that some of the transforms are already implemented. You can try other transformations, such as the ones shown in Figure 3, and also try different probabilities for these transformations. You may find [this link](https://pytorch.org/vision/stable/transforms.html) helpful. Note that the PyTorch data loader refreshes the data in each epoch and applies different transformations to the different instances.

Now that you have optimized your architecture, you are ready to test it on augmented data!
Report your performance on each of the transformed datasets. Are you surprised by any of the results?
Which transformations is your network most invariant to, and which lead it to be unable to recognize the images? What does that tell you about what features your network has learned to use to recognize artists’ images?

## Part 2: Transfer Learning with Deep Network

In this part, you will fine-tune an AlexNet model pretrained on ImageNet to recognize faces. For the sake of simplicity, you may use [the pretrained AlexNet model](https://pytorch.org/hub/pytorch_vision_alexnet/) provided in PyTorch Hub. You will work with a subset of the FaceScrub dataset. The subset of male actors is [here](http://www.cs.toronto.edu/~guerzhoy/321/proj1/subset_actors.txt) and the subset of female actors is [here](http://www.cs.toronto.edu/~guerzhoy/321/proj1/subset_actresses.txt). The dataset consists of URLs of images with faces, as well as the bounding boxes of the faces. The format of the bounding box is as follows (from the FaceScrub `readme.txt` file):

` 
The format is x1,y1,x2,y2, where (x1,y1) is the coordinate of the top-left corner of the bounding box and (x2,y2) is that of the bottom-right corner, with (0,0) as the top-left corner of the image. Assuming the image is represented as a Python NumPy array I, a face
in I can be obtained as I[y1:y2, x1:x2].
`
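
As a small illustration of the bounding-box convention quoted above, the sketch below crops a face region from an image array with NumPy slicing. The image and box values are synthetic placeholders.

```python
import numpy as np

# Synthetic 100x120 RGB image (height, width, channels); values are placeholders.
I = np.zeros((100, 120, 3), dtype=np.uint8)

# Hypothetical bounding box in the FaceScrub order x1, y1, x2, y2.
x1, y1, x2, y2 = 30, 10, 90, 80

# Rows are indexed by y and columns by x, so y comes first in the slice.
face = I[y1:y2, x1:x2]
print(face.shape)  # (70, 60, 3)
```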

You may find it helpful to use and/or modify [this script](http://www.cs.toronto.edu/~guerzhoy/321/proj1/get_data.py) for downloading the image data. Note that you should crop out the images of the faces and resize them to an appropriate size before proceeding further. Make sure to check the SHA-256 hashes, and only keep faces for which the hashes match. You should set aside 70 images per actor for the training set, and use the rest for the test and validation sets.
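
Hash checking can be done with Python's standard `hashlib`. The helper name and the example bytes below are hypothetical; in practice the expected digest would come from the dataset listing and the bytes from the downloaded file.

```python
import hashlib


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of `data` equals `expected_hex`."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


payload = b"example image bytes"                # stand-in for a downloaded file
expected = hashlib.sha256(payload).hexdigest()  # stand-in for the listed hash

print(sha256_matches(payload, expected))         # True
print(sha256_matches(payload + b"x", expected))  # False
```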

### Part 2.1: Train a Multilayer Perceptron
First resize the images to 28 × 28 pixels. Use a fully-connected neural network with a single hidden layer of size 300 units.
Below, include the learning curves for the test, training, and validation sets, and the final classification performance on the test set. Include a text description of your system. In particular, describe how you preprocessed the input and initialized the weights, what activation function you used, and what the exact architecture of the network that you selected was. You might reach an accuracy of around 80-85%.
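
A minimal sketch of a single-hidden-layer network of the kind described above. The grayscale 28x28 input, the ReLU activation, the class count of 12, and the class name `FaceMLP` are assumptions made for illustration; the assignment leaves these choices to you.

```python
import torch
from torch import nn


class FaceMLP(nn.Module):
    """Fully connected network with one hidden layer (hypothetical name)."""

    def __init__(self, in_features=28 * 28, hidden=300, num_classes=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                    # (N, 1, 28, 28) -> (N, 784)
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),  # raw logits for CrossEntropyLoss
        )

    def forward(self, x):
        return self.net(x)


model = FaceMLP()
logits = model(torch.randn(4, 1, 28, 28))  # batch of 4 random "images"
print(logits.shape)
```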

### Part 2.2: AlexNet as a Fixed Feature Extractor
Extract the values of the activations of AlexNet on the face images. Use those as features in order to perform face classification: learn a fully-connected neural network that takes in the activations of the units in the AlexNet layer as inputs, and outputs the name of the person. Below, include a description of the system you built and its performance. It is recommended to start out with only using the `conv4` activations. Using `conv4` is sufficient here.

### Part 2.3: Visualize Weights
Train two networks the way you did in Part 2.1. Use 300 and 800 hidden units in the hidden layer. Visualize 2 different hidden features (neurons) for each of the two settings, and briefly explain why they are interesting. A sample visualization of a hidden feature is shown below. Note that you probably need to use L2 regularization while training to obtain nice weight visualizations.

![](figures/figure2.jpg)

### Part 2.4: Finetuning AlexNet
Train two networks the way you did in Part 2.1. Use 300 and 800 hidden units in the hidden layer. Visualize 2 different hidden features (neurons) for each of the two settings, and briefly explain why they are interesting. A sample visualization of a hidden feature is shown in Figure 4. Note that you probably need to use L2 regularization while training to obtain nice weight visualizations.

### Part 2.5: Bonus: Gradient Visualization
Here, you will use [Utku Ozbulak’s PyTorch CNN Visualizations Library](https://github.com/utkuozbulak/pytorch-cnn-visualizations/) to visualize the important parts of the input image for a particular output class. In particular, just select a specific picture of an actor, and then using your trained network in Part 2.4, perform Gradient visualization with guided backpropagation to understand the prediction for that actor with respect to the input image. Comment on your results.

## What to Turn In
You have two options for submission:
1) Provide all the relevant answers to questions, images, figures, etc., in this Jupyter notebook, convert the Jupyter notebook into a PDF, and upload the PDF.
2) Write all the answers to the questions and any relevant figures in a LaTeX report, convert the report to a PDF, and upload a zip file containing both the Jupyter notebook and the report.

## Grading
The assignment will be graded out of `100` points: `0` (no submission), `20` (an attempt at a solution), `40` (a partially correct solution), `60` (a mostly correct solution), `80` (a correct solution), `100` (a particularly creative or insightful solution). The grading depends on both the content and clarity of your report.