<a href="https://colab.research.google.com/github/armanheydari/Advance-Deep-Learning_Winter-2024/blob/main/Assignment2/cmpt489_828_a2_q2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**CMPT 489/828 Assignment 2**

Follow the instructions in this notebook and complete the missing code.

**NOTE: Do Not Change Any Provided Code or Given Variable Names! Except changing gpu.**

**Q2**. Create a simple convolutional neural network using **torch.nn** module (**20 points**)


In [1]:
import torch
from torch import nn
import torch.optim as optim
from torch.nn import functional as F
import torchvision
from torchvision import transforms

# select gpu if possible
# you can change "cuda:0" to select other gpus
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cpu


In [2]:
# load CIFAR-10 dataset with pytorch
# set batch_size
batch_size = 100
# convert to tensor, normalize
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_id = list(range(4000))
val_id = list(range(4000, 5000))
test_id = list(range(500))

# subset dataset and create dataloader with batch_size
train_sub_set = torch.utils.data.Subset(trainset, train_id)
val_sub_set = torch.utils.data.Subset(trainset, val_id)
test_sub_set = torch.utils.data.Subset(testset, test_id)

train_loader = torch.utils.data.DataLoader(train_sub_set, batch_size=batch_size, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_sub_set, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_sub_set, batch_size=batch_size, shuffle=True)

# check data size, should be (C,H,W), class map only useful for visualization and sanity checks
image_size = trainset[0][0].size()
class_map = {0: 'plane', 1: 'car', 2: 'bird', 3: 'cat', 4: 'deer', 5: 'dog', 6: 'frog', 7: 'horse', 8: 'ship',
             9: 'truck'}

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:06<00:00, 25614879.55it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


a. Implement a CNN model (**4 points**)

In [3]:
class SimpleCnn(nn.Module):
    def __init__(self, nb_hidden):
        super().__init__()
        ###############################################################################
        # TODO:                                                                       #
        # 1. create conv1 layer:                                                      #
        #   with 32 output channels, 5x5 kernels, use default for others              #
        # 2. create conv2 layer:                                                      #
        #   with 64 output channels, 5x5 kernels, use default for others              #
        # 3. create linear layer fc1: with nb_hidden output channels                  #
        # 4. create linear layer fc2: with 10 output channels                         #
        ###############################################################################
        # *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5)
        self.fully_connected1 = nn.Linear(in_features=64*2*2, out_features=nb_hidden)
        self.fully_connected2 =  nn.Linear(in_features=nb_hidden, out_features=10)
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    def forward(self, x):
        """
        forward step
        :param x: input tensor
        :return: output tensor
        """
        ###############################################################################
        # TODO:                                                                       #
        # 1. create first convolution block c1_out:                                   #
        #   relu(max_pool(conv1(), kernel_size=3, stride=3)                           #
        # 2. create second convolution block c2_out:                                  #
        #   relu(max_pool(conv2(), kernel_size=2, stride=2)                           #
        # 3. create fully connected block fc1_out: relu(fc1())                        #
        # 4. connect last fully connected layer fc2_out: fc2()                        #
        # 5. return fc2_out                                                           #
        ###############################################################################
        # *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        c1_out = torch.relu(torch.max_pool2d(self.conv_layer1(x), kernel_size=3, stride=3))
        c2_out = torch.relu(torch.max_pool2d(self.conv_layer2(c1_out), kernel_size=2, stride=2))
        # We have to flat the output of convolutions but we keep the first dimension which is the number of batch size
        flat=c2_out.flatten(start_dim=1)
        fc1_out = torch.relu(self.fully_connected1(flat))
        fc2_out = self.fully_connected2(fc1_out)
        return fc2_out
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

b. Train function (**8 points**)

In [4]:
# train function
def train_model(model, train_loader, val_loader, nb_epochs=100):
    ###############################################################################
    # TODO:                                                                       #
    # 1. create loss: criterion use CrossEntropyLoss()                            #
    # 2. create optimizer: optimizer use optim.Adam()                             #
    ###############################################################################
    # *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters())
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # initialize loss/acc dict storage
    history = {
        "train_loss": [],
        "train_acc": [],
        "val_loss": [],
        "val_acc": []
    }
    train_steps = len(train_loader.dataset) // batch_size
    val_steps = len(val_loader.dataset) // batch_size

    # run for nb_epochs
    for e in range(nb_epochs):
        # set the model in training mode
        model.train()
        # initialize the total training and validation loss
        epoch_train_loss = 0
        epoch_val_loss = 0
        # initialize the number of correct predictions in the training
        # and validation step
        train_correct = 0
        val_correct = 0

        for x, y in train_loader:
            ###############################################################################
            # TODO:                                                                       #
            # 1. move x, y to device                                                      #
            # 2. clear optimizer gradients                                                #
            # 3. predict batch x, save in pred                                            #
            # 4. calculate batch loss, save in loss                                       #
            # 5. step optimizer                                                           #
            ###############################################################################
            # *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
            x.to(device)
            y.to(device)
            optimizer.zero_grad()
            pred = model.forward(x)
            loss = criterion(pred, y)
            loss.backward()
            optimizer.step()
            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            # add the loss to the total training loss so far and
            # calculate the number of correct predictions
            epoch_train_loss += loss
            train_correct += (pred.argmax(1) == y).type(torch.float).sum().item()

        # switch off autograd for validation
        with torch.no_grad():
            # set the model in evaluation mode
            model.eval()
            # loop over the validation set
            for (x, y) in val_loader:
                ###############################################################################
                # TODO:                                                                       #
                # 1. move x, y to device                                                      #
                # 2. predict batch x, save in pred                                            #
                # 3. update epoch_val_loss                                                    #
                # 5. update val_correct                                                       #
                ###############################################################################
                # *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
                x.to(device)
                y.to(device)
                pred = model.forward(x)
                epoch_val_loss += criterion(pred, y)
                val_correct += (pred.argmax(1) == y).type(torch.float).sum().item()
                # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****


        # calculate the average epoch training and validation loss
        mean_train_loss = epoch_train_loss / train_steps
        mean_val_loss = epoch_val_loss / val_steps
        # calculate the training and validation accuracy
        train_correct = train_correct / len(train_loader.dataset)
        val_correct = val_correct / len(val_loader.dataset)
        # update our training history
        history["train_loss"].append(mean_train_loss.cpu().detach().numpy())
        history["train_acc"].append(train_correct)
        history["val_loss"].append(mean_val_loss.cpu().detach().numpy())
        history["val_acc"].append(val_correct)
        # print the model training and validation information
        print("[INFO] EPOCH: {}/{}".format(e + 1, nb_epochs))
        print("Train loss: {:.6f}, Train accuracy: {:.4f}".format(
            mean_train_loss, train_correct))
        print("Val loss: {:.6f}, Val accuracy: {:.4f}\n".format(
            mean_val_loss, val_correct))

c. Test function (**3 points**)

In [5]:
def test(model, test_loader):
    # we can now evaluate the network on the test set
    print("[INFO] testing SimpleCnn...")
    # turn off autograd for testing evaluation
    ###############################################################################
    # TODO:                                                                       #
    # 1. initialize test correct counter: test_correct                            #
    # 2. switch off autograd                                                      #
    # 3. put model in evaluation mode                                             #
    # 4. loop over test_loader                                                    #
    # 5. move data to device                                                      #
    # 6. predict batch x, save in pred                                            #
    # 7. update test_correct                                                      #
    # 8. calculate average test accuracy                                          #
    # 9. print average test accuracy                                              #
    ###############################################################################
    # *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    test_correct=0
    torch.no_grad()
    model.eval()
    for x,y in test_loader:
      x.to(device)
      y.to(device)
      pred = model.forward(x)
      test_correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_accuracy = 100*test_correct/len(test_loader.dataset)
    print('Test accuracy:', test_accuracy)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

d. Train and test your model (**3 points**)

In [None]:
###############################################################################
# TODO:                                                                       #
# 1. create model instance with 500 neurons for first linear layer            #
# 2. call training loop, train for 300 epochs                                 #
# 3. call test function                                                       #
###############################################################################
# *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
model = SimpleCnn(500)
train_model(model, train_loader, val_loader, 300)
test(model, test_loader)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

[INFO] EPOCH: 1/300
Train loss: 2.040175, Train accuracy: 0.2497
Val loss: 1.858273, Val accuracy: 0.3210

[INFO] EPOCH: 2/300
Train loss: 1.734391, Train accuracy: 0.3693
Val loss: 1.645134, Val accuracy: 0.4000

[INFO] EPOCH: 3/300
Train loss: 1.582768, Train accuracy: 0.4243
Val loss: 1.636455, Val accuracy: 0.4230

[INFO] EPOCH: 4/300
Train loss: 1.468895, Train accuracy: 0.4705
Val loss: 1.544477, Val accuracy: 0.4290

[INFO] EPOCH: 5/300
Train loss: 1.389578, Train accuracy: 0.5042
Val loss: 1.601836, Val accuracy: 0.4370

[INFO] EPOCH: 6/300
Train loss: 1.327555, Train accuracy: 0.5212
Val loss: 1.460035, Val accuracy: 0.4550

[INFO] EPOCH: 7/300
Train loss: 1.196701, Train accuracy: 0.5737
Val loss: 1.434071, Val accuracy: 0.4970

[INFO] EPOCH: 8/300
Train loss: 1.116244, Train accuracy: 0.6012
Val loss: 1.473824, Val accuracy: 0.5080

[INFO] EPOCH: 9/300
Train loss: 1.051683, Train accuracy: 0.6225
Val loss: 1.442233, Val accuracy: 0.4940

[INFO] EPOCH: 10/300
Train loss: 0.97

d. Experiment with model architecture (**2 points**)

Try different number of neurons for the first linear layer. Do you notice any performance difference?

Also try a CNN with an additional conv block (conv+pool+relu).

You can use different kernerl sizes for your conv/pool layer.


Do you notice any performance difference?

With 100 neurons is the following. The validation is not different and the model is still overfitting, so I also tried with less neurons. but, the train is less than 1.0 in 30 epochs which is because of decreasing the parameters.

In [None]:
###############################################################################
# TODO:                                                                       #
# 1. Experiment with different nb_hidden values for your CNN                  #
# Take note of the performance changes                                        #
# 2. Create another CNN with one more conv block                              #
# Take note of the performance changes                                        #
###############################################################################
# *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
model = SimpleCnn(100)
train_model(model, train_loader, val_loader, 30)
test(model, test_loader)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

[INFO] EPOCH: 1/30
Train loss: 2.099136, Train accuracy: 0.2240
Val loss: 1.881614, Val accuracy: 0.3110

[INFO] EPOCH: 2/30
Train loss: 1.773329, Train accuracy: 0.3515
Val loss: 1.751537, Val accuracy: 0.3600

[INFO] EPOCH: 3/30
Train loss: 1.629617, Train accuracy: 0.4158
Val loss: 1.654263, Val accuracy: 0.4000

[INFO] EPOCH: 4/30
Train loss: 1.535014, Train accuracy: 0.4462
Val loss: 1.543830, Val accuracy: 0.4480

[INFO] EPOCH: 5/30
Train loss: 1.475860, Train accuracy: 0.4708
Val loss: 1.558472, Val accuracy: 0.4460

[INFO] EPOCH: 6/30
Train loss: 1.394842, Train accuracy: 0.5010
Val loss: 1.494333, Val accuracy: 0.4710

[INFO] EPOCH: 7/30
Train loss: 1.326843, Train accuracy: 0.5260
Val loss: 1.481632, Val accuracy: 0.4900

[INFO] EPOCH: 8/30
Train loss: 1.262958, Train accuracy: 0.5495
Val loss: 1.449636, Val accuracy: 0.4840

[INFO] EPOCH: 9/30
Train loss: 1.202575, Train accuracy: 0.5705
Val loss: 1.432058, Val accuracy: 0.5010

[INFO] EPOCH: 10/30
Train loss: 1.163404, Trai

With 1000 neurons is the following. it's not different from 500.

In [None]:
###############################################################################
# TODO:                                                                       #
# 1. Experiment with different nb_hidden values for your CNN                  #
# Take note of the performance changes                                        #
# 2. Create another CNN with one more conv block                              #
# Take note of the performance changes                                        #
###############################################################################
# *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
model = SimpleCnn(1000)
train_model(model, train_loader, val_loader, 30)
test(model, test_loader)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

[INFO] EPOCH: 1/30
Train loss: 2.021288, Train accuracy: 0.2645
Val loss: 1.855216, Val accuracy: 0.3020

[INFO] EPOCH: 2/30
Train loss: 1.701333, Train accuracy: 0.3777
Val loss: 1.680357, Val accuracy: 0.3700

[INFO] EPOCH: 3/30
Train loss: 1.568032, Train accuracy: 0.4377
Val loss: 1.588967, Val accuracy: 0.4310

[INFO] EPOCH: 4/30
Train loss: 1.437113, Train accuracy: 0.4773
Val loss: 1.534688, Val accuracy: 0.4380

[INFO] EPOCH: 5/30
Train loss: 1.331827, Train accuracy: 0.5245
Val loss: 1.553002, Val accuracy: 0.4500

[INFO] EPOCH: 6/30
Train loss: 1.250993, Train accuracy: 0.5523
Val loss: 1.476245, Val accuracy: 0.4730

[INFO] EPOCH: 7/30
Train loss: 1.144886, Train accuracy: 0.5885
Val loss: 1.430284, Val accuracy: 0.4900

[INFO] EPOCH: 8/30
Train loss: 1.034532, Train accuracy: 0.6362
Val loss: 1.456648, Val accuracy: 0.4770

[INFO] EPOCH: 9/30
Train loss: 0.970047, Train accuracy: 0.6472
Val loss: 1.444267, Val accuracy: 0.4990

[INFO] EPOCH: 10/30
Train loss: 0.822246, Trai

With 10 neurons is the following. both train and validation are decreased, so it's not the best choise. But we still see the overfitting signs as the accuracy of validation and train are far from each other. It means maybe the problem is not the hidden layer's neurons here and we have to solve it in other place of our model architecture.

In [None]:
###############################################################################
# TODO:                                                                       #
# 1. Experiment with different nb_hidden values for your CNN                  #
# Take note of the performance changes                                        #
# 2. Create another CNN with one more conv block                              #
# Take note of the performance changes                                        #
###############################################################################
# *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
model = SimpleCnn(10)
train_model(model, train_loader, val_loader, 30)
test(model, test_loader)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

[INFO] EPOCH: 1/30
Train loss: 2.244178, Train accuracy: 0.1583
Val loss: 2.149923, Val accuracy: 0.1940

[INFO] EPOCH: 2/30
Train loss: 2.018936, Train accuracy: 0.2492
Val loss: 1.945027, Val accuracy: 0.2760

[INFO] EPOCH: 3/30
Train loss: 1.849908, Train accuracy: 0.3370
Val loss: 1.820676, Val accuracy: 0.3210

[INFO] EPOCH: 4/30
Train loss: 1.736654, Train accuracy: 0.3775
Val loss: 1.749640, Val accuracy: 0.3560

[INFO] EPOCH: 5/30
Train loss: 1.649577, Train accuracy: 0.3987
Val loss: 1.695055, Val accuracy: 0.3720

[INFO] EPOCH: 6/30
Train loss: 1.594398, Train accuracy: 0.4180
Val loss: 1.653428, Val accuracy: 0.4020

[INFO] EPOCH: 7/30
Train loss: 1.525172, Train accuracy: 0.4535
Val loss: 1.630250, Val accuracy: 0.4040

[INFO] EPOCH: 8/30
Train loss: 1.475613, Train accuracy: 0.4612
Val loss: 1.596313, Val accuracy: 0.4010

[INFO] EPOCH: 9/30
Train loss: 1.418852, Train accuracy: 0.4888
Val loss: 1.558313, Val accuracy: 0.4330

[INFO] EPOCH: 10/30
Train loss: 1.378600, Trai

Now, we add a new convolution block. as the last output of conv layers shape is [100, 64, 2, 2], then we can only add a convolution layer with kernel=2 or kernel=1, and it doesn't make sense to use a max pooling. I tried both of them in seprate cells and the results didn't enhance so much. I didn't change the relu or Adam, as they are best practice. I also added a dropout layer after the linear to prevent overfitting.

In [14]:
class SimpleCnn(nn.Module):
    def __init__(self, nb_hidden):
        super().__init__()
        ###############################################################################
        # TODO:                                                                       #
        # 1. create conv1 layer:                                                      #
        #   with 32 output channels, 5x5 kernels, use default for others              #
        # 2. create conv2 layer:                                                      #
        #   with 64 output channels, 5x5 kernels, use default for others              #
        # 3. create linear layer fc1: with nb_hidden output channels                  #
        # 4. create linear layer fc2: with 10 output channels                         #
        ###############################################################################
        # *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5)
        self.conv_layer3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=2)
        self.fully_connected1 = nn.Linear(in_features=128*1*1, out_features=nb_hidden)
        self.fully_connected2 =  nn.Linear(in_features=nb_hidden, out_features=10)
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    def forward(self, x):
        """
        forward step
        :param x: input tensor
        :return: output tensor
        """
        ###############################################################################
        # TODO:                                                                       #
        # 1. create first convolution block c1_out:                                   #
        #   relu(max_pool(conv1(), kernel_size=3, stride=3)                           #
        # 2. create second convolution block c2_out:                                  #
        #   relu(max_pool(conv2(), kernel_size=2, stride=2)                           #
        # 3. create fully connected block fc1_out: relu(fc1())                        #
        # 4. connect last fully connected layer fc2_out: fc2()                        #
        # 5. return fc2_out                                                           #
        ###############################################################################
        # *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        c1_out = torch.relu(torch.max_pool2d(self.conv_layer1(x), kernel_size=3, stride=3))
        c2_out = torch.relu(torch.max_pool2d(self.conv_layer2(c1_out), kernel_size=2, stride=2))
        c3_out = torch.relu(self.conv_layer3(c2_out))
        # We have to flat the output of convolutions but we keep the first dimension which is the number of batch size
        flat = c3_out.flatten(start_dim=1)
        fc1_out = torch.relu(self.fully_connected1(flat))
        fc2_out = self.fully_connected2(fc1_out)
        return fc2_out
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****


###############################################################################
# TODO:                                                                       #
# 1. Experiment with different nb_hidden values for your CNN                  #
# Take note of the performance changes                                        #
# 2. Create another CNN with one more conv block                              #
# Take note of the performance changes                                        #
###############################################################################
# *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
model = SimpleCnn(128)
train_model(model, train_loader, val_loader, 30)
test(model, test_loader)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

[INFO] EPOCH: 1/30
Train loss: 2.126860, Train accuracy: 0.1965
Val loss: 2.003457, Val accuracy: 0.2270

[INFO] EPOCH: 2/30
Train loss: 1.849889, Train accuracy: 0.3150
Val loss: 1.781437, Val accuracy: 0.3370

[INFO] EPOCH: 3/30
Train loss: 1.702299, Train accuracy: 0.3673
Val loss: 1.672747, Val accuracy: 0.3900

[INFO] EPOCH: 4/30
Train loss: 1.604128, Train accuracy: 0.3960
Val loss: 1.638407, Val accuracy: 0.4080

[INFO] EPOCH: 5/30
Train loss: 1.503809, Train accuracy: 0.4472
Val loss: 1.558676, Val accuracy: 0.4410

[INFO] EPOCH: 6/30
Train loss: 1.428762, Train accuracy: 0.4795
Val loss: 1.558974, Val accuracy: 0.4350

[INFO] EPOCH: 7/30
Train loss: 1.356375, Train accuracy: 0.5032
Val loss: 1.524072, Val accuracy: 0.4630

[INFO] EPOCH: 8/30
Train loss: 1.292637, Train accuracy: 0.5272
Val loss: 1.511995, Val accuracy: 0.4540

[INFO] EPOCH: 9/30
Train loss: 1.234295, Train accuracy: 0.5513
Val loss: 1.512863, Val accuracy: 0.4650

[INFO] EPOCH: 10/30
Train loss: 1.195991, Trai

In [15]:
class SimpleCnn(nn.Module):
    def __init__(self, nb_hidden):
        super().__init__()
        ###############################################################################
        # TODO:                                                                       #
        # 1. create conv1 layer:                                                      #
        #   with 32 output channels, 5x5 kernels, use default for others              #
        # 2. create conv2 layer:                                                      #
        #   with 64 output channels, 5x5 kernels, use default for others              #
        # 3. create linear layer fc1: with nb_hidden output channels                  #
        # 4. create linear layer fc2: with 10 output channels                         #
        ###############################################################################
        # *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5)
        self.fully_connected1 = nn.Linear(in_features=128*2*2, out_features=nb_hidden)
        self.fully_connected2 =  nn.Linear(in_features=nb_hidden, out_features=10)
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    def forward(self, x):
        """
        forward step
        :param x: input tensor
        :return: output tensor
        """
        ###############################################################################
        # TODO:                                                                       #
        # 1. create first convolution block c1_out:                                   #
        #   relu(max_pool(conv1(), kernel_size=3, stride=3)                           #
        # 2. create second convolution block c2_out:                                  #
        #   relu(max_pool(conv2(), kernel_size=2, stride=2)                           #
        # 3. create fully connected block fc1_out: relu(fc1())                        #
        # 4. connect last fully connected layer fc2_out: fc2()                        #
        # 5. return fc2_out                                                           #
        ###############################################################################
        # *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        c1_out = torch.relu(torch.max_pool2d(self.conv_layer1(x), kernel_size=3, stride=3))
        c2_out = torch.relu(torch.max_pool2d(self.conv_layer2(c1_out), kernel_size=2, stride=2))
        c3_out = torch.relu(nn.Conv2d(in_channels=64, out_channels=128, kernel_size=1)(c2_out))
        # We have to flat the output of convolutions but we keep the first dimension which is the number of batch size
        flat=c3_out.flatten(start_dim=1)
        fc1_out = torch.relu(self.fully_connected1(flat))
        drop_out = nn.Dropout(0.5)(fc1_out)
        fc2_out = self.fully_connected2(drop_out)
        return fc2_out
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****


###############################################################################
# TODO:                                                                       #
# 1. Experiment with different nb_hidden values for your CNN                  #
# Take note of the performance changes                                        #
# 2. Create another CNN with one more conv block                              #
# Take note of the performance changes                                        #
###############################################################################
# *****BEGIN YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
model = SimpleCnn(128*2*2)
train_model(model, train_loader, val_loader, 30)
test(model, test_loader)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

[INFO] EPOCH: 1/30
Train loss: 2.294317, Train accuracy: 0.1008
Val loss: 2.241954, Val accuracy: 0.1090

[INFO] EPOCH: 2/30
Train loss: 2.211147, Train accuracy: 0.1232
Val loss: 2.160130, Val accuracy: 0.1710

[INFO] EPOCH: 3/30
Train loss: 2.144796, Train accuracy: 0.1852
Val loss: 2.086205, Val accuracy: 0.2140

[INFO] EPOCH: 4/30
Train loss: 2.079909, Train accuracy: 0.2092
Val loss: 2.024335, Val accuracy: 0.2460

[INFO] EPOCH: 5/30
Train loss: 2.036463, Train accuracy: 0.2100
Val loss: 2.020935, Val accuracy: 0.2160

[INFO] EPOCH: 6/30
Train loss: 2.004089, Train accuracy: 0.2198
Val loss: 1.958112, Val accuracy: 0.2320

[INFO] EPOCH: 7/30
Train loss: 1.986571, Train accuracy: 0.2157
Val loss: 1.945525, Val accuracy: 0.2330

[INFO] EPOCH: 8/30
Train loss: 1.952010, Train accuracy: 0.2367
Val loss: 1.908889, Val accuracy: 0.2410

[INFO] EPOCH: 9/30
Train loss: 1.929418, Train accuracy: 0.2355
Val loss: 1.929426, Val accuracy: 0.2440

[INFO] EPOCH: 10/30
Train loss: 1.904181, Trai

In [16]:
import gc
# clean up, release memory
model.cpu()
del model
gc.collect()
torch.cuda.empty_cache()