# PyTorch Assignment: Convolutional Neural Network (CNN)

**[Duke Community Standard](http://integrity.duke.edu/standard.html): By typing your name below, you are certifying that you have adhered to the Duke Community Standard in completing this assignment.**

Name: Fabian Alejandro Toledo

### Convolutional Neural Network

Adapt the CNN example for MNIST digit classification from Notebook 3A.
Feel free to play around with the model architecture and see how the training time/performance changes, but to begin, try the following:

Image ->  
convolution (32 3x3 filters) -> nonlinearity (ReLU) ->  
convolution (32 3x3 filters) -> nonlinearity (ReLU) -> (2x2 max pool) ->  
convolution (64 3x3 filters) -> nonlinearity (ReLU) ->  
convolution (64 3x3 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> flatten ->
fully connected (256 hidden units) -> nonlinearity (ReLU) ->  
fully connected (10 hidden units) -> softmax 
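The `64*7*7` input size of the first fully connected layer comes from tracking the feature-map shape through this stack. The sketch below does that in plain Python. It assumes padding=1 on every 3x3 convolution (as in `MNIST_CNN2` below) and 2x2 max pooling with stride 2. The helper names are illustrative.

```python
# Track the (channels, height, width) shape of one 28x28 MNIST image through
# the architecture above, assuming padding=1 on each 3x3 convolution and
# non-overlapping 2x2 max pooling.

def conv_out(size, kernel=3, padding=1, stride=1):
    """Output height/width of a convolution (PyTorch's Conv2d formula with dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Output height/width of max pooling whose stride equals its kernel size."""
    return (size - kernel) // kernel + 1


def trace_shapes(height=28, width=28):
    channels = 1
    shapes = [("input", (channels, height, width))]
    layers = [
        ("conv 32", 32), ("conv 32", 32), ("pool", None),
        ("conv 64", 64), ("conv 64", 64), ("pool", None),
    ]
    for name, out_channels in layers:
        if out_channels is None:
            # pooling halves height and width, channels unchanged
            height, width = pool_out(height), pool_out(width)
        else:
            # padding=1 keeps height and width, channels change
            channels = out_channels
            height, width = conv_out(height), conv_out(width)
        shapes.append((name, (channels, height, width)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:8s} -> {shape}")

c, h, w = trace_shapes()[-1][1]
print("flattened features:", c * h * w)  # 64 * 7 * 7 = 3136
```

With padding=1 the convolutions leave the 28x28 size unchanged, so only the two pooling steps shrink it: 28 to 14, then 14 to 7.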

Note: The CNN model might take a while to train. Depending on your machine, you might expect this to take up to half an hour. If you see your validation performance start to plateau, you can kill the training.

In [1]:
### YOUR CODE HERE ###
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from tqdm.notebook import tqdm, trange

class MNIST_CNN2(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(64*7*7, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        # conv layer 1
        x = self.conv1(x)
        x = F.relu(x)

        # conv layer 2
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2)

        # conv layer 3
        x = self.conv3(x)
        x = F.relu(x)

        # conv layer 4
        x = self.conv4(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2)
        
        # fc layer 1
        x = x.view(-1, 64*7*7)
        x = self.fc1(x)
        x = F.relu(x)
        
        # fc layer 2
        x = self.fc2(x)
        return x       
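The architecture above ends in a softmax, but `forward` returns the raw `fc2` scores (logits). This works because PyTorch's `nn.CrossEntropyLoss` applies log-softmax internally and is equivalent to `LogSoftmax` followed by `NLLLoss`. Softmax also preserves the ordering of the scores, so `torch.argmax` on the logits picks the same class as it would on the probabilities.

The pure-Python sketch below reproduces that math for a single example. It illustrates the formula only and is not PyTorch's implementation. The logit values are arbitrary.

```python
import math


def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(logits, target):
    """Loss for one example: negative log of the softmax probability of the target class."""
    return -log_softmax(logits)[target]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


logits = [2.0, -1.0, 0.5]  # arbitrary scores for 3 classes
probs = [math.exp(v) for v in log_softmax(logits)]

print(sum(probs))                       # ~1.0
print(cross_entropy(logits, 0))         # loss if the true class is 0
print(argmax(logits) == argmax(probs))  # True
```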


In [2]:

# Load the data
mnist_train = datasets.MNIST(root="./datasets", train=True, transform=transforms.ToTensor(), download=True)
mnist_test = datasets.MNIST(root="./datasets", train=False, transform=transforms.ToTensor(), download=True)
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size=100, shuffle=False)

## Training
# Instantiate model  
model = MNIST_CNN2()

epochs = 3

# Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Iterate through train set minibatches
for epoch in trange(epochs):
    for images, labels in tqdm(train_loader):
        # Zero out the gradients
        optimizer.zero_grad()

        # Forward pass
        x = images
        y = model(x)
        loss = criterion(y, labels)
        # Backward pass
        loss.backward()
        optimizer.step()
        
        print(f'Epoch {epoch} Training Loss: {loss.item()}', end='\r')
    print()
    
    ## Testing
    correct = 0
    total = len(mnist_test)

    with torch.no_grad():
        # Iterate through test set minibatches
        for images, labels in tqdm(test_loader):
            # Forward pass
            x = images
            y = model(x)

            predictions = torch.argmax(y, dim=1)
            correct += torch.sum((predictions == labels).float())

    print(f'Epoch {epoch} Test accuracy: {(correct/total).item()}')
    print()

Epoch 0 Training Loss: 0.056323856115341187
Epoch 0 Test accuracy: 0.9872000217437744

Epoch 1 Training Loss: 0.0437178723514080057
Epoch 1 Test accuracy: 0.987500011920929

Epoch 2 Training Loss: 0.0248646251857280734
Epoch 2 Test accuracy: 0.9909999966621399


### Short answer

1\. How does the CNN compare in accuracy with yesterday's logistic regression and MLP models? How about training time?

Test Accuracy:

Logistic regression: approx. 90%

MLP: approx. 92%

CNN: approx. 98.7% after 1 epoch and 99.1% after 3 epochs


Training time:

Logistic regression: approx. 5 s per epoch

MLP: approx. 9 s per epoch (about 2 times the logistic regression time)

CNN: approx. 130 s per epoch (about 26 times the logistic regression time)
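These per-epoch times are wall-clock measurements, so they depend on the hardware (for example, CPU vs GPU). The sketch below shows one way to collect and compare such numbers with `time.perf_counter`. It assumes a hypothetical `train_one_epoch` callable that runs one pass over `train_loader`. `dummy_epoch` is only a stand-in workload.

```python
import time


def time_epochs(train_one_epoch, epochs):
    """Return the wall-clock seconds taken by each call to train_one_epoch()."""
    durations = []
    for _ in range(epochs):
        start = time.perf_counter()
        train_one_epoch()
        durations.append(time.perf_counter() - start)
    return durations


def relative_to(baseline_seconds, seconds):
    """How many times slower `seconds` is than the baseline."""
    return seconds / baseline_seconds


# Stand-in workload (hypothetical); replace with the real loop over train_loader.
def dummy_epoch():
    sum(i * i for i in range(100_000))


print(time_epochs(dummy_epoch, 3))
print(relative_to(5, 130))  # 26.0 (CNN vs logistic regression)
print(relative_to(5, 9))    # 1.8 (MLP vs logistic regression)
```

The MLP ratio is 1.8, so "about 2 times" is a rounded figure.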


2\. How many trainable parameters are there in the CNN you built for this assignment?

*Note: The total of trainable parameters counts each element in a tensor. For example, a weight matrix that is 10x5 has 50 trainable parameters.*

CONV1: ( 1 * 32 * 3 * 3 + 32) = 320

CONV2: (32 * 32 * 3 * 3 + 32) = 9248

CONV3: (32 * 64 * 3 * 3 + 64) = 18496

CONV4: (64 * 64 * 3 * 3 + 64) = 36928

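FC1:   (64 * 7 * 7 * 256 + 256) = 803072 (the two 2x2 max pools shrink the 28x28 input to 7x7, while padding=1 keeps each convolution's output the same size as its input)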

FC2:   (256 * 10 + 10) = 2570

<b>Total: 870634 parameters</b>
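The per-layer counts above follow two rules:

- A `Conv2d` layer has `in_channels * out_channels * k * k` weights plus `out_channels` biases.
- A `Linear` layer has `in_features * out_features` weights plus `out_features` biases.

The pure-Python check below verifies the arithmetic. The helper names are illustrative.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    # one k x k filter per (input channel, output channel) pair, plus one bias per output channel
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    # weight matrix plus one bias per output unit
    return in_features * out_features + out_features


layers = {
    "conv1": conv2d_params(1, 32, 3),
    "conv2": conv2d_params(32, 32, 3),
    "conv3": conv2d_params(32, 64, 3),
    "conv4": conv2d_params(64, 64, 3),
    "fc1": linear_params(64 * 7 * 7, 256),
    "fc2": linear_params(256, 10),
}

for name, count in layers.items():
    print(f"{name}: {count}")
print("total:", sum(layers.values()))  # 870634
```

Once the model is instantiated, PyTorch can report the same total directly with `sum(p.numel() for p in model.parameters() if p.requires_grad)`.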


3\. When would you use a CNN versus a logistic regression model or an MLP?

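I would use a CNN when the inputs are images, or more generally when the data has image-like structure: values are correlated with their spatial neighbors, and the same pattern can appear at any position. For data without that structure (e.g. tabular features), or when training time matters, logistic regression or an MLP is simpler and, as the timings above show, much faster to train.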
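Much of a CNN's advantage on image-like data comes from weight sharing: one small filter is slid over every position, so a pattern is detected wherever it appears. The pure-Python sketch below illustrates this with a "valid" 2D cross-correlation, which is the operation PyTorch's `Conv2d` computes. The filter values and the toy image are hypothetical, chosen only to respond to a thin vertical bar.

```python
def cross_correlate2d(image, kernel):
    """'Valid' 2D cross-correlation (stride 1, no padding) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def bar_image(height, width, col):
    """All-zero image with a one-pixel-wide vertical bar of ones at column `col`."""
    return [[1 if j == col else 0 for j in range(width)] for _ in range(height)]


# Hand-picked 3x3 filter that responds to thin vertical bars (illustrative values).
kernel = [[-1, 2, -1] for _ in range(3)]

out = cross_correlate2d(bar_image(5, 9, col=2), kernel)
shifted_out = cross_correlate2d(bar_image(5, 9, col=4), kernel)

for row, shifted_row in zip(out, shifted_out):
    print(row, shifted_row)

# Moving the bar two columns right moves the response two columns right:
# the same 9 weights detect the pattern at every position.
print(all(s == [0, 0] + r[:-2] for s, r in zip(shifted_out, out)))  # True

# For comparison: a fully connected layer producing the same 32x28x28 output
# from a 28x28 image would need this many parameters, versus 320 for conv1.
dense_params = (28 * 28) * (32 * 28 * 28) + 32 * 28 * 28
print(dense_params)
```

A logistic regression or MLP treats each pixel as an unrelated input feature, so it must learn a separate weight for every position. This is one reason it both trains faster and generalizes worse on images.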
