<img src="img/dsci572_header.png" width="600">

# Lab 3: Convolutional Neural Networks

## Instructions
<hr>

rubric={mechanics:3}

- Follow the [general lab instructions](https://ubc-mds.github.io/resources_pages/general_lab_instructions/)

- Upload a PDF version of your lab notebook to Gradescope, in addition to the .ipynb file.

- Add a link to your GitHub repository here:

## Imports
<hr>

In [None]:
import numpy as np
import pandas as pd
import torch
from torch import nn, optim
from torchvision import datasets, transforms, utils
from torchsummary import summary
import matplotlib.pyplot as plt

plt.rcParams.update({'axes.grid': False})

## Exercise 1: Filters and Convolutions
<hr>

Since I'm a huge fan of cats (small or big), we're going to use an image of a cheetah ([source](https://unsplash.com/photos/hksj-fvUVek)) for this exercise:

In [None]:
image = torch.from_numpy(plt.imread("img/cheetah.png"))[:, :, 0]
plt.imshow(image, cmap='gray')
plt.axis('off');

In [None]:
image.shape

For each of the filters given below, convolve the above image with the given filter/kernel and briefly discuss why the results look the way they do. You'll need to:

1. Create a `Conv2d` layer with PyTorch.

2. Manually change the kernel weights. Weights in a `Conv2d` layer are 4D tensors with shape `[out_channels=1, in_channels=1, kernel_rows, kernel_cols]`. I've given example code defining such a 4D tensor below. Functions that will help you create tensors: `torch.ones()`, `torch.zeros()`, `torch.full()`, etc. (we have much of the same functionality as we did in `NumPy`).
3. Use the provided code to plot the original and convolved images.
4. Explain the result in 1-2 sentences.

I've provided an example below to get you started. You can assume the default `stride` and `padding` in the `Conv2d` layer.

> The pedagogical goal here is to help you better understand what filters/kernels actually are and how they help us identify useful structure (like lines, curves, shapes, etc.) in images.
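With the default `stride=1` and `padding=0`, each output dimension shrinks by the kernel size minus one, so the filtered image is slightly smaller than the original. The helper below is a minimal sketch of that arithmetic; `conv_output_shape` and the 100 x 200 image size are illustrative assumptions, not part of the lab code.

```python
def conv_output_shape(image_shape, kernel_shape, stride=1, padding=0):
    """Output (rows, cols) of a 2D convolution, assuming dilation=1."""
    return tuple(
        (size + 2 * padding - k) // stride + 1
        for size, k in zip(image_shape, kernel_shape)
    )

# Hypothetical 100 x 200 image filtered with a (1, 10) kernel
print(conv_output_shape((100, 200), (1, 10)))  # (100, 191)
```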

In [None]:
def plot_convolution(image, conv_layer):
    """
    Convolve kernel over image and plot.

    Parameters
    ----------
    image : torch.Tensor
        Image to filter with kernel.
    conv_layer : torch.nn.Conv2d
        A PyTorch Conv2d layer to apply to image.
    
    Returns
    -------
    None
        The original and filtered images are plotted side by side.
    """

    conv_image = conv_layer(image[None, None, :]).detach().squeeze()
    fig, (ax1, ax2) = plt.subplots(figsize=(8, 4), ncols=2)
    ax1.imshow(image, cmap='gray'); ax1.axis('off'); ax1.set_title("Original")
    ax2.imshow(conv_image, cmap='gray'); ax2.axis('off'); ax2.set_title("Filtered")
    plt.tight_layout()

**Example:**

The kernel is a row vector of ten 0.1's, shape `(1, 10)`:

$$\text{kernel} = \begin{bmatrix} 0.1 & 0.1 & 0.1 & 0.1 & 0.1 & 0.1 & 0.1 & 0.1 & 0.1 & 0.1 \end{bmatrix}$$

In [None]:
conv_layer = torch.nn.Conv2d(1, 1, kernel_size=(1, 10))
conv_layer.weight.detach_()
kernel = torch.full((1, 1, 1, 10), 0.1)
conv_layer.weight[:] = kernel
plot_convolution(image, conv_layer)

**Example answer:**

The filter is a horizontal bar of 0.1s. Therefore I would expect a blurring in the horizontal direction, meaning the _vertical_ edges get blurred (because these are the ones that change rapidly in the horizontal direction). This seems to be happening in the result. 
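To see the blurring in isolation, here is a minimal pure-Python sketch that slides the same ten-element kernel of 0.1s along a single made-up row of pixels containing one sharp edge. The `row` values and the `filter_row` helper are illustrative assumptions, and the bias term of the `Conv2d` layer is ignored.

```python
def filter_row(row, kernel):
    """Slide kernel along row (stride 1, no padding) and return the weighted sums."""
    n = len(row) - len(kernel) + 1
    return [sum(w * row[i + j] for j, w in enumerate(kernel)) for i in range(n)]

kernel = [0.1] * 10
row = [0.0] * 12 + [1.0] * 12  # made-up row: dark pixels, then a sharp jump to bright

blurred = filter_row(row, kernel)
print([round(v, 1) for v in blurred])
```

The single-pixel jump turns into a gradual ramp spread over ten pixels, which is the horizontal blurring described in the example answer.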

### 1.1
rubric={accuracy:2}

The kernel is a column vector of ten 0.1's, shape (10, 1):

$$\text{kernel} = \begin{bmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \end{bmatrix}$$

In [None]:
...

### 1.2
rubric={accuracy:2}

The kernel is a matrix of 0's but with a 1 in the centre, shape (5, 5):

$$\text{kernel} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ \end{bmatrix}$$

In [None]:
...

### 1.3
rubric={accuracy:2}

The kernel is a matrix of 0.01's, shape (10, 10):

$$\text{kernel} = \begin{bmatrix} 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & \\ 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & \\ 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & \\ 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & \\ 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & \\ 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & \\ 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & \\ 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & \\ 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & \\ 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 \end{bmatrix}$$

In [None]:
...

### 1.4
rubric={accuracy:2}

The kernel is a matrix of -0.125's, shape (3, 3):

$$\text{kernel} = \begin{bmatrix} -0.125 & -0.125 & -0.125 \\ -0.125 & -0.125 & -0.125 \\ -0.125 & -0.125 & -0.125 \end{bmatrix}$$

In [None]:
...

## Exercise 2: CNNs and Image Permutations
<hr>

Below is some code that trains a CNN on the classic [MNIST digits dataset](https://en.wikipedia.org/wiki/MNIST_database). This dataset contains 28 x 28 pixel images of handwritten digits. Run through the code. It may take a few minutes, since our training dataset has 60,000 samples and our validation dataset has 10,000.

In [None]:
BATCH_SIZE = 256

# Download data
transform = transforms.Compose([transforms.ToTensor()])
trainset = datasets.MNIST('data/', download=True, train=True, transform=transform)
validset = datasets.MNIST('data/', download=True, train=False, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)
validloader = torch.utils.data.DataLoader(validset, batch_size=BATCH_SIZE, shuffle=True)

# Sample plot
X, y = next(iter(trainloader))
plt.imshow(X[0, 0, :, :], cmap="gray")
plt.title(f"Number: {y[0].item()}");

In [None]:
class MNIST_classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(1, 16, (5, 5)),
            nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Dropout(0.2),
            nn.Flatten(),
            nn.Linear(2304, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )
        
    def forward(self, x):
        out = self.main(x)
        return out
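The `2304` input size of the first `nn.Linear` layer follows from the shapes produced by the layers before it. Here is a minimal sketch of that arithmetic, assuming the default stride and padding for `nn.Conv2d` and the default `nn.MaxPool2d` stride (equal to the pool size). The variable names are illustrative.

```python
image_size = 28    # MNIST images are 28 x 28
kernel_size = 5    # nn.Conv2d(1, 16, (5, 5)), stride 1, no padding
pool_size = 2      # nn.MaxPool2d((2, 2))
out_channels = 16  # number of filters in the conv layer

after_conv = image_size - kernel_size + 1   # 28 - 5 + 1
after_pool = after_conv // pool_size        # halved by 2 x 2 pooling
flattened = out_channels * after_pool ** 2  # channels * rows * cols

print(after_conv, after_pool, flattened)  # 24 12 2304
```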

In [None]:
def trainer(model, criterion, optimizer, trainloader, validloader, epochs=5, verbose=True):
    """Simple training wrapper for PyTorch network."""
    
    train_loss, valid_loss, valid_accuracy = [], [], []
    for epoch in range(epochs):  # for each epoch
        train_batch_loss = 0
        valid_batch_loss = 0
        valid_batch_acc = 0
        
        # Training
        model.train()  # enable dropout while training
        for X, y in trainloader:
            optimizer.zero_grad()       # Zero all the gradients w.r.t. parameters
            y_hat = model(X)            # Forward pass to get output
            loss = criterion(y_hat, y)  # Calculate loss based on output
            loss.backward()             # Calculate gradients w.r.t. parameters
            optimizer.step()            # Update parameters
            train_batch_loss += loss.item()  # Add loss for this batch to running total
        train_loss.append(train_batch_loss / len(trainloader))
        
        # Validation
        model.eval()  # disable dropout so validation predictions are deterministic
        with torch.no_grad():  # this stops pytorch doing computational graph stuff under-the-hood and saves memory and time
            for X, y in validloader:
                y_hat = model(X)
                _, y_hat_labels = torch.softmax(y_hat, dim=1).topk(1, dim=1)
                loss = criterion(y_hat, y)
                valid_batch_loss += loss.item()
                valid_batch_acc += (y_hat_labels.squeeze() == y).type(torch.float32).mean().item()
        valid_loss.append(valid_batch_loss / len(validloader))
        valid_accuracy.append(valid_batch_acc / len(validloader))  # accuracy
        
        # Print progress
        if verbose:
            print(f"Epoch {epoch + 1}:",
                  f"Train Loss: {train_loss[-1]:.3f}.",
                  f"Valid Loss: {valid_loss[-1]:.3f}.",
                  f"Valid Accuracy: {valid_accuracy[-1]:.2f}.")
    
    results = {"train_loss": train_loss,
               "valid_loss": valid_loss,
               "valid_accuracy": valid_accuracy}
    return results

In [None]:
model = MNIST_classifier()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())
results = trainer(model, criterion, optimizer, trainloader, validloader)

You probably got pretty good accuracy there. Now, answer the questions below.

### 2.1
rubric={accuracy:1}

How many parameters does the model have (you don't need to compute that by hand)?

In [None]:
...

### 2.2
rubric={reasoning:1}

In Lecture 6 I talk about how, when doing image classification with fully connected neural networks, the order of the pixels in the flattened image we feed into the network doesn't matter. In contrast, CNNs try to use the structure in the data to make predictions. Let's do an experiment and vertically flip all our MNIST training images like this:

In [None]:
transform = transforms.Compose([transforms.RandomVerticalFlip(p=1), transforms.ToTensor()])
trainset = datasets.MNIST('data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)

# Sample plot
X, y = next(iter(trainloader))
plt.imshow(X[0, 0, :, :], cmap="gray")
plt.title(f"Number: {y[0].item()}");
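`RandomVerticalFlip(p=1)` flips every image top to bottom, which amounts to reversing the order of its rows. Here is a minimal sketch on a made-up 3 x 3 "image" stored as nested lists (an illustration only, not the torchvision implementation):

```python
tiny_image = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]  # made-up pattern shaped like a "T"

flipped = tiny_image[::-1]  # reverse the row order: the top row becomes the bottom row

for row in flipped:
    print(row)
```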

We've only flipped the training images, not the validation images. What do you think would happen to the validation scores if we train our network on these new flipped images?

In [None]:
...

### 2.3
rubric={reasoning:2}

Re-train your network using the new `trainloader` of flipped images we defined above to test your answer to 2.2. Are the results what you expected? Can you justify the accuracy of your model?

In [None]:
model = MNIST_classifier()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())
results = trainer(model, criterion, optimizer, trainloader, validloader, epochs=5)

In [None]:
...

## Exercise 3: Bitmoji CNN
<hr>

rubric={accuracy:15}

In the course repository is a folder (`lectures/data/bitmoji_rgb`) containing the Bitmoji dataset we saw in Lecture 5. Make sure you clone that repo and then copy the folder `bitmoji_rgb` to the `data` directory of this lab.

We have 1228 images of both "`arman`" and "`not_arman`" for training (1714 images total), and 300 images of both "`arman`" and "`not_arman`" for validation (600 images total). We will resize the images to be 64 x 64 pixels.

Your task here is to create and train a CNN on this data. You can create any architecture you wish, but you need to show that you have at least 90% accuracy on the validation dataset after training your CNN.

I have prepared the training and validation loaders for you below:

In [None]:
TRAIN_DIR = "data/bitmoji_rgb/train/"
VALID_DIR = "data/bitmoji_rgb/valid/"

IMAGE_SIZE = (64, 64)
BATCH_SIZE = 64

# Load data and create dataloaders
data_transforms = transforms.Compose([transforms.Resize(IMAGE_SIZE), transforms.ToTensor()])

train_dataset = datasets.ImageFolder(root=TRAIN_DIR, transform=data_transforms)
trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)

valid_dataset = datasets.ImageFolder(root=VALID_DIR, transform=data_transforms)
validloader = torch.utils.data.DataLoader(valid_dataset, batch_size=BATCH_SIZE, shuffle=True)

print(train_dataset.class_to_idx)  # See which labels are assigned to each class

# Plot samples (don't worry too much about this code)
sample_batch = next(iter(trainloader))
plt.figure(figsize=(10, 8)); plt.axis("off"); plt.title("Sample Training Images")
plt.imshow(np.transpose(utils.make_grid(sample_batch[0], padding=1, normalize=True), (1,2,0)))
print(sample_batch[1])

> Note that `datasets.ImageFolder()` assigns integer labels to your classes according to the alphabetic order of the data folders. For example, in this case label `0` is assigned to class `arman`, whereas label `1` is assigned to `not_arman`.
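The mapping in the note can be reproduced by sorting the folder names and numbering them in order. This is a minimal sketch; `folder_names` is a made-up stand-in for the sub-folders of `TRAIN_DIR`.

```python
folder_names = ["not_arman", "arman"]  # stand-in for the sub-folders of TRAIN_DIR

# Sort alphabetically, then number the classes 0, 1, ...
class_to_idx = {name: idx for idx, name in enumerate(sorted(folder_names))}
print(class_to_idx)  # {'arman': 0, 'not_arman': 1}
```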

In [None]:
...

Here's some code you can use to plot your model's predictions for a random image from the validation set. Run the following cell as many times as you like to see how well your CNN model works for different inputs:

In [None]:
model.eval()
model.to('cpu')
with torch.no_grad():
    img = next(iter(validloader))[0][0]  # first image of a shuffled validation batch
    y_prob = torch.sigmoid(model(img[None, :, :, :]))
    y_class = int(y_prob > 0.5)
    print(f"Probability of being Arman: {1 - y_prob.item():g}")
    
    plt.imshow(img.permute(1, 2, 0))
    plt.axis("off")
    plt.title(f"{['Arman', 'Not Arman'][y_class]}", pad=10);
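The cell above assumes your model outputs a single logit per image, which `torch.sigmoid` turns into the probability of label `1` (`not_arman`). The probability of `arman` is then one minus that value. Here is a minimal sketch of the arithmetic with a made-up logit, using only the standard library:

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued logit to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

logit = -2.0                    # made-up model output for one image
p_not_arman = sigmoid(logit)    # probability of label 1
p_arman = 1 - p_not_arman       # probability of label 0
label = int(p_not_arman > 0.5)  # 0 -> "Arman", 1 -> "Not Arman"

print(f"P(arman) = {p_arman:.3f}, predicted: {['Arman', 'Not Arman'][label]}")
```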

## (CHALLENGING) Exercise 4: Neural Networks From Scratch
<hr>

rubric={accuracy:1% of total grade}

From scratch using only `NumPy` and `SciPy`, implement a one-hidden-layer neural network for classification using ReLUs and [scikit-learn's digit dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html).

You're pretty much on your own for this question. It could be a real test of how good your coding skills are and your knowledge of how neural networks actually work. You'll need to code up the gradients and implement backpropagation yourself. **This is a lot of work for 1% of the grade; use your time wisely**. 

> Note: there are probably a lot of resources out there where people give their "raw" neural network implementations. If you're going to do this and you consult any sources, make sure you cite them. 

In [None]:
...