# 2. Design CNN Architecture for Classification

In this notebook, we will design a Convolutional Neural Network (CNN) architecture to classify the [Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist) images that we prepared in the previous notebook.

### Outline of this notebook
1. Load the data (already covered in the previous notebook)
2. Design the CNN architecture
    - 2.1: Feedforward
    - 2.2: Loss function and optimiser

---

### 1. Load the data

Refer to the previous notebook for a detailed walkthrough of the following code.

In [4]:
import numpy as np
import torch
import torchvision
from torchvision import datasets
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler

train_set = datasets.FashionMNIST(root = 'FashionMNIST_data', train = True, download = True)
test_set = datasets.FashionMNIST(root = 'FashionMNIST_data', train = False, download = True)

validation = 0.2

training_size = len(train_set)
indices = list(range(training_size))
np.random.shuffle(indices)
split = int(np.floor(validation * training_size))
# Hold out the first `split` shuffled indices (20%) for validation; train on the rest
validation_index = indices[:split]
train_index = indices[split:]
train_sampler = SubsetRandomSampler(train_index)
validation_sampler = SubsetRandomSampler(validation_index)

train_transform = transforms.Compose([transforms.RandomHorizontalFlip(),
                                      transforms.ToTensor(),
                                      transforms.Normalize((0.5, ), (0.5, ))])

test_transform = transforms.Compose([transforms.ToTensor(),
                                     transforms.Normalize((0.5, ), (0.5, ))])

train_set = datasets.FashionMNIST(root='./FashionMNIST_data', train=True, download=False, transform=train_transform)
test_set = datasets.FashionMNIST(root='./FashionMNIST_data', train=False, download=False, transform=test_transform)

batch_size = 64

train_loader = torch.utils.data.DataLoader(train_set,
                                           batch_size = batch_size,
                                           sampler = train_sampler, 
                                           shuffle = False)

validation_loader = torch.utils.data.DataLoader(train_set, 
                                                batch_size = batch_size, 
                                                sampler = validation_sampler, 
                                                shuffle = False)

test_loader = torch.utils.data.DataLoader(test_set, 
                                          batch_size = batch_size, 
                                          shuffle = True)

classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
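
As a quick sanity check on the split sizes, here is a minimal sketch in plain Python. It assumes the 60,000 images of the Fashion-MNIST training set and the 20% validation fraction used above. The helper `split_indices` and its `seed` argument are hypothetical names introduced only for this illustration.

```python
import random

def split_indices(n_samples, validation_fraction, seed=0):
    """Shuffle range(n_samples) and split it into (train, validation) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    split = int(n_samples * validation_fraction)
    # The first `split` shuffled indices are held out; the rest are used for training
    return indices[split:], indices[:split]

train_idx, val_idx = split_indices(60000, 0.2)
print(len(train_idx), len(val_idx))  # 48000 12000
```

With a 20% validation fraction we would expect 48,000 training images and 12,000 validation images, with no index appearing in both lists.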

---

### 2. Design the CNN architecture

A typical CNN is built from the following types of layers:
- _Convolutional layers_: Apply a convolution operation to the input using a set of learned filters.
- _Maxpooling layers_: Downsample each feature map by keeping only the maximum value within each small window (e.g. 2x2).
- _Fully-connected layers_: Connect every neuron in one layer to every neuron in the next layer.
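
To make the pooling step concrete, here is a minimal plain-Python sketch of 2x2 max pooling with a stride of 2 on a single small feature map. The function `max_pool_2x2` is a hypothetical helper written for illustration; it is not how PyTorch implements pooling.

```python
def max_pool_2x2(feature_map):
    """Apply 2x2 max pooling with stride 2 to a 2D list with even height and width."""
    pooled = []
    for i in range(0, len(feature_map), 2):
        row = []
        for j in range(0, len(feature_map[0]), 2):
            # Collect the four values in the current 2x2 window
            window = [feature_map[i][j], feature_map[i][j + 1],
                      feature_map[i + 1][j], feature_map[i + 1][j + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 1, 6]]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [3, 7]]
```

The 4x4 input becomes a 2x2 output, so each pooling layer halves the width and height of its input.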

We will use `nn.Conv2d` to create the convolutional layers. From the [documentation](https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html), the arguments that we are interested in are `nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding)`, where:
- `in_channels` is the depth (number of channels) of the input to this layer.
- `out_channels` is the depth of the output from this layer, which equals the number of filters.
- `kernel_size` is the width (and height) of the square filter.
- `padding` is the width of the zero-padding added to each side of the image, measured in pixels.
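
These arguments determine the spatial size of each layer's output through the standard formula floor((W - K + 2P) / S) + 1, where W is the input width, K the kernel size, P the padding and S the stride. The sketch below traces a 28x28 image through the settings used in section 2.1 (3x3 kernels with a padding of 1, followed by 2x2 pooling with a stride of 2). `conv_output_size` is a hypothetical helper for this illustration, not a PyTorch function.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Spatial size after a convolution or pooling window: floor((W - K + 2P) / S) + 1."""
    return (size - kernel_size + 2 * padding) // stride + 1

size = 28                                                  # input image is 28x28
after_conv1 = conv_output_size(size, kernel_size=3, padding=1)        # padding keeps 28
after_pool1 = conv_output_size(after_conv1, kernel_size=2, stride=2)  # pooling halves to 14
after_conv2 = conv_output_size(after_pool1, kernel_size=3, padding=1) # padding keeps 14
after_pool2 = conv_output_size(after_conv2, kernel_size=2, stride=2)  # pooling halves to 7

flattened = after_pool2 * after_pool2 * 20  # 20 output channels from the second convolution
print(after_conv1, after_pool1, after_conv2, after_pool2, flattened)  # 28 14 14 7 980
```

This is where the `7 * 7 * 20` input size of the first fully-connected layer comes from.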

#### 2.1: Feedforward

In [5]:
import torch.nn as nn
import torch.nn.functional as F

num_filters = 10
kernel_size = 3
padding = 1

class CNN(nn.Module):
    
    def __init__(self):
        super(CNN, self).__init__()
        # Convolutional layer 1 (sees 28x28x1 image tensor)
        self.conv1 = nn.Conv2d(1, num_filters, kernel_size, padding = padding)
        # Convolutional layer 2 (sees a 14x14x10 tensor)
        self.conv2 = nn.Conv2d(num_filters, 20, kernel_size, padding = padding)
        # Maxpooling layer of size 2x2
        self.pool = nn.MaxPool2d(2, 2)
        # Fully-connected linear layer 1 (sees a 7x7x20 tensor -> 300)
        self.fc1 = nn.Linear(7 * 7 * 20, 300)
        # Fully-connected linear layer 2 (300 -> 10)
        self.fc2 = nn.Linear(300, 10)
        # Dropout layer (p=0.2)
        self.dropout = nn.Dropout(0.2)
        
    def forward(self, x):
        # Design sequence of convolutional and pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # Flatten image input into a row vector to feed into the fully-connected layers
        x = x.view(-1, 7 * 7 * 20)
        # Add dropout layer
        x = self.dropout(x)
        # Add fully-connected layer 1
        x = F.relu(self.fc1(x))
        # Add dropout layer
        x = self.dropout(x)
        # Add fully-connected layer 2 (no activation: nn.CrossEntropyLoss expects raw logits)
        x = self.fc2(x)
        return x

In [6]:
# Instantiate our CNN
model = CNN()
print(model)

CNN(
  (conv1): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=980, out_features=300, bias=True)
  (fc2): Linear(in_features=300, out_features=10, bias=True)
  (dropout): Dropout(p=0.2, inplace=False)
)
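
To get a feel for the size of this model, we can count its learnable parameters by hand. The sketch below takes the layer sizes from the class definition (the second convolution receives `num_filters = 10` input channels) and assumes every layer has a bias term, which is the PyTorch default. The helper names are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights (one k x k filter per input/output channel pair) plus one bias per filter."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """One weight per input/output pair plus one bias per output."""
    return in_features * out_features + out_features

counts = {
    'conv1': conv_params(1, 10, 3),
    'conv2': conv_params(10, 20, 3),
    'fc1': linear_params(7 * 7 * 20, 300),
    'fc2': linear_params(300, 10),
}
total = sum(counts.values())
print(counts)  # {'conv1': 100, 'conv2': 1820, 'fc1': 294300, 'fc2': 3010}
print(total)   # 299230
```

Almost all of the parameters sit in the first fully-connected layer. The pooling and dropout layers have no learnable parameters.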


#### 2.2: Loss Function and Optimiser

In [7]:
# Specify loss function
criterion = nn.CrossEntropyLoss()

# Specify optimiser
optimiser = torch.optim.SGD(model.parameters(), lr=0.03)
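
For intuition, here is a plain-Python sketch of what these two objects compute for a single example. `nn.CrossEntropyLoss` applies a log-softmax to the raw network outputs (logits) and returns the negative log-probability of the correct class, averaged over the batch by default. Plain SGD then moves each parameter a small step against its gradient. The functions `cross_entropy` and `sgd_step` are hypothetical helpers for illustration, and the gradient values are made up.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]

def sgd_step(weights, grads, lr=0.03):
    """One plain SGD update: w <- w - lr * grad."""
    return [w - lr * g for w, g in zip(weights, grads)]

# An untrained model that scores all 10 classes equally has a loss of ln(10)
loss = cross_entropy([0.0] * 10, target=3)
print(round(loss, 4))  # 2.3026

# A confident, correct prediction gives a much smaller loss
confident = cross_entropy([0.0] * 9 + [10.0], target=9)

# Illustrative (made-up) gradients for two weights
new_weights = sgd_step([1.0, -2.0], [0.5, -1.0])
```

A loss close to 2.3 at the start of training is therefore what we would expect from a 10-class classifier that has not learned anything yet.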