# Convolutional Neural Network

In this notebook, we train a CNN to classify images from the CIFAR-10 database.

The images in this database are small color images that fall into one of ten classes; some example images are pictured below.

<img src='../../assets/cifar-10.png' width=50% height=50%/>

## Test for [CUDA](http://pytorch.org/docs/stable/cuda.html)
Since these are relatively large (32x32x3) color images, using a GPU may speed up training considerably. CUDA is a parallel computing platform, and CUDA tensors behave like ordinary tensors, except that their computations run on the GPU.

In [None]:
import numpy as np
import torch

# check if CUDA is available
train_on_gpu = torch.cuda.is_available()

if train_on_gpu:
    print("CUDA is available. Training on GPU.")
else:
    print("CUDA is not available. Training on CPU.")
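The `train_on_gpu` flag is often turned into a `torch.device` so that the same code runs on either the GPU or the CPU. The following is a minimal sketch of that common pattern, not code from this notebook, and the tensor names are illustrative. It shows that a CUDA tensor supports the same operations as a CPU tensor, and that results have to be moved back to the CPU before being converted to NumPy.

```python
import torch

# Pick the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A CPU tensor and a copy of it placed on the chosen device
x_cpu = torch.arange(6, dtype=torch.float32).reshape(2, 3)
x_dev = x_cpu.to(device)

# Operations run on whichever device holds the operands
y_dev = x_dev * 2 + 1

# Move the result back to the CPU (required before calling .numpy())
y_cpu = y_dev.cpu()
print(y_dev.device, y_cpu.device)
print(y_cpu.numpy())
```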

## [Load the Data](http://pytorch.org/docs/stable/torchvision/datasets.html)
Downloading may take a minute. We load in the training and test data, split the training data into a training and validation set, then create DataLoaders for each of these sets of data.

In [None]:
from torchvision import datasets  # torchvision's built-in datasets, including CIFAR-10
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.data import DataLoader

# define dataloader args
num_workers = 2
batch_size = 40
val_set_ratio = 0.2

# define the image transform: convert to a tensor, then normalize each channel to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])

# load the CIFAR-10 training and test sets (downloaded on first use)
train_set = datasets.CIFAR10('data/', train=True,
                             transform=transform, download=True)
test_set = datasets.CIFAR10('data/', train=False,
                            transform=transform, download=True)

# split train - validation set 80/20
n_train = len(train_set)
split_idx = int(np.floor(n_train * val_set_ratio))
indices = list(range(n_train))
np.random.shuffle(indices)
train_idx, val_idx = indices[split_idx:], indices[:split_idx]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
val_sampler = SubsetRandomSampler(val_idx)

# define data loader for train - val - test
train_loader = DataLoader(train_set, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers)
val_loader = DataLoader(train_set, batch_size=batch_size, sampler=val_sampler, num_workers=num_workers)
test_loader = DataLoader(test_set, batch_size=batch_size, num_workers=num_workers)

# specify image classes
classes = ["airplane", "automobile", "bird", "cat",
           "deer", "dog", "frog", "horse", "ship", "truck"]
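The index-based split above can be checked in isolation. This sketch wraps the same shuffle-and-slice logic in a hypothetical helper, `train_val_split`, and uses a seeded `np.random.default_rng` (an addition: the cell above uses an unseeded `np.random.shuffle`) so that the split is reproducible. With CIFAR-10's 50,000 training images and `val_set_ratio = 0.2`, it yields 40,000 training and 10,000 validation indices with no overlap; with `batch_size = 40`, that is 1,000 training and 250 validation batches per epoch.

```python
import math

import numpy as np


def train_val_split(n_samples, val_ratio, seed=0):
    """Shuffle indices 0..n_samples-1 and hold out the first val_ratio share for validation."""
    rng = np.random.default_rng(seed)  # seeded for a reproducible split
    indices = rng.permutation(n_samples)
    split_idx = int(np.floor(n_samples * val_ratio))
    return indices[split_idx:], indices[:split_idx]


# CIFAR-10 has 50,000 training images; hold out 20% for validation
train_idx, val_idx = train_val_split(50_000, 0.2)
print(len(train_idx), len(val_idx))

# number of mini-batches per epoch (the last batch may be partial)
batch_size = 40
print(math.ceil(len(train_idx) / batch_size), "training batches per epoch")
print(math.ceil(len(val_idx) / batch_size), "validation batches per epoch")
```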

## Visualize a Batch of Training Data

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

# helper function to un-normalize and display an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from Tensor image
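The `Normalize` transform and the `imshow` helper are inverses of each other. `ToTensor` scales pixel values to [0, 1], and `Normalize` with a mean and standard deviation of 0.5 computes `(x - 0.5) / 0.5`, mapping them to [-1, 1]; `imshow` undoes this with `x / 2 + 0.5`. The NumPy sketch below, using the hypothetical helper names `normalize` and `unnormalize`, checks the round trip on a few sample values.

```python
import numpy as np


def normalize(x, mean=0.5, std=0.5):
    # what transforms.Normalize computes per channel: (x - mean) / std
    return (x - mean) / std


def unnormalize(x):
    # what imshow computes: x / 2 + 0.5, the inverse when mean = std = 0.5
    return x / 2 + 0.5


pixels = np.array([0.0, 0.25, 0.5, 1.0])  # ToTensor output lies in [0, 1]
normed = normalize(pixels)                # now in [-1, 1]
print(normed)
print(unnormalize(normed))                # back to the original values
```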

In [None]:
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy() # convert images to numpy for display

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
# display 20 images
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20 // 2, idx + 1, xticks=[], yticks=[])
    imshow(images[idx])
    ax.set_title(classes[labels[idx]])

In [None]:
# get image size
rgb_img = np.squeeze(images[0])
n_channel = rgb_img.shape[0]
rimg = rgb_img[0]
cifar_size = rimg.shape[0] # shape: size x size

In [None]:
rgb_img = np.squeeze(images[0])
channels = ['red channel', 'green channel', 'blue channel']

fig = plt.figure(figsize = (36, 36)) 
for idx in np.arange(rgb_img.shape[0]): #rgb
    ax = fig.add_subplot(1, 3, idx + 1)
    img = rgb_img[idx]
    ax.imshow(img, cmap='gray')
    ax.set_title(channels[idx])
    width, height = img.shape
    thresh = img.max()/2.5
    for x in range(width):
        for y in range(height):
            val = round(img[x][y],2) if img[x][y] !=0 else 0
            ax.annotate(str(val), xy=(y,x),
                    horizontalalignment='center',
                    verticalalignment='center', size=8,
                    color='white' if img[x][y]<thresh else 'black')

## Defining the Network Architecture
This time, you'll define a CNN architecture. Instead of an MLP, which uses linear, fully connected layers, you'll use the following:

* Convolutional layers, which can be thought of as a stack of filtered images.
* Max pooling layers, which reduce the x-y size of an input, keeping only the most active pixels from the previous layer.
* The usual linear + dropout layers to avoid overfitting and produce a 10-dimensional output.

A network with two convolutional layers is shown in the image below; the code below also uses two convolutional layers, each followed by a max pooling layer.
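To make the max pooling description concrete, here is a NumPy sketch of 2x2 max pooling with stride 2 on a single-channel feature map, which is what `nn.MaxPool2d(kernel_size=2, stride=2)` computes per channel. It is an illustration, not how PyTorch implements it, and the helper name `max_pool_2x2` is hypothetical. Each non-overlapping 2x2 window is replaced by its largest value, so the 4x4 map below shrinks to the 2x2 map `[[4, 2], [2, 7]]`.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W) array; H and W must be even."""
    h, w = feature_map.shape
    # element [i, a, j, b] is feature_map[2*i + a, 2*j + b],
    # so axes 1 and 3 index positions inside each 2x2 window
    windows = feature_map.reshape(h // 2, 2, w // 2, 2)
    return windows.max(axis=(1, 3))


fmap = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
])
pooled = max_pool_2x2(fmap)
print(pooled)
```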

<img src='../../assets/3-layer-conv.png' height=50% width=50% />

The more convolutional layers you include, the more complex patterns in color and shape a model can detect. It's suggested that your final model include 2 or 3 convolutional layers as well as linear layers + dropout in between to avoid overfitting.

It's good practice to look at existing research and implementations of related models as a starting point for defining your own models. You may find it useful to look at this PyTorch classification example or this more complex Keras example to help decide on a final structure.

### Output volume for a convolutional layer


We can compute the spatial size of the output volume as a function of the input volume size ($W$), the kernel/filter size ($F$), the stride with which the kernel is applied ($S$), and the amount of zero padding used on the border ($P$). The number of neurons along each spatial dimension of the output is
$$W_{out} = \frac{W - F + 2P}{S} + 1$$
For example, a 7x7 input and a 3x3 filter with stride 1 and padding 0 give a 5x5 output; with stride 2, the output is 3x3.
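The formula translates directly into a small helper. This sketch uses integer (floor) division, which matches how PyTorch's `Conv2d` and `MaxPool2d` (with the default `ceil_mode=False`) round when the stride does not divide evenly; the function name `conv_output_size` is hypothetical. The first two calls reproduce the 7x7 examples above; the last two show that a 3x3 kernel with padding 1 preserves a 32x32 input, and that 2x2 pooling with stride 2 halves it.

```python
def conv_output_size(w, f, s=1, p=0):
    """Output size (W - F + 2P) / S + 1, rounded down when S does not divide evenly."""
    return (w - f + 2 * p) // s + 1


print(conv_output_size(7, 3, s=1, p=0))   # 7x7 input, 3x3 filter, stride 1
print(conv_output_size(7, 3, s=2, p=0))   # same, stride 2
print(conv_output_size(32, 3, s=1, p=1))  # "same" padding keeps 32
print(conv_output_size(32, 2, s=2, p=0))  # 2x2 max pooling, stride 2
```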

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, input_size, out_class=10):
        super(SimpleCNN, self).__init__()

        # if input_size is given in (s, s) format, keep only the side length
        if isinstance(input_size, (list, tuple)):
            input_size = input_size[0]

        # 1st convolutional block -> output size: (16 x in_size/2 x in_size/2)
        self.conv2d_1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=(3, 3), padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=2)
        )
        # 2nd convolutional block -> output size: (32 x in_size/4 x in_size/4)
        self.conv2d_2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=(5, 5), padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=2)
        )

        # flattened feature size fed to the fully connected layer
        flatten_size = 32 * (input_size // 4) * (input_size // 4)

        # fully connected layer mapping flattened features to class scores
        self.fc = nn.Linear(flatten_size, out_class)

    def forward(self, X):
        # pass the input through both convolutional blocks
        X = self.conv2d_1(X)
        X = self.conv2d_2(X)
        # flatten to (batch_size, flatten_size)
        X = X.reshape(X.size(0), -1)
        # class scores, converted to log-probabilities
        X = self.fc(X)
        return F.log_softmax(X, dim=1)

model = SimpleCNN(input_size=cifar_size, out_class=10)
print(model)

# move the model parameters to the GPU if one is available
if train_on_gpu:
    model.cuda()
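Before training, it is worth confirming that `flatten_size` matches what the convolutional blocks actually produce. The sketch below traces a 3x32x32 CIFAR-10 image through the layer settings of `SimpleCNN` by hand, using the output-size formula (max pooling follows the same rule with no padding); BatchNorm and ReLU do not change the shape and are skipped. It is an illustrative check, not part of the model, and the helper `out_size` is hypothetical. The result should be 32 x 8 x 8 = 2048, matching the `flatten_size` computed in `__init__` for `input_size = 32`.

```python
def out_size(w, f, s=1, p=0):
    # (W - F + 2P) // S + 1; max pooling follows the same rule
    return (w - f + 2 * p) // s + 1


# (description, kernel F, stride S, padding P, output channels or None if unchanged)
layers = [
    ("conv2d_1: Conv2d 3x3, padding 1", 3, 1, 1, 16),
    ("conv2d_1: MaxPool2d 2x2, stride 2", 2, 2, 0, None),
    ("conv2d_2: Conv2d 5x5, padding 2", 5, 1, 2, 32),
    ("conv2d_2: MaxPool2d 2x2, stride 2", 2, 2, 0, None),
]

spatial, channels = 32, 3  # CIFAR-10 input: 3 x 32 x 32
for name, f, s, p, out_ch in layers:
    spatial = out_size(spatial, f, s, p)
    if out_ch is not None:
        channels = out_ch
    print(f"{name:<36} -> {channels} x {spatial} x {spatial}")

flatten_size = channels * spatial * spatial
print("flatten_size =", flatten_size)
```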


## Specify Loss Function and Optimizer
Decide on a loss and an optimization function that are best suited for this classification task. The code examples linked above, this PyTorch classification example or this more complex Keras example, may be a good starting point. Pay close attention to the learning rate, since it determines how quickly and how stably your model converges to a low error.
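Because `SimpleCNN.forward` ends with `F.log_softmax`, the model outputs log-probabilities, and `nn.NLLLoss` is the matching loss. The sketch below shows one training step with `nn.NLLLoss` and `optim.SGD`; the small stand-in model, the random batch, and the hyperparameters (`lr=0.01`, `momentum=0.9`) are illustrative assumptions, not tuned values. It also checks that NLL loss on `log_softmax` outputs equals `F.cross_entropy` on raw logits, which is why `nn.CrossEntropyLoss` is the usual choice for models that return raw scores instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in for SimpleCNN: any model that ends in log_softmax
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 10),
    nn.LogSoftmax(dim=1),
)

# The model already outputs log-probabilities, so pair it with NLLLoss
criterion = nn.NLLLoss()
# Illustrative hyperparameters, not tuned values
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One training step on a random batch shaped like CIFAR-10 (batch of 40)
images = torch.randn(40, 3, 32, 32)
labels = torch.randint(0, 10, (40,))

optimizer.zero_grad()
log_probs = model(images)
loss = criterion(log_probs, labels)  # loss on this batch, before the update
loss.backward()
optimizer.step()
print(f"batch loss: {loss.item():.4f}")

# NLL loss on log_softmax outputs equals cross-entropy on the raw logits
logits = torch.randn(4, 10)
targets = torch.tensor([0, 3, 9, 1])
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
ce = F.cross_entropy(logits, targets)
print(torch.allclose(nll, ce))
```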
