#Convolutional Neural Network (CNN)

The CNN we are going to implement is deliberately basic, with a single convolutional layer. Its layers are as follows.

* Input image of size 28 x 28
* Convolutional layer with 32 filters of size 5x5, stride = 1, padding = 0, and ReLU activation
* Pooling layer taking the max over 2x2 patches
* Dropout layer with probability = 0.2
* Fully connected layer of 128 neurons with ReLU activation
* Output layer of 10 neurons
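
The layer list above fixes the size of the tensor that reaches the first fully connected layer. As a rough sketch, the standard output-size formula for convolution and pooling windows can be applied step by step; the helper name `conv_output_size` is made up for this illustration, and the pooling stride of 2 is assumed to equal the 2x2 window size.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after sliding a convolution or pooling window over one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# 28x28 input, 5x5 convolution with stride 1 and no padding
after_conv = conv_output_size(28, kernel=5, stride=1, padding=0)

# 2x2 max pooling (stride assumed equal to the window size)
after_pool = conv_output_size(after_conv, kernel=2, stride=2)

# 32 feature maps, each after_pool x after_pool, flattened into one vector
flat_features = 32 * after_pool * after_pool

print(after_conv, after_pool, flat_features)
```

The last value is the number of input features the first fully connected layer must accept.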


#Import what is needed

In [None]:
import torch
import torchvision.transforms as transform
import torch.nn.functional as F
from torch import nn
from torch import optim
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import datasets

seed = 7
torch.manual_seed(seed)
torch.backends.cudnn.deterministic=True



#Download data, transform and set DataLoaders

In [None]:
transformCustom = transform.Compose([
      transform.ToTensor()
    ]) #Q: why don't we need to flatten the images into a vector here, unlike with the MLP?

train = MNIST(root=".",train=True, download=True, transform=transformCustom)
test = MNIST(root=".", train=False, download=True, transform=transformCustom)

train_loader = DataLoader(train, batch_size=128, shuffle=True)
test_loader = DataLoader(test, batch_size=128, shuffle=False) #shuffling is not needed for evaluation

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz
Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST/raw



#Define the CNN model 

* What is the difference between conv1d, conv2d, conv3d?
* Why use Sequential?
* Try adding dropout, different activation functions, and different pooling methods. What are the effects?
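
As a starting point for the first question, here is a small sketch of the input shapes the three convolution modules expect. The tensor names and sizes are made up for illustration; only the `nn.Conv1d`, `nn.Conv2d` and `nn.Conv3d` modules themselves come from PyTorch.

```python
import torch
from torch import nn

# Same channel counts and kernel size; only the number of spatial dimensions differs
conv1d = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=5)
conv2d = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5)
conv3d = nn.Conv3d(in_channels=1, out_channels=4, kernel_size=5)

sequence = torch.randn(2, 1, 28)          # (batch, channels, length), e.g. a signal
image = torch.randn(2, 1, 28, 28)         # (batch, channels, height, width), e.g. MNIST
volume = torch.randn(2, 1, 28, 28, 28)    # (batch, channels, depth, height, width), e.g. a scan

out1d = conv1d(sequence)
out2d = conv2d(image)
out3d = conv3d(volume)

print(out1d.shape, out2d.shape, out3d.shape)
```

In each case the kernel slides over every spatial dimension, so a size of 28 shrinks to 24 along each of them.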

In [None]:
class CNN(nn.Module):
  def __init__(self):
    super(CNN,self).__init__()
    #1 input channel (grayscale), 32 filters of size 5x5: 28x28 -> 24x24
    self.conv1 = nn.Conv2d(in_channels=1,out_channels=32,kernel_size=(5,5),padding=0) #find out the difference between conv1d, conv2d and conv3d
    self.fullyConnected1 = nn.Linear(in_features=32*12*12, out_features=128)
    self.fullyConnected2 = nn.Linear(in_features=128, out_features=10)

  def forward(self,x):
    out = self.conv1(x)
    out = F.relu(out)
    out = F.max_pool2d(out,(2,2))
    #randomly zero some activations so that the model generalizes better (regularization);
    #training=self.training switches dropout off after model.eval()
    out = F.dropout(out,p=0.2,training=self.training)
    out = out.view(out.shape[0],-1) #Flatten. What is the size of this?

    out = self.fullyConnected1(out)
    out = F.relu(out)
    out = self.fullyConnected2(out) #raw logits, no softmax here
    return out

#Implement CNN using nn.Sequential 
class CNN_seq(nn.Module):
  def __init__(self):
    super(CNN_seq,self).__init__()
    self.conv1 = nn.Sequential(
      nn.Conv2d(in_channels=1,out_channels=32,kernel_size=(5,5),padding=0),
      nn.ReLU(),
      nn.MaxPool2d(kernel_size=2),
      nn.Dropout(0.2)
    )

    #self.conv2 = nn.Sequential()

    self.fullyConnected1 = nn.Linear(in_features=32*12*12, out_features=128)
    self.fullyConnected2 = nn.Linear(in_features=128, out_features=10)

  def forward(self,x):
    out = self.conv1(x)
    out = out.view(out.shape[0],-1) 

    out = self.fullyConnected1(out)
    out = F.relu(out)
    out = self.fullyConnected2(out)

    return out
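
Dropout is meant to be active only during training. The sketch below (tensor and variable names are illustrative) compares the module form `nn.Dropout`, which follows `model.train()` / `model.eval()`, with the functional form `F.dropout`, whose `training` argument defaults to `True` and therefore has to be passed explicitly.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
x = torch.ones(1000)

# Module form: eval() switches dropout off
layer = nn.Dropout(p=0.2)
layer.eval()
module_eval_out = layer(x)

# Functional form with default arguments: training=True, so values are still dropped
functional_default_out = F.dropout(x, p=0.2)

# Functional form with the flag passed explicitly: dropout is off
functional_eval_out = F.dropout(x, p=0.2, training=False)

print(torch.equal(module_eval_out, x))
print((functional_default_out == 0).sum().item() > 0)
print(torch.equal(functional_eval_out, x))
```

Inside an `nn.Module`, passing `training=self.training` to `F.dropout` ties the functional form to the same train/eval switch.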

#Train and learn

During training:
  * Pass the output of the CNN as raw logits (the output of the last linear layer, with no softmax applied)
  * Calculate the cross-entropy loss between the ground truth and the predicted output. nn.CrossEntropyLoss computes softmax and cross entropy together
  * Backpropagate the cross-entropy loss
  * Update the weights accordingly
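
To see why the model can return raw logits, the following sketch checks on made-up logits and targets that `nn.CrossEntropyLoss` gives the same value as applying log-softmax followed by the negative log-likelihood loss by hand.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)             # illustrative batch of 4 samples, 10 classes
targets = torch.tensor([3, 0, 7, 1])    # illustrative ground-truth labels

# Built-in loss applied directly to the raw logits
loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# The same computation done in two explicit steps
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_builtin, loss_manual))
```

Adding an explicit softmax layer before this loss would therefore apply softmax twice.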



In [None]:
model=CNN()
#model=CNN_seq()


#Set loss and optimiser 
loss_fn = nn.CrossEntropyLoss()

optimiser = optim.Adam(model.parameters())


model.train() #training mode (dropout active)
epochSize=5 #number of epochs
for epoch in range(epochSize):
  for batch_id, (inputs,target) in enumerate(train_loader):
    optimiser.zero_grad() #reset gradients from the previous step
    output = model(inputs) #forward
    loss = loss_fn(output,target)
    loss.backward() #back prop
    optimiser.step() #update weights

    if batch_id%100==0:
      print(f'Epoch:{epoch}/{epochSize} Batch:{batch_id+1} Loss:{loss.item()}')


Epoch:0/5 Batch:1 Loss:2.2978975772857666
Epoch:0/5 Batch:101 Loss:0.3820204734802246
Epoch:0/5 Batch:201 Loss:0.12674130499362946
Epoch:0/5 Batch:301 Loss:0.1165335476398468
Epoch:0/5 Batch:401 Loss:0.10750804096460342
Epoch:1/5 Batch:1 Loss:0.20184464752674103
Epoch:1/5 Batch:101 Loss:0.25573524832725525
Epoch:1/5 Batch:201 Loss:0.03021995536983013
Epoch:1/5 Batch:301 Loss:0.030995670706033707
Epoch:1/5 Batch:401 Loss:0.032761894166469574
Epoch:2/5 Batch:1 Loss:0.06611060351133347
Epoch:2/5 Batch:101 Loss:0.09288962930440903
Epoch:2/5 Batch:201 Loss:0.038047730922698975
Epoch:2/5 Batch:301 Loss:0.034574348479509354
Epoch:2/5 Batch:401 Loss:0.06636887043714523
Epoch:3/5 Batch:1 Loss:0.030163239687681198
Epoch:3/5 Batch:101 Loss:0.030661001801490784
Epoch:3/5 Batch:201 Loss:0.029255518689751625
Epoch:3/5 Batch:301 Loss:0.023818660527467728
Epoch:3/5 Batch:401 Loss:0.048072487115859985
Epoch:4/5 Batch:1 Loss:0.011083846911787987
Epoch:4/5 Batch:101 Loss:0.022380346432328224
Epoch:4/5 Ba

#Evaluation of the model against the training and test datasets

#Evaluate on training set

60000 samples 

correctly predicted: 59513

accuracy: 0.9918833374977112


In [None]:
#Your implementation here

#Evaluate on test set

10000 samples

correctly predicted: 9870

accuracy: 0.9869999885559082

In [None]:
#Your implementation here
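
One possible way to approach both evaluation cells is sketched below. The helper name `evaluate` and the stand-in model and data are assumptions made so the block runs on its own; they are not part of the notebook.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):
    """Return (correct, total, accuracy) for `model` over all batches in `loader`."""
    model.eval()                       # evaluation mode (disables nn.Dropout)
    correct, total = 0, 0
    with torch.no_grad():              # no gradients needed for evaluation
        for inputs, targets in loader:
            predictions = model(inputs).argmax(dim=1)   # class with the highest logit
            correct += (predictions == targets).sum().item()
            total += targets.size(0)
    return correct, total, correct / total

# Stand-in data and model, only so that the sketch is runnable by itself
dummy_data = TensorDataset(torch.randn(20, 1, 28, 28), torch.randint(0, 10, (20,)))
dummy_loader = DataLoader(dummy_data, batch_size=8)
dummy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

correct, total, accuracy = evaluate(dummy_model, dummy_loader)
print(f'{total} samples, correctly predicted: {correct}, accuracy: {accuracy}')
```

In the notebook, the same helper could be called as `evaluate(model, train_loader)` and `evaluate(model, test_loader)`.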

# A better and deeper CNN
Now that we have seen how to create a simple CNN, let’s take a look at a model capable of close to state-of-the-art results. This time you will implement a larger CNN architecture with additional convolutional, max-pooling, and fully connected layers. The network topology of the model is summarised as follows:

* Convolutional layer with 30 feature maps of size 5×5 and ReLU activation.
* Pooling layer taking the max over 2×2 patches.
* Convolutional layer with 15 feature maps of size 3×3 and ReLU activation.
* Pooling layer taking the max over 2×2 patches.
* Dropout layer with a probability of 20%.
* Flatten layer.
* Fully connected layer with 128 neurons and ReLU activation.
* Fully connected layer with 50 neurons and ReLU activation.
* Linear output layer.

In [None]:
#Your implementation here
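
A minimal sketch of one way to realise the topology listed above. The class name `DeeperCNN` is hypothetical. The list does not state stride, padding, or the size of the output layer, so the sketch assumes stride 1, no padding, and 10 output classes (one per MNIST digit).

```python
import torch
from torch import nn

class DeeperCNN(nn.Module):  # hypothetical name, not from the notebook
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=30, kernel_size=5),   # 28x28 -> 24x24
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                                # 24x24 -> 12x12
            nn.Conv2d(in_channels=30, out_channels=15, kernel_size=3),  # 12x12 -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                                # 10x10 -> 5x5
            nn.Dropout(0.2),
            nn.Flatten(),                                               # 15*5*5 = 375 features
        )
        self.classifier = nn.Sequential(
            nn.Linear(in_features=15 * 5 * 5, out_features=128),
            nn.ReLU(),
            nn.Linear(in_features=128, out_features=50),
            nn.ReLU(),
            nn.Linear(in_features=50, out_features=10),                 # raw logits (assumed 10 classes)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

deeper_model = DeeperCNN()
logits = deeper_model(torch.randn(4, 1, 28, 28))   # random batch, only to check shapes
print(logits.shape)
```

The model returns raw logits, so it can be trained with the same loss and optimiser setup as the simple CNN.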