# Categorical classification of handwritten digits (MNIST dataset) with a CNN
### In this task, we implement categorical classification with two different networks that have the same number of learnable layers: a fully-connected network and a convolutional neural network. We then compare their performance (loss and accuracy).

Before we start, we should make sure that CUDA is enabled; otherwise training might take very long.
In Google Colaboratory:

1. Check the options Runtime -> Change Runtime Type on top of the page.
2. In the popup window, select hardware accelerator GPU.

Afterward, the following command should run successfully:

In [None]:
import torch
if torch.cuda.is_available():
  print("Successfully enabled CUDA processing")
else:
  print("CUDA processing not available. Things will be slow :-(")

Successfully enabled CUDA processing


## **Dataset preparation**

- **MNIST dataset**: the inputs are $X^{[n]} \in \mathbb{R}^{28\times28}$ and the targets are $T^{[n]} \in \{0, \ldots, 9\}$.
Each sample in the dataset is provided as a **PIL.Image.Image**, an image class with some additional functionality, with pixel values in the range [0, 255].
- You can download the dataset directly from **torchvision.datasets**: https://pytorch.org/vision/stable/datasets.html
- In PyTorch, a dataset stores a list of input and target pairs $(X^{[n]}, T^{[n]})$.
- To convert these images into `torch.Tensor`s in the range [0, 1], we can use the **torchvision.transforms.ToTensor** transform.


### **1. Load the dataset with torch DataLoader**


In [None]:
import torch
import torchvision

def datasets(transform):
  trainset = torchvision.datasets.MNIST(
      root="./data",
      train=True,
      download=True,
      transform=transform
  )
  testset = torchvision.datasets.MNIST(
      root="./data",
      train=False,
      download=True,
      transform=transform
  )

  return trainset, testset

In [None]:
trainset, testset = datasets(torchvision.transforms.ToTensor())

trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, num_workers=2, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, num_workers=2, shuffle=False)

### **2. Test training set**
- Check that all batches of the training set have the required batch size, except for the last batch.
- Check that all input and target batches are of type torch.Tensor.
- Check that all inputs are in range [0, 1] and that all target values are in $\{0, \ldots, 9\}$.
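
To see why only the last batch may be smaller, the expected batch sizes can be worked out by hand. The sketch below is plain Python; the helper `batch_sizes` is a hypothetical name introduced for illustration, and the dataset sizes (60000 training and 10000 test samples) are the standard MNIST split sizes.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches a loader yields when the last partial batch is kept."""
    full, rest = divmod(num_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])

train_batches = batch_sizes(60000, 64)   # training set, batch size 64
test_batches = batch_sizes(10000, 100)   # test set, batch size 100

print(len(train_batches), train_batches[-1])
print(len(test_batches), test_batches[-1])
```

With a batch size of 64, the training set does not divide evenly, so the final batch holds the remainder; the test set with batch size 100 divides evenly.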


In [None]:
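num_batches = len(trainloader)  # number of batches, including the last (possibly smaller) one

for b, (x,t) in enumerate(trainloader):
  # check datatype, size and content of x and t
  assert isinstance(x, torch.Tensor)
  assert isinstance(t, torch.Tensor)

  # inputs must lie in [0, 1], targets in {0, ..., 9}
  assert (torch.all(x>=0) and torch.all(x<=1))
  assert (torch.all(t>=0) and torch.all(t<=9))

  assert len(x) == len(t)
  # every batch except the last one must be full
  if b < num_batches-1 :
    assert len(x) == 64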

## **Design Fully-Connected Network**

$D$ : the number of inputs  
$K$ : the number of hidden neurons  
$O$ : the number of outputs   
  
Our network architecture is:
 
1. A `torch.nn.Flatten` layer to turn the $28\times28$ pixel image (2D) into a vector (1D) of $28\cdot28 = 784$ pixels
2. A fully-connected layer with $D$ input neurons and $K$ outputs.
3. A $\tanh$ activation function.
4. A fully-connected layer with $K$ input neurons and $K$ outputs.
5. A $\tanh$ activation function.
6. A fully-connected layer with $K$ input neurons and $O$ outputs.

In [None]:
def fully_connected(D, K, O):
  
  return torch.nn.Sequential(
    torch.nn.Flatten(), # image into 1D vector
    torch.nn.Linear(D,K,bias=True), 
    torch.nn.Tanh(),
    torch.nn.Linear(K,K,bias=True),
    torch.nn.Tanh(),
    torch.nn.Linear(K,O,bias=True)
  )

## **Design Convolutional Neural Network**

Our CNN architecture is:
1. 2D convolutional layer with $Q_1$ channels, kernel size $5\times5$, stride 1 and padding 2. (output dim = 28)
2. 2D maximum pooling with pooling size $2\times2$ and stride 2 (output dim = 28/2)
3. $\tanh$ activation
4. 2D convolutional layer with $Q_2$ channels, kernel size $5\times5$, stride 1 and padding 2.  (output dim = 28/2)
5. 2D maximum pooling with pooling size $2\times2$ and stride 2  (output dim = 28/2/2)
6. $\tanh$ activation
7. A flattening layer to turn the 3D feature map into a 1D vector
8. A fully-connected layer with the appropriate number of inputs and $O$ outputs.
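
The output dimensions listed above follow from the usual size formula for convolution and pooling layers (without dilation): $\lfloor (n + 2p - k) / s \rfloor + 1$ for input size $n$, kernel size $k$, stride $s$ and padding $p$. The sketch below traces the spatial size through the layers in plain Python; the helper name `output_size` is hypothetical, and $Q_2 = 64$ is used only as an example value.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                                  # MNIST input is 28x28
conv1 = output_size(size, kernel=5, stride=1, padding=2)   # first convolution
pool1 = output_size(conv1, kernel=2, stride=2)             # first max pooling
conv2 = output_size(pool1, kernel=5, stride=1, padding=2)  # second convolution
pool2 = output_size(conv2, kernel=2, stride=2)             # second max pooling

Q2 = 64                                # example number of channels
flattened = pool2 * pool2 * Q2         # inputs of the final fully-connected layer

print(conv1, pool1, conv2, pool2, flattened)
```

The final spatial size determines the "appropriate number of inputs" of the last fully-connected layer.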

In [None]:
def convolutional(Q1, Q2, O):
  return torch.nn.Sequential(
    torch.nn.Conv2d(in_channels=1, out_channels=Q1, kernel_size=5, stride=1, padding=2), # output = 28
    torch.nn.MaxPool2d(kernel_size=(2,2), stride=2), # output = 28/2
    torch.nn.Tanh(),
    torch.nn.Conv2d(in_channels=Q1, out_channels=Q2, kernel_size=5, stride=1, padding=2), # output = 28/2
    torch.nn.MaxPool2d(kernel_size=(2,2), stride=2), # output = 28/2/2
    torch.nn.Tanh(),
    torch.nn.Flatten(),
    torch.nn.Linear(7*7*Q2,O,bias=True) 
  )

## **Create training and validation loop**

- Implement a function that takes `the network`, `the number of epochs` and `the learning rate`.
- Select the correct `loss function` for categorical classification, and `SGD optimizer`.
  

Iterate the following steps for the given number of epochs:

1. **Train** the network with all batches of **the training data**
2. Compute the **test set loss** and **test set accuracy**
3. Store both in a validation vector
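
One detail of step 2: `torch.nn.CrossEntropyLoss` averages over the samples of a batch by default, so per-batch losses should be weighted by the batch size before dividing by the number of test samples. The plain-Python sketch below illustrates the difference with made-up per-sample loss values; none of the numbers come from an actual training run.

```python
# Hypothetical per-sample losses for two batches of unequal size
batches = [[0.25, 0.5, 0.75, 0.5], [1.0, 3.0]]

batch_means = [sum(b) / len(b) for b in batches]
num_samples = sum(len(b) for b in batches)

# Naive: average of the batch means (biased when batch sizes differ)
naive = sum(batch_means) / len(batch_means)

# Weighted: each batch mean multiplied by its batch size, then normalized
weighted = sum(m * len(b) for m, b in zip(batch_means, batches)) / num_samples

# Reference: mean over all individual samples
reference = sum(sum(b) for b in batches) / num_samples

print(naive, weighted, reference)
```

When all batches have the same size, both variants agree; the weighting only matters if the last batch is smaller.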


In [None]:
def train(network, epochs=100, eta=0.01):
  # 1. select loss function and optimizer
  loss = torch.nn.CrossEntropyLoss()
  optimizer = torch.optim.SGD(network.parameters(), lr=eta, momentum=0.9)

  # 2. instantiate the correct device (fall back to the CPU if CUDA is unavailable)
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  network = network.to(device)

  # 3. collect loss values and accuracies over the training epochs
  val_loss, val_acc = [], []

  for epoch in range(epochs):
    # 3-1. train network on training data
    for x,t in trainloader:
      optimizer.zero_grad()
      z = network(x.to(device))
      j = loss(z, t.to(device))
      j.backward()
      optimizer.step()

    # 3-2. test network on test data
    with torch.no_grad():
      cur_loss, cur_acc = 0., 0.
      for x,t in testloader:
        z = network(x.to(device))
        j = loss(z, t.to(device)) # normalized over the number of samples in batch
        cur_loss += j.item() * len(t) 
        cur_acc += torch.sum(torch.argmax(z,dim=1)==t.to(device)).item() 
        
      val_loss.append(cur_loss/len(testset))
      val_acc.append(cur_acc/len(testset))

  return val_loss, val_acc

## **Train FCN and CNN**

### 1. Train FCN
- Create a fully-connected network with $D=28\cdot28$ inputs, $K=100$ hidden and $O=10$ output neurons.
- Train the network for 10 epochs with $\eta=0.01$ 
- Save the obtained test losses and accuracies.


In [None]:
fc = fully_connected(28*28, 100,10)
fc_loss, fc_acc = train(fc, epochs=10, eta=0.01)

### 2. Train CNN
- Create a convolutional network with $Q_1=32$ and $Q_2=64$ convolutional channels and $O=10$ output neurons.
- Train the network for 10 epochs with $\eta=0.01$ 
- Save the obtained test losses and accuracies.

In [None]:
cv = convolutional(32,64,10)
cv_loss, cv_acc = train(cv, epochs=10, eta=0.01)

## **Visualize the loss and accuracy curves**

In [None]:
from matplotlib import pyplot
pyplot.figure(figsize=(10,3))
ax = pyplot.subplot(121)
# plot loss values of FC and CV network over epochs
ax.plot(fc_loss, "g-", label="Fully-connected loss")
ax.plot(cv_loss, "b-", label="Convolutional loss")
ax.set_xlabel("Epoch")
ax.legend()

ax = pyplot.subplot(122)
# plot accuracy values of FC and CV network over epochs
ax.plot(fc_acc, "g-", label="Fully-connected accuracy")
ax.plot(cv_acc, "b-", label="Convolutional accuracy")
ax.set_xlabel("Epoch")
ax.legend()

## **Compute the number of learnable parameters**

Analytically estimate how many learnable parameters the two networks have.


Fully-connected Network:
- first fully-connected layer: 28 * 28 inputs * 100 hidden neurons + 100 biases = 78500
- second fully-connected layer: 100 hidden neurons * 100 outputs + 100 biases = 10100
- third fully-connected layer: 100 hidden neurons * 10 outputs + 10 biases = 1010
- total: 89610 (89400 weights + 210 biases)

Convolutional Network:
- first convolutional layer: 32 * 1 * 5 * 5 + 32 biases = 832
- second convolutional layer: 64 * 32 * 5 * 5 + 64 biases = 51264
- fully-connected layer: 64 * 7 * 7 inputs * 10 outputs + 10 biases = 31370
- total: 83466 (83360 weights + 106 biases)
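
The analytic count can be cross-checked with a few lines of plain Python. This is a sketch under the assumption that every layer has one bias per output neuron or output channel (as with `bias=True`); the helper names `linear_params` and `conv_params` are hypothetical.

```python
def linear_params(n_in, n_out):
    """Weights plus one bias per output neuron."""
    return n_in * n_out + n_out

def conv_params(c_in, c_out, kernel):
    """Weights of all kernels plus one bias per output channel."""
    return c_out * c_in * kernel * kernel + c_out

# Fully-connected network with D = 28*28, K = 100, O = 10
fc_total = (linear_params(28 * 28, 100)
            + linear_params(100, 100)
            + linear_params(100, 10))

# Convolutional network with Q1 = 32, Q2 = 64, O = 10
cv_total = (conv_params(1, 32, 5)
            + conv_params(32, 64, 5)
            + linear_params(64 * 7 * 7, 10))

print("Fully-connected Network:", fc_total)
print("Convolutional Network:", cv_total)
```

These totals include the bias terms, so they should be comparable to what summing `numel()` over all parameters reports.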

Compute the number of parameters in the networks by summing the number of parameters in each layer using PyTorch functionality.
The `numel()` method of a `torch.Tensor` returns the number of elements, i.e. the number of (learnable) parameters stored in that tensor.

In [None]:
def parameter_count(network):
  return sum(p.numel() for p in network.parameters())

print("Fully-connected Network:", parameter_count(fc))
print("Convolutional Network:", parameter_count(cv))

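**Conclusion**: Even though the fully-connected network has slightly more parameters, the performance of the CNN is much better. One reason: the FCN is not shift-invariant, whereas the shared convolution kernels combined with pooling make the CNN largely insensitive to small shifts of the digit.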
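
The role of shift-invariance can be illustrated with a toy 1D example in plain Python. This is only a sketch: the signal, kernel and weights are made up, global max pooling stands in for the pooling layers, and it is not a model of the networks trained above. A shared kernel followed by max pooling responds identically to a shifted pattern, while a dense layer with position-specific weights does not.

```python
def correlate(signal, kernel):
    """Valid 1D cross-correlation: the same kernel slides over all positions."""
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]

def dense(signal, weights):
    """Single dense neuron: one separate weight per input position."""
    return sum(s * w for s, w in zip(signal, weights))

signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 2, 1, 0]    # same pattern, moved two positions right
kernel = [1, 2, 1]                    # hypothetical shared kernel
weights = [1, 2, 3, 4, 5, 6, 7, 8]    # hypothetical dense weights

# Convolution followed by global max pooling
conv_a = max(correlate(signal, kernel))
conv_b = max(correlate(shifted, kernel))

# Dense neuron on the raw inputs
dense_a = dense(signal, weights)
dense_b = dense(shifted, weights)

print(conv_a, conv_b)
print(dense_a, dense_b)
```

The dense neuron would have to learn the pattern separately for each position, which is one intuition for why the CNN can make better use of a similar parameter budget.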