# Training our first Convolutional Neural Network

In this notebook we are going to build and train our first Convolutional Neural Network (CNN). In particular, we will use the architecture known as [LeNet](https://ieeexplore.ieee.org/document/726791).


<img src="https://drive.google.com/uc?export=view&id=1BimodSCOzNtpy76yE4QjJ5aCtsEzNNfX" width="900"><br><br>

We start, as usual, by importing the necessary libraries.

In [1]:
import torch
import torchvision
from torchvision import transforms as T
import torch.nn.functional as F
%pip install wandb -q
import wandb


## LeNet-5

To build this model we need some **convolutional** and some **fully connected** layers. The former are provided by PyTorch's `torch.nn.Conv2d` module; remember you can always take a look at the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)! We will also use max pooling to reduce the spatial size of the feature maps, via the `torch.nn.functional.max_pool2d` function (its parameters mirror those of the `MaxPool2d` module described in the [docs](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html)). Finally, we need the Rectified Linear Unit (ReLU) activation, available as the `torch.nn.functional.relu` function (details [here](https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html#torch.nn.functional.relu)).

In [22]:
class LeNet(torch.nn.Module):
  def __init__(self):
    super(LeNet, self).__init__()

    # input channels = 1, output channels = 6, kernel size = 5
    # input image size = (28, 28), output image size = (24, 24)
    # have a look at torch.nn.Conv2d
    self.conv1 = torch.nn.Conv2d(1, 6, 5)

    # input channels = 6, output channels = 16, kernel size = 5
    # input image size = (12, 12), output image size = (8, 8)
    self.conv2 = torch.nn.Conv2d(6, 16, 5)

    # input dim = 4 * 4 * 16 = 256 (H x W x C), output dim = 120
    # have a look at torch.nn.Linear
    self.L1 = torch.nn.Linear(256, 120)

    # input dim = 120, output dim = 84
    self.L2 = torch.nn.Linear(120,  84)

    # input dim = 84, output dim = 10
    self.L3 =  torch.nn.Linear(84, 10)

  def forward(self, x):

    # first convolutional layer + relu
    # have a look at torch.nn.functional.relu

    out = torch.nn.functional.relu(self.conv1(x))

    # Max Pooling with kernel size = 2
    # output size = (12, 12)
    # have a look at torch.nn.functional.max_pool2d
    out = torch.nn.functional.max_pool2d(out, 2)

    # second convolutional layer + relu
    out = torch.nn.functional.relu(self.conv2(out))

    # Max Pooling with kernel size = 2
    # output size = (4, 4)
    out = torch.nn.functional.max_pool2d(out, 2)

    # flatten the feature maps into a long vector (-> (bs, 4*4*16))
    # equivalent to torch.flatten(out, start_dim=1)
    flatten = out.view(out.shape[0], -1)

    # first linear layer + relu
    out = torch.nn.functional.relu(self.L1(flatten))

    # second linear layer + relu
    out = torch.nn.functional.relu(self.L2(out))

    # output layer (linear)
    x = self.L3(out)

    return x
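
The sizes quoted in the comments follow from the usual output-size rule for an unpadded, stride-1 convolution (`out = in - kernel + 1`) and from the fact that a 2x2 max pooling halves each side. As a quick sanity check, the sketch below traces a 28x28 input through the layers in plain Python. The helper names `conv_out` and `pool_out` are made up for this illustration and are not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Side length after a convolution (dilation 1), hypothetical helper."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Side length after max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1


size = 28                      # MNIST images are 28 x 28
trace = [size]

size = conv_out(size, 5)       # conv1: 28 -> 24
trace.append(size)
size = pool_out(size, 2)       # max pooling: 24 -> 12
trace.append(size)
size = conv_out(size, 5)       # conv2: 12 -> 8
trace.append(size)
size = pool_out(size, 2)       # max pooling: 8 -> 4
trace.append(size)

flat_features = size * size * 16   # 16 channels after conv2

print(trace)          # [28, 24, 12, 8, 4]
print(flat_features)  # 256
```

The final value is the input dimension of the first linear layer, which is why `self.L1` is declared as `Linear(256, 120)`.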

## Optimizer & cost function
We are going to use the familiar [Stochastic Gradient Descent (SGD)](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) optimizer and the [Cross Entropy Loss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) for our optimization.

In [3]:
def get_cost_function():
  cost_function = torch.nn.CrossEntropyLoss()
  return cost_function

def get_optimizer(net, lr, wd, momentum):
  optimizer = torch.optim.SGD(net.parameters(), lr=lr, weight_decay=wd, momentum=momentum)
  return optimizer
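
To get a feeling for the numbers this loss produces, here is a minimal plain-Python sketch of the cross entropy for a single sample. `torch.nn.CrossEntropyLoss` takes raw logits and combines a log-softmax with the negative log-likelihood; the function `cross_entropy` below is a hypothetical re-implementation for illustration only, not the library code.

```python
import math


def cross_entropy(logits, target):
    """Cross entropy of one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# An untrained network is roughly uniform over the 10 digits:
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A confident, correct prediction gives a loss close to zero:
confident_loss = cross_entropy([0.0] * 9 + [10.0], target=9)

print(round(uniform_loss, 4))   # 2.3026, i.e. ln(10)
print(confident_loss < 0.001)   # True
```

A loss of about `ln(10)` together with an accuracy of about 10% is therefore what we should expect from a randomly initialized 10-class classifier.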

In [6]:
net = LeNet()

# set up the loss function
loss = get_cost_function()

# set up the optimizer
optimizer = get_optimizer(net, lr=0.001, wd=0.0, momentum=0.9)
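
The role of the `lr`, `wd` and `momentum` arguments can be illustrated on a single scalar parameter. The sketch below follows the update rule described in the `torch.optim.SGD` documentation, assuming zero dampening and no Nesterov momentum; `sgd_momentum_step` is a hypothetical helper written for this example and the toy objective `f(p) = p**2` is an assumption.

```python
def sgd_momentum_step(param, grad, velocity, lr, wd=0.0, momentum=0.9):
    """One SGD step with weight decay and momentum on a scalar parameter."""
    grad = grad + wd * param                 # weight decay acts as an L2 penalty
    velocity = momentum * velocity + grad    # running accumulation of gradients
    param = param - lr * velocity            # move against the accumulated direction
    return param, velocity


# Toy problem (assumption): minimise f(p) = p**2, whose gradient is 2 * p.
p, v = 1.0, 0.0
history = [p]
for _ in range(3):
    p, v = sgd_momentum_step(p, 2 * p, v, lr=0.1)
    history.append(p)

print([round(x, 3) for x in history])  # [1.0, 0.8, 0.46, 0.062]
```

Because the velocity keeps a fraction of the previous updates, the steps grow while the gradient keeps pointing in the same direction, which is the intended effect of momentum.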


## Training and test steps
We are going to implement our training and test pipelines as discussed in the previous lab sessions.

In [36]:
def training_step(net, data_loader, optimizer, cost_function, device='cuda'):

  samples = 0.
  cumulative_loss = 0.
  cumulative_accuracy = 0.

  # set the network to training mode
  net.train()

  # iterate over the training set
  for batch_idx, (inputs, targets) in enumerate(data_loader):

    # load data into GPU
    inputs = inputs.to(device)
    targets = targets.to(device)

    # forward pass
    outputs = net(inputs)

    # loss computation
    loss = cost_function(outputs, targets)

    # backward pass
    loss.backward()

    # parameters update
    optimizer.step()

    # gradients reset
    optimizer.zero_grad()



    # fetch prediction and loss value
    samples += inputs.shape[0]
    cumulative_loss += loss.item()
    _, predicted = outputs.max(dim=1) # max() returns (maximum_value, index_of_maximum_value)

    # compute training accuracy
    cumulative_accuracy += predicted.eq(targets).sum().item()

  return cumulative_loss/samples, cumulative_accuracy/samples*100


def test_step(net, data_loader, cost_function, device='cuda'):

  samples = 0.0
  cumulative_loss = 0.0
  cumulative_accuracy = 0.0

  # set the network to evaluation mode
  net.eval()

  # disable gradient computation (we are only testing, we do not want our model to be modified in this step!)
  with torch.no_grad():

    # iterate over the test set
    for batch_idx, (inputs, targets) in enumerate(data_loader):

      # load data into GPU
      inputs = inputs.to(device)
      targets = targets.to(device)

      # forward pass
      outputs = net(inputs)

      # loss computation
      loss = cost_function(outputs, targets)

      # fetch prediction and loss value
      samples += inputs.shape[0]
      cumulative_loss += loss.item() # Note: the .item() is needed to extract scalars from tensors
      _, predicted = outputs.max(1)

      # compute accuracy
      cumulative_accuracy += predicted.eq(targets).sum().item()

  avg_loss = cumulative_loss / samples
  avg_accuracy = 100 * cumulative_accuracy / samples

  return avg_loss, avg_accuracy
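
One detail is worth keeping in mind when reading the reported losses. `torch.nn.CrossEntropyLoss` averages over the batch by default (`reduction='mean'`), so `loss.item()` is already a per-sample mean. Summing these batch means and dividing by the number of samples therefore yields roughly the mean loss divided by the batch size, so values coming from loaders with different batch sizes are not directly comparable. The plain-Python sketch below illustrates the effect with made-up constant batch losses and shows a sample-weighted alternative; both helper names are hypothetical.

```python
def summed_batch_means(batch_losses, batch_sizes):
    """What the steps above compute: sum of batch means / number of samples."""
    return sum(batch_losses) / sum(batch_sizes)


def per_sample_mean(batch_losses, batch_sizes):
    """Sample-weighted average: each batch mean counts once per sample."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Made-up example: every batch has a mean loss of 2.3 (about ln(10)).
train_losses, train_sizes = [2.3] * 4, [128] * 4   # batch size 128
val_losses, val_sizes = [2.3] * 2, [256] * 2       # batch size 256

print(round(summed_batch_means(train_losses, train_sizes), 5))  # 0.01797
print(round(summed_batch_means(val_losses, val_sizes), 5))      # 0.00898
print(round(per_sample_mean(train_losses, train_sizes), 5))     # 2.3
print(round(per_sample_mean(val_losses, val_sizes), 5))         # 2.3
```

In the steps above, the sample-weighted variant would correspond to accumulating `loss.item() * inputs.shape[0]` instead of `loss.item()`. The accuracy is unaffected, since it already counts individual samples.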

## Data loading
In this block we define our **data loading** utility. Unlike last time, we also introduce **normalization**. This step **bounds** the pixel values to the `[-1, 1]` range, which helps to obtain a **stable** training process. It can be done with the `torchvision.transforms.Normalize()` transform (details [here](https://pytorch.org/vision/main/generated/torchvision.transforms.Normalize.html)).
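
As a small worked example of why `mean=0.5` and `std=0.5` give the `[-1, 1]` range: `ToTensor()` scales pixel values to `[0, 1]`, and `Normalize` then applies `(x - mean) / std` to each channel. The function below is a plain-Python stand-in for that formula on a single value, written only for illustration.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Per-value version of the formula applied by Normalize: (x - mean) / std."""
    return (pixel - mean) / std


# darkest, mid-gray and brightest pixel after ToTensor()
normalized = [normalize(p) for p in (0.0, 0.5, 1.0)]
print(normalized)  # [-1.0, 0.0, 1.0]
```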

In [37]:
def get_data(batch_size, test_batch_size=256):

  # prepare data transformations and then combine them sequentially
  transform = list()
  transform.append(T.ToTensor())                            # convert PIL image to a PyTorch Tensor with values in [0, 1]
  transform.append(T.Normalize(mean=[0.5], std=[0.5]))      # rescale the Tensor values from [0, 1] to [-1, 1]
  transform = T.Compose(transform)                          # compose the above transformations into one

  # load data
  full_training_data = torchvision.datasets.MNIST('./data', train=True, transform=transform, download=True)
  test_data = torchvision.datasets.MNIST('./data', train=False, transform=transform, download=True)

  # create train and validation splits
  num_samples = len(full_training_data)
  training_samples = int(num_samples*0.5+1)
  validation_samples = num_samples - training_samples

  training_data, validation_data = torch.utils.data.random_split(full_training_data, [training_samples, validation_samples])

  # initialize dataloaders
  train_loader = torch.utils.data.DataLoader(training_data, batch_size, shuffle=True, num_workers=4)
  val_loader = torch.utils.data.DataLoader(validation_data, test_batch_size, shuffle=False, num_workers=4)
  test_loader = torch.utils.data.DataLoader(test_data, test_batch_size, shuffle=False, num_workers=4)

  return train_loader, val_loader, test_loader
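
The split arithmetic above can be checked without downloading anything. The sketch below assumes the standard MNIST training set of 60,000 images and the default `batch_size=128` used later; `split_sizes` is a hypothetical helper that mirrors the two lines computing `training_samples` and `validation_samples`.

```python
import math


def split_sizes(num_samples, train_fraction=0.5):
    """Mirror of the split computation in get_data (hypothetical helper)."""
    training_samples = int(num_samples * train_fraction + 1)
    validation_samples = num_samples - training_samples
    return training_samples, validation_samples


train_n, val_n = split_sizes(60000)
train_batches = math.ceil(train_n / 128)   # the last batch is smaller

print((train_n, val_n))  # (30001, 29999)
print(train_batches)     # 235
```

The two lengths always sum to the dataset size, which `random_split` requires when it is given absolute lengths. Note also that the split is random, so unless a seed is fixed the two subsets differ from run to run.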

## Putting it all together!
We are now ready to combine all the ingredients defined so far into our **training procedure**. We define a main function that **initializes** everything, **trains** the model over multiple epochs and **logs** the results.

In [38]:
'''
Input arguments
  batch_size: size of a mini-batch
  device: device on which to train the network (e.g. 'cuda:0')
  learning_rate: learning rate for the SGD optimizer
  weight_decay: weight decay coefficient for regularization of the weights
  momentum: momentum for the SGD optimizer
  epochs: number of epochs for training the network
'''

def main(batch_size=128,
         device='cuda:0',
         learning_rate=0.001,
         weight_decay=0.0,
         momentum=0.9,
         epochs=10):

  # get dataloaders
  train_loader, val_loader, test_loader = get_data(batch_size)

  # instantiate model and send it to the target device
  net = LeNet().to(device)

  # instantiate optimizer and cost function using the arguments above
  optimizer = get_optimizer(net, learning_rate, weight_decay, momentum)
  cost_function = get_cost_function()

  # wandb logger
  wandb.login()
  wandb.init(project="Lab_02_TrainLeNet", name="Exp1") # UPDATE USING YOUR WANDB CREDENTIALS

  # run a single test step beforehand and print metrics
  print('Before training:')
  train_loss, train_accuracy = test_step(net, train_loader, cost_function, device=device)
  val_loss, val_accuracy = test_step(net, val_loader, cost_function, device=device)
  test_loss, test_accuracy = test_step(net, test_loader, cost_function, device=device)

  print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
  print('\t Validation loss {:.5f}, Validation accuracy {:.2f}'.format(val_loss, val_accuracy))
  print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
  print('-----------------------------------------------------')

  # iterate over the number of epochs
  for e in range(epochs):

    # train & log
    train_loss, train_accuracy = training_step(net, train_loader, optimizer, cost_function, device=device)
    val_loss, val_accuracy = test_step(net, val_loader, cost_function, device=device)
    wandb.log({
        "Epoch": e+1,
        "training/loss": train_loss,
        "training/accuracy": train_accuracy,
        "validation/loss": val_loss,
        "validation/accuracy": val_accuracy,
    })
    print('Epoch: {:d}'.format(e+1))
    print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
    print('\t Validation loss {:.5f}, Validation accuracy {:.2f}'.format(val_loss, val_accuracy))
    print('-----------------------------------------------------')

  # compute and print final metrics
  print('After training:')
  train_loss, train_accuracy = test_step(net, train_loader, cost_function, device=device)
  val_loss, val_accuracy = test_step(net, val_loader, cost_function, device=device)
  test_loss, test_accuracy = test_step(net, test_loader, cost_function, device=device)

  print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
  print('\t Validation loss {:.5f}, Validation accuracy {:.2f}'.format(val_loss, val_accuracy))
  print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
  print('-----------------------------------------------------')

## Run!

In [39]:
main()




Before training:
	 Training loss 0.01806, Training accuracy 9.85
	 Validation loss 0.00906, Validation accuracy 9.98
	 Test loss 0.00922, Test accuracy 10.09
-----------------------------------------------------
Epoch: 1
	 Training loss 0.01799, Training accuracy 9.85
	 Validation loss 0.00899, Validation accuracy 10.02
-----------------------------------------------------
Epoch: 2
	 Training loss 0.01776, Training accuracy 21.17
	 Validation loss 0.00879, Validation accuracy 39.79
-----------------------------------------------------
Epoch: 3
	 Training loss 0.01613, Training accuracy 48.51
	 Validation loss 0.00632, Validation accuracy 60.16
-----------------------------------------------------
Epoch: 4
	 Training loss 0.00674, Training accuracy 77.74
	 Validation loss 0.00196, Validation accuracy 85.22
-----------------------------------------------------
Epoch: 5
	 Training loss 0.00323, Training accuracy 87.76
	 Validation loss 0.00139, Validation accuracy 89.36
------------------