# Deep Learning Course: Lab Exercises

In this lab exercise you will:

a) Learn about PyTorch DataLoaders and how to build your own on a custom dataset.

b) Learn how to train and evaluate a convolutional neural network.



# Q1 DataLoaders

### Connect your drive account

To access the NumPy files in the given folder, import the `drive` module from the `google.colab` library:

- from google.colab import drive

Then mount your Drive account at a chosen location:

- drive.mount('/content/drive')

Finally, you can load files from the given folder in this way:

- arr_train_labels=np.load('drive/.._folderdestination_../train_labels.npy')

In [None]:
import os
import numpy as np
import pandas as pd
from google.colab import drive

# mount() returns None, so its result is not assigned back to `drive`
drive.mount('/content/drive')

In [None]:
## add the name of the folder you uploaded to Colab
drive_folder = 'drive/MyDrive/...'

In [None]:
## load train, validation and test image and label arrays
train_images = np.load(os.path.join(drive_folder, 'train_images.npy'))
val_images = np.load(os.path.join(drive_folder, 'val_images.npy'))
test_images = np.load(os.path.join(drive_folder, 'test_images.npy'))

train_labels = np.load(os.path.join(drive_folder, 'train_labels.npy'))
val_labels = np.load(os.path.join(drive_folder, 'val_labels.npy'))
test_labels = np.load(os.path.join(drive_folder, 'test_labels.npy'))

In [None]:
print('train images', train_images.shape)
print('val images', val_images.shape)
print('train labels', train_labels.shape)
print('val labels', val_labels.shape)

Print a random training image.

In [None]:
print(train_images[124])

In [None]:
from matplotlib import pyplot as plt

##print a random image
# *****START CODE

# *****END CODE

Print the corresponding label for the image that you printed.

In [None]:
##print the corresponding label
# *****START CODE

# *****END CODE

Iterate through the training images with a for loop and collate batches of size 10 (using slicing)
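
As a hint, the slicing pattern can be sketched on a toy array. The names `toy_images` and `toy_batchsize` and the array shape are assumptions made up for this illustration; they are not part of the lab data.

```python
import numpy as np

# Hypothetical stand-in for an image array: 25 "images" of 4x4 pixels.
toy_images = np.arange(25 * 4 * 4).reshape(25, 4, 4)
toy_batchsize = 8  # illustrative value, not the one asked for above

batch_shapes = []
for start in range(0, len(toy_images), toy_batchsize):
    # Slicing past the end is safe: the last batch is simply smaller.
    batch = toy_images[start:start + toy_batchsize]
    batch_shapes.append(batch.shape)

print(batch_shapes)
```

Note that the last batch holds only the leftover samples when the dataset size is not a multiple of the batch size.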

In [None]:
# *****START CODE
batchsize =




# *****END CODE

### Familiarize with .csv files

Here, we show you how to create a csv file with the following table:

| Name         | Surname        | Gender |
| ------------ | ------------ | ------- |
| Mary         | Smith        | Female       |
| James        | Williams     | Male     |
| Sarah        | Martin       | Female      |
| Peter        | Miller       | Male     |

In [None]:
import pandas as pd

# First create each column as a list

names = ['Mary', 'James', 'Sarah', 'Peter']
surnames = ['Smith', 'Williams', 'Martin', 'Miller']
genders = ['Female', 'Male', 'Female', 'Male']

# Then create a pandas DataFrame from these lists and set the column names

df = pd.DataFrame({'Name': names ,
                   'Surname': surnames ,
                   'Gender': genders})

# Finally save the DataFrame to a .csv file
df.to_csv(os.path.join(drive_folder, 'toy.csv'), index=False, columns = ['Name', 'Surname', 'Gender'])

# You can now read the .csv file to get a DataFrame again
toy = pd.read_csv(os.path.join(drive_folder, 'toy.csv'))

# And show the first few rows of the DataFrame
toy.head()

Use the `iloc` indexer from the [pandas library](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html 'Learn .iloc') to access specific items by integer position.
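
A small sketch of position-based indexing, using a made-up table (`cities` is a hypothetical DataFrame, unrelated to the lab files). Positions are zero-based and negative positions count from the end.

```python
import pandas as pd

# Hypothetical table, unrelated to the lab data.
cities = pd.DataFrame({'City': ['Paris', 'Rome', 'Oslo'],
                       'Country': ['France', 'Italy', 'Norway']})

last_row = cities.iloc[-1]        # a Series holding the last row
last_column = cities.iloc[:, -1]  # a Series holding the last column
one_cell = cities.iloc[1, 0]      # scalar: row 1, column 0 (zero-based)
print(one_cell)
```

A single integer selects a row, `[:, j]` selects a column, and `[i, j]` selects one cell.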

In [None]:
##show first column
toy.iloc[:, 0]

In [None]:
##show second row
# *****START CODE

# *****END CODE

In [None]:
##show element in first row and second column
# *****START CODE

# *****END CODE

### Create your own custom DataLoader

Use the numpy files to define your own custom dataloader.

a) Create a DataFrame for each split (train, val and test) with 2 columns:
- ‘image_ID’, which contains the index of each image in the image array
- ‘label’, which contains the corresponding label for every image index

b) Plot the distribution of labels for each split with [groupby().count()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) and [pandas.Series.plot()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.html?highlight=plot#pandas.Series.plot 'pandas.Series.plot') functions.

c) Save the DataFrames as .csv files.
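
Step b) can be sketched on made-up labels. `toy_df` and its values are assumptions for illustration only; the column names follow step a).

```python
import pandas as pd

# Hypothetical labels; the column names follow step a).
toy_df = pd.DataFrame({'image_ID': range(6),
                       'label': [0, 1, 1, 2, 1, 0]})

# Count how many rows fall under each label value.
counts = toy_df.groupby('label')['image_ID'].count()
print(counts)

# In a notebook, counts.plot.bar() then draws one bar per label.
```

The result is a Series indexed by label, which is the shape that `Series.plot.bar()` expects.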

In [None]:
## Create the DataFrame for the train split and show the first few rows
# *****START CODE

# *****END CODE

In [None]:
## Plot the distribution of labels using a bar plot
# *****START CODE

# *****END CODE

In [None]:
## Save the DataFrame as a .csv file
# *****START CODE

# *****END CODE

In [None]:
## Do the same for both val and test splits
## HINT : you can wrap the previous steps into a function to do all 3 at once
# *****START CODE
def create_csv(labels, destination):
  df = 
  counts = 
  counts.plot.bar()

  df.to_csv(destination, index=False, columns = ['image_ID', 'label'])

create_csv(val_labels, os.path.join(drive_folder, 'val.csv'))
create_csv(test_labels, os.path.join(drive_folder, 'test.csv'))



# *****END CODE

d) Now create your own custom dataloader. To build one, first define a dataset class that implements three required methods:

- `__init__`, for the class initialization

- `__getitem__`, for extracting the sample at a given index

- `__len__`, for returning the total number of data samples
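
To see how these three methods fit together, here is a minimal sketch on synthetic in-memory data. `ToySquares` is a hypothetical class invented for this illustration; it does not read any csv or numpy file, so it is not the solution to the exercise below.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToySquares(Dataset):
    """Hypothetical in-memory dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n_samples):
        # Store everything needed to serve samples later.
        self.values = torch.arange(n_samples, dtype=torch.float32)

    def __getitem__(self, index):
        # Return one (input, target) pair for the given index.
        x = self.values[index]
        return x, x ** 2

    def __len__(self):
        # Total number of samples.
        return len(self.values)

toy_dataset = ToySquares(10)
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)
batch_sizes = [len(x) for x, y in toy_loader]
print(len(toy_dataset), batch_sizes)
```

The `DataLoader` only relies on `__len__` and `__getitem__`; it calls them to gather individual samples and stacks them into batches.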

In [None]:
from torch.utils.data.dataset import Dataset

class MyDataset(Dataset):
    def __init__(self, csv_path, images_file):
        ## Read the csv file with pandas library
        # *****START CODE
        self.data_info =

        ##Load the numpy array of images
        self.images =
        # *****END CODE

    def __getitem__(self, index):
        ## Get the image-label set using the given index
        ## Hint: use iloc command from pandas library
        # *****START CODE
        image_id =
        image =
        label =
        # *****END CODE
        ## bring all image spectral values to the range of [0,1]
        image = image/255.0

        return image, label

    def __len__(self):
        ## return the total number of data samples
        return len(self.data_info)

e) Call the dataloader for both the training and the validation sets

In [None]:
from torch.utils.data import DataLoader

## training and validation .csv paths
##here put your custom destination folder
train_csv_file = os.path.join(drive_folder, 'train.csv')
val_csv_file = os.path.join(drive_folder, 'val.csv')

## paths for training and validation numpy array images
##here put your custom destination folder
train_images_file = os.path.join(drive_folder, 'train_images.npy')
val_images_file = os.path.join(drive_folder, 'val_images.npy')

## define and create training and validation dataloaders using MyDataset (fill the blanks)
# *****START CODE
train_dataset = MyDataset(, )
val_dataset = MyDataset(, )

train_dataloader = DataLoader(dataset=, batch_size=, shuffle=)
val_dataloader = DataLoader(dataset=, batch_size=, shuffle=)
# *****END CODE

Iterate through your dataloader using a for loop.

In [None]:
# *****START CODE


# *****END CODE

# Q2 Classification problem

Define your custom convolutional neural network.
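
When sizing the first fully connected layer, it helps to track the spatial size through the network with the formula W2 = (W1 − F + 2P)/S + 1. The sketch below is only an illustration under stated assumptions: a 28x28 input, 3x3 kernels with stride 1 and no padding for both convolutions, and a 2x2 max pooling. The helper `conv_output_size` is hypothetical; adapt the numbers to your own data and layers.

```python
def conv_output_size(w_in, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer: (W1 - F + 2P) / S + 1."""
    return (w_in - kernel + 2 * padding) // stride + 1

w = 28  # ASSUMPTION: 28x28 inputs; replace with your own image size
w = conv_output_size(w, kernel=3)             # a 3x3 convolution, stride 1
w = conv_output_size(w, kernel=3)             # a second 3x3 convolution
w = conv_output_size(w, kernel=2, stride=2)   # 2x2 max pooling
n_channels = 64                               # channels leaving the second convolution
flat_features = n_channels * w * w            # length of each flattened sample
print(w, flat_features)
```

The flattened length is the number of input features the first fully connected layer must accept.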

In [None]:
!pip install torchnet

In [None]:
import torch
import torch.nn as nn
import torchnet as tnt
import torch.nn.functional as F
import matplotlib.pyplot as plt

In [None]:
class ConvNet(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(in_ch, 32, 3, 1) #W2=(W1−F+2P)/S+1
        ##define a second convolutional layer, which outputs 64 channels
        # *****START CODE
        self.conv2 =
        # *****END CODE
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout(0.5)  # used after flattening, so element-wise dropout
        ##create the last two fully connected layers
        # *****START CODE
        self.fc1 =
        self.fc2 =
        # *****END CODE

##define the forward propagation of the data
##conv1--relu--conv2--maxpool--dropout1--fc1--dropout2--fc2
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.max_pool2d(x, 2) #W2=(W1−F)/S+1
        x = self.dropout1(x)
        x = torch.flatten(x, 1)

        # *****START CODE



        # *****END CODE

        output = F.softmax(x, dim=1)
        return output

In [None]:
# define model
# *****START CODE
model =
# *****END CODE

In [None]:
# define optimizer, criterion and number of training epochs
# *****START CODE



# *****END CODE

In [None]:
# define confusion matrix using tnt package
confusion_matrix = tnt.meter.ConfusionMeter(10)

In [None]:
# create a directory for saving the models and the training progress
save_folder = os.path.join(drive_folder, 'models')
if not os.path.exists(save_folder):
  os.mkdir(save_folder)

In [None]:
##function which saves the overall accuracy and average loss at the end of each epoch,
##both for the training and the validation set
def write_results(save_folder, epoch, train_acc, val_acc, train_loss, val_loss):
    with open(os.path.join(save_folder, 'progress.txt'), 'a') as ff:
      ff.write(' E: ')
      ff.write(str(epoch))
      ff.write('         ')
      ff.write(' TRAIN_OA: ')
      ff.write(str('%.3f' % train_acc))
      ff.write(' VAL_OA: ')
      ff.write(str('%.3f' % val_acc))
      ff.write('         ')
      ff.write(' TRAIN_LOSS: ')
      ff.write(str('%.3f' % train_loss))
      ff.write(' VAL_LOSS: ')
      ff.write(str('%.3f' % val_loss))
      ff.write('\n')

In [None]:
#function that creates the train-val loss graph
#variables 'train_loss' and 'val_loss' are lists containing the average losses for all the epochs
def save_graph(train_loss, val_loss, nb_epochs, save_folder):
    plt.plot(list(range(nb_epochs+1))[1:], train_loss)
    plt.plot(list(range(nb_epochs+1))[1:], val_loss)
    plt.legend(['train', 'val'])
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.savefig('{}/chart.png'.format(save_folder))

Now let's train our model!

We will do a validation step after each training epoch (do not forget to disable gradients during the validation step).
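
Disabling gradients can be sketched as follows. `toy_model` and `toy_inputs` are hypothetical stand-ins, not the lab's network or data.

```python
import torch
import torch.nn as nn

toy_model = nn.Linear(4, 2)   # hypothetical stand-in for the network
toy_inputs = torch.randn(3, 4)

toy_model.eval()              # switches dropout/batch-norm layers to inference behaviour
with torch.no_grad():         # no computation graph is recorded inside this block
    toy_outputs = toy_model(toy_inputs)

print(toy_outputs.requires_grad)
```

`model.eval()` and `torch.no_grad()` do different things: the first changes layer behaviour, the second stops autograd from tracking operations, which saves memory and time during validation.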

In [None]:
from tqdm import tqdm
import torch.nn.functional as F
total_train_losses = []
total_val_losses = []


for epoch in range(1,epochs+1):
    ##TRAINING##
    model.train()
    train_losses = []
    confusion_matrix.reset()

    for i, batch in enumerate(tqdm(train_dataloader)):
        img_batch, lbl_batch = batch

        ##implement the forward pass and the backpropagation
        # *****START CODE




        # *****END CODE

        train_losses.append(loss.item())
        confusion_matrix.add(outputs.data.squeeze(), lbl_batch.long())

        if i % 100 == 0:
            print('Train (epoch {}/{}) [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, epochs, i, len(train_dataloader),100.*i/len(train_dataloader), loss.item()))

    train_acc=(np.trace(confusion_matrix.conf)/float(np.ndarray.sum(confusion_matrix.conf))) *100
    train_loss_mean = np.mean(train_losses)
    total_train_losses.append(train_loss_mean)

    ##VALIDATION##
    model.eval()
    val_losses = []
    confusion_matrix.reset()

    for i, batch in enumerate(tqdm(val_dataloader)):
        img_batch, lbl_batch = batch

        ##pass the images to the model and calculate the loss
        # *****START CODE


        # *****END CODE

        confusion_matrix.add(outputs.data.squeeze(), lbl_batch.long())
        val_losses.append(loss.item())

    print('Confusion Matrix:')
    print(confusion_matrix.conf)

    val_acc=(np.trace(confusion_matrix.conf)/float(np.ndarray.sum(confusion_matrix.conf))) *100
    val_loss_mean = np.mean(val_losses)
    total_val_losses.append(val_loss_mean)

    print('TRAIN_LOSS: ', '%.3f' % train_loss_mean, 'TRAIN_ACC: ', '%.3f' % train_acc)
    print('VAL_LOSS: ', '%.3f' % val_loss_mean, 'VAL_ACC: ', '%.3f' % val_acc)

    write_results(save_folder, epoch, train_acc, val_acc, train_loss_mean, val_loss_mean)

    torch.save(model.state_dict(), save_folder + '/model_{}.pt'.format(epoch))

save_graph(total_train_losses, total_val_losses, epochs, save_folder)



Similarly, test the model on the test split.
You will have to load the model previously saved (see [this tutorial](https://pytorch.org/tutorials/beginner/saving_loading_models.html) on how to save and load PyTorch models).
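
A minimal sketch of the save and load round trip with `state_dict`, assuming a tiny stand-in model and a temporary file. `toy_model`, `restored` and `checkpoint_path` are hypothetical names; in the lab the checkpoints live in `save_folder`.

```python
import os
import tempfile
import torch
import torch.nn as nn

toy_model = nn.Linear(4, 2)  # hypothetical stand-in for the trained network

# Hypothetical path in a temporary directory; the lab saves to save_folder instead.
checkpoint_path = os.path.join(tempfile.mkdtemp(), 'toy_model.pt')
torch.save(toy_model.state_dict(), checkpoint_path)

# Rebuild the same architecture, then copy the saved weights into it.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(checkpoint_path))
restored.eval()  # put the restored model in evaluation mode before testing

same_weights = torch.equal(toy_model.weight, restored.weight)
print(same_weights)
```

Because only the weights are stored, the model class must be instantiated with the same architecture before calling `load_state_dict`.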

In [None]:
## Initialize your test DataLoader
# *****START CODE



# *****END CODE

##define the model, load it and put it in evaluation mode
# *****START CODE



# *****END CODE

confusion_matrix = tnt.meter.ConfusionMeter(10)
confusion_matrix.reset()


for i, batch in enumerate(tqdm(test_dataloader)):
    # *****START CODE


    # *****END CODE

    confusion_matrix.add(outputs.data.squeeze(), lbl_batch.long())

print(confusion_matrix.conf)

Plot the first six testing images along with their true labels.

In [None]:
##plot the testing images
# from torchvision.transforms import ToPILImage
iterator = iter(test_dataset)
n_images = 6
plt.figure(figsize=(10, n_images))
for i in range(n_images):
  plt.subplot(1, n_images, i+1)
  image, title = next(iterator)
  plt.imshow(image[0], cmap='gray')
  plt.title(title)

Print the model's predicted labels for the above images.

In [None]:
##print the predicted labels for the above images
# *****START CODE

# *****END CODE

# Bonus questions

## Built-in PyTorch DataLoader for MNIST dataset

In [None]:
import torch
from torchvision import datasets, transforms

transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
        ])

dataset1 = datasets.MNIST('../data', train=True, download=True,
                       transform=transform)
dataset2 = datasets.MNIST('../data', train=False,
                       transform=transform)

In [None]:
train_loader = torch.utils.data.DataLoader(dataset1, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(dataset2, batch_size=32, shuffle=False)

In [None]:
for batch_idx, batch in enumerate(train_loader):
        data, target = batch
        print('data', data.shape)
        print('target', target.shape)

## Link to Weights & Biases (WandB)
A visualisation platform for logging metrics during training and evaluation.

First, create an account on [WandB](https://wandb.ai/site) (you can set it up in 2 mins using your Google account)

In [None]:
# Install wandb on the device (only needed once if you are running locally)
!pip install wandb --quiet

In [None]:
# Import the library
import wandb

In [None]:
# Then connect to your W&B account
def wandb_connect():
    wandb_api_key_label = "wandb_api_key"
    wandb_api_key = "YOUR API KEY" # here use your API key from WandB interface

    wandb_conx = wandb.login(key = wandb_api_key)
    print(f"Connected to Wandb online interface : {wandb_conx}")

wandb_connect()

In [None]:
# define model, optimizer, criterion and number of training epochs
# *****START CODE
model =
optimizer =
criterion =
epochs =
# *****END CODE

In [None]:
# Complete the hyperparams dict with the information of your run
# *****START CODE
hyperparams = {"Batch size": ,
               "Learning rate": ,
               "Epochs":}
# *****END CODE

# Init the WandB run with hyperparams
wandb.init(config=hyperparams)

In [None]:
for epoch in range(1,epochs+1):
    ##TRAINING##
    model.train()
    train_losses = []
    confusion_matrix.reset()

    for i, batch in enumerate(tqdm(train_dataloader)):
        img_batch, lbl_batch = batch

        ##implement the forward pass and the backpropagation
        # *****START CODE




        # *****END CODE

        # log the training loss at each batch
        wandb.log({"train_loss":loss.item()})
        confusion_matrix.add(outputs.data.squeeze(), lbl_batch.long())

    train_acc=(np.trace(confusion_matrix.conf)/float(np.ndarray.sum(confusion_matrix.conf))) *100

    ##VALIDATION##
    model.eval()
    val_losses = []
    confusion_matrix.reset()

    for i, batch in enumerate(tqdm(val_dataloader)):
        img_batch, lbl_batch = batch

        ##pass the images to the model and calculate the loss
        # *****START CODE


        # *****END CODE

        confusion_matrix.add(outputs.data.squeeze(), lbl_batch.long())
        val_losses.append(loss.item())

    val_acc=(np.trace(confusion_matrix.conf)/float(np.ndarray.sum(confusion_matrix.conf))) *100
    val_loss_mean = np.mean(val_losses)

    # log the train & val accuracy and the val loss at each epoch
    wandb.log({"train_acc":train_acc, "val_acc":val_acc, "val_loss":val_loss_mean})

Go to your WandB account and plot the train and val accuracy on the same chart.

## Hyper-parameter tuning

In [None]:
# Train a model from scratch for all these different learning rates
# and store the final validation accuracy in an array
rates = [10**8, 10**6, 10**4, 10**2, 1, 10**(-2), 10**(-4), 10**(-6)]

# *****START CODE




# *****END CODE

In [None]:
# Plot the validation error with respect to the learning rate

# *****START CODE


# *****END CODE