# Training VGG16 - Tutorial

In this tutorial we show how to train a VGG16 network and save a checkpoint of the trained network.

### Imports
First, we import all the necessary modules and functions.

In [1]:
import torch
import numpy as np
import torchvision
from torch import nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torch.optim as optim
import sys

from smithers.ml.vgg import VGG
from smithers.ml.utils import get_seq_model, save_checkpoint

import warnings
warnings.filterwarnings("ignore")

### Setting the proper device
The following lines detect whether a GPU is available on the system running this tutorial. If so, all the objects used in the tutorial are allocated on the GPU, which speeds up the training process.

In [2]:
sys.path.insert(0, '../')
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
print(f"{device} has been detected as the device which the script will be run on.")

cuda has been detected as the device which the script will be run on.


### Instantiation of the VGG16 network object, as defined in smithers.ml.vgg

In [3]:
VGGnet = VGG(    cfg=None,
                 classifier='cifar',
                 batch_norm=False,
                 num_classes=10,
                 init_weights=False,
                 pretrain_weights=None).to(device)
VGGnet.make_layers()
VGGnet._initialize_weights()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(VGGnet.parameters(), lr=0.001, momentum=0.9)
seq_model = get_seq_model(VGGnet).to(device)


Loaded base model.



### Loading of the CIFAR10 dataset
We use the CIFAR10 dataset (available through torchvision) to train and test the network. It is a computer-vision dataset used for object recognition, consisting of 60000 32 × 32 colour images divided into 10 non-overlapping classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

See https://www.cs.toronto.edu/~kriz/cifar.html for more details on this dataset and on how to download it.

In [4]:
batch_size = 8  # this can be changed
data_path = '../datasets/'
# transform functions: each takes a PIL image as input and applies
# the listed transformations in order
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
train_dataset = datasets.CIFAR10(root=data_path + 'CIFAR10/',
                                 train=True,
                                 download=True,
                                 transform=transform_train)
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_dataset = datasets.CIFAR10(root=data_path + 'CIFAR10/',
                                train=False,
                                download=True,
                                transform=transform_test)
test_loader = torch.utils.data.DataLoader(test_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)
train_labels = torch.tensor(train_loader.dataset.targets).to(device)
targets = list(train_labels)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ../datasets/CIFAR10/cifar-10-python.tar.gz

Extracting ../datasets/CIFAR10/cifar-10-python.tar.gz to ../datasets/CIFAR10/
Files already downloaded and verified
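
The `transforms.Normalize` step above subtracts a per-channel mean and divides by a per-channel standard deviation. As a minimal sketch of that arithmetic, the block below applies the same formula by hand with plain `torch`. The helper name `normalize` and the random stand-in image are assumptions made for illustration; they are not part of the tutorial or of torchvision.

```python
import torch

# Per-channel statistics used in the transforms above.
MEAN = torch.tensor([0.4914, 0.4822, 0.4465])
STD = torch.tensor([0.2023, 0.1994, 0.2010])


def normalize(image, mean=MEAN, std=STD):
    """Hypothetical helper: apply (x - mean) / std per channel to a C x H x W tensor."""
    # Reshape the statistics to C x 1 x 1 so they broadcast over height and width.
    return (image - mean[:, None, None]) / std[:, None, None]


# Stand-in for the output of ToTensor(): values in [0, 1], shape 3 x 32 x 32.
torch.manual_seed(0)
image = torch.rand(3, 32, 32)
normalized = normalize(image)

# An image whose pixels all equal the channel means is mapped to zeros.
mean_image = MEAN[:, None, None].expand(3, 32, 32)
zero_image = normalize(mean_image)
```

After this step each channel is centred around zero with roughly unit spread, which is the scale the network sees during both training and testing.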


### Preliminary accuracy test
We now compute the accuracy of the randomly initialized network on the test images, before any training. Since the test set has 10 balanced classes, an accuracy close to 10% corresponds to chance level.

In [5]:
total = 0
correct = 0
seq_model.eval()
# gradients are not needed for evaluation
with torch.no_grad():
    for test, y_test in test_loader:
        output = seq_model(test.to(device))
        # the predicted class is the one with the highest score
        _, predicted = torch.max(output, 1)
        total += y_test.size(0)
        correct += (predicted == y_test.to(device)).sum().item()
print('Accuracy of network on test images is {:.2f}%'.format(100*correct/total), flush=True)

Accuracy of network on test images is 13.89%
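
The same evaluation loop is run again after training, so it could be wrapped in a small helper. The sketch below is one possible way to do that. The function name `evaluate_accuracy` and the toy data are hypothetical, and the helper assumes the model returns a tensor of class scores, as `seq_model` does in the cell above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(model, loader, device):
    """Hypothetical helper: percentage of samples in `loader` classified correctly."""
    model.eval()
    correct = 0
    total = 0
    # Gradients are not needed when only measuring accuracy.
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            scores = model(inputs)
            predicted = scores.argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100.0 * correct / total


# Toy check: one-hot inputs passed through an identity "model" are their own scores,
# so the predicted class always equals the index of the one-hot entry.
device = torch.device('cpu')
labels = torch.tensor([0, 1, 2, 3, 0, 1])
inputs = torch.eye(4)[labels]
toy_loader = DataLoader(TensorDataset(inputs, labels), batch_size=2)
perfect = evaluate_accuracy(nn.Identity(), toy_loader, device)

# Shifting every label by one class makes every prediction wrong.
wrong_labels = (labels + 1) % 4
wrong_loader = DataLoader(TensorDataset(inputs, wrong_labels), batch_size=2)
worst = evaluate_accuracy(nn.Identity(), wrong_loader, device)
```

In this tutorial the call would look like `evaluate_accuracy(seq_model, test_loader, device)`, both before and after the training phase.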


### Training phase 
The following lines of code train the network on the training images of the CIFAR10 dataset. Beware that training can take a long time, so a GPU is recommended.

The number of epochs can be changed. With more epochs the network fits the training images better but takes longer to train; with fewer epochs training is faster but the fit is worse.

In [6]:
n_epochs = 10
for epoch in range(n_epochs):
    print(f"Epoch number {epoch+1} is now starting.", flush=True)
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)


        # zero the parameter gradients
        optimizer.zero_grad()

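        # forward + backward + optimize
        outputs = VGGnet(inputs)
        # the forward pass returns several outputs; index 1 holds the
        # class scores that are passed to the loss
        outputs = outputs[1]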
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    print(f"Epoch number {epoch+1} is now completed.", flush=True)
print(f"The network has been successfully trained for {n_epochs} epochs.")

Epoch number 1 is now starting.
Epoch number 1 is now completed.
Epoch number 2 is now starting.
Epoch number 2 is now completed.
Epoch number 3 is now starting.
Epoch number 3 is now completed.
Epoch number 4 is now starting.
Epoch number 4 is now completed.
Epoch number 5 is now starting.
Epoch number 5 is now completed.
Epoch number 6 is now starting.
Epoch number 6 is now completed.
Epoch number 7 is now starting.
Epoch number 7 is now completed.
Epoch number 8 is now starting.
Epoch number 8 is now completed.
Epoch number 9 is now starting.
Epoch number 9 is now completed.
Epoch number 10 is now starting.
Epoch number 10 is now completed.
The network has been successfully trained for 10 epochs.
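
The loop above initializes `running_loss` but never updates it, so the output only reports when each epoch starts and ends. As a minimal sketch of how the mean training loss per epoch could be tracked, the block below runs the same pattern on a toy problem. The linear model, the random data and the name `epoch_losses` are stand-ins chosen for illustration, not part of the tutorial.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-ins for VGGnet and train_loader: a linear classifier
# on random features whose label depends on the sign of the first feature.
features = torch.randn(64, 8)
targets = (features[:, 0] > 0).long()
loader = DataLoader(TensorDataset(features, targets), batch_size=8, shuffle=True)

model = nn.Linear(8, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

epoch_losses = []
for epoch in range(5):
    model.train()
    running_loss = 0.0
    n_samples = 0
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        # criterion returns the batch mean, so weight it by the batch size.
        running_loss += loss.item() * inputs.size(0)
        n_samples += inputs.size(0)
    epoch_losses.append(running_loss / n_samples)
    print(f"Epoch {epoch + 1}: mean training loss {epoch_losses[-1]:.4f}")
```

Printing this value at the end of each epoch gives a quick indication of whether more epochs are still improving the fit.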


### Accuracy after the training phase
We can check whether the accuracy on the test images has improved.

In [7]:
total = 0
correct = 0
seq_model.eval()
# gradients are not needed for evaluation
with torch.no_grad():
    for test, y_test in test_loader:
        output = seq_model(test.to(device))
        # the predicted class is the one with the highest score
        _, predicted = torch.max(output, 1)
        total += y_test.size(0)
        correct += (predicted == y_test.to(device)).sum().item()
print('Accuracy of network on test images is {:.2f}%'.format(100*correct/total), flush=True)

Accuracy of network on test images is 80.93%


### Creating a checkpoint of the state of the network
Once training is done, it is useful to save the state of the network for later use. To do so, we use the save_checkpoint function defined in smithers.ml.utils, which was imported at the start of this tutorial.

In [None]:
path = # write here the desired path for the checkpoint
save_checkpoint(n_epochs, VGGnet, path, optimizer)
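
The internals of `save_checkpoint` are not shown in this tutorial. As a rough illustration of what a checkpoint typically contains, the sketch below saves and restores the epoch counter, the model weights and the optimizer state using plain `torch.save` and `torch.load`. The helper names `save_state` and `load_state`, the dictionary keys and the toy model are hypothetical; they are not taken from smithers and may differ from what `save_checkpoint` actually stores.

```python
import os
import tempfile

import torch
from torch import nn, optim


def save_state(epoch, model, path, optimizer):
    """Hypothetical helper: store epoch, model weights and optimizer state in one file."""
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, path)


def load_state(model, path, optimizer):
    """Hypothetical helper: restore what save_state wrote and return the stored epoch."""
    checkpoint = torch.load(path, map_location='cpu')
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    return checkpoint['epoch']


# Toy stand-ins for VGGnet and its optimizer.
torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = os.path.join(tmp_dir, 'toy_checkpoint.pth')
    save_state(10, model, ckpt_path, optimizer)

    # A freshly initialized model has different weights until the state is loaded.
    restored = nn.Linear(4, 2)
    restored_optimizer = optim.SGD(restored.parameters(), lr=0.001, momentum=0.9)
    restored_epoch = load_state(restored, ckpt_path, restored_optimizer)

# Compare every tensor in the two state dictionaries.
weights_match = all(
    torch.equal(p, q)
    for p, q in zip(model.state_dict().values(), restored.state_dict().values())
)
```

Storing the optimizer state alongside the weights matters when training is resumed, because SGD with momentum keeps a buffer per parameter that would otherwise be reset.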