# Dimensionality Reduction of VGG16
In this tutorial we show how to build a reduced version of VGG16 using the techniques described in the article "A Dimensionality Reduction Approach for Convolutional Neural Networks", Meneghetti L., Demo N., Rozza G., https://arxiv.org/abs/2110.09163 (2021).

### Imports
We start by importing all the necessary libraries and functions.

In [1]:
import sys
import os
import torch
import numpy as np
import torchvision
from torch import nn

import torchvision.transforms as transforms
import torchvision.datasets as datasets
import pandas as pd
import torch.optim as optim

from smithers.ml.vgg import VGG
from smithers.ml.utils import load_checkpoint, save_checkpoint, get_seq_model, Total_param, Total_flops, compute_loss, train_kd
from smithers.ml.netadapter import NetAdapter


import warnings
warnings.filterwarnings("ignore")

### Setting the proper device
The following lines detect whether a GPU is available on the system running this tutorial. If so, all the objects used in this tutorial are allocated on the GPU, which speeds up the training process.

In [3]:
sys.path.insert(0, '../')
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
print(f"{device} has been detected as the device which the script will be run on.")

cuda has been detected as the device which the script will be run on.


### VGG16 initialization
Instantiation of the VGG16 network object, as defined in smithers.ml.vgg.

In [3]:
VGGnet = VGG(    cfg=None,
                 classifier='cifar',
                 batch_norm=False,
                 num_classes=10,
                 init_weights=False,
                 pretrain_weights=None).to(device)
VGGnet.make_layers()
VGGnet._initialize_weights()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(VGGnet.parameters(), lr=0.001, momentum=0.9)


Loaded base model.



## Loading of the dataset
### CIFAR10 Dataset
We use the CIFAR10 dataset (already implemented in PyTorch) to test our technique. It is a computer-vision dataset used for object recognition. It consists of 60000 32 × 32 colour images divided into 10 non-overlapping classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

See https://www.cs.toronto.edu/~kriz/cifar.html for more details on this dataset and on how to download it.

In [4]:
batch_size = 8  # this can be changed
data_path = '../datasets/'
# transform functions: take a PIL image as input and apply these
# transformations
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
train_dataset = datasets.CIFAR10(root=data_path + 'CIFAR10/',
                                 train=True,
                                 download=True,
                                 transform=transform_train)
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_dataset = datasets.CIFAR10(root=data_path + 'CIFAR10/',
                                train=False,
                                download=True,
                                transform=transform_test)
test_loader = torch.utils.data.DataLoader(test_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)
train_labels = torch.tensor(train_loader.dataset.targets)
targets = list(train_labels)

Files already downloaded and verified
Files already downloaded and verified
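
To make the normalization step concrete: `transforms.Normalize(mean, std)` applies `(x - mean) / std` to each channel of the tensor that `ToTensor` has already scaled to [0, 1]. The following is a minimal NumPy sketch of that arithmetic. The helper name `normalize_image` and the dummy image are assumptions made for illustration; they are not part of torchvision or smithers.

```python
import numpy as np

# Per-channel statistics used in the transforms of this tutorial.
CIFAR10_MEAN = np.array([0.4914, 0.4822, 0.4465])
CIFAR10_STD = np.array([0.2023, 0.1994, 0.2010])


def normalize_image(image, mean=CIFAR10_MEAN, std=CIFAR10_STD):
    """Apply (x - mean) / std per channel to a (C, H, W) array in [0, 1].

    Hypothetical helper mirroring the arithmetic of transforms.Normalize.
    """
    image = np.asarray(image, dtype=np.float64)
    # Reshape the statistics to (C, 1, 1) so they broadcast over H and W.
    return (image - mean[:, None, None]) / std[:, None, None]


# A dummy 3 x 32 x 32 "image" where every pixel equals its channel mean.
dummy = np.broadcast_to(CIFAR10_MEAN[:, None, None], (3, 32, 32))
normalized = normalize_image(dummy)
print(normalized.shape)          # (3, 32, 32)
print(np.abs(normalized).max())  # 0.0: mean-valued pixels map to zero
```

After this step a pixel equal to the channel mean maps to zero, and the channel values have roughly unit spread over the training set.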


## Loading of the model
First of all we need to load the model we want to use (in this case VGG16) starting from a checkpoint file, i.e. a file containing the state of the model after a training process with a chosen dataset. Here we will use the CIFAR10 dataset, but we will also show how to generalize everything to a custom dataset.

It is important to highlight that the VGG models available through PyTorch (https://pytorch.org/hub/pytorch_vision_vgg/), e.g.
```
model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg16', pretrained=True)
```

are pre-trained on the ImageNet dataset, using input images of size 224x224. Therefore, in order to use datasets like CIFAR10, composed of 32x32 images, we need to change the architecture of the VGG networks, as was done in the file 'vgg.py'.

In [5]:
sys.path.insert(0, '../')
pretrained = '/u/s/szanin/Smithers/smithers/ml/tutorials/check_vgg_cifar10_60_v2.pth.tar'
# pretrained = insert here the proper path for your device
model = VGGnet.to(device)
load_checkpoint(model, pretrained)
seq_model = get_seq_model(model).to(device)

### Training phase
For further details on this phase, see the related tutorial 'training_VGG16.ipynb'.
(Skip this cell if a trained VGGnet has been loaded in the previous cell.)

In [6]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(VGGnet.parameters(), lr=0.001, momentum=0.9)


for epoch in range(10):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = VGGnet(inputs)
        outputs = outputs[1]
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

[1,  2000] loss: 2.107
[1,  4000] loss: 1.783
[1,  6000] loss: 1.605


### Custom dataset
If we want to use a custom dataset, we first need to construct it, following for example the tutorial on the construction of a custom dataset for the problem of Image Recognition. The cell that loads CIFAR10 is then replaced by the following one.

In [None]:
from torch.utils.data.sampler import SubsetRandomSampler
from collections import OrderedDict
from smithers.ml.imagerec_dataset import Imagerec_Dataset

# load custom dataset for training and testing
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

data = pd.read_csv('../dataset_imagerec/dataframe.csv')
data_path = '../dataset_imagerec/'
# SPLIT OF THE DATASET
batch_size = 128
validation_split = .2
shuffle_dataset = True
random_seed = 42

dataset_size = len(data)
indices = list(range(dataset_size))
split = int(np.floor(validation_split * dataset_size))
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, val_indices = indices[split:], indices[:split]
print('train data', len(train_indices))
train_sampler = SubsetRandomSampler(train_indices)
valid_sampler = SubsetRandomSampler(val_indices)
resize_dim = [32, 32]

dataset_imagerec = Imagerec_Dataset(data, data_path, resize_dim, transform)
train_dataset = dataset_imagerec.getdata(train_indices)
train_loader = torch.utils.data.DataLoader(dataset_imagerec,
                                           batch_size=batch_size,
                                           sampler=train_sampler)
test_loader = torch.utils.data.DataLoader(dataset_imagerec,
                                          batch_size=batch_size,
                                          sampler=valid_sampler)

classes = ('Cups', 'Dishes', 'Glass', 'Mixed')
#classes = ('class_1', 'class_2', 'class_3', 'class_4')
n_class = len(classes)
targets = list(dataset_imagerec.targets[train_indices])
train_labels = torch.tensor(targets)

## Reduction of VGG16
We now perform the reduction of VGG16 using the module NetAdapter. In the cells below we use 7 as cut-off index and, for POD and RandSVD, 50 as dimension of the reduced space. There are multiple choices for the reduction method ('POD', 'AS', 'RandSVD' or 'HOSVD') and for the input-output mapping ('PCE' or 'FNN').

Let's start by computing the current accuracy of the network.

In [6]:
total = 0
correct = 0
count = 0
seq_model.eval()
for test, y_test in iter(test_loader):
    # Compute the network output and the predicted class for each test batch
    with torch.no_grad():
        output = seq_model(test.to(device)).to(device)
        ps = torch.exp(output)
        _, predicted = torch.max(output.data,1)
        total += y_test.size(0)
        correct += (predicted == y_test.to(device)).sum().item()
        count += 1
        if count % 250 == 0:
            print("Accuracy of network on test images is {:.4f}....count: {}".format(100*correct/total,  count ))

Accuracy of network on test images is 89.0500....count: 250
Accuracy of network on test images is 89.0250....count: 500
Accuracy of network on test images is 88.8167....count: 750
Accuracy of network on test images is 88.7750....count: 1000
Accuracy of network on test images is 88.8200....count: 1250


The following cells have been designed to illustrate the different approaches that one can follow in the reduction process.

## POD + FNN
For more details, please refer to fnn.py and the torch.svd function. General information can be found in netadapter.py.

In [8]:
cutoff_idx = 7 
red_dim = 50 
red_method = 'POD' 
inout_method = 'FNN'
n_class = 10
netadapter = NetAdapter(cutoff_idx, red_dim, red_method, inout_method)
red_model = netadapter.reduce_net(seq_model, train_dataset, train_labels, train_loader, n_class)
print(red_model)

Initializing reduction. Chosen reduction method is: POD
Initializing dataset forwarding
Dataset forwarding complete
FNN training initialized
FNN training completed
RedNet(
  (premodel): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14)
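
To make the POD step more concrete, here is a minimal NumPy sketch, assuming the feature maps at the cut have been flattened into the columns of a snapshot matrix. The helper `pod_basis` and the synthetic data are illustrative assumptions; this is not the code used inside netadapter.py.

```python
import numpy as np


def pod_basis(snapshots, rank):
    """Return the first `rank` left singular vectors of a snapshot matrix.

    snapshots: array of shape (n_features, n_samples), one flattened
    feature map per column. Hypothetical helper, not the smithers API.
    """
    u, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return u[:, :rank], s


rng = np.random.default_rng(0)
# Synthetic "feature maps": 200 samples that live in a 5-dimensional
# subspace of a 64-dimensional space.
latent = rng.standard_normal((5, 200))
mixing = rng.standard_normal((64, 5))
snapshots = mixing @ latent

basis, singular_values = pod_basis(snapshots, rank=5)
reduced = basis.T @ snapshots      # shape (5, 200): reduced coordinates
reconstructed = basis @ reduced    # back to shape (64, 200)

rel_error = np.linalg.norm(snapshots - reconstructed) / np.linalg.norm(snapshots)
print(reduced.shape, rel_error)
```

Because the synthetic data has exact rank 5, five POD modes reconstruct it up to round-off. With real feature maps the singular values decay gradually, and the reduced dimension trades accuracy against size.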

## RandSVD + FNN
For more details, please refer to fnn.py and the torch.svd function. General information can be found in netadapter.py.

In [6]:
cutoff_idx = 7 
red_dim = 50 
red_method = 'RandSVD' 
inout_method = 'FNN'
n_class = 10
netadapter = NetAdapter(cutoff_idx, red_dim, red_method, inout_method)
red_model = netadapter.reduce_net(seq_model, train_dataset, train_labels, train_loader, n_class)
print(red_model)

Initializing reduction. Chosen reduction method is: RandSVD
Initializing dataset forwarding
Dataset forwarding complete
FNN training initialized
FNN training completed
RedNet(
  (premodel): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
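
As a rough illustration of what a randomized SVD does, the sketch below follows the basic random range-finder scheme: sample the range of the matrix with a Gaussian test matrix, orthonormalize, and run an exact SVD on the small projected matrix. The function `randomized_svd`, its `oversampling` parameter and the test matrix are assumptions for illustration, written in NumPy; the smithers implementation may differ.

```python
import numpy as np


def randomized_svd(matrix, rank, oversampling=10, seed=0):
    """Approximate the leading `rank` singular triplets of `matrix`.

    Hypothetical helper following the basic random range-finder scheme;
    it is not the smithers implementation.
    """
    rng = np.random.default_rng(seed)
    n_cols = matrix.shape[1]
    k = min(rank + oversampling, min(matrix.shape))
    # 1. Sample the range of the matrix with a random Gaussian test matrix.
    sketch = matrix @ rng.standard_normal((n_cols, k))
    # 2. Orthonormalize the sample to obtain a basis Q containing the range.
    q, _ = np.linalg.qr(sketch)
    # 3. Project onto that basis and run an exact SVD on the small matrix.
    u_small, s, vt = np.linalg.svd(q.T @ matrix, full_matrices=False)
    return (q @ u_small)[:, :rank], s[:rank], vt[:rank]


rng = np.random.default_rng(1)
# Exactly rank-5 test matrix of size 100 x 80.
low_rank = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))

u, s_approx, vt = randomized_svd(low_rank, rank=5)
s_exact = np.linalg.svd(low_rank, compute_uv=False)[:5]
rec_error = np.linalg.norm(low_rank - (u * s_approx) @ vt) / np.linalg.norm(low_rank)
print(np.allclose(s_approx, s_exact), rec_error)
```

The expensive decomposition is applied only to a matrix with `rank + oversampling` rows, which is where the savings over a full SVD come from when the snapshot matrix is large.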
    

## AHOSVD + FNN
For more details, please refer to fnn.py, AHOSVD.py, hosvd.py and tensor_product_layer.py. General information can be found in netadapter.py.

In [7]:
cutoff_idx = 7
mode_list_batch=[25, 35, 3, 3]
red_method= 'HOSVD'
red_dim = 3 * 3 * 35    # red_dim, in the case of HOSVD, must be the product of all but the first entry of mode_list_batch
inout_method = 'FNN'
n_class = 10  

netadapter = NetAdapter(cutoff_idx, red_dim, red_method, inout_method)
red_model = netadapter.reduce_net(seq_model, train_dataset, train_labels, train_loader, n_class, device = device, mode_list_batch = mode_list_batch).to(device) 
print(red_model, flush=True)

Initializing reduction. Chosen reduction method is: HOSVD
Initializing dataset forwarding
Dataset forwarding complete
FNN training initialized
FNN training completed
RedNet(
  (premodel): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (1
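
The relation between `mode_list_batch` and `red_dim` can be illustrated with a truncated HOSVD on a stand-in batch of feature maps. The sketch below is a NumPy approximation under stated assumptions: the helpers `unfold`, `hosvd_factors` and `project` are hypothetical, and the feature-map shape 256 x 4 x 4 is chosen to match the projection sizes (35 x 256, 3 x 4, 3 x 4) printed in the storage summary at the end of this tutorial. It is not the averaged scheme implemented in AHOSVD.py.

```python
import math
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd_factors(tensor, ranks):
    """Leading left singular vectors of each unfolding (truncated HOSVD)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    return factors


def project(batch, factors):
    """Project a (N, C, H, W) batch with channel, height and width factors."""
    u_c, u_h, u_w = factors
    return np.einsum('nchw,ca,hb,wd->nabd', batch, u_c, u_h, u_w)


mode_list_batch = [25, 35, 3, 3]
red_dim = math.prod(mode_list_batch[1:])   # 35 * 3 * 3 = 315

rng = np.random.default_rng(0)
# Stand-in for a batch of feature maps at the cut: 40 samples, 256 x 4 x 4.
features = rng.standard_normal((40, 256, 4, 4))
factors = hosvd_factors(features, mode_list_batch)
reduced = project(features, factors[1:])   # the batch factor is not applied
flat = reduced.reshape(len(features), -1)
print(red_dim, flat.shape)                 # 315 (40, 315)
```

Each sample is compressed from 256 x 4 x 4 = 4096 values to 35 x 3 x 3 = 315, which is why `red_dim` must equal the product of the non-batch entries of `mode_list_batch`.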

### Training of the reduced network
Now that the reduced network has been defined, we can train it. The technique used is known as "knowledge distillation". It involves two models: the model being trained, called the student model, and an already trained model, the teacher model, from which the student learns. In our case, the full, trained VGG16 network acts as the teacher model.

At the end of the training the model can be saved by uncommenting the final torch.save line.

In [8]:
optimizer = torch.optim.Adam([{
            'params': red_model.premodel.parameters(),
            'lr': 1e-4
            }, {
            'params': red_model.proj_model.parameters(),
            'lr': 1e-5
            }, {
            'params': red_model.inout_map.parameters(),
            'lr': 1e-5
            }])

train_loss = []
test_loss = []
train_loss.append(compute_loss(red_model, device, train_loader))
test_loss.append(compute_loss(red_model, device, test_loader))

        
epochs = 5
filename = './cifar10_VGG16_RedNet'+red_method+'_cutIDx_%d.pth'%(cutoff_idx)
for epoch in range(1, epochs + 1):                     
    print('EPOCH {}'.format(epoch), flush=True)
    train_loss.append(
            train_kd(red_model,
            model,
            device,
            train_loader,
            optimizer,
            train_max_batch=200,
            alpha=0.1,
            temperature=1.,
            epoch=epoch))
    test_loss.append(compute_loss(red_model, device, test_loader))
#torch.save([red_model.state_dict(), train_loss, test_loss], filename)

Test Loss -6.695368256926536e-06
 Top 1:  Accuracy: 5970.0/50000 (11.94%)
Test Loss: -0.3347684128463268
Test Loss -3.5530161729455e-05
 Top 1:  Accuracy: 1137.0/10000 (11.37%)
Test Loss: -0.35530161729454995
EPOCH 1
Train Loss kd: 7.700555324554444e-06
Test Loss -0.001128806265487671
 Top 1:  Accuracy: 8473.0/10000 (84.73%)
Test Loss: -11.28806265487671
EPOCH 2
Train Loss kd: 5.962002873420716e-06
Test Loss -0.001205775994644165
 Top 1:  Accuracy: 8714.0/10000 (87.14%)
Test Loss: -12.057759946441651
EPOCH 3
Train Loss kd: 1.0420675575733186e-06
Test Loss -0.0014075436742401123
 Top 1:  Accuracy: 8774.0/10000 (87.74%)
Test Loss: -14.075436742401124
EPOCH 4
Train Loss kd: 2.5630873441696166e-06
Test Loss -0.0014386886726379395
 Top 1:  Accuracy: 8793.0/10000 (87.93%)
Test Loss: -14.386886726379394
EPOCH 5
Train Loss kd: 7.660793513059617e-07
Test Loss -0.0015669711363220215
 Top 1:  Accuracy: 8833.0/10000 (88.33%)
Test Loss: -15.669711363220214
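
The roles of `alpha` and `temperature` in a distillation objective can be sketched as follows. This is one common formulation, written in NumPy for self-containment: a weighted sum of the cross-entropy on the true labels and the KL divergence between temperature-softened teacher and student outputs. The function `kd_loss` and its weighting convention are assumptions; the loss actually computed by `train_kd` may be defined differently.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Numerically stable row-wise log-softmax of logits / temperature."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    return scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))


def kd_loss(student_logits, teacher_logits, labels, alpha=0.1, temperature=1.0):
    """Weighted sum of hard-label cross-entropy and teacher-student KL.

    Assumed convention: alpha weights the hard-label term and
    (1 - alpha) the distillation term; other codes swap the two.
    """
    n = len(labels)
    # Hard-label term: cross-entropy against the true classes.
    hard = -log_softmax(student_logits)[np.arange(n), labels].mean()
    # Soft term: KL(teacher || student) on temperature-softened outputs.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = (np.exp(log_p_teacher) * (log_p_teacher - log_p_student)).sum(axis=1).mean()
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * kl


rng = np.random.default_rng(0)
teacher = rng.standard_normal((8, 10))   # stand-in teacher logits
student = rng.standard_normal((8, 10))   # stand-in student logits
labels = rng.integers(0, 10, size=8)

print(kd_loss(student, teacher, labels))
# With identical logits and alpha=0 only the distillation term remains,
# and it vanishes because the two distributions coincide.
print(kd_loss(teacher, teacher, labels, alpha=0.0))
```

Under this convention a small `alpha`, such as the 0.1 used above, makes the student follow the teacher's output distribution more than the hard labels.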


### Loading a reduced network checkpoint
If a reduced network has already been trained and saved to disk, it can be loaded with the following instructions.

In [15]:
if os.path.isfile(filename):
    [rednet_pretrained, train_loss,test_loss] = torch.load(filename)
    red_model.load_state_dict(rednet_pretrained)
    print('RedNet checkpoint (trained for {} epochs) was correctly loaded.'.format(epochs))

RedNet checkpoint (trained for 5 epochs) was correctly loaded.


### Accuracy testing
We can further test the accuracy of the network with the following code.

In [9]:
total = 0
correct = 0
count = 0
for test, y_test in iter(test_loader):
    with torch.no_grad():
        output = red_model(test.to(device))
        ps = torch.exp(output)
        _, predicted = torch.max(output.data,1)
        total += y_test.size(0)
        correct += (predicted == y_test.to(device)).to(device).sum().item()
        count += 1
        if count%250 == 0:
            print("Accuracy of network on test images is {:.4f}....count: {}".format(100*correct/total,  count ))

Accuracy of network on test images is 87.1000....count: 250
Accuracy of network on test images is 87.7750....count: 500
Accuracy of network on test images is 88.0833....count: 750
Accuracy of network on test images is 88.1625....count: 1000
Accuracy of network on test images is 88.3300....count: 1250


### Storage and flops needed for the model
The following lines of code report the storage needed to save the reduced model, together with the number of floating point operations (flops) needed to evaluate it.

We start by computing the storage (reported as nnz, the number of non-zero entries) and the flops of the three components of the reduced network. Regarding the storage, this method only applies to the POD+FNN and RandSVD+FNN techniques.

In [10]:
rednet_storage = torch.zeros(3)
rednet_flops = torch.zeros(3)

rednet_storage[0], rednet_storage[1], rednet_storage[2] = [
    Total_param(red_model.premodel),
    Total_param(red_model.proj_model),
    Total_param(red_model.inout_map)]

rednet_flops[0], rednet_flops[1], rednet_flops[2] = [
    Total_flops(red_model.premodel, device),
    Total_flops(red_model.proj_model, device),
    Total_flops(red_model.inout_map, device)]


print('Pre nnz = {:.2f}, proj_model nnz={:.2f}, FNN nnz={:.4f}'.format(
      rednet_storage[0], rednet_storage[1],
      rednet_storage[2]))
print('flops:  Pre = {:.2f}, proj_model = {:.2f}, FNN ={:.2f}'.format(
       rednet_flops[0], rednet_flops[1], rednet_flops[2]))


Pre nnz = 6.62, proj_model nnz=0.03, FNN nnz=0.0249
flops:  Pre = 190.51, proj_model = 0.00, FNN =0.01


Now we can define another method that counts the storage needed for saving the reduced model (in MB).

In [11]:
print('Computing the storage needed by the RedNet model.\nComponents summary:')
storage = 0
for param_tensor in red_model.state_dict():
    print(param_tensor, "\t", red_model.state_dict()[param_tensor].size())
    storage += torch.prod(torch.tensor(list(red_model.state_dict()[param_tensor].size())))
print(f"\n\nThe MB used are: {4 * storage / 10 ** 6}")

Computing the storage needed by the RedNet model.
Components summary:
premodel.0.weight 	 torch.Size([64, 3, 3, 3])
premodel.0.bias 	 torch.Size([64])
premodel.2.weight 	 torch.Size([64, 64, 3, 3])
premodel.2.bias 	 torch.Size([64])
premodel.5.weight 	 torch.Size([128, 64, 3, 3])
premodel.5.bias 	 torch.Size([128])
premodel.7.weight 	 torch.Size([128, 128, 3, 3])
premodel.7.bias 	 torch.Size([128])
premodel.10.weight 	 torch.Size([256, 128, 3, 3])
premodel.10.bias 	 torch.Size([256])
premodel.12.weight 	 torch.Size([256, 256, 3, 3])
premodel.12.bias 	 torch.Size([256])
premodel.14.weight 	 torch.Size([256, 256, 3, 3])
premodel.14.bias 	 torch.Size([256])
proj_model.param0 	 torch.Size([35, 256])
proj_model.param1 	 torch.Size([3, 4])
proj_model.param2 	 torch.Size([3, 4])
inout_map.model.0.weight 	 torch.Size([20, 315])
inout_map.model.0.bias 	 torch.Size([20])
inout_map.model.2.weight 	 torch.Size([10, 20])
inout_map.model.2.bias 	 torch.Size([10])


The used MB are: 7.004007816314697
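
As a cross-check of the figure above, the storage can be recomputed by hand from the listed tensor shapes, assuming every entry is stored as a 4-byte float32. The dictionary below is copied from the component summary; the variable names are illustrative.

```python
import math

# Tensor shapes copied from the component summary above.
shapes = {
    "premodel.0.weight": (64, 3, 3, 3),
    "premodel.0.bias": (64,),
    "premodel.2.weight": (64, 64, 3, 3),
    "premodel.2.bias": (64,),
    "premodel.5.weight": (128, 64, 3, 3),
    "premodel.5.bias": (128,),
    "premodel.7.weight": (128, 128, 3, 3),
    "premodel.7.bias": (128,),
    "premodel.10.weight": (256, 128, 3, 3),
    "premodel.10.bias": (256,),
    "premodel.12.weight": (256, 256, 3, 3),
    "premodel.12.bias": (256,),
    "premodel.14.weight": (256, 256, 3, 3),
    "premodel.14.bias": (256,),
    "proj_model.param0": (35, 256),
    "proj_model.param1": (3, 4),
    "proj_model.param2": (3, 4),
    "inout_map.model.0.weight": (20, 315),
    "inout_map.model.0.bias": (20,),
    "inout_map.model.2.weight": (10, 20),
    "inout_map.model.2.bias": (10,),
}

BYTES_PER_FLOAT32 = 4  # assumption: all tensors are stored in single precision

# Number of stored values = sum over tensors of the product of their sizes.
n_params = sum(math.prod(shape) for shape in shapes.values())
storage_mb = BYTES_PER_FLOAT32 * n_params / 10 ** 6
print(n_params, round(storage_mb, 6))   # 1751002 7.004008
```

The result agrees with the value printed by the notebook up to single-precision rounding, and it shows that the convolutional `premodel` accounts for almost all of the storage.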