# Dimensionality Reduction of VGG16
In this tutorial we show how to create a reduced version of VGG16 using the techniques described in the article

''A Dimensionality Reduction Approach for Convolutional Neural Networks'', Meneghetti L., Demo N., Rozza G., https://arxiv.org/abs/2110.09163 (2021).

and in the paper ''Deep neural network compression via tensor decomposition'' by S. Zanin, L. Meneghetti, N. Demo and G. Rozza, which is currently in preparation.

### Imports
We start by importing all the necessary libraries and functions.

In [1]:
import sys
sys.path.insert(0, '/scratch/lmeneghe/Smithers/')
import os
import torch
import numpy as np
import torchvision
from torch import nn

import torchvision.transforms as transforms
import torchvision.datasets as datasets
import pandas as pd
import torch.optim as optim

from smithers.ml.models.vgg import VGG
from smithers.ml.models.utils_rednet import get_seq_model, Total_param, Total_flops, compute_loss, train_kd
from smithers.ml.models.netadapter import NetAdapter


import warnings

### Setting the proper device
The following lines detect whether a GPU is available on the system running this tutorial. If so, all the objects of this tutorial will be allocated on the GPU, thus speeding up the training process.

In [2]:
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
print(f"{device} has been detected as the device which the script will be run on.")

cuda has been detected as the device which the script will be run on.


## Loading of the dataset
### CIFAR10 Dataset
We use the CIFAR10 dataset (already implemented in PyTorch) to test our technique. It is a computer-vision dataset used for object recognition. It consists of 60000 32 × 32 colour images divided into 10 non-overlapping classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

See https://www.cs.toronto.edu/~kriz/cifar.html for more details on this dataset and on how to download it.

In [3]:
batch_size = 8 
data_path = '../cifar/' 
# transform functions: take a PIL image as input and apply these
# transformations
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
train_dataset = datasets.CIFAR10(root=data_path + 'CIFAR10/',
                                 train=True,
                                 download=True,
                                 transform=transform_train)
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_dataset = datasets.CIFAR10(root=data_path + 'CIFAR10/',
                                train=False,
                                download=True,
                                transform=transform_test)
test_loader = torch.utils.data.DataLoader(test_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)
train_labels = torch.tensor(train_loader.dataset.targets)
targets = list(train_labels)

Files already downloaded and verified
Files already downloaded and verified


### Custom dataset
If we want to use a custom dataset, we first need to construct it, for example by following the tutorial on the construction of a custom dataset for the problem of Image Recognition (***customdata_imagerec***).

## Loading of the model
First of all we need to load the model we want to use (in this case VGG16) from a checkpoint file, i.e. a file containing the state of the model after training on a chosen dataset. Here we use the CIFAR10 dataset, but everything can also be generalized to a custom dataset or another benchmark dataset.

It is important to highlight that the models of VGG-nets implemented in PyTorch (https://pytorch.org/hub/pytorch_vision_vgg/), e.g. 
```
model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg16', pretrained=True)
```

are models pre-trained on the ImageNet dataset, which these models process as images of size 224x224. Therefore, in order to use datasets such as CIFAR10, which is composed of 32x32 images, we need to change the architecture of the VGG nets, as was done in the file ***smithers/ml/vgg.py***.
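
To see why the input resolution matters, the following sketch tracks the side length of a feature map through the five 2x2, stride-2 max-pooling layers of a standard VGG16. The helper name `spatial_size_after_pools` is hypothetical, and the layout is assumed to follow the usual VGG16 configuration; it is not read from the smithers implementation.

```python
import math

def spatial_size_after_pools(input_size, n_pools=5, ceil_mode=True):
    """Side length of a square feature map after n_pools 2x2, stride-2 max-pools."""
    size = input_size
    for _ in range(n_pools):
        # 3x3 convolutions with padding 1 keep the size; each pooling halves it
        size = math.ceil(size / 2) if ceil_mode else size // 2
    return size

n_channels = 512  # channels produced by the last VGG16 convolutional block
for side in (224, 32):
    out = spatial_size_after_pools(side)
    print(f"{side}x{side} input -> {out}x{out} map, "
          f"{n_channels * out * out} flattened features")
```

Under these assumptions the convolutional part yields 25088 flattened features for a 224x224 image but only 512 for a 32x32 one, which is one reason, together with the change from 1000 to 10 classes, why the fully connected layers have to be adapted.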

In order to obtain a checkpoint file, the tutorial ***training_VGG16*** can be followed.

In [4]:
# pretrained = insert here the proper path for your device
pretrained = 'check_vgg.pth'
VGGnet = torch.load(pretrained)
seq_model = get_seq_model(VGGnet).to(device)

## Reduction of VGG16
We now perform the reduction of VGG16 using the module ***NetAdapter***. There are multiple choices for both the reduction method and the input-output mapping: 'POD', 'AS', 'RandSVD' or 'HOSVD' for the former, and 'PCE' or 'FNN' for the latter. We now provide some examples of possible combinations of these techniques. The figure below summarizes the proposed reduction method, as described in the article ''A Dimensionality Reduction Approach for Convolutional Neural Networks'', Meneghetti L., Demo N., Rozza G., https://arxiv.org/abs/2110.09163 (2021).

<img src = "images/red_cnn.png" style = "height:400px">

NOTE: To use the Active Subspace as reduction method, the Python package ATHENA should be downloaded from https://github.com/mathLab/ATHENA.


Let's start by computing the current accuracy of the network.

In [5]:
total = 0
correct = 0
seq_model.eval()
for test, y_test in iter(test_loader):
    # Forward pass without tracking gradients; the predicted class is the
    # index of the largest output score
    with torch.no_grad():
        output = seq_model(test.to(device))
        _, predicted = torch.max(output.data, 1)
        total += y_test.size(0)
        correct += (predicted == y_test.to(device)).sum().item()
    
print("Accuracy of the full network on test images is {:.4f}".format(100*correct/total))

Accuracy of the full network on test images is 87.5100
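
Since the same accuracy loop is needed again for the reduced network, it can be convenient to wrap it in a function. The sketch below is one possible way to do so; the name `evaluate_accuracy` and the toy loader are hypothetical and are not part of smithers.

```python
import torch
from torch import nn

def evaluate_accuracy(model, loader, device):
    """Return the top-1 accuracy (in percent) of `model` on the batches in `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            outputs = model(inputs.to(device))
            # The predicted class is the index of the largest score
            predicted = outputs.argmax(dim=1)
            correct += (predicted == labels.to(device)).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

# Toy check: an identity "model" whose inputs already are the class scores
toy_loader = [(torch.tensor([[2.0, 0.1], [0.3, 1.5]]), torch.tensor([0, 0]))]
toy_accuracy = evaluate_accuracy(nn.Identity(), toy_loader, torch.device("cpu"))
print(toy_accuracy)
```

With the objects of this tutorial, the call would read `evaluate_accuracy(seq_model, test_loader, device)`.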


## POD + FNN
The first method we describe uses POD as reduction technique and a Feedforward Neural Network (FNN) as input-output mapping.

In [None]:
cutoff_idx = 7 
red_dim = 50 
red_method = 'POD' 
inout_method = 'FNN'
n_class = 10
netadapter = NetAdapter(cutoff_idx, red_dim, red_method, inout_method)
red_model = netadapter.reduce_net(seq_model, train_dataset, train_labels, train_loader, n_class)
print(red_model)
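
As an illustration of what a POD reduction does, the following NumPy sketch computes the POD modes of a snapshot matrix through a truncated SVD and projects the snapshots onto them. It is a minimal sketch on synthetic data: the function `pod_basis`, the matrix sizes and the random data are assumptions, and the implementation inside `NetAdapter` may differ (for example in how the snapshots are assembled).

```python
import numpy as np

def pod_basis(snapshots, red_dim):
    """Return the first red_dim POD modes of a (n_features, n_samples) snapshot matrix."""
    # Left singular vectors, ordered by decreasing singular value
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :red_dim], s

rng = np.random.default_rng(0)
# Hypothetical data: 200 flattened feature maps of size 512 lying close to
# a 10-dimensional subspace
latent = rng.standard_normal((512, 10)) @ rng.standard_normal((10, 200))
snapshots = latent + 1e-3 * rng.standard_normal((512, 200))

modes, sing_vals = pod_basis(snapshots, red_dim=10)
reduced = modes.T @ snapshots      # (10, 200) reduced coordinates
reconstructed = modes @ reduced    # back to the full space
rel_err = np.linalg.norm(snapshots - reconstructed) / np.linalg.norm(snapshots)
print(reduced.shape, rel_err)
```

The relative reconstruction error indicates how much information is lost by keeping only `red_dim` modes; in the tutorial this role is played by the parameter `red_dim = 50`.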

## RandSVD + FNN
A small variant of the previous case can be obtained using Random SVD as reduction technique and a Feedforward Neural Network (FNN) as input-output mapping.

In [None]:
cutoff_idx = 5 
red_dim = 50 
red_method = 'RandSVD' 
inout_method = 'FNN'
n_class = 10
netadapter = NetAdapter(cutoff_idx, red_dim, red_method, inout_method)
red_model = netadapter.reduce_net(seq_model, train_dataset, train_labels, train_loader, n_class)
print(red_model)
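
The idea behind a randomized SVD can be sketched in a few lines of NumPy: the column space of the matrix is first sampled with a random test matrix, and the SVD is then computed on a much smaller projected matrix. This is the standard randomized range-finder scheme, shown here under our own assumptions; the function name, the oversampling value and the synthetic data are hypothetical and do not reproduce the smithers implementation.

```python
import numpy as np

def randomized_svd_basis(A, rank, n_oversamples=10, seed=0):
    """Approximate the leading `rank` left singular vectors and values of A."""
    rng = np.random.default_rng(seed)
    # Sample the column space of A with a Gaussian test matrix
    omega = rng.standard_normal((A.shape[1], rank + n_oversamples))
    Q, _ = np.linalg.qr(A @ omega)
    # SVD of the small projected matrix, then lift back to the original space
    U_small, s, _ = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank]

rng = np.random.default_rng(1)
# Hypothetical snapshot matrix of exact rank 8
A = rng.standard_normal((512, 8)) @ rng.standard_normal((8, 300))
U, s = randomized_svd_basis(A, rank=8)
s_exact = np.linalg.svd(A, compute_uv=False)[:8]
print(U.shape, np.allclose(s, s_exact))
```

The full SVD is only applied to a matrix with `rank + n_oversamples` rows, which is where the computational saving with respect to POD comes from.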

## AHOSVD + FNN
A different choice is to use HOSVD as the reduction technique, which takes into account the tensorial structure of the objects under consideration. In this case we use a variant of HOSVD, called Averaged HOSVD (AHOSVD), which performs HOSVD in batches and then averages the results to reduce the high computational cost. In particular, we couple AHOSVD with an FNN as before.

In [6]:
cutoff_idx = 7
red_method = 'HOSVD'
red_dim = [35, 3, 3]
inout_method = 'FNN'
n_class = 10  

netadapter = NetAdapter(cutoff_idx, red_dim, red_method, inout_method)
red_model = netadapter.reduce_net(seq_model, train_dataset, train_labels, train_loader, n_class, device = device).to(device) 
print(red_model, flush=True)

FNN training initialized
FNN training completed
RedNet(
  (premodel): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool
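
To give an idea of how a tensor-based reduction acts on the feature maps, the following NumPy sketch computes one projection matrix per mode with a plain HOSVD on a single batch and applies the resulting multilinear projection. The batch is random and hypothetical, the helper `mode_basis` is not part of smithers, and the averaging over batches performed by AHOSVD is not reproduced here.

```python
import numpy as np

def mode_basis(batch, mode, rank):
    """Leading `rank` left singular vectors of the unfolding of `batch` along axis `mode`."""
    # Move the chosen axis to the front and flatten all the others (batch axis included)
    unfolded = np.moveaxis(batch, mode, 0).reshape(batch.shape[mode], -1)
    U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return U[:, :rank]

rng = np.random.default_rng(0)
features = rng.standard_normal((16, 256, 4, 4))  # hypothetical batch of feature maps
red_dim = [35, 3, 3]

# One projection matrix per non-batch mode, stored with shape (reduced, full)
P = [mode_basis(features, mode + 1, r).T for mode, r in enumerate(red_dim)]
print([p.shape for p in P])

# Multilinear projection of every feature map onto the reduced core tensor
core = np.einsum('ac,bh,dw,nchw->nabd', P[0], P[1], P[2], features)
flat = core.reshape(core.shape[0], -1)
print(core.shape, flat.shape)
```

With `red_dim = [35, 3, 3]` each feature map is compressed to 35 * 3 * 3 = 315 values, which would then be the input size of the FNN.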

### Training of the reduced network
Now that the reduced network has been defined, we can train it. The technique used is "knowledge distillation", i.e. we use the knowledge contained in the original full model, also referred to as the teacher model, to train the reduced model, called the student model.

In [7]:
import copy

optimizer = torch.optim.Adam([{
            'params': red_model.premodel.parameters(),
            'lr': 1e-4
            }, {
            'params': red_model.proj_model.parameters(),
            'lr': 1e-5
            }, {
            'params': red_model.inout_map.parameters(),
            'lr': 1e-5
            }])

train_loss = []
test_loss = []
train_loss.append(compute_loss(red_model, device, train_loader))
test_loss.append(compute_loss(red_model, device, test_loader))

        
epochs = 2
filename = './cifar10_VGG16_RedNet'+red_method+'_cutIDx_%d.pth'%(cutoff_idx)
for epoch in range(1, epochs + 1):                     
    print('EPOCH {}'.format(epoch), flush=True)
    train_loss.append(
            train_kd(red_model,
            VGGnet,
            device,
            train_loader,
            optimizer,
            train_max_batch=200,
            alpha=0.1,
            temperature=1.,
            epoch=epoch))
    test_loss.append(compute_loss(red_model, device, test_loader))
torch.save(copy.deepcopy(red_model), filename)

Test Loss 0.4235516825962067
 Top 1:  Accuracy: 4614.0/50000 (9.23%)
Loss Value: 8.471033651924133e-06
Test Loss 0.5047410353899002
 Top 1:  Accuracy: 841.0/10000 (8.41%)
Loss Value: 5.047410353899002e-05
EPOCH 1




Train Loss kd: 1.5311928987503053e-05
Test Loss -0.029346163740754126
 Top 1:  Accuracy: 8097.0/10000 (80.97%)
Loss Value: -2.9346163740754128e-06
EPOCH 2
Train Loss kd: 3.4757274389266966e-06
Test Loss 0.4364694202244282
 Top 1:  Accuracy: 8363.0/10000 (83.63%)
Loss Value: 4.364694202244282e-05
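
For readers unfamiliar with knowledge distillation, the sketch below shows a common form of the distillation loss: a Kullback-Leibler term that pulls the student towards the tempered output distribution of the teacher, plus a cross-entropy term on the true labels. This is an assumption about the general technique, not a transcription of `train_kd`: the function `distillation_loss` is hypothetical, and the way `alpha` weights the two terms inside smithers is not shown in this tutorial and may be the opposite.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.1, temperature=1.0):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy term."""
    # Soft targets: match the tempered class distribution of the teacher
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction='batchmean',
    ) * temperature ** 2
    # Hard targets: usual cross-entropy with the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    # Assumption: alpha weights the soft term, (1 - alpha) the hard term
    return alpha * soft + (1.0 - alpha) * hard

torch.manual_seed(0)
teacher = torch.randn(8, 10)           # hypothetical teacher scores
student = torch.randn(8, 10)           # hypothetical student scores
labels = torch.randint(0, 10, (8,))

mixed = distillation_loss(student, teacher, labels)
soft_only = distillation_loss(teacher.clone(), teacher, labels, alpha=1.0)
print(mixed.item(), soft_only.item())
```

When the student reproduces the teacher exactly, the soft term vanishes, so only the agreement with the true labels remains to be optimized.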


### Loading a reduced network checkpoint
If a reduced network has already been defined and saved on the computer, it can be loaded with the following instructions:

In [10]:
checkpoint = filename #path to the checkpoint file
red_model = torch.load(checkpoint)

### Accuracy testing
We can further test the accuracy of the network with the following code.

In [12]:
total = 0
correct = 0
# Evaluation mode, as done for the full network
red_model.eval()
for test, y_test in iter(test_loader):
    with torch.no_grad():
        output = red_model(test.to(device))
        _, predicted = torch.max(output.data, 1)
        total += y_test.size(0)
        correct += (predicted == y_test.to(device)).sum().item()

print("Accuracy of the reduced network on test images is {:.4f}".format(100*correct/total))

Accuracy of the reduced network on test images is 83.6300


### Storage and flops needed for the model
The following lines of code provide the amount of storage needed to save the reduced model, together with the number of floating point operations (flops) needed to evaluate it.

We start by computing, for each of the three components of the reduced network, the number of non-zero entries (nnz) and the flops. Regarding the storage, this method only concerns the POD+FNN and RandSVD+FNN techniques.

In [13]:
rednet_storage = torch.zeros(3)
rednet_flops = torch.zeros(3)

rednet_storage[0], rednet_storage[1], rednet_storage[2] = [
    Total_param(red_model.premodel),
    Total_param(red_model.proj_model),
    Total_param(red_model.inout_map)]

rednet_flops[0], rednet_flops[1], rednet_flops[2] = [
    Total_flops(red_model.premodel, device),
    Total_flops(red_model.proj_model, device),
    Total_flops(red_model.inout_map, device)]


print('Pre nnz = {:.2f}, proj_model nnz={:.2f}, FNN nnz={:.4f}'.format(
      rednet_storage[0], rednet_storage[1],
      rednet_storage[2]))
print('flops:  Pre = {:.2f}, proj_model = {:.2f}, FNN ={:.2f}'.format(
       rednet_flops[0], rednet_flops[1], rednet_flops[2]))


Pre nnz = 6.62, proj_model nnz=0.03, FNN nnz=0.0249
flops:  Pre = 190.51, proj_model = 0.00, FNN =0.01


Now we can define another method that computes the storage (in MB) needed to save the reduced model, counting 4 bytes for each entry of the state dictionary.

In [14]:
print('Computing the storage needed by the RedNet model.\nComponents summary:')
storage = 0
for param_tensor in red_model.state_dict():
    print(param_tensor, "\t", red_model.state_dict()[param_tensor].size())
    storage += torch.prod(torch.tensor(list(red_model.state_dict()[param_tensor].size())))
print(f"\n\nThe MB used are: {4 * storage / 10 ** 6}")

Computing the storage needed by the RedNet model.
Components summary:
premodel.0.weight 	 torch.Size([64, 3, 3, 3])
premodel.0.bias 	 torch.Size([64])
premodel.2.weight 	 torch.Size([64, 64, 3, 3])
premodel.2.bias 	 torch.Size([64])
premodel.5.weight 	 torch.Size([128, 64, 3, 3])
premodel.5.bias 	 torch.Size([128])
premodel.7.weight 	 torch.Size([128, 128, 3, 3])
premodel.7.bias 	 torch.Size([128])
premodel.10.weight 	 torch.Size([256, 128, 3, 3])
premodel.10.bias 	 torch.Size([256])
premodel.12.weight 	 torch.Size([256, 256, 3, 3])
premodel.12.bias 	 torch.Size([256])
premodel.14.weight 	 torch.Size([256, 256, 3, 3])
premodel.14.bias 	 torch.Size([256])
proj_model.param0 	 torch.Size([35, 256])
proj_model.param1 	 torch.Size([3, 4])
proj_model.param2 	 torch.Size([3, 4])
inout_map.model.0.weight 	 torch.Size([20, 315])
inout_map.model.0.bias 	 torch.Size([20])
inout_map.model.2.weight 	 torch.Size([10, 20])
inout_map.model.2.bias 	 torch.Size([10])


The MB used are: 7.004007816314697
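
The figure above can be checked by hand from the shapes in the summary. The following sketch repeats the arithmetic in plain Python, assuming that every entry is stored as a 4-byte float32 value; the grouping into three components and the variable names are ours.

```python
from math import prod

# Shapes copied from the state_dict summary printed above
shapes = {
    "premodel": [(64, 3, 3, 3), (64,), (64, 64, 3, 3), (64,),
                 (128, 64, 3, 3), (128,), (128, 128, 3, 3), (128,),
                 (256, 128, 3, 3), (256,), (256, 256, 3, 3), (256,),
                 (256, 256, 3, 3), (256,)],
    "proj_model": [(35, 256), (3, 4), (3, 4)],
    "inout_map": [(20, 315), (20,), (10, 20), (10,)],
}

BYTES_PER_PARAM = 4  # assumption: every parameter is stored as float32

counts = {name: sum(prod(s) for s in group) for name, group in shapes.items()}
total = sum(counts.values())
print(counts, total)
print(f"Total: {BYTES_PER_PARAM * total / 10**6:.6f} MB")
for name, n in counts.items():
    print(f"{name}: {BYTES_PER_PARAM * n / 2**20:.4f} MiB")
```

The total of 1751002 entries corresponds to about 7.004 MB, in agreement with the value printed above. The per-component sizes in MiB (about 6.62, 0.03 and 0.0249) coincide with the values printed earlier by the `Total_param` cell, which suggests, without confirming it, that `Total_param` reports a float32 storage size in MiB.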