# Model evaluation and re-training with AdaPT on Cifar10 dataset

In this notebook you can evaluate different approximate multipliers on various models based on Cifar10 dataset

Steps:
* Select models to load 
* Select number of threads to use
* Choose approximate multiplier 
* Load model for evaluation
* Load dataset
* Run model calibration for quantization
* Run model evaluation
* Run approximate-aware re-training
* Rerun model evaluation

**Note**:
* This notebook should be run on a X86 machine

* Please make sure you have run the installation steps first

In [1]:
import os
import zipfile
import torch

import requests
from torch.utils.data import DataLoader
from torchvision import transforms as T
from torchvision.datasets import CIFAR10
from tqdm import tqdm
import torch.nn as nn
import torchvision.models as models

## Select models to load 

The weights must be downloaded in state_dicts folder.


In [2]:
from models.resnet import resnet18,resnet34,resnet50
from models.vggmp import vgg11_bn, vgg13_bn, vgg19_bn
from models.densenet import densenet121, densenet161, densenet169
#from models.squeezenet import squeezenet_cifar10
#from models.inception import inception_v3 # slow, propably bad cifar10 implementation of inception for PT

## Select number of threads to use

For optimal performance set them as the number of your cpu threads (not cpu cores)

In [3]:
threads = 8
torch.set_num_threads(threads)

#maybe better performance
%env OMP_PLACES=cores
%env OMP_PROC_BIND=close
%env OMP_WAIT_POLICY=active

env: OMP_PLACES=cores
env: OMP_PROC_BIND=close
env: OMP_WAIT_POLICY=active


## Choose approximate multiplier 

Two approximate multipliers are already provided

**mul8s_acc** - (header file: mul8s_acc.h)   <--  default

**mul8s_1L2H** - (header file: mul8s_1L2H.h)



In order to use your custom multiplier you need to use the provided tool (LUT_generator) to easily create the C++ header for your multiplier. Then you just place it inside the adapt/cpu-kernels/axx_mults folder. The name of the axx_mult here must match the name of the header file. The same axx_mult is used in all layers. 

Tip: If you want explicitly to set for each layer a different axx_mult you must do it from the model definition using the respective AdaPT_Conv2d class of each layer.

In [4]:
axx_mult = 'mul8s_1L2H'

## Load model for evaluation

Jit compilation method loads 'on the fly' the C++ extentions of the approximate multipliers. Then the pytorch model is loaded

In [5]:
#model = densenet121(pretrained=True, progress=True, device="cpu")
#model = resnet34(pretrained=True, progress=True, device="cpu", axx_mult_list=['mul8s_1KR3', 'mul8s_acc','mul8s_1KR3', 'mul8s_acc'])

#model.eval()


In [26]:
###################Only use this when using VGG grouped into 5 blocks###########################

# Define the approximate multipliers for each block
axx_mult_list = ['mul8s_1KR6','mul8s_1L2H','mul8s_1KR6', 'mul8s_acc','mul8s_1KR6']

# Load the VGG16 model with the specified multipliers
model = vgg13_bn(pretrained=True, progress=True, device="cpu", block_multipliers=axx_mult_list)

# Set the model to evaluation mode
model.eval()


Using /root/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module PyInit_conv2d_mul8s_1KR6, skipping build step...
Loading extension module PyInit_conv2d_mul8s_1KR6...
Using /root/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module PyInit_conv2d_mul8s_1KR6, skipping build step...
Loading extension module PyInit_conv2d_mul8s_1KR6...
Using /root/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module PyInit_conv2d_mul8s_1L2H, skipping build step...
Loading extension module PyInit_conv2d_mul8s_1L2H...
Using /root/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module PyInit_conv2d_mul8s_1L2H, skipping build step...
Loading extension module PyInit_conv2d_mul8s_1L2H...
Using /root/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-l

VGG(
  (features): Sequential(
    (0): AdaPT_Conv2d(
      3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
      (quantizer): TensorQuantizer(8bit per-tensor amax=dynamic calibrator=HistogramCalibrator quant)
      (quantizer_w): TensorQuantizer(8bit per-tensor amax=dynamic calibrator=HistogramCalibrator quant)
    )
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): AdaPT_Conv2d(
      64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
      (quantizer): TensorQuantizer(8bit per-tensor amax=dynamic calibrator=HistogramCalibrator quant)
      (quantizer_w): TensorQuantizer(8bit per-tensor amax=dynamic calibrator=HistogramCalibrator quant)
    )
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (7): AdaPT_Conv2d(
      64, 128, kernel_size=(3, 

In [None]:
# Example of creating a SqueezeNet model for CIFAR-10 with different approximate multipliers
#model = squeezenet_cifar10(pretrained=True, axx_mult_initial='mul8s_acc', axx_mult_fire='mul8s_acc', axx_mult_pool='mul8s_acc', axx_mult_final='mul8s_acc')

# Set the model to evaluation mode
#model.eval()

# If you want to run the model on a specific device
#device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#model.to(device)


In [None]:

#####################Use this when using vgg with multipliers for each layer#############################

# Define the approximate multipliers for each layer
#layer_multipliers = ['mul8s_1KV8', 'mul8s_1KV8', 'mul8s_1L2H', 'mul8s_1KR6', 'mul8s_acc', 
 #                    'mul8s_1KV8', 'mul8s_1KVB', 'mul8s_1KVA', 'mul8s_1L2D', 'mul8s_1KR6', 
  #                   'mul8s_1L1G', 'mul8s_1KR3']

# Load the VGG16 model with the specified multipliers for each layer
#model = vgg11_bn(pretrained=True, progress=True, device="cpu", layer_multipliers=layer_multipliers)

# Set the model to evaluation mode
#model.eval()


In [None]:
print(model)

## Load dataset


In [27]:
def val_dataloader(mean = (0.4914, 0.4822, 0.4465), std = (0.2471, 0.2435, 0.2616)):

    transform = T.Compose(
        [
            T.ToTensor(),
            T.Normalize(mean, std),
        ]
    )
    dataset = CIFAR10(root="datasets/cifar10_data", train=False, download=True, transform=transform)
    dataloader = DataLoader(
        dataset,
        batch_size=128,
        num_workers=0,
        drop_last=True,
        pin_memory=False,
    )
    return dataloader

transform = T.Compose(
        [
            T.RandomCrop(32, padding=4),
            T.RandomHorizontalFlip(),
            T.ToTensor(),
            T.Normalize(mean = (0.4914, 0.4822, 0.4465), std = (0.2471, 0.2435, 0.2616)),
        ]
    )
dataset = CIFAR10(root="datasets/cifar10_data", train=True, download=True, transform=transform)

evens = list(range(0, len(dataset), 10))
trainset_1 = torch.utils.data.Subset(dataset, evens)

data = val_dataloader()

# data_t is used for calibration purposes and is a subset of train-set
data_t = DataLoader(trainset_1, batch_size=128,
                                            shuffle=False, num_workers=0)

Files already downloaded and verified
Files already downloaded and verified


## Run model calibration for quantization

Calibrates the quantization parameters 

Need to re-run it each time the model changes

In [28]:
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import calib

def collect_stats(model, data_loader, num_batches):
     """Feed data to the network and collect statistic"""

     # Enable calibrators
     for name, module in model.named_modules():
         if isinstance(module, quant_nn.TensorQuantizer):
             if module._calibrator is not None:
                 module.disable_quant()
                 module.enable_calib()
             else:
                 module.disable()

     for i, (image, _) in tqdm(enumerate(data_loader), total=num_batches):
         model(image.cpu())
         if i >= num_batches:
             break

     # Disable calibrators
     for name, module in model.named_modules():
         if isinstance(module, quant_nn.TensorQuantizer):
             if module._calibrator is not None:
                 module.enable_quant()
                 module.disable_calib()
             else:
                 module.enable()

def compute_amax(model, **kwargs):
 # Load calib result
 for name, module in model.named_modules():
     if isinstance(module, quant_nn.TensorQuantizer):
         if module._calibrator is not None:
             if isinstance(module._calibrator, calib.MaxCalibrator):
                 module.load_calib_amax()
             else:
                 module.load_calib_amax(**kwargs)
         print(F"{name:40}: {module}")
 model.cpu()

# It is a bit slow since we collect histograms on CPU
with torch.no_grad():
    stats = collect_stats(model, data_t, num_batches=2)
    amax = compute_amax(model, method="percentile", percentile=99.99)
    
    # optional - test different calibration methods
    #amax = compute_amax(model, method="mse")
    #amax = compute_amax(model, method="entropy")
    
    

100%|█████████████████████████████████████████████| 2/2 [00:05<00:00,  2.94s/it]
W0609 06:49:30.344012 135901552236352 tensor_quantizer.py:173] Disable HistogramCalibrator
W0609 06:49:30.395465 135901552236352 tensor_quantizer.py:173] Disable HistogramCalibrator
W0609 06:49:30.396069 135901552236352 tensor_quantizer.py:173] Disable HistogramCalibrator
W0609 06:49:30.396650 135901552236352 tensor_quantizer.py:173] Disable HistogramCalibrator
W0609 06:49:30.398214 135901552236352 tensor_quantizer.py:173] Disable HistogramCalibrator
W0609 06:49:30.399486 135901552236352 tensor_quantizer.py:173] Disable HistogramCalibrator
W0609 06:49:30.400676 135901552236352 tensor_quantizer.py:173] Disable HistogramCalibrator
W0609 06:49:30.403063 135901552236352 tensor_quantizer.py:173] Disable HistogramCalibrator
W0609 06:49:30.404444 135901552236352 tensor_quantizer.py:173] Disable HistogramCalibrator
W0609 06:49:30.408356 135901552236352 tensor_quantizer.py:173] Disable HistogramCalibrator
W0609 06:

features.0.quantizer                    : TensorQuantizer(8bit per-tensor amax=2.1255 calibrator=HistogramCalibrator quant)
features.0.quantizer_w                  : TensorQuantizer(8bit per-tensor amax=0.1404 calibrator=HistogramCalibrator quant)
features.3.quantizer                    : TensorQuantizer(8bit per-tensor amax=0.4994 calibrator=HistogramCalibrator quant)
features.3.quantizer_w                  : TensorQuantizer(8bit per-tensor amax=0.0558 calibrator=HistogramCalibrator quant)
features.7.quantizer                    : TensorQuantizer(8bit per-tensor amax=0.4290 calibrator=HistogramCalibrator quant)
features.7.quantizer_w                  : TensorQuantizer(8bit per-tensor amax=0.0371 calibrator=HistogramCalibrator quant)
features.10.quantizer                   : TensorQuantizer(8bit per-tensor amax=0.2061 calibrator=HistogramCalibrator quant)
features.10.quantizer_w                 : TensorQuantizer(8bit per-tensor amax=0.0249 calibrator=HistogramCalibrator quant)
features

## Run model evaluation

Tip: observe how the execution becomes faster and faster with each batch as the CPU achieves better cache re-use on the LUT table

In [None]:
import timeit
correct = 0
total = 0

model.eval()
start_time = timeit.default_timer()
with torch.no_grad():
    for iteraction, (images, labels) in tqdm(enumerate(data), total=len(data)):
        images, labels = images.to("cpu"), labels.to("cpu")
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(timeit.default_timer() - start_time)
print('Accuracy of the network on the 10000 test images: %.4f %%' % (
    100 * correct / total))

 94%|████████████████████████████████████████▏  | 73/78 [11:47<00:47,  9.57s/it]

## Run approximate-aware re-training


## Rerun model evaluation