# Model evaluation and re-training with AdaPT on Cifar10 dataset

In this notebook you can evaluate different approximate multipliers on various models based on Cifar10 dataset

Steps:
* Select models to load 
* Select number of threads to use
* Choose approximate multiplier 
* Load model for evaluation
* Load dataset
* Run model calibration for quantization
* Run model evaluation
* Run approximate-aware re-training
* Rerun model evaluation

**Note**:
* This notebook should be run on a X86 machine

* Please make sure you have run the installation steps first

In [1]:
import os
import zipfile
import torch

import requests
from torch.utils.data import DataLoader
from torchvision import transforms as T
from torchvision.datasets import CIFAR10
from tqdm import tqdm
import torch.nn as nn

## Select models to load 

The weights must be downloaded in state_dicts folder.


In [2]:
from models.resnet import resnet18, resnet34, resnet50
from models.vgg import vgg11_bn, vgg13_bn, vgg19_bn
from models.densenet import densenet121, densenet161, densenet169
from models.inception import inception_v3 # slow, propably bad cifar10 implementation of inception for PT

## Select number of threads to use

For optimal performance set them as the number of your cpu threads (not cpu cores)

In [3]:
threads = 40
torch.set_num_threads(threads)

#maybe better performance
%env OMP_PLACES=cores
%env OMP_PROC_BIND=close
%env OMP_WAIT_POLICY=active

env: OMP_PLACES=cores
env: OMP_PROC_BIND=close
env: OMP_WAIT_POLICY=active


## Choose approximate multiplier 

Two approximate multipliers are already provided

**mul8s_acc** - (header file: mul8s_acc.h)   <--  default

**mul8s_1L2H** - (header file: mul8s_1L2H.h)



In order to use your custom multiplier you need to use the provided tool (LUT_generator) to easily create the C++ header for your multiplier. Then you just place it inside the adapt/cpu-kernels/axx_mults folder. The name of the axx_mult here must match the name of the header file. The same axx_mult is used in all layers. 

Tip: If you want explicitly to set for each layer a different axx_mult you must do it from the model definition using the respective AdaPT_Conv2d class of each layer.

In [4]:
axx_mult = 'mul8s_acc'

## Load model for evaluation

Jit compilation method loads 'on the fly' the C++ extentions of the approximate multipliers. Then the pytorch model is loaded

In [5]:
model = resnet50(pretrained=True, axx_mult = axx_mult)

model.eval() # for evaluation

Using /root/.cache/torch_extensions as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/PyInit_conv2d_mul8s_acc...
Emitting ninja build file /root/.cache/torch_extensions/PyInit_conv2d_mul8s_acc/build.ninja...
Building extension module PyInit_conv2d_mul8s_acc...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF axx_conv2d.o.d -DTORCH_EXTENSION_NAME=PyInit_conv2d_mul8s_acc -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DAXX_MULT=mul8

ResNet(
  (conv1): AdaPT_Conv2d(
    3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
    (quantizer): TensorQuantizer(8bit per-tensor amax=dynamic calibrator=HistogramCalibrator quant)
    (quantizer_w): TensorQuantizer(8bit per-tensor amax=dynamic calibrator=HistogramCalibrator quant)
  )
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): AdaPT_Conv2d(
        64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
        (quantizer): TensorQuantizer(8bit per-tensor amax=dynamic calibrator=HistogramCalibrator quant)
        (quantizer_w): TensorQuantizer(8bit per-tensor amax=dynamic calibrator=HistogramCalibrator quant)
      )
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): AdaPT_Conv2d(
      

## Load dataset


In [6]:
def val_dataloader(mean = (0.4914, 0.4822, 0.4465), std = (0.2471, 0.2435, 0.2616)):

    transform = T.Compose(
        [
            T.ToTensor(),
            T.Normalize(mean, std),
        ]
    )
    dataset = CIFAR10(root="datasets/cifar10_data", train=False, download=True, transform=transform)
    dataloader = DataLoader(
        dataset,
        batch_size=128,
        num_workers=0,
        drop_last=True,
        pin_memory=False,
    )
    return dataloader

transform = T.Compose(
        [
            T.RandomCrop(32, padding=4),
            T.RandomHorizontalFlip(),
            T.ToTensor(),
            T.Normalize(mean = (0.4914, 0.4822, 0.4465), std = (0.2471, 0.2435, 0.2616)),
        ]
    )
dataset = CIFAR10(root="datasets/cifar10_data", train=True, download=True, transform=transform)

evens = list(range(0, len(dataset), 10))
trainset_1 = torch.utils.data.Subset(dataset, evens)

data = val_dataloader()

# data_t is used for calibration purposes and is a subset of train-set
data_t = DataLoader(trainset_1, batch_size=128,
                                            shuffle=False, num_workers=0)


Files already downloaded and verified
Files already downloaded and verified


## Run model calibration for quantization

Calibrates the quantization parameters 

Need to re-run it each time the model changes

In [7]:
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import calib

def collect_stats(model, data_loader, num_batches):
     """Feed data to the network and collect statistic"""

     # Enable calibrators
     for name, module in model.named_modules():
         if isinstance(module, quant_nn.TensorQuantizer):
             if module._calibrator is not None:
                 module.disable_quant()
                 module.enable_calib()
             else:
                 module.disable()

     for i, (image, _) in tqdm(enumerate(data_loader), total=num_batches):
         model(image.cpu())
         if i >= num_batches:
             break

     # Disable calibrators
     for name, module in model.named_modules():
         if isinstance(module, quant_nn.TensorQuantizer):
             if module._calibrator is not None:
                 module.enable_quant()
                 module.disable_calib()
             else:
                 module.enable()

def compute_amax(model, **kwargs):
 # Load calib result
 for name, module in model.named_modules():
     if isinstance(module, quant_nn.TensorQuantizer):
         if module._calibrator is not None:
             if isinstance(module._calibrator, calib.MaxCalibrator):
                 module.load_calib_amax()
             else:
                 module.load_calib_amax(**kwargs)
         print(F"{name:40}: {module}")
 model.cpu()

# It is a bit slow since we collect histograms on CPU
with torch.no_grad():
    stats = collect_stats(model, data_t, num_batches=2)
    amax = compute_amax(model, method="percentile", percentile=99.99)
    
    # optional - test different calibration methods
    #amax = compute_amax(model, method="mse")
    #amax = compute_amax(model, method="entropy")
    

100%|█████████████████████████████████████████████| 2/2 [00:20<00:00, 10.10s/it]
W1108 19:14:57.333475 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.333924 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.334270 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.334684 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.335093 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.335536 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.335944 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.336336 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.336743 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.337203 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:

W1108 19:14:57.377121 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.377450 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.377769 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.378094 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.378410 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.378733 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.381019 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.381440 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.381809 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.382157 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator
W1108 19:14:57.382537 140484954826560 tensor_quantizer.py:173] Disable HistogramCalibrator

W1108 19:14:57.434852 140484954826560 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
W1108 19:14:57.435732 140484954826560 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
W1108 19:14:57.436209 140484954826560 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
W1108 19:14:57.437160 140484954826560 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
W1108 19:14:57.437812 140484954826560 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
W1108 19:14:57.438928 140484954826560 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
W1108 19:14:57.439581 140484954826560 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
W1108 19:14:57.440821 140484954826560 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
W1108 19:14:57.441751 140484954826560 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
W1108 19:14:57.442603 140484954826560

conv1.quantizer                         : TensorQuantizer(8bit per-tensor amax=2.1255 calibrator=HistogramCalibrator quant)
conv1.quantizer_w                       : TensorQuantizer(8bit per-tensor amax=0.1625 calibrator=HistogramCalibrator quant)
layer1.0.conv1.quantizer                : TensorQuantizer(8bit per-tensor amax=0.6147 calibrator=HistogramCalibrator quant)
layer1.0.conv1.quantizer_w              : TensorQuantizer(8bit per-tensor amax=0.0804 calibrator=HistogramCalibrator quant)
layer1.0.conv2.quantizer                : TensorQuantizer(8bit per-tensor amax=0.2123 calibrator=HistogramCalibrator quant)
layer1.0.conv2.quantizer_w              : TensorQuantizer(8bit per-tensor amax=0.0355 calibrator=HistogramCalibrator quant)
layer1.0.conv3.quantizer                : TensorQuantizer(8bit per-tensor amax=0.2298 calibrator=HistogramCalibrator quant)
layer1.0.conv3.quantizer_w              : TensorQuantizer(8bit per-tensor amax=0.0544 calibrator=HistogramCalibrator quant)
layer1.0

## Run model evaluation

Tip: observe how the execution becomes faster and faster with each batch as the CPU achieves better cache re-use on the LUT table

In [8]:
import timeit
correct = 0
total = 0

model.eval()
start_time = timeit.default_timer()
with torch.no_grad():
    for iteraction, (images, labels) in tqdm(enumerate(data), total=len(data)):
        images, labels = images.to("cpu"), labels.to("cpu")
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(timeit.default_timer() - start_time)
print('Accuracy of the network on the 10000 test images: %.4f %%' % (
    100 * correct / total))

100%|███████████████████████████████████████████| 78/78 [14:48<00:00, 11.39s/it]

888.2028056679992
Accuracy of the network on the 10000 test images: 93.5597 %





## Run approximate-aware re-training


In [9]:
from adapt.references.classification.train import evaluate, train_one_epoch, load_data

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

# finetune the model for one epoch based on data_t subset 
train_one_epoch(model, criterion, optimizer, data_t, "cpu", 0, 1)

Epoch: [0]  [ 0/40]  eta: 0:34:44  lr: 0.0001  img/s: 2.4589313633531336  loss: 0.0322 (0.0322)  acc1: 100.0000 (100.0000)  acc5: 100.0000 (100.0000)  time: 52.1060  data: 0.0507
Epoch: [0]  [ 1/40]  eta: 0:33:06  lr: 0.0001  img/s: 2.5806590472004944  loss: 0.0281 (0.0301)  acc1: 100.0000 (100.0000)  acc5: 100.0000 (100.0000)  time: 50.9298  data: 0.1023
Epoch: [0]  [ 2/40]  eta: 0:35:03  lr: 0.0001  img/s: 1.9982702357262567  loss: 0.0322 (0.0334)  acc1: 100.0000 (100.0000)  acc5: 100.0000 (100.0000)  time: 55.3482  data: 0.1114
Epoch: [0]  [ 3/40]  eta: 0:35:28  lr: 0.0001  img/s: 2.0026249871504636  loss: 0.0322 (0.0454)  acc1: 100.0000 (99.6094)  acc5: 100.0000 (100.0000)  time: 57.5336  data: 0.1269
Epoch: [0]  [ 4/40]  eta: 0:35:41  lr: 0.0001  img/s: 1.9060153984495398  loss: 0.0329 (0.0429)  acc1: 100.0000 (99.6875)  acc5: 100.0000 (100.0000)  time: 59.4761  data: 0.1196
Epoch: [0]  [ 5/40]  eta: 0:35:44  lr: 0.0001  img/s: 1.8227123836482317  loss: 0.0329 (0.0432)  acc1: 100.

## Rerun model evaluation

In [10]:
correct = 0
total = 0

model.eval()
start_time = timeit.default_timer()
with torch.no_grad():
    for iteraction, (images, labels) in tqdm(enumerate(data), total=len(data)):
        images, labels = images.to("cpu"), labels.to("cpu")
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(timeit.default_timer() - start_time)
print('Accuracy of the network on the 10000 test images: %.4f %%' % (
    100 * correct / total))

 58%|████████████████████████▊                  | 45/78 [08:57<06:34, 11.95s/it]


KeyboardInterrupt: 