# Model evaluation and re-training with AdaPT on the ImageNet dataset

In this notebook you can evaluate different approximate multipliers on various models using the ImageNet dataset.

Steps:
* Select models to load 
* Select number of threads to use
* Choose approximate multiplier 
* Load model for evaluation
* Load dataset
* Create validation function
* Run model calibration for quantization
* Run model evaluation
* Run approximate-aware re-training
* Rerun model evaluation

**Note**:
* This notebook should be run on an x86 machine

* Please make sure you have run the installation steps first

In [None]:
import os
import zipfile
import torch

import torch.utils.data
import torchvision.transforms as transforms
import torchvision.datasets as datasets

import timeit
from tqdm import tqdm

## Select models to load 

The pretrained weights must be downloaded into the state_dicts folder.


In [None]:
from models.squeezenet_imagenet import squeezenet1_1
from models.inception_imagenet import inception_v3
from models.shufflenet_imagenet import shufflenet_v2_x1_0

## Select number of threads to use

For optimal performance, set this to the number of CPU threads (not CPU cores) on your machine.

In [None]:
threads = 40
torch.set_num_threads(threads)

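# may improve performance; note that OpenMP reads these variables when its runtime
# initializes, so they may only take effect if set before torch is imported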
%env OMP_PLACES=cores
%env OMP_PROC_BIND=close
%env OMP_WAIT_POLICY=active
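Instead of hard-coding `threads = 40`, the thread count can be detected at run time. A minimal sketch: `os.cpu_count()` reports logical CPUs (hardware threads), which matches the advice above, and where `os.sched_getaffinity` exists (e.g. Linux) it further restricts the count to the CPUs this process is actually allowed to use. The helper name `available_cpu_threads` is hypothetical.

```python
import os


def available_cpu_threads() -> int:
    """Return the number of logical CPUs (hardware threads) usable by this process."""
    # sched_getaffinity is only available on some platforms (e.g. Linux); it
    # honours CPU pinning such as taskset or container CPU limits on cpusets.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # os.cpu_count() counts logical CPUs and may return None if undetermined.
    return os.cpu_count() or 1


threads = available_cpu_threads()
print(f"using {threads} threads")
# In the notebook this value would then be passed to torch.set_num_threads(threads).
```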

## Choose approximate multiplier 

Two approximate multipliers are provided:

**mul8s_acc** - (header file: mul8s_acc.h)   <--  default

**mul8s_1L2H** - (header file: mul8s_1L2H.h)



To use your own custom multiplier, generate its C++ header with the provided LUT_generator tool and place it in the adapt/cpu-kernels/axx_mults folder. The value of axx_mult below must match the header file name (without the .h extension). The same axx_mult is used in all layers.

Tip: To use a different axx_mult for each layer, you must set it in the model definition, through the AdaPT_Conv2d layer used at each position.
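Conceptually, a signed 8-bit multiplier can be replaced by a 256 × 256 lookup table (LUT) indexed by the two operands, which is why any multiplier, exact or approximate, can be plugged in as a header. The sketch below illustrates that idea only; it is not the LUT_generator tool. The truncation-based `approx_mul8s` is a hypothetical example multiplier (it is not `mul8s_1L2H`), and the C++ array layout produced by `to_cpp_header` is an assumption, not the header format AdaPT expects.

```python
def exact_mul8s(a: int, b: int) -> int:
    """Exact signed 8-bit multiply, used as the reference."""
    return a * b


def approx_mul8s(a: int, b: int, drop_bits: int = 2) -> int:
    """Hypothetical approximate multiplier: zero the low bits of each operand's magnitude."""
    def trunc(x: int) -> int:
        mag = (abs(x) >> drop_bits) << drop_bits
        return mag if x >= 0 else -mag
    return trunc(a) * trunc(b)


def build_lut(mul):
    """Tabulate mul over all int8 pairs; entry [a + 128][b + 128] holds mul(a, b)."""
    return [[mul(a, b) for b in range(-128, 128)] for a in range(-128, 128)]


def lut_mul(lut, a: int, b: int) -> int:
    """Multiply by table lookup instead of arithmetic."""
    return lut[a + 128][b + 128]


def to_cpp_header(lut, name: str) -> str:
    """Emit the LUT as a C++ array (assumed layout, for illustration only)."""
    rows = ",\n".join("  {" + ", ".join(map(str, row)) + "}" for row in lut)
    return (f"#pragma once\n#include <cstdint>\n"
            f"static const int32_t {name}_lut[256][256] = {{\n{rows}\n}};\n")


exact = build_lut(exact_mul8s)
approx = build_lut(approx_mul8s)
print(lut_mul(exact, -7, 13), lut_mul(approx, -7, 13))  # -91 -48
```

Tabulating every operand pair turns each multiplication into a single memory read, so the LUT's size and access pattern, rather than the arithmetic, dominate the cost.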

In [None]:
axx_mult = 'mul8s_acc'

## Load model for evaluation

The C++ extensions of the approximate multipliers are JIT-compiled and loaded on the fly, and then the PyTorch model is loaded.

In [None]:
model = squeezenet1_1(pretrained=True, axx_mult=axx_mult)
model.eval() # for evaluation

## Load dataset

Set the path to your ImageNet validation dataset.
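`datasets.ImageFolder` expects one sub-folder per class under the root (`root/<class>/<image>`), with class indices assigned in sorted folder-name order. The raw ImageNet validation archive is a flat folder of 50,000 images, so it has to be reorganized into 1,000 class folders first. A small sanity check, sketched with the standard library only; `summarize_image_folder` is a hypothetical helper and only checks a few common image extensions.

```python
import os


def summarize_image_folder(root: str, exts=(".jpeg", ".jpg", ".png")):
    """Count class sub-folders and image files under an ImageFolder-style root."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    n_images = 0
    for cls in classes:
        for _, _, files in os.walk(os.path.join(root, cls)):
            n_images += sum(f.lower().endswith(exts) for f in files)
    return len(classes), n_images


valdir = os.path.join("datasets", "imagenet_data", "val")
if os.path.isdir(valdir):
    n_classes, n_images = summarize_image_folder(valdir)
    # For a correctly arranged ImageNet validation set: 1000 classes, 50000 images.
    print(n_classes, "classes,", n_images, "images")
```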

In [None]:
valdir = os.path.join('datasets', 'imagenet_data', 'val')

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])


data = datasets.ImageFolder(valdir, transforms.Compose([
        transforms.Resize(256),  # transforms.Scale is deprecated/removed in recent torchvision
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize,
    ]))

val_loader = torch.utils.data.DataLoader(
    data,
    batch_size=64, shuffle=False,
    num_workers=0, pin_memory=False)

    
# sub_val_loader is a subset of the validation set (every 10th image),
# used below for approximate-aware re-training (it can also be used for calibration)
# for more proper re-training, point it to the training set instead
indices = list(range(0, len(data), 10))
subset = torch.utils.data.Subset(data, indices)
sub_val_loader = torch.utils.data.DataLoader(
    subset,
    batch_size=64, shuffle=False,
    num_workers=0, pin_memory=False)
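The preprocessing above resizes the shorter image side to 256 pixels (keeping the aspect ratio), takes a centered 224 × 224 crop, scales pixels to [0, 1] and normalizes each channel as (x - mean) / std. The pure-Python sketch below reproduces that arithmetic for one image size and one pixel; the size and offset formulas mirror torchvision's `Resize`/`CenterCrop` behaviour as we understand it, which is an assumption. It also computes how many batches the two loaders yield for the 50,000-image validation set.

```python
import math

MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def resize_shorter_side(width: int, height: int, size: int = 256):
    """Scale so the shorter side equals `size`, preserving the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width: int, height: int, crop: int = 224):
    """Return (left, top, right, bottom) of a centered square crop."""
    left = round((width - crop) / 2.0)
    top = round((height - crop) / 2.0)
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """ToTensor scaling (0..255 -> 0..1) followed by per-channel normalization."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


w, h = resize_shorter_side(500, 375)  # a typical 4:3 photo
print((w, h), center_crop_box(w, h))

# Loader sizes: full validation set vs. every 10th image, batch size 64.
n_val = 50_000
n_sub = len(range(0, n_val, 10))
print(math.ceil(n_val / 64), math.ceil(n_sub / 64))
```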

## Create validation function


In [None]:
def validate(val_loader, model, criterion, print_freq=100):
    with torch.no_grad():
        batch_time = AverageMeter()
        losses = AverageMeter()
        top1 = AverageMeter()
        top5 = AverageMeter()

        # switch to evaluate mode
        model.eval()
        print("starting evaluation...")

        end = timeit.default_timer()
        for i, (images, target) in tqdm(enumerate(val_loader), total=len(val_loader)):
            images = images.cpu()
            target = target.cpu()

            # compute output
            output = model(images)
            loss = criterion(output, target)

            # measure accuracy and record loss
            prec1, prec5 = accuracy(output, target, topk=(1, 5))
            losses.update(loss.item(), images.size(0))
            top1.update(prec1.item(), images.size(0))
            top5.update(prec5.item(), images.size(0))

            # measure elapsed time per batch
            batch_time.update(timeit.default_timer() - end)
            end = timeit.default_timer()

            if i % print_freq == 0:
                print('Test: [{0}/{1}]\t'
                      'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                      'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                      'Acc@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                      'Acc@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                       i, len(val_loader), batch_time=batch_time, loss=losses,
                       top1=top1, top5=top5))

        print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'
              .format(top1=top1, top5=top5))

        return top1.avg, top5.avg
  

class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count  
 

def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k"""
    maxk = max(topk)
    batch_size = target.size(0)

    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res    

criterion = torch.nn.CrossEntropyLoss().cpu()
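The two metric helpers can be illustrated without tensors. A sample counts as top-k correct if its true label is among the k highest scores, and the running average kept by `AverageMeter` weights each batch by its size `n`, which matters because the last batch is usually smaller (50,000 = 781 × 64 + 16). The function names below are hypothetical; the notebook itself uses the tensor-based `accuracy` and `AverageMeter` above.

```python
def topk_accuracy(scores, targets, k):
    """Percentage of samples whose target is among the k highest-scoring classes."""
    hits = 0
    for row, t in zip(scores, targets):
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += t in topk
    return 100.0 * hits / len(targets)


def weighted_average(batch_values, batch_sizes):
    """Same result as calling AverageMeter.update(val, n) once per batch."""
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [1, 1, 0]
print(topk_accuracy(scores, targets, 1), topk_accuracy(scores, targets, 2))
# Two full batches at 80% plus a 16-image tail batch at 50%:
print(weighted_average([80.0, 80.0, 50.0], [64, 64, 16]))
```

A plain mean of the three batch accuracies would give 70%, while the size-weighted value is about 76.7%, which is the true per-image accuracy.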

## Run model calibration for quantization

Calibrates the quantization parameters.

This must be re-run every time the model changes.

In [None]:
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import calib


def collect_stats(model, data_loader, num_batches):
    """Feed data to the network and collect statistics"""

    # Enable calibrators
    for name, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()

    # Feed exactly num_batches batches through the model
    for i, (image, _) in tqdm(enumerate(data_loader), total=num_batches):
        if i >= num_batches:
            break
        model(image.cpu())

    # Disable calibrators
    for name, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()


def compute_amax(model, **kwargs):
    # Load calib result
    for name, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                if isinstance(module._calibrator, calib.MaxCalibrator):
                    module.load_calib_amax()
                else:
                    # use strict=False because the inception model throws an error otherwise
                    module.load_calib_amax(strict=False, **kwargs)
            print(F"{name:40}: {module}")
    model.cpu()


# It is a bit slow since we collect histograms on CPU
with torch.no_grad():
    collect_stats(model, val_loader, num_batches=2)
    compute_amax(model, method="percentile", percentile=99.99)
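Calibration produces one `amax` per quantizer: the absolute value mapped to the largest 8-bit code. The sketch below contrasts a max-based and a percentile-based `amax` on a toy list of activations with one outlier, then applies symmetric linear quantization with each. It is a simplified, pure-Python illustration of the idea behind `method="percentile"`, not pytorch_quantization's implementation: it assumes a symmetric range of [-127, 127] with scale `127 / amax`, and computes the percentile directly from sorted values, whereas the histogram calibrator estimates it from a histogram. With only 202 toy values a 99% percentile is used so that the outlier is actually excluded.

```python
import math


def amax_max(values):
    """Max calibration: amax is the largest absolute value seen."""
    return max(abs(v) for v in values)


def amax_percentile(values, percentile):
    """Percentile calibration on |x| (nearest-rank method on sorted values)."""
    mags = sorted(abs(v) for v in values)
    rank = max(1, math.ceil(percentile / 100.0 * len(mags)))
    return mags[rank - 1]


def fake_quant(x, amax, bound=127):
    """Quantize x to an integer code with scale bound/amax, clamp, then dequantize."""
    scale = bound / amax
    q = max(-bound, min(bound, round(x * scale)))
    return q / scale


acts = [0.01 * i for i in range(-100, 101)] + [25.0]  # values in [-1, 1] plus one outlier
for amax in (amax_max(acts), amax_percentile(acts, 99.0)):
    err = abs(fake_quant(0.5, amax) - 0.5)
    print(f"amax={amax:.2f}  error at 0.5: {err:.4f}  25.0 -> {fake_quant(25.0, amax):.2f}")
```

The trade-off is visible in the output: the percentile `amax` saturates the rare outlier but gives the bulk of the values a much finer quantization step.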


## Run model evaluation

Tip: observe how execution becomes faster with each batch as the CPU achieves better cache reuse of the lookup table (LUT).

In [None]:
validate(val_loader, model, criterion, print_freq=100)
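To observe the warm-up effect from the tip quantitatively, each batch can be timed separately. A minimal, framework-agnostic sketch: the hypothetical helper `time_each` calls a function on every item and records wall-clock durations with `timeit.default_timer`, and `head_vs_rest` compares the first few calls with the rest. The stand-in workload is arbitrary; in the notebook the function would be a forward pass such as `model(images)` inside `torch.no_grad()`, which is an untested assumption here.

```python
import timeit


def time_each(fn, items):
    """Call fn(item) for every item and return per-call wall-clock durations (s)."""
    durations = []
    for item in items:
        start = timeit.default_timer()
        fn(item)
        durations.append(timeit.default_timer() - start)
    return durations


def head_vs_rest(durations, head=3):
    """Mean duration of the first `head` calls versus the remaining calls."""
    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")
    return mean(durations[:head]), mean(durations[head:])


# Stand-in workload; in the notebook this would be the model forward pass per batch.
times = time_each(lambda n: sum(i * i for i in range(n)), [20_000] * 10)
print("first batches: %.4fs, later batches: %.4fs" % head_vs_rest(times))
```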

## Run approximate-aware re-training


In [None]:
from adapt.references.classification.train import evaluate, train_one_epoch, load_data

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

# fine-tune the model for one epoch on the subset
train_one_epoch(model, criterion, optimizer, sub_val_loader, "cpu", 0, 1)
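The cell above fine-tunes for a single epoch, so `lr_scheduler` is never stepped and the learning rate stays at `0.0001`. For several epochs, the usual PyTorch pattern is to call `lr_scheduler.step()` once after each epoch; `StepLR` then uses `base_lr * gamma ** (epoch // step_size)`, so with `step_size=1` and `gamma=0.1` the rate drops tenfold every epoch. The sketch evaluates that closed form for the settings used here; the loop in the comments (with a hypothetical `num_epochs`) only illustrates the pattern and has not been run against AdaPT.

```python
def step_lr(base_lr: float, gamma: float, step_size: int, epoch: int) -> float:
    """Closed-form learning rate of torch.optim.lr_scheduler.StepLR at a given epoch."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in range(3):
    print(epoch, step_lr(0.0001, 0.1, 1, epoch))

# Multi-epoch pattern (sketch; num_epochs is hypothetical):
# for epoch in range(num_epochs):
#     train_one_epoch(model, criterion, optimizer, sub_val_loader, "cpu", epoch, 1)
#     lr_scheduler.step()
```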

## Rerun model evaluation


In [None]:
validate(val_loader, model, criterion, print_freq=100)