# Activation Histograms

This notebook shows an example of how to generate activation histograms for a specific model and dataset.

## But I Already Know How To Generate Histograms...

If you already generated histograms using Distiller outside this notebook, you can still use this notebook to visualize the data:
* To load the raw data saved by Distiller and visualize it, go to [this section](#Plot-Histograms)
* If you enabled saving histogram images and want to view them, go to [this section](#Load-Histogram-Images-from-Disk)

In [None]:
import torch
import matplotlib.pyplot as plt
import os
import math
import torchnet as tnt
from ipywidgets import widgets, interact

import distiller
from distiller.models import create_model

device = torch.device('cuda')
# device = torch.device('cpu')

# Load some common code and configure logging
# We do this so we can see the logging output coming from
# Distiller function calls
%run './distiller_jupyter_helpers.ipynb'
msglogger = config_notebooks_logger()

## Load Your Model

For this example we'll use a pre-trained image classification model.

### Note on Parallelism

Currently, Distiller's implementation of activation histogram collection does not accept models which contain [`DataParallel`](https://pytorch.org/docs/stable/nn.html?highlight=dataparallel#torch.nn.DataParallel) modules. So here we create the model without parallelism to begin with. If you have a model which includes `DataParallel` modules (for example, if loaded from a checkpoint), use the following utility function to convert the model to serialized execution:
```python
model = distiller.utils.make_non_parallel_copy(model)
```

In [None]:
model = create_model(pretrained=True, dataset='imagenet', arch='resnet18', parallel=False)
model = model.to(device)  # Comment out if not applicable

## Prepare Data

Collecting histograms usually doesn't require the entire dataset. A representative subset is enough, and using one also reduces the runtime.
* **Subset size:** There is no golden rule for selecting the size of the subset. Anywhere between 1-10% of the validation/test set should work.
* **Representative data:** Whatever size is chosen, it is important to make sure that the subset is selected in a way that covers as much of the distribution of the data as possible. So, for example, if the dataset is organized by classes by default, we should make sure to select items randomly and not in order.

**Note:** Working on only a subset of the data can be taken care of at data preparation time, or it can be delayed to the actual model evaluation function (for example, executing only a specific number of mini-batches). In this example we take care of it during data preparation.
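
To illustrate the point about representative data, here is a minimal sketch using only the Python standard library. The toy `labels` list is a hypothetical stand-in for a dataset organized by class; it is not how Distiller's data loaders pick their subset.

```python
import random
from collections import Counter

# Hypothetical toy dataset: 1000 items sorted by class, 10 classes of 100 items each
labels = [i // 100 for i in range(1000)]

subset_size = int(0.05 * len(labels))  # 5% subset -> 50 items

# Taking the first items in order only ever sees class 0
in_order = labels[:subset_size]

# Sampling indices at random spreads the subset across classes
rng = random.Random(0)  # seeded so the example is repeatable
random_indices = rng.sample(range(len(labels)), subset_size)
shuffled = [labels[i] for i in random_indices]

classes_in_order = Counter(in_order)
classes_shuffled = Counter(shuffled)
print('Classes covered when taken in order: ', len(classes_in_order))
print('Classes covered when taken at random:', len(classes_shuffled))
```

The in-order subset covers a single class, while the random subset covers several, which is why the selection should be randomized.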

In [None]:
# We use Distiller's built-in data loading functionality for ImageNet,
# which takes care of randomizing the data before selecting the subset.
# While it creates train, validation and test data loaders, we're only
# interested in the test dataset in this example.
#
# Subset size: Here we'll go with 1% of the test set, mostly for the
# sake of speed. We control this with the 'effective_test_size' argument.
#
# We set the 'fixed_subset' argument to make sure we're using the
# same subset for both phases of histogram collection - more on that below

dataset = 'imagenet'
dataset_path = '~/datasets/imagenet'
batch_size = 256
num_workers = 10
subset_size = 0.01

_, _, test_loader, _ = distiller.apputils.load_data(
    dataset, os.path.expanduser(dataset_path), batch_size, num_workers,
    effective_test_size=subset_size, fixed_subset=True)

## Define the Model Evaluation Function

We define a fairly bare-bones evaluation function. Recording the loss and accuracy isn't strictly necessary for histogram collection. We record them nonetheless, so we can verify that the data subset being used achieves results on par with what we'd expect from a representative subset.

In [None]:
def eval_model(data_loader, model):
    print('Evaluating model')
    criterion = torch.nn.CrossEntropyLoss().to(device)
    
    loss = tnt.meter.AverageValueMeter()
    classerr = tnt.meter.ClassErrorMeter(accuracy=True, topk=(1, 5))

    total_samples = len(data_loader.sampler)
    batch_size = data_loader.batch_size
    total_steps = math.ceil(total_samples / batch_size)
    print('{0} samples ({1} per mini-batch)'.format(total_samples, batch_size))

    # Switch to evaluation mode
    model.eval()

    for step, (inputs, target) in enumerate(data_loader):
        print('[{:3d}/{:3d}] ... '.format(step + 1, total_steps), end='', flush=True)
        with torch.no_grad():
            inputs, target = inputs.to(device), target.to(device)
            # compute output from model
            output = model(inputs)

            # compute loss and measure accuracy
            loss.add(criterion(output, target).item())
            classerr.add(output.data, target)
            
            print('Top1: {:.3f}  Top5: {:.3f}  Loss: {:.3f}'.format(
                classerr.value(1), classerr.value(5), loss.mean), flush=True)
    print('----------')
    print('Overall ==> Top1: {:.3f}  Top5: {:.3f}  Loss: {:.3f}'.format(
        classerr.value(1), classerr.value(5), loss.mean), flush=True)

## Collect Histograms

Histogram collection is implemented using Distiller's "Collector" mechanism, specifically in the `ActivationHistogramsCollector` class. It is stats-based, meaning it requires pre-computed min/max values per-tensor to be provided.

The min/max stats are expected as a dictionary with the following structure:
```yaml
'layer_name':
    'inputs':
        0:
            'min': value
            'max': value
        ...
        n:
            'min': value
            'max': value
    'output':
        'min': value
        'max': value
```
The entries under `'inputs'` are indexed from 0, with one entry per input tensor of the layer. The `QuantCalibrationStatsCollector` collector class generates stats in the required format.
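
As a rough illustration, the same structure can be written as a plain Python dict. The layer names and values below are made up, and `check_stats` is a hypothetical helper written for this sketch; it is not part of Distiller.

```python
# Hypothetical layer names and values, for illustration only
example_stats = {
    'layer1.relu': {
        'inputs': {
            0: {'min': -2.5, 'max': 3.1},
        },
        'output': {'min': 0.0, 'max': 3.1},
    },
    'layer1.add': {
        'inputs': {
            0: {'min': -1.0, 'max': 4.0},
            1: {'min': -0.5, 'max': 2.0},
        },
        'output': {'min': -1.5, 'max': 6.0},
    },
}


def check_stats(stats):
    """Return a list of problems found in a min/max stats dict (empty if none)."""
    problems = []

    def check_range(where, entry):
        if 'min' not in entry or 'max' not in entry:
            problems.append('{}: missing min or max'.format(where))
        elif entry['min'] > entry['max']:
            problems.append('{}: min is greater than max'.format(where))

    for layer_name, layer_stats in stats.items():
        for idx, entry in layer_stats.get('inputs', {}).items():
            check_range('{} input {}'.format(layer_name, idx), entry)
        if 'output' in layer_stats:
            check_range('{} output'.format(layer_name), layer_stats['output'])
        else:
            problems.append('{}: missing output'.format(layer_name))
    return problems


print(check_stats(example_stats))
```

A quick check like this can catch a malformed stats dict before it is passed on for histogram collection.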

To streamline this process, a utility function is provided: `distiller.data_loggers.collect_histograms`. Given a model and a test function, it will perform the required stats collection followed by histogram collection. If the user has already computed min/max stats beforehand, those can be provided as a dict or as a path to a YAML file (as saved by `QuantCalibrationStatsCollector`). In that case, the stats collection pass will be skipped.

### Dataset Preparation in Context of Stats-Based Histograms

If the data used for min/max stats collection is not the same as the data used for histogram collection, it is highly likely that when collecting histograms some values will fall outside the pre-calculated min/max range. When that happens, the value is **clamped**. Assuming the subsets of data used in both cases are representative enough, this shouldn't have a major effect on the results.
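
The effect of clamping can be sketched in plain Python. This is only an illustration of the idea under simple assumptions (equal-width bins, `stat_max > stat_min`); `histogram_with_clamping` is a hypothetical function and not Distiller's implementation.

```python
def histogram_with_clamping(values, stat_min, stat_max, nbins):
    """Count values into nbins equal-width bins over [stat_min, stat_max].

    Values outside the range are clamped to the nearest edge first, so they
    end up in the first or last bin. Assumes stat_max > stat_min.
    """
    bin_width = (stat_max - stat_min) / nbins
    counts = [0] * nbins
    for v in values:
        clamped = min(max(v, stat_min), stat_max)
        idx = int((clamped - stat_min) / bin_width)
        idx = min(idx, nbins - 1)  # a value equal to stat_max goes in the last bin
        counts[idx] += 1
    return counts


# Range pre-computed on one subset (hypothetical values)...
stat_min, stat_max = 0.0, 4.0
# ...and activations from a different subset, two of which fall outside it
activations = [-1.0, 0.5, 1.5, 2.5, 3.5, 9.0]

counts = histogram_with_clamping(activations, stat_min, stat_max, nbins=4)
print(counts)  # [2, 1, 1, 2]
```

The two out-of-range values (`-1.0` and `9.0`) are counted in the first and last bins, so no samples are lost, but the edge bins are inflated.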

One can choose to avoid this issue by making sure we use the same subset of data in both passes. How to make sure of that will, of course, differ from one use case to another. In this example we do this by enabling the `fixed_subset` flag when calling `load_data` above.
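
For use cases that don't go through `load_data`, one generic way to get the same subset in both passes is to derive the subset indices from a fixed seed. The sketch below uses only the standard library; `pick_subset` is a hypothetical helper, and this is not a description of how `fixed_subset` is implemented.

```python
import random


def pick_subset(num_items, fraction, seed):
    """Return sorted indices of a random subset, determined entirely by the seed."""
    rng = random.Random(seed)
    subset_len = max(1, int(num_items * fraction))
    return sorted(rng.sample(range(num_items), subset_len))


# Same seed -> same indices, so both passes see the same data
stats_pass_indices = pick_subset(50000, 0.01, seed=0)
hist_pass_indices = pick_subset(50000, 0.01, seed=0)

# A different seed gives a different subset
other_indices = pick_subset(50000, 0.01, seed=1)

print(stats_pass_indices == hist_pass_indices)
print(stats_pass_indices == other_indices)
```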

In [None]:
# The test function passed to 'collect_histograms' must have an 
# argument named 'model' which accepts the model for which histograms
# are to be collected. 'collect_histograms' will not set any other
# arguments.
# We'll use Python's 'partial' to set the rest of the arguments
# for the test function before calling 'collect_histograms'
from functools import partial
test_fn = partial(eval_model, data_loader=test_loader)

# Histogram collection parameters

# 'save_dir': Pass a valid directory path to have the histogram
#   data saved to disk. Pass None to disable saving.
# 'save_hist_imgs': If save_dir is not None, toggles whether to save
#   histogram images in addition to the raw data
# 'hist_imgs_ext': Controls the filetype for histogram images
save_dir = '.'
save_hist_imgs = True
hist_imgs_ext = '.png'

# 'activation_stats': Here we pass None so a stats collection pass
#   is performed.
activation_stats = None

# 'classes': To speed up the calculation here we use the 'classes'
#   argument so that stats and histograms are collected only for
#   ReLU layers in the model. Pass None to collect for all layers.
classes = [torch.nn.ReLU]

# 'nbins': Number of histogram bins to use.
nbins = 2048

hist_dict = distiller.data_loggers.collect_histograms(
    model, test_fn, save_dir=save_dir, activation_stats=activation_stats,
    classes=classes, nbins=nbins, save_hist_imgs=save_hist_imgs, hist_imgs_ext=hist_imgs_ext)

## Plot Histograms

The generated dictionary has the following structure (very similar to the structure of the min/max stats dictionary described above):
```yaml
'layer_name':
    'inputs':
        0:
            'hist': tensor             # Tensor with bin counts
            'bin_centroids': tensor    # Tensor with activation values corresponding to center of each bin
        ...
        n:
            'hist': tensor
            'bin_centroids': tensor
    'output':
        'hist': tensor
        'bin_centroids': tensor
```
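
As a small sketch of how one entry of this structure can be consumed, the snippet below normalizes the bin counts and estimates the mean activation from the bin centroids. It uses plain lists and made-up numbers; with real data the values are tensors, which could be converted with `.tolist()` first. `summarize_histogram` is a hypothetical helper, not a Distiller function.

```python
def summarize_histogram(bin_counts, bin_centroids):
    """Return (normalized counts, approximate mean) for one histogram."""
    total = sum(bin_counts)
    normalized = [c / total for c in bin_counts]
    # Treat every sample in a bin as if it sat at the bin's center
    approx_mean = sum(p * x for p, x in zip(normalized, bin_centroids))
    return normalized, approx_mean


# Hypothetical 4-bin histogram covering the range [0, 4]
entry = {'hist': [2, 1, 1, 4], 'bin_centroids': [0.5, 1.5, 2.5, 3.5]}

normalized, approx_mean = summarize_histogram(entry['hist'], entry['bin_centroids'])
print(normalized)   # [0.25, 0.125, 0.125, 0.5]
print(approx_mean)  # 2.375
```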

In [None]:
# Uncomment this line to load saved output from a previous histogram collection run
# hist_dict = torch.load('acts_histograms.pt')

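# Pretty matplotlib plots. Matplotlib 3.6 renamed the 'seaborn' style
# to 'seaborn-v0_8', so pick whichever name is available.
plt.style.use('seaborn' if 'seaborn' in plt.style.available else 'seaborn-v0_8')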

def draw_hist(layer_name, tensor_name, bin_counts, bin_centroids, normed=True, yscale='linear'):
    if normed:
        bin_counts = bin_counts / bin_counts.sum()
    plt.figure(figsize=(12, 6))
    plt.title('\n'.join((layer_name, tensor_name)), fontsize=16)
    plt.fill_between(bin_centroids, bin_counts, step='mid', antialiased=False)
    if yscale == 'linear':
        plt.ylim(bottom=0)
    plt.yscale(yscale)
    plt.xlabel('Activation Value')
    plt.ylabel('Normalized Count' if normed else 'Count')
    plt.show()

@interact(layer_name=hist_dict.keys(),
          normalize_bin_counts=True,
          y_axis_scale=['linear', 'log'])
def draw_layer(layer_name, normalize_bin_counts, y_axis_scale):
    print('\nSelected layer: ' + layer_name)
    data = hist_dict[layer_name]
    for idx, od in data['inputs'].items():
        draw_hist(layer_name, 'input_{}'.format(idx), od['hist'], od['bin_centroids'],
                  normed=normalize_bin_counts, yscale=y_axis_scale)
    od = data['output']
    draw_hist(layer_name, 'output', od['hist'], od['bin_centroids'],
              normed=normalize_bin_counts, yscale=y_axis_scale)

## Load Histogram Images from Disk

If you enabled saving of histogram images above, or have images from a collection executed externally, you can use the code below to display the images.

In [None]:
from IPython.display import Image, SVG, display
import glob
from collections import OrderedDict

# Set the path to the images directory
imgs_dir = 'histogram_imgs'

files = sorted(glob.glob(os.path.join(imgs_dir, '*.*')))
files = [f for f in files if os.path.isfile(f)]
fnames_map = OrderedDict([(os.path.split(f)[1], f) for f in files])

@interact(file_name=fnames_map)
def load_image(file_name):
    if file_name.endswith('.svg'):
        display(SVG(filename=file_name))
    else:
        display(Image(filename=file_name))