# Vectorize Activations

<br/>

<pre>
model name:            imagenette_128_resnet18_model.pth
network architecture:  resnet18
dataset:               imagenette training and calibration set
image size:            128x128 (resized beforehand)
</pre>

<br/>

We want to test our Out-of-Distribution (OoD) detection method __Layer-wise Activation Cluster Analysis (LACA)__ on a dataset that is more complex than the MNIST, SVHN, and CIFAR-10 datasets we have used so far. We chose the [Imagenette dataset](https://github.com/fastai/imagenette) because its images show more complex scenes. Imagenette is a 10-class subset of the [ImageNet dataset](https://www.image-net.org/).

The first step of our OoD detection method runs before inference. In this step we measure in-distribution statistics on the training data and OoD statistics on the calibration data. Both kinds of statistics are needed to calculate the credibility of a test sample at inference time.
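
For orientation, the Deep kNN paper listed in the sources computes credibility as a conformal p-value: the nonconformity score of a test sample is compared against the scores collected on held-out calibration data. The sketch below illustrates only that idea. It assumes the nonconformity scores are already available as plain numbers, `credibility` is a hypothetical helper, and it is not necessarily how LACA combines its in-distribution and OoD statistics.

```python
import numpy as np

def credibility(calib_scores, test_score):
    """Empirical p-value (Deep kNN style): the fraction of calibration
    nonconformity scores that are at least as large as the test score."""
    calib_scores = np.asarray(calib_scores, dtype=float)
    return float(np.mean(calib_scores >= test_score))

# Hypothetical nonconformity scores measured on a calibration set
calib = np.arange(1.0, 11.0)       # 1.0, 2.0, ..., 10.0
print(credibility(calib, 8.5))     # only 9 and 10 are >= 8.5 -> 0.2
print(credibility(calib, 0.5))     # every score is >= 0.5    -> 1.0
print(credibility(calib, 11.0))    # no score is >= 11        -> 0.0
```

A low credibility means the test sample is less "typical" than almost all calibration samples, which is the signal an OoD detector can threshold on.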

After fetching the activations of the data samples (see __01_fetch_activations_imagenette_128_resnet18.ipynb__), we vectorize them. Activations of convolutional layers are three-dimensional (channels × height × width), and we vectorize them by simply flattening them. Activations of linear layers are already vectors, so they are left unchanged. [Papernot and McDaniel](https://github.com/cleverhans-lab/cleverhans/blob/master/cleverhans_v3.1.0/cleverhans/model_zoo/deep_k_nearest_neighbors/dknn.py#L544) vectorized the activations in the same way. The helper function defined below can optionally mean-pool each activation map block-wise (`pool`) before flattening; all runs in this notebook use `pool=None`, i.e. plain flattening.
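
A minimal NumPy sketch of the flattening step, using a hypothetical helper `vectorize`. The toy cube has the shape we expect for ResNet-18's `relu` output on a 128x128 image (64 channels, 64x64 maps). `ndarray.flatten()`, which the notebook uses, defaults to row-major (C) order, so the index mapping checked at the end applies to it as well.

```python
import numpy as np

def vectorize(sample_activations):
    """Flatten a (channels, height, width) activation cube into one vector.
    Inputs that are already vectors (e.g. linear layers) are returned unchanged."""
    if sample_activations.ndim == 3:
        return sample_activations.reshape(-1)  # row-major (C) order, same as flatten()
    return sample_activations

C, H, W = 64, 64, 64
conv_acts = np.random.rand(C, H, W).astype(np.float32)  # toy convolutional activations
fc_acts = np.random.rand(512).astype(np.float32)        # toy linear-layer activations

vec = vectorize(conv_acts)
print(vec.shape)                 # (262144,)
print(vectorize(fc_acts).shape)  # (512,)

# In row-major order, element [c, h, w] lands at index c*H*W + h*W + w
c, h, w = 5, 10, 20
assert vec[c * H * W + h * W + w] == conv_acts[c, h, w]
```

The index mapping is useful when interpreting individual vector dimensions later: each consecutive run of `H*W` entries belongs to one channel.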

<br/>

_Sources:_
* [Imagenette dataset](https://github.com/fastai/imagenette)
* [Deep kNN paper](https://arxiv.org/abs/1803.04765)
* [Deep kNN sample code](https://github.com/cleverhans-lab/cleverhans/blob/master/cleverhans_v3.1.0/cleverhans/model_zoo/deep_k_nearest_neighbors/dknn.py)
* [Deep kNN sample code (PyTorch)](https://github.com/bam098/deep_knn/blob/master/dknn_mnist.ipynb)

```python
%reload_ext autoreload
%autoreload 2
%matplotlib inline

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.functional import adaptive_avg_pool2d
from torch.utils.data import DataLoader
import torchvision
from torchvision import transforms, models, datasets
import sklearn
from sklearn import preprocessing
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn import metrics
import skimage
from skimage.measure import block_reduce
from umap import UMAP
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import pickle
import numpy as np
import platform
from pathlib import Path
import random
import warnings
import pprint
from collections import Counter

sns.set()
sns.set_context("notebook", font_scale=1.1)
sns.set_style("ticks")

print('python version:      {}'.format(platform.python_version()))
print('torch version:       {}'.format(torch.__version__))
print('torchvision version: {}'.format(torchvision.__version__))
print('sklearn version:     {}'.format(sklearn.__version__))
print('skimage version:     {}'.format(skimage.__version__))
print('numpy version:       {}'.format(np.__version__))
print('matplotlib version:  {}'.format(matplotlib.__version__))
print('seaborn version:     {}'.format(sns.__version__))
print('pandas version:      {}'.format(pd.__version__))
print('pickle version:      {}'.format(pickle.format_version))

use_cuda = torch.cuda.is_available()
print('CUDA available:      {}'.format(use_cuda))
print('cuDNN enabled:       {}'.format(torch.backends.cudnn.enabled))
print('num gpus:            {}'.format(torch.cuda.device_count()))

if use_cuda:
    print('gpu:                 {}'.format(torch.cuda.get_device_name(0)))

    print()
    print('------------------------- CUDA -------------------------')
    ! nvcc --version
```

```
python version:      3.6.9
torch version:       1.7.0
torchvision version: 0.8.1
sklearn version:     0.23.2
skimage version:     0.17.2
numpy version:       1.19.5
matplotlib version:  3.2.2
seaborn version:     0.11.0
pandas version:      1.1.4
pickle version:      4.0
CUDA available:      False
cuDNN enabled:       True
num gpus:            0
```


We set the seed values to obtain reproducible results. For more information on how to set seed values in Python and PyTorch, see the [PyTorch documentation](https://pytorch.org/docs/1.7.0/notes/randomness.html?highlight=repro).

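```python
# Seed the Python, NumPy and PyTorch random number generators
seed = 0
torch.manual_seed(seed)
random.seed(seed)
np.random.seed(seed)

# Seed all GPUs and make cuDNN choose deterministic algorithms
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
torch.set_deterministic(True)  # PyTorch 1.7 API; replaced by torch.use_deterministic_algorithms in 1.8+
```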

## Parameters

```python
# Activations
img_size          = 128                                                             # Image size
base_act_folder   = Path('/Users/lehmann/research/laca3/activations/imagenette')    # Base activations folder
afname_string     = 'imagenette_{}_resnet18_acts'.format(img_size)                  # Activations file-name prefix
acts_path         = base_act_folder/afname_string                                   # Activations path prefix
layer_names       = [                                                               # List of layer names
    'relu',
    'maxpool',
    'layer1-0',
    'layer1-1',
    'layer2-0',
    'layer2-1',
    'layer3-0',
    'layer3-1',
    'layer4-0',
    'layer4-1',
    'avgpool'
]
```
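
The loading and saving code in this notebook builds file names by appending the dataset and layer names to `acts_path`. The hypothetical helper `act_file` below reproduces that naming pattern with `pathlib`; the folder is a placeholder, not the notebook's actual path.

```python
from pathlib import Path

def act_file(acts_path, dataset_name, layer_name, vectors=False):
    """Mirror the notebook's naming scheme:
    <acts_path>_<dataset>_<layer>.pkl          raw activations
    <acts_path>_<dataset>_<layer>_vectors.pkl  vectorized activations"""
    suffix = '_vectors' if vectors else ''
    return acts_path.with_name(
        '{}_{}_{}{}.pkl'.format(acts_path.name, dataset_name, layer_name, suffix)
    )

acts_path = Path('activations/imagenette') / 'imagenette_128_resnet18_acts'  # placeholder folder
print(act_file(acts_path, 'trainset', 'relu').name)
# imagenette_128_resnet18_acts_trainset_relu.pkl
print(act_file(acts_path, 'calibset', 'avgpool', vectors=True).name)
# imagenette_128_resnet18_acts_calibset_avgpool_vectors.pkl
```

`Path.with_name` replaces only the last path component, so the result matches the notebook's `str(acts_path) + '_{}_{}.pkl'` concatenation while staying a `Path` object.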

## Define Function for Vectorizing Activations

```python
def transform_acts2vec(dataset_name, pool=None):
    """Vectorize the stored activations of every layer in `layer_names` for the
    given dataset and save them to '<acts_path>_<dataset>_<layer>_vectors.pkl'."""
    for layer_name in layer_names:
        print('## Transforming Activations to Vector Form for Layer {}'.format(layer_name))

        # Load activations and targets of this layer
        fname = str(acts_path) + '_{}_{}.pkl'.format(dataset_name, layer_name)
        with open(fname, 'rb') as pickle_file:
            loaded_activations = pickle.load(pickle_file)

        layer_activations = loaded_activations['activations']
        layer_targets = loaded_activations['targets']

        # Vectorize the activations sample by sample
        layer_activation_vectors = []
        for i in range(layer_activations.shape[0]):
            sample_activation_vector = transform_acts2vec_from_sample(layer_activations[i], pool)
            layer_activation_vectors.append(sample_activation_vector)

        layer_activation_vectors = np.array(layer_activation_vectors)
        print('- activations transformed: {}'.format(layer_activation_vectors.shape))

        # Save activation vectors together with the targets
        activation_vectors = {
            'activations': layer_activation_vectors,
            'targets': layer_targets,
        }
        fname = str(acts_path) + '_{}_{}_vectors.pkl'.format(dataset_name, layer_name)
        with open(fname, 'wb') as pickle_file:
            # Protocol 4 supports objects larger than 4 GB
            pickle.dump(activation_vectors, pickle_file, protocol=4)

        print('done!')
        print()


def transform_acts2vec_from_sample(sample_activations, pool=(2,2)):
    """Vectorize the activations of a single sample.

    Activations of shape (channels, height, width) are flattened; if `pool` is
    not None, each activation map is first mean-pooled block-wise with
    `block_reduce`. Activations that are already vectors are returned as-is.
    Note: `transform_acts2vec` always passes its own `pool` (None by default).
    """
    if len(sample_activations.shape) == 3:
        if pool is None:
            sample_activations = sample_activations.flatten()
        else:
            sample_activations = np.array([
                block_reduce(act_map, pool, np.mean) for act_map in sample_activations
            ]).flatten()

    return sample_activations
```
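
As far as we can tell, the per-sample loop in `transform_acts2vec` could be replaced by a single reshape over the whole batch, and `pool=(2, 2)` with `block_reduce(..., np.mean)` amounts to non-overlapping 2x2 mean pooling of each activation map. The sketch below shows both in plain NumPy under two assumptions: the stored activations form one array of shape `(N, C, H, W)` (or `(N, D)` for linear layers), and `H` and `W` are divisible by the pool size (`block_reduce` zero-pads otherwise, which this sketch does not replicate). `vectorize_batch` is a hypothetical name.

```python
import numpy as np

def vectorize_batch(acts, pool=None):
    """Vectorize a batch of activations.

    acts: (N, C, H, W) for convolutional layers or (N, D) for linear layers.
    pool: optional (ph, pw) block size for non-overlapping mean pooling.
    """
    if acts.ndim == 4 and pool is not None:
        n, c, h, w = acts.shape
        ph, pw = pool
        assert h % ph == 0 and w % pw == 0, 'sketch assumes H and W divisible by pool'
        # Split every map into (h//ph x w//pw) blocks of size (ph x pw) and average each block
        acts = acts.reshape(n, c, h // ph, ph, w // pw, pw).mean(axis=(3, 5))
    # Flatten everything after the sample axis (no-op for (N, D) input)
    return acts.reshape(acts.shape[0], -1)

x = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
print(vectorize_batch(x).shape)               # (2, 48)
pooled = vectorize_batch(x, pool=(2, 2))
print(pooled.shape)                           # (2, 12)
print(pooled[0, 0], pooled[0, 1])             # 2.5 4.5  (means of {0,1,4,5} and {2,3,6,7})
```

For a C-contiguous input without pooling, `reshape` returns a view instead of building a Python list of per-sample copies, which matters for layers such as `relu` with 262,144 values per sample.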

## Vectorize Activations from Training Set

```python
trainset_name = 'trainset'
transform_acts2vec(trainset_name)
```

```
## Transforming Activations to Vector Form for Layer relu
- activations transformed: (9469, 262144)
done!

## Transforming Activations to Vector Form for Layer maxpool
- activations transformed: (9469, 65536)
done!

## Transforming Activations to Vector Form for Layer layer1-0
- activations transformed: (9469, 65536)
done!

## Transforming Activations to Vector Form for Layer layer1-1
- activations transformed: (9469, 65536)
done!

## Transforming Activations to Vector Form for Layer layer2-0
- activations transformed: (9469, 32768)
done!

## Transforming Activations to Vector Form for Layer layer2-1
- activations transformed: (9469, 32768)
done!

## Transforming Activations to Vector Form for Layer layer3-0
- activations transformed: (9469, 16384)
done!

## Transforming Activations to Vector Form for Layer layer3-1
- activations transformed: (9469, 16384)
done!

## Transforming Activations to Vector Form for Layer layer4-0
- activations transformed: (9469, 8192)
done!

## Transforming
```

```python
for layer_name in layer_names:
    fname = str(acts_path) + '_{}_{}_vectors.pkl'.format(trainset_name, layer_name)
    with open(fname, 'rb') as pickle_file:
        loaded_activations = pickle.load(pickle_file)

    print('## layer {}'.format(layer_name))
    print('activations: {}, targets: {}'.format(
        loaded_activations['activations'].shape, loaded_activations['targets'].shape
    ))
    print()
```

```
## layer relu
activations: (9469, 262144), targets: (9469,)

## layer maxpool
activations: (9469, 65536), targets: (9469,)

## layer layer1-0
activations: (9469, 65536), targets: (9469,)

## layer layer1-1
activations: (9469, 65536), targets: (9469,)

## layer layer2-0
activations: (9469, 32768), targets: (9469,)

## layer layer2-1
activations: (9469, 32768), targets: (9469,)

## layer layer3-0
activations: (9469, 16384), targets: (9469,)

## layer layer3-1
activations: (9469, 16384), targets: (9469,)

## layer layer4-0
activations: (9469, 8192), targets: (9469,)

## layer layer4-1
activations: (9469, 8192), targets: (9469,)

## layer avgpool
activations: (9469, 512), targets: (9469,)
```
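
The vector lengths above can be checked by hand. Assuming the standard torchvision ResNet-18 (7x7 stride-2 stem convolution, 3x3 stride-2 max pooling, and a stride-2 3x3 convolution at the start of `layer2`, `layer3` and `layer4`) and assuming that the hooked layer names map to those modules, a 128x128 input yields the channel counts and map sizes below; channels × height × width reproduces every printed dimension. The shapes are our reconstruction, not read from the saved files.

```python
def conv_out(size, kernel, stride, padding):
    """Side length after a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

s_relu = conv_out(128, kernel=7, stride=2, padding=3)     # conv1 + bn1 + relu -> 64
s_pool = conv_out(s_relu, kernel=3, stride=2, padding=1)  # maxpool -> 32
s_l2 = conv_out(s_pool, kernel=3, stride=2, padding=1)    # first block of layer2 -> 16
s_l3 = conv_out(s_l2, kernel=3, stride=2, padding=1)      # first block of layer3 -> 8
s_l4 = conv_out(s_l3, kernel=3, stride=2, padding=1)      # first block of layer4 -> 4

# (channels, side length) per hooked layer
layer_shapes = {
    'relu': (64, s_relu), 'maxpool': (64, s_pool),
    'layer1-0': (64, s_pool), 'layer1-1': (64, s_pool),
    'layer2-0': (128, s_l2), 'layer2-1': (128, s_l2),
    'layer3-0': (256, s_l3), 'layer3-1': (256, s_l3),
    'layer4-0': (512, s_l4), 'layer4-1': (512, s_l4),
    'avgpool': (512, 1),  # adaptive average pooling to 1x1
}

vector_dims = {name: c * s * s for name, (c, s) in layer_shapes.items()}
for name, dim in vector_dims.items():
    print('{:9s} {:>7d}'.format(name, dim))
```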



## Vectorize Activations from Calibration Set

```python
calibset_name = 'calibset'
transform_acts2vec(calibset_name)
```

```
## Transforming Activations to Vector Form for Layer relu
- activations transformed: (750, 262144)
done!

## Transforming Activations to Vector Form for Layer maxpool
- activations transformed: (750, 65536)
done!

## Transforming Activations to Vector Form for Layer layer1-0
- activations transformed: (750, 65536)
done!

## Transforming Activations to Vector Form for Layer layer1-1
- activations transformed: (750, 65536)
done!

## Transforming Activations to Vector Form for Layer layer2-0
- activations transformed: (750, 32768)
done!

## Transforming Activations to Vector Form for Layer layer2-1
- activations transformed: (750, 32768)
done!

## Transforming Activations to Vector Form for Layer layer3-0
- activations transformed: (750, 16384)
done!

## Transforming Activations to Vector Form for Layer layer3-1
- activations transformed: (750, 16384)
done!

## Transforming Activations to Vector Form for Layer layer4-0
- activations transformed: (750, 8192)
done!

## Transforming Activati
```

```python
for layer_name in layer_names:
    fname = str(acts_path) + '_{}_{}_vectors.pkl'.format(calibset_name, layer_name)
    with open(fname, 'rb') as pickle_file:
        loaded_activations = pickle.load(pickle_file)

    print('## layer {}'.format(layer_name))
    print('activations: {}, targets: {}'.format(
        loaded_activations['activations'].shape, loaded_activations['targets'].shape
    ))
    print()
```

```
## layer relu
activations: (750, 262144), targets: (750,)

## layer maxpool
activations: (750, 65536), targets: (750,)

## layer layer1-0
activations: (750, 65536), targets: (750,)

## layer layer1-1
activations: (750, 65536), targets: (750,)

## layer layer2-0
activations: (750, 32768), targets: (750,)

## layer layer2-1
activations: (750, 32768), targets: (750,)

## layer layer3-0
activations: (750, 16384), targets: (750,)

## layer layer3-1
activations: (750, 16384), targets: (750,)

## layer layer4-0
activations: (750, 8192), targets: (750,)

## layer layer4-1
activations: (750, 8192), targets: (750,)

## layer avgpool
activations: (750, 512), targets: (750,)
```
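
Plain flattening makes the early layers very large. The notebook does not print the dtype of the stored activations; assuming 32-bit floats, each array needs samples × dimensions × 4 bytes. The sketch below tallies this from the printed shapes: roughly 9.25 GiB for the training-set `relu` vectors alone and about 20 GiB over all layers. This is where the optional `pool` argument of `transform_acts2vec` could help, although the runs above use `pool=None`.

```python
# Vector dimensions per layer, copied from the shapes printed above
dims = {
    'relu': 262144, 'maxpool': 65536, 'layer1-0': 65536, 'layer1-1': 65536,
    'layer2-0': 32768, 'layer2-1': 32768, 'layer3-0': 16384, 'layer3-1': 16384,
    'layer4-0': 8192, 'layer4-1': 8192, 'avgpool': 512,
}
n_samples = {'trainset': 9469, 'calibset': 750}
BYTES_PER_VALUE = 4  # assumption: float32 activations

sizes = {}
for set_name, n in n_samples.items():
    relu_bytes = n * dims['relu'] * BYTES_PER_VALUE
    total_bytes = n * sum(dims.values()) * BYTES_PER_VALUE
    sizes[set_name] = (relu_bytes, total_bytes)
    print('{}: relu {:.2f} GiB, all layers {:.2f} GiB'.format(
        set_name, relu_bytes / 2**30, total_bytes / 2**30))
```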

