## Motivation for Signal Estimation and Attribution

Some of the most accurate predictive models today are black boxes, which makes it hard to understand how they arrive at a decision. After using UntangleAI's uncertainty estimation, we would like to know why the model is uncertain about particular test points. Some relevant questions at this point are:

- Which salient features in the input does the model rely on to make its decision?
- For a given prediction, how important is each input value to that prediction?
- What relative importance does the model give these features when framing its decision?
- How robust is the model to adversarial attacks? Is it possible to change the input slightly and get a drastically different model decision?

The UntangleAI Signal Estimation service provides these insights about the model, along with a way to visualize the model's decision process for computer vision applications. The service tries to separate the signal component from the noise in the training data and learns, for each class, which signals/features the model picks up and what it discards as noise.

## Step 0 - Training a CNN to recognize the MNIST dataset

This step is optional. If you would like to train a CNN to recognize the MNIST dataset, refer to [this tutorial](/tutorials/mnist_model_training), which trains a model for 10 epochs and saves the trained weights to lenet_mnist_model.h5.

Or you can download the trained weights from [here](https://untanglemodels.s3.amazonaws.com/lenet_mnist_model.h5).

## Step 1 - Signal Estimation for each class

During this phase, we take the trained model and the training dataset batched by class. For each class, we estimate the signals the model learned during training, generate signal estimation statistics, and save them.

In [None]:
# Required imports

import os
import torch
import torch.nn as nn
from tqdm import tqdm
torch.set_printoptions(precision=8)
from untangle import UntangleAI

torch.manual_seed(42)
torch.cuda.manual_seed(42)
torch.backends.cudnn.deterministic=True

Load the model from the trained or downloaded checkpoint file.

In [None]:
# Use the same model architecture that was used for training
class LeNet(nn.Module):
    # TODO: This isn't really a LeNet, but we implement this to be
    #  consistent with the Evidential Deep Learning paper
    def __init__(self):
        super(LeNet, self).__init__()
        self.model = None
        lenet_conv = []
        lenet_conv += [torch.nn.Conv2d(1,20, kernel_size=(5,5))]
        lenet_conv += [torch.nn.ReLU(inplace=True)]
        lenet_conv += [torch.nn.MaxPool2d(kernel_size=(2,2), stride=2)]
        lenet_conv += [torch.nn.Conv2d(20, 50, kernel_size=(5,5))]
        lenet_conv += [torch.nn.ReLU(inplace=True)]
        lenet_conv += [torch.nn.MaxPool2d(kernel_size=(2,2), stride=2)]

        lenet_dense = []
        lenet_dense += [torch.nn.Linear(4*4*50, 500)]
        lenet_dense += [torch.nn.ReLU(inplace=True)]
        lenet_dense += [torch.nn.Linear(500, 10)]

        self.features = torch.nn.Sequential(*lenet_conv)
        self.classifier = torch.nn.Sequential(*lenet_dense)

    def forward(self, input):
        output = self.features(input)
        output = output.view(input.shape[0], -1)
        output = self.classifier(output)
        return(output)
    
model_ckpt_path = 'lenet_mnist_model.h5'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LeNet()
# map_location lets a checkpoint saved on a GPU load on a CPU-only machine
ckpt = torch.load(model_ckpt_path, map_location=device)
model.load_state_dict(ckpt)
model = model.to(device)

# Switch to inference mode before estimating signals
model.eval()
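
The `4*4*50` input size of the first dense layer follows from the layer sizes above. As a quick sanity check, the arithmetic can be sketched in plain Python; `conv_out` is a hypothetical helper written for this illustration, not part of PyTorch or UntangleAI, and it assumes no padding and no dilation, as in the model definition.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # MNIST images are 28x28
size = conv_out(size, kernel=5)             # Conv2d(1, 20, 5x5)  -> 24
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2x2, 2)   -> 12
size = conv_out(size, kernel=5)             # Conv2d(20, 50, 5x5) -> 8
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2x2, 2)   -> 4

flat_features = size * size * 50            # 50 output channels
print(size, flat_features)                  # 4 800
```

If you change the input resolution or the kernel sizes, the dense layer's input size has to be recomputed the same way.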

Let us define the arguments needed for signal estimation.

In [None]:
class SignalEstimationArgs:
    mname = 'lenet'
    batch_size = 16
    num_classes = 10
    img_size = (1,28,28)
    input_tensor = torch.randn(1,1,28,28) # provide your own input tensor
    input_tensor_true = torch.randn(28,28,1) # provide your own true input tensor / ndarray / PIL Image Obj
    data_class = None # class index to estimate, or `None` to estimate all classes
    mode = 'estimate' # one of `estimate`, `attribute`
    topk = 1
    cmap = 'seismic'
    json = False
    hm_diff = 'joint'
    
args = SignalEstimationArgs()

Create required directories to store estimated signal statistics which will be used later to attribute signals for a given test point.

In [None]:
module_path = os.path.dirname(os.path.realpath('.'))
proj_path = os.path.abspath(os.path.join(module_path, os.pardir))
model_signal_data_path = os.path.join(module_path, 'model_signal_data/')
results_path = os.path.join(module_path, 'results')
if(not os.path.exists(model_signal_data_path)):
    os.makedirs(model_signal_data_path)
if(not os.path.exists(results_path)):
    os.makedirs(results_path)
signal_store_path = os.path.join(model_signal_data_path, '{}_signals'.format(args.mname))


# Create untangle object
untangle_ai = UntangleAI()

Call the untangle API (estimate_signals) to learn and store the signal estimation statistics. It needs a data loader function that loads the training dataset class by class. For MNIST we provide an API for this, load_mnist_per_class.

In [None]:
def train_loader_fun(class_i):
    loader, _ = untangle_ai.load_mnist_per_class(batch_size=args.batch_size, data_class=class_i)
    return(loader)

untangle_ai.estimate_signals(model, signal_store_path, train_loader_fun, args = args)
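
For a dataset other than MNIST, the loader function has to return data restricted to a single class. The grouping step behind such a loader might look like the sketch below. The helpers `indices_per_class` and `batches` are hypothetical names introduced for illustration and are not part of the UntangleAI API. The exact loader format expected by estimate_signals is also an assumption here; the MNIST loader used in this notebook yields (input, label) pairs.

```python
from collections import defaultdict

def indices_per_class(labels):
    """Map each class label to the list of sample indices carrying that label."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[int(label)].append(idx)
    return dict(groups)

def batches(indices, batch_size):
    """Yield successive index batches of at most `batch_size` items."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]

labels = [3, 0, 3, 1, 0, 3, 3]          # toy labels standing in for a dataset
per_class = indices_per_class(labels)
class_3_batches = list(batches(per_class[3], batch_size=2))
print(class_3_batches)                   # [[0, 2], [5, 6]]
```

In PyTorch, the index list for one class could be passed to `torch.utils.data.Subset` and wrapped in a `DataLoader`, which would then be returned from your own version of train_loader_fun.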

## Step 2 - Attributing signals for a test point
Now we use the signal statistics estimated in Step 1 to attribute and visualize the signals for an input test point.
We use the untangle_ai API (attribute_signals) to obtain signal information and visualizations for the top k classes predicted by the model. As part of this API, we also generate joint and differential heatmap images for the top k classes.
This API works on one data point at a time. It takes two inputs: the batched input tensor for that single point, and the true input image (input_tensor_true in the code below), which is expected in (Height, Width, Channel) layout.

Let us visualize the signals/features considered important for a random data point. We visualize the heatmap, the differential heatmap and the inverse differential heatmap for the top 3 classes predicted by the model.

In [None]:
from matplotlib.pyplot import imread  # scipy.misc.imread is no longer available in recent SciPy releases
import matplotlib.pyplot as plt
import random

keys = [str(item) for item in range(args.num_classes)]
ID2Name_Map = dict(zip(keys, keys))
rand_class = random.randint(0, 9)
print('Visualizing signal for a data point in class {}'.format(rand_class))
rand_class_loader = train_loader_fun(rand_class)
for input_tensor, _ in rand_class_loader:
    idx = random.randint(0, input_tensor.shape[0]-1)
    for topk in range(1, 2): # Heatmap showing signals in favour of top prediction
        args.topk = topk   
        input_tensor_single = input_tensor[idx][None, :, :, :]
        input_tensor_true = input_tensor[idx].permute(1, 2, 0) # expected shape (H,W,C)
        out_prefix = os.path.join(results_path, '{}_signals'.format(idx))
        untangle_ai.attribute_signals(model, input_tensor_single, input_tensor_true, signal_store_path,
            ID2Name_Map, args, out_prefix)
        for i in range(topk):
            img = imread(out_prefix + '_class_{}.JPEG'.format(i))
            print('Visualizing {}'.format(out_prefix + '_class_{}.JPEG'.format(i)))
            plt.imshow(img)
            plt.show()
    break


Fundamentally, what we provide is a gradient-based technique. This implies that it works only with differentiable machine learning models (neural networks, logistic regression, and SVMs). The raw feature importance we compute comprises two types of gradients:

- Positive gradients (red pixels in the seismic colormap, with intensity indicating the degree of positive importance) mark the pixels in an image that provide evidence in favor of the network's decision/output.
- Negative gradients (blue pixels in the seismic colormap, with intensity indicating the degree of negative importance) mark the pixels in an image that provide evidence against the network's decision/output.

When an area contains roughly equal amounts of blue and red pixels, the two cancel out, which indicates that the area as a whole carries no importance.

### Expectation:
Well-trained networks will have feature attributions composed mostly of positive gradients. This means that the network relies almost entirely on the presence of features to make a prediction.
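
To make the two kinds of evidence concrete, the sketch below splits a small attribution map into its positive and negative parts and measures how much of the total attribution is positive. It uses plain Python lists and made-up values; `split_evidence` and `positive_share` are hypothetical helpers for illustration and are not how UntangleAI computes or renders its heatmaps.

```python
def split_evidence(attribution):
    """Split a 2-D attribution map into evidence for and against the output."""
    positive = [[max(v, 0.0) for v in row] for row in attribution]
    negative = [[min(v, 0.0) for v in row] for row in attribution]
    return positive, negative

def positive_share(attribution):
    """Fraction of total absolute attribution carried by positive values."""
    pos = sum(v for row in attribution for v in row if v > 0)
    total = sum(abs(v) for row in attribution for v in row)
    return pos / total if total else 0.0

attribution = [[0.5, -0.25], [0.0, 0.25]]   # toy 2x2 attribution map
positive, negative = split_evidence(attribution)
print(positive)                     # [[0.5, 0.0], [0.0, 0.25]]
print(negative)                     # [[0.0, -0.25], [0.0, 0.0]]
print(positive_share(attribution))  # 0.75
```

Under the expectation stated above, a well-trained network would tend to give a positive share close to 1.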

In [None]:
for topk in range(2, 4): # Joint and Differential Heatmaps comparing top class with other k-1 top predictions.
    args.topk = topk   
    out_prefix = os.path.join(results_path, '{}_signals'.format(idx))
    untangle_ai.attribute_signals(model, input_tensor_single, input_tensor_true, signal_store_path,
            ID2Name_Map, args, out_prefix)
    for i in range(1, topk):
        img = imread(out_prefix + '_diff_class_{}.JPEG'.format(i))
        print('Visualizing {}'.format(out_prefix + '_diff_class_{}.JPEG'.format(i)))
        plt.imshow(img)
        plt.show()
        img = imread(out_prefix + '_invDiff_class_{}.JPEG'.format(i))
        print('Visualizing {}'.format(out_prefix + '_invDiff_class_{}.JPEG'.format(i)))
        plt.imshow(img)
        plt.show()

### Comparative Heatmaps


- Joint Heatmap

If you want to look at evidence for or against a prediction for a single output node, a simple heatmap is sufficient. However, when there are multiple classes, their heatmaps need to be compared on a common footing. This can be challenging because the gradients returned by a gradient-based technique can differ by orders of magnitude from one class to another.

We provide a modified heatmap function for comparing the attributions of two or more classes. It takes into account the relative magnitudes of the classes' gradients and brings all of the values onto the same scale before applying the heatmap. This is what the term joint heatmap describes.

This is why we suggest using top k > 1 for all use cases: it gives a clearer picture of why class X was chosen and not class Y.
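
One simple way to put several attribution maps on a common scale is to divide all of them by a single shared maximum magnitude. The sketch below illustrates that idea with plain Python lists. It is an assumption about how a joint scaling could work, not the actual implementation behind the UntangleAI heatmap function, and `joint_normalize` is a hypothetical name.

```python
def max_abs(attribution):
    """Largest absolute value in a 2-D attribution map."""
    return max(abs(v) for row in attribution for v in row)

def joint_normalize(maps):
    """Scale several attribution maps by one shared maximum magnitude."""
    shared = max(max_abs(m) for m in maps)
    if shared == 0:
        return [[row[:] for row in m] for m in maps]
    return [[[v / shared for v in row] for row in m] for m in maps]

class_x = [[4.0, -2.0], [0.0, 1.0]]      # toy gradients, large magnitude
class_y = [[0.5, 0.25], [-0.5, 0.0]]     # toy gradients, small magnitude
norm_x, norm_y = joint_normalize([class_x, class_y])
print(norm_x)  # [[1.0, -0.5], [0.0, 0.25]]
print(norm_y)  # [[0.125, 0.0625], [-0.125, 0.0]]
```

Normalizing each map by its own maximum would instead stretch both to the full range and hide the fact that class Y's evidence is much weaker than class X's.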

- Difference Heatmaps

With joint heatmaps, the gradients obtained for the two classes are on the same scale, so you can treat them as two 2-D arrays and compare them element by element.

Difference heatmaps tell you which pixels of class X are important only for class X and not for class Y (given logit of X > logit of Y).

Similarly, in the reverse direction, inverse difference heatmaps tell you which pixels of class Y are important only for class Y and not for class X.

Taken together, the difference and inverse difference heatmaps tell a larger story. They show which pixels/feature maps (if any) the classifier uses to differentiate between the two classes, or whether it relies only on specific pixel intensities to tell them apart.
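
Assuming both maps are already on a joint scale, a difference heatmap can be sketched as an element-wise subtraction clipped at zero, with the inverse difference obtained by swapping the arguments. This is one plausible reading of the description above, written with plain Python lists and made-up values. The exact formula used by attribute_signals is not documented here, and `difference_heatmap` is a hypothetical helper.

```python
def difference_heatmap(map_a, map_b):
    """Keep the attribution that map_a has in excess of map_b, clipped at zero."""
    return [[max(a - b, 0.0) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]

class_x = [[1.0, 0.5], [0.0, 0.25]]      # toy jointly scaled map for class X
class_y = [[0.25, 0.5], [0.75, 0.0]]     # toy jointly scaled map for class Y
diff = difference_heatmap(class_x, class_y)       # important for X, not Y
inv_diff = difference_heatmap(class_y, class_x)   # important for Y, not X
print(diff)      # [[0.75, 0.0], [0.0, 0.25]]
print(inv_diff)  # [[0.0, 0.0], [0.75, 0.0]]
```

The pixel where both classes have the same attribution (0.5) disappears from both results, which matches the idea that shared evidence does not help distinguish X from Y.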

In [None]:
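from untangle.external.signal_api import _attribute

# Raw attribution for the randomly chosen class
signal = _attribute(model, input_tensor_single, signal_store_path, rand_class, args)
signal = signal.permute(0, 2, 3, 1)  # move channels last for plotting
signal = signal.detach().cpu().numpy()  # also works when the tensor lives on the GPU
heatmap_signal = untangle_ai.get_heatmap(signal)

def slider_tool(th):
    fig, axes = plt.subplots(1, 1, figsize=(8, 8))

    # Keep pixels whose heatmap value is at least `th`; zero out the rest
    analysis = (heatmap_signal >= th) * heatmap_signal

    axes.set_title('Attribution')
    axes.imshow(analysis[0])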

In [None]:
from ipywidgets import interact
interact(slider_tool, th = (0.0, 0.99))

### Slider tool to highlight pixel importance

Because our heatmap function returns a heatmap image, it is sometimes hard to tell highly important pixels from less important ones, especially when high-importance pixels cluster around low-importance pixels or vice versa. The slider tool helps find the most important pixels by discarding the lowest-importance pixels and retaining the highest-importance ones, for a threshold in the range (0.0, 0.99). Any modification to these high-importance features should result in a drastic drop in prediction probability.