# Developing with iNNvestigate

In this notebook we present the API behind **iNNvestigate** and show you how to develop your own analysis algorithms! If you haven't done so already, you should first read the notebook [Analyzing with iNNvestigate](introduction.ipynb).

The main functionality of **iNNvestigate's** API is to abstract away the tedious, custom backward propagation that (many) analysis algorithms require. This should help researchers focus on algorithm development. In this notebook we will first introduce the concepts behind the API and then show with various examples how one can realize analysis algorithms with **iNNvestigate**.

## Concepts

### Computational graphs

We assume that the reader is familiar with the main concepts of neural networks like forward and backward propagation.
In many computational frameworks the forward pass along the layers of a neural network is realized with the concept of a computational graph.

Such graphs typically contain two types of nodes: tensors and operations. An operation node takes incoming edges from its input tensors and has outgoing edges towards its output tensors. Tensors are never directly connected with tensors and operations are never directly connected with operations.
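To make the bipartite structure concrete, here is a minimal sketch in plain Python. The classes `Tensor` and `Operation` are hypothetical and exist only for this illustration; they are not part of Keras or **iNNvestigate**.

```python
# Toy bipartite graph: tensors connect only to operations and vice versa.
# All names are illustrative and not part of Keras or iNNvestigate.

class Tensor:
    def __init__(self, name):
        self.name = name
        self.producer = None   # operation that outputs this tensor, if any
        self.consumers = []    # operations that read this tensor


class Operation:
    def __init__(self, name, inputs, output_names):
        self.name = name
        self.inputs = list(inputs)
        self.outputs = [Tensor(n) for n in output_names]
        for t in self.inputs:
            t.consumers.append(self)
        for t in self.outputs:
            t.producer = self


x = Tensor("x")
dense = Operation("dense", [x], ["h"])
relu = Operation("relu", dense.outputs, ["y"])

# Walk from the output tensor back to the input tensor.
path = []
t = relu.outputs[0]
while t.producer is not None:
    path.append(t.producer.name)
    t = t.producer.inputs[0]
print(path)  # ['relu', 'dense']
```

Walking the graph from an output tensor through its producing operations is the traversal that a backward pass builds on.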

Depending on the granularity, one can encode the operations in a neural network in various ways. In this library we follow and make intense use of the concept of a ["layer node"](https://keras.io/getting-started/functional-api-guide/#the-concept-of-layer-node) described in the Keras documentation. It is important to understand this concept to follow the rest of this notebook!

#### Why Keras?

There are several choices for deep learning frameworks. We chose Keras as the base because it offers a high-level interface and a conventional way to build neural network models. While, for example, in TensorFlow, MXNet, or PyTorch one can implement a ReLU activation function in many ways, in Keras it is typically realized with the `keras.activations.relu` function. Relying on such conventions and limiting our library to built-in Keras layers allows us to offer better usability: the user can simply pass a Keras model and the library will take care of the rest.

### Backward propagation

Many prediction analysis methods are realized by a (custom) backward propagation from the output neuron to the input.
The basic principle behind these methods is the same as for the gradient backward propagation, which uses the chain-rule to propagate gradients backward. This means that in order to compute the gradient (or any such analysis) one creates a computational graph that maps the output back to the input (and any intermediate tensor).
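As a small worked example of the chain rule on a chain of scalar functions, the following sketch stores the input of each step during the forward pass and multiplies the local derivatives in reverse order. The three toy "layers" are assumptions chosen for illustration only.

```python
# Each toy "layer" is a pair: (forward function, derivative of that function).
layers = [
    (lambda x: 3.0 * x, lambda x: 3.0),      # linear: y = 3x
    (lambda x: x + 1.0, lambda x: 1.0),      # bias:   y = x + 1
    (lambda x: x * x, lambda x: 2.0 * x),    # square: y = x^2
]


def forward(x):
    """Run the forward pass and store the input of every layer."""
    inputs = []
    for f, _ in layers:
        inputs.append(x)
        x = f(x)
    return x, inputs


def backward(inputs, start=1.0):
    """Propagate `start` from the output back to the input via the chain rule."""
    grad = start
    for (_, df), x in zip(reversed(layers), reversed(inputs)):
        grad = grad * df(x)
    return grad


y, stored = forward(2.0)   # (3 * 2 + 1)^2 = 49
grad = backward(stored)    # 2 * (3 * 2 + 1) * 3 = 42
```

The backward function needs the stored intermediate values, which is why a backward graph maps the output back to the input and to any intermediate tensor.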

In **iNNvestigate** this backward mapping is done in the background. The developer only needs to specify how a "layer node" should be "reverted", i.e., how the backward pass at a node should be done, while the tedious handling of the whole graph is done by the library.

In more detail, to create a custom backward pass one needs to specify:

* how the backpropagation should be initialized.
  * E.g., the gradient backward propagation is initialized by starting with a 1, while Deconvnet starts with the function value at the output neuron.
* how each layer should be reverted.
  * E.g., to compute the gradient one would apply the chain rule.
  
The API will take care of storing intermediate tensors, running the backward pass in the right order, handling Keras specifics, and more.
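The two ingredients listed above (an initialization and a per-layer reversion) can be sketched as two callbacks passed to a generic backward loop. This is a toy in plain Python under simplifying assumptions (scalar values, a sequential model); `run_backward`, `head_mapping`, and `reverse_layer` are hypothetical names here and do not reproduce the library's implementation.

```python
def run_backward(layers, x, head_mapping, reverse_layer):
    """Generic backward pass over a sequential toy model."""
    # Forward pass: remember the input and output of every layer.
    records = []
    for layer in layers:
        y = layer["forward"](x)
        records.append((layer, x, y))
        x = y
    # 1) Initialize the backward signal from the model output.
    signal = head_mapping(x)
    # 2) Revert the layers in reverse order.
    for layer, x_in, y_out in reversed(records):
        signal = reverse_layer(layer, x_in, y_out, signal)
    return signal


layers = [
    {"forward": lambda x: 2.0 * x, "derivative": lambda x: 2.0},
    {"forward": lambda x: max(x, 0.0),
     "derivative": lambda x: 1.0 if x > 0.0 else 0.0},
]


def chain_rule(layer, x_in, y_out, signal):
    # Reverting a layer for the gradient: multiply by the local derivative.
    return signal * layer["derivative"](x_in)


# Initialization with 1, as for the gradient.
gradient = run_backward(layers, 3.0, lambda y: 1.0, chain_rule)
# Initialization with the function value at the output.
value_init = run_backward(layers, 3.0, lambda y: y, chain_rule)
```

Only the two callbacks differ between methods; the loop that stores tensors and orders the backward steps stays the same.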




## Functionality

To show you hands-on how to use **iNNvestigate** we now import all needed modules and load a VGG16 model.


We want to analyze the pre-softmax activations:

### Basics

#### Interfaces

The basic, abstract interface of an analyzer is given by:

To implement a custom analyzer one can derive from the `AnalyzerNetworkBase` class and implement the function:

`_create_analysis`

The function should return for each input of the model an analysis tensor. *The tensors should be created by using Keras layers,* because the analysis tensors will be used to create a Keras model.

This class takes care of selecting the right output neuron.
For more details see the function docstring.

Analyzers that require training should also implement the following interface:

#### ReverseAnalyzerBase

The scope of the classes above is merely to provide a common interface. A more advanced and practical starting point for an implementation is given by the class `ReverseAnalyzerBase`.

This class already implements the function `_create_analysis`. In this function a backward pass is prepared and done as described in the section "Backward propagation" above. To use this class one can implement the following two methods:

* `head_mapping`
* `default_reverse_mapping`


Many analysis methods apply different propagation algorithms depending on the layer type. This is reflected by this interface:

`ReverseAnalyzerBase` will first try to match a registered, conditional mapping and, if none applies, fall back to `default_reverse_mapping`.
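The lookup order described here can be illustrated with a small dispatcher. This is a sketch only: `ToyReverser` and its methods are hypothetical, and matching in registration order is an assumption of this example rather than a statement about the library.

```python
class Dense:
    pass


class ReLU:
    pass


class ToyReverser:
    """Hypothetical dispatcher: conditional mappings first, default last."""

    def __init__(self, default_mapping):
        self._default = default_mapping
        self._conditional = []  # list of (condition, mapping) pairs

    def add_conditional_mapping(self, condition, mapping):
        self._conditional.append((condition, mapping))

    def mapping_for(self, layer):
        # Assumption: the first registered condition that holds wins.
        for condition, mapping in self._conditional:
            if condition(layer):
                return mapping
        return self._default


def gradient_mapping(signal):
    return signal


def guided_relu_mapping(signal):
    return max(signal, 0.0)


reverser = ToyReverser(default_mapping=gradient_mapping)
reverser.add_conditional_mapping(
    lambda layer: isinstance(layer, ReLU), guided_relu_mapping)

used = [reverser.mapping_for(layer).__name__ for layer in (Dense(), ReLU())]
print(used)  # ['gradient_mapping', 'guided_relu_mapping']
```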

In the rest of the tutorial we will show how to use this base class.

### Default backward pass

The API will package the created backward pass in a Keras model and therefore each propagation step needs to be created by Keras layers. In this first example we implement the gradient for a model solely composed of dense, linear layers:
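For a bias-free dense layer `y = x · W`, the gradient step multiplies the incoming signal by the transposed kernel. The following is a plain-Python sketch of that step, independent of Keras and of the library; the helper names `dense_forward` and `dense_reverse` are hypothetical.

```python
def dense_forward(x, W):
    """y = x @ W for a kernel W of shape (n_in, n_out), without bias."""
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(W[0]))]


def dense_reverse(reversed_y, W):
    """Gradient with respect to the layer input: reversed_y @ W^T."""
    return [sum(reversed_y[j] * W[i][j] for j in range(len(reversed_y)))
            for i in range(len(W))]


W1 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 inputs -> 2 units
W2 = [[1.0], [-1.0]]                       # 2 units -> 1 output

x = [1.0, 0.0, -1.0]
y = dense_forward(dense_forward(x, W1), W2)

# Start with ones at the output and revert the layers in reverse order.
grad = dense_reverse(dense_reverse([1.0], W2), W1)
print(grad)  # [-1.0, -1.0, -1.0]
```

Because the toy model is linear, the output equals the dot product of `grad` and `x`, which is a quick sanity check for the propagation step.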

This shows the basic idea of the propagation step. Now, to implement a full gradient backward propagation, we will rely on the chain rule and automatic differentiation. This makes the code cleaner and allows us to treat different layers with one code fragment, e.g., in this case dense and convolutional layers:

And apply it to our VGG16 model:

In the next section we will extend this example to implement Guided Backprop!

### Conditional changes to the backward pass

Guided Backprop applies the same algorithm as gradient backpropagation, with two exceptions: the propagation is initialized with the final function value instead of ones, and wherever a ReLU activation is applied in the forward pass, a ReLU activation is also applied in the backward pass. This is done to keep the signal in the original value range of the forward pass:
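The difference to the plain gradient can be traced by hand on a tiny network. The sketch below is plain Python and independent of the library; the weights, the input, and all function names are assumptions made for illustration.

```python
def dense_forward(x, W):
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(W[0]))]


def dense_reverse(signal, W):
    return [sum(signal[j] * W[i][j] for j in range(len(signal)))
            for i in range(len(W))]


def relu_reverse(x_in, signal):
    # Plain gradient of ReLU: pass the signal where the input was positive.
    return [s if x > 0.0 else 0.0 for x, s in zip(x_in, signal)]


def guided_relu_reverse(x_in, signal):
    # Guided Backprop: additionally apply a ReLU to the backward signal.
    return [max(s, 0.0) if x > 0.0 else 0.0 for x, s in zip(x_in, signal)]


W1 = [[1.0, -1.0], [1.0, 1.0]]
W2 = [[1.0], [-2.0]]
x = [1.0, 2.0]

h = dense_forward(x, W1)           # [3.0, 1.0]
a = [max(v, 0.0) for v in h]       # ReLU
y = dense_forward(a, W2)           # [1.0]

# Gradient: start with ones, plain ReLU gradient.
gradient = dense_reverse(relu_reverse(h, dense_reverse([1.0], W2)), W1)
# Guided Backprop: start with the output value, guided ReLU step.
guided = dense_reverse(guided_relu_reverse(h, dense_reverse(y, W2)), W1)

print(gradient)  # [3.0, -1.0]
print(guided)    # [1.0, 1.0]
```

The negative signal arriving at the second hidden unit is cut off by the backward ReLU, so the guided result contains no negative contributions here.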

And we apply it again to our VGG16 model:

### Keras layer nodes

A Keras "layer node" can be applied many times. The above examples revert or backpropagate each of these applications individually. In some cases it can be computationally more efficient or even necessary to bundle all the backpropagations for a single layer node. This can be done by implementing the interface ``ReverseMappingBase``, which then gets initialized once for each layer node and applied for each backward pass of the layer's applications:

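```python
class ReverseMappingBase(object):

    def __init__(self, layer, state):
        # Called once per layer node.
        pass

    def apply(self, Xs, Ys, reversed_Ys, reverse_state):
        # Called for each backward pass of one application of the layer.
        raise NotImplementedError()
```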

We use this to implement LRP-Z for VGG16 (for more complex networks additional steps are needed). The implementation consists of three parts:

* a rule for dense and convolutional layers
* a rule for activation layers
* the gradient for max-pooling layers
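To illustrate the pattern "initialize once per layer, apply once per use", here is a plain-Python sketch of the Z-rule for a bias-free dense layer, where each input receives relevance `R_i = x_i * sum_j W_ij * R_j / z_j` with `z_j = sum_i x_i * W_ij`. The class `ToyZRule` is hypothetical and does not derive from the library's `ReverseMappingBase`; it also assumes that no `z_j` is zero.

```python
class ToyZRule:
    """Hypothetical mapping object: created once per layer, applied per use."""

    def __init__(self, kernel):
        # Layer-specific state is prepared once.
        self.kernel = kernel

    def apply(self, x, relevance_out):
        n_in, n_out = len(self.kernel), len(self.kernel[0])
        # Pre-activations z_j = sum_i x_i * W_ij (no bias in this sketch).
        z = [sum(x[i] * self.kernel[i][j] for i in range(n_in))
             for j in range(n_out)]
        # Assumption: no z_j is zero, so the division is safe.
        ratio = [relevance_out[j] / z[j] for j in range(n_out)]
        return [x[i] * sum(self.kernel[i][j] * ratio[j] for j in range(n_out))
                for i in range(n_in)]


rule = ToyZRule([[1.0, 2.0], [3.0, 4.0]])

# First application of the layer: z = [4.0, 6.0].
r_first = rule.apply([1.0, 1.0], [4.0, 6.0])
# Second application of the same layer: z = [2.0, 4.0].
r_second = rule.apply([2.0, 0.0], [2.0, 4.0])

print(r_first)   # [3.0, 7.0]
print(r_second)  # [6.0, 0.0]
```

In both applications the relevance is conserved: the input relevances sum to the same total as the output relevances.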

## Additional

### Network compatibility

Not all methods work on all network structures. E.g., most methods cannot handle softmax outputs, and others work only with ReLU activation functions.

One can check whether a passed model does not conform to the method's assumptions by passing a condition and a message like this:

The condition gets tested on each layer of the network and, if it holds, either a warning is printed or an exception is raised.
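The check logic can be sketched without the library as follows. The function `check_model`, its `check_type` parameter, and the dictionary-based toy model are hypothetical stand-ins used only to show the condition-plus-message idea.

```python
import warnings


def check_model(layers, condition, message, check_type="exception"):
    """Test `condition` on every layer; warn or raise if it holds anywhere."""
    failing = [layer["name"] for layer in layers if condition(layer)]
    if failing:
        text = "%s Check triggered by layers: %s" % (message, failing)
        if check_type == "warning":
            warnings.warn(text)
        else:
            raise ValueError(text)
    return failing


toy_model = [
    {"name": "dense_1", "activation": "relu"},
    {"name": "dense_2", "activation": "softmax"},
]


def has_softmax(layer):
    return layer["activation"] == "softmax"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    failing = check_model(toy_model, has_softmax,
                          "This method does not support softmax.",
                          check_type="warning")

print(failing)  # ['dense_2']
```

With the default `check_type`, the same call raises a `ValueError` instead of emitting a warning.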

The common check for a softmax activation can be done by calling this helper function:

### Debug routines

To ease debugging, the API provides a few helpful flags.

When setting `reverse_verbose` the API will print information about the backward pass:

The flag `reverse_check_min_max_values` triggers the printing of min and max values for each tensor created in the backward pass. Note that the NodeID and TensorID correspond to the values from the verbose print above:

Similarly, the flag `reverse_check_finite` checks whether all values in the backward pass tensors are finite. If not, a message will be printed:

To have full access to the intermediate state of the backward pass, one can set the flag `reverse_keep_tensors`, which tells the API to store all tensors in a nested dictionary, where the keys of the first level are the NodeIDs and the keys of the second level are the TensorIDs:
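Once such a nested dictionary is available, the min/max and finiteness checks described in this section are easy to reproduce by hand. The sketch below uses plain Python lists as stand-ins for tensors; the dictionary contents and the helper `summarize` are made up for illustration and are not output of the library.

```python
import math

# Hypothetical stand-in: {NodeID: {TensorID: values}} with lists as "tensors".
kept = {
    0: {0: [0.5, -1.0, 2.0]},
    1: {0: [0.0, 3.0], 1: [float("inf"), 1.0]},
}


def summarize(kept_tensors):
    """Collect min, max, and a finiteness flag for every stored tensor."""
    report = {}
    for node_id, tensors in kept_tensors.items():
        for tensor_id, values in tensors.items():
            report[(node_id, tensor_id)] = {
                "min": min(values),
                "max": max(values),
                "finite": all(math.isfinite(v) for v in values),
            }
    return report


report = summarize(kept)
non_finite = sorted(key for key, info in report.items() if not info["finite"])
print(non_finite)  # [(1, 1)]
```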