### LRP-0 tutorial
Contribution analysis aims to explain the decision-making process of a neural network.

Let the composed function $f$ represent the network. Given one input sample $X=\{X_i\}_i$, $f$ outputs a scalar score $f(X)$.

We want to decompose this output into the contribution of each input, $R(X_i)$.

**LRP** (Layer-wise Relevance Propagation) proposes the first property, **conservation**:

$$f(X)=\sum_i R(X_i)$$

That is, the contributions $R(X_i)$ sum to the output itself.

Here is a simple example of conservation.

In [7]:
import torch
X = torch.rand(1,5)
y = X.sum()

It is easy to find the contribution of each element of X: it is the element itself.

In [None]:
Rx = X
print(f'Sum of Rx equals the output y? {Rx.sum()==y}')

Sum of Rx equals the output y? True

In this example, the contribution is the same as the input itself.

For more complex cases, **LRP-0** gives a reasonable answer.

LRP-0 applies only to networks of stacked linear layers. Such a network is defined as:

$$Y^l=W^l Y^{l-1}+B^l, \quad l=1,\dots,L$$

$W^l$ and $B^l$ are the weight and the bias of layer $l$, and $Y^l$ is the output of layer $l$.

In particular, $Y^0=X$.

As $l$ runs from 1 to $L$, the input of each layer is the output of the previous layer.
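The recursion above can be sketched directly in PyTorch. This is only an illustration: the layer sizes and the names `sizes`, `weights`, `biases`, and `outputs` are assumptions made for this example and do not appear elsewhere in the tutorial.

```python
import torch

torch.manual_seed(0)

# Hypothetical layer sizes, chosen only for illustration: 5 -> 4 -> 3 -> 1
sizes = [5, 4, 3, 1]
weights = [torch.randn(sizes[l + 1], sizes[l]) for l in range(len(sizes) - 1)]
biases = [torch.randn(sizes[l + 1]) for l in range(len(sizes) - 1)]

X = torch.rand(5)
outputs = [X]  # outputs[0] is Y^0 = X
for W, B in zip(weights, biases):
    # Y^l = W^l Y^{l-1} + B^l: each layer consumes the previous layer's output
    outputs.append(W @ outputs[-1] + B)

print([tuple(Y.shape) for Y in outputs])
```

The last entry of `outputs` plays the role of the scalar score $f(X)$.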

### scalar linear
First, take a simple example: a single linear layer with a one-dimensional output y and no bias.

In [None]:
X = torch.randn(3)
W = torch.randn(1,3)
y = W @ X  # matrix-vector product; X is 1-D, so the result has shape (1,)

Suppose Ry = y; we then want to find R(X).

In [None]:
Ry = y

We separate the dimensions of X and y by reshaping, with the y dimension on the left (first axis).

In [None]:
X = X.reshape(1,3)
y = y.reshape(1,1)

This gives a simple view of the forward pass.

In [None]:
print((X*W).sum(dim=1))
print(y)

It is easy to find the contribution of each element of X: it is the weighted element, X * W.

In [None]:
Rx = X * W
print(f'Sum of Rx equals the output y? {torch.allclose(Rx.sum(), y.squeeze())}')

Ry can of course be any tensor with the same shape as y. What is R(X) then?
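For the scalar layer above, one way to answer this is to give each input the same share of Ry that it has of y. The sketch below uses fixed, made-up values for `X`, `W`, and `Ry` so the arithmetic is easy to follow. It assumes y is not zero.

```python
import torch

# Made-up values for illustration only
X = torch.tensor([[1.0, 2.0, -1.0]])
W = torch.tensor([[0.5, -1.0, 2.0]])
y = (X * W).sum(dim=1, keepdim=True)  # forward pass, shape (1, 1)

Ry = torch.tensor([[7.0]])  # an arbitrary relevance assigned to y

# Each input contributes X_i * W_i to y, so it receives that fraction of Ry
Rx = Ry * (X * W) / y

print(Rx)
print(torch.allclose(Rx.sum(), Ry.sum()))  # conservation: sum of Rx equals Ry
```

Here y is -3.5 and the weighted inputs are 0.5, -2 and -2, so the three inputs receive -1, 4 and 4, which sum to 7.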

Here is a general LRP-0 implementation:

In [None]:
def lrp_0(layer, x, Ry):
    # -- Original relevance propagation (LRP-0)
    # This demo shows how relevance propagates through one layer.
    # The full-Jacobian version is memory-hungry: on common CNNs it will
    # raise "CUDA out of memory".
    # The dims of x and y are kept separate: y dims first, x dims last.
    # safeDivide is assumed to be defined elsewhere in this notebook.
    x = x.squeeze(0)  # remove batch dim
    Ry = Ry.squeeze(0)
    x_dim_depth = len(x.shape)
    x_empty_dim = (1,) * x_dim_depth
    y = layer.forward(x)
    y_dim_depth = len(y.shape)
    # No need to reshape x: broadcasting aligns trailing dims, so x already
    # lines up with the last dims of the Jacobian.
    y = y.reshape(y.shape + x_empty_dim)  # y as (y_shape, 1, ..., 1)
    Ry = Ry.reshape(y.shape)  # Ry must broadcast the same way as y
    # The Jacobian has shape (y_shape, x_shape);
    # for an FC layer it equals layer.weight.
    g = torch.autograd.functional.jacobian(lambda x: layer.forward(x), x)
    # Use the Jacobian to approximate each input's share of the output.
    r = safeDivide(Ry * g * x, y)
    Rx = r.sum(list(range(y_dim_depth)))  # sum over the y dims
    return Rx.unsqueeze(0)
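To check conservation on a concrete layer, here is a minimal self-contained sketch specialised to a bias-free fully connected layer. It is not the notebook's own code: `safe_divide` is a hypothetical stand-in for `safeDivide` (whose definition is not shown here), `lrp_0_linear` is an illustrative name, and the weights and input are made-up values.

```python
import torch


def safe_divide(a, b, eps=1e-9):
    # Hypothetical stand-in for safeDivide: returns 0 where b is (near) zero
    mask = b.abs() > eps
    safe_b = torch.where(mask, b, torch.ones_like(b))
    return torch.where(mask, a / safe_b, torch.zeros_like(a))


def lrp_0_linear(layer, x, Ry):
    # x: (in_features,), Ry: (out_features,)
    y = layer(x).detach()  # (out_features,)
    # Jacobian of the layer w.r.t. x, shape (out_features, in_features)
    g = torch.autograd.functional.jacobian(layer, x)
    # Put the y dim first and the x dim last, then share Ry among the inputs
    r = safe_divide(Ry.unsqueeze(1) * g * x, y.unsqueeze(1))
    return r.sum(dim=0)  # sum over the y dim


layer = torch.nn.Linear(3, 2, bias=False)
with torch.no_grad():
    # Made-up weights so the result is easy to verify by hand
    layer.weight.copy_(torch.tensor([[0.5, -1.0, 2.0], [1.0, 1.0, 1.0]]))

x = torch.tensor([1.0, 2.0, -1.0])
Ry = layer(x).detach()  # start from Ry = y
Rx = lrp_0_linear(layer, x, Ry)

print(Rx)
print(torch.allclose(Rx.sum(), Ry.sum(), atol=1e-6))  # conservation check
```

Without a bias, the input relevances sum to the output relevances. With a bias term, part of the relevance is absorbed by the bias, so this check would no longer hold exactly.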