# Integrated gradients method

## Overview

Integrated gradients is a method originally proposed by [link] that aims to attribute an importance value to each input feature of a machine learning model based on the gradients of the model's output with respect to the input.
Roughly speaking, a higher gradient with respect to a given feature indicates that the output varies more when the feature is changed, which suggests that the feature has a higher impact on the model's predictions.

However, this is not always the case: as pointed out in [link], an attribution method should satisfy the sensitivity axiom, i.e. if a baseline input instance $x^\prime$ differs from the input instance of interest $x$ only in the value of one feature $x_i$ and yields a different prediction, the attribution given to feature $x_i$ should be non-zero. Since gradients might flatten at the input, simply taking the value of the gradient as an attribution might break the sensitivity axiom.

Integrated gradients overcomes this problem by integrating the gradients along a given path from a baseline $x^\prime$ to $x$ and using the value of the path integral as the attribution. It is shown in [link] that such attributions satisfy the sensitivity axiom.
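
The difference between the two approaches can be seen on a toy one-dimensional function. The following is a minimal sketch in plain Python, not library code: the function `model`, its hand-written derivative `gradient`, and the midpoint-sum approximation of the integral are all assumptions chosen for illustration.

```python
def relu(z):
    return max(z, 0.0)

def model(x):
    """Hypothetical toy model that saturates at 1 for x >= 1."""
    return 1.0 - relu(1.0 - x)

def gradient(x):
    """Analytic derivative of `model`: 1 below the kink at x = 1, 0 above it."""
    return 1.0 if x < 1.0 else 0.0

baseline, x = 0.0, 2.0
n_steps = 100  # number of midpoint samples along the straight path

# Plain gradient at the input: the model is flat there, so this is 0
# even though the prediction changed between the baseline and x.
plain_gradient = gradient(x)

# Integrated gradients: average the gradient along the path, then scale
# by the difference between the input and the baseline.
alphas = [(k + 0.5) / n_steps for k in range(n_steps)]
avg_gradient = sum(gradient(baseline + a * (x - baseline)) for a in alphas) / n_steps
ig_attribution = (x - baseline) * avg_gradient

print(model(x) - model(baseline), plain_gradient, ig_attribution)
```

Here the output changes between the baseline and the input, yet the plain gradient at the input is zero, while the path integral picks up the region where the gradient was non-zero.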



## Integrated gradients method

Let us consider an input instance $x$, a baseline instance $x^\prime$ and a model $M: X \rightarrow Y$ which acts on the feature space $X$ and produces an output $y$ in the output space $Y$. The method is valid for both regression and classification models. In the case of a non-scalar output, such as in classification models or multi-target regression, the gradients should be calculated for one given element of the output. For classification models, the gradients usually refer to the output element corresponding to the true class of the instance.
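
As an illustration of selecting a single output element, here is a minimal sketch in plain Python. The two-class `toy_model`, the wrapper `select_output` and the finite-difference helper `numerical_gradient` are hypothetical names introduced only for this example; a real implementation would rely on automatic differentiation instead.

```python
import math

def toy_model(x):
    """Hypothetical two-class classifier returning softmax probabilities."""
    logits = [x[0] - x[1], x[1] - x[0]]
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_output(model, target):
    """Wrap a vector-valued model so that it returns only the `target` element."""
    return lambda x: model(x)[target]

def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

x = [1.0, 0.5]
true_class = 0  # assumed label of this instance
scalar_model = select_output(toy_model, true_class)

# Gradients are taken with respect to the selected output element only.
grad = numerical_gradient(scalar_model, x)
print(grad)
```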

The attribution $A_i(x)$ for each feature $x_i$ is calculated as

$$A_i(x) = (x_i - x_i^\prime) \int_0^1 \frac{\partial M(x^\prime + \alpha (x - x^\prime))}{\partial x_i} d\alpha$$
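
In practice the integral is approximated numerically. The sketch below uses a simple midpoint Riemann sum in plain Python; this is one possible choice made for readability, and libraries may use other quadrature rules. The function `integrated_gradients`, the `toy_model` and its hand-written `toy_gradient` are hypothetical and only serve to make the formula concrete.

```python
def integrated_gradients(grad_fn, x, baseline, n_steps=50):
    """Midpoint Riemann-sum approximation of the attributions A_i(x).

    grad_fn(point) must return the list of partial derivatives of the
    (scalar) model output with respect to each feature at `point`.
    """
    n_features = len(x)
    avg_grad = [0.0] * n_features
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        # Point on the straight path x' + alpha * (x - x').
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i in range(n_features):
            avg_grad[i] += grad[i] / n_steps
    # Scale the averaged gradient by (x_i - x'_i).
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

def toy_model(x):
    """Hypothetical scalar model M(x) = x_0^2 + 3 * x_1."""
    return x[0] ** 2 + 3.0 * x[1]

def toy_gradient(x):
    """Analytic gradient of `toy_model`."""
    return [2.0 * x[0], 3.0]

x = [2.0, 1.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(toy_gradient, x, baseline)
print(attributions)
```

For this toy model the attributions add up to the difference between the model output at $x$ and at the baseline $x^\prime$, which is a convenient sanity check on the approximation.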

## Usage

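```python
import tensorflow as tf
from alibi.explainers import IntegratedGradients

# Load a trained Keras model (replace the placeholder path with your own).
model = tf.keras.models.load_model("path_to_your_model")

# Build the explainer around the model.
ig = IntegratedGradients(model)

# `x` is assumed to be an array of input instances shaped as the model expects.
explanation = ig.explain(x)
attributions = explanation.data['attributions']
```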

## Examples

[MNIST dataset](../examples/integrated_gradients_mnist.nblink)

[ImageNet dataset](../examples/integrated_gradients_imagenet.nblink)

[IMDB dataset text classification](../examples/integrated_gradients_imdb.nblink)