(a.k.a. Path-Integrated Gradients, a.k.a. Axiomatic Attribution for Deep Networks)
Contact: integrated-gradients AT gmail.com
Contributors (alphabetical, last name):
- Kedar Dhamdhere (Google)
- Pramod Kaushik Mudrakarta (U. Chicago)
- Mukund Sundararajan (Google)
- Ankur Taly (Google Brain)
- Jinhua (Shawn) Xu (Verily)
We study the problem of attributing the prediction of a deep network to its input features, as a step towards explaining individual predictions. For instance, in an object recognition network, an attribution method could tell us which pixels of the image were responsible for a certain label being picked; in a text sentiment network, it could tell us which words of a sentence were indicative of strong sentiment.
Applications range from helping a developer debug a network, to allowing analysts to explore its logic, to giving end users some transparency into the reason for a network's prediction.
Integrated Gradients is a variation on computing the gradient of the prediction output w.r.t. features of the input. It requires no modification to the original network, is simple to implement, and is applicable to a variety of deep models (sparse and dense, text and vision).
Relevant papers and slide decks
Axiomatic Attribution for Deep Networks -- Mukund Sundararajan, Ankur Taly, Qiqi Yan, Proceedings of International Conference on Machine Learning (ICML), 2017
This paper introduced the Integrated Gradients method. It presents an axiomatic justification of the method along with applications to various deep networks. Slide deck
Did the model understand the questions? -- Pramod Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere, Proceedings of Association of Computational Linguistics (ACL), 2018
This paper discusses an application of integrated gradients for evaluating the robustness of question-answering networks. Slide deck
Implementing Integrated Gradients
This How-To document describes the steps involved in implementing integrated gradients for an arbitrary deep network.
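As a rough illustration of what such an implementation involves, here is a minimal NumPy sketch. It assumes the caller supplies a gradient function; `grad_fn`, `toy_model`, and `toy_grad` are hypothetical names used only for this example and are not part of this repository's library. The sketch interpolates between a baseline and the input, averages the gradients along that straight-line path (a midpoint Riemann sum standing in for the path integral), and scales the average by the difference between input and baseline.

```python
import numpy as np


def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Approximate integrated gradients along the straight line baseline -> x.

    grad_fn is an assumed, caller-supplied function: given a batch of inputs
    with shape (steps, *x.shape), it returns the gradient of the model output
    with respect to each input, in the same shape.
    """
    x = np.asarray(x, dtype=np.float64)
    baseline = np.asarray(baseline, dtype=np.float64)

    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps

    # Interpolated inputs, shape (steps, *x.shape).
    alphas = alphas.reshape((-1,) + (1,) * x.ndim)
    path = baseline + alphas * (x - baseline)

    # Average the gradients over the path, then scale by (x - baseline).
    avg_grads = grad_fn(path).mean(axis=0)
    return (x - baseline) * avg_grads


# Toy stand-in for a network: F(x) = sum of squares, with gradient 2x.
def toy_model(batch):
    return np.sum(batch ** 2, axis=tuple(range(1, batch.ndim)))


def toy_grad(batch):
    return 2.0 * batch


x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(x, baseline, toy_grad)

# Sanity check: the attributions should sum to F(x) - F(baseline).
gap = toy_model(x[None])[0] - toy_model(baseline[None])[0]
print(attributions, attributions.sum(), gap)
```

For a real network, `grad_fn` would be replaced by a call into the framework's gradient computation, and the number of steps would typically be increased until the attributions sum approximately to the difference between the prediction at the input and at the baseline.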
This repository provides code for implementing integrated gradients for networks with image inputs. It is structured as follows:
- Integrated Gradients library: Library implementing the core integrated gradients algorithm.
- Visualization library: Library implementing methods for visualizing attributions for image models.
- Inception notebook: A Jupyter notebook for generating and visualizing attributions for the Inception (v1) object recognition network.
We recommend starting with the notebook. To run it, follow the instructions below.
Clone this repository
git clone https://github.com/ankurtaly/Attributions
In the same directory, run the Jupyter notebook server.
Open attributions.ipynb and run all cells.