This repository is a tutorial on various DNN interpretation and explanation techniques. The Jupyter notebooks cover both the theoretical background and a step-by-step TensorFlow implementation for practical use. I did not write explanations for techniques whose algorithm and original paper I found sufficiently clear.
GitHub seems unable to render some of the equations in the notebooks. I strongly recommend viewing them with nbviewer until I find out what the problem is (you can also download the repo and view them in your local environment). Links are listed below.
1 Activation Maximization
This section focuses on interpreting a concept learned by a deep neural network (DNN) through activation maximization.
1.1 Activation Maximization (AM)
1.3 Performing AM in Code Space
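As a rough illustration of the idea behind this section, the sketch below runs activation maximization on a toy linear softmax classifier: gradient ascent on log p(class | x) minus an L2 penalty on the input. It uses NumPy rather than TensorFlow and is not taken from the notebooks; the weights `W`, the penalty `lam`, the step size, and all function names are assumptions chosen for the example.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax of a 1-D logit vector.
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def objective(x, W, b, target, lam):
    # AM objective: log-probability of the target class minus an L2 penalty.
    return log_softmax(W @ x + b)[target] - lam * np.dot(x, x)

def activation_maximization(W, b, target, lam=0.1, lr=0.05, steps=500):
    # Plain gradient ascent starting from the zero input.
    x = np.zeros(W.shape[1])
    for _ in range(steps):
        p = np.exp(log_softmax(W @ x + b))
        # d/dx log p[target] = W[target] - sum_k p[k] * W[k]
        grad = W[target] - p @ W - 2.0 * lam * x
        x = x + lr * grad
    return x

# Hypothetical toy classifier: 3 classes, 5 input features.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = np.zeros(3)

x_star = activation_maximization(W, b, target=1)
print("objective at start:", objective(np.zeros(5), W, b, 1, 0.1))
print("objective at x*:   ", objective(x_star, W, b, 1, 0.1))
```

For a real DNN the gradient would come from automatic differentiation instead of the closed form used here, and the L2 penalty could be swapped for a stronger prior on the input.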
2 Layer-wise Relevance Propagation
In this section, we first introduce the concept of a relevance score with Sensitivity Analysis, explore basic relevance decomposition with Simple Taylor Decomposition, and then build up to various Layer-wise Relevance Propagation methods such as Deep Taylor Decomposition and DeepLIFT.
2.1 Sensitivity Analysis
2.2 Simple Taylor Decomposition
2.3 Layer-wise Relevance Propagation
2.4 Deep Taylor Decomposition
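To make the first two ideas concrete, here is a minimal NumPy sketch (not from the notebooks) on a hypothetical two-layer ReLU network without biases. Sensitivity analysis scores each input by its squared partial derivative, while simple Taylor decomposition, under the assumption of a bias-free ReLU network with the root point taken towards the origin, reduces to gradient times input and sums to the network output. The weights and function names are made up for the example.

```python
import numpy as np

# Hypothetical bias-free network: f(x) = w2 . relu(x @ W1)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))
w2 = rng.normal(size=6)

def forward(x):
    h = np.maximum(0.0, x @ W1)
    return h @ w2, h

def input_gradient(x):
    # Backpropagate through the ReLU by masking inactive hidden units.
    _, h = forward(x)
    mask = (h > 0).astype(float)
    return W1 @ (mask * w2)

x = rng.normal(size=4)
f_x, _ = forward(x)
g = input_gradient(x)

sensitivity = g ** 2   # sensitivity analysis: squared partial derivatives
taylor = g * x         # simple Taylor decomposition for this network

print("f(x)            :", f_x)
print("sum of relevance:", taylor.sum())
```

The two printed numbers agree because a bias-free ReLU network is positively homogeneous; the sensitivity scores, by contrast, do not sum to f(x) and explain local variation rather than the output value itself.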
3 Gradient Based Methods
Implementation of various types of gradient-based visualization methods such as Deconvolution, Backpropagation, Guided Backpropagation, Integrated Gradients and SmoothGrad. Check out grad.py, a modular implementation of various gradient-based visualization techniques.
3.3 Guided Backpropagation
3.4 Integrated Gradients
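Two of the methods named above can be sketched in a few lines. The example below is an assumption-laden toy, independent of grad.py: it uses a scalar function with a hand-written gradient instead of a TensorFlow model, approximates the Integrated Gradients path integral with a midpoint Riemann sum, and implements SmoothGrad as the mean gradient over Gaussian-perturbed copies of the input. The function `f`, the weights `w`, and all parameter values are hypothetical.

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])  # hypothetical model weights

def f(x):
    return np.tanh(w @ x)

def grad_f(x):
    # Analytic gradient of tanh(w . x) with respect to x.
    return (1.0 - np.tanh(w @ x) ** 2) * w

def integrated_gradients(x, baseline, steps=200):
    # Midpoint Riemann sum along the straight path baseline -> x.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def smoothgrad(x, sigma=0.1, n_samples=50, seed=0):
    # Average the gradient over noisy copies of the input.
    rng = np.random.default_rng(seed)
    noisy = x + rng.normal(scale=sigma, size=(n_samples, x.size))
    return np.array([grad_f(z) for z in noisy]).mean(axis=0)

x = np.array([0.5, 0.2, 1.0])
baseline = np.zeros_like(x)

ig = integrated_gradients(x, baseline)
sg = smoothgrad(x)

print("IG attributions:", ig)
print("sum(IG) vs f(x) - f(baseline):", ig.sum(), f(x) - f(baseline))
print("SmoothGrad:", sg)
```

The sum of the Integrated Gradients attributions should match f(x) - f(baseline) up to discretization error, which is the completeness axiom from the original paper.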
4 Class Activation Map
Implementation of Class Activation Map (CAM) and its generalized versions, Grad-CAM and Grad-CAM++, on the cluttered MNIST dataset.
4.1 Class Activation Map
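The relationship between CAM and Grad-CAM can be sketched with NumPy alone, assuming the architecture CAM requires: a final convolutional layer followed by global average pooling and a single linear layer. The feature maps `A`, the weights `W_fc`, and the function names below are hypothetical stand-ins, not code from the notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(size=(7, 7, 8))    # hypothetical last-conv feature maps (H, W, K)
W_fc = rng.normal(size=(8, 10))   # hypothetical GAP -> class weights (K, classes)

def class_scores(A, W_fc):
    # Global average pooling followed by a linear layer.
    return A.mean(axis=(0, 1)) @ W_fc

def class_activation_map(A, W_fc, c):
    # CAM: weight each feature map by the class-c weight and sum over channels.
    return A @ W_fc[:, c]

def grad_cam(A, W_fc, c):
    # For this architecture dS_c/dA[i, j, k] = W_fc[k, c] / (H * W), so the
    # gradient is written in closed form instead of using autodiff.
    H, W_, _ = A.shape
    grads = np.broadcast_to(W_fc[:, c] / (H * W_), A.shape)
    alpha = grads.mean(axis=(0, 1))   # channel importance weights
    return np.maximum(0.0, A @ alpha)

scores = class_scores(A, W_fc)
c = int(np.argmax(scores))
cam = class_activation_map(A, W_fc, c)
gcam = grad_cam(A, W_fc, c)

print("class score:", scores[c], " mean of CAM:", cam.mean())
```

Under these assumptions Grad-CAM equals the rectified CAM up to the constant factor 1 / (H * W), which is why Grad-CAM is described as a generalization; for other architectures the gradients would have to come from automatic differentiation.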
5 Quantifying Explanation Quality
While each explanation technique is based on its own intuition or mathematical principle, it is also important to define at a more abstract level what are the characteristics of a good explanation, and to be able to test for these characteristics quantitatively. We present in Sections 5.1 and 5.2 two important properties of an explanation, along with possible evaluation metrics.
5.1 Explanation Continuity
5.2 Explanation Selectivity
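As a hedged sketch of how these two properties might be measured, the code below computes a continuity ratio (L1 change in the explanation divided by the L2 change in the input, for one pair of points) and a pixel-flipping curve for selectivity (remove the most relevant features first and track the function value). The linear model, the explanation `w * x`, and every name here are assumptions for illustration, not the notebooks' implementation.

```python
import numpy as np

def continuity_ratio(explain, x, x_prime):
    # ||R(x) - R(x')||_1 / ||x - x'||_2 for a single pair of inputs;
    # a continuity score would take the maximum over many nearby pairs.
    return np.abs(explain(x) - explain(x_prime)).sum() / np.linalg.norm(x - x_prime)

def pixel_flipping_curve(f, x, relevance):
    # Zero out features from most to least relevant, recording f each time.
    order = np.argsort(-relevance)
    x = x.copy()
    curve = [f(x)]
    for i in order:
        x[i] = 0.0
        curve.append(f(x))
    return np.array(curve)

# Hypothetical linear model with explanation R_i = w_i * x_i.
rng = np.random.default_rng(0)
w = rng.normal(size=10)
f = lambda x: float(w @ x)
explain = lambda x: w * x

x = rng.normal(size=10)
x_near = x + 0.01 * rng.normal(size=10)

ratio = continuity_ratio(explain, x, x_near)
curve_rel = pixel_flipping_curve(f, x, explain(x))
curve_rand = pixel_flipping_curve(f, x, rng.normal(size=10))  # random ordering

print("continuity ratio:", ratio)
print("area under curve (relevance order):", curve_rel.mean())
print("area under curve (random order):   ", curve_rand.mean())
```

A more selective explanation makes the curve drop faster, so a smaller area under the curve is better; for this linear toy model the relevance ordering is never worse than a random ordering at any step.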
Sections 1.1 ~ 2.2 and 5.1 ~ 5.2
 Montavon, G., Samek, W., Müller, K.-R., 2017. Methods for Interpreting and Understanding Deep Neural Networks. arXiv preprint arXiv:1706.07979.
 Nguyen, A., Dosovitskiy, A., Yosinski, J., Brox, T., Clune, J., 2016. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. pp. 3387-3395.
 A. Dosovitskiy and T. Brox. Generating images with perceptual similarity metrics based on deep networks. In NIPS, 2016.
 Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., Samek, W., 2015. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE 10 (7), 1-46.
 Montavon, G., Lapuschkin, S., Binder, A., Samek, W., Müller, K.R., 2017. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition 65, 211-222.
 Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning Important Features Through Propagating Activation Differences. arXiv preprint arXiv:1704.02685, 2017.
 Zeiler, M. D., Fergus, R., 2014. Visualizing and understanding convolutional networks. In: Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I. pp. 818-833.
 K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In Workshop at International Conference on Learning Representations, 2014.
 Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
 Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. arXiv preprint arXiv:1703.01365, 2017.
 Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929, 2016.
 R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra. Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. arXiv:1611.01646, 2016.
 A. Chattopadhyay, A. Sarkar, P. Howlader, and V. N. Balasubramanian. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. CoRR, abs/1710.11063, 2017.