## Canonical and Related Papers on *Understanding Neural Networks Through Deep Visualization*

| Year | Paper Title | Author(s) | Core Contribution / Research Focus |
|---:|---|---|---|
| 2009 | *Visualizing Higher-Layer Features of a Deep Network* | Dumitru Erhan; Yoshua Bengio; Aaron Courville; Pascal Vincent | Early optimization-based visualization of neuron activations, demonstrating that hidden units encode meaningful structure. |
| 2013 | *Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps* | Karen Simonyan; Andrea Vedaldi; Andrew Zisserman | Gradient-based saliency and sensitivity analysis for understanding CNN predictions. |
| 2013 | *Visualizing and Understanding Convolutional Networks* | Matthew D. Zeiler; Rob Fergus | Used deconvolutional networks (deconvnets) to project feature activations back to pixel space, enabling systematic layer-by-layer inspection of CNNs. |
| 2014 | *Understanding Deep Image Representations by Inverting Them* | Aravindh Mahendran; Andrea Vedaldi | Feature inversion techniques to analyze what information representations retain. |
| 2014 | *Deep Neural Networks Are Easily Fooled* | Anh Tuan Nguyen; Jason Yosinski; Jeff Clune | Showed that synthetic images unrecognizable to humans ("fooling images") can be classified with high confidence, highlighting the brittleness of learned representations. |
| 2014 | *Object Detectors Emerge in Deep Scene CNNs* | Bolei Zhou et al. | Demonstrated emergent semantic object detectors within CNNs trained for scene recognition. |
| 2015 | *Understanding Neural Networks Through Deep Visualization* | Jason Yosinski et al. | Interactive feature visualization and activation maximization across layers. |
| 2015 | *Convergent Learning: Do Different Neural Networks Learn the Same Representations?* | Yixuan Li et al. | Analyzed representational similarity across independently trained networks. |
| 2015 | *Understanding Intra-Class Knowledge Inside CNN* | Donglai Wei et al. | Studied class-specific internal representations and intra-class variation. |
| 2015 | *Visualizing Deep Convolutional Neural Networks Using Natural Pre-images* | Aravindh Mahendran; Andrea Vedaldi | Constrained visualization using natural image priors. |
| 2015 | *Discovering Internal Representations from Object-CNNs Using Population Encoding* | Jianyu Wang et al. | Population-level encoding analysis of CNN representations. |
| 2015 | *Robust Convolutional Neural Networks under Adversarial Noise* | Jonghoon Jin et al. | Studied robustness of CNN representations to adversarial perturbations. |
| 2015 | *Foveation-based Mechanisms Alleviate Adversarial Examples* | Yan Luo et al. | Introduced biologically inspired foveation to improve robustness. |
| 2015 | *Evaluating the Visualization of What a Deep Neural Network Has Learned* | Wojciech Samek et al. | Quantitative evaluation framework for visualization and interpretability methods. |
| 2016 | *Multifaceted Feature Visualization* | Anh Tuan Nguyen; Jason Yosinski; Jeff Clune | Revealed that neurons often encode multiple semantic modes. |
| 2016 | *Synthesizing the Preferred Inputs for Neurons via Deep Generator Networks* | Anh Tuan Nguyen et al. | Used a deep generator network as a learned image prior for activation maximization, producing more realistic preferred inputs. |
| 2016 | *A Taxonomy and Library for Visualizing Learned Features in CNNs* | Felix Grün et al. | Systematic categorization and library of CNN visualization techniques. |
| 2016 | *Visualization of Deep Convolutional Neural Networks* | Dingwen Li | Survey of CNN visualization methodologies and challenges. |
| 2016 | *Layer-Wise Relevance Propagation for Deep Neural Network Architectures* | Alexander Binder et al. | Formalized LRP as a principled method for explaining predictions. |
| 2016 | *Every Filter Extracts a Specific Texture in CNNs* | Zhiqiang Xia et al. | Texture-centric analysis of CNN filters. |
| 2016 | *A Powerful Generative Model Using Random Weights for Deep Image Representation* | Kun He et al. | Explored generative modeling and representation properties using random-weight networks. |
| 2016 | *A New Method to Visualize Deep Neural Networks* | Luisa M. Zintgraf et al. | Introduced Prediction Difference Analysis for attribution. |
| 2016 | *Salient Deconvolutional Networks* | Aravindh Mahendran; Andrea Vedaldi | Combined saliency with deconvolutional visualization. |
| 2016 | *The Essence of Pose* | Jayant Thatte | Analyzed pose-sensitive representations in deep networks. |
| 2017 | *Visualization of Maximizing Images with Deconvolutional Optimization* | Dmitry Nekhaev; Vladimir Demin | Refinements to deconvolutional optimization techniques. |
| 2017 | *SVCCA: Singular Vector Canonical Correlation Analysis* | Maithra Raghu et al. | Introduced SVCCA for comparing representations across layers and models. |
| 2017 | *Interpreting CNN Knowledge via an Explanatory Graph* | Quanshi Zhang et al. | Graph-based semantic interpretation of CNN internal structure. |
| 2017 | *Axiomatic Attribution for Deep Networks* | Mukund Sundararajan; Ankur Taly; Qiqi Yan | Introduced Integrated Gradients with axiomatic guarantees. |
| 2017 | *Exploring LOTS in Deep Neural Networks* | András Rozsa et al. | Introduced Layerwise Origin-Target Synthesis (LOTS) for visualizing internal representations and generating adversarial examples. |
| 2017 | *Towards Interpretable DNNs by Leveraging Adversarial Examples* | Yinpeng Dong et al. | Used adversarial examples to probe interpretability. |
| 2017 | *Visualizing Deep Neural Network Decisions: Prediction Difference Analysis* | Luisa M. Zintgraf et al. | Decision-level attribution through systematic perturbations. |
| 2017 | *Interpreting Deep Visual Representations via Network Dissection* | Bolei Zhou et al. | Quantitative semantic alignment of neurons with human concepts. |
| 2018 | *Neural Network Interpretation via Fine-Grained Textual Summarization* | Pei Guo et al. | Generated textual explanations of neural behavior. |
| 2018 | *Towards Understanding Learning Representations* | Liwei Wang et al. | Cross-network comparison of learned representations. |
| 2018 | *Insights on Representational Similarity with Canonical Correlation* | Ari S. Morcos et al. | CCA-based metrics for representation similarity analysis. |
| 2018 | *A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations* | Weili Nie et al. | Theoretical analysis of artifacts in gradient-based visualization. |
| 2018 | *Visualizing Deep Neural Networks by Alternately Image Blurring and Deblurring* | Feng Wang et al. | Iterative perturbation-based visualization technique. |
| 2019 | *Sampling the “Inverse Set” of a Neuron* | S. S. Hada; Miguel Carreira-Perpiñán | Distributional view of neuron preferences rather than single optima. |
| 2019 | *Understanding Neural Networks via Feature Visualization: A Survey* | Anh Tuan Nguyen; Jason Yosinski; Jeff Clune | Comprehensive survey of feature visualization methods and findings. |
| 2023 | *Frequency and Scale Perspectives of Feature Extraction* | Linfeng Zhang et al. | Frequency-domain and scale-based interpretation of learned features. |


## Conceptual Coverage Map

**Feature Visualization (2009 → 2019)**  
This line of work focuses on making internal representations visible by synthesizing inputs that maximize neuron or feature activations. It begins with early optimization-based visualizations and evolves into multifaceted, generator-based, and distributional views of neuron preferences, culminating in comprehensive surveys that systematize the field.
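
The core idea, synthesizing an input that maximizes a unit's activation, can be sketched as regularized gradient ascent. The example below is only an illustration under stated assumptions: the "network" is a hypothetical two-layer tanh model with random weights, the L2 penalty stands in for the richer image priors used in the papers above, and the names `unit_activation`, `unit_gradient`, and `activation_maximization` are invented for this sketch.

```python
import numpy as np

def unit_activation(x, W1, w2):
    """Activation of one output unit of a tiny tanh network (toy stand-in for a CNN unit)."""
    return float(w2 @ np.tanh(W1 @ x))

def unit_gradient(x, W1, w2):
    """Analytic gradient of the unit activation with respect to the input x."""
    h = np.tanh(W1 @ x)
    return W1.T @ (w2 * (1.0 - h ** 2))

def activation_maximization(W1, w2, steps=500, lr=0.02, l2=0.05):
    """Gradient ascent on activation(x) - l2 * ||x||^2, starting from a blank input."""
    x = np.zeros(W1.shape[1])
    history = []
    for _ in range(steps):
        # Record the regularized objective before each update.
        history.append(unit_activation(x, W1, w2) - l2 * float(x @ x))
        # Ascent step: activation gradient minus the gradient of the L2 penalty.
        x = x + lr * (unit_gradient(x, W1, w2) - 2.0 * l2 * x)
    history.append(unit_activation(x, W1, w2) - l2 * float(x @ x))
    return x, history

# Hypothetical random weights; a real study would load a trained model instead.
rng = np.random.default_rng(0)
W1 = 0.3 * rng.standard_normal((8, 16))
w2 = rng.standard_normal(8)
x_star, history = activation_maximization(W1, w2)
```

With a real CNN the gradient would come from automatic differentiation, and the regularizer is where most of the methods in this group differ (blurring, natural-image priors, generator networks).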

**Attribution and Explainability**  
Methods such as Layer-Wise Relevance Propagation (LRP), Integrated Gradients, and Prediction Difference Analysis (PDA) aim to explain individual model decisions by assigning relevance or importance to input features. This direction emphasizes faithfulness, axiomatic grounding, and decision-level interpretability rather than global representation structure.
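
As one concrete instance, Integrated Gradients attributes a prediction by averaging gradients along a straight path from a baseline to the input and scaling by the input difference. The following is a minimal sketch, assuming a hypothetical logistic model with hand-picked weights and an all-zeros baseline; the midpoint Riemann sum and the function names are choices made for this illustration, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy differentiable model: logistic regression output for one input vector."""
    return sigmoid(x @ w + b)

def model_grad(x, w, b):
    """Analytic gradient of the model output with respect to x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate the path integral of gradients with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps             # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)    # (steps, n_features)
    grads = np.stack([model_grad(p, w, b) for p in path])
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical weights and input, chosen only for illustration.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([1.0, -0.5, 2.0])
baseline = np.zeros(3)
attributions = integrated_gradients(x, baseline, w, b)
output_change = model(x, w, b) - model(baseline, w, b)
```

The completeness axiom says the attributions should sum to `output_change`, which gives a cheap sanity check on any implementation.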

**Representation Similarity**  
Techniques such as SVCCA, related CCA-based metrics, and convergent-learning studies investigate whether different networks, layers, or training runs learn similar representations. This perspective treats representations as comparable objects and looks for structure that is shared across architectures and optimization trajectories.
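
A rough sketch of an SVCCA-style comparison is shown below: each activation matrix is reduced to its top singular directions, and canonical correlations between the two reduced subspaces are then averaged. It assumes activations are given as `(n_samples, n_neurons)` arrays recorded on the same inputs; the 99% variance cutoff, the QR-based CCA, and the function names are choices made for this illustration and may differ from the published implementation.

```python
import numpy as np

def top_singular_directions(acts, var_kept=0.99):
    """Center activations and keep the singular directions explaining var_kept of the variance."""
    acts = acts - acts.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(acts, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(energy, var_kept)) + 1
    return u[:, :k] * s[:k]

def svcca_similarity(acts_a, acts_b, var_kept=0.99):
    """Mean canonical correlation between the reduced subspaces of two representations."""
    qa, _ = np.linalg.qr(top_singular_directions(acts_a, var_kept))
    qb, _ = np.linalg.qr(top_singular_directions(acts_b, var_kept))
    # Singular values of Qa^T Qb are the canonical correlations of the two subspaces.
    rho = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())

# Synthetic stand-ins for layer activations (hypothetical data).
rng = np.random.default_rng(0)
acts = rng.standard_normal((500, 10))
rotation, _ = np.linalg.qr(rng.standard_normal((10, 10)))
same_up_to_rotation = svcca_similarity(acts, acts @ rotation)
unrelated = svcca_similarity(acts, rng.standard_normal((500, 10)))
```

A rotated copy of a representation scores close to 1, while independent noise scores much lower. That invariance to rotation is why CCA-style measures suit comparisons where individual neurons need not line up one to one.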

**Adversarial Understanding**  
Research on fooling images, adversarial examples, and robustness reveals failure modes of learned representations. These works show that representations can be brittle and expose mismatches between human-aligned semantics and model decision boundaries, motivating deeper analysis of internal structure.
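
The brittleness can be illustrated even on a linear classifier. The sketch below uses a sign-of-gradient perturbation in the style of the fast gradient sign method, which is not one of the papers listed above and is swapped in here only because it is easy to verify by hand. The classifier, its random weights, and the `eps` value are hypothetical.

```python
import numpy as np

def predict(x, w, b):
    """Binary linear classifier: class 1 if the score is positive, else class 0."""
    return int(x @ w + b > 0)

def sign_gradient_perturbation(x, w, b, eps):
    """Move each input dimension by at most eps against the predicted class.

    For a linear score the input gradient is simply w, so the step is eps * sign(w),
    with the sign flipped depending on the current prediction.
    """
    direction = -np.sign(w) if predict(x, w, b) == 1 else np.sign(w)
    return x + eps * direction

# Hypothetical 784-dimensional "image" and random classifier weights.
rng = np.random.default_rng(0)
w = rng.standard_normal(784)
x = rng.random(784)
b = 1.0 - x @ w          # bias chosen so the clean score is +1 (class 1)
x_adv = sign_gradient_perturbation(x, w, b, eps=0.01)
```

Each coordinate moves by only 0.01, yet the score shifts by `eps` times the L1 norm of `w`, which grows with input dimension. This sketch does not clip `x_adv` back to a valid pixel range.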

**Semantic Alignment**  
Approaches such as network dissection and explanatory graphs attempt to align internal units or features with human-interpretable concepts. This direction quantifies the degree to which learned representations correspond to semantic categories and compositional structures.
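
The alignment score at the heart of this approach can be sketched as an intersection-over-union (IoU) between a unit's thresholded activation maps and binary concept masks. The code below is a simplified illustration, assuming activation maps are already at mask resolution (the upsampling step is skipped). The function name, the synthetic data, and the `top_fraction` default are assumptions for this sketch and should be checked against the paper before reuse.

```python
import numpy as np

def unit_concept_iou(activations, concept_masks, top_fraction=0.005):
    """IoU between a unit's top-activated pixels and a concept's segmentation masks.

    activations:   float array (n_images, H, W), assumed to match the mask resolution.
    concept_masks: bool array  (n_images, H, W), True where the concept is present.
    """
    # Threshold chosen so that roughly top_fraction of all pixels exceed it.
    threshold = np.quantile(activations, 1.0 - top_fraction)
    unit_masks = activations > threshold
    intersection = np.logical_and(unit_masks, concept_masks).sum()
    union = np.logical_or(unit_masks, concept_masks).sum()
    return float(intersection / union) if union > 0 else 0.0

# Synthetic example: the concept occupies the top-left 2x2 corner of each 4x4 map.
rng = np.random.default_rng(0)
masks = np.zeros((8, 4, 4), dtype=bool)
masks[:, :2, :2] = True
aligned_unit = rng.random((8, 4, 4)) + 10.0 * masks   # fires strongly on the concept
random_unit = rng.random((8, 4, 4))                   # unrelated to the concept
aligned_iou = unit_concept_iou(aligned_unit, masks, top_fraction=0.25)
random_iou = unit_concept_iou(random_unit, masks, top_fraction=0.25)
```

A unit is then labeled with the concept that gives the highest IoU, provided the score clears a chosen cutoff, which turns a qualitative impression into a number that can be compared across units.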

**Theoretical Foundations**  
This category provides formal explanations for observed visualization behaviors, including artifacts in gradient-based methods and distributional perspectives such as inverse sets. These works ground interpretability techniques in theory and clarify their limitations and assumptions.

Taken together, these categories map the field from early qualitative visualization toward a more structured, comparative, and theoretically grounded science of neural representations.
