SubgraphXAI

SubgraphXAI is a research prototype exploring graph-level explainability for graph neural networks (GNNs), with a focus on extracting, analysing, and reusing explanation subgraphs as first-class artefacts.

The project investigates whether explanation subgraphs can act as compact, semantically meaningful representations for downstream tasks such as graph generation, evaluation, and robustness analysis.

The current implementation targets homogeneous, undirected graphs and uses a toy benchmark dataset (PROTEINS) to validate the end-to-end pipeline before extending to real-world program analysis graphs (e.g. control-flow graphs).


Motivation

Part of my PhD studies, paused since 2023.

Graph explainability methods often stop at visualisation. SubgraphXAI instead treats explanation subgraphs as a reusable signal, asking:

  • Can explanation subgraphs replace full graphs without significant performance loss?
  • Can they serve as inputs to graph generators or evaluators?
  • Do they expose structure that is hidden in full-graph representations?

Project Scope

SubgraphXAI currently implements the following pipeline:

  1. Graph Classification
    • Train a GNN-based graph classification model.
  2. Graph-Level Explainability
    • Extract explanation subgraphs using GNNExplainer.
  3. Explanation Subgraph Analysis
    • Store, visualise, and reuse explanation subgraphs.
  4. (Planned) Graph Generation
    • Generate candidate graphs from explanation subgraphs.
    • Evaluate generated graphs based on prediction confidence.
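The planned evaluation step scores candidate graphs by the classifier's prediction confidence. A minimal sketch of that criterion in plain Python (the logit values, graph names, and target class below are illustrative assumptions, not project code):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_score(logits, target_class):
    """Prediction confidence: probability the model assigns to target_class."""
    return softmax(logits)[target_class]

# Rank hypothetical candidate graphs by the classifier's confidence in the
# class their explanation subgraph was extracted from (class 1 here).
candidates = {"g1": [0.2, 2.3], "g2": [1.5, 0.4], "g3": [-0.1, 0.9]}
ranked = sorted(candidates, key=lambda g: confidence_score(candidates[g], 1), reverse=True)
```

The same scoring function would apply unchanged whether the logits come from the toy GCN or a later Devign-style model.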

Current Features

  • Graph classification using a simple GCN with pooling
  • Graph-level explanations using GNNExplainer
  • Extraction of node feature and edge importance masks
  • Visualisation of explanation subgraphs overlaid on original graphs
  • Modular design for modelling, evaluation, and explainability

Dataset

  • PROTEINS dataset from the TUDataset collection (via PyTorch Geometric)
  • Used as a toy dataset for rapid prototyping and validation
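PyTorch Geometric exposes PROTEINS through `TUDataset`, where each graph is delivered as an `edge_index` tensor in COO format (two parallel lists of source and target node indices, with undirected edges stored in both directions) plus node features. A pure-Python sketch of that layout for a hypothetical 4-node cycle, with no PyG dependency and invented data:

```python
# A tiny undirected graph in the COO layout PyTorch Geometric uses:
# edge_index[0] holds source nodes, edge_index[1] holds targets, and
# each undirected edge appears once in each direction.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical 4-cycle
src = [u for u, v in edges] + [v for u, v in edges]
dst = [v for u, v in edges] + [u for u, v in edges]
edge_index = [src, dst]

num_nodes = max(max(src), max(dst)) + 1
# With doubled edges, counting occurrences in src gives each node's degree.
degree = [src.count(n) for n in range(num_nodes)]
```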

Models

The current pipeline trains the following components:

  1. Graph Classification Model
    • Simple GCN architecture
    • Graph-level pooling
  2. Graph Explanation Model
    • GNNExplainer with log-probability objective
    • Produces node feature and edge importance masks
  3. Graph Generator (Planned)
    • Generator architectures under investigation
    • Candidate methods listed in generators.py
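The actual classifier lives in `modelling/`; as an illustration of what "simple GCN with pooling" means, one GCN propagation step H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W) followed by mean readout can be sketched in plain Python (all shapes and weights below are made up for the example):

```python
import math

def gcn_layer(adj, feats, weight):
    """One GCN step: add self-loops, symmetrically normalise, aggregate,
    apply a linear map, then ReLU."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Neighbourhood aggregation: H_agg = Â H
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(len(feats[0]))] for i in range(n)]
    # Linear transform + ReLU: H' = relu(H_agg W)
    out_dim = len(weight[0])
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(len(weight))))
             for o in range(out_dim)] for i in range(n)]

def mean_pool(node_embs):
    """Graph-level readout: average the node embeddings."""
    n, d = len(node_embs), len(node_embs[0])
    return [sum(node_embs[i][k] for i in range(n)) / n for k in range(d)]

# Hypothetical 2-node graph with scalar features and an identity weight.
h = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])
g = mean_pool(h)
```

The real implementation uses PyTorch Geometric's `GCNConv` and batched pooling; the sketch only spells out the arithmetic those layers perform.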

Project Structure (High-Level)

subgraphxai/
│
├── main.py # Entry point for training and explanation
├── modelling/ # GNN model definitions
├── evaluation/ # Training, testing, and explanation utilities
├── utils/ # Dataset statistics and helper functions
├── setup.py # Package and pip dependency specification
└── README.md

Explanation Workflow

  • GNNExplainer is applied at the graph level
  • Explanation outputs include:
    • Node feature importance masks
    • Edge importance masks
  • Explanation subgraphs are:
    • Thresholded by influence
    • Visualised alongside original graphs
    • Stored for future reuse
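Thresholding by influence amounts to keeping edges whose learned mask weight exceeds a cutoff and taking the induced subgraph. A minimal sketch of that step (the mask values and threshold below are hypothetical, standing in for GNNExplainer's edge mask):

```python
def threshold_subgraph(edges, edge_mask, tau=0.5):
    """Keep edges whose importance exceeds tau; return the kept edges
    and the nodes they touch (the induced explanation subgraph)."""
    kept = [e for e, w in zip(edges, edge_mask) if w > tau]
    nodes = sorted({n for e in kept for n in e})
    return kept, nodes

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
edge_mask = [0.91, 0.12, 0.78, 0.05]  # hypothetical explainer output
sub_edges, sub_nodes = threshold_subgraph(edges, edge_mask, tau=0.5)
```

The choice of `tau` controls subgraph compactness, which directly feeds the research question of whether subgraphs can replace full graphs.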

Planned Work / TODO

  • Graph classification with PyTorch Geometric (toy dataset)
  • Graph-level explanations with GNNExplainer
  • Explanation subgraph visualisation
  • Store explanation subgraphs for efficient dataloading
  • Implement graph generators using explanation subgraphs as input
  • Evaluate generated graphs using prediction confidence
  • Integrate Devign source code (epicosy)
  • Parse and adapt Devign data (see epicosy/devign Issue #7)
  • Adapt pipeline to CFG-only graphs
  • Test pretrained Devign models and resolve compatibility issues
  • Adapt GNNExplainer for Devign

Research Questions

  • How does model performance change when using explanation subgraphs instead of full graphs?
  • Do explanation subgraphs preserve class-discriminative structure?
  • Can explanation subgraphs act as compact inputs for graph generation?
  • How transferable are explanation methods from toy datasets to real-world program graphs?

Notes

  • SubgraphXAI is an experimental research prototype.
  • The code prioritises clarity and experimentation over optimisation.
  • The scope is intentionally limited to homogeneous graphs before extending to heterogeneous settings.
