Large-Scale Knowledge Graph Generation Using a Diffusion Approach
FGdiffusion is a framework for generating graphs - including knowledge graphs - using discrete diffusion models. It implements and compares several generative approaches, with a focus on discrete diffusion applied to edge-level graph tokens.
Key Contributions:
- Discrete diffusion for graph generation: We adapt LLaDA-style discrete diffusion and score-based discrete diffusion (SEDD) to the graph generation task, operating directly on edge token sequences.
- Graph flattening via BFS ordering: Graphs are serialized into edge-index sequences using BFS node ordering, enabling transformer-based models to process graphs as token sequences.
- Knowledge graph generation: We extend the approach to generate typed knowledge graphs with node and edge labels.
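
As a rough illustration of the LLaDA-style idea applied to an edge token sequence, the sketch below masks each token independently with probability `t` and scores only the masked positions, weighting the loss by `1/t`. It is a minimal sketch written for this README, not the code in `trainer_g2pt_llada.py`: the function names, the toy sequence, the vocabulary size, and the uniform "model" are all assumptions made for illustration.

```python
import math
import random


def mask_tokens(tokens, t, mask_id, rng):
    """Forward process: replace each token by mask_id with probability t."""
    noisy, masked = [], []
    for tok in tokens:
        hit = rng.random() < t
        noisy.append(mask_id if hit else tok)
        masked.append(hit)
    return noisy, masked


def masked_diffusion_loss(log_probs, tokens, masked, t):
    """Cross-entropy on masked positions only, weighted by 1/t.

    log_probs[i][k] is the (hypothetical) model log-probability that
    position i holds token k, given the noisy sequence.
    """
    total = -sum(
        log_probs[i][tok]
        for i, (tok, is_masked) in enumerate(zip(tokens, masked))
        if is_masked
    )
    return total / (t * len(tokens))


# Toy edge sequence for the path 0-1-2-3, i.e. pairs (0,1), (1,2), (2,3).
tokens = [0, 1, 1, 2, 2, 3]
nb_max_node = 4                 # illustrative value
mask_id = nb_max_node + 2       # mask index as described in this README
vocab_size = mask_id + 1        # assumption: mask token is the last id

rng = random.Random(0)
t = rng.uniform(0.05, 1.0)      # noise level; kept away from 0 to avoid 1/t blow-up
noisy, masked = mask_tokens(tokens, t, mask_id, rng)

# Stand-in for a trained transformer: a uniform distribution over the vocabulary.
uniform = [[math.log(1.0 / vocab_size)] * vocab_size for _ in tokens]
loss = masked_diffusion_loss(uniform, tokens, masked, t)
print(noisy, round(loss, 4))
```

In a real training loop the uniform table would be replaced by the transformer's output on `noisy`, and `t` would be resampled for every sequence in the batch.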
This repository is the official code for the paper "FGdiffusion: Large-Scale Knowledge Graph Generation Using a Diffusion Approach" (Adrien Bufort, Lionel Tailhardat, 2025). If you are using FGdiffusion in your work, please cite:
@misc{hal-05410352,
title = {FGdiffusion: Large-Scale Knowledge Graph Generation Using a Diffusion Approach},
author = {Bufort, Adrien and Tailhardat, Lionel},
year = {2025},
howpublished = {\url{https://hal.science/hal-05410352}}
}

| Model | Type | File | Reference |
|---|---|---|---|
| G2PT + LLaDA | Discrete diffusion | deepgraphgen/trainers/trainer_g2pt_llada.py | Xie et al., 2025 |
| G2PT + Score Diffusion | Score-based discrete diffusion | deepgraphgen/trainers/trainer_g2pt_score.py | Based on SEDD |
| G2PT + KG (NASA) | KG generation with labels | deepgraphgen/trainers/trainer_g2pt_llada_kg.py | This paper |

- Python 3.10+
For the full list of dependencies:
- see pyproject.toml for package names and release numbers,
- see DEPENDENCIES.csv for the associated licenses.
Note that CUDA is recommended for the FGdiffusion training stage; it is the user's responsibility to download those tools and to agree to the associated terms and conditions.
# Clone the repository
git clone https://github.com/yourusername/FGdiffusion.git
cd FGdiffusion
# Install dependencies
pip install .

# Or build and run with Docker
docker build -t fgdiffusion .
docker run --gpus all -it fgdiffusion

# Train G2PT with LLaDA discrete diffusion on planar graphs
python scripts/train_g2pt_llada.py
# Train G2PT with score-based diffusion
python scripts/train_g2pt_score.py
# Train GraphGDP baseline
python scripts/train_diffusion.py
# Train GRAN baseline
python scripts/train_gran.py

# Evaluate generated graphs
python scripts/evaluate.py --checkpoint path/to/checkpoint.ckpt

# Train on NASA Knowledge Graph
python scripts/train_kg.py

Graphs are flattened into sequences of edge tokens using the following approach:
- BFS ordering: Nodes are ordered via BFS traversal starting from node 0
- Edge serialization: Edges are serialized as pairs of node indices (u, v)
- Padding: Edge sequences are padded to a fixed length based on the edges_to_node_ratio
- Mask tokens: A special [MASK] token (index nb_max_node + 2) is used for discrete diffusion
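
The flattening steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code in `deepgraphgen/datasets.py`: the function names are hypothetical, the padding id (`nb_max_node`) and the fixed length (`2 * int(edges_to_node_ratio * nb_max_node)`) are guesses at one reasonable convention, and only the mask index `nb_max_node + 2` comes from the description above.

```python
from collections import deque


def bfs_order(num_nodes, edges, start=0):
    """Return nodes in BFS visiting order, starting from `start`."""
    adj = {n: [] for n in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    order, seen = [], set()
    # Assumption: restart from the lowest unvisited node for disconnected graphs.
    for root in [start] + list(range(num_nodes)):
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in sorted(adj[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return order


def flatten_graph(num_nodes, edges, nb_max_node, edges_to_node_ratio):
    """Serialize an undirected graph into a padded sequence of edge tokens."""
    rank = {node: i for i, node in enumerate(bfs_order(num_nodes, edges))}
    # Relabel nodes by BFS rank and write each edge as (smaller, larger).
    pairs = sorted((min(rank[u], rank[v]), max(rank[u], rank[v])) for u, v in edges)
    tokens = [idx for pair in pairs for idx in pair]
    pad_id = nb_max_node                                   # assumption
    seq_len = 2 * int(edges_to_node_ratio * nb_max_node)   # assumption
    if len(tokens) > seq_len:
        raise ValueError("graph has too many edges for the fixed sequence length")
    return tokens + [pad_id] * (seq_len - len(tokens))


def unflatten_sequence(tokens, nb_max_node):
    """Read (u, v) pairs back, skipping pairs that contain special tokens."""
    pairs = zip(tokens[0::2], tokens[1::2])
    return [(u, v) for u, v in pairs if u < nb_max_node and v < nb_max_node]


# A 4-node path written with arbitrary node ids: 0-2, 2-3, 3-1.
sequence = flatten_graph(4, [(0, 2), (2, 3), (3, 1)], nb_max_node=4, edges_to_node_ratio=2)
print(sequence[:6])                                 # [0, 1, 1, 2, 2, 3]
print(unflatten_sequence(sequence, nb_max_node=4))  # [(0, 1), (1, 2), (2, 3)]
```

After BFS relabeling, the path's edges become consecutive index pairs, which gives a sense of how the ordering can turn arbitrary node ids into a more regular token sequence.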

Key Findings:
- Graph Transformers outperform GNNs for graph generation, especially in diffusion setups
- Flattening to edge indices works better than adjacency matrix representations
- Discrete diffusion (LLaDA-like) is effective for graph generation
- BFS node ordering significantly enhances learning
FGdiffusion/
├── deepgraphgen/
│ ├── trainers/ # Training modules (PyTorch Lightning)
│ │ ├── trainer_g2pt_auto.py # Autoregressive G2PT
│ │ ├── trainer_g2pt_llada.py # G2PT + LLaDA discrete diffusion
│ │ ├── trainer_g2pt_score.py # G2PT + score-based diffusion
│ │ └── trainer_g2pt_llada_kg.py # G2PT + KG with labels
│ ├── datageneration.py # Graph data generation utilities
│ ├── datasets.py # Dataset classes for all approaches
│ ├── diffusion_generation.py # Diffusion noise scheduling
│ ├── utils.py # Shared utilities
│ └── random_walk_features.py # Random walk feature computation
├── scripts/ # Training & evaluation scripts
├── tests/ # Unit tests
├── scripts_preprocess/ # Data preprocessing notebooks
├── images/ # Example generation images
├── data_kg/ # Knowledge graph data (parquet)
├── Dockerfile
├── pyproject.toml
└── README.md
Copyright (c) 2024-2026, Orange. All rights reserved.