Companion repository for the paper What Do Temporal Graph Learning Models Learn? by Abigail J. Hayes, Tobias Schumacher, and Markus Strohmaier.
Paper: arXiv:2510.09416
This repository contains the code for generating dynamic graph datasets, training temporal link prediction models, and aggregating the evaluation outputs used in the paper.
Parts of the training and model code are adapted from DyGLib, while dataset generation and evaluation logic are specific to this repository.
If you use this repository or build on the paper, please cite:
```bibtex
@online{hayes_what_2025,
  title = {What Do Temporal Graph Learning Models Learn?},
  author = {Hayes, Abigail J. and Schumacher, Tobias and Strohmaier, Markus},
  date = {2025},
  eprint = {2510.09416},
  eprinttype = {arXiv},
  doi = {10.48550/arXiv.2510.09416},
  url = {http://arxiv.org/abs/2510.09416},
}
```

Repository layout:

- `data/`: generated or preprocessed datasets.
- `generate/`: synthetic graph generators and the dataset configuration registry.
- `utils/`: shared parser, logging, and helper utilities.
- `saved_models/`: trained checkpoints.
- `saved_results/`: raw run outputs and aggregated CSV results.
- `saved_results/plotting/`: plotting datasets consumed by the notebooks.
- `tables/`: task-specific summary tables.
- `figures/`: output folder for saved figures.
- `create_structures.py`: create or extract temporal growth structures used by synthetic generators.
- `generate_data.py`: preprocess empirical data or generate synthetic datasets from `generate/configs.py`.
- `run_model.py`: train one model on one dataset split and save predictions and metrics.
- `run_evaluation.py`: aggregate saved run outputs into task-specific evaluation tables.
- `run_hp_compare.py`: aggregate hyperparameter sweep outputs for notebook analysis.
- `train/`: model wrappers, samplers, prediction, and training utilities.
- `evaluate/`: evaluation-time aggregation code for each research question.
- Create or preprocess data.
- Train a model with `run_model.py`.
- Aggregate results with `run_evaluation.py` or `run_hp_compare.py`.
- Produce figures and tables from the saved CSV outputs in the notebooks.
The evaluation scripts write analysis-ready CSV files, including datasets under `saved_results/plotting/` that are used by the plotting notebooks.
The data files of the Enron, UCI, and Wikipedia datasets were taken from the data repository of Poursafaei et al.
The Bitcoin-alpha dataset was taken from SNAP. Data should be extracted as-is into `data/raw/DATASET_NAME/`, with `DATASET_NAME` in `["bcalpha", "enron", "uci", "wikipedia"]`.
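The expected layout can be prepared with a short script before extracting the downloads. A minimal sketch, assuming only the directory names listed above (the `prepare_raw_dirs` helper is illustrative, not part of the repository):

```python
import os

# Expected raw-data layout: data/raw/DATASET_NAME/ for each supported dataset.
DATASETS = ["bcalpha", "enron", "uci", "wikipedia"]

def prepare_raw_dirs(root="data"):
    """Create data/raw/<name>/ folders to extract the downloaded files into."""
    paths = []
    for name in DATASETS:
        path = os.path.join(root, "raw", name)
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths
```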
| Family | Purpose | Example configs |
|---|---|---|
| `empirical` | Preprocessed real-world interaction datasets | `uci`, `enron`, `wikipedia`, `bcalpha` |
| `sbm` | Stochastic block model datasets for homophily-style experiments | `basic`, `basic_swap`, `dense` |
| `pa` | Preferential attachment style datasets | `basic`, `dense` |
| `partition` | Datasets for recency-based analyses | `uci`, `enron`, `wikipedia` |
| `periodic` | Datasets with repeated temporal patterns | `uci1`, `uci2`, `uci5`, `enron1`, `enron2`, `enron5`, `wiki1`, `wiki2`, `wiki5` |
The configuration registry lives in `generate/configs.py`.
Extract a structure from an empirical dataset:

```
python create_structures.py --type extract --extract_folder uci
```

Generate a custom structure:

```
python create_structures.py --type generate --add_ts 3 --add_edges 10,10,35 --add_nodes 15 --name test
```

Preprocess an empirical dataset:

```
python generate_data.py --data_family empirical --data_configs uci
```

Generate synthetic data:

```
python generate_data.py --data_family sbm --data_configs basic --gen_seed 0
python generate_data.py --data_family pa --data_configs basic --gen_seed 0
python generate_data.py --data_family partition --data_configs uci --gen_seed 0
python generate_data.py --data_family periodic --data_configs uci1 --gen_seed 0
```

For quick smoke tests, set `--max_epoch 1`.
- In the example commands below, replace placeholders such as `MODEL_NAME`, `DATASET_NAME`, `DATA_CONFIG`, `TRAIN_SEED`, `GEN_SEED`, `TRAIN_VARIANT`, and `TRAIN_RATIO` with the corresponding experiment settings.
- `MODEL_NAME` must be one of `cawn`, `dygformer`, `dyrep`, `graphmixer`, `jodie`, `tcl`, `tgat`, or `tgn`.
- `--train_seed` controls model initialization and training randomness.
- `--gen_seed` controls synthetic dataset generation and is only used for generated datasets.
- For granularity, density, and directionality, the paper uses `--train_seed` values `0` to `9`.
- For periodicity, recency, homophily, and preferential attachment, the paper uses all 5 x 5 combinations of `--gen_seed` values `0` to `4` and `--train_seed` values `0` to `4`.
- Add `--overwrite` to `run_model.py` if you want to retrain and replace existing checkpoints and saved outputs.
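The 5 x 5 seed sweeps lend themselves to scripting. A minimal sketch that enumerates the command grid for one synthetic family, using only flags documented above (the `sweep_commands` helper and the specific model/config values are illustrative):

```python
from itertools import product

def sweep_commands(model, data_family, data_config,
                   gen_seeds=range(5), train_seeds=range(5)):
    """Build run_model.py command lines for every (gen_seed, train_seed) pair."""
    return [
        f"python run_model.py --model {model} --data_family {data_family} "
        f"--data_configs {data_config} --gen_seed {g} --train_seed {t}"
        for g, t in product(gen_seeds, train_seeds)
    ]

commands = sweep_commands("tgn", "sbm", "basic")
print(len(commands))  # 5 x 5 = 25 runs
```

Each string can then be launched with your scheduler of choice (shell loop, job array, etc.).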
```
python run_model.py --model MODEL_NAME --data_family empirical --data_configs DATASET_NAME --train_seed TRAIN_SEED --train_variant TRAIN_VARIANT --sample_final
```

For granularity experiments, the key setting is `--train_variant`. Variants used in the paper are `default`, `flat`, and `ctdg`. The `--sample_final` flag ensures that not all possible edges are used for evaluation.
```
python run_model.py --model MODEL_NAME --data_family empirical --data_configs DATASET_NAME --train_seed TRAIN_SEED --sample_ratio TRAIN_RATIO --train_variant default
```

For density experiments, the key setting is `--sample_ratio`, which controls the training negative-to-positive ratio.
```
python run_model.py --model MODEL_NAME --data_family empirical --data_configs DATASET_NAME --train_seed TRAIN_SEED --train_variant TRAIN_VARIANT --sample_final
```

For directionality experiments, the key setting is `--train_variant`. Variants used in the paper are `default`, `both`, and `reverse`. The `--sample_final` flag ensures that not all possible edges are used for evaluation.
```
python run_model.py --model MODEL_NAME --data_family periodic --data_configs DATA_CONFIG --gen_seed GEN_SEED --train_seed TRAIN_SEED
```

For periodic experiments, the main choice is `--data_configs`: configs ending in `1` represent persistence-style datasets, while configs ending in `2` or `5` represent periodic structure; the number gives the period length. See also the table of dataset families above.
```
python run_model.py --model MODEL_NAME --data_family partition --data_configs DATA_CONFIG --gen_seed GEN_SEED --train_seed TRAIN_SEED --skip_validation
```

For recency experiments, the main choice is `--data_configs`, which selects the partition-based variant derived from `uci`, `enron`, or `wikipedia`. The `--skip_validation` flag disables the validation set, which is not defined for this property; models should simply optimize for learning recent edges.
```
python run_model.py --model MODEL_NAME --data_family sbm --data_configs DATA_CONFIG --gen_seed GEN_SEED --train_seed TRAIN_SEED
```

For homophily experiments, the main choice is `--data_configs`, which selects the stochastic block model regime used in the paper, such as `basic`, `basic_swap`, or `dense`.
```
python run_model.py --model MODEL_NAME --data_family pa --data_configs DATA_CONFIG --gen_seed GEN_SEED --train_seed TRAIN_SEED
```

For preferential attachment experiments, the main choice is `--data_configs`, which selects the attachment regime used in the paper, currently `basic` or `dense`.
- For empirical datasets, saved run names are based on `<data_configs>_<sample_ratio>_<train_seed>`.
- For synthetic datasets, saved run names are based on `<data_configs>_<gen_seed>_<train_seed>`.
- Non-default dropout, neighbor count, and neighbor sampling strategy are appended to the run name.
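As an illustration of this convention, a small helper (not part of the repository; the function name and argument handling are assumptions based on the patterns listed above) that composes base run names:

```python
def run_name(data_config, train_seed, sample_ratio=None, gen_seed=None):
    """Compose the base run name.

    Empirical runs: <data_configs>_<sample_ratio>_<train_seed>
    Synthetic runs: <data_configs>_<gen_seed>_<train_seed>
    Non-default dropout / neighbor settings would be appended after this base.
    """
    middle = gen_seed if gen_seed is not None else sample_ratio
    return f"{data_config}_{middle}_{train_seed}"

print(run_name("uci", 0, sample_ratio=1.0))  # uci_1.0_0
print(run_name("basic", 3, gen_seed=2))      # basic_2_3
```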
- Aggregation assumes that the corresponding `saved_results/...` outputs already exist from `run_model.py`.
- The placeholders in the commands below refer to the same settings described in the training section.
- Use the same `--train_seed` and `--gen_seed` coverage as in training before aggregating outputs.
- For granularity, density, and directionality, aggregate runs across `--train_seed` values `0` to `9`.
- For periodicity, recency, homophily, and preferential attachment, aggregate runs across all 5 x 5 combinations of `--gen_seed` values `0` to `4` and `--train_seed` values `0` to `4`.
- The `--overwrite` flag only has an effect for empirical datasets, where some larger intermediate datasets are constructed and aggregated; without it, this aggregation is skipped. By default, all other existing results are overwritten.
```
python run_evaluation.py --data_family empirical --data_configs DATASET_NAME --eval_question granularity
```

Aggregate across the relevant `train_variant` settings used in the paper: `default`, `flat`, and `ctdg`.

```
python run_evaluation.py --data_family empirical --data_configs DATASET_NAME --eval_question density
```

Run this after training the same dataset across the density ratios of interest via `--sample_ratio`.

```
python run_evaluation.py --data_family empirical --data_configs DATASET_NAME --eval_question direction
```

Run this after training the relevant direction variants used in the paper: `default`, `both`, and `reverse`.

```
python run_evaluation.py --data_family periodic --data_configs DATA_CONFIG
```

Use configs ending in `1` for persistence-style analyses and configs ending in `2` or `5` for periodicity analyses.

```
python run_evaluation.py --data_family partition --data_configs DATA_CONFIG --eval_question recency
```

Choose the dataset-specific partition config, typically `uci`, `enron`, or `wikipedia`.

```
python run_evaluation.py --data_family sbm --data_configs DATA_CONFIG --eval_question homophily
```

Choose the SBM regime through `--data_configs`, for example `basic`, `basic_swap`, or `dense`.

```
python run_evaluation.py --data_family pa --data_configs DATA_CONFIG
```

Choose the attachment regime through `--data_configs`, currently `basic` or `dense`.
- `run_evaluation.py` writes tables and plotting datasets, not figures.
- Some evaluations are routed by `data_family` rather than `eval_question`, so the intended family/config pairing matters.
```
python run_hp_compare.py --type recency --model tgn --config enron
python run_hp_compare.py --type periodic --model tgn --config uci2
python run_hp_compare.py --type pa --model tgn --config basic
```

Use `plots_tables.ipynb` and `plots_tables_ablations.ipynb` to turn aggregated CSV outputs into final figures and tables.
The notebooks expect outputs such as:
- task-level CSVs under `saved_results/`
- plotting datasets under `saved_results/plotting/`
- summary tables under `tables/`
Example: test whether GraphMixer learns direction on UCI
```
python generate_data.py --data_family empirical --data_configs uci
python run_model.py --model graphmixer --data_family empirical --data_configs uci --train_seed 0 --train_variant default
python run_model.py --model graphmixer --data_family empirical --data_configs uci --train_seed 0 --train_variant both
python run_model.py --model graphmixer --data_family empirical --data_configs uci --train_seed 0 --train_variant reverse
python run_evaluation.py --data_family empirical --data_configs uci --eval_question direction
```

For paper-scale reproduction, repeat training across all required seeds, models, and dataset variants before running the aggregation step.