Companion repository for the paper What Do Temporal Graph Learning Models Learn? by Abigail J. Hayes, Tobias Schumacher, and Markus Strohmaier.
Paper: arXiv:2510.09416
This repository contains the code for generating dynamic graph datasets, training temporal link prediction models, and aggregating the evaluation outputs used in the paper.
Parts of the training and model code are adapted from DyGLib, while dataset generation and evaluation logic are specific to this repository.
If you use this repository or build on the paper, please cite:
```bibtex
@online{hayes_what_2025,
  title = {What Do Temporal Graph Learning Models Learn?},
  author = {Hayes, Abigail J. and Schumacher, Tobias and Strohmaier, Markus},
  date = {2025},
  eprint = {2510.09416},
  eprinttype = {arXiv},
  doi = {10.48550/arXiv.2510.09416},
  url = {http://arxiv.org/abs/2510.09416},
}
```

Repository layout:

- `data/`: generated or preprocessed datasets.
- `generate/`: synthetic graph generators and the dataset configuration registry.
- `utils/`: shared parser, logging, and helper utilities.
- `saved_models/`: trained checkpoints.
- `saved_results/`: raw run outputs and aggregated CSV results.
- `saved_results/plotting/`: plotting datasets consumed by the notebooks.
- `tables/`: task-specific summary tables.
- `figures/`: output folder for saved figures.
- `create_structures.py`: create or extract temporal growth structures used by synthetic generators.
- `generate_data.py`: preprocess empirical data or generate synthetic datasets from `generate/configs.py`.
- `run_model.py`: train one model on one dataset split and save predictions and metrics.
- `run_evaluation.py`: aggregate saved run outputs into task-specific evaluation tables.
- `run_hp_compare.py`: aggregate hyperparameter sweep outputs for notebook analysis.
- `train/`: model wrappers, samplers, prediction, and training utilities.
- `evaluate/`: evaluation-time aggregation code for each research question.
- Create or preprocess data.
- Train a model with `run_model.py`.
- Aggregate results with `run_evaluation.py` or `run_hp_compare.py`.
- Produce figures and tables from the saved CSV outputs in the notebooks.
The evaluation scripts write analysis-ready CSV files, including datasets under `saved_results/plotting/` that are used by the plotting notebooks.
The data files of the Enron, UCI, and Wikipedia datasets were taken from the data repository of Poursafaei et al.
The Bitcoin-alpha dataset was taken from SNAP. Data should be extracted as-is into `data/raw/DATASET_NAME/`, with `DATASET_NAME` in `["bcalpha", "enron", "uci", "wikipedia"]`.
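The expected layout can be prepared with a short script before extracting the downloads. A minimal sketch, assuming only the directory names listed above (the `prepare_raw_dirs` helper is illustrative, not part of the repository):

```python
import os

# Expected raw-data layout: data/raw/DATASET_NAME/ for each supported dataset.
DATASETS = ["bcalpha", "enron", "uci", "wikipedia"]

def prepare_raw_dirs(root="data"):
    """Create data/raw/<name>/ folders to extract the downloaded files into."""
    paths = []
    for name in DATASETS:
        path = os.path.join(root, "raw", name)
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths
```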
| Family | Purpose | Example configs |
|---|---|---|
| `empirical` | Preprocessed real-world interaction datasets | `uci`, `enron`, `wikipedia`, `bcalpha` |
| `sbm` | Stochastic block model datasets for homophily-style experiments | `basic`, `basic_swap`, `dense` |
| `pa` | Preferential attachment style datasets | `basic`, `dense` |
| `partition` | Datasets for recency-based analyses | `uci`, `enron`, `wikipedia` |
| `periodic` | Datasets with repeated temporal patterns | `uci1`, `uci2`, `uci5`, `enron1`, `enron2`, `enron5`, `wiki1`, `wiki2`, `wiki5` |
The configuration registry lives in `generate/configs.py`.
Extract a structure from an empirical dataset:

```
python create_structures.py --type extract --extract_folder uci
```

Generate a custom structure:

```
python create_structures.py --type generate --add_ts 3 --add_edges 10,10,35 --add_nodes 15 --name test
```

Preprocess an empirical dataset:

```
python generate_data.py --data_family empirical --data_configs uci
```

Generate synthetic data:

```
python generate_data.py --data_family sbm --data_configs basic --gen_seed 0
python generate_data.py --data_family pa --data_configs basic --gen_seed 0
python generate_data.py --data_family partition --data_configs uci --gen_seed 0
python generate_data.py --data_family periodic --data_configs uci1 --gen_seed 0
```

For quick smoke tests, set `--max_epoch 1`.
- In the example commands below, replace placeholders such as `MODEL_NAME`, `DATASET_NAME`, `DATA_CONFIG`, `TRAIN_SEED`, `GEN_SEED`, `TRAIN_VARIANT`, and `TRAIN_RATIO` with the corresponding experiment settings.
- `MODEL_NAME` must be one of `cawn`, `dygformer`, `dyrep`, `graphmixer`, `jodie`, `tcl`, `tgat`, or `tgn`.
- `--train_seed` controls model initialization and training randomness.
- `--gen_seed` controls synthetic dataset generation and is only used for generated datasets.
- For granularity, density, and directionality, the paper uses `--train_seed` values `0` to `9`.
- For periodicity, recency, homophily, and preferential attachment, the paper uses all 5 x 5 combinations of `--gen_seed` values `0` to `4` and `--train_seed` values `0` to `4`.
- Add `--overwrite` to `run_model.py` if you want to retrain and replace existing checkpoints and saved outputs.
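The 5 x 5 seed sweeps lend themselves to scripting. A minimal sketch that enumerates the command grid for one synthetic family, using only flags documented above (the `sweep_commands` helper and the specific model/config values are illustrative):

```python
from itertools import product

def sweep_commands(model, data_family, data_config,
                   gen_seeds=range(5), train_seeds=range(5)):
    """Build run_model.py command lines for every (gen_seed, train_seed) pair."""
    return [
        f"python run_model.py --model {model} --data_family {data_family} "
        f"--data_configs {data_config} --gen_seed {g} --train_seed {t}"
        for g, t in product(gen_seeds, train_seeds)
    ]

commands = sweep_commands("tgn", "sbm", "basic")
print(len(commands))  # 5 x 5 = 25 runs
```

Each string can then be launched with your scheduler of choice (shell loop, job array, etc.).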
```
python run_model.py --model MODEL_NAME --data_family empirical --data_configs DATASET_NAME --train_seed TRAIN_SEED --train_variant TRAIN_VARIANT --sample_final
```

For granularity experiments, the key setting is `--train_variant`. Variants used in the paper are `default`, `flat`, and `ctdg`. The `--sample_final` flag ensures that not all possible edges are used for evaluation.
```
python run_model.py --model MODEL_NAME --data_family empirical --data_configs DATASET_NAME --train_seed TRAIN_SEED --sample_ratio TRAIN_RATIO --train_variant default
```

For density experiments, the key setting is `--sample_ratio`, which controls the training negative-to-positive ratio.
```
python run_model.py --model MODEL_NAME --data_family empirical --data_configs DATASET_NAME --train_seed TRAIN_SEED --train_variant TRAIN_VARIANT --sample_final
```

For directionality experiments, the key setting is `--train_variant`. Variants used in the paper are `default`, `both`, and `reverse`. The `--sample_final` flag ensures that not all possible edges are used for evaluation.
```
python run_model.py --model MODEL_NAME --data_family periodic --data_configs DATA_CONFIG --gen_seed GEN_SEED --train_seed TRAIN_SEED
```

For periodic experiments, the main choice is `--data_configs`: configs ending in `1` represent persistence-style datasets, while configs ending in `2` or `5` represent periodic structure; the number gives the period length. See also the table of dataset families above.
```
python run_model.py --model MODEL_NAME --data_family partition --data_configs DATA_CONFIG --gen_seed GEN_SEED --train_seed TRAIN_SEED --skip_validation
```

For recency experiments, the main choice is `--data_configs`, which selects the partition-based variant derived from `uci`, `enron`, or `wikipedia`. The `--skip_validation` flag disables the validation set, which is not defined for this property; models should simply optimize for learning recent edges.
```
python run_model.py --model MODEL_NAME --data_family sbm --data_configs DATA_CONFIG --gen_seed GEN_SEED --train_seed TRAIN_SEED
```

For homophily experiments, the main choice is `--data_configs`, which selects the stochastic block model regime used in the paper, such as `basic`, `basic_swap`, or `dense`.
```
python run_model.py --model MODEL_NAME --data_family pa --data_configs DATA_CONFIG --gen_seed GEN_SEED --train_seed TRAIN_SEED
```

For preferential attachment experiments, the main choice is `--data_configs`, which selects the attachment regime used in the paper, currently `basic` or `dense`.
- For empirical datasets, saved run names are based on `<data_configs>_<sample_ratio>_<train_seed>`.
- For synthetic datasets, saved run names are based on `<data_configs>_<gen_seed>_<train_seed>`.
- Non-default dropout, neighbor count, and neighbor sampling strategy are appended to the run name.
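As an illustration of this convention, a small helper (not part of the repository; the function name and argument handling are assumptions based on the patterns listed above) that composes base run names:

```python
def run_name(data_config, train_seed, sample_ratio=None, gen_seed=None):
    """Compose the base run name.

    Empirical runs: <data_configs>_<sample_ratio>_<train_seed>
    Synthetic runs: <data_configs>_<gen_seed>_<train_seed>
    Non-default dropout / neighbor settings would be appended after this base.
    """
    middle = gen_seed if gen_seed is not None else sample_ratio
    return f"{data_config}_{middle}_{train_seed}"

print(run_name("uci", 0, sample_ratio=1.0))  # uci_1.0_0
print(run_name("basic", 3, gen_seed=2))      # basic_2_3
```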
- Aggregation assumes that the corresponding `saved_results/...` outputs already exist from `run_model.py`.
- The placeholders in the commands below refer to the same settings described in the training section.
- Use the same `--train_seed` and `--gen_seed` coverage as in training before aggregating outputs.
- For granularity, density, and directionality, aggregate runs across `--train_seed` values `0` to `9`.
- For periodicity, recency, homophily, and preferential attachment, aggregate runs across all 5 x 5 combinations of `--gen_seed` values `0` to `4` and `--train_seed` values `0` to `4`.
- The `--overwrite` flag only has an effect for empirical datasets, where some larger intermediate datasets are constructed and aggregated; without it, this aggregation is skipped. By default, all other existing results are overwritten.
```
python run_evaluation.py --data_family empirical --data_configs DATASET_NAME --eval_question granularity
```

Aggregate across the relevant `train_variant` settings used in the paper: `default`, `flat`, and `ctdg`.

```
python run_evaluation.py --data_family empirical --data_configs DATASET_NAME --eval_question density
```

Run this after training the same dataset across the density ratios of interest via `--sample_ratio`.

```
python run_evaluation.py --data_family empirical --data_configs DATASET_NAME --eval_question direction
```

Run this after training the relevant direction variants used in the paper: `default`, `both`, and `reverse`.

```
python run_evaluation.py --data_family periodic --data_configs DATA_CONFIG
```

Use configs ending in `1` for persistence-style analyses and configs ending in `2` or `5` for periodicity analyses.

```
python run_evaluation.py --data_family partition --data_configs DATA_CONFIG --eval_question recency
```

Choose the dataset-specific partition config, typically `uci`, `enron`, or `wikipedia`.

```
python run_evaluation.py --data_family sbm --data_configs DATA_CONFIG --eval_question homophily
```

Choose the SBM regime through `--data_configs`, for example `basic`, `basic_swap`, or `dense`.

```
python run_evaluation.py --data_family pa --data_configs DATA_CONFIG
```

Choose the attachment regime through `--data_configs`, currently `basic` or `dense`.
- `run_evaluation.py` writes tables and plotting datasets, not figures.
- Some evaluations are routed by `data_family` rather than `eval_question`, so the intended family/config pairing matters.
```
python run_hp_compare.py --type recency --model tgn --config enron
python run_hp_compare.py --type periodic --model tgn --config uci2
python run_hp_compare.py --type pa --model tgn --config basic
```

Use `plots_tables.ipynb` and `plots_tables_ablations.ipynb` to turn aggregated CSV outputs into final figures and tables.
The notebooks expect outputs such as:
- task-level CSVs under `saved_results/`
- plotting datasets under `saved_results/plotting/`
- summary tables under `tables/`
Example: test whether GraphMixer learns direction on UCI
```
python generate_data.py --data_family empirical --data_configs uci
python run_model.py --model graphmixer --data_family empirical --data_configs uci --train_seed 0 --train_variant default
python run_model.py --model graphmixer --data_family empirical --data_configs uci --train_seed 0 --train_variant both
python run_model.py --model graphmixer --data_family empirical --data_configs uci --train_seed 0 --train_variant reverse
python run_evaluation.py --data_family empirical --data_configs uci --eval_question direction
```

For paper-scale reproduction, repeat training across all required seeds, models, and dataset variants before running the aggregation step.