Constrained asynchronous packet-switched graph network experiments in pure PyTorch. The main task is a write-then-query memory-routing problem: writer packets expire into a per-sample, node-local cache, and a later query must route to the same latent node, read the cached residual, and route to the output.
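The write-then-query task can be illustrated with a tiny synthetic episode sketch. This is a hedged illustration only: the field names and the key-to-node assignment below are made up for exposition and are not the repo's `apsgnn.tasks` API.

```python
import torch

def make_episode(num_nodes=16, num_writers=6, dim=8, seed=0):
    """Sketch of one write-then-query episode (hypothetical format).

    Each writer packet carries a key and a value residual and is routed to a
    latent node, where the residual is cached when the packet expires. The
    query packet carries one of the writer keys; the target is the residual
    cached at the node that key routed to.
    """
    g = torch.Generator().manual_seed(seed)
    keys = torch.randn(num_writers, dim, generator=g)    # writer keys
    values = torch.randn(num_writers, dim, generator=g)  # cached residuals
    # Hypothetical key->node assignment: hash each key to a latent node.
    nodes = (keys.sum(dim=1).abs() * 1000).long() % num_nodes
    q = torch.randint(num_writers, (1,), generator=g).item()  # queried writer
    return {
        "writer_keys": keys,
        "writer_values": values,
        "writer_nodes": nodes,   # where each residual is cached
        "query_key": keys[q],    # the query must route to nodes[q]...
        "target": values[q],     # ...and read back this residual
    }

ep = make_episode()
```

A correct model answers the query by reproducing `ep["target"]` after routing `ep["query_key"]` to the matching cache slot.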
- `apsgnn/`: model, routing, buffers, tasks, training, evaluation
- `configs/`: v1 and v2 smoke, main, no-cache, throughput configs
- `scripts/`: setup and run entrypoints
- `tests/`: unit tests for routing, buffers, TTL, cache isolation
- `runs/`: logs, checkpoints, metrics
- `reports/`: final report and plots
bash scripts/setup_env.sh
source .venv/bin/activate

- Smoke: `bash scripts/smoke.sh`
- Main training: `bash scripts/train_4gpu.sh`
- No-cache ablation: `bash scripts/ablate_no_cache.sh`
- Throughput benchmark: `bash scripts/benchmark_throughput.sh`
- V2 smoke: `bash scripts/smoke_v2.sh`
- V2 learned-router training: `bash scripts/train_v2.sh`
- V2 learned-router no-cache ablation: `bash scripts/ablate_v2_no_cache.sh`
- V3 router CE smoke: `bash scripts/smoke_v3_router_ce.sh`
- V3 router auxiliary smoke: `bash scripts/smoke_v3_router_aux.sh`
- V3 selected-router training: `bash scripts/train_v3_router.sh`
- V3 selected-router no-cache ablation: `bash scripts/ablate_v3_router_no_cache.sh`
- V4 implicit-retrieval smoke: `bash scripts/smoke_v4_retrieval_implicit.sh`
- V4 key-conditioned-retrieval smoke: `bash scripts/smoke_v4_retrieval_keycond.sh`
- V4 selected-retrieval training: `bash scripts/train_v4_retrieval.sh`
- V4 selected-retrieval no-cache ablation: `bash scripts/ablate_v4_retrieval_no_cache.sh`
- V5 static sparse smoke: `bash scripts/smoke_v5_static_sparse.sh`
- V5 static bootstrap smoke: `bash scripts/smoke_v5_static_bootstrap.sh`
- V5 growth clone smoke: `bash scripts/smoke_v5_growth_clone.sh`
- V5 growth mutate smoke: `bash scripts/smoke_v5_growth_mutate.sh`
- V5 static sparse training: `bash scripts/train_v5_static_sparse.sh`
- V5 static bootstrap training: `bash scripts/train_v5_static_bootstrap.sh`
- V5 growth clone training: `bash scripts/train_v5_growth_clone.sh`
- V5 growth mutate training: `bash scripts/train_v5_growth_mutate.sh`
- V6 static moderate smoke: `bash scripts/smoke_v6_static_moderate.sh`
- V6 static bootstrap moderate smoke: `bash scripts/smoke_v6_static_bootstrap_moderate.sh`
- V6 growth clone moderate smoke: `bash scripts/smoke_v6_growth_clone_moderate.sh`
- V6 static hard smoke: `bash scripts/smoke_v6_static_hard.sh`
- V6 static bootstrap hard smoke: `bash scripts/smoke_v6_static_bootstrap_hard.sh`
- V6 growth clone hard smoke: `bash scripts/smoke_v6_growth_clone_hard.sh`
- V6 growth mutate hard smoke: `bash scripts/smoke_v6_growth_mutate_followup.sh`
- V6 static moderate training: `bash scripts/train_v6_static_moderate.sh`
- V6 static bootstrap moderate training: `bash scripts/train_v6_static_bootstrap_moderate.sh`
- V6 growth clone moderate training: `bash scripts/train_v6_growth_clone_moderate.sh`
- V6 static hard training: `bash scripts/train_v6_static_hard.sh`
- V6 static bootstrap hard training: `bash scripts/train_v6_static_bootstrap_hard.sh`
- V6 growth clone hard training: `bash scripts/train_v6_growth_clone_hard.sh`
- V6 growth mutate hard follow-up: `bash scripts/train_v6_growth_mutate_followup.sh`
- V7 static bootstrap hard smoke: `bash scripts/smoke_v7_static_bootstrap_hard.sh`
- V7 staged static hard smoke: `bash scripts/smoke_v7_staged_static_hard.sh`
- V7 growth clone hard smoke: `bash scripts/smoke_v7_growth_clone_hard.sh`
- V7 growth mutate hard smoke: `bash scripts/smoke_v7_growth_mutate_hard.sh`
- V7 static bootstrap hard training: `bash scripts/train_v7_static_bootstrap_hard.sh`
- V7 staged static hard training: `bash scripts/train_v7_staged_static_hard.sh`
- V7 growth clone hard training: `bash scripts/train_v7_growth_clone_hard.sh`
- V7 growth mutate hard training: `bash scripts/train_v7_growth_mutate_hard.sh`
- V7 growth mutate hard long training: `bash scripts/train_v7_growth_mutate_hard_long.sh`
- V8 static bootstrap hard smoke: `bash scripts/smoke_v8_static_bootstrap_hard.sh`
- V8 staged static selective hard smoke: `bash scripts/smoke_v8_staged_static_selective_hard.sh`
- V8 clone selective hard smoke: `bash scripts/smoke_v8_clone_selective_hard.sh`
- V8 random selective hard smoke: `bash scripts/smoke_v8_random_selective_hard.sh`
- V8 utility selective hard smoke: `bash scripts/smoke_v8_utility_selective_hard.sh`
- V8 utility mutate hard smoke: `bash scripts/smoke_v8_utility_mutate_hard.sh`
- V8 static bootstrap hard training: `bash scripts/train_v8_static_bootstrap_hard.sh`
- V8 staged static selective hard training: `bash scripts/train_v8_staged_static_selective_hard.sh`
- V8 clone selective hard training: `bash scripts/train_v8_clone_selective_hard.sh`
- V8 random selective hard training: `bash scripts/train_v8_random_selective_hard.sh`
- V8 utility selective hard training: `bash scripts/train_v8_utility_selective_hard.sh`
- V8 utility mutate hard training: `bash scripts/train_v8_utility_mutate_hard.sh`
- V9 staged static selective long smoke: `bash scripts/smoke_v9_staged_static_selective_long.sh`
- V9 clone selective long smoke: `bash scripts/smoke_v9_clone_selective_long.sh`
- V9 utility selective long smoke: `bash scripts/smoke_v9_utility_selective_long.sh`
- V9 utility mutate long smoke: `bash scripts/smoke_v9_utility_mutate_long.sh`
- V9 utility no-success long smoke: `bash scripts/smoke_v9_utility_nosuccess_long.sh`
- V9 utility no-grad long smoke: `bash scripts/smoke_v9_utility_nograd_long.sh`
- V9 staged static selective long training: `bash scripts/train_v9_staged_static_selective_long.sh`
- V9 clone selective long training: `bash scripts/train_v9_clone_selective_long.sh`
- V9 utility selective long training: `bash scripts/train_v9_utility_selective_long.sh`
- V9 utility mutate long training: `bash scripts/train_v9_utility_mutate_long.sh`
- V9 utility no-success long training: `bash scripts/train_v9_utility_nosuccess_long.sh`
- V9 utility no-grad long training: `bash scripts/train_v9_utility_nograd_long.sh`

Manual evaluation of a checkpoint:
source .venv/bin/activate
torchrun --standalone --nproc_per_node=2 -m apsgnn.eval \
--config configs/main.yaml \
--checkpoint runs/<run>/best.pt \
--tag best_k6_ddp \
--batches 40 \
--output runs/<run>/best_k6_ddp.json
python -m apsgnn.eval \
--config configs/main.yaml \
--checkpoint runs/<run>/best.pt \
--writers-per-episode 10 \
--tag k10 \
--batches 40 \
--output runs/<run>/k10.json
torchrun --standalone --nproc_per_node=2 -m apsgnn.eval \
--config configs/v2_learned_router.yaml \
--checkpoint runs/<run>/best.pt \
--tag best_k6_ddp \
--batches 40 \
    --output runs/<run>/best_k6_ddp.json

- Address routing uses negative squared L2 distance to a frozen orthogonal address table, with node 0 fixed at the origin.
- APSGNN v2 replaces the frozen first-hop key hint with a learned, strongly supervised first-hop router and optional teacher forcing on the first hop only.
- APSGNN v3 keeps the v2 task and memory path stable, but upgrades first-hop routing with a stronger key-centric router and a CE-vs-aux selection path.
- APSGNN v4 keeps the v3 router fixed, warm-starts from the v3 cached checkpoint, freezes the first-hop router, and weakens cache retrieval with learned implicit or learned key-conditioned attention over cached residuals.
- APSGNN v5 keeps the v4 memory path and v3-style router family, but tests a reduced 16-leaf benchmark with a clockwise transport prior, stage bootstraps, and 4->8->16 growth via clone or mutate splitting.
- APSGNN v6 scales the growth study to a harder 32-leaf benchmark, uses task-packet-only coverage metrics, hardens ingress coverage with a restricted start-node pool, and compares static, static+bootstrap, growth clone, and mutate follow-up runs across moderate and hard regimes.
- APSGNN v7 keeps the v6 hard 32-leaf benchmark and adds the key staged-static curriculum control so the main comparison is now static+bootstrap vs staged-static vs growth-clone across multiple seeds.
- APSGNN v8 keeps the hard 32-leaf setup, replaces pure doubling growth with a selective 4->6->8->12->16->24->32 schedule, and compares staged-static, deterministic clone, random selective clone, and utility-ranked selective clone growth.
- Scripts requesting 4 GPUs automatically fall back to the available GPU count.
- Metrics and checkpoints are written to `runs/<timestamp>-<name>/`.
- Final report and summary plots are written to `reports/`.
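The address-routing rule in the first note can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's `apsgnn` routing module; the function names and shapes are made up for exposition.

```python
import torch

def make_address_table(num_nodes, dim, seed=0):
    # Frozen orthogonal address table; node 0 is pinned to the origin.
    g = torch.Generator().manual_seed(seed)
    q, _ = torch.linalg.qr(torch.randn(dim, dim, generator=g))
    table = q[:num_nodes].clone()   # rows of an orthogonal matrix
    table[0].zero_()                # node 0 fixed at the origin
    return table                    # frozen: never updated by the optimizer

def routing_logits(packet_addr, table):
    # Score each node by negative squared L2 distance to the packet address;
    # a softmax or argmax over these logits selects the destination node.
    d2 = (packet_addr.unsqueeze(-2) - table).pow(2).sum(-1)
    return -d2

table = make_address_table(num_nodes=8, dim=16)
packet = table[3] + 0.01 * torch.randn(16)   # address near node 3
assert routing_logits(packet, table).argmax().item() == 3
```

Because the table is orthogonal and frozen, node addresses stay well separated, so a packet address close to one row routes unambiguously to that node.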
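The clone-vs-mutate growth step used from v5 onward (and the selective schedules of v8/v9) can be sketched in a few lines. This is a hedged sketch assuming one parameter row per node; `grow_nodes` and its arguments are hypothetical names, not the repo's growth code.

```python
import torch

def grow_nodes(node_params, parent_ids, mode="clone", noise=0.01, seed=0):
    """Sketch of growing a node population by splitting selected parents.

    node_params: (N, D) tensor, one parameter row per node.
    parent_ids:  nodes chosen to split (e.g. utility-ranked in v8,
                 random in the random-selective control).
    mode: "clone" copies each parent exactly; "mutate" perturbs the copy.
    """
    g = torch.Generator().manual_seed(seed)
    children = node_params[parent_ids].clone()
    if mode == "mutate":
        children = children + noise * torch.randn(children.shape, generator=g)
    return torch.cat([node_params, children], dim=0)

params = torch.randn(4, 8)                         # start from 4 nodes
params = grow_nodes(params, [0, 1], mode="clone")  # selective 4 -> 6 step
assert params.shape == (6, 8)
```

Repeating such steps with different parent counts yields schedules like 4->8->16 (pure doubling) or 4->6->8->12->16->24->32 (selective); the choice of `parent_ids` is where clone, random-selective, and utility-ranked variants differ.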