Skip to content

Benchmarks

Jen edited this page May 11, 2026 · 1 revision

Benchmarks

Two benchmark suites. One measures pure algorithm performance; the other compares SpacetimeDB against Neo4j and Apache AGE on real-world graph datasets at scale.


Cross-system benchmarks

A dockerized comparison of SpacetimeDB (Wasm module, zero engine changes), Neo4j Community Edition, and Apache AGE (PostgreSQL extension) across three SNAP datasets. All systems run as Docker containers on the same host.

Hardware: AMD Ryzen 7 3800X 8-Core @ 3.9 GHz, 32 GB RAM, Windows 10.

Systems under test

System Interface Notes
SpacetimeDB (baseline) Reducers over HTTP Bulk-insert + traversal result tables; 3 HTTP round-trips per query
SpacetimeDB (optimized) Procedures over HTTP Single round-trip; short tx for edge snapshot, traversal outside lock
Neo4j (baseline) Bolt + Cypher 5K batch sizes, index after insert, no warmup
Neo4j (optimized) Bolt + APOC 10K batch sizes, index before insert, warmup, APOC subgraphNodes
Apache AGE SQL + recursive CTE Direct INSERT into internal tables, recursive CTE over edge/vertex tables

ego-Facebook (~4K vertices, ~88K edges)

Metric STDB STDB (opt) Neo4j Neo4j (opt) AGE
Load 376 ms 376 ms 4.78 s 1.74 s 1.95 s
Neighbor count 25 ms 1.2 ms 229 ms 8 ms 12 ms
BFS full 31 ms 20 ms 228 ms 54 ms 77 ms
Shortest path 21 ms 16 ms 178 ms 15 ms 127 ms

Neighbors sampled: 347 · BFS visited: 3,829 · SP hops: 5

com-DBLP (~317K vertices, ~1.05M edges)

Metric STDB STDB (opt) Neo4j Neo4j (opt) AGE
Load 4.6 s 4.8 s 34.0 s 23.0 s 30.1 s
Neighbor count 274 ms 1.9 ms 201 ms 112 ms 147 ms
BFS full 1.12 s 339 ms 571 ms 453 ms 4.42 s
Shortest path 307 ms 219 ms 145 ms 7.4 ms 2.20 s

Neighbors sampled: 16 · BFS visited: 234,991 · SP hops: 7

com-LiveJournal (~4M vertices, ~34.7M edges)

Metric STDB STDB (opt) Neo4j Neo4j (opt) AGE
Load 177 s 179 s 1,069 s 786 s 972 s
Neighbor count 8.16 s 1.5 ms 271 ms 110 ms 1.26 s
BFS full 29.0 s 14.5 s 34.7 s 39.7 s 603 s
Shortest path 9.10 s 7.69 s 184 ms 125 ms 4.47 s

Neighbors sampled: 23 · BFS visited: 3,995,057 · SP hops: 2

Neo4j (optimized) required 8 GB heap for LiveJournal (OOM'd at 4 GB).

Transaction hold times (STDB optimized, procedures)

Dataset Edges Neighbor tx BFS tx BFS total SP tx SP total
ego-Facebook 88K ~40 µs 15 ms 20 ms 13 ms 16 ms
com-DBLP 1.05M ~50 µs 211 ms 339 ms 177 ms 219 ms
com-LiveJournal 34.7M ~60 µs 7.2 s 14.5 s 6.6 s 7.7 s

Neighbor count tx is O(degree), constant across datasets. BFS/SP tx time is dominated by the full edge-table scan to build the adjacency HashMap — the bottleneck Stage 2 eliminates.

Analysis

SpacetimeDB dominates loading at every scale. 4-6x faster than all competitors. On LiveJournal: 179 s vs. 786-1,069 s. Bulk reducer calls on in-memory B-tree tables process the entire insertion in a single transaction without per-row commit overhead.

The optimized neighbor count is the standout result. LiveJournal: 8.16 s → 1.5 ms (5,440x improvement). The baseline rebuilds the 34.7M-edge adjacency map just to count 23 neighbors. The optimized version uses a direct B-tree index scan inside a procedure tx held for ~60 µs. Total ~1.5 ms is constant regardless of graph size.

Neo4j's shortest path is unmatched. LiveJournal: 125 ms vs. 7.7 s (62x gap). DBLP: 7.4 ms vs. 219 ms (30x). Root cause is algorithmic: STDB runs unidirectional BFS after scanning all edges into a HashMap; Neo4j uses a dedicated bidirectional shortestPath() that terminates when frontiers meet. On LiveJournal the path is 2 hops — Neo4j finds it in ~50 node expansions, STDB scans all 4M vertices. Stage 2's bidirectional operator targets 80-150 ms.

AGE degrades sharply on larger graphs. BFS: 77 ms (ego-Facebook) → 4.42 s (DBLP) → 603 s (LiveJournal). Recursive CTEs over PostgreSQL tables lack graph-specific query planning; each iteration performs a full join against the edge table with no pruning strategies.

Scaling behavior summary

Metric ego-Facebook winner DBLP winner LiveJournal winner Why
Load STDB STDB STDB Consistent 5-6x advantage at every scale
Neighbors STDB (opt) STDB (opt) STDB (opt) Index scan is O(degree), constant at ~1.5 ms
BFS STDB (opt) STDB (opt) STDB (opt) Adjacency map viable, but slowing at scale
Shortest path Neo4j (opt) Neo4j (opt) Neo4j (opt) Bidirectional BFS wins at every scale

Reproducing

1. Build the SpacetimeDB standalone Docker image:

docker build -f crates/standalone/Dockerfile -t bench-stdb-local .

2. Build the graph module WASM binary:

cargo build -p graph-module --release --target wasm32-unknown-unknown

3. Start all three databases in Docker:

# SpacetimeDB (in-memory, no durability)
docker run -d --name bench-stdb -p 3000:3000 bench-stdb-local start --data-dir=/stdb/data --in-memory --jwt-priv-key-path=/tmp/id_ecdsa --jwt-pub-key-path=/tmp/id_ecdsa.pub

# Neo4j Community (APOC plugin, 8 GB heap, durability off)
docker run -d --name bench-neo4j -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=none -e NEO4J_PLUGINS='["apoc"]' -e NEO4J_server_memory_heap_initial__size=2g -e NEO4J_server_memory_heap_max__size=8g neo4j:community

# Apache AGE (PostgreSQL 18, fsync off for fair comparison)
docker run -d --name bench-age -p 5455:5432 --shm-size=1g -e POSTGRES_DB=bench_age -e POSTGRES_HOST_AUTH_METHOD=trust apache/age postgres -c fsync=off -c synchronous_commit=off -c full_page_writes=off

Wait for containers to become healthy (~20 seconds), then:

4. Run the benchmark:

# Single dataset
cargo run -p graph-bench-xsys --release -- --systems spacetimedb,stdb-opt,neo4j,neo4j-opt,age --dataset small

# All three datasets
cargo run -p graph-bench-xsys --release -- --systems spacetimedb,stdb-opt,neo4j,neo4j-opt,age

5. Clean up:

docker rm -f bench-stdb bench-neo4j bench-age

Results written to artifacts/benchmarks/graph-cross-system-benchmarks.md.


Algorithm-level benchmarks (graph-algo crate)

Pure in-memory BFS, DFS, and shortest-path measurements — the same algorithms invoked by the graph module, measured without table-scan overhead to isolate algorithm scaling.

Hardware: AMD Ryzen 7 3800X 8-Core @ 3.9 GHz, 32 GB RAM, Windows 10. Rust 1.93.1.

Topology: Ring-of-successors with fixed out-degree 3. Vertex counts: 100, 1,000, 10,000. Setup: 2 warmup iterations (discarded), 5 measured iterations, median/min/max reported.

Full-graph traversals

Algorithm Vertices Median Min Max
BFS 100 0.010 ms 0.010 ms 0.013 ms
DFS 100 0.011 ms 0.011 ms 0.011 ms
BFS 1,000 0.128 ms 0.117 ms 0.147 ms
DFS 1,000 0.123 ms 0.121 ms 0.128 ms
BFS 10,000 1.729 ms 1.668 ms 2.462 ms
DFS 10,000 2.028 ms 1.694 ms 2.698 ms

Linear scaling — ~10x time for 10x vertices. At 10,000 vertices (30,000 edges), full traversal in under 2 ms. BFS and DFS perform comparably; DFS shows slightly higher variance at large sizes.

Depth-bounded traversals

Algorithm Vertices max_depth Visited Median
BFS 100 3 10 0.002 ms
DFS 100 3 10 0.002 ms
BFS 100 5 16 0.003 ms
DFS 100 5 16 0.003 ms
BFS 100 10 31 0.005 ms
DFS 100 10 31 0.006 ms
BFS 10,000 3 10 0.001 ms
DFS 10,000 3 10 0.001 ms
BFS 10,000 10 31 0.004 ms
DFS 10,000 10 31 0.004 ms

Timing depends on visited count, not total graph size — as expected for O(V_visited + E_visited) algorithms with fixed fan-out.

Shortest path

Vertices Path Hops Median
100 1→4 (near) 1 0.001 ms
100 1→51 17 0.010 ms
1,000 1→4 (near) 1 0.001 ms
1,000 1→501 167 0.113 ms
10,000 1→4 (near) 1 0.001 ms
10,000 1→5,001 1,667 0.979 ms

Near-neighbor lookups: sub-microsecond regardless of graph size (early termination). Cross-ring paths scale with hop count — 1,667 hop path completes in ~1 ms at 10K vertices.

Reproducing

# From the repository root:
cargo bench -p graph-algo --bench traversal

Source: modules/graph-algo/benches/traversal.rs


What these numbers mean for SpacetimeDB

The algorithm benchmarks show the lower bound — how fast the traversal logic itself runs in memory. Inside a live SpacetimeDB instance, the graph module adds B-tree index lookups on the Edge table, transaction overhead, and (for client-invoked queries) HTTP round-trip latency.

The cross-system benchmarks show the real picture: SpacetimeDB's userland module wins loading, neighbors, and BFS at every scale — but shortest path trails Neo4j because the current architecture must scan all edges before it can traverse. Stage 2's bidirectional operator and index-hop BFS close both gaps without sacrificing the loading and neighbor-count advantages.

The architecture that makes Stage 1 fast at data loading and BFS (in-memory bulk operations, single-transaction inserts) is the same architecture that makes it slow at shortest path (must scan all edges before traversing). Stage 2 keeps the strengths and addresses the weaknesses.