Skip to content

Stage 1

Jen edited this page May 11, 2026 · 1 revision

Stage 1 — Userland Graph Extension

Stage 1 delivers a complete graph module for SpacetimeDB using only the existing reducer/procedure API. No engine changes. No new storage formats. No planner modifications. Just a Wasm module that makes SpacetimeDB behave like a graph database.

What's included

GraphId SATS type

A new GraphId special type following the same pattern as Identity, Timestamp, Uuid, and ConnectionId. A 64-bit newtype with full propagation through:

  • AlgebraicType and ProductType — type introspection and construction
  • SATN formatting — PsqlPrintFmt::GraphId, serializer dispatch
  • SQL literal parsing — to_graph_id() in the expression parser
  • PostgreSQL wire protocol — write_graph_id() in the PG encoder

The implementation is intentionally minimal: GraphId is a Product(U64) with the __graph_id__ tag, exactly like Uuid is Product(U128) with __uuid__. Zero new concepts for reviewers familiar with the existing type system.

graph-algo crate

A standalone Rust crate with no SpacetimeDB dependencies:

  • bfs(start, max_depth, neighbors) — breadth-first traversal with cycle detection
  • dfs(start, max_depth, neighbors) — depth-first traversal with cycle detection
  • shortest_path(start, end, neighbors) — BFS-based unweighted shortest path with parent reconstruction

17 unit tests covering: full-graph reachability, depth limits (0, 1, unbounded), cycle handling, isolated vertices, shortest-path tie-breaking, and unreachable targets.

A microbenchmark (benches/traversal.rs) measures traversal performance with warmup, 5-iteration median/min/max reporting across vertex counts from 100 to 10,000.

graph-module Wasm cdylib

A SpacetimeDB module (crate-type = ["cdylib"]) with:

Tables:

  • Vertex — id (auto-increment PK), label, properties (JSON string)
  • Edge — id (auto-increment PK), start_id, end_id, edge_type, properties, B-tree indexes on start_id and end_id
  • TraversalResult — temporary table for BFS/DFS output: run_tag, vertex_id, depth
  • PathResult — temporary table for shortest path output: run_tag, step, vertex_id

Reducers:

  • CRUD: create_vertex, update_vertex, delete_vertex, create_edge, update_edge, delete_edge
  • Traversal: bfs, dfs, shortest_path — build in-memory adjacency map from Edge table, run graph-algo, persist results
  • Maintenance: clear_traversal_results, clear_graph
  • Benchmark: bench_bulk_insert_vertices, bench_bulk_insert_edges, bench_seed_graph, bench_bfs, bench_dfs, bench_shortest_path

Procedures (run outside transaction lock):

  • neighbor_count(vertex_id) — index scan, no lock held
  • bfs_count(start_id) — short tx for edge snapshot, BFS outside lock
  • shortest_path_hops(start_id, end_id) — short tx for edge snapshot, BFS outside lock

Key design decisions

In-reducer adjacency caching. Traversal reducers load the entire Edge table into a HashMap<u64, Vec<u64>> once per invocation, then run graph-algo against the in-memory map. This avoids per-vertex B-tree index lookups during BFS expansion. The tradeoff: memory usage scales with edge count, but for graphs under ~1M edges this is negligible.

Procedure-based lock release. The ReducerContext holds a transaction lock for the full reducer lifetime. For large-graph BFS, this can hold the lock for seconds. Procedures (bfs_count, shortest_path_hops) wrap edge snapshotting in a short with_tx() block, release the lock, then run the traversal. The result: lock hold time drops from O(traversal) to O(index scan).

Performance

See Benchmarks for detailed cross-system comparison results.

What Stage 1 proves

  1. SpacetimeDB can do graphs today — with zero engine changes, just a well-designed module.
  2. The 50x gap was methodology, not engine — initial SQL-iterative benchmarks measured HTTP round-trip overhead, not SpacetimeDB performance.
  3. Userland has a clear ceiling — in-memory adjacency maps work well up to ~1M edges. Beyond that, scan/memory overhead calls for engine-level adjacency traversal.
  4. Stage 2 is justified — the 5x gap on LiveJournal (4M vertices, 35M edges) can't be closed in userland. We need native graph operators.

Limitations

  • GraphId is not yet in AlgebraicTypeDef — generated client code sees Product(U64), not GraphId. Blocked on client-side graph type design for Stage 2.
  • Cypher parser doesn't yet produce GraphId literals — requires the Cypher AST → RelExpr mapping pipeline.
  • Module reducer error handling and traversal result persistence lack dedicated integration tests — tested indirectly through the benchmark harness.

Clone this wiki locally