-
Notifications
You must be signed in to change notification settings - Fork 0
Stage 1
Stage 1 delivers a complete graph module for SpacetimeDB using only the existing reducer/procedure API. No engine changes. No new storage formats. No planner modifications. Just a Wasm module that makes SpacetimeDB behave like a graph database.
A new GraphId special type following the same pattern as Identity, Timestamp, Uuid, and ConnectionId. A 64-bit newtype with full propagation through:
-
AlgebraicTypeandProductType— type introspection and construction - SATN formatting —
PsqlPrintFmt::GraphId, serializer dispatch - SQL literal parsing —
to_graph_id()in the expression parser - PostgreSQL wire protocol —
write_graph_id()in the PG encoder
The implementation is intentionally minimal: GraphId is a Product(U64) with the __graph_id__ tag, exactly like Uuid is Product(U128) with __uuid__. Zero new concepts for reviewers familiar with the existing type system.
A standalone Rust crate with no SpacetimeDB dependencies:
-
bfs(start, max_depth, neighbors)— breadth-first traversal with cycle detection -
dfs(start, max_depth, neighbors)— depth-first traversal with cycle detection -
shortest_path(start, end, neighbors)— BFS-based unweighted shortest path with parent reconstruction
17 unit tests covering: full-graph reachability, depth limits (0, 1, unbounded), cycle handling, isolated vertices, shortest-path tie-breaking, and unreachable targets.
A microbenchmark (benches/traversal.rs) measures traversal performance with warmup, 5-iteration median/min/max reporting across vertex counts from 100 to 10,000.
A SpacetimeDB module (crate-type = ["cdylib"]) with:
Tables:
-
Vertex— id (auto-increment PK), label, properties (JSON string) -
Edge— id (auto-increment PK), start_id, end_id, edge_type, properties, B-tree indexes on start_id and end_id -
TraversalResult— temporary table for BFS/DFS output: run_tag, vertex_id, depth -
PathResult— temporary table for shortest path output: run_tag, step, vertex_id
Reducers:
- CRUD:
create_vertex,update_vertex,delete_vertex,create_edge,update_edge,delete_edge - Traversal:
bfs,dfs,shortest_path— build in-memory adjacency map from Edge table, run graph-algo, persist results - Maintenance:
clear_traversal_results,clear_graph - Benchmark:
bench_bulk_insert_vertices,bench_bulk_insert_edges,bench_seed_graph,bench_bfs,bench_dfs,bench_shortest_path
Procedures (run outside transaction lock):
-
neighbor_count(vertex_id)— index scan, no lock held -
bfs_count(start_id)— short tx for edge snapshot, BFS outside lock -
shortest_path_hops(start_id, end_id)— short tx for edge snapshot, BFS outside lock
In-reducer adjacency caching. Traversal reducers load the entire Edge table into a HashMap<u64, Vec<u64>> once per invocation, then run graph-algo against the in-memory map. This avoids per-vertex B-tree index lookups during BFS expansion. The tradeoff: memory usage scales with edge count, but for graphs under ~1M edges this is negligible.
Procedure-based lock release. The ReducerContext holds a transaction lock for the full reducer lifetime. For large-graph BFS, this can hold the lock for seconds. Procedures (bfs_count, shortest_path_hops) wrap edge snapshotting in a short with_tx() block, release the lock, then run the traversal. The result: lock hold time drops from O(traversal) to O(index scan).
See Benchmarks for detailed cross-system comparison results.
- SpacetimeDB can do graphs today — with zero engine changes, just a well-designed module.
- The 50x gap was methodology, not engine — initial SQL-iterative benchmarks measured HTTP round-trip overhead, not SpacetimeDB performance.
- Userland has a clear ceiling — in-memory adjacency maps work well up to ~1M edges. Beyond that, scan/memory overhead calls for engine-level adjacency traversal.
- Stage 2 is justified — the 5x gap on LiveJournal (4M vertices, 35M edges) can't be closed in userland. We need native graph operators.
- GraphId is not yet in
AlgebraicTypeDef— generated client code seesProduct(U64), notGraphId. Blocked on client-side graph type design for Stage 2. - Cypher parser doesn't yet produce GraphId literals — requires the Cypher AST → RelExpr mapping pipeline.
- Module reducer error handling and traversal result persistence lack dedicated integration tests — tested indirectly through the benchmark harness.