Skip to content

Stage 2

Jen edited this page May 11, 2026 · 1 revision

Stage 2 — Native Graph Engine

Stage 2 moves graph capabilities from userland into the SpacetimeDB engine itself. Where Stage 1 proved what's possible without engine changes, Stage 2 closes the remaining gap on large graphs by giving the engine first-class knowledge of graph structures.

Motivation

Stage 1's userland module hits a clear ceiling around ~1M edges. The root cause: every traversal must scan the full Edge table into an in-memory HashMap before running BFS/DFS. For LiveJournal (35M edges), that scan-and-hash step dominates. Stage 2 eliminates this by teaching the engine about adjacency — so traversing from one vertex to its neighbors becomes an O(1) operation, not an O(E) scan.

Planned work

1. Native GraphId in AlgebraicTypeDef

Currently, GraphId is a Product(U64) with the __graph_id__ tag — engine code knows about it, but generated client SDKs don't. Stage 2 adds GraphId to AlgebraicTypeDef and TypespaceForGenerate so that C#, TypeScript, Rust, and Unreal clients generate native GraphId types rather than raw u64.

2. Graph-aware physical plan operators

The largest performance win. Stage 2 introduces new PhysicalPlan variants:

  • AdjacencyScan(vertex_id) — direct neighbor lookup without index traversal
  • PathExpand(start, pattern, max_hops) — evaluates variable-length path patterns in a single operator

These operators work against a new storage layout where each vertex carries a compact adjacency list — enabling index-free adjacency traversal. For LiveJournal BFS, this eliminates the 35M-edge scan entirely and replaces it with direct per-vertex neighbor iteration.

Expected impact: LiveJournal BFS drops from ~14.88 s (userland) to a target of 1-2x Neo4j (~3-6 s).

3. Cypher → RelExpr translation

SpacetimeDB already has a Cypher parser (spacetimedb-cypher-parser). Stage 2 wires it into the query planner:

Cypher text → Cypher AST → RelExpr → PhysicalPlan → Executor

This gives users a graph-native query language (MATCH (a)-[*1..3]->(b) RETURN a, b) while reusing SpacetimeDB's existing optimizer, executor, and subscription infrastructure.

4. Subscription integration

SpacetimeDB's killer feature is real-time incremental query evaluation. Stage 2 ensures graph queries participate in the subscription layer:

  • A MATCH pattern subscribes to changes on matching vertices and edges
  • Path queries produce incremental delta updates as the graph evolves
  • Materialized graph views stay up-to-date without polling This is where SpacetimeDB's architectural advantage over Neo4j and AGE becomes most visible — graph queries that push results in real time, rather than requiring periodic re-execution.

5. Closing the LiveJournal gap

The userland module hits 14.88 s on LiveJournal BFS (5x behind Neo4j's 2.76 s). Engine-level adjacency traversal targets closing this gap to 1-2x. Combined with subscription support, SpacetimeDB would offer a unique proposition: real-time graph queries at near-Neo4j performance, without operational complexity.

What Stage 2 does NOT change

  • No new storage formats — adjacency lists live alongside existing table data, not as a replacement.
  • No breaking API changes — the module API remains the primary extension mechanism. Engine operators are additive.
  • No change to the SpacetimeDB architecture — Rust workspace, closed enums, static linking. Graph support is wired into the existing pipeline, not bolted on.

Dependencies

Stage 2 requires the Stage 1 GraphId type to be merged first. The graph-algo crate remains the algorithm library; engine operators call it directly rather than reimplementing BFS/DFS.

Timeline

Stage 2 is in planning. The work items above represent the full scope, but implementation order and phasing depend on upstream SpacetimeDB release cycles and team priorities.