Add incremental invalidation/resolution by st0012 · Pull Request #589 · Shopify/rubydex

st0012 · 2026-02-19T23:27:22Z

Summary

Replace the full re-resolution strategy (clear_declarations + resolve everything from scratch) with incremental invalidation. Graph mutations (update/delete_document) now compute the minimal set of definitions, references, and ancestor chains that need re-resolution, and the resolver processes only that subset.

How it works

Graph mutations follow a three-step pipeline:

1. invalidate: Detaches old definitions from declarations and identifies affected declarations (a definition was removed, or a new definition/reference touches an existing name). These are fed into invalidate_graph, a unified worklist that processes declarations and names.
2. remove_document_data: Removes old refs/defs/names/strings from maps and cleans up empty declarations.
3. extend: Merges new LocalGraph data and queues new definitions/references for resolution.

Work items are accumulated as Unit values (definitions, references, ancestors) and drained by the resolver. For the initial full index, the queue contains everything; for incremental updates, it contains only the invalidated subset.
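As a rough illustration of the pipeline and the pending-work queue, here is a minimal sketch. It is not the code in this PR: the id aliases, the `defs_by_doc` and `resolved` fields, and the `update_document` signature are assumptions made for the example, and "resolution" is reduced to recording a definition id.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical, simplified ids; the real graph uses richer id types.
type DocId = u32;
type DefId = u32;

/// A unit of pending resolution work. Variant names follow the description
/// above; the payloads are simplified assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Unit {
    Definition(DefId),
    Reference(u32),
    Ancestors(u32),
}

#[derive(Default)]
pub struct Graph {
    /// Definitions contributed by each document.
    defs_by_doc: HashMap<DocId, Vec<DefId>>,
    /// Work accumulated by mutations and drained by the resolver.
    pending_work: VecDeque<Unit>,
    /// Stand-in for real resolution results.
    resolved: Vec<DefId>,
}

impl Graph {
    /// Step 1: detach the document's old definitions from resolved state.
    fn invalidate(&mut self, doc: DocId) {
        if let Some(old) = self.defs_by_doc.get(&doc) {
            self.resolved.retain(|d| !old.contains(d));
        }
    }

    /// Step 2: drop the document's old data from the maps.
    fn remove_document_data(&mut self, doc: DocId) {
        self.defs_by_doc.remove(&doc);
    }

    /// Step 3: merge the new data and queue it for resolution.
    fn extend(&mut self, doc: DocId, new_defs: Vec<DefId>) {
        for &def in &new_defs {
            self.pending_work.push_back(Unit::Definition(def));
        }
        self.defs_by_doc.insert(doc, new_defs);
    }

    /// Runs the three steps in order for one document.
    pub fn update_document(&mut self, doc: DocId, new_defs: Vec<DefId>) {
        self.invalidate(doc);
        self.remove_document_data(doc);
        self.extend(doc, new_defs);
    }

    /// Drains the pending work and returns how many units were processed.
    pub fn resolve(&mut self) -> usize {
        let mut processed = 0;
        while let Some(unit) = self.pending_work.pop_front() {
            if let Unit::Definition(def) = unit {
                self.resolved.push(def);
            }
            processed += 1;
        }
        processed
    }

    pub fn resolved(&self) -> &[DefId] {
        &self.resolved
    }
}
```

In this toy model, indexing two documents and calling `resolve` processes every queued unit, while a later `update_document` on one of them queues only that document's new definitions.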

Reverse index: `name_dependents`

A single reverse index name_dependents: NameId → Vec<NameDependent> maps each name to the definitions and references that depend on it. Each dependent is stored under its own name_id and under the nesting/parent_scope of that name, so invalidation can trace from any scope level directly to affected refs/defs without an intermediate name-to-name layer.
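The shape of such a reverse index can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: the `Name` struct, the `NameDependents` wrapper, and the `add`/`dependents_of` methods are hypothetical, and only one level of nesting/parent_scope is registered.

```rust
use std::collections::HashMap;

// Hypothetical id type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NameId(pub u32);

/// Something whose resolution depends on a name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NameDependent {
    Definition(u32),
    Reference(u32),
}

/// A name with its optional lexical nesting and explicit parent scope.
#[derive(Debug, Clone, Copy)]
pub struct Name {
    pub id: NameId,
    pub nesting: Option<NameId>,
    pub parent_scope: Option<NameId>,
}

#[derive(Default)]
pub struct NameDependents {
    map: HashMap<NameId, Vec<NameDependent>>,
}

impl NameDependents {
    /// Registers `dep` under the name itself and under its nesting and
    /// parent_scope, so a change at either scope reaches `dep` directly.
    pub fn add(&mut self, name: &Name, dep: NameDependent) {
        let keys = [Some(name.id), name.nesting, name.parent_scope];
        for key in keys.iter().flatten().copied() {
            let entry = self.map.entry(key).or_default();
            if !entry.contains(&dep) {
                entry.push(dep);
            }
        }
    }

    /// Everything that must be revisited when `id` is invalidated.
    pub fn dependents_of(&self, id: NameId) -> &[NameDependent] {
        self.map.get(&id).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Storing each dependent under several keys trades memory for a single lookup at invalidation time, which is consistent with the RSS increase reported in the benchmarks.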

Invalidation worklist

invalidate_graph processes two kinds of items:

- Declaration: two modes. Removal (no definitions left or orphaned owner) removes the declaration tree and unresolves all resolved names; ancestor-stale (still has definitions) clears ancestors/descendants and cascades to dependent names.
- Name: two modes. Structural cascade (nesting or parent_scope dependency broken) unresolves the name and detaches its refs/defs from the old declaration; ancestor-only (dependencies intact but ancestor context changed) unresolves references for re-evaluation.
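The worklist idea can be sketched like this. It is a deliberately reduced model under stated assumptions: the `Item` enum, the `Graph` fields, and this `invalidate_graph` signature are hypothetical, names do not cascade back into declarations, and the two Name modes are collapsed into a single "unresolve" step.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// The two kinds of worklist items (payloads are plain ids here).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Item {
    Declaration(u32),
    Name(u32),
}

/// Toy graph state, keyed by hypothetical numeric ids.
#[derive(Default)]
pub struct Graph {
    /// Number of definitions still attached to each declaration.
    pub definition_count: HashMap<u32, usize>,
    /// Cached ancestor chain of each declaration.
    pub ancestors: HashMap<u32, Vec<u32>>,
    /// Names whose resolution depends on a declaration.
    pub dependent_names: HashMap<u32, Vec<u32>>,
    /// Current resolution of each name to a declaration.
    pub resolved: HashMap<u32, u32>,
}

/// Processes the worklist and returns the names that were unresolved,
/// in the order they were processed.
pub fn invalidate_graph(graph: &mut Graph, seeds: Vec<Item>) -> Vec<u32> {
    let mut queue: VecDeque<Item> = seeds.into();
    let mut seen: HashSet<Item> = HashSet::new();
    let mut unresolved = Vec::new();

    while let Some(item) = queue.pop_front() {
        // Each item is processed at most once, so cycles terminate.
        if !seen.insert(item) {
            continue;
        }
        match item {
            Item::Declaration(decl) => {
                let remaining = graph.definition_count.get(&decl).copied().unwrap_or(0);
                if remaining == 0 {
                    // Removal mode: nothing defines this declaration anymore.
                    graph.definition_count.remove(&decl);
                }
                // In both modes the cached ancestors are stale.
                graph.ancestors.remove(&decl);
                // Cascade to the names that depend on this declaration.
                let names = graph.dependent_names.get(&decl).cloned().unwrap_or_default();
                queue.extend(names.into_iter().map(Item::Name));
            }
            Item::Name(name) => {
                // Unresolve the name so the resolver re-evaluates it later.
                if graph.resolved.remove(&name).is_some() {
                    unresolved.push(name);
                }
            }
        }
    }
    unresolved
}
```

Using one queue for both item kinds keeps the cascade logic in a single loop, and the `seen` set ensures a name reachable from several declarations is unresolved only once.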

Other changes

- without_resolution option: Skips accumulating pending work, for tools that only need definitions.
- name_dependents populated during indexing: LocalGraph builds name_dependents entries in add_definition and add_constant_reference; these are then merged into the global graph during extend.
- Resolution refactored: resolve_all is renamed to resolve and now drains pending_work (Unit enum values) instead of scanning all names.

TODO: smarter ancestor invalidation

Currently, any definition change on a declaration puts it into ancestor-stale mode, which conservatively unresolves all nested references. A future optimization: compare old vs new definitions for ancestor-affecting fields (mixins, parent_class). If they haven't changed, skip ancestor invalidation entirely and only re-resolve the definition itself.
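One possible shape for that comparison, as a sketch only: the `ClassDefinition` struct, its `comment` field, and the `ancestors_may_have_changed` helper are hypothetical and do not exist in this PR. The check is order-sensitive on purpose, since mixin order affects the ancestor chain.

```rust
/// Hypothetical, trimmed-down definition record for illustration.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct ClassDefinition {
    pub parent_class: Option<String>,
    pub mixins: Vec<String>,
    /// Example of a field that does not influence the ancestor chain.
    pub comment: String,
}

/// Returns true when the ancestor-affecting fields differ between the old
/// and new definitions of a declaration, so ancestor invalidation is needed.
pub fn ancestors_may_have_changed(old: &[ClassDefinition], new: &[ClassDefinition]) -> bool {
    // Project each definition onto the fields that feed ancestor linearization.
    fn key(defs: &[ClassDefinition]) -> Vec<(&Option<String>, &Vec<String>)> {
        defs.iter().map(|d| (&d.parent_class, &d.mixins)).collect()
    }
    key(old) != key(new)
}
```

With a helper like this, an edit that only touches a comment or method body would report no change, while adding a mixin or removing the last definition would still trigger the conservative path.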

Compared to `main`

Correctness: identical — all declaration counts, definition counts, orphan rates, and linked/orphan breakdowns match exactly between main and the branch.

Performance (initial full index on 94,036 files):

| Stage | main | branch | Delta |
| --- | --- | --- | --- |
| Listing | 0.645s | 0.701s | +0.056s (+8.7%) |
| Indexing | 10.309s | 10.261s | -0.048s (-0.5%) |
| Resolution | 41.570s | 25.275s | -16.295s (-39.2%) |
| Querying | 0.647s | 0.696s | +0.049s (+7.6%) |
| Total | 53.171s | 36.933s | -16.238s (-30.5%) |

Memory:

| Metric | main | branch | Delta |
| --- | --- | --- | --- |
| Max RSS | 3,909 MB | 4,437 MB | +528 MB (+13.5%) |

Resolution takes 39% less time, at the cost of 13.5% more memory from the reverse index and pending work accumulation. (Note: these benchmarks predate some of the recent changes, so the numbers may have shifted slightly.)

vinistock

I'm still trying to reason about the algorithm, so not done reviewing yet.

rust/rubydex/src/test_utils/graph_test.rs