Add incremental invalidation/resolution #589

Open
st0012 wants to merge 2 commits into unresolve-primitives from add-incremental-invalidation

Conversation

Member

@st0012 st0012 commented Feb 19, 2026

Summary

Replace the full re-resolution strategy (clear_declarations + resolve everything from scratch) with incremental invalidation. Graph mutations (update/delete_document) now compute the minimal set of definitions, references, and ancestor chains that need re-resolution, and the resolver processes only that subset.

How it works

Graph mutations follow a three-step pipeline:

  1. invalidate — Detaches old definitions from declarations. Identifies affected declarations (definition removed, new definition/reference touching existing names). Feeds them into invalidate_graph, a unified worklist that processes declarations and names.
  2. remove_document_data — Removes old refs/defs/names/strings from maps. Cleans up empty declarations.
  3. extend — Merges new LocalGraph data and queues new definitions/references for resolution.
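The ordering of the three steps can be illustrated with a toy graph. This is only a sketch under loud assumptions: `ToyGraph`, its string-keyed maps, and `update_document` are hypothetical stand-ins, and the real `invalidate` step does far more (names, references, ancestors) than the simplified version here.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for the global graph; every field here is hypothetical.
#[derive(Default)]
struct ToyGraph {
    /// Document URI -> definition names that document contributes.
    documents: HashMap<String, Vec<String>>,
    /// Declaration name -> documents that define it.
    declarations: HashMap<String, HashSet<String>>,
    /// Work queued for the resolver.
    pending: Vec<String>,
}

impl ToyGraph {
    /// Step 1: detach the document's old definitions and queue every
    /// declaration they touched.
    fn invalidate(&mut self, uri: &str) {
        let old = self.documents.get(uri).cloned().unwrap_or_default();
        for name in old {
            if let Some(docs) = self.declarations.get_mut(&name) {
                docs.remove(uri);
            }
            self.pending.push(name);
        }
    }

    /// Step 2: drop the document's data and clean up empty declarations.
    fn remove_document_data(&mut self, uri: &str) {
        self.documents.remove(uri);
        self.declarations.retain(|_, docs| !docs.is_empty());
    }

    /// Step 3: merge the new definitions and queue them for resolution.
    fn extend(&mut self, uri: &str, definitions: Vec<String>) {
        for name in &definitions {
            self.declarations
                .entry(name.clone())
                .or_default()
                .insert(uri.to_string());
            self.pending.push(name.clone());
        }
        self.documents.insert(uri.to_string(), definitions);
    }

    /// A document update runs the three steps in order.
    fn update_document(&mut self, uri: &str, definitions: Vec<String>) {
        self.invalidate(uri);
        self.remove_document_data(uri);
        self.extend(uri, definitions);
    }
}
```

Running `invalidate` before `remove_document_data` matters in this sketch: the old definitions are still needed to find which declarations were touched.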

Work items are accumulated as Unit values (definitions, references, ancestors) and drained by the resolver. For the initial full index, the queue contains everything; for incremental updates, it contains only the invalidated subset.
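A minimal sketch of the queue-and-drain shape described above. `Unit`, `pending_work`, `take_pending_work`, and `resolve` are names from this PR; the ID newtypes, the variant payloads, and the empty match arms are assumptions made for illustration.

```rust
/// Hypothetical ID newtypes; the real graph has its own ID types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DefinitionId(u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ReferenceId(u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DeclarationId(u32);

/// One item of pending resolution work. The payloads are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Unit {
    Definition(DefinitionId),
    Reference(ReferenceId),
    Ancestors(DeclarationId),
}

#[derive(Default)]
struct Graph {
    pending_work: Vec<Unit>,
}

impl Graph {
    /// Hands the queued work to the caller and leaves the queue empty.
    fn take_pending_work(&mut self) -> Vec<Unit> {
        std::mem::take(&mut self.pending_work)
    }
}

/// Processes only the queued units and returns how many were handled.
fn resolve(graph: &mut Graph) -> usize {
    let work = graph.take_pending_work();
    for unit in &work {
        match unit {
            Unit::Definition(_id) => { /* resolve one definition */ }
            Unit::Reference(_id) => { /* resolve one reference */ }
            Unit::Ancestors(_id) => { /* linearize one ancestor chain */ }
        }
    }
    work.len()
}
```

Because the queue is taken rather than scanned, a second `resolve` call with no intervening mutation has nothing to do.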

Reverse index: name_dependents

A single reverse index name_dependents: NameId → Vec<NameDependent> maps each name to the definitions and references that depend on it. Each dependent is stored under its own name_id and under the nesting/parent_scope of that name, so invalidation can trace from any scope level directly to affected refs/defs without an intermediate name-to-name layer.
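A sketch of how such a reverse index could be populated, assuming a std `HashMap` in place of the project's map type. The `Name` struct, the `u32` IDs, and the `register`/`dependents_of` helpers are hypothetical; only `name_dependents`, `NameDependent`, `nesting`, and `parent_scope` come from the description.

```rust
use std::collections::HashMap;

type NameId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NameDependent {
    Definition(u32),
    Reference(u32),
}

/// Minimal name record: only the fields the reverse index needs.
struct Name {
    id: NameId,
    nesting: Option<NameId>,
    parent_scope: Option<NameId>,
}

#[derive(Default)]
struct ReverseIndex {
    name_dependents: HashMap<NameId, Vec<NameDependent>>,
}

impl ReverseIndex {
    /// Files `dependent` under the name itself and under its nesting and
    /// parent_scope, skipping duplicates within each bucket.
    fn register(&mut self, name: &Name, dependent: NameDependent) {
        let keys = [Some(name.id), name.nesting, name.parent_scope];
        for key in keys.iter().copied().flatten() {
            let bucket = self.name_dependents.entry(key).or_default();
            if !bucket.contains(&dependent) {
                bucket.push(dependent);
            }
        }
    }

    /// Returns the dependents recorded for `name_id`, or an empty slice.
    fn dependents_of(&self, name_id: NameId) -> &[NameDependent] {
        self.name_dependents
            .get(&name_id)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

With this layout, a reference whose name is nested in another name is reachable from the outer name in one lookup, which is the property the description relies on.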

Invalidation worklist

invalidate_graph processes two kinds of items:

  • Declaration — Two modes: removal (no definitions left or orphaned owner) removes the declaration tree and unresolves all resolved names; ancestor-stale (still has definitions) clears ancestors/descendants and cascades to dependent names.
  • Name — Two modes: structural cascade (nesting or parent_scope dependency broken) unresolves the name and detaches its refs/defs from the old declaration; ancestor-only (dependencies intact but ancestor context changed) unresolves references for re-evaluation.
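The two item kinds and their modes can be sketched as a single queue. This is a simplified model, not the PR's code: the `structural` flag, the `Graph` fields other than `declaration_to_names`, and the choice to record outcomes in sets are all assumptions for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type DeclarationId = u32;
type NameId = u32;

/// Worklist entries. The `structural` flag is a hypothetical way to carry the mode.
enum Item {
    Declaration(DeclarationId),
    Name { id: NameId, structural: bool },
}

#[derive(Default)]
struct Graph {
    /// Declaration -> number of definitions still attached.
    definition_counts: HashMap<DeclarationId, usize>,
    /// Declaration -> names currently resolved to it.
    declaration_to_names: HashMap<DeclarationId, Vec<NameId>>,
    /// Name -> names nested under it or using it as parent_scope.
    dependents: HashMap<NameId, Vec<NameId>>,
    removed_declarations: Vec<DeclarationId>,
    unresolved_names: HashSet<NameId>,
    stale_references: HashSet<NameId>,
}

fn invalidate_graph(graph: &mut Graph, seeds: Vec<Item>) {
    let mut queue: VecDeque<Item> = seeds.into();
    while let Some(item) = queue.pop_front() {
        match item {
            Item::Declaration(id) => {
                let remaining = graph.definition_counts.get(&id).copied().unwrap_or(0);
                if remaining == 0 {
                    // Removal: drop the declaration and unresolve names that pointed at it.
                    graph.definition_counts.remove(&id);
                    graph.removed_declarations.push(id);
                    let names = graph.declaration_to_names.remove(&id).unwrap_or_default();
                    queue.extend(names.into_iter().map(|id| Item::Name { id, structural: true }));
                } else {
                    // Ancestor-stale: the declaration survives; dependent names
                    // get their references re-evaluated.
                    let names = graph.declaration_to_names.get(&id).cloned().unwrap_or_default();
                    for name in names {
                        let deps = graph.dependents.get(&name).cloned().unwrap_or_default();
                        queue.extend(deps.into_iter().map(|id| Item::Name { id, structural: false }));
                    }
                }
            }
            Item::Name { id, structural: true } => {
                // Structural cascade: unresolve once, then follow dependents.
                if graph.unresolved_names.insert(id) {
                    let deps = graph.dependents.get(&id).cloned().unwrap_or_default();
                    queue.extend(deps.into_iter().map(|id| Item::Name { id, structural: true }));
                }
            }
            Item::Name { id, structural: false } => {
                // Ancestor-only: keep the name, mark its references for re-evaluation.
                graph.stale_references.insert(id);
            }
        }
    }
}
```

The `insert` check in the structural arm is what keeps the loop from revisiting a name, so cyclic dependencies still terminate in this sketch.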

Other changes

  • without_resolution option: Skip accumulating pending work for tools that only need definitions.
  • name_dependents populated during indexing: LocalGraph builds name_dependents entries in add_definition and add_constant_reference, which are then merged into the global graph during extend.
  • Resolution refactored: resolve_all renamed to resolve; it drains pending_work via the Unit enum instead of scanning all names.
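One way the without_resolution option could gate queuing is sketched below. The field layout, the constructor, and the `u32` IDs are assumptions; only the option name, `pending_work`, `add_definition`, and `Unit` appear in the PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Unit {
    Definition(u32),
    Reference(u32),
}

struct Graph {
    /// When true, nothing is queued for the resolver.
    without_resolution: bool,
    definitions: Vec<u32>,
    pending_work: Vec<Unit>,
}

impl Graph {
    fn new(without_resolution: bool) -> Self {
        Graph { without_resolution, definitions: Vec::new(), pending_work: Vec::new() }
    }

    /// Always records the definition; queues it only if resolution is enabled.
    fn add_definition(&mut self, id: u32) {
        self.definitions.push(id);
        if !self.without_resolution {
            self.pending_work.push(Unit::Definition(id));
        }
    }
}
```

A tool that only lists definitions would construct the graph with the flag set and never pay for the queue.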

TODO: smarter ancestor invalidation

Currently, any definition change on a declaration enters ancestor-stale mode, which conservatively unresolves all nested references. A future optimization: compare old vs new definitions for ancestor-affecting fields (mixins, parent_class). If they haven't changed, skip ancestor invalidation entirely — only re-resolve the definition itself.
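The comparison proposed in this TODO might look like the sketch below. `DefinitionSummary`, the `Mixin` enum, and the `comment` field are hypothetical; the only inputs taken from the text are that mixins and parent_class are the ancestor-affecting fields.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Mixin {
    Include(String),
    Prepend(String),
    Extend(String),
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
struct DefinitionSummary {
    parent_class: Option<String>,
    /// Mixins in source order, since order affects linearization.
    mixins: Vec<Mixin>,
    /// Example of a field that does not affect ancestors.
    comment: String,
}

/// Projects each definition onto the fields that feed ancestor linearization.
fn ancestor_inputs(defs: &[DefinitionSummary]) -> Vec<(&Option<String>, &[Mixin])> {
    defs.iter()
        .map(|d| (&d.parent_class, d.mixins.as_slice()))
        .collect()
}

/// Returns true when the old and new definitions differ in an
/// ancestor-affecting field, or in number.
fn needs_ancestor_invalidation(old: &[DefinitionSummary], new: &[DefinitionSummary]) -> bool {
    ancestor_inputs(old) != ancestor_inputs(new)
}
```

In this sketch a comment-only edit returns false, so ancestor-stale mode could be skipped for it, while adding a prepend returns true.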

Compared to main

Correctness: identical — all declaration counts, definition counts, orphan rates, and linked/orphan breakdowns match exactly between main and the branch.

Performance (initial full index on 94,036 files):

| Stage | main | branch | Delta |
|---|---|---|---|
| Listing | 0.645s | 0.701s | +0.056s (+8.7%) |
| Indexing | 10.309s | 10.261s | -0.048s (-0.5%) |
| Resolution | 41.570s | 25.275s | -16.295s (-39.2%) |
| Querying | 0.647s | 0.696s | +0.049s (+7.6%) |
| Total | 53.171s | 36.933s | -16.238s (-30.5%) |

Memory:

| Metric | main | branch | Delta |
|---|---|---|---|
| Max RSS | 3,909 MB | 4,437 MB | +528 MB (+13.5%) |

Resolution is 39% faster at the cost of 13.5% more memory from the reverse index and pending work accumulation. (Note: benchmarks predate some of the recent changes — numbers may have shifted slightly.)

@st0012 st0012 requested a review from a team as a code owner February 19, 2026 23:27
@st0012 st0012 self-assigned this Feb 19, 2026
@st0012 st0012 force-pushed the add-incremental-invalidation branch from 365da40 to 774d476 Compare February 19, 2026 23:34
Member

@vinistock vinistock left a comment

I'm still trying to reason about the algorithm, so not done reviewing yet.

@st0012 st0012 force-pushed the add-incremental-invalidation branch from 7c0e491 to e8cdc32 Compare February 25, 2026 15:54
Member

@vinistock vinistock left a comment

This PR is highly complex. I think the mechanism works, but it's hard to ship so many changes in one go. Here's what I think we should do to minimize the changes and ship with confidence:

  • Move the rename of resolve_all to resolve to a separate PR
  • Create a PR implementing unresolve_name and unresolve_reference that can be shipped separately from the algorithm
  • Add the name_dependents hashmap of IdentityHashMap<NameId, IdentityHashSet<NameDependent>> using a NameDependent enum. Populate this map during indexing when creating names, so that the global graph only has to merge the work at the end

With this foundation, it will be significantly easier for reviewers to focus on the algorithm. What do you think?

Comment on lines +463 to +468
if let Some(name_set) = self.declaration_to_names.get_mut(&declaration_id) {
    name_set.remove(&name_id);
    if name_set.is_empty() {
        self.declaration_to_names.remove(&declaration_id);
    }
}
Member

This is one of the reasons why I'm not a fan of the multiple hashmaps. We need to ensure that the data on the auxiliary maps is also consistent and the benefit is just avoiding tracing the graph from declaration -> definitions -> names. I'm not convinced tbh.

Also, to make sure we're making progress, I think you can probably ship a separate PR with just the unresolve_name and unresolve_reference methods.

Member Author

I'll address all the feedback first and then see how we can split this PR.

Member Author

First one: #627

Comment on lines +75 to +81
/// Reverse index: for each `NameId`, which definitions and constant references use it.
/// Eliminates O(D+R) scans during invalidation.
name_users: IdentityHashMap<NameId, Vec<NameUser>>,

/// Reverse index: for each `NameId`, which other names depend on it
/// (via nesting or `parent_scope`). Used for cascade invalidation.
name_dependents: IdentityHashMap<NameId, IdentityHashSet<NameId>>,
Member

Why do we need these two maps? The idea of having an enum for name dependents is so that you can go NameId -> ReferenceId | DefinitionId -> NameId.

Member Author

I merged the maps but I think the value should also include NameId. So it'd be NameId -> ReferenceId | DefinitionId | NameId.

I prototyped using reference/definition to look up name but it doesn't work well with parent_scope/nesting cases. For example:

class Baz
  include Foo
  CONST
end

The reference CONST creates a name with nesting=baz_name, but it doesn't create a member declaration under Baz. So Baz.members() is empty — there's no path from baz_name to the reference's name through the declaration/definitions. When Baz's ancestors change (e.g., a new prepend Bar is added), we need to re-evaluate CONST, but without the explicit Name(NameId) entry under baz_name, we can't discover it.

Member

there's no path from baz_name to the reference's name through the declaration/definitions

This is the only reason why we need name_dependents. In your example, this is what I would expect the hashmap to look like:

# name_dependents

{
  NameId(Baz) => Set[ReferenceId(Foo), ReferenceId(CONST)]
}

This allows the graph to remember which references and definitions will be potentially impacted by a name change. We then trace name_dependents and the rest of the graph to unresolve names.

In this case, if we had to unresolve all names due to a change to Baz, I would expect the algorithm to do something like this:

  1. Baz changed. Loop through name dependents
  2. Regardless of whether the dependent is a reference or a definition, get its name_id, pull the name from the graph and unresolve it
  3. Now, unresolving the definition and reference may invalidate other things. Go back to 1. and invalidate the name_id for the reference/definition

This also involves invalidating ancestors, but you get the idea.
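The three steps above can be sketched as a worklist loop. This is an illustration under assumptions, not the reviewer's or the PR's code: the `name_of` lookup, the `resolved` set, and `unresolve_from` are hypothetical, a std `HashMap`/`HashSet` stands in for the identity maps, and ancestor invalidation is left out.

```rust
use std::collections::{HashMap, HashSet};

type NameId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum NameDependent {
    Definition(u32),
    Reference(u32),
}

#[derive(Default)]
struct Graph {
    name_dependents: HashMap<NameId, HashSet<NameDependent>>,
    /// Which name each definition or reference is attached to.
    name_of: HashMap<NameDependent, NameId>,
    /// Names currently in the resolved state.
    resolved: HashSet<NameId>,
}

/// Unresolves every name reachable from `changed` through name_dependents
/// and returns the unresolved names in ascending order.
fn unresolve_from(graph: &mut Graph, changed: NameId) -> Vec<NameId> {
    let mut stack = vec![changed];
    let mut visited = HashSet::new();
    visited.insert(changed);
    let mut unresolved = Vec::new();
    while let Some(name_id) = stack.pop() {
        // Step 1: loop through the dependents of the changed name.
        let dependents: Vec<NameDependent> = graph
            .name_dependents
            .get(&name_id)
            .map(|set| set.iter().copied().collect())
            .unwrap_or_default();
        for dependent in dependents {
            // Step 2: reference or definition, look up its name and unresolve it.
            if let Some(&dep_name) = graph.name_of.get(&dependent) {
                if graph.resolved.remove(&dep_name) {
                    unresolved.push(dep_name);
                }
                // Step 3: repeat from step 1 for that name, once per name.
                if visited.insert(dep_name) {
                    stack.push(dep_name);
                }
            }
        }
    }
    unresolved.sort_unstable();
    unresolved
}
```

The `visited` set is an addition for termination; without it, two names that depend on each other would loop forever.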


/// Accumulated work items from update/delete operations.
/// Drained by `take_pending_work()` before resolution.
pending_work: Vec<Unit>,
Member

We still need an answer for the ever-growing memory if users don't call resolve.

@st0012 st0012 force-pushed the add-incremental-invalidation branch from 995854d to 746e90c Compare February 27, 2026 21:24
Add methods for unwinding name resolution during incremental updates:

- `unresolve_name`: converts a Resolved name back to Unresolved
- `unresolve_reference`: detaches a reference from its declaration and
  unresolves the underlying name
- `remove_name`: removes a name and cleans up reverse indices
- `detach_name_from_declaration`: shared helper for reverse index cleanup

Add `declaration_to_names` reverse index that tracks which names resolve
to each declaration. Populated during `record_resolved_name`, cleaned up
by the unresolve/remove primitives.

Integrate `unresolve_reference` into `remove_definitions_for_document`
to properly detach references during document updates.
@st0012 st0012 force-pushed the add-incremental-invalidation branch from a421b3c to 9ee49c6 Compare March 2, 2026 22:04
@st0012 st0012 changed the base branch from main to unresolve-primitives March 2, 2026 22:04
@st0012 st0012 force-pushed the add-incremental-invalidation branch from 9ee49c6 to 417635f Compare March 3, 2026 20:24
@st0012 st0012 force-pushed the unresolve-primitives branch 2 times, most recently from 4980e3b to 0ac1ae6 Compare March 3, 2026 22:47
