perf(resolver): track per-scope extractable state at phase level#72

Merged: doubleailes merged 2 commits into `main` from `skip-extractable-scopes` on May 15, 2026
@doubleailes
Owner

Summary

Stacks on #71. After #71 made `PackageVariantSlice::extractable` O(1), every `extract` no-op early-return is nearly free — but the function-call round-trip itself is still paid on every probe. Each iteration of the inner extract loop in `ResolvePhase::solve` walks every scope and calls `extract`, even on scopes that have already exhausted their common dependencies.

A scope's extractability only changes when its entries change. Track a per-scope `non_extractable` flag and skip the `extract()` call once it has returned `None`, until intersect / reduce / widen replaces the scope:

  • After `extract` returns `None` for `scope[i]` → set `non_extractable[i]`.
  • `ScopeIntersect::Narrowed` → clear it.
  • `ScopeReduce::Reduced` → clear it.
  • New scope pushed → `non_extractable.push(false)`.
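
The flag protocol above can be sketched in isolation. This is a minimal illustration, not the resolver's code: `Scope`, its `pending` counter, and `extract_all` are hypothetical stand-ins, and the real `extract` returns an extracted request rather than `()`.

```rust
// Hypothetical stand-in for a resolver scope: `pending` counts how many
// common dependencies can still be extracted from it.
struct Scope {
    pending: usize,
}

impl Scope {
    // Returns Some(()) while something is left to extract, None otherwise.
    fn extract(&mut self) -> Option<()> {
        if self.pending == 0 {
            return None;
        }
        self.pending -= 1;
        Some(())
    }
}

// Runs the extract loop to a fixed point and returns how many `extract`
// calls were actually made.
fn extract_all(scopes: &mut [Scope], non_extractable: &mut [bool]) -> usize {
    let mut calls = 0;
    loop {
        let mut progressed = false;
        for (i, scope) in scopes.iter_mut().enumerate() {
            if non_extractable[i] {
                continue; // known to be exhausted: skip the call entirely
            }
            calls += 1;
            match scope.extract() {
                Some(()) => progressed = true,
                None => non_extractable[i] = true,
            }
        }
        if !progressed {
            return calls;
        }
    }
}

fn main() {
    let mut scopes = vec![Scope { pending: 0 }, Scope { pending: 3 }];
    let mut flags = vec![false; scopes.len()];
    println!("extract calls: {}", extract_all(&mut scopes, &mut flags));

    // Replacing a scope (as intersect / reduce would) clears its flag,
    // so the scope is probed again on the next loop.
    scopes[0] = Scope { pending: 1 };
    flags[0] = false;
    println!("extract calls after reset: {}", extract_all(&mut scopes, &mut flags));
}
```

In this toy run the already-exhausted scope is probed once and then skipped on every later pass, which is the saving the PR targets.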

Benchmark (188 cases, release, same machine)

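| Stage                              | Total   | Mean   | vs rez |
|------------------------------------|--------:|-------:|-------:|
| Baseline (post-#71), median        | ~13.5 s | ~72 ms |  ~28×  |
| + this change, run 1               |  13.1 s |  69 ms |  29.3× |
| + this change, run 2               |  12.4 s |  66 ms |  30.8× |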

Modest: ~5 % typical. The gain shrank relative to the prediction because #71 had already collapsed the `extract` body to a length compare, so far less function-call work is left to skip than there would have been on the pre-#71 codebase. Honest data point: cascading wins show diminishing returns when each one removes the slack the next was relying on.

Cumulative from main: 43.0 s → 12.4 s, 8.8× → 30.8× rez.
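
As a quick sanity check on the totals quoted here, the per-case mean and the percentage reductions can be recomputed from the stated wall-clock figures. The helper names `mean_ms` and `reduction_pct` are illustrative only; the inputs are the totals from this description and the 188-case count.

```rust
// Percentage reduction in wall-clock time going from `before_s` to `after_s`.
fn reduction_pct(before_s: f64, after_s: f64) -> f64 {
    (1.0 - after_s / before_s) * 100.0
}

// Mean time per benchmark case, in milliseconds.
fn mean_ms(total_s: f64, cases: u32) -> f64 {
    total_s * 1000.0 / f64::from(cases)
}

fn main() {
    let cases = 188;
    println!("baseline mean:      {:.0} ms", mean_ms(13.5, cases));
    println!("run 1 mean:         {:.0} ms", mean_ms(13.1, cases));
    println!("run 2 mean:         {:.0} ms", mean_ms(12.4, cases));
    println!("run 1 vs baseline:  {:.1} %", reduction_pct(13.5, 13.1));
    println!("run 2 vs baseline:  {:.1} %", reduction_pct(13.5, 12.4));
    println!("cumulative (run 2): {:.1} %", reduction_pct(43.0, 12.4));
}
```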

Correctness

  • `cargo build` — clean.
  • `cargo test` — passes.
  • `cargo test --release -p rer-resolver --test test_rez_benchmark -- --ignored` — 188/188 still match rez 1:1, 20.7 s.

The correctness invariant: `non_extractable[i] == true` ⇒ the latest `scope[i]` was just observed to have no common dependencies left. Any operation that would expand or rebuild the variant slice resets the flag.

Base

This PR targets `extractable-counter` (#71). When #71 merges, GitHub will retarget this to `main`.

🤖 Generated with Claude Code

doubleailes and others added 2 commits May 15, 2026 17:05
`PackageVariantSlice::extractable` was a `HashSet::is_subset` call
that iterated `common_fams` and tested membership in `extracted_fams`
on every probe. Fresh callgrind on the rez 188-case benchmark (after
#66/#67/#68/#70) put `HashSet::is_subset` at **11.8 %** of inclusive
cycles — nearly half of `PackageVariantSlice::extract`'s total
25.3 %. Every `extract()` call hits this guard; the vast majority
return early with "nothing left to extract."

The set operation is unnecessary. `extracted_fams` is only ever
populated by inserting an element of `common_fams.difference(
extracted_fams)` in `extract`, and `copy_with_entries` resets it to
empty. So `extracted_fams ⊆ common_fams` always holds, and under
that invariant the `is_subset` check is equivalent to a length
compare:

    common_fams.is_subset(extracted_fams)
      ⟺ common_fams ⊆ extracted_fams ⟺ (since extracted ⊆ common)
        common_fams == extracted_fams ⟺ |common| == |extracted|

Replace the body with `common_fams().len() > extracted_fams.len()`.
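
A small sketch of the equivalence, assuming the original guard was the negation of `is_subset`. The free functions and `&str` family names here are hypothetical; the real code works on fields of `PackageVariantSlice`.

```rust
use std::collections::HashSet;

// Subset-style guard: something is extractable unless every common
// family has already been extracted.
fn extractable_subset(common: &HashSet<&str>, extracted: &HashSet<&str>) -> bool {
    !common.is_subset(extracted)
}

// Length-compare guard. Only valid while `extracted ⊆ common` holds.
fn extractable_len(common: &HashSet<&str>, extracted: &HashSet<&str>) -> bool {
    common.len() > extracted.len()
}

fn main() {
    let common: HashSet<&str> = ["python", "numpy", "six"].iter().copied().collect();
    let mut extracted: HashSet<&str> = HashSet::new();

    // Only elements of `common` are ever inserted, so the invariant is
    // maintained and both guards agree at every step.
    for fam in ["numpy", "six", "python"] {
        assert_eq!(
            extractable_subset(&common, &extracted),
            extractable_len(&common, &extracted)
        );
        extracted.insert(fam);
    }

    // Everything has been extracted: both guards report "nothing left".
    assert!(!extractable_subset(&common, &extracted));
    assert!(!extractable_len(&common, &extracted));
    println!("guards agree under the invariant");
}
```

The length compare would give wrong answers if `extracted` could ever contain a family outside `common`, which is why the invariant matters.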

## Benchmark (188 cases, release, same machine, two runs)

| Stage                                | Total   | Mean   | vs rez |
|--------------------------------------|--------:|-------:|-------:|
| Baseline (main, post-#70)            |  18.6 s |  99 ms |  20.5× |
| + this change, run 1                 |  13.1 s |  70 ms |  29.1× |
| + this change, run 2                 |  13.9 s |  74 ms |  27.4× |

**-30 % on top of #70**, **-69 % cumulative from main** (43.0 s →
13.1 s, 8.8× rez → 29.1× rez).

Differential test got the same lift: 188/188 still match rez 1:1,
in 20.77 s (down from 27.67 s).

Predicted gain was 5–10 %. Like #68, hidden downstream costs (the
`is_subset` iterator setup/teardown, the hash lookups it performed,
and the now-unnecessary `common_fams_cache` first-time computation
on slices that never need extraction) made the actual gain larger.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After #71 made `PackageVariantSlice::extractable` O(1), every `extract`
call's no-op early-return is nearly free — but the function-call
round-trip itself is still paid on every probe (tens of millions over
the benchmark). Each iteration of the inner extract loop in
`ResolvePhase::solve` walks every scope and calls `extract`, even on
scopes that have already exhausted their common dependencies.

A scope's extractability only changes when its entries change. Track
a per-scope flag and skip the call once `extract` has returned `None`,
until intersect / reduce / widen replaces the scope (in which case
the flag resets to "might be extractable again"):

- After `extract` returns `None` for `scope[i]`: set `non_extractable[i]`.
- `ScopeIntersect::Narrowed`: clear it.
- `ScopeReduce::Reduced`: clear it.
- New scope added: pushed as `false`.

Correctness invariant: `non_extractable[i] == true` ⇒ the latest
`scope[i]` was just observed to have no common dependencies left. Any
operation that would expand or rebuild the variant slice resets the
flag.

## Benchmark (188 cases, release, same machine)

| Stage                              | Total   | Mean   | vs rez |
|------------------------------------|--------:|-------:|-------:|
| Baseline (post-#71), median        | ~13.5 s | ~72 ms |  ~28×  |
| + this change, run 1               |  13.1 s |  69 ms |  29.3× |
| + this change, run 2               |  12.4 s |  66 ms |  30.8× |

Modest, ~5 % typical. Predicted 2–4 %. The gain shrunk because #71
already collapsed the `extract` body to a length compare — there is
much less function-call work left to skip than there would have been
on the pre-#71 codebase.

Cumulative from main: 43.0 s → 12.4 s, 8.8× → 30.8× rez.

188/188 differential still matches rez 1:1 (`cargo test … --ignored`,
20.7 s).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Base automatically changed from extractable-counter to main May 15, 2026 15:14
@doubleailes doubleailes merged commit b79e0a6 into main May 15, 2026
24 checks passed
@doubleailes doubleailes deleted the skip-extractable-scopes branch May 15, 2026 15:20
doubleailes added a commit that referenced this pull request May 15, 2026
Callgrind on rez's 188-case benchmark (post-#71/#72) showed
`SmallVec::extend` + `Drop` at ~4 % of cycles, almost entirely from
`VersionRange::clone`. Every `Requirement::clone()` (in
`extracted_request.clone()`, the per-pair `package_request.clone()`
in `reduce_by`, the `req.clone()` and `package_request.clone()` in
`Reduction`, etc.) deep-copies the inner `Ranges`'s `SmallVec` of
`(Bound, Bound)` segments. After the rest of the perf stack
(#66/#67/#68/#70/#71/#72), this is the largest non-amortised
allocation cost left.

Switch the inner from `Ranges<RerVersion>` to `Rc<Ranges<RerVersion>>`.
`Rc<T>::clone` is a refcount bump; `Rc<T>::Hash`/`Eq` defer to the
inner `T`, so the derived semantics on `VersionRange` are unchanged.

Methods that build a new range (`intersection`, `union`, `complement`,
`from_versions`, `span`, `split`, ...) still produce a fresh `Ranges`
internally and wrap it with `Rc::new` — the win is on the read /
clone path, not the construction path.

`as_ranges()` still returns `&Ranges` (via `Rc::deref`). `into_ranges`
now uses `Rc::unwrap_or_clone` — falls back to a clone if the `Rc` is
shared, but is the consume-the-`VersionRange` API and rare in
practice.
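
A minimal sketch of the wrapping described above, under simplifying assumptions: `Ranges` here is a hypothetical stand-in holding a `Vec` of integer pairs (the real type is `Ranges<RerVersion>` backed by a `SmallVec`), and `hash_of` is a helper added for the demonstration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

// Hypothetical stand-in for the real `Ranges<RerVersion>`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Ranges(Vec<(u32, u32)>);

// Simplified `VersionRange` that shares its payload through an `Rc`.
// The derived `PartialEq`/`Eq`/`Hash` go through `Rc` to the inner value.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct VersionRange {
    inner: Rc<Ranges>,
}

impl VersionRange {
    fn new(ranges: Ranges) -> Self {
        VersionRange { inner: Rc::new(ranges) }
    }

    // Borrow the payload; `&Rc<Ranges>` derefs to `&Ranges`.
    fn as_ranges(&self) -> &Ranges {
        &self.inner
    }

    // Moves the payload out when uniquely owned, clones it when shared.
    fn into_ranges(self) -> Ranges {
        Rc::unwrap_or_clone(self.inner)
    }
}

// Demonstration helper: hash any value with the std default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = VersionRange::new(Ranges(vec![(1, 2), (4, 7)]));
    let b = a.clone(); // refcount bump, the segments are not copied
    assert!(Rc::ptr_eq(&a.inner, &b.inner));

    // A separately built range with the same contents still compares
    // and hashes equal, because both defer to the inner `Ranges`.
    let c = VersionRange::new(Ranges(vec![(1, 2), (4, 7)]));
    assert_eq!(a, c);
    assert_eq!(hash_of(&a), hash_of(&c));

    println!("{:?}", a.as_ranges());
    println!("{:?}", b.into_ranges()); // shared with `a`, so this clones
}
```

`Rc` is not `Send`, so this sketch only fits single-threaded use. Whether that constraint matters for the resolver is not stated in the commit message.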

## Benchmark (188 cases, release, same machine, two runs)

| Stage                              | Total   | Mean   | vs rez |
|------------------------------------|--------:|-------:|-------:|
| Baseline (post-#71/#72), median    | ~12.7 s |  68 ms | ~30×   |
| + this change, run 1               |  11.2 s |  60 ms |  34.1× |
| + this change, run 2               |  11.3 s |  60 ms |  33.7× |

**~11 % on top of #72**, **~74 % cumulative from main** (43.0 s →
11.2 s, 8.8× rez → 34.1× rez).

Differential test (`cargo test … --ignored`): 17.73 s, **188/188 still
match rez 1:1**.

Predicted 3–5 %. The slightly bigger gain reflects that
`VersionRange::clone` cascades into a lot more than just the
`SmallVec::extend` it was attributed to in the callgrind exclusive
view — it also drove allocator-side work and the matching `Drop`s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>