perf(resolver): track per-scope `extractable` state at phase level by doubleailes · Pull Request #72 · doubleailes/rer

doubleailes · 2026-05-15T15:13:04Z

Summary

Stacks on #71. After #71 made `PackageVariantSlice::extractable` O(1), every `extract` no-op early-return is nearly free — but the function-call round-trip itself is still paid on every probe. Each iteration of the inner extract loop in `ResolvePhase::solve` walks every scope and calls `extract`, even on scopes that have already exhausted their common dependencies.

A scope's extractability only changes when its entries change. Track a per-scope `non_extractable` flag and skip the `extract()` call once it has returned `None`, until intersect / reduce / widen replaces the scope:

After `extract` returns `None` for `scope[i]` → set `non_extractable[i]`.
`ScopeIntersect::Narrowed` → clear it.
`ScopeReduce::Reduced` → clear it.
New scope pushed → `non_extractable.push(false)`.
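The flag rules above can be sketched in a few lines. This is a minimal illustration, not the code from the PR: `Scope`, `extract_pass` and `replace_scope` are hypothetical stand-ins, and the real `ResolvePhase::solve` loop carries much more state.

```rust
/// Hypothetical stand-in for a resolver scope; not the real rer type.
struct Scope {
    /// Common dependencies that have not been extracted yet.
    pending: Vec<&'static str>,
    /// Counts how often `extract` is actually invoked.
    extract_calls: usize,
}

impl Scope {
    /// Returns the next common dependency, or `None` once exhausted.
    fn extract(&mut self) -> Option<&'static str> {
        self.extract_calls += 1;
        self.pending.pop()
    }
}

/// One pass of the inner loop: probe only scopes not flagged as exhausted.
fn extract_pass(scopes: &mut [Scope], non_extractable: &mut [bool]) -> Vec<&'static str> {
    let mut extracted = Vec::new();
    for (scope, flag) in scopes.iter_mut().zip(non_extractable.iter_mut()) {
        if *flag {
            continue; // known to be exhausted: skip the call entirely
        }
        match scope.extract() {
            Some(dep) => extracted.push(dep),
            None => *flag = true, // remember until the scope is replaced
        }
    }
    extracted
}

/// Replacing a scope's entries (as a narrow or reduce would) clears its flag.
fn replace_scope(
    scopes: &mut [Scope],
    non_extractable: &mut [bool],
    i: usize,
    pending: Vec<&'static str>,
) {
    scopes[i].pending = pending;
    non_extractable[i] = false;
}

fn main() {
    let mut scopes = vec![
        Scope { pending: vec!["python"], extract_calls: 0 },
        Scope { pending: vec![], extract_calls: 0 },
    ];
    let mut non_extractable = vec![false; scopes.len()];

    for _ in 0..4 {
        extract_pass(&mut scopes, &mut non_extractable);
    }
    // Each scope is probed until it first returns `None`, then skipped.
    assert_eq!(scopes[0].extract_calls, 2);
    assert_eq!(scopes[1].extract_calls, 1);

    // A rebuilt scope becomes eligible again.
    replace_scope(&mut scopes, &mut non_extractable, 1, vec!["numpy"]);
    assert_eq!(extract_pass(&mut scopes, &mut non_extractable), vec!["numpy"]);
}
```

In this toy model, four passes over two scopes cost three `extract` calls instead of eight; the saving grows with the number of passes made between scope replacements.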

Benchmark (188 cases, release, same machine)

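| Stage                       |   Total |   Mean | vs rez |
|-----------------------------|--------:|-------:|-------:|
| Baseline (post-#71), median | ~13.5 s | ~87 ms |   ~28× |
| + this change, run 1        |  13.1 s |  69 ms |  29.3× |
| + this change, run 2        |  12.4 s |  66 ms |  30.8× |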

Modest: ~5 % in a typical run. The gain is small because #71 already collapsed the `extract` body to a length compare, so there is much less function-call work left to skip than there would have been on the pre-#71 codebase. Honest data point: cascading wins show diminishing returns when each one removes the slack the next was relying on.

Cumulative from main: 43.0 s → 12.4 s, 8.8× → 30.8× rez.
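As a rough cross-check, the relative gains can be recomputed from the totals quoted above. The ~13.5 s baseline is itself a median, so the per-run figures are approximate:

$$
\frac{13.5 - 13.1}{13.5} \approx 3.0\,\%, \qquad \frac{13.5 - 12.4}{13.5} \approx 8.1\,\%
$$

$$
\frac{43.0 - 12.4}{43.0} \approx 71\,\%, \qquad \frac{43.0}{12.4} \approx 3.5\times, \qquad \frac{30.8}{8.8} = 3.5\times
$$

The two runs bracket the ~5 % typical figure, and the cumulative wall-clock speedup agrees with the change in the rez multiple.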

Correctness

`cargo build` — clean.
`cargo test` — passes.
`cargo test --release -p rer-resolver --test test_rez_benchmark -- --ignored` — 188/188 still match rez 1:1, 20.7 s.

The correctness invariant: `non_extractable[i] == true` ⇒ the latest `scope[i]` was just observed to have no common dependencies left. Any operation that would expand or rebuild the variant slice resets the flag.

Base

This PR targets `extractable-counter` (#71). When #71 merges, GitHub will retarget this to `main`.

🤖 Generated with Claude Code

`PackageVariantSlice::extractable` was a `HashSet::is_subset` call that iterated `common_fams` and tested membership in `extracted_fams` on every probe. Fresh callgrind on the rez 188-case benchmark (after #66/#67/#68/#70) put `HashSet::is_subset` at **11.8 %** of inclusive cycles, nearly half of `PackageVariantSlice::extract`'s total 25.3 %. Every `extract()` call hits this guard; the vast majority return early with "nothing left to extract."

The set operation is unnecessary. `extracted_fams` is only ever populated by inserting an element of `common_fams.difference(extracted_fams)` in `extract`, and `copy_with_entries` resets it to empty. So `extracted_fams ⊆ common_fams` always holds, and under that invariant the `is_subset` check is equivalent to a length compare:

    common_fams.is_subset(extracted_fams)
      ⟺ common_fams ⊆ extracted_fams
      ⟺ (since extracted ⊆ common) common_fams == extracted_fams
      ⟺ |common| == |extracted|

Replace the body with `common_fams().len() > extracted_fams.len()`.

## Benchmark (188 cases, release, same machine, two runs)

| Stage                     |  Total |  Mean | vs rez |
|---------------------------|-------:|------:|-------:|
| Baseline (main, post-#70) | 18.6 s | 99 ms |  20.5× |
| + this change, run 1      | 13.1 s | 70 ms |  29.1× |
| + this change, run 2      | 13.9 s | 74 ms |  27.4× |

**-30 % on top of #70**, **-69 % cumulative from main** (43.0 s → 13.1 s, 8.8× rez → 29.1× rez).

Differential test got the same lift: 188/188 still match rez 1:1, in 20.77 s (down from 27.67 s).

Predicted gain was 5–10 %. Like #68, hidden downstream costs (the `is_subset` iterator setup/teardown, the hash lookups it performed, and the now-unnecessary `common_fams_cache` first-time computation on slices that never need extraction) made the actual gain larger.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
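The equivalence between the subset check and the length compare can be checked on a toy model. The sketch below is an assumption-laden miniature, not the real `PackageVariantSlice`: the `Slice` struct and the two `extractable_*` method names are hypothetical, and only the two field names come from the commit message.

```rust
use std::collections::HashSet;

/// Hypothetical miniature of the slice state; the real type holds more.
struct Slice {
    common_fams: HashSet<&'static str>,
    extracted_fams: HashSet<&'static str>,
}

impl Slice {
    /// Shape of the original guard: a set comparison on every probe.
    fn extractable_subset(&self) -> bool {
        !self.common_fams.is_subset(&self.extracted_fams)
    }

    /// Replacement guard: valid only while `extracted_fams ⊆ common_fams`.
    fn extractable_len(&self) -> bool {
        self.common_fams.len() > self.extracted_fams.len()
    }

    /// Extraction only inserts a family taken from `common \ extracted`,
    /// which is what preserves the subset invariant.
    fn extract(&mut self) -> Option<&'static str> {
        let next = self
            .common_fams
            .difference(&self.extracted_fams)
            .next()
            .copied()?;
        self.extracted_fams.insert(next);
        Some(next)
    }
}

fn main() {
    let mut slice = Slice {
        common_fams: HashSet::from(["python", "numpy", "maya"]),
        extracted_fams: HashSet::new(),
    };
    loop {
        // Both guards agree at every step of the extraction sequence.
        assert_eq!(slice.extractable_subset(), slice.extractable_len());
        if slice.extract().is_none() {
            break;
        }
    }
    assert!(!slice.extractable_len());
}
```

The length compare is only safe because nothing else writes to `extracted_fams`; if another code path inserted a family outside `common_fams`, the two guards could disagree.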

After #71 made `PackageVariantSlice::extractable` O(1), every `extract` call's no-op early-return is nearly free, but the function-call round-trip itself is still paid on every probe (tens of millions over the benchmark). Each iteration of the inner extract loop in `ResolvePhase::solve` walks every scope and calls `extract`, even on scopes that have already exhausted their common dependencies.

A scope's extractability only changes when its entries change. Track a per-scope flag and skip the call once `extract` has returned `None`, until intersect / reduce / widen replaces the scope (in which case the flag resets to "might be extractable again"):

- After `extract` returns `None` for `scope[i]`: set `non_extractable[i]`.
- `ScopeIntersect::Narrowed`: clear it.
- `ScopeReduce::Reduced`: clear it.
- New scope added: pushed as `false`.

Correctness invariant: `non_extractable[i] == true` ⇒ the latest `scope[i]` was just observed to have no common dependencies left. Any operation that would expand or rebuild the variant slice resets the flag.

## Benchmark (188 cases, release, same machine)

| Stage                       |   Total |  Mean | vs rez |
|-----------------------------|--------:|------:|-------:|
| Baseline (post-#71), median | ~13.5 s | 87 ms |   ~28× |
| + this change, run 1        |  13.1 s | 69 ms |  29.3× |
| + this change, run 2        |  12.4 s | 66 ms |  30.8× |

Modest, ~5 % typical. Predicted 2–4 %. The gain shrunk because #71 already collapsed the `extract` body to a length compare, so there is much less function-call work left to skip than there would have been on the pre-#71 codebase.

Cumulative from main: 43.0 s → 12.4 s, 8.8× → 30.8× rez.

188/188 differential still matches rez 1:1 (`cargo test … --ignored`, 20.7 s).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

qodo-code-review · 2026-05-15T15:13:09Z

Qodo reviews are paused for this user.


Callgrind on rez's 188-case benchmark (post-#71/#72) showed `SmallVec::extend` + `Drop` at ~4 % of cycles, almost entirely from `VersionRange::clone`. Every `Requirement::clone()` (in `extracted_request.clone()`, the per-pair `package_request.clone()` in `reduce_by`, the `req.clone()` and `package_request.clone()` in `Reduction`, etc.) deep-copies the inner `Ranges`'s `SmallVec` of `(Bound, Bound)` segments. After the rest of the perf stack (#66/#67/#68/#70/#71/#72), this is the largest non-amortised allocation cost left.

Switch the inner from `Ranges<RerVersion>` to `Rc<Ranges<RerVersion>>`. `Rc<T>::clone` is a refcount bump; `Rc<T>::Hash`/`Eq` defer to the inner `T`, so the derived semantics on `VersionRange` are unchanged.

Methods that build a new range (`intersection`, `union`, `complement`, `from_versions`, `span`, `split`, ...) still produce a fresh `Ranges` internally and wrap it with `Rc::new`; the win is on the read / clone path, not the construction path.

`as_ranges()` still returns `&Ranges` (via `Rc::deref`). `into_ranges` now uses `Rc::unwrap_or_clone`, which falls back to a clone if the `Rc` is shared, but it is the consume-the-`VersionRange` API and rare in practice.

## Benchmark (188 cases, release, same machine, two runs)

| Stage                           |   Total |  Mean | vs rez |
|---------------------------------|--------:|------:|-------:|
| Baseline (post-#71/#72), median | ~12.7 s | 68 ms |   ~30× |
| + this change, run 1            |  11.2 s | 60 ms |  34.1× |
| + this change, run 2            |  11.3 s | 60 ms |  33.7× |

**~11 % on top of #72**, **~74 % cumulative from main** (43.0 s → 11.2 s, 8.8× rez → 34.1× rez).

Differential test (`cargo test … --ignored`): 17.73 s, **188/188 still match rez 1:1**.

Predicted 3–5 %. The slightly bigger gain reflects that `VersionRange::clone` cascades into a lot more than just the `SmallVec::extend` it was attributed to in the callgrind exclusive view: it also drove allocator-side work and the matching `Drop`s.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
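A minimal sketch of the `Rc` wrapping described in the #73 message, under stated assumptions: `Ranges` here is a hypothetical stand-in holding a plain `Vec` of `(u32, u32)` segments rather than the real `Ranges<RerVersion>`, and the `inner` field name is invented for illustration. `Rc::unwrap_or_clone` needs Rust 1.76 or newer.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for `Ranges<RerVersion>`: a list of (lo, hi) segments.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Ranges(Vec<(u32, u32)>);

/// Sketch of the wrapper. The derives still compile because `Rc<T>`
/// forwards `PartialEq`, `Eq` and `Hash` to the inner `T`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct VersionRange {
    inner: Rc<Ranges>,
}

impl VersionRange {
    fn new(segments: Vec<(u32, u32)>) -> Self {
        Self { inner: Rc::new(Ranges(segments)) }
    }

    /// Borrowing access goes through `Deref` on the `Rc`.
    fn as_ranges(&self) -> &Ranges {
        &self.inner
    }

    /// Consuming access: moves the value out if unique, clones if shared.
    fn into_ranges(self) -> Ranges {
        Rc::unwrap_or_clone(self.inner)
    }
}

fn main() {
    let a = VersionRange::new(vec![(1, 3), (5, 9)]);
    let b = a.clone(); // refcount bump, no copy of the segment list
    assert!(Rc::ptr_eq(&a.inner, &b.inner));
    assert_eq!(Rc::strong_count(&a.inner), 2);
    assert_eq!(a, b); // equality compares the inner ranges

    let owned = b.into_ranges(); // still shared with `a`, so this clones
    assert_eq!(&owned, a.as_ranges());

    let moved = a.into_ranges(); // now unique, so this moves
    assert_eq!(moved, owned);
}
```

`Rc` is single-threaded; a resolver that shared `VersionRange` values across threads would need `Arc` instead, at the cost of atomic refcount updates.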

doubleailes and others added 2 commits May 15, 2026 17:05

Base automatically changed from extractable-counter to main May 15, 2026 15:14

doubleailes merged commit b79e0a6 into main May 15, 2026
24 checks passed

doubleailes deleted the skip-extractable-scopes branch May 15, 2026 15:20

doubleailes mentioned this pull request May 15, 2026

perf(version): wrap Ranges in Rc inside VersionRange #73

Merged

3 tasks

doubleailes mentioned this pull request May 15, 2026

docs(readme): add local apples-to-apples rez 3.3.0 benchmark #74

Merged

3 tasks
