perf(resolver): intern package family names as `Rc<str>` by doubleailes · Pull Request #70 · doubleailes/rer

doubleailes · 2026-05-15T14:52:47Z

Summary

Callgrind on the rez 188-case benchmark put `String::clone` at ~2.7 % of cycles and hashbrown ops at ~10 %, much of it for the package-family-name keys that flow through every `Requirement`, `PackageScope`, `PackageVariant`, `PackageVariantSlice` and the various `FxHashSet` / `FxHashMap<String, V>` containers on the solver's hot path.

This PR introduces a `Name` type alias (`Rc`) and uses it for those fields. The clones the solver makes constantly during a solve — `Requirement::clone()`, `PackageScope` clones on every narrowing, `extracted_fams` / `request_fams` / `fam_requires` / `common_fams` set ops — become refcount bumps instead of heap allocations.

What changed

New `pub type Name = Rc` in `rez_solver::mod`.
Fields converted (all on hot paths):
- `Requirement::name`
- `RequirementList::{by_name, names, conflict_names}`
- `PackageVariant::{name, request_fams, conflict_request_fams}`
- `PackageVariantList::package_name`
- `PackageVariantSlice::{package_name, extracted_fams, common_fams_cache, fam_requires_cache}`
- `PackageScope::package_name`
- `PackageVariantCache::variant_lists` (cache key)
- `ResolvePhase::extractions`, the per-call `req_fams`
- `VariantKey::additional_key`
`PackageVariantCache` now uses a `get_or_build` helper with a two-step lookup so a cache hit doesn't allocate a fresh `Name` key.
The public `PackageRepo = HashMap<String, …>` type is unchanged — it is the JSON shape callers hand in. Conversion to `Name` happens at the boundary (in `PackageVariantList::new`, `PackageVariantCache`, `PackageScope::new`).

`Rc` `Borrow`s, so `HashMap<Name, V>::get(&str)` and `HashSet::contains(&str)` still work transparently — no caller-side ergonomic regression.

Benchmark (188 cases, release, same machine)

Stage	Total	Mean	vs rez
Baseline (main, post-#66/#67/#68)	22.1 s	117 ms	17.4×
+ interned names, run 1	18.6 s	99 ms	20.6×
+ interned names, run 2	18.6 s	99 ms	20.5×

-15.5 % additive, -57 % cumulative from main (43.0 s → 18.6 s, 8.8× → 20.5× rez).

Differential test got the same lift: ~38 s → ~28 s.

Correctness

`cargo build` — clean.
`cargo test` — passes (one assertion in `RequirementList` tests updated to `Name::from("foo")` since `Rc` doesn't `PartialEq` against `String`).
`cargo test --release -p rer-resolver --test test_rez_benchmark -- --ignored` — 188/188 still match rez 1:1.

🤖 Generated with Claude Code

Callgrind on the rez 188-case benchmark put `String::clone` at ~2.7 % of cycles and hashbrown ops at ~10 %, much of it for the package- family-name keys that flow through every `Requirement`, `PackageScope`, `PackageVariant`, `PackageVariantSlice` and the various `FxHashSet<String>` / `FxHashMap<String, V>` containers in the solver hot path. Replace `String` with a new `Name` type alias (`Rc<str>`) for those fields. The clones the solver makes constantly during a solve — `Requirement::clone()` for the per-call `extracted_request.clone()`, `req.clone()` and `package_request.clone()` flow, `PackageScope` clones on every narrowing, `extracted_fams` / `request_fams` / `fam_requires` / `common_fams` set ops — become refcount bumps instead of heap allocations. `Rc<str>` `Borrow<str>`s, so `HashMap<Name, V>::get(&str)` and `HashSet<Name>::contains(&str)` still work transparently. The repo input type stays as `HashMap<String, …>` (it is the JSON shape callers hand in); conversion to `Name` happens at the boundary in `PackageVariantList::new`, `PackageVariantCache`, and `PackageScope::new`. `PackageVariantCache` now uses `get_or_build` with a two-step lookup to avoid allocating a fresh `Name` on cache hits. ## Benchmark (188 cases, release, same machine, two runs) | Stage | Total | Mean | vs rez | |------------------------------------|--------:|-------:|-------:| | Baseline (main, post-#68) | 22.1 s | 117 ms | 17.4× | | + interned names, run 1 | 18.6 s | 99 ms | 20.6× | | + interned names, run 2 | 18.6 s | 99 ms | 20.5× | **-15.5 % on top of #68, -57 % cumulative from main** (43.0 s → 18.6 s, 8.8× rez → 20.5× rez). ## Correctness 188/188 still match rez 1:1 (`cargo test --release -p rer-resolver --test test_rez_benchmark -- --ignored`). The differential test itself runs faster — ~38 s → ~28 s — same effect on its workload. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

qodo-code-review · 2026-05-15T14:52:51Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

`PackageVariantSlice::extractable` was a `HashSet::is_subset` call that iterated `common_fams` and tested membership in `extracted_fams` on every probe. Fresh callgrind on the rez 188-case benchmark (after #66/#67/#68/#70) put `HashSet::is_subset` at **11.8 %** of inclusive cycles — nearly half of `PackageVariantSlice::extract`'s total 25.3 %. Every `extract()` call hits this guard; the vast majority return early with "nothing left to extract." The set operation is unnecessary. `extracted_fams` is only ever populated by inserting an element of `common_fams.difference( extracted_fams)` in `extract`, and `copy_with_entries` resets it to empty. So `extracted_fams ⊆ common_fams` always holds, and under that invariant the `is_subset` check is equivalent to a length compare: common_fams.is_subset(extracted_fams) ⟺ common_fams ⊆ extracted_fams ⟺ (since extracted ⊆ common) common_fams == extracted_fams ⟺ |common| == |extracted| Replace the body with `common_fams().len() > extracted_fams.len()`. ## Benchmark (188 cases, release, same machine, two runs) | Stage | Total | Mean | vs rez | |--------------------------------------|--------:|-------:|-------:| | Baseline (main, post-#70) | 18.6 s | 99 ms | 20.5× | | + this change, run 1 | 13.1 s | 70 ms | 29.1× | | + this change, run 2 | 13.9 s | 74 ms | 27.4× | **-30 % on top of #70**, **-69 % cumulative from main** (43.0 s → 13.1 s, 8.8× rez → 29.1× rez). Differential test got the same lift: 188/188 still match rez 1:1, in 20.77 s (down from 27.67 s). Predicted gain was 5–10 %. Like #68, hidden downstream costs (the `is_subset` iterator setup/teardown, the hash lookups it performed, and the now-unnecessary `common_fams_cache` first-time computation on slices that never need extraction) made the actual gain larger. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Callgrind on rez's 188-case benchmark (post-#71/#72) showed `SmallVec::extend` + `Drop` at ~4 % of cycles, almost entirely from `VersionRange::clone`. Every `Requirement::clone()` (in `extracted_request.clone()`, the per-pair `package_request.clone()` in `reduce_by`, the `req.clone()` and `package_request.clone()` in `Reduction`, etc.) deep-copies the inner `Ranges`'s `SmallVec` of `(Bound, Bound)` segments. After the rest of the perf stack (#66/#67/#68/#70/#71/#72), this is the largest non-amortised allocation cost left. Switch the inner from `Ranges<RerVersion>` to `Rc<Ranges<RerVersion>>`. `Rc<T>::clone` is a refcount bump; `Rc<T>::Hash`/`Eq` defer to the inner `T`, so the derived semantics on `VersionRange` are unchanged. Methods that build a new range (`intersection`, `union`, `complement`, `from_versions`, `span`, `split`, ...) still produce a fresh `Ranges` internally and wrap it with `Rc::new` — the win is on the read / clone path, not the construction path. `as_ranges()` still returns `&Ranges` (via `Rc::deref`). `into_ranges` now uses `Rc::unwrap_or_clone` — falls back to a clone if the `Rc` is shared, but is the consume-the-`VersionRange` API and rare in practice. ## Benchmark (188 cases, release, same machine, two runs) | Stage | Total | Mean | vs rez | |------------------------------------|--------:|-------:|-------:| | Baseline (post-#71/#72), median | ~12.7 s | 68 ms | ~30× | | + this change, run 1 | 11.2 s | 60 ms | 34.1× | | + this change, run 2 | 11.3 s | 60 ms | 33.7× | **~11 % on top of #72**, **~74 % cumulative from main** (43.0 s → 11.2 s, 8.8× rez → 34.1× rez). Differential test (`cargo test … --ignored`): 17.73 s, **188/188 still match rez 1:1**. Predicted 3–5 %. The slightly bigger gain reflects that `VersionRange::clone` cascades into a lot more than just the `SmallVec::extend` it was attributed to in the callgrind exclusive view — it also drove allocator-side work and the matching `Drop`s. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

doubleailes merged commit c192a7d into main May 15, 2026
24 checks passed

doubleailes deleted the intern-names branch May 15, 2026 14:56

doubleailes mentioned this pull request May 15, 2026

perf(resolver): O(1) extractable() via length compare #71

Merged

3 tasks

doubleailes mentioned this pull request May 15, 2026

perf(version): wrap Ranges in Rc inside VersionRange #73

Merged

3 tasks

doubleailes mentioned this pull request May 15, 2026

docs(readme): add local apples-to-apples rez 3.3.0 benchmark #74

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(resolver): intern package family names as `Rc<str>`#70

perf(resolver): intern package family names as `Rc<str>`#70
doubleailes merged 1 commit into
mainfrom
intern-names

doubleailes commented May 15, 2026

Uh oh!

qodo-code-review Bot commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

doubleailes commented May 15, 2026

Summary

What changed

Benchmark (188 cases, release, same machine)

Correctness

Uh oh!

qodo-code-review Bot commented May 15, 2026

Qodo reviews are paused for this user.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant