Skip to content

perf(resolver): intern package family names as Rc<str>#70

Merged
doubleailes merged 1 commit into
mainfrom
intern-names
May 15, 2026
Merged

perf(resolver): intern package family names as Rc<str>#70
doubleailes merged 1 commit into
mainfrom
intern-names

Conversation

@doubleailes
Copy link
Copy Markdown
Owner

Summary

Callgrind on the rez 188-case benchmark put `String::clone` at ~2.7 % of cycles and hashbrown ops at ~10 %, much of it for the package-family-name keys that flow through every `Requirement`, `PackageScope`, `PackageVariant`, `PackageVariantSlice` and the various `FxHashSet` / `FxHashMap<String, V>` containers on the solver's hot path.

This PR introduces a `Name` type alias (`Rc`) and uses it for those fields. The clones the solver makes constantly during a solve — `Requirement::clone()`, `PackageScope` clones on every narrowing, `extracted_fams` / `request_fams` / `fam_requires` / `common_fams` set ops — become refcount bumps instead of heap allocations.

What changed

  • New `pub type Name = Rc` in `rez_solver::mod`.
  • Fields converted (all on hot paths):
    • `Requirement::name`
    • `RequirementList::{by_name, names, conflict_names}`
    • `PackageVariant::{name, request_fams, conflict_request_fams}`
    • `PackageVariantList::package_name`
    • `PackageVariantSlice::{package_name, extracted_fams, common_fams_cache, fam_requires_cache}`
    • `PackageScope::package_name`
    • `PackageVariantCache::variant_lists` (cache key)
    • `ResolvePhase::extractions`, the per-call `req_fams`
    • `VariantKey::additional_key`
  • `PackageVariantCache` now uses a `get_or_build` helper with a two-step lookup so a cache hit doesn't allocate a fresh `Name` key.
  • The public `PackageRepo = HashMap<String, …>` type is unchanged — it is the JSON shape callers hand in. Conversion to `Name` happens at the boundary (in `PackageVariantList::new`, `PackageVariantCache`, `PackageScope::new`).

`Rc` `Borrow`s, so `HashMap<Name, V>::get(&str)` and `HashSet::contains(&str)` still work transparently — no caller-side ergonomic regression.

Benchmark (188 cases, release, same machine)

Stage Total Mean vs rez
Baseline (main, post-#66/#67/#68) 22.1 s 117 ms 17.4×
+ interned names, run 1 18.6 s 99 ms 20.6×
+ interned names, run 2 18.6 s 99 ms 20.5×

-15.5 % additive, -57 % cumulative from main (43.0 s → 18.6 s, 8.8× → 20.5× rez).

Differential test got the same lift: ~38 s → ~28 s.

Correctness

  • `cargo build` — clean.
  • `cargo test` — passes (one assertion in `RequirementList` tests updated to `Name::from("foo")` since `Rc` doesn't `PartialEq` against `String`).
  • `cargo test --release -p rer-resolver --test test_rez_benchmark -- --ignored` — 188/188 still match rez 1:1.

🤖 Generated with Claude Code

Callgrind on the rez 188-case benchmark put `String::clone` at ~2.7 %
of cycles and hashbrown ops at ~10 %, much of it for the package-
family-name keys that flow through every `Requirement`, `PackageScope`,
`PackageVariant`, `PackageVariantSlice` and the various
`FxHashSet<String>` / `FxHashMap<String, V>` containers in the solver
hot path.

Replace `String` with a new `Name` type alias (`Rc<str>`) for those
fields. The clones the solver makes constantly during a solve —
`Requirement::clone()` for the per-call `extracted_request.clone()`,
`req.clone()` and `package_request.clone()` flow, `PackageScope`
clones on every narrowing, `extracted_fams` / `request_fams` /
`fam_requires` / `common_fams` set ops — become refcount bumps
instead of heap allocations. `Rc<str>` `Borrow<str>`s, so
`HashMap<Name, V>::get(&str)` and `HashSet<Name>::contains(&str)`
still work transparently.

The repo input type stays as `HashMap<String, …>` (it is the JSON
shape callers hand in); conversion to `Name` happens at the boundary
in `PackageVariantList::new`, `PackageVariantCache`, and
`PackageScope::new`.

`PackageVariantCache` now uses `get_or_build` with a two-step lookup
to avoid allocating a fresh `Name` on cache hits.

## Benchmark (188 cases, release, same machine, two runs)

| Stage                              | Total   | Mean   | vs rez |
|------------------------------------|--------:|-------:|-------:|
| Baseline (main, post-#68)          |  22.1 s | 117 ms |  17.4× |
| + interned names, run 1            |  18.6 s |  99 ms |  20.6× |
| + interned names, run 2            |  18.6 s |  99 ms |  20.5× |

**-15.5 % on top of #68, -57 % cumulative from main** (43.0 s → 18.6 s,
8.8× rez → 20.5× rez).

## Correctness

188/188 still match rez 1:1 (`cargo test --release -p rer-resolver
--test test_rez_benchmark -- --ignored`). The differential test
itself runs faster — ~38 s → ~28 s — same effect on its workload.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@doubleailes doubleailes merged commit c192a7d into main May 15, 2026
24 checks passed
@doubleailes doubleailes deleted the intern-names branch May 15, 2026 14:56
doubleailes added a commit that referenced this pull request May 15, 2026
`PackageVariantSlice::extractable` was a `HashSet::is_subset` call
that iterated `common_fams` and tested membership in `extracted_fams`
on every probe. Fresh callgrind on the rez 188-case benchmark (after
#66/#67/#68/#70) put `HashSet::is_subset` at **11.8 %** of inclusive
cycles — nearly half of `PackageVariantSlice::extract`'s total
25.3 %. Every `extract()` call hits this guard; the vast majority
return early with "nothing left to extract."

The set operation is unnecessary. `extracted_fams` is only ever
populated by inserting an element of `common_fams.difference(
extracted_fams)` in `extract`, and `copy_with_entries` resets it to
empty. So `extracted_fams ⊆ common_fams` always holds, and under
that invariant the `is_subset` check is equivalent to a length
compare:

    common_fams.is_subset(extracted_fams)
      ⟺ common_fams ⊆ extracted_fams ⟺ (since extracted ⊆ common)
        common_fams == extracted_fams ⟺ |common| == |extracted|

Replace the body with `common_fams().len() > extracted_fams.len()`.

## Benchmark (188 cases, release, same machine, two runs)

| Stage                                | Total   | Mean   | vs rez |
|--------------------------------------|--------:|-------:|-------:|
| Baseline (main, post-#70)            |  18.6 s |  99 ms |  20.5× |
| + this change, run 1                 |  13.1 s |  70 ms |  29.1× |
| + this change, run 2                 |  13.9 s |  74 ms |  27.4× |

**-30 % on top of #70**, **-69 % cumulative from main** (43.0 s →
13.1 s, 8.8× rez → 29.1× rez).

Differential test got the same lift: 188/188 still match rez 1:1,
in 20.77 s (down from 27.67 s).

Predicted gain was 5–10 %. Like #68, hidden downstream costs (the
`is_subset` iterator setup/teardown, the hash lookups it performed,
and the now-unnecessary `common_fams_cache` first-time computation
on slices that never need extraction) made the actual gain larger.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
doubleailes added a commit that referenced this pull request May 15, 2026
Callgrind on rez's 188-case benchmark (post-#71/#72) showed
`SmallVec::extend` + `Drop` at ~4 % of cycles, almost entirely from
`VersionRange::clone`. Every `Requirement::clone()` (in
`extracted_request.clone()`, the per-pair `package_request.clone()`
in `reduce_by`, the `req.clone()` and `package_request.clone()` in
`Reduction`, etc.) deep-copies the inner `Ranges`'s `SmallVec` of
`(Bound, Bound)` segments. After the rest of the perf stack
(#66/#67/#68/#70/#71/#72), this is the largest non-amortised
allocation cost left.

Switch the inner from `Ranges<RerVersion>` to `Rc<Ranges<RerVersion>>`.
`Rc<T>::clone` is a refcount bump; `Rc<T>::Hash`/`Eq` defer to the
inner `T`, so the derived semantics on `VersionRange` are unchanged.

Methods that build a new range (`intersection`, `union`, `complement`,
`from_versions`, `span`, `split`, ...) still produce a fresh `Ranges`
internally and wrap it with `Rc::new` — the win is on the read /
clone path, not the construction path.

`as_ranges()` still returns `&Ranges` (via `Rc::deref`). `into_ranges`
now uses `Rc::unwrap_or_clone` — falls back to a clone if the `Rc` is
shared, but is the consume-the-`VersionRange` API and rare in
practice.

## Benchmark (188 cases, release, same machine, two runs)

| Stage                              | Total   | Mean   | vs rez |
|------------------------------------|--------:|-------:|-------:|
| Baseline (post-#71/#72), median    | ~12.7 s |  68 ms | ~30×   |
| + this change, run 1               |  11.2 s |  60 ms |  34.1× |
| + this change, run 2               |  11.3 s |  60 ms |  33.7× |

**~11 % on top of #72**, **~74 % cumulative from main** (43.0 s →
11.2 s, 8.8× rez → 34.1× rez).

Differential test (`cargo test … --ignored`): 17.73 s, **188/188 still
match rez 1:1**.

Predicted 3–5 %. The slightly bigger gain reflects that
`VersionRange::clone` cascades into a lot more than just the
`SmallVec::extend` it was attributed to in the callgrind exclusive
view — it also drove allocator-side work and the matching `Drop`s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
doubleailes added a commit that referenced this pull request May 15, 2026
Callgrind on rez's 188-case benchmark (post-#71/#72) showed
`SmallVec::extend` + `Drop` at ~4 % of cycles, almost entirely from
`VersionRange::clone`. Every `Requirement::clone()` (in
`extracted_request.clone()`, the per-pair `package_request.clone()`
in `reduce_by`, the `req.clone()` and `package_request.clone()` in
`Reduction`, etc.) deep-copies the inner `Ranges`'s `SmallVec` of
`(Bound, Bound)` segments. After the rest of the perf stack
(#66/#67/#68/#70/#71/#72), this is the largest non-amortised
allocation cost left.

Switch the inner from `Ranges<RerVersion>` to `Rc<Ranges<RerVersion>>`.
`Rc<T>::clone` is a refcount bump; `Rc<T>::Hash`/`Eq` defer to the
inner `T`, so the derived semantics on `VersionRange` are unchanged.

Methods that build a new range (`intersection`, `union`, `complement`,
`from_versions`, `span`, `split`, ...) still produce a fresh `Ranges`
internally and wrap it with `Rc::new` — the win is on the read /
clone path, not the construction path.

`as_ranges()` still returns `&Ranges` (via `Rc::deref`). `into_ranges`
now uses `Rc::unwrap_or_clone` — falls back to a clone if the `Rc` is
shared, but is the consume-the-`VersionRange` API and rare in
practice.

## Benchmark (188 cases, release, same machine, two runs)

| Stage                              | Total   | Mean   | vs rez |
|------------------------------------|--------:|-------:|-------:|
| Baseline (post-#71/#72), median    | ~12.7 s |  68 ms | ~30×   |
| + this change, run 1               |  11.2 s |  60 ms |  34.1× |
| + this change, run 2               |  11.3 s |  60 ms |  33.7× |

**~11 % on top of #72**, **~74 % cumulative from main** (43.0 s →
11.2 s, 8.8× rez → 34.1× rez).

Differential test (`cargo test … --ignored`): 17.73 s, **188/188 still
match rez 1:1**.

Predicted 3–5 %. The slightly bigger gain reflects that
`VersionRange::clone` cascades into a lot more than just the
`SmallVec::extend` it was attributed to in the callgrind exclusive
view — it also drove allocator-side work and the matching `Drop`s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant