Refactor System.Linq.Lookup to use Dictionary for its hash table #128746

Draft
Copilot wants to merge 5 commits into main from copilot/unify-hashing-implementations

Conversation

Contributor

Copilot AI commented May 29, 2026

System.Linq.Lookup<TKey, TElement> (backing GroupBy, ToLookup, Join, GroupJoin) used a bespoke separate-chaining hash table that lacks Dictionary's robustness and degrades severely on adversarial hash codes. This change unifies the implementation so that LINQ shares Dictionary/HashSet hashing behavior and performance characteristics.

Changes

  • Lookup.cs: Replaced the custom hash-bucket array (Grouping[] _groupings + _hashNext chains + manual % prime bucketing + Resize() + InternalGetHashCode) with an internal Dictionary<TKey, Grouping<TKey, TElement>>. GetGrouping now uses CollectionsMarshal.GetValueRefOrAddDefault (create path) and TryGetValue (read path).
  • Null keys: Since Dictionary rejects null keys, they are tracked in a dedicated _nullKeyGrouping field rather than wrapped, and GetHashCode is never handed null. To avoid a breaking change, null keys are still routed through a custom comparer exactly as the old table did: null is treated as hashing to 0, so a null key continues to merge with a non-null key the comparer equates it with whenever that key's masked hash code is also 0, and stays in its own grouping otherwise. This null-routing is gated behind the custom-comparer case, so the default-comparer path keeps Dictionary's string collision-randomization untouched (the default comparer never equates null with a non-null key). The "custom comparer" gate (_customComparer) excludes a comparer that equals EqualityComparer<TKey>.Default, so passing the default comparer explicitly still takes the fast default-comparer path.
  • Ordering: Insertion-order enumeration is unchanged — the existing _next/_lastGrouping circular linked list is retained and continues to drive all iterators.
  • Grouping.cs: Dropped the now-unused _hashCode/_hashNext fields; constructor simplified to Grouping(TKey key).
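
As a rough illustration of the shape described in the list above, here is a minimal sketch of a Dictionary-backed lookup with a separate slot for the null key. MiniLookup, its members, and the List-based order tracking are all hypothetical stand-ins (the PR keeps the existing _next/_lastGrouping linked list instead), and the custom-comparer null routing is deliberately left out. The only BCL call it leans on is CollectionsMarshal.GetValueRefOrAddDefault, which needs .NET 6 or later.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical illustration type; NOT the actual System.Linq.Lookup implementation.
public sealed class MiniLookup<TKey, TElement>
{
#pragma warning disable CS8714 // TKey is unconstrained; null keys never reach the dictionary.
    private readonly Dictionary<TKey, List<TElement>> _groups;
#pragma warning restore CS8714
    // Dictionary rejects null keys, so the null key gets a dedicated slot.
    private List<TElement>? _nullKeyGroup;
    // Keys in first-seen order (assumption: stands in for the linked list in the PR).
    private readonly List<TKey> _keysInOrder = new();

    public MiniLookup(IEqualityComparer<TKey>? comparer = null)
    {
#pragma warning disable CS8714
        _groups = new Dictionary<TKey, List<TElement>>(comparer);
#pragma warning restore CS8714
    }

    public void Add(TKey key, TElement element)
    {
        if (key is null)
        {
            if (_nullKeyGroup is null)
            {
                _nullKeyGroup = new List<TElement>();
                _keysInOrder.Add(key);
            }
            _nullKeyGroup.Add(element);
            return;
        }

        // Create path: a single hash lookup either finds the slot or adds an empty one.
#pragma warning disable CS8714
        ref List<TElement>? slot = ref CollectionsMarshal.GetValueRefOrAddDefault(_groups, key, out bool exists);
#pragma warning restore CS8714
        if (!exists)
        {
            slot = new List<TElement>();
            _keysInOrder.Add(key);
        }
        slot!.Add(element);
    }

    // Read path: TryGetValue never inserts, so a miss leaves the table untouched.
    public IReadOnlyList<TElement> Get(TKey key)
    {
        if (key is null)
        {
            if (_nullKeyGroup is null) return Array.Empty<TElement>();
            return _nullKeyGroup;
        }
        if (_groups.TryGetValue(key, out List<TElement>? group)) return group;
        return Array.Empty<TElement>();
    }

    public IReadOnlyList<TKey> Keys => _keysInOrder;
}
```

Adding "a", null, "a", "b" in that order should yield three keys in first-seen order, with two elements under "a" and one under null.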

No public API changes.

Notes

  • The adversarial-hash-code Collision GroupBy case from the repro drops from ~22,400 ms to ~41 ms (N=200000), now matching Dictionary.
  • A degenerate user comparer that returns a constant hash code remains O(n²) — inherent to any hash table and identical to ToDictionary/Distinct, consistent with the unification goal.
  • Added regression tests covering null/custom-comparer merging (both insertion orderings) and the non-merge case when the null-equivalent key's hash code is non-zero, preserving the prior behavior. The full System.Linq test suite passes.
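
To make the constant-hash-code note concrete, the sketch below counts Equals calls while inserting n distinct keys into a plain Dictionary through a comparer whose GetHashCode always returns the same value. ConstantHashComparer and DegenerateComparerDemo are hypothetical names, and the sketch exercises Dictionary directly rather than the Lookup code in this PR; it only illustrates why every key lands in one chain, so insertion k has to be compared against the k keys already present.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical degenerate comparer: every key hashes to the same value.
public sealed class ConstantHashComparer : IEqualityComparer<int>
{
    public long EqualsCalls { get; private set; }

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    public int GetHashCode(int obj) => 42;
}

public static class DegenerateComparerDemo
{
    // Returns how many Equals calls were needed to insert n distinct keys.
    public static long CountEqualsCalls(int n)
    {
        var comparer = new ConstantHashComparer();
        var table = new Dictionary<int, int>(comparer);
        for (int i = 0; i < n; i++)
        {
            table[i] = i; // each insert walks the single shared collision chain
        }
        return comparer.EqualsCalls;
    }
}
```

Doubling n should roughly quadruple the count, which is the quadratic growth the note refers to.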

Copilot AI self-assigned this May 29, 2026
Copilot AI review requested due to automatic review settings May 29, 2026 08:04
Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot May 29, 2026 08:34
Copilot AI changed the title [WIP] Update LINQ hashing implementation to match Dictionary behavior Refactor System.Linq.Lookup to use Dictionary for its hash table May 29, 2026
Copilot AI requested a review from eiriktsarpalis May 29, 2026 08:36
@eiriktsarpalis
Member

Null keys: Since Dictionary rejects null keys, they are tracked in a dedicated _nullKeyGrouping field rather than wrapped, also preserving the prior behavior of never handing null to a comparer.

@copilot what about custom comparers that equate null with non-null keys?

Contributor Author

Copilot AI commented May 29, 2026

Null keys: Since Dictionary rejects null keys, they are tracked in a dedicated _nullKeyGrouping field rather than wrapped, also preserving the prior behavior of never handing null to a comparer.

...

Good question — there is a behavioral difference here, but only for that pathological case. Concretely, a comparer where Equals(null, X) == true:

  • Old behavior: null's hash was hardcoded to 0 (it was never routed through the comparer, to avoid NREs). A null key would only merge into a non-null key X's grouping when comparer.GetHashCode(X) & 0x7FFFFFFF == 0 and comparer.Equals(X, null) == true. So the merge was already an accident contingent on X happening to hash to 0; for any other hash it landed in a separate grouping.
  • New behavior: null always gets its own grouping (_nullKeyGrouping), so null and X are never merged.
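
The old-behavior condition in the list above can be restated as a small predicate. The sketch below is only an illustration: NullAsEmptyComparer (which treats null and "" as the same key and lets the caller pick the hash of "") and OldNullMergeRule are hypothetical names, not code from the PR or from the old table.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical comparer for illustration: treats null and "" as the same key.
public sealed class NullAsEmptyComparer : IEqualityComparer<string?>
{
    private readonly int _emptyHash;

    public NullAsEmptyComparer(int emptyHash) => _emptyHash = emptyHash;

    public bool Equals(string? x, string? y) =>
        string.Equals(x ?? "", y ?? "", StringComparison.Ordinal);

    // Never called with null in this sketch.
    public int GetHashCode(string obj) =>
        obj.Length == 0 ? _emptyHash : StringComparer.Ordinal.GetHashCode(obj);
}

public static class OldNullMergeRule
{
    // Restates the condition quoted above: the key's masked hash must be 0
    // (the value null was hardcoded to) AND the comparer must equate the key with null.
    public static bool MergesWithNullGrouping(IEqualityComparer<string?> comparer, string key) =>
        (comparer.GetHashCode(key) & 0x7FFFFFFF) == 0 && comparer.Equals(key, null);
}
```

With this comparer, "" merges with the null grouping when its hash is 0 (or int.MinValue, which also masks to 0) and stays separate when its hash is, say, 7, even though Equals("", null) is true in all three cases.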

The reason I didn't make this fully faithful is that the only way to route null through a custom comparer is to wrap every key (since Dictionary throws on null before calling the comparer). But wrapping forces a custom adapter comparer, and Dictionary's string collision-randomization fallback only engages when comparer is NonRandomizedStringEqualityComparer (string keys + default/ordinal comparer, see Dictionary.cs:629). Wrapping would therefore disable the exact robustness this change exists to enable.

It's also worth noting such a comparer can't satisfy the hashing contract anyway: IEqualityComparer<T>.GetHashCode is annotated [DisallowNull], so there's no defined hash for null to pair with GetHashCode(X).

Given that, I kept null special-cased. Happy to switch to the wrapping approach (accepting the loss of string randomization) if you'd prefer to preserve the old edge-case semantics.

@eiriktsarpalis
Member

there is a behavioral difference here, but only for that pathological case.

@copilot pathological or not, we don't ever knowingly introduce breaking changes into the BCL.

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot May 29, 2026 09:54
Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot May 29, 2026 09:57
Contributor Author

Copilot AI commented May 29, 2026

there is a behavioral difference here, but only for that pathological case.

@copilot pathological or not, we don't ever knowingly introduce breaking changes into the BCL.

Agreed — fixed in 6c01fb4. Null keys are now routed through a custom comparer exactly as the old bespoke table did: null is treated as hashing to 0 and compared via the comparer, so a null key still merges with a non-null key it's equated with whenever that key's (masked) hash code is also 0 — and stays separate otherwise. The default-comparer path is unaffected (and keeps Dictionary's string-collision randomization), since the default comparer never equates null with a non-null key. GetHashCode is still never called with null. Added regression tests covering the merge (both insertion orderings) and the non-merge-when-hash-is-nonzero case.

#pragma warning disable CS8714 // Nullability of type argument doesn't match 'notnull' constraint.
_groupings = new Dictionary<TKey, Grouping<TKey, TElement>>(comparer);
#pragma warning restore CS8714
_customComparer = comparer is not null;
Member

Even if specified, the comparer parameter might still be the default.

Suggested change
_customComparer = comparer is not null;
_customComparer = !Equals(_comparer, EqualityComparer<TKey>.Default);

Member

@copilot you did not address my feedback yet.

Contributor Author

Fixed in the latest commit: _customComparer = comparer is not null && !comparer.Equals(EqualityComparer<TKey>.Default);. An explicitly-passed default comparer now takes the fast default-comparer path.
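
For reference, the gate expression can be tried in isolation. ComparerGate and IsCustomComparer below are hypothetical names wrapping the same expression; the sketch assumes nothing about the rest of Lookup.cs.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class ComparerGate
{
    // Hypothetical helper mirroring the expression quoted above:
    // a comparer only counts as "custom" if it was supplied and is not the default one.
    public static bool IsCustomComparer<TKey>(IEqualityComparer<TKey>? comparer) =>
        comparer is not null && !comparer.Equals(EqualityComparer<TKey>.Default);
}
```

Under this check, both a null comparer and an explicitly passed EqualityComparer<string>.Default report false, while something like StringComparer.OrdinalIgnoreCase reports true.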

internal TElement[] _elements;
internal int _count;
internal Grouping<TKey, TElement>? _hashNext;
internal Grouping<TKey, TElement>? _next;
Member

What's the purpose of this linked list implementation, and is it still relevant now that we're switching over to use Dictionary?

// hashed null to 0 and routed equality through the comparer, so a non-null key merged with the
// null grouping only when its own hash code (masked to non-negative) was also 0. Preserve that
// exact behavior so this remains non-breaking, while never passing null to a comparer's GetHashCode.
if (_nullKeyGrouping is not null && _customComparer)
Member

We should ensure this branch gets eliminated when TKey is not nullable:

Suggested change
if (_nullKeyGrouping is not null && _customComparer)
if (default(TKey) is null && _nullKeyGrouping is not null && _customComparer)
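
A small sketch of what the added default(TKey) is null check evaluates to for different type arguments. NullabilityCheck and CanBeNull are hypothetical names; the expectation that the JIT folds the check to a constant for value-type instantiations (and so drops the guarded branch) is the reviewer's stated goal, not something this sketch verifies.

```csharp
using System;

public static class NullabilityCheck
{
    // True only when TKey's default value is null, i.e. for reference types
    // and Nullable<T>; false for non-nullable value types such as int.
    public static bool CanBeNull<TKey>() => default(TKey) is null;
}
```

CanBeNull<int>() is false, while CanBeNull<string>() and CanBeNull<int?>() are both true, so only the latter two kinds of key could ever reach the null-grouping branch.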

#pragma warning disable CS8714 // Nullability of type argument doesn't match 'notnull' constraint. The null case is handled above.
ref Grouping<TKey, TElement>? grouping = ref CollectionsMarshal.GetValueRefOrAddDefault(_groupings, key, out _);
#pragma warning restore CS8714
return grouping ??= CreateGrouping(key);
Member

Is there any chance this method is being invoked after the Lookup instance has been returned to the user? If so, this logic almost definitely breaks thread safety of the immutable Lookup instance.

Member

@jkotas jkotas left a comment

What is this going to do to the fixed Linq footprint (startup time, AOT binary sizes, etc.)? I expect it is going to make the fixed footprint of Linq many times more expensive. I do not think switching Linq to use Dictionary is a good idea.

@jkotas
Member

jkotas commented May 29, 2026

The adversarial-hash-code Collision GroupBy case from the repro drops from ~22,400 ms to ~41 ms (N=200000), now matching Dictionary.

It is easy to construct poor hash code implementations that perform poorly with Dictionary too. Users should be implementing hash codes that are universally good, irrespective of the implementation details of the collections that use them. I do not think this is a valid motivating example.

Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

LINQ uses a different hashing implementation from Dictionary<TKey, TValue>

3 participants