Refactor System.Linq.Lookup to use Dictionary for its hash table#128746
Refactor System.Linq.Lookup to use Dictionary for its hash table#128746Copilot wants to merge 5 commits into
Conversation
Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
@copilot what about custom comparers that equate null with non-nullable keys? |
Good question — there is a behavioral difference here, but only for that pathological case. Concretely, a comparer where
The reason I didn't make this fully faithful is that the only way to route null through a custom comparer is to wrap every key (since It's also worth noting such a comparer can't satisfy the hashing contract anyway: Given that, I kept null special-cased. Happy to switch to the wrapping approach (accepting the loss of string randomization) if you'd prefer to preserve the old edge-case semantics. |
@copilot pathological or not, we don't ever knowingly introduce breaking changes into the BCL. |
|
Tagging subscribers to this area: @dotnet/area-system-linq |
Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
Agreed — fixed in 6c01fb4. Null keys are now routed through a custom comparer exactly as the old bespoke table did: |
| #pragma warning disable CS8714 // Nullability of type argument doesn't match 'notnull' constraint. | ||
| _groupings = new Dictionary<TKey, Grouping<TKey, TElement>>(comparer); | ||
| #pragma warning restore CS8714 | ||
| _customComparer = comparer is not null; |
There was a problem hiding this comment.
Even if specified the comparer parameter might still be the default.
| _customComparer = comparer is not null; | |
| _customComparer = !Equals(_comparer, EqualityComparer<T>.Default); |
There was a problem hiding this comment.
Fixed in the latest commit: _customComparer = comparer is not null && !comparer.Equals(EqualityComparer<TKey>.Default);. An explicitly-passed default comparer now takes the fast default-comparer path.
| internal TElement[] _elements; | ||
| internal int _count; | ||
| internal Grouping<TKey, TElement>? _hashNext; | ||
| internal Grouping<TKey, TElement>? _next; |
There was a problem hiding this comment.
What's the purpose of this linked list implementation, and is it still relevant now that we're switching over to use Dictionary?
| // hashed null to 0 and routed equality through the comparer, so a non-null key merged with the | ||
| // null grouping only when its own hash code (masked to non-negative) was also 0. Preserve that | ||
| // exact behavior so this remains non-breaking, while never passing null to a comparer's GetHashCode. | ||
| if (_nullKeyGrouping is not null && _customComparer) |
There was a problem hiding this comment.
We should ensure this branch gets eliminated when TKey is not nullable:
| if (_nullKeyGrouping is not null && _customComparer) | |
| if (default(TKey) is null && _nullKeyGrouping is not null && _customComparer) |
| #pragma warning disable CS8714 // Nullability of type argument doesn't match 'notnull' constraint. The null case is handled above. | ||
| ref Grouping<TKey, TElement>? grouping = ref CollectionsMarshal.GetValueRefOrAddDefault(_groupings, key, out _); | ||
| #pragma warning restore CS8714 | ||
| return grouping ??= CreateGrouping(key); |
There was a problem hiding this comment.
Is there any chance this method is being invoked after the Lookup instance has been returned to the user? If so, this logic almost definitely breaks thread safety of the immutable Lookup instance.
It is easy to construct poor hashcode implementations that perform poorly with Dictionary too. Users should be implementing hashcodes that are universally good, irrespective of the implementation details of the collections that uses the hashcodes. I do not think this is a valid motivating example. |
Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
System.Linq.Lookup<TKey, TElement>(backingGroupBy,ToLookup,Join,GroupJoin) used a bespoke open-chaining hash table that lacksDictionary's robustness, severely pessimizing on adversarial hash codes. This unifies the implementation so LINQ sharesDictionary/HashSethashing behavior and perf characteristics.Changes
Lookup.cs: Replaced the custom hash-bucket array (Grouping[] _groupings+_hashNextchains + manual% primebucketing +Resize()+InternalGetHashCode) with an internalDictionary<TKey, Grouping<TKey, TElement>>.GetGroupingnow usesCollectionsMarshal.GetValueRefOrAddDefault(create path) andTryGetValue(read path).Dictionaryrejects null keys, they are tracked in a dedicated_nullKeyGroupingfield rather than wrapped, andGetHashCodeis never handednull. To avoid a breaking change, null keys are still routed through a custom comparer exactly as the old table did:nullis treated as hashing to0, so anullkey continues to merge with a non-null key the comparer equates it with whenever that key's masked hash code is also0, and stays in its own grouping otherwise. This null-routing is gated behind the custom-comparer case, so the default-comparer path keepsDictionary's string collision-randomization untouched (the default comparer never equatesnullwith a non-null key). The "custom comparer" gate (_customComparer) excludes a comparer that equalsEqualityComparer<TKey>.Default, so passing the default comparer explicitly still takes the fast default-comparer path._next/_lastGroupingcircular linked list is retained and continues to drive all iterators.Grouping.cs: Dropped the now-unused_hashCode/_hashNextfields; constructor simplified toGrouping(TKey key).No public API changes.
Notes
CollisionGroupBycase from the repro drops from ~22,400 ms to ~41 ms (N=200000), now matchingDictionary.ToDictionary/Distinct, consistent with the unification goal.System.Linqtest suite passes.