Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve get/set performance in HashSet<T> #40106

Open
wants to merge 4 commits into
base: master
from

Conversation

@JeffreyZhao
Copy link

commented Aug 7, 2019

Similar optimization in Dictionary<,> has been ported to HashSet<>.

_comparer has been made nullable, and default by null. This
makes us possible to check if it's null for faster branches, instead
of comparing it to EqualityComparer<T>.Default. Since _comparer
is null by default, the constructors and the OnDeserialization
method have been modified accordingly to save null instead of the
EqualityComparer<T>.Default if it's been passed/saved. Also, the
places that use _comparer directly much do a null check and switch
to EqualityComparer<T>.Default instead.

The major optimization is under Contains and AddIfNotPresent
methods. Basically the code was:

// Code uses _comparer

Now it becomes:

var comparer = _comparer;
if (comparer == null)
{
  if (default(T) != null)
  {
    // Code uses EqualityComparer<T>.Default for de-virtualizing.
  }
  else
  {
    var defaultComparer = EqualityComparer<T>.Default;
    // Code uses cached defaultComparer
  }
}
else
{
  // Code uses comparer
}

I've intentionally kept the origin implementation and comments as
much as possible, and the optimized code is not exactly the same as
in Dictionary<T>. Further micro-optimizations might be possible
but I think it's cleaner to keep it that way in the first step.

The Remove method hasn't been optimized since it's complicated,
and not been done in Dictionary either. Some other methods like
AddOrGetLocation or InternalIndexOf are not optimized since
they're only used in less-common scenarios, and not in the scope
of the current issue.

The benchmark shows there're obvious improvements in get and set
operations for all cases, especialy if the item is a value type.
I was actually expecting to see some pull back for "slower" case,
like if we use a custom comparer, but it turns out it's getting
faster too, since we have cached the comparer instead of using
_comparer directly in the loop.

The benchmark also shows the optimized HashSet has close or
better performance comparing to Dictionary in almost all cases,
except for HashSet<string> with default comparer. The default
comparer in Dictionary<string> is a "non-random" implementation
that performs better. Unfortunately the comparer is internal to
the CoreLib.

The optimized HashSet<string> is still faster since it's using
cached comparer of type EqualityComparer<string>, which uses
virtual method dispatching instead of interface dispatching. It's
just not as faster as Dictionary<string, T> yet.

Fix #36448

Improve get/set performance in HashSet<T>
Similar optimization in Dictionary<,> has been ported to HashSet<>.

`_comparer` has been made nullable, and default by `null`. This
makes us possible to check if it's null for faster branches, instead
of comparing it to `EqualityComparer<T>.Default`. Since `_comparer`
is `null` by default, the constructors and the `OnDeserialization`
method have been modified accordingly to save `null` instead of the
`EqualityComparer<T>.Default` if it's been passed/saved. Also, the
places that use `_comparer` directly much do a null check and switch
to `EqualityComparer<T>.Default` instead.

The major optimization is under `Contains` and `AddIfNotPresent`
methods. Basically the code was:

``
// Code uses _comparer
``

Now it becomes:

```
var comparer = _comparer;
if (comparer == null)
{
  if (default(T) != null)
  {
    // Code uses EqualityComparer<T>.Default for de-virtualizing.
  }
  else
  {
    var defaultComparer = EqualityComparer<T>.Default;
    // Code uses cached defaultComparer
  }
}
else
{
  // Code uses comparer
}
```

I've intentionally kept the origin implementation and comments as
much as possible, and the optimized code is not exactly the same as
in `Dictionary<T>`. Further micro-optimizations might be possible
but I think it's cleaner to keep it that way in the first step.

The `Remove` method hasn't been optimized since it's complicated,
and not been done in `Dictionary` either. Some other methods like
`AddOrGetLocation` or `InternalIndexOf` are not optimized since
they're only used in less-common scenarios, and not in the scope
of the current issue.

The benchmark shows there're obvious improvements in get and set
operations for all cases, especialy if the item is a value type.
I was actually expecting to see some pull back for "slower" case,
like if we use a custom comparer, but it turns out it's getting
faster too, since we have cached the comparer instead of using
`_comparer` directly in the loop.

The benchmark also shows the optimized `HashSet` has close or
better performance comparing to `Dictionary` in almost all cases,
except for `HashSet<string>` with default comparer. The default
comparer in `Dictionary<string>` is a "non-random" implementation
that performs better. Unfortunately the comparer is internal to
the CoreLib.

The optimized `HashSet<string>` is still faster since it's using
cached comparer of type `EqualityComparer<string>`, which uses
virtual method dispatching instead of interface dispatching. It's
just not as faster as `Dictionary<string, T>` yet.

Fix #36448
@JeffreyZhao

This comment has been minimized.

Copy link
Author

commented Aug 7, 2019

The benchmark code can be found in https://github.com/JeffreyZhao/HashSetPerf, in which the FastHashSet is the optimized version. The benchmark compares performance among Dictionary<TKey, TValue>, current HashSet<T> and optimized FastHashSet<T>, and use HashSet<T> as the baseline.

Result:

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17763.615 (1809/October2018Update/Redstone5)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview7-012821
  [Host]     : .NET Core 3.0.0-preview7-27912-14 (CoreCLR 4.700.19.32702, CoreFX 4.700.19.36209), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0-preview7-27912-14 (CoreCLR 4.700.19.32702, CoreFX 4.700.19.36209), 64bit RyuJIT
Method Mean Error StdDev Median Ratio RatioSD
Get_StringDict 21.053 ns 0.2522 ns 0.2359 ns 20.999 ns 0.84 0.01
Get_StringSet 25.051 ns 0.2697 ns 0.2523 ns 24.971 ns 1.00 0.00
Get_StringFastSet 23.986 ns 0.1639 ns 0.1533 ns 23.997 ns 0.96 0.01
Get_IntDict 6.596 ns 0.0388 ns 0.0363 ns 6.597 ns 0.63 0.01
Get_IntSet 10.430 ns 0.2240 ns 0.2200 ns 10.377 ns 1.00 0.00
Get_IntFastSet 6.632 ns 0.0335 ns 0.0297 ns 6.638 ns 0.64 0.01
Get_ObjDict 16.235 ns 0.0976 ns 0.0865 ns 16.201 ns 0.82 0.02
Get_ObjSet 19.278 ns 0.3845 ns 0.7222 ns 19.200 ns 1.00 0.00
Get_ObjFastSet 17.266 ns 0.3439 ns 0.7178 ns 16.835 ns 0.90 0.05
Get_CustomDict 11.809 ns 0.5410 ns 0.5556 ns 11.587 ns 0.87 0.04
Get_CustomSet 13.536 ns 0.1752 ns 0.1638 ns 13.540 ns 1.00 0.00
Get_CustomFastSet 11.547 ns 0.1316 ns 0.1231 ns 11.573 ns 0.85 0.01
Set_StringDict 24.956 ns 0.4348 ns 0.4067 ns 24.889 ns 0.97 0.02
Set_StringSet 25.757 ns 0.2293 ns 0.2145 ns 25.667 ns 1.00 0.00
Set_StringFastSet 25.145 ns 0.3634 ns 0.3399 ns 25.057 ns 0.98 0.01
Set_IntDict 8.010 ns 0.0707 ns 0.0661 ns 7.988 ns 0.76 0.01
Set_IntSet 10.552 ns 0.0894 ns 0.0792 ns 10.537 ns 1.00 0.00
Set_IntFastSet 8.005 ns 0.1733 ns 0.5109 ns 8.203 ns 0.66 0.03
Set_ObjDict 19.934 ns 0.1394 ns 0.1236 ns 19.939 ns 1.08 0.02
Set_ObjSet 18.538 ns 0.2455 ns 0.2297 ns 18.520 ns 1.00 0.00
Set_ObjFastSet 16.306 ns 0.0373 ns 0.0291 ns 16.315 ns 0.88 0.01
Set_CustomDict 14.611 ns 0.0883 ns 0.0783 ns 14.610 ns 1.09 0.02
Set_CustomSet 13.434 ns 0.2412 ns 0.2138 ns 13.353 ns 1.00 0.00
Set_CustomFastSet 11.443 ns 0.1962 ns 0.1835 ns 11.479 ns 0.85 0.01
@JeffreyZhao

This comment has been minimized.

Copy link
Author

commented Aug 7, 2019

The check failed at "Send to Helix". Is it a glitch on check server? Can we re-run the checks?

@danmosemsft

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

You should be able to see the test issue by clicking through the log. It's apparently unrelated Http failure. I'll open issue.

@danmosemsft

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

@danmosemsft

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

Thanks for hte contribution. Do we have sufficient micro benchmark coverage in https://github.com/dotnet/performance ? If not would you consider improving coverage there? This is how we typically measure and prevent regression.

cc @adamsitnik

@danmosemsft

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

It would not be part of this change, but another pending backport to HashSet is dotnet/coreclr#23591.

@eiriktsarpalis eiriktsarpalis requested a review from safern Aug 8, 2019

@eiriktsarpalis eiriktsarpalis added this to the 5.0 milestone Aug 8, 2019

@JeffreyZhao

This comment has been minimized.

Copy link
Author

commented Aug 8, 2019

@danmosemsft: You should be able to see the test issue by clicking through the log. It's apparently unrelated Http failure. I'll open issue.

Thanks, I've noticed that now. BTW why it's not part of previous step (compile and test) but "Send to Helix", and what's that step?

Thanks for hte contribution. Do we have sufficient micro benchmark coverage in https://github.com/dotnet/performance ? If not would you consider improving coverage there? This is how we typically measure and prevent regression.

It seems we don't have any coverage for HashSet<> and there's only one case for Dictionary<int, int>.ContainsValue, which is a bit strange to me since it's one of the least used methods. Yes, I can add some for HashSet<> and probably Dictionary<,> too.

It would not be part of this change, but another pending backport to HashSet is dotnet/coreclr#23591.

Yes, I've noticed that too. That's what I mean by this:

I've intentionally kept the origin implementation and comments as
much as possible, and the optimized code is not exactly the same as
in Dictionary. Further micro-optimizations might be possible
but I think it's cleaner to keep it that way in the first step.

There're actually more optimizations can be put in HashSet<> like the one you mentioned, and also including the changes to other methods. My thought is to do it gradually so it's easier to review, test and merge. Those optimizations could be made later. Now I think it could be made after having more performance coverage.

@@ -74,22 +74,22 @@ public class HashSet<T> : ICollection<T>, ISet<T>, IReadOnlyCollection<T>, ISeri
private int _count;
private int _lastIndex;
private int _freeList;
private IEqualityComparer<T> _comparer = default!;
private IEqualityComparer<T>? _comparer = default!;

This comment has been minimized.

Copy link
@safern

safern Aug 8, 2019

Member

Do you still need the !? I don't think so.

This comment has been minimized.

Copy link
@JeffreyZhao

JeffreyZhao Aug 9, 2019

Author

You're right. Actually the whole default! isn't required. Do you think I should fix it now, or commit with other changes (if any) when the review is complete? Thanks.

This comment has been minimized.

Copy link
@safern

safern Aug 9, 2019

Member

Let's do it in a separate PR 😄 -- feel free to start preparing it.

@safern

This comment has been minimized.

Copy link
Member

commented Aug 8, 2019

the constructors and the OnDeserialization
method have been modified accordingly to save null instead of the
EqualityComparer.Default if it's been passed/saved.

@ViktorHofer would this break serialization scenarios?

@ViktorHofer

This comment has been minimized.

Copy link
Member

commented Aug 8, 2019

@ViktorHofer would this break serialization scenarios?

No it doesn't break serialization as during serialization the default equality comparer is used if the comparer is null. The description is a bit misleading but the change is fine.

Use Equals() to check _comparer in deserialization
After deserialization the `_comparer` will be set to a different
instance of default comparer, so we need to use `Equals` to compare
`_comparer` with `EqualityComparer<T>.Default` instead of reference
comparison by `==`.
{
if (slots[i].hashCode == hashCode && _comparer.Equals(slots[i].value, item))
int hashCode = item == null ? 0 : InternalGetHashCode(item.GetHashCode());

This comment has been minimized.

Copy link
@safern

safern Aug 9, 2019

Member

Should this be in the common path?

// see note at "HashSet" level describing why "- 1" appears in for loop
for (int i = buckets[hashCode % buckets.Length] - 1; i >= 0; i = slots[i].next)
{
if (slots[i].hashCode == hashCode && EqualityComparer<T>.Default.Equals(slots[i].value, item))

This comment has been minimized.

Copy link
@safern

safern Aug 9, 2019

Member

It's unfortunate that this code block is repeated 3 times but I guess there is no better way to do it.

Maybe there's a way to somehow merge the case where _comparer == null && default(t) == null with _comparer != null by sharing the comparer local, something like this?

IEqualityComparer<T>? comparer = _comparer != null ? _comparer : default(T) == null ? EqualityComparer<T>.Default : null;

if (comparer == null)
{
    ... // devirtualization code path.
}
else
{
   ... // comparer code path.
}

However you can validate with your benchmarks if this affects performance.

This comment has been minimized.

Copy link
@JeffreyZhao

JeffreyZhao Aug 12, 2019

Author

The code uses the same pattern as in Dictionary<,> and I intentionally keep the existing implementation detail in HashSet in this change since it's just step one. There're more back-port and further optimization can be done later.

In term of your suggestion here, it's actually not quite the same, since in original change, the 2nd the 3rd branch are not the same. The 2nd branch is using EqualityComparer<T>.Default which is a type of EqualityComparer<T> (an abstract class), but the 3rd branch is using IEqualityComparer<T> (an interface). I've tried this way before and it turns out the performance it a bit slower, probably because virtual method dispatching is faster than interface method dispatching.

I've also tried to check if the custom _comparer is an EqualityComparer<T> sub-class (so it becomes four branches), but there's no performance gain. I'll run benchmark again to confirm the result.

int collisionCount = 0;
Slot[] slots = _slots;
for (int i = _buckets[bucket] - 1; i >= 0; i = slots[i].next)

IEqualityComparer<T>? comparer = _comparer;

This comment has been minimized.

Copy link
@safern

safern Aug 9, 2019

Member

Same suggestions as above for merging code paths and calculating hashCode in the common path.

JeffreyZhao added some commits Aug 15, 2019

Ignore resetting "_comparer" in OnDeserialization
Previously we tried to reset `_comparer` in `OnDeserialization` if
it's `EqualityComparer<T>.Default`. But after discussion in #40150
we decide to follow the same approach as in `Dictionary<,>` to make
it simple.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.