-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the default hash code for structs #4414
Comments
To use a struct efficiently as key in hash based collections you already have to override GetHashCode and implement IEquatable. Otherwise you will get a lot of boxing. |
Here is an example: const int Iters = 10 * 1000000;
var dict = new Dictionary<Key, int>(Iters); // prealocate all the required storage
for (int i = 0; i < Iters; i++)
{
dict.Add(new Key(i), 0);
}
...
public struct Key
{
int i;
public Key(int i)
{
this.i = i;
}
} It takes 775 ms and allocates 24 bytes (on average) for each Add call. If we implement GetHashCode: public struct Key
{
int i;
public Key(int i)
{
this.i = i;
}
public override int GetHashCode()
{
return i;
}
} it takes 200 ms with no allocations. Let's make the generated hash code not so perfect. It will cause Equals to be called. public override int GetHashCode()
{
return i % (1 << 20);
} It will cost us 2.5 seconds and 205 bytes per call. Finally implement IEquatable: public struct Key: IEquatable<Key>
{
int i;
public Key(int i)
{
this.i = i;
}
public override int GetHashCode()
{
return i % (1 << 20);
}
public bool Equals(Key other)
{
return i == other.i;
}
} Now even though the GetHashCode implementation causes hash collisions |
@omariom that is a good point. This seems like a JIT limitation. Maybe we can fix the JIT in this regard as well. The JIT certainly doesn't have to box a struct in order to call default equals and GetHashCode methods. The dictionaries code is being specialized for this exact struct type. This means that when jitting the exact type is known. |
@omariom Is implementing |
Overriding Object.Equals will reduce boxing but to get rid of it completely IEquatable is necessary. |
The default Equals takes an What might work is to treat That said, I think implementing |
Maybe all that's needed in the JIT is:
It would be even nicer if the C# and/or the CLR had the necessary meta-programming facilities to push all equality concerns (including IEquatable) of a type into an attribute or "template". That is likely not on the table, though. |
@mikedn That is the answer: generate specialized code for each and every value type. |
@DemiMarie I'm not sure what your comment adds to this thread. Of course that specialized code is faster but we don't know when that code will be generated, how it will be called etc. |
This is an interesting problem, but perhaps the most important issue associated with it is how much effort(cost) should be devoted to it. which is unclear. We could do absolutely nothing (after it always 'works' and even works OK many times, and arguably users who care about perf can always fix it (by overriding the methods needed). Also the need for this is moderately rare (you need a user defined type that is a key in a dictionary). The other extreme is quite a bit of JIT magic to make it work well (but probably not optimally) in all cases. From what I can tell this is moderately expensive (when you synthesize methods out of 'nothing' (No IL), things tend to be at least moderately expensive. My guess as to the right code-benefit, is to see if you can simply fix RegularGetValueTypeHashCode to not ever produce horrendous hashs. This should be very easy (modifying one method), and avoid the worst problem (because users that care about perf should be implementing the methods because that will always give the best perf). Ideally we have some idea of the scenarios that are performing poorly today. (Examples of REAL WORLD examples where the hash is very poor, along with some idea of how much better it would be if the hash was fixed). If people have this information it should be added to the discussion. If anyone wants to go do that work in RegularGetValueTypeHashCode (it is straightforward enough), I would be supportive of getting it checked in. (low cost/risk, reasonable value). If anyone wishes to pursue more aggressive tactics, you should talk about them here before proceeding. We have to be sure of the cost/benefit before you spend time on it. |
Now that the HashCode type has been merged, we should consider making it available via ValueType.GetHashCode. It should have much higher quality hashes. |
I agree this is an area that needs fixing. Defining Equals and GetHashCode over and over again is a pain, particularly if you want a decent hash function. |
Apparently, the default
GetHashCode
implementation for structs is really bad: http://stackoverflow.com/a/5927853/ (seeFastGetValueTypeHashCodeHelper
). This answer lists a lot of problems. For example, if the struct contains a reference type field the hash code is based on one field only. This can cause excessive hash collisions.I'd like to propose that the default hash code and equality be changed to a more useful form. The hash code and equality comparison should always be based on calling
GetHashCode
andEquals
on all fields. The hash code could be based in the common patternaccumulator * 37 + nextHashCode
. Resharper uses that for its "generate GetHashCode" feature and it's quite reasonable.Ideally, the code to do this could be added to each struct type at runtime so that the JIT can generate super fast code for this on demand. In many cases this would obviate the need for custom overrides of these two functions.
In order to mitigate compatibility risk this should be tied to some form of compatibility switch. The
BinaryCompatibility
class infrastructure seems to be made to address exactly this problem.The text was updated successfully, but these errors were encountered: