ValueTuple with more than 8 fields calculates its hash code incorrectly #13087
Comments
This has come up before, I think. It is a tradeoff between the cost of calculating the hash code and the cost of collisions for structs of sufficient size.
Understandable, but still.. ;-)
In what way does it "not work"? Nothing requires a GetHashCode implementation to factor in all of a type's data (and in fact implementations frequently don't). The only guarantee it needs to make is that two values that should be considered equal return the same hash code (in that light, a GetHashCode that always returns 0 is correct, though it will likely lead to performance problems downstream). This is about a perf trade-off, not correctness.
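To illustrate the point about the contract, here is a sketch of a hypothetical key type (the name and field are made up) whose GetHashCode is constant. It satisfies the contract — equal values return equal hashes — but every key lands in the same bucket, so dictionary lookups degrade to a linear scan:

```csharp
using System;

// Hypothetical example: a constant hash code is *correct* per the
// GetHashCode contract, but degrades hash-table lookups to O(n).
readonly struct DegenerateKey : IEquatable<DegenerateKey>
{
    public readonly int Value;
    public DegenerateKey(int value) => Value = value;

    public bool Equals(DegenerateKey other) => Value == other.Value;
    public override bool Equals(object obj) => obj is DegenerateKey k && Equals(k);

    // Every instance collides: legal, but terrible for performance.
    public override int GetHashCode() => 0;
}
```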
@stephentoub Sorry, but I can't agree with that. It's like having a print() method that prints only 8 arguments (for performance reasons), or a String.Join() that joins only 8 elements of the collection.. The examples may be a bit ;-) exaggerated, but you get the idea. It's about consistency in the API. Besides that, you don't know anything about Our Business Case. Perhaps we have a hundred tuples like that (which is nothing), so the performance issue does not exist, yet the hash collision problem remains.. Do you really think that having to change the code because the class suddenly has more than 8 fields is reasonable?
It is in no way the same as those.
You're right, I don't. If your business case demands such explicit control over exactly how GetHashCode works, you will need to implement it yourself on your own type rather than use the built-in types for this.
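A minimal sketch of that suggestion, assuming .NET Core 2.1+ where System.HashCode is available (the type and field names here are illustrative, not from the thread). Unlike the 8-element cutoff in ValueTuple, HashCode.Add can fold in arbitrarily many fields:

```csharp
using System;

// Hypothetical hand-written replacement for a wide tuple:
// every field participates in both equality and the hash code.
struct WideKey : IEquatable<WideKey>
{
    public int A, B, C, D, E, F, G, H, I;

    public bool Equals(WideKey o)
        => A == o.A && B == o.B && C == o.C && D == o.D && E == o.E
        && F == o.F && G == o.G && H == o.H && I == o.I;

    public override bool Equals(object obj) => obj is WideKey k && Equals(k);

    public override int GetHashCode()
    {
        // HashCode.Add has no arity cap, so all nine fields contribute.
        var hc = new HashCode();
        hc.Add(A); hc.Add(B); hc.Add(C); hc.Add(D); hc.Add(E);
        hc.Add(F); hc.Add(G); hc.Add(H); hc.Add(I);
        return hc.ToHashCode();
    }
}
```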
Might want to use
@dioptryk assuming you accept that it's a performance issue, not a correctness issue, consider that hashing all the fields might hurt performance more than the reduced collisions help. Clearly there has to be a cut-over, and it's apparently 8 in this case. If it hashed 1000 fields, it would be worse in every respect.
Using a 1000-field struct/ValueTuple as a key in a dictionary will have terrible performance to begin with; would the hash code computation even be the bottleneck?
Yes, this is correct. I have had practical cases in which leaving out parts of the data for hashing purposes led to catastrophic hash table behavior. In my opinion, leaving out fields entirely is a trap that is not appropriate given the usual reliability and quality standards of the .NET class libraries. Bad hash functions usually fail more gracefully because at least they still distribute the data over many buckets. This function amounts to discarding most of the data outright.
I am currently evaluating whether to use large, generated ValueTuple types. In my opinion, the ValueTuple documentation should state that not all items of the tuple are factored into the hash code calculation. That would save us developers some time, since apparently there are some who are interested in such functionality. However, I don't understand the criticism here. No GetHashCode implementation is obliged to generate unique results; a GetHashCode that always returns the same value is still formally correct.
Consider the following code:
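(The original snippet did not survive extraction; the following is a sketch of the reported behavior, with illustrative values. With nine elements, only the last eight participate in the hash, so changing the first element leaves the hash code unchanged on affected runtimes.)

```csharp
using System;

class Program
{
    static void Main()
    {
        // A 9-element tuple is represented as
        // ValueTuple<T1,...,T7, ValueTuple<T8,T9>>, and its
        // GetHashCode only combines the last 8 elements.
        var a = (1,    2, 3, 4, 5, 6, 7, 8, 9);
        var b = (1000, 2, 3, 4, 5, 6, 7, 8, 9);

        Console.WriteLine(a.Equals(b));                        // False
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True on affected runtimes
    }
}
```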
ValueTuple ignores the first N-8 fields when calculating its hash code. It even says so in a comment in ValueTuple.cs. The offending code is:
To be honest, this seems like a bug to me, since this behaviour is described nowhere but in the source. We were using tuple syntax to calculate hash codes from several class members and only just noticed that even with some fields changed, the hashes were still equal. I understand that hashing everything has performance implications, but I still find GetHashCode() unusable in its current state for "big" tuples. This is wrong on both .NET Framework and Core.