ValueTuple with more than 8 fields calculates its hash code incorrectly #13087

dioptryk · 2019-07-16T12:04:51Z

Consider the following code:

Console.WriteLine((  1, 2, 3, 4, 5, 6, 7, 8).GetHashCode());
Console.WriteLine((777, 2, 3, 4, 5, 6, 7, 8).GetHashCode()); // different result
		
Console.WriteLine((  1, 2, 3, 4, 5, 6, 7, 8, 9, 10).GetHashCode());
Console.WriteLine((777, 2, 3, 4, 5, 6, 7, 8, 9, 10).GetHashCode()); // same result

The ValueTuple ignores first N-8 fields when calculating hash code. It even says so in a comment in ValueTuple.cs. The offending code is:

if (size >= 8) { return rest.GetHashCode(); }

To be honest, this seems like a bug to me, since this behaviour is described nowhere but the source. We were using tuple syntax to calculate hash codes of several class members and just noticed, that even with some fields changed, the hashes were still equal.. I understand this will have performance implications, but still I find the GetHashCode() unusable in its current state for "big" tuples. This is wrong for both .NET Framework and Core.

The text was updated successfully, but these errors were encountered:

danmoseley · 2019-07-16T22:53:03Z

This has come up before I think. It is a tradeoff between cost of calculating hash code and cost of collisions for structs with sufficient size.

dioptryk · 2019-07-17T14:06:10Z

Understandable, but still..

"If it doesn’t work, it doesn’t matter how fast it doesn’t work."
— Mich Ravera

;-)
It should be (at the very least) clarified in the docs, IMO. Who knows how many people are unknowingly affected by this. Besides that, decision about being the subject to such an inconsistent optimization should be left to the programmer using the API, I think.

stephentoub · 2019-07-17T15:34:17Z

"If it doesn’t work, it doesn’t matter how fast it doesn’t work"

What about it "doesn't work"? There is nothing that requires a GetHashCode to factor in all data from a type (and in fact frequently implementations don't). The only guarantee it need make is that two values that should be considered equal return the same hash code (in that light a GetHashCode that always returns 0 is correct, but obviously will likely lead to subsequent performance deficiencies). This is about a perf trade-off, not correctness.

dioptryk · 2019-07-18T12:46:05Z

@stephentoub Sorry, but I can't agree with that. It's like having print() method, which prints only 8 arguments (for performance reasons), or a String.Join() which joins only 8 elements of the collection.. The examples may be a bit ;-) exaggerated, but you get the idea. It's about consistency in the API. Besides that, you don't know anything about Our Business Case. Perhaps we have a hundred tuples like that (which is nothing), so the performance issue does not exist, yet the hash collision problem remains.. Do you really think that changing the code, because the class has suddenly more than 8 fields, is reasonable?

stephentoub · 2019-07-18T12:48:07Z

It's like having print() method, which prints only 8 arguments (for performance reasons), or a String.Join() which joins only 8 elements of the collection

It is in no way the same as those.

Besides that, you don't know anything about Our Business Case.

You're right, I don't. If your business case demands such explicit control over exactly how a GetHashCode works, you will need to implement it yourself on your own type, and not use the built-in types for this.

benaadams · 2019-07-18T13:55:08Z

Might want to use System.HashCode rather than ValueTuple to generate your hash code

danmoseley · 2019-07-18T16:04:04Z

@dioptryk assuming you accept it's a performance issue, not a correctness issue, consider that hashing all the fields might hurt performance more than the reduced collisions helps. Clearly, there has to be a cut-over, and it's apparently 8 in this case. If it hashed 1000 fields, it would be worse in every respect.

benaadams · 2019-07-18T17:30:54Z

If it hashed 1000 fields, it would be worse in every respect.

Using a 1000 field struct/ValueTuple as a key in a dictionary will have be terrible performance to begin with; hashcode performance would likely to be the bottleneck?

GSPP · 2019-08-31T16:33:26Z

Yes, this is a correct GetHashCode implementation, but...

I had practical cases in which leaving out parts of the data for hashing purposes lead to catastrophic hash table behavior. In my opinion, leaving out fields entirely is a trap that is not appropriate given the usual reliability and quality standards of the .NET class libraries.

Bad hash code functions usually fail more gracefully because at least they still distribute the data over many buckets. This function amounts to int GetHashCode() => 0; for all but the last 8 arguments. That can get catastrophic rather easily. And catastrophic means O(N^2) instead of O(N).

EnCey · 2021-07-29T12:43:07Z

I am currently evaluating whether to use large, generated ValueTuples to determine whether 2 sets of values are equal and thus also stumbled upon this.

In my opinion, the ValueTuple documentation should state that not all items of the tuple are factored into the hash code calculation. That would save us developers some time, since apparently there are some that are interested in such functionality.

However, I don't understand the criticism here. No GetHashCode implementation is obliged to generate unique results; a List<T> never computed a hash code based on all its items, and plenty non-collection types only use a fraction of their fields for theirs. The most obvious case are types that contain an "ID" and return a hash code derived solely from that, ignoring all other fields.

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 26, 2020

joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 2, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ValueTuple with more than 8 fields calculates its hash code incorrectly #13087

ValueTuple with more than 8 fields calculates its hash code incorrectly #13087

dioptryk commented Jul 16, 2019

danmoseley commented Jul 16, 2019

dioptryk commented Jul 17, 2019

stephentoub commented Jul 17, 2019

dioptryk commented Jul 18, 2019

stephentoub commented Jul 18, 2019 •

edited

Loading

benaadams commented Jul 18, 2019

danmoseley commented Jul 18, 2019

benaadams commented Jul 18, 2019 •

edited

Loading

GSPP commented Aug 31, 2019 •

edited

Loading

EnCey commented Jul 29, 2021 •

edited

Loading

ValueTuple with more than 8 fields calculates its hash code incorrectly #13087

ValueTuple with more than 8 fields calculates its hash code incorrectly #13087

Comments

dioptryk commented Jul 16, 2019

danmoseley commented Jul 16, 2019

dioptryk commented Jul 17, 2019

stephentoub commented Jul 17, 2019

dioptryk commented Jul 18, 2019

stephentoub commented Jul 18, 2019 • edited Loading

benaadams commented Jul 18, 2019

danmoseley commented Jul 18, 2019

benaadams commented Jul 18, 2019 • edited Loading

GSPP commented Aug 31, 2019 • edited Loading

EnCey commented Jul 29, 2021 • edited Loading

stephentoub commented Jul 18, 2019 •

edited

Loading

benaadams commented Jul 18, 2019 •

edited

Loading

GSPP commented Aug 31, 2019 •

edited

Loading

EnCey commented Jul 29, 2021 •

edited

Loading