Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GetHashCode code generator should produce code that doesn't overflow in checked mode. #24161

Merged
merged 1 commit into from Feb 14, 2018

Conversation

CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi commented Jan 11, 2018

Fixes #24035
Fixes #17646

@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner January 11, 2018 00:35
Copy link
Contributor

@jmarolf jmarolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

hashCode = hashCode * -1521134295 + base.GetHashCode();
hashCode = hashCode * -1521134295 + j.GetHashCode();
hashCode = hashCode ^ base.GetHashCode();
hashCode = (hashCode << 5) | (hashCode >> 27);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Did you mean for this to be a rotation? If so, the right shift needs to be unsigned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. let me ask the original hash people about this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Switched to unsigned right shift. Awesome catch!

@@ -1868,6 +1868,10 @@ public SyntaxNode TupleElementExpression(ITypeSymbol type, string name = null)
/// </summary>
public abstract SyntaxNode BitwiseOrExpression(SyntaxNode left, SyntaxNode right);

internal abstract SyntaxNode ExclusiveOrExpression(SyntaxNode left, SyntaxNode right);
internal abstract SyntaxNode LeftShiftExpression(SyntaxNode left, SyntaxNode right);
internal abstract SyntaxNode RightShiftExpression(SyntaxNode left, SyntaxNode right);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: i cannot think of any reason why these should not be exposed publicly. They make sense for both C# and VB. @mattwar Were these intentionally not included? If so, why? if not, then i think we should just make public.

Copy link

@chm-tm chm-tm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hate to say it, but this can, again, throw in a checked environment.
In C#, you could use unchecked((int)((uint)hashCode >> 27))

@MaStr11
Copy link
Contributor

MaStr11 commented Jan 12, 2018

That's right for all hashcode<0. This can be checked (sic) with this test code:

using System;
public class C 
{
    public static void Main() 
    {
        TestWrappedShiftLeft(0, 5);
        TestWrappedShiftLeft(int.MaxValue, 5);
        TestWrappedShiftLeft(0b0010_0110_0000_0011_1001_1011_1010_1100, 5);
        TestWrappedShiftLeft(int.MinValue, 5); //throws OverflowException
    }

    static void TestWrappedShiftLeft(int number, int shift)
    {
        void PrintNumberBinary(string message, int numberToPrint) 
            => Console.WriteLine($"{message}: { Convert.ToString(numberToPrint, 2).PadLeft(32, '0')}");
        PrintNumberBinary("Orginal", number);
        PrintNumberBinary("Shifted", checked((number << shift) | (int)((uint)number >> (32 - shift))));
        Console.WriteLine();
    }
}

Sharplab

Copy link
Member

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked conversion from int to uint can throw, so this no longer fixes the original problem. It appears there are two separate things going on:

  1. Trying to make the generated code not throw an exception in checked mode
  2. Changing the hash code algorithm

Given that both hash code algorithms require the use of unchecked to work in all cases, I recommend restricting this pull request to only include the changes on that front. The discussion on which hash code algorithm to use can and should be separate.

@CyrusNajmabadi
Copy link
Member Author

Ok. I've changed things. Now, we check if we're in a 'checked' context. If so:

  1. For C#, we wrap the code in an 'unchecked' statement block.
  2. For VB, we add a comment stating that this code could throw. I could find no way at all in checked mode to do unchecked math sanely and efficiently for VB. So this was the best compromise i could think of.

--

Again, in the future, we will detect and call to HashCode.Combine, which will solve the problem nicely. for now, this at least mitigates the problem.

@CyrusNajmabadi
Copy link
Member Author

@sharwell @jmarolf Can you rereview?

@MaStr11
Copy link
Contributor

MaStr11 commented Jan 13, 2018

I added some comments regarding alternative hash code functions in the original issue #24035 (it felt more natural to discuss those things there). See this comment

@@ -1247,4 +1247,7 @@ Sub(&lt;parameterList&gt;) &lt;statement&gt;</value>
<data name="Use_IsNot_Nothing_check" xml:space="preserve">
<value>Use 'IsNot Nothing' check</value>
</data>
<data name="Warning_code_may_overflow_at_runtime_unless_removeintchecks_flag_is_supplied_to_compiler" xml:space="preserve">
<value>Warning: code may overflow ar runtime unless /removeintchecks flag is passed to compiler</value>
Copy link
Contributor

@MaStr11 MaStr11 Jan 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: overflow ar runtime

You basically say "We are only able to provide an algorithm that fails at non deterministic occasions at runtime until you remove a safety net ( which many developers are not allowed to remove). Good luck with that.". I would say that anything, even the most terrible algorithm, is better than obviously buggy code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that anything, even the most terrible algorithm, is better than obviously buggy code.

As i mentioned in the other thread, i'm happy to take an algorithm that will work for both cases. That said, in the interim perioud, this at least lets the user know about the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. This feature is shipped and needs to be fixed soon. I posted some other ideas in the corresponding issue. Based on those we can make a new PR if we are confident to have a better solution.

@CyrusNajmabadi
Copy link
Member Author

Yes. The comment describes the situation as it exists today.

@CyrusNajmabadi
Copy link
Member Author

A simple approach we could take for vb would be to do something like:

Return (Me.A, Me.B, Me.C).GetHashCode()

@MaStr11
Copy link
Contributor

MaStr11 commented Jan 13, 2018

But this relies on a nuget package being installed on older platforms and the alternatives (anonymous classes, old tuples) allocate.

@CyrusNajmabadi
Copy link
Member Author

But this relies on a nuget package being installed on older platforms and the alternatives (anonymous classes, old tuples) allocate.

Yes. The purpose of IDE features isn't to be perfect, or solve problems for 100% of users in 100% of scenarios. They're to provide a reasonable level of help to a sufficient proportion of users. If the generated code isn't suitable to the user, then they can change things accordingly.

The difference here is that currently we are generating code that gives no indication that it can fail at runtime. The comment addresses that. The tuple approach gives a workable solution, if the user uses tuples. And if they don't, then they'll get a clear message that they need to change something.

@MaStr11
Copy link
Contributor

MaStr11 commented Jan 14, 2018

In general I like the tuple idea because it checks the right boxes (concise, readily understandable and so on). A minor detail that needs to be considered when this gets implemented, is the limited number of elements (8) so this needs to generated code like this (Me.A, Me.B, ..., Me.H).GetHashCode() ^ (Me.I, Me.J,...).GetHashCode().

What I don't understand is how this features fallback mechanism should behave as there are so many options and conditions to consider.

Options

  1. Simple overflow depended algorithm that doesn't relies on any packages (current implementation)
  2. Simple algorithm that does not relies on overflows (we don't have anything working right now, but we are close to find one)
  3. ValueTuple.GetHashCode
  4. System.HashCode.Combine

Some more intressting questions: Are there any plans to change ValueTuple.GetHashCode to use System.HashCode.Combine in the future? Is System.Hashcode distributed as Nuget package so it can be used on older platforms?

Conditions

  1. Platform supports System.Hashcode.
  2. ValueTuple package is installed or platforms supports ValueTuple.
  3. Platform supports unchecked keyword.
  4. Platform doesn't support anything of the above.

I think it is worth to elaborate this and it this already was started here #17646 and should be discussed there (Especially how many fallback steps there are [this could be up to 4 in theory]).

@CyrusNajmabadi
Copy link
Member Author

A minor detail that needs to be considered when this gets implemented, is the limited number of elements (8)

I don't think that's the case. The compiler takes care of that for you. You can have a tuple with more than 8 elements, and the compiler will already appropriately create the nested ValueTuple representation for it.

@CyrusNajmabadi
Copy link
Member Author

What I don't understand is how this features fallback mechanism should behave as there are so many options and conditions to consider.

I think my point is: we really don't have to overthink this that much. We're not the compiler. We're the IDE. It's sufficient to be useful for the common cases, and to also leave it to the user in others.

So, for example, if:

  1. the user is in checked mode.
  2. the user is in VB.
  3. the user doesn't have access to ValueTuple

It's ok if the user then has to write their own GetHashCode implementation. The bug i'm trying to fix here is that we emit code that can overflow without any indication that that can happen.

@CyrusNajmabadi
Copy link
Member Author

Ok. We fallback to using tuples if they're available to the user. Otherwise, we fallback to adding the comment.

@CyrusNajmabadi
Copy link
Member Author

Also updated PR to use System.HashCode if it is available.

@CyrusNajmabadi
Copy link
Member Author

PR has been updated to generate code that i think works in VB if overflow checks are on.

@MaStr11
Copy link
Contributor

MaStr11 commented Jan 19, 2018

to generate code that i think works in VB if overflow checks are on

I wrote a small application and tested this algorithm. These are my findings:

  • No overflow failures in checked code.
  • Distributes the hashes equally.
  • Seems to avoid collisions on adjacent hash codes (e.g. 15 and 10 = -280.053.546 vs 15 and 11 = -280.053.551)

So from my perspective this seems a reasonable well algorithm.

This is what my test suite covered:
Range (i1) with the values: int.MinValue .. int.MinValue + 20, int.MinValue + 214748364 .. int.MinValue + 214748367, -30 .. 109, 214748364 .. 214748370, int.MaxValue-10 .. int.MaxValue

Take every i1 and combine with every i2 in the range int.MinValue .. int.MaxValue. I recorded the number of collisions for each possible hash code:

  • 4.294.966.539 hash codes had 181 collisions.
  • 469 hash codes had 180 collisions.
  • 288 hash codes had 182 collisions.

I also tracked the combinations that produced the most collisions during the test run (to see whether a certain combinations/hash codes are somehow biased).

Max Collisions: 1 at hashcode=143737123, i1=-2147483647, i2=-2147483648
Max Collisions: 2 at hashcode=412172473, i1=-2147483646, i2=-2147483648
Max Collisions: 3 at hashcode=412172476, i1=-2147483646, i2=2147483644
Max Collisions: 4 at hashcode=412172474, i1=-2147483645, i2=-626349352
Max Collisions: 5 at hashcode=412172474, i1=-2147483644, i2=-178957417
Max Collisions: 6 at hashcode=412172474, i1=-2147483643, i2=1342176878
Max Collisions: 7 at hashcode=412172474, i1=-2147483641, i2=-1700091712
Max Collisions: 8 at hashcode=412172474, i1=-2147483640, i2=1610612466
Max Collisions: 9 at hashcode=412172475, i1=-2147483639, i2=-1163220535
Max Collisions: 10 at hashcode=412172474, i1=-2147483638, i2=-1431656124
Max Collisions: 11 at hashcode=412172474, i1=-2147483637, i2=89478171
Max Collisions: 12 at hashcode=412172474, i1=-2147483636, i2=536870122
Max Collisions: 13 at hashcode=949051318, i1=-2147483635, i2=-1700091720
Max Collisions: 14 at hashcode=949051318, i1=-2147483634, i2=-1968527312
Max Collisions: 15 at hashcode=412172474, i1=-2147483633, i2=-984264173
Max Collisions: 16 at hashcode=412172476, i1=-2147483632, i2=-1968527290
Max Collisions: 17 at hashcode=412172476, i1=-2147483631, i2=-447392995
Max Collisions: 18 at hashcode=412172475, i1=-2147483630, i2=-715828584
Max Collisions: 19 at hashcode=2022792662, i1=-2147483629, i2=-1879048059
Max Collisions: 20 at hashcode=1480068200, i1=-2147483628, i2=-1879045949
Max Collisions: 21 at hashcode=1748501294, i1=-1932735284, i2=-2069240031
Max Collisions: 22 at hashcode=949051318, i1=-1932735283, i2=-2146765540
Max Collisions: 23 at hashcode=2022792662, i1=-1932735282, i2=-1341459852
Max Collisions: 24 at hashcode=412172474, i1=-1932735281, i2=-1430937489
Max Collisions: 25 at hashcode=2022792662, i1=-30, i2=-1836323030
Max Collisions: 26 at hashcode=1748501294, i1=-29, i2=-1836320515
...
Max Collisions: 174 at hashcode=1598993337, i1=2147483639, i2=-2104844304
Max Collisions: 175 at hashcode=978068294, i1=2147483640, i2=-2089274146
Max Collisions: 176 at hashcode=2135863434, i1=2147483641, i2=-2104844777
Max Collisions: 177 at hashcode=270009868, i1=2147483642, i2=-2104842050
Max Collisions: 178 at hashcode=1246504196, i1=2147483643, i2=-2089274864
Max Collisions: 179 at hashcode=2051662533, i1=2147483644, i2=-2088897731
Max Collisions: 180 at hashcode=1062123162, i1=2147483645, i2=-2104843769
Max Collisions: 181 at hashcode=803729452, i1=2147483646, i2=-2104840217

This is quite a limited test (It only tested the combination of two hash codes) but better than no test at all.

And I also like how the generated code looks like.

@CyrusNajmabadi
Copy link
Member Author

Sorry, not exactly sure how to interpret this part of your post:

Take every i1 and combine with every i2 in the range int.MinValue .. int.MaxValue. I recorded the number of collisions for each possible hash code:

4.294.966.539 hash codes had 181 collisions.
469 hash codes had 180 collisions.
288 hash codes had 182 collisions.

Can you clarify what you mean by this?

@MaStr11
Copy link
Contributor

MaStr11 commented Jan 19, 2018

I recorded for every hashcode how often it was generated (a collision). The numbers show, that almost all hashcode had 181 collisions.

@CyrusNajmabadi
Copy link
Member Author

I recorded for every hashcode how often it was generated (a collision).

How large was your input size?

@MaStr11
Copy link
Contributor

MaStr11 commented Jan 19, 2018

I took took about 180 numbers (i1) from different portions of the integer range (see above for the exact ranges). I combined each i1 with all integers (i2).

@CyrusNajmabadi
Copy link
Member Author

I combined each i1 with all integers (i2).

And you're saying when you did this, you got around 180 collisions? i.e. you picked a number like 311, and then hashed it against around 2^32 values. And when doing that, you got 180 collisions at the end?

@sharwell
Copy link
Member

@CyrusNajmabadi Let me know today if you want this in 15.7 (would require rebase), otherwise I'll merge it for 15.8. 👍

@sharwell sharwell added this to Waiting to Merge in IDE: sharwell Feb 14, 2018
@CyrusNajmabadi
Copy link
Member Author

What's the eta on 15.8? a few months? or like a year from now? If the former, i can wait. If the latter, i'd be happy to rebase.

@sharwell
Copy link
Member

Your guess is as good as mine 😆

Produce warning for VB code.

Fix spelling mistake.

Add loc files.

Fall back to tuples if they're available when making a hashcode.

Use System.HashCode if its available.

Use 64bit math when computing VB hashcodes in checked contexts.

Use full 32 bits for vb hash.

Cleanup
@CyrusNajmabadi CyrusNajmabadi requested review from a team as code owners February 14, 2018 18:44
@CyrusNajmabadi CyrusNajmabadi changed the base branch from master to dev15.7.x February 14, 2018 18:44
@CyrusNajmabadi
Copy link
Member Author

Ok. I rebased to 15.7.x. Thanks!

@jinujoseph
Copy link
Contributor

@Pilchie for ask mode approval

@agocke
Copy link
Member

agocke commented Feb 14, 2018

@CyrusNajmabadi Why did you request compiler review for this?

@sharwell
Copy link
Member

@agocke The request was automatically applied per the policy in CODE_OWNERS.

@CyrusNajmabadi
Copy link
Member Author

@agocke I didn't :) Indeed, i don't think i can actually request reviewers myself.

@sharwell sharwell added this to the 15.7 milestone Feb 14, 2018
@sharwell
Copy link
Member

@Pilchie for approval

@agocke agocke removed the request for review from a team February 14, 2018 21:22
@agocke
Copy link
Member

agocke commented Feb 14, 2018

@sharwell Thanks. I've dropped compilers, since I don't think we're relevant here.

@Pilchie
Copy link
Member

Pilchie commented Feb 14, 2018

Approved to merge for 15.7

@sharwell sharwell merged commit 55c74cc into dotnet:dev15.7.x Feb 14, 2018
@sharwell sharwell added the Community The pull request was submitted by a contributor who is not a Microsoft employee. label Feb 14, 2018
@sharwell sharwell removed this from Waiting to Merge in IDE: sharwell Feb 14, 2018
@CyrusNajmabadi
Copy link
Member Author

Thanks guys!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Approved to merge Community The pull request was submitted by a contributor who is not a Microsoft employee.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants