Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Randomize ValueTuple hash codes #14141

Merged
merged 3 commits into from Dec 10, 2016
Merged

Conversation

jamesqo
Copy link
Contributor

@jamesqo jamesqo commented Nov 30, 2016

Per the comments in #14046, it was decided that the ValueTuple hash code should be randomized. This will allow for more flexibility if it needs to be changed again in the future.

cc @jkotas, @VSadov, @jcouv, @JonHanna

@@ -6,12 +6,14 @@ namespace System.Numerics.Hashing
{
internal static class HashHelpers
{
public static int RandomSeed { get; } = Guid.NewGuid().GetHashCode();
Copy link
Member

@benaadams benaadams Nov 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should do a random seed for hashing something like?

public static int RandomSeed { get; } = GenerateSeed();
private static int GenerateSeed()
{
    byte[] random = new Byte[sizeof(int)];
    var rng = System.Security.Cryptography.RandomNumberGenerator.Create();
    rng.GetNonZeroBytes(random);
    return BitConverter.ToInt32(random, 0);
}

/cc @blowdart

Copy link
Contributor Author

@jamesqo jamesqo Nov 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benaadams I did this in response to @jkotas' suggestion at https://github.com/dotnet/corefx/issues/14046#issuecomment-263295578. I don't think it is worth it to be too prudent randomizing this; if someone is smart enough, they can figure it out anyway. e.g.

int GetTheRandomSeed()
{
    int tuple = (1, 1);
    return Combine(1, Combine(1, 0)) ^ tuple.GetHashCode();
}

This is just a measure to make sure that if we need to change the algorithm in the future, we have more flexibility to do so.

Copy link
Member

@benaadams benaadams Nov 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not trying to prevent the programmer from deriving the seed - don't have issue with that.

It's to reduce the predictability of the seed; CoCreateGuid isn't completely unpredictable.

Then Guid.GetHashCode() is very straight forward - once the seed is discovered externally, expensive hash collisions could be derived.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related issue https://github.com/dotnet/coreclr/issues/2279

Could be solved by wrapping key in a ValueTuple?

/cc @jskeet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ValueTuple is moved down to CoreLib you might not be able to use the Crypto functions there due to layering.

I think using the Guid to generate a random seed is a good place to start. Having a random hash code gives us the freedom to change it later with a different algorithm if we need to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guid has enough randomness in it, even though it is not completely random. We use the same trick with Guid in https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Random.cs#L154.

Somebody raised concern in offline conversation that guid generation is somewhat expensive operation because of it fetches network card ID, etc. I am not sure how real the problem is. If it is, a good alternative may be "new Random().Next(Int32.MinValue, Int32.MaxValue)". It basically piggy backs on whatever is done to generate the random seed for System.Random.

System.Random also uses Guid right now but it is easy to fix if the guid generation is a concern. It can call "Microsoft.Win32.Random" that is a thin wrapper around /dev/random on Unix and BCryptGenRandom on Windows.

@@ -6,12 +6,14 @@ namespace System.Numerics.Hashing
{
internal static class HashHelpers
{
Copy link
Member

@jkotas jkotas Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be readonly field. Making everything a property make sense to public APIs. For internal APIs, property is just an extra weight and busy work for the JIT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to a static readonly.

@@ -249,7 +249,7 @@ internal static int CombineHashCodes(int h1, int h2)
// Forward to helper class in Common for this
// We keep the actual hashing logic there, so
// other classes can use it for hashing
return HashHelpers.Combine(h1, h2);
return HashHelpers.Combine(HashHelpers.Combine(h1, h2), HashHelpers.RandomSeed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this going to end up incorporating the random seed multiple times into hash codes? For example, now the function:

internal static int CombineHashCodes(int h1, int h2, int h3) 
{
    return CombineHashCodes(CombineHashCodes(h1, h2), h3);
}

is essentially:

internal static int CombineHashCodes(int h1, int h2, int h3) 
{
    int h4 = HashHelpers.Combine(HashHelpers.Combine(h1, h2), HashHelpers.RandomSeed);
    return HashHelpers.Combine(HashHelpers.Combine(h4, h3), HashHelpers.RandomSeed);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reason why it would be better to think about this as hashcode building not hashcode combining... .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub I refactored things so the random seed is only combined once (at the beginning of the hash).

@karelz karelz modified the milestone: 1.2.0 Dec 6, 2016
Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks :-)

FYI we still have a unittest related to hashcodes which is disabled (see https://github.com/dotnet/corefx/issues/10207). I need to take a look to see if it can be re-enabled or should be fixed or removed.

@jcouv jcouv merged commit 15d0033 into dotnet:master Dec 10, 2016
@jamesqo jamesqo deleted the randomized-hashing branch December 10, 2016 00:56
steveharter pushed a commit to steveharter/dotnet_corefx that referenced this pull request Dec 19, 2016
@jcouv
Copy link
Member

jcouv commented Dec 27, 2016

Linking to dotnet/roslyn#13177 (umbrella issue for ValueTuple changes).

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
9 participants