Conversation
@@ -6,12 +6,14 @@ namespace System.Numerics.Hashing | |||
{ | |||
internal static class HashHelpers | |||
{ | |||
public static int RandomSeed { get; } = Guid.NewGuid().GetHashCode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should do a random seed for hashing something like?
public static int RandomSeed { get; } = GenerateSeed();
private static int GenerateSeed()
{
byte[] random = new Byte[sizeof(int)];
var rng = System.Security.Cryptography.RandomNumberGenerator.Create();
rng.GetNonZeroBytes(random);
return BitConverter.ToInt32(random, 0);
}
/cc @blowdart
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benaadams I did this in response to @jkotas' suggestion at https://github.com/dotnet/corefx/issues/14046#issuecomment-263295578. I don't think it is worth it to be too prudent randomizing this; if someone is smart enough, they can figure it out anyway. e.g.
int GetTheRandomSeed()
{
int tuple = (1, 1);
return Combine(1, Combine(1, 0)) ^ tuple.GetHashCode();
}
This is just a measure to make sure that if we need to change the algorithm in the future, we have more flexibility to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not trying to prevent the programmer from deriving the seed - don't have issue with that.
It's to reduce the predictability of the seed; CoCreateGuid
isn't completely unpredictable.
Then Guid.GetHashCode()
is very straight forward - once the seed is discovered externally, expensive hash collisions could be derived.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related issue https://github.com/dotnet/coreclr/issues/2279
Could be solved by wrapping key in a ValueTuple?
/cc @jskeet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ValueTuple is moved down to CoreLib you might not be able to use the Crypto functions there due to layering.
I think using the Guid to generate a random seed is a good place to start. Having a random hash code gives us the freedom to change it later with a different algorithm if we need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guid has enough randomness in it, even though it is not completely random. We use the same trick with Guid in https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Random.cs#L154
.
Somebody raised concern in offline conversation that guid generation is somewhat expensive operation because of it fetches network card ID, etc. I am not sure how real the problem is. If it is, a good alternative may be "new Random().Next(Int32.MinValue, Int32.MaxValue)". It basically piggy backs on whatever is done to generate the random seed for System.Random
.
System.Random
also uses Guid right now but it is easy to fix if the guid generation is a concern. It can call "Microsoft.Win32.Random" that is a thin wrapper around /dev/random
on Unix and BCryptGenRandom on Windows.
@@ -6,12 +6,14 @@ namespace System.Numerics.Hashing | |||
{ | |||
internal static class HashHelpers | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be readonly field. Making everything a property make sense to public APIs. For internal APIs, property is just an extra weight and busy work for the JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to a static readonly.
@@ -249,7 +249,7 @@ internal static int CombineHashCodes(int h1, int h2) | |||
// Forward to helper class in Common for this | |||
// We keep the actual hashing logic there, so | |||
// other classes can use it for hashing | |||
return HashHelpers.Combine(h1, h2); | |||
return HashHelpers.Combine(HashHelpers.Combine(h1, h2), HashHelpers.RandomSeed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this going to end up incorporating the random seed multiple times into hash codes? For example, now the function:
internal static int CombineHashCodes(int h1, int h2, int h3)
{
return CombineHashCodes(CombineHashCodes(h1, h2), h3);
}
is essentially:
internal static int CombineHashCodes(int h1, int h2, int h3)
{
int h4 = HashHelpers.Combine(HashHelpers.Combine(h1, h2), HashHelpers.RandomSeed);
return HashHelpers.Combine(HashHelpers.Combine(h4, h3), HashHelpers.RandomSeed);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the reason why it would be better to think about this as hashcode building not hashcode combining... .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub I refactored things so the random seed is only combined once (at the beginning of the hash).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM Thanks :-)
FYI we still have a unittest related to hashcodes which is disabled (see https://github.com/dotnet/corefx/issues/10207). I need to take a look to see if it can be re-enabled or should be fixed or removed.
Merging on behalf of @jamesqo
Linking to dotnet/roslyn#13177 (umbrella issue for ValueTuple changes). |
Merging on behalf of @jamesqo Commit migrated from dotnet/corefx@15d0033
Per the comments in #14046, it was decided that the
ValueTuple
hash code should be randomized. This will allow for more flexibility if it needs to be changed again in the future.cc @jkotas, @VSadov, @jcouv, @JonHanna