Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Span-based HashAlgorithm fails seemingly randomly #27088

Closed
mrmartan opened this issue Aug 8, 2018 · 6 comments
Closed

Span-based HashAlgorithm fails seemingly randomly #27088

mrmartan opened this issue Aug 8, 2018 · 6 comments
Milestone

Comments

@mrmartan
Copy link

mrmartan commented Aug 8, 2018

I've recently noticed that .NET Core 2.1 span-based hash computation sometimes generates a result of 0000000000000000000000000000000000000000000000000000000000000000. It does not depend on the input and repeated runs with the same input do produce correct results. I also noticed this is happening only on Linux (docker container based on microsoft/dotnet:2.1.2-aspnetcore-runtime and Microsoft.AspNetCore.App 2.1.2), or rather it did not happen on Windows for me.

According to application logs this is happening in 1 out of 10k cases. Access to the instance of the class is multi-threaded.

public class SubjectIdentityBuilder : IDisposable
{
private readonly SHA256 sha;
private readonly uint[] lookupTable;
private readonly ReadOnlyMemory<byte> salt;
private readonly ReadOnlyMemory<byte> vs = new byte[32]; // byte array of 0s

public SubjectIdentityBuilder()
{
    this.sha = SHA256.Create();
    this.lookupTable = CreateLookup32();
    this.salt = Encoding.Unicode.GetBytes("b398b5d74421a180817d2da5eb7c9b1b").AsMemory();
}

public string ComputeSubjectIdentityValue(string email)
{
    if (string.IsNullOrWhiteSpace(email))
    {
        throw new ArgumentNullException(nameof(email));
    }

    // decode to bytes
    // using Unicode/UTF-16 because subject identity hashes are sometimes calculated in SQL Server SPs (they work with UTF-16)
    var source = Encoding.Unicode.GetBytes(email).AsSpan();
            
    // add salt
    Span<byte> hashInput = stackalloc byte[source.Length + salt.Length];
    source.TryCopyTo(hashInput);
    salt.Span.TryCopyTo(hashInput.Slice(source.Length));            

    // compute hash
    Span<byte> hash = stackalloc byte[32];
    if (!this.sha.TryComputeHash(hashInput, hash, out var _))
    {
        throw new CryptographicException($"Failed to compute hash for '{email}'");
    }

    if (hash.SequenceEqual(vs.Span))
    {
        throw new SubjectIdentityBuilderComputationException($"Failed to compute hash for '{email}'");
    }

    // encode as hex string
    return ByteSpanToHexString(hash);
}

#region https://stackoverflow.com/a/24343727
private string ByteSpanToHexString(Span<byte> sourceArray)
{
    Span<char> result = stackalloc char[sourceArray.Length * 2];
    for (int i = 0; i < sourceArray.Length; i++)
    {
        var val = this.lookupTable[sourceArray[i]];
        result[2 * i] = (char)val;
        result[2 * i + 1] = (char)(val >> 16);
    }

    return new string(result);
}

private static uint[] CreateLookup32()
{
    var l = new uint[256];
    for (int i = 0; i < 256; i++)
    {
        string s = i.ToString("X2");
        var v = ((uint)s[0]) + ((uint)s[1] << 16);
        l[i] = v;
    }
    return l;
}
#endregion

#region Generated IDisposable Support
        
// omitted

#endregion
}
@vcsjones
Copy link
Member

vcsjones commented Aug 8, 2018

On macOS at least, it appears that SHA256.Create does not create a thread-safe hash object.

If I do this:

var bag = new ConcurrentBag<string>();
var plo = new ParallelOptions { MaxDegreeOfParallelism = 4 };
var builder = new SubjectIdentityBuilder();
Parallel.For(0, 1_000, plo, () => new List<string>(),  (i, pls, list) => {
    list.Add(builder.ComputeSubjectIdentityValue("kevin@vcsjones.com"));
    return list;
}, list => {
    list.ForEach(bag.Add);
});
var distinct = bag.Distinct().Count();
Console.WriteLine(distinct);

Which uses the SHA256 object of the SubjectIdentityBuilder with multiple threads, the distinct result will be all other the place, like ~500 distinct results even though the hashing is deterministic. If I put a lock around the use of the SHA256:

lock (this.sha)
{
    if (!this.sha.TryComputeHash(hashInput, hash, out var _))
    {
         throw new CryptographicException($"Failed to compute hash for '{email}'");
    }
}

then I start getting a correct distinct value of 1.

@filipnavara
Copy link
Member

I don't think HashAlgorithm.ComputeHash or TryComputeHash was ever thread-safe.

In .NET Framework it was just a nice way of combining HashCore+HashFinal+Initialize into one nice call. The fact remains that the state between HashCore and HashFinal is still held by the class instance and hence it is not thread-safe.

@vcsjones
Copy link
Member

vcsjones commented Aug 8, 2018

Seems that way.

https://github.com/dotnet/corefx/blob/57608f6a5cfeadf338dc6c5d0300147b39168012/src/System.Security.Cryptography.Primitives/src/System/Security/Cryptography/HashAlgorithm.cs#L64-L65

These days there is HashProvider that holds the state of the hash.

@mrmartan your issue is likely because sharing a hashing instance is not thread safe.

@mrmartan
Copy link
Author

mrmartan commented Aug 8, 2018

Thanks for the info.

I half suspected the issue would be thread-safety. Can't tell what led me to believe HashAlgorithm was thread-safe. the documentation states non-static members are not guaranteed to be thread-safe.
HashAlgorithm is disposable though and I assume creating millions of its instances is not going to be cheap (and I need to create millions of hashes in a cheap way).

The HashProvider seems to be the way to go. All I was able to find were the sources for the internal abstract class HashProvider and nothing about using it for SHA256. All documentation seems to point to HashAlgorithm-based implementations.

@vcsjones Can you point me in the right direction?

@vcsjones
Copy link
Member

vcsjones commented Aug 8, 2018

Sorry, I didn't mean to say that HashProvider is the solution. I would simply change your code to use a new instance of SHA256. You could do something more clever like use ThreadLocal to have a per-thread instance.

@mrmartan
Copy link
Author

mrmartan commented Aug 8, 2018

Oh, ok. Seems usable. It's a bummer though (if I understand you correctly) that there is no thread-safe implementation of SHA hashing algorithms in .NET available.

@mrmartan mrmartan closed this as completed Aug 8, 2018
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants