Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a method consuming ReadOnlySpan<byte> to System.HashCode #48702

Closed
eiriktsarpalis opened this issue Feb 24, 2021 · 10 comments · Fixed by #54168
Closed

Add a method consuming ReadOnlySpan<byte> to System.HashCode #48702

eiriktsarpalis opened this issue Feb 24, 2021 · 10 comments · Fixed by #54168
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Milestone

Comments

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Feb 24, 2021

Background and Motivation

Originally posted by @GrabYourPitchforks in #48410 (comment)

Undefined meaning they don't work today on all supported platforms, or meaning we won't promise they'll work in the future?

With very specific exceptions, they're explicitly not tested for this scenario today even on supported platforms, there are no guarantees about future behavior. See also #44497.

IMO a better course of action would be to add ROS<byte>-consuming APIs to System.HashCode. It could be used both by this and by a bunch of other scenarios, such as if we ever need to add randomized hash code calculation to other primitives like double.

Proposed API

namespace System
{
    public struct HashCode
    {
         public void Add(ReadOnlySpan<byte> value);
    }
}

Usage Examples

Useful in defining equality comparison for buffers, which has a few applications in libraries code. See #48410 for context.

Risks

None known.

cc @stephentoub

@eiriktsarpalis eiriktsarpalis added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime labels Feb 24, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 24, 2021
@GrabYourPitchforks
Copy link
Member

Thanks! Might want to add an instance member as well so that hash codes of complex data types can be created.

@eiriktsarpalis
Copy link
Member Author

Something like this you mean?

public struct HashCode
{
    public void Add(ReadOnlySpan<byte> ros);
}

@GrabYourPitchforks
Copy link
Member

Yes. That would be required if we wanted to solve #4761 for complex key types like tuples. For instance, a collision-resistant hash code routine for ValueTuple<long, long> could be:

var hashCode = new HashCode();
hashCode.Add(in tuple.Item1); // call to internal API
hashCode.Add(in tuple.Item2); // call to internal API
return hashCode.ToHashCode();

With the following internal implementation APIs:

public struct HashCode
{
    internal void Add<T>(in T value) where T : INonPigeonholingHashCode;
}

internal interface INonPigeonholingHashCode
{
    void AddHashCodeTo(ref HashCode hashCodeCombiner);
}

public readonly struct Int64 : INonPigeonholingHashCode
{
    void INonPigeonholingHashCode.AddHashCodeTo(ref HashCode hashCodeCombiner)
    {
        // predicated on HashCode.Add(ROS<byte>) existing
        hashCodeCombiner.Add(MemoryMarshal.AsBytes(MemoryMarshal.AsReadOnlySpan(ref this, 1));
    }
}

@eiriktsarpalis
Copy link
Member Author

Would it make sense to also add a generic overload that takes ReadOnlySpan<T>?

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Feb 24, 2021
@GrabYourPitchforks
Copy link
Member

Probably wouldn't do generic ROS<T> until there's a need for it. Then we'd need to have a whole discussion on what the appropriate behavior is: loop, try to convert to ROS<byte> if it's a bunch of integral primitives, etc.

@stephentoub
Copy link
Member

I don't have a strong opinion, but is public static int GetHashCode(ReadOnlySpan<byte> ros); common enough that it's needed to enable a one-liner, rather than having someone write:

HashCode h = default;
h.Add(ros);
return h.ToHashCode();

? Or is the perf of that sufficiently different that it warrants a dedicated implementation?

@GrabYourPitchforks
Copy link
Member

If there is a perf difference I imagine it'll be trivial. There's nothing stopping us from adding the one-shot later if people ask for it.

@eiriktsarpalis eiriktsarpalis added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Feb 25, 2021
@eiriktsarpalis eiriktsarpalis self-assigned this Feb 25, 2021
@eiriktsarpalis eiriktsarpalis added area-System.Linq and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Feb 25, 2021
@ghost
Copy link

ghost commented Feb 25, 2021

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and Motivation

Originally posted by @GrabYourPitchforks in #48410 (comment)

Undefined meaning they don't work today on all supported platforms, or meaning we won't promise they'll work in the future?

With very specific exceptions, they're explicitly not tested for this scenario today even on supported platforms, there are no guarantees about future behavior. See also #44497.

IMO a better course of action would be to add ROS<byte>-consuming APIs to System.HashCode. It could be used both by this and by a bunch of other scenarios, such as if we ever need to add randomized hash code calculation to other primitives like double.

Proposed API

namespace System
{
    public struct HashCode
    {
         public int Add(ReadOnlySpan<byte> ros);
    }
}

Usage Examples

Useful in defining equality comparison for buffers, which has a few applications in libraries code. See #48410 for context.

Risks

None known.

cc @stephentoub

Author: eiriktsarpalis
Assignees: eiriktsarpalis
Labels:

api-ready-for-review, area-System.Linq, area-System.Runtime

Milestone: -

@Sergio0694
Copy link
Contributor

This would be very nice to have 😄

Related, we currently have the Microsoft.Toolkit.HighPerformance.Helper.HashCode<T>.Combine(ReadOnlySpan<T>) API as a workaround for the lack of this primitive in the BCL, supporting both managed and unmanaged types. For the former, I just run an unrolled DJB2 hash on the input span and then combine that with HashCode.Combine to make it non-deterministic, whereas for unmanaged types I have a vectorized implementation using Vector<T> (here), and combine the result with HashCode as before.
(Probably not the best possible collision-resistant solution ever as it's based on DJB2, but it's decent and it's quite fast compared to eg. a non-vectorized implementation or just doing HashCode.Combine(T) in a loop. This is especially important since this code is also internally used by StringPool to hash the input character sequences (source here), and minimizing that overhead is a priority).

If this API proposal gets approved I'd be happy to have the implementation in the HighPerformance converge to this on .NET 6 (plan to add a .NET 6 target to the package soon), and eventually just obsolete that if it becomes completely redundant with these new APIs in the BCL (depending on whether you also decide to add support for types other than byte as well).

@stephentoub stephentoub added this to the 6.0.0 milestone May 28, 2021
@stephentoub stephentoub added the blocking Marks issues that we want to fast track in order to unblock other important work label Jun 13, 2021
@terrajobst
Copy link
Member

terrajobst commented Jun 14, 2021

Video

  • We don't believe we'll ever need to add Add<T>(ReadOnlySpan<T> value)
  • We don't want to constrain the implementation to do the same as calling Add<T>(T value) in a loop
namespace System
{
    public struct HashCode
    {
         public void Add(ReadOnlySpan<byte> value);
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work labels Jun 14, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 14, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 15, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants