-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MemoryExtensions.GetHashCode(ReadOnlySpan<Char>, StringComparison) Method #50322
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
To make sure I understand the request here, you're asking for functionality equivalent to the existing string.GetHashCode but shipped in a separate nuget targeting downlevel? |
Tagging subscribers to this area: @tannergooding, @pgovind, @GrabYourPitchforks Issue DetailsBackground and Motivation
It should also include a
The reason is obvious: Proposed APInamespace System
{
public static class MemoryExtensions {
+ public static bool GetHashCode(this ReadOnlySpan<char> span, StringComparison comparisonType) {
+ ...
+ }
}
} Usage ExamplesReadOnlySpan<char> span = ...;
int hashCode = span.GetHashCode(StringComparision.OrdinalIgnoreCase); Alternative DesignsI believe this is the best place to add this API. RisksDon't see any risk - already have
|
I don't think we should add this API. The downlevel performance would be extremely poor, as it would require converting the spans back into a string instance and calling |
(I agree, just trying to ensure I understand the request. ) |
Yes, the request is to ship functionality equivalent to the existing The real use case is a custom JSON deserializer which requires a custom Dictionary to lookup JSON property by UTF8 span name, compares ordinal or ordinal case insensitive. It's a library should target downlevel to .Net Standard 2.0 and Framework 4.6.1 for broader adoption. Performance is critical so converting the spans back into a string instance is not acceptable (it also destroys the whole point of providing this API). Returning a different value than
|
Unfortunately I don't think this is something we can deliver in a downlevel package. It would rely on APIs like Honestly I think your best bet is when running downlevel to use a regular string-based dictionary. Yes, it'll take a perf hit, but at least it will work as expected. If that's truly not acceptable for your scenario, you could clone some of the logic from the implementations in this repo. For instance, if all you need is |
Tagging subscribers to this area: @GrabYourPitchforks, @carlossanlop Issue DetailsBackground and Motivation
It should also include a
The reason is obvious: Proposed APInamespace System
{
public static class MemoryExtensions {
+ public static int GetHashCode(this ReadOnlySpan<char> span, StringComparison comparisonType) {
+ ...
+ }
}
} Usage ExamplesReadOnlySpan<char> span = ...;
int hashCode = span.GetHashCode(StringComparision.OrdinalIgnoreCase); Alternative DesignsI believe this is the best place to add this API. RisksDon't see any risk - already have
|
@weifenluo maybe for interest: #27229 and #2010 |
@GrabYourPitchforks Thanks a lot for your great explanation and advice. I managed to work out the solution which falling back to string only when there is non-ASCII char. After all, JSON property name rarely contains non-ASCII char.
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
internal static partial class MemoryExtensions
{
// A span-based equivalent of String.GetHashCode(StringComparison). Uses the specified comparison type.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int GetHashCode(this ReadOnlySpan<char> value, StringComparison comparisonType)
{
switch (comparisonType)
{
case StringComparison.Ordinal:
return GetHashCode(value);
case StringComparison.OrdinalIgnoreCase:
return GetHashCodeOrdinalIgnoreCase(value);
default:
throw new ArgumentException($"Comparison type '{comparisonType}' is not supported.", nameof(comparisonType));
}
}
// A span-based equivalent of String.GetHashCode(). Computes an ordinal hash code.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static int GetHashCode(ReadOnlySpan<char> value)
{
ulong seed = Marvin.DefaultSeed;
return Marvin.ComputeHash32(MemoryMarshal.AsBytes(value), seed);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static int GetHashCodeOrdinalIgnoreCase(ReadOnlySpan<char> value)
{
ulong seed = Marvin.DefaultSeed;
return Marvin.ComputeHash32OrdinalIgnoreCase(ref MemoryMarshal.GetReference(value), value.Length /* in chars, not bytes */, (uint)seed, (uint)(seed >> 32));
}
} This goes down to Marvin.ComputeHash32 and Marvin.ComputeHash32OrdinalIgnoreCase, with two small tricks:
private static ulong GenerateSeed()
{
//***********************************************************
// Replace original implementation using Interop
// with hashcode of two GUIDs.
//***********************************************************
//ulong seed;
//Interop.GetRandomBytes((byte*)&seed, sizeof(ulong));
//return seed;
var hashCode1 = (ulong)Guid.NewGuid().GetHashCode();
var hashCode2 = (ulong)Guid.NewGuid().GetHashCode();
return (hashCode1 << 32) & hashCode2;
} and private static unsafe int ComputeHash32OrdinalIgnoreCaseSlow(ref char data, int count, uint p0, uint p1)
{
Debug.Assert(count > 0);
//****************************************************************************
// The original implementation does not work because Ordinal.ToUpperOrdinal
// will hit on GlobalizationMode.Invariant and GlobalizationMode.UseNls
// which are OS-specific
//***************************************************************************
//char[]? borrowedArr = null;
//Span<char> scratch = (uint)count <= 64 ? stackalloc char[64] : (borrowedArr = ArrayPool<char>.Shared.Rent(count));
//int charsWritten = System.Globalization.Ordinal.ToUpperOrdinal(new ReadOnlySpan<char>(ref data, count), scratch);
//Debug.Assert(charsWritten == count); // invariant case conversion should involve simple folding; preserve code unit count
//// Slice the array to the size returned by ToUpperInvariant.
//// Multiplication below will not overflow since going from positive Int32 to UInt32.
//int hash = ComputeHash32(ref Unsafe.As<char, byte>(ref MemoryMarshal.GetReference(scratch)), (uint)charsWritten * 2, p0, p1);
//// Return the borrowed array if necessary.
//if (borrowedArr != null)
//{
// ArrayPool<char>.Shared.Return(borrowedArr);
//}
//return hash;
//*****************************************************************************
// The only way to calculate ordinal ignore case hashcode for non-ASCII string
// is using StringComparer.OrdinalIgnoreCase.GetHashCode() because
// there is no ToUpperOrdinal API for string.
// The result is different from the original implementation, it's ok
// because we will use the hash code privately only.
//*****************************************************************************
var charSpan = new ReadOnlySpan<char>(Unsafe.AsPointer(ref data), count);
string s = charSpan.ToString();
int hash = StringComparer.OrdinalIgnoreCase.GetHashCode(s);
return HashCode.Combine(p0, p1, hash);
} |
@gfoidl Wow, didn't expect coming across a world champion here! Thanks a lot for your information, which helped me to realize that |
@weifenluo I'm glad you found a solution that works for you! Please remember to have your code security-reviewed, as Marvin32 is intended to be a security-sensitive algorithm, and calling |
Background and Motivation
MemoryExtensions
hasEquals
extension method to compare twoReadOnlySpan<char>
structs:It should also include a
GetHashCode
method forReadOnlySpan<char>
struct:The reason is obvious:
Equals
andGetHashCode
should appear in pair. This API is helpful when writing libraries targeting legacy .Net Framework or .Net Standard 2.0 for broader audience, whereString.GetHashCode(ReadOnlySpan<char>, StringComparison)
is not available.Proposed API
Usage Examples
Alternative Designs
I believe this is the best place to add this API.
Risks
Don't see any risk - already have
Equals
any way. I believe the implementation is already there, only a few lines of code needs to be added.The text was updated successfully, but these errors were encountered: