Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Span.SequenceEqual for small buffers. #32364

Merged
merged 1 commit into from
Feb 15, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 55 additions & 15 deletions src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Byte.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1312,28 +1312,32 @@ public static unsafe int LastIndexOfAny(ref byte searchSpace, byte value0, byte
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public static unsafe bool SequenceEqual(ref byte first, ref byte second, nuint length)
{
if (Unsafe.AreSame(ref first, ref second))
goto Equal;

IntPtr offset = (IntPtr)0; // Use IntPtr for arithmetic to avoid unnecessary 64->32->64 truncations
IntPtr lengthToExamine = (IntPtr)(void*)length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like lengthToExamine is cast to a byte* most places it's used. Should it just be stored as one in the first place? Same for offset. (Looking at diff on my phone so maybe I'm just not seeing the reason.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these should be fixed to use nuint/nint. It will be easier once Roslyn adds native support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Picking this up in a follow up PR (for this method)


if (Vector.IsHardwareAccelerated && (byte*)lengthToExamine >= (byte*)Vector<byte>.Count)
if ((byte*)lengthToExamine >= (byte*)sizeof(UIntPtr))
{
lengthToExamine -= Vector<byte>.Count;
while ((byte*)lengthToExamine > (byte*)offset)
// Only check that the ref is the same if buffers are large, and hence
// its worth avoiding doing unnecessary comparisons
if (Unsafe.AreSame(ref first, ref second))
goto Equal;

if (Vector.IsHardwareAccelerated && (byte*)lengthToExamine >= (byte*)Vector<byte>.Count)
{
if (LoadVector(ref first, offset) != LoadVector(ref second, offset))
lengthToExamine -= Vector<byte>.Count;
while ((byte*)lengthToExamine > (byte*)offset)
{
goto NotEqual;
if (LoadVector(ref first, offset) != LoadVector(ref second, offset))
{
goto NotEqual;
}
offset += Vector<byte>.Count;
}
offset += Vector<byte>.Count;
return LoadVector(ref first, lengthToExamine) == LoadVector(ref second, lengthToExamine);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do our perf tests look at comparison lengths larger than one vector and between vector lengths, e.g. 257 if the vector size is 256? I'm curious if/how alignment affects this final comparison, which would seem to generally be unaligned in such a case. Not an issue? I ask simply because in other implementations I've seen us go out of our way to try to align such operations.

Copy link
Member Author

@ahsonkhan ahsonkhan Feb 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do our perf tests look at comparison lengths larger than one vector and between vector lengths, e.g. 257 if the vector size is 256?

No, our perf tests aren't very extensive for some of the APIs. I was told to not bloat the number of test permutations too much. Right now we only test length 512.
We can certainly do one-offs locally though.

Feel free to add more here:
https://github.com/dotnet/performance/blob/1930f660f56f80b0cad5bfc749fe4c46464801f4/src/benchmarks/micro/libraries/System.Memory/Span.cs#L14-L48

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally for me, Vector<byte>.Count = 32.

Here are the results for what's in master atm:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha1-015914
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
  Job-BCFXLD : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  MaxIterationCount=10  MinIterationCount=5  
WarmupCount=3  
Method Length Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
SequenceEqual 32 2.813 ns 0.0692 ns 0.0180 ns 2.817 ns 2.782 ns 2.826 ns - - - -
SequenceEqual 33 3.439 ns 0.1479 ns 0.0880 ns 3.415 ns 3.341 ns 3.556 ns - - - -
SequenceEqual 34 4.010 ns 0.0969 ns 0.0252 ns 4.001 ns 3.993 ns 4.055 ns - - - -
SequenceEqual 63 3.401 ns 0.0959 ns 0.0502 ns 3.414 ns 3.286 ns 3.443 ns - - - -
SequenceEqual 64 3.300 ns 0.0775 ns 0.0201 ns 3.305 ns 3.278 ns 3.327 ns - - - -
SequenceEqual 65 4.118 ns 0.1121 ns 0.0586 ns 4.122 ns 3.993 ns 4.201 ns - - - -
SequenceEqual 256 7.424 ns 0.2525 ns 0.1503 ns 7.441 ns 7.211 ns 7.705 ns - - - -
SequenceEqual 257 7.622 ns 0.2085 ns 0.1379 ns 7.673 ns 7.433 ns 7.848 ns - - - -

}
return LoadVector(ref first, lengthToExamine) == LoadVector(ref second, lengthToExamine);
}

if ((byte*)lengthToExamine >= (byte*)sizeof(UIntPtr))
{
Debug.Assert((byte*)lengthToExamine >= (byte*)sizeof(UIntPtr));

lengthToExamine -= sizeof(UIntPtr);
while ((byte*)lengthToExamine > (byte*)offset)
{
Expand All @@ -1346,11 +1350,39 @@ public static unsafe bool SequenceEqual(ref byte first, ref byte second, nuint l
return LoadUIntPtr(ref first, lengthToExamine) == LoadUIntPtr(ref second, lengthToExamine);
}

while ((byte*)lengthToExamine > (byte*)offset)
Debug.Assert((byte*)lengthToExamine < (byte*)sizeof(UIntPtr));

// On 32-bit, this will never be true since sizeof(UIntPtr) == 4
#if TARGET_64BIT
if ((byte*)lengthToExamine >= (byte*)sizeof(int))
{
if (LoadInt(ref first, offset) != LoadInt(ref second, offset))
{
goto NotEqual;
}
offset += sizeof(int);
lengthToExamine -= sizeof(int);
}
#endif

if ((byte*)lengthToExamine >= (byte*)sizeof(short))
{
if (LoadShort(ref first, offset) != LoadShort(ref second, offset))
{
goto NotEqual;
}
offset += sizeof(short);
lengthToExamine -= sizeof(short);
}

if (lengthToExamine != IntPtr.Zero)
{
Debug.Assert((int)lengthToExamine == 1);

if (Unsafe.AddByteOffset(ref first, offset) != Unsafe.AddByteOffset(ref second, offset))
{
goto NotEqual;
offset += 1;
}
}

Equal:
Expand Down Expand Up @@ -1611,6 +1643,14 @@ private static int LocateLastFoundByte(ulong match)
0x02ul << 40 |
0x01ul << 48) + 1;

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static unsafe short LoadShort(ref byte start, IntPtr offset)
=> Unsafe.ReadUnaligned<short>(ref Unsafe.AddByteOffset(ref start, offset));

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static unsafe int LoadInt(ref byte start, IntPtr offset)
=> Unsafe.ReadUnaligned<int>(ref Unsafe.AddByteOffset(ref start, offset));

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static unsafe UIntPtr LoadUIntPtr(ref byte start, IntPtr offset)
=> Unsafe.ReadUnaligned<UIntPtr>(ref Unsafe.AddByteOffset(ref start, offset));
Expand Down