Improve Span.SequenceEqual for small buffers. #32364

Merged: 1 commit, Feb 15, 2020

@ahsonkhan ahsonkhan commented Feb 15, 2020

Helps #32363

Summary:
better: 6, geomean: 1.168
worse: 1, geomean: 1.216
total diff: 7

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| Threshold.SequenceEqual(Length: 0) | 1.22 | 2.59 | 3.15 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| Threshold.SequenceEqual(Length: 7) | 1.24 | 8.29 | 6.70 | |
| Threshold.SequenceEqual(Length: 6) | 1.22 | 8.05 | 6.62 | |
| Threshold.SequenceEqual(Length: 5) | 1.17 | 7.45 | 6.35 | |
| Threshold.SequenceEqual(Length: 2) | 1.14 | 5.15 | 4.52 | |
| Threshold.SequenceEqual(Length: 3) | 1.14 | 6.08 | 5.36 | |
| Threshold.SequenceEqual(Length: 1) | 1.11 | 4.10 | 3.69 | |
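
For readers checking the summary numbers: the geomean figures appear to be the geometric mean of the per-benchmark ratios (base/diff for the faster rows, diff/base for the slower row). The following is a minimal sketch that recomputes them from the median columns above. The class and member names (`GeoMeanSketch`, `GeoMean`, `Ratios`) are hypothetical and are not part of the tooling that produced the report.

```csharp
using System;

public static class GeoMeanSketch
{
    // (base median, diff median) in ns for Lengths 7, 6, 5, 2, 3, 1,
    // copied from the "Faster" rows of the report.
    public static readonly (double Base, double Diff)[] Faster =
    {
        (8.29, 6.70), (8.05, 6.62), (7.45, 6.35),
        (5.15, 4.52), (6.08, 5.36), (4.10, 3.69),
    };

    // The single "Slower" row (Length 0).
    public static readonly (double Base, double Diff)[] Slower =
    {
        (2.59, 3.15),
    };

    // Turns median pairs into ratios: base/diff when measuring speedups,
    // diff/base when measuring slowdowns.
    public static double[] Ratios((double Base, double Diff)[] rows, bool baseOverDiff)
    {
        var ratios = new double[rows.Length];
        for (int i = 0; i < rows.Length; i++)
        {
            ratios[i] = baseOverDiff ? rows[i].Base / rows[i].Diff
                                     : rows[i].Diff / rows[i].Base;
        }
        return ratios;
    }

    // Geometric mean, computed as exp(mean(log(x))) to avoid a long product.
    public static double GeoMean(double[] ratios)
    {
        double logSum = 0;
        foreach (double r in ratios)
        {
            logSum += Math.Log(r);
        }
        return Math.Exp(logSum / ratios.Length);
    }
}
```

Calling `GeoMeanSketch.GeoMean(GeoMeanSketch.Ratios(GeoMeanSketch.Faster, true))` gives roughly 1.168, and the same call on `Slower` with `false` gives roughly 1.216, which matches the summary to three decimals.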

cc @jkotas, @benaadams, @GrabYourPitchforks

@ahsonkhan ahsonkhan added the area-System.Memory and tenet-performance (Performance related issue) labels Feb 15, 2020
@ahsonkhan ahsonkhan added this to the 5.0 milestone Feb 15, 2020
@ahsonkhan (Member Author)

CI failures are unrelated:
#32377
#32378
#32367

@ahsonkhan ahsonkhan merged commit 8ac93bb into dotnet:master Feb 15, 2020
@ahsonkhan ahsonkhan deleted the OptimizeSeqEqualForSmall branch February 15, 2020 18:28
@benaadams (Member)

This is very good :)

@@ -1312,28 +1312,32 @@ public static unsafe int LastIndexOfAny(ref byte searchSpace, byte value0, byte
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public static unsafe bool SequenceEqual(ref byte first, ref byte second, nuint length)
{
    if (Unsafe.AreSame(ref first, ref second))
        goto Equal;

    IntPtr offset = (IntPtr)0; // Use IntPtr for arithmetic to avoid unnecessary 64->32->64 truncations
    IntPtr lengthToExamine = (IntPtr)(void*)length;

It seems like lengthToExamine is cast to a byte* in most places it's used. Should it just be stored as one in the first place? Same for offset. (I'm looking at the diff on my phone, so maybe I'm just not seeing the reason.)

All these should be fixed to use nuint/nint. It will be easier once Roslyn adds native support.

Picking this up in a follow-up PR (for this method).
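
To illustrate the suggestion above, here is a minimal sketch of offset arithmetic written with native-sized integers instead of `IntPtr` plus `(byte*)` casts. It assumes C# 9 or later, where `nint` is available as a language keyword. The `NativeIntSketch.BytesEqual` helper is hypothetical, compares one byte at a time, and is not the runtime's implementation or the follow-up change mentioned in the thread.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class NativeIntSketch
{
    // Hypothetical helper for illustration only: a scalar comparison loop
    // whose offset and length are nint, so no pointer casts are needed.
    public static bool BytesEqual(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        if (a.Length != b.Length)
        {
            return false;
        }

        ref byte first = ref MemoryMarshal.GetReference(a);
        ref byte second = ref MemoryMarshal.GetReference(b);

        // Same shortcut as the excerpt: identical references are trivially equal.
        if (Unsafe.AreSame(ref first, ref second))
        {
            return true;
        }

        nint length = a.Length; // widened once, then used natively

        for (nint offset = 0; offset < length; offset++)
        {
            // nint is the same type as IntPtr, so this binds to
            // Unsafe.Add<T>(ref T, IntPtr) without any explicit cast.
            if (Unsafe.Add(ref first, offset) != Unsafe.Add(ref second, offset))
            {
                return false;
            }
        }

        return true;
    }
}
```

Comparisons such as `offset < length` and increments such as `offset++` work directly on `nint`, which is the readability gain the comment is pointing at.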

}
offset += Vector<byte>.Count;
return LoadVector(ref first, lengthToExamine) == LoadVector(ref second, lengthToExamine);

Do our perf tests look at comparison lengths larger than one vector and between vector lengths, e.g. 257 if the vector size is 256? I'm curious if/how alignment affects this final comparison, which would seem to generally be unaligned in such a case. Not an issue? I ask simply because in other implementations I've seen us go out of our way to try to align such operations.
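
For context on the question, the following is a simplified sketch, in safe code, of the general pattern the quoted `return` statement appears to follow: compare full vectors from the start, then finish with one vector load that ends exactly at the last byte. That final load overlaps bytes already compared and starts at `length - Vector<byte>.Count`, so it is generally unaligned. `OverlappingTailSketch` is a hypothetical name, and this is not the code under review.

```csharp
using System;
using System.Numerics;

public static class OverlappingTailSketch
{
    // Hypothetical illustration of "vector loop + overlapping final vector".
    public static bool BytesEqual(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        if (a.Length != b.Length)
        {
            return false;
        }

        int length = a.Length;
        int width = Vector<byte>.Count; // e.g. 32 on an AVX2 machine

        if (length < width)
        {
            // Too short for a single vector: fall back to a scalar loop.
            for (int i = 0; i < length; i++)
            {
                if (a[i] != b[i])
                {
                    return false;
                }
            }
            return true;
        }

        // Start position of the final vector, which ends at the last byte.
        int last = length - width;

        for (int offset = 0; offset < last; offset += width)
        {
            // Vector != is true if any element differs.
            if (new Vector<byte>(a.Slice(offset)) != new Vector<byte>(b.Slice(offset)))
            {
                return false;
            }
        }

        // Final comparison: may re-check up to width - 1 bytes already covered.
        return new Vector<byte>(a.Slice(last)) == new Vector<byte>(b.Slice(last));
    }
}
```

With a length of 257 and a 32-byte vector, the loop covers offsets 0 through 224 and the final load starts at offset 225, which is the "between vector lengths" case raised in the question.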

@ahsonkhan (Member Author) commented Feb 15, 2020

> Do our perf tests look at comparison lengths larger than one vector and between vector lengths, e.g. 257 if the vector size is 256?

No, our perf tests aren't very extensive for some of the APIs. I was told to not bloat the number of test permutations too much. Right now we only test length 512.
We can certainly do one-offs locally though.

Feel free to add more here:
https://github.com/dotnet/performance/blob/1930f660f56f80b0cad5bfc749fe4c46464801f4/src/benchmarks/micro/libraries/System.Memory/Span.cs#L14-L48

Locally for me, Vector<byte>.Count = 32.

Here are the results for what's in master atm:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha1-015914
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
  Job-BCFXLD : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  MaxIterationCount=10  MinIterationCount=5  
WarmupCount=3  
| Method | Length | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SequenceEqual | 32 | 2.813 ns | 0.0692 ns | 0.0180 ns | 2.817 ns | 2.782 ns | 2.826 ns | - | - | - | - |
| SequenceEqual | 33 | 3.439 ns | 0.1479 ns | 0.0880 ns | 3.415 ns | 3.341 ns | 3.556 ns | - | - | - | - |
| SequenceEqual | 34 | 4.010 ns | 0.0969 ns | 0.0252 ns | 4.001 ns | 3.993 ns | 4.055 ns | - | - | - | - |
| SequenceEqual | 63 | 3.401 ns | 0.0959 ns | 0.0502 ns | 3.414 ns | 3.286 ns | 3.443 ns | - | - | - | - |
| SequenceEqual | 64 | 3.300 ns | 0.0775 ns | 0.0201 ns | 3.305 ns | 3.278 ns | 3.327 ns | - | - | - | - |
| SequenceEqual | 65 | 4.118 ns | 0.1121 ns | 0.0586 ns | 4.122 ns | 3.993 ns | 4.201 ns | - | - | - | - |
| SequenceEqual | 256 | 7.424 ns | 0.2525 ns | 0.1503 ns | 7.441 ns | 7.211 ns | 7.705 ns | - | - | - | - |
| SequenceEqual | 257 | 7.622 ns | 0.2085 ns | 0.1379 ns | 7.673 ns | 7.433 ns | 7.848 ns | - | - | - | - |

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020