Vectorize the CRC64 implementation #85221
Tagging subscribers to this area: @dotnet/area-system-io
/cc @tannergooding
LGTM, very impressive improvements, @brantburnett!
Vector128<ulong> x6 = CarrylessMultiplyLower(x2, x0);
Vector128<ulong> x7 = CarrylessMultiplyLower(x3, x0);
Vector128<ulong> x8 = CarrylessMultiplyLower(x4, x0);
x5 = VectorHelper.CarrylessMultiplyLower(x1, x0);
It was a good idea to move these methods to another type and reuse them. 👍
To avoid having to add the type name everywhere these methods are used, you could put a using static directive at the top of the file:
using static System.IO.Hashing.VectorHelper;
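A minimal sketch of the using static suggestion, assuming a hypothetical ClmulHelper class standing in for VectorHelper. The PR's helper presumably wraps hardware instructions (PCLMULQDQ on x86/x64, PMULL on Arm64); to stay portable, this sketch instead uses a scalar software carryless multiply of two 64-bit values, so ClmulHelper, CarrylessMultiply and Caller.Fold are illustrative names, not the library's API.

```csharp
// Brings the static members of the helper into scope, in the same way as
// "using static System.IO.Hashing.VectorHelper;" would in the PR.
using static ClmulSketch.ClmulHelper;

namespace ClmulSketch
{
    // Hypothetical helper type standing in for VectorHelper.
    internal static class ClmulHelper
    {
        // Carryless (GF(2) polynomial) multiply of two 64-bit values.
        // Returns the 128-bit product split into high and low halves.
        public static (ulong Hi, ulong Lo) CarrylessMultiply(ulong a, ulong b)
        {
            ulong hi = 0, lo = 0;
            for (int i = 0; i < 64; i++)
            {
                if (((b >> i) & 1) != 0)
                {
                    // XOR (not add) "a shifted left by i" into the 128-bit result.
                    lo ^= a << i;
                    if (i != 0)
                    {
                        // Guard needed: C# masks shift counts, so a >> 64 == a.
                        hi ^= a >> (64 - i);
                    }
                }
            }
            return (hi, lo);
        }
    }

    internal static class Caller
    {
        public static (ulong Hi, ulong Lo) Fold(ulong x, ulong k)
        {
            // No "ClmulHelper." prefix needed thanks to the using static directive.
            return CarrylessMultiply(x, k);
        }
    }
}
```

Because there are no carries, (x + 1)(x + 1) = x^2 + 1 over GF(2), so multiplying 3 by 3 yields 5 rather than 9.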
// Work with a reference to where we're at in the ReadOnlySpan and a local length
// to avoid extraneous range checks.
ref byte srcRef = ref MemoryMarshal.GetReference(source);
Personally, I would prefer to store a reference to ulong rather than byte and, in every loop iteration, update the index rather than the reference. However, since a similar pattern was used in #83321 and approved by people more knowledgeable in this area, I won't insist on it.
- ref byte srcRef = ref MemoryMarshal.GetReference(source);
+ ref ulong srcRef = ref Unsafe.As<byte, ulong>(ref MemoryMarshal.GetReference(source));
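The two traversal styles discussed above can be compared with a small sketch that XOR-folds every full 8-byte word of a span. The class and method names are hypothetical, and XOR-folding is only a stand-in workload, not the CRC algorithm. The index-based variant assumes a platform that tolerates unaligned 8-byte loads (true on x64 and Arm64), which is the same assumption the suggested Unsafe.As reinterpretation makes.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

internal static class TraversalSketch
{
    // Style used in the PR: keep a ref byte plus a local length and advance the ref.
    public static ulong XorWordsByAdvancingRef(ReadOnlySpan<byte> source)
    {
        ref byte srcRef = ref MemoryMarshal.GetReference(source);
        int length = source.Length;
        ulong acc = 0;
        while (length >= sizeof(ulong))
        {
            // ReadUnaligned makes no alignment assumption about srcRef.
            acc ^= Unsafe.ReadUnaligned<ulong>(ref srcRef);
            srcRef = ref Unsafe.Add(ref srcRef, sizeof(ulong));
            length -= sizeof(ulong);
        }
        return acc; // trailing bytes (< 8) are ignored in this sketch
    }

    // Style suggested in review: reinterpret once as ref ulong, then index.
    public static ulong XorWordsByIndex(ReadOnlySpan<byte> source)
    {
        ref ulong srcRef = ref Unsafe.As<byte, ulong>(ref MemoryMarshal.GetReference(source));
        int wordCount = source.Length / sizeof(ulong);
        ulong acc = 0;
        for (int i = 0; i < wordCount; i++)
        {
            // Assumes unaligned 8-byte loads are tolerated (x64, Arm64).
            acc ^= Unsafe.Add(ref srcRef, i);
        }
        return acc; // trailing bytes (< 8) are ignored in this sketch
    }
}
```

Both variants skip bounds checks; the difference is mostly readability and whether the JIT sees a single base reference with an induction variable or a reference that is reassigned each iteration.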
This significantly improves performance for System.IO.Hashing.Crc64 when the source span is 16 bytes or longer on Intel x86/x64 and modern ARM architectures. The vectorization change applies only to .NET 7 and later targets of System.IO.Hashing because it uses some Vector128 APIs added in .NET 7.
This is a continuation of the work done in #83321, which added vectorization to CRC32.
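For context, the sketch below is a scalar, bit-at-a-time CRC-64, the simplest (and slowest) form of the computation being accelerated. It assumes the ECMA-182 parameters (polynomial 0x42F0E1EBA9EA3693, MSB-first, zero initial value, no final XOR); the library's own scalar path may well be table-driven instead, and Crc64Sketch is a hypothetical name. The vectorized path presumably replaces this per-bit loop with carryless-multiply folding over 16-byte blocks, which would explain why the speedup applies only to inputs of at least 16 bytes.

```csharp
using System;

internal static class Crc64Sketch
{
    // ECMA-182 generator polynomial (assumed), without the implicit x^64 term.
    private const ulong Polynomial = 0x42F0E1EBA9EA3693;

    // Bit-at-a-time, MSB-first CRC-64 with zero initial value and no final XOR.
    // Passing a previous result as 'crc' continues an incremental computation.
    public static ulong Compute(ReadOnlySpan<byte> data, ulong crc = 0)
    {
        foreach (byte b in data)
        {
            // Bring the next byte into the top 8 bits of the register.
            crc ^= (ulong)b << 56;
            for (int bit = 0; bit < 8; bit++)
            {
                // If the top bit is set, shift it out and reduce by the polynomial.
                crc = (crc & 0x8000_0000_0000_0000UL) != 0
                    ? (crc << 1) ^ Polynomial
                    : crc << 1;
            }
        }
        return crc;
    }
}
```

With zero initialization and no final XOR, hashing "12345" and then continuing with "6789" gives the same result as hashing "123456789" in one call.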
BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.1631)
Intel Core i7-10850H CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=8.0.100-preview.3.23178.7
[Host] : .NET 8.0.0 (8.0.23.17408), X64 RyuJIT AVX2
Job-FPBBMO : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-FTHZKV : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1
BenchmarkDotNet=v0.13.2.2052-nightly, OS=ubuntu 22.04
AWS m6g.xlarge Graviton2
.NET SDK=8.0.100-preview.3.23178.7
[Host] : .NET 8.0.0 (8.0.23.17408), Arm64 RyuJIT AdvSIMD
Job-OYJLBY : .NET 8.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
Job-GKZVCN : .NET 8.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1