Add MessagePack.Experimental package which includes SIMD(Single Instruction Multiple Data) accelerated primitive array formatters. #988

pCYSl5EDgo · 2020-07-27T09:01:48Z

The latest LTS version of .NET Core is 3.1.

.NET Core 3.1 provides Hardware Intrinsics.
The SIMD intrinsics could make some formatters faster.

Accelerating UTF-8 Decoding Using SIMD Instructions
The article above has important implications for how to efficiently encode fixed-length elements into a variable-length format.

AArnott · 2020-07-27T17:24:35Z

Thanks for contributing, @pCYSl5EDgo. But I fail to see how this PR will improve performance at all. It just adds HARDWARE_INTRINSICS_X86 as a preprocessor symbol but we don't have any source code that checks for this at the moment.
If you want to add such code with this PR, that's fine. But I don't think it makes sense as-is.
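For illustration, here is a minimal sketch of how source code could check a symbol such as HARDWARE_INTRINSICS_X86. The class and method names are hypothetical and are not taken from this PR. The compile-time `#if` keeps the intrinsics namespace out of builds that cannot reference it, and the run-time `Sse2.IsSupported` check covers CPUs that lack the instruction set.

```csharp
#if HARDWARE_INTRINSICS_X86
using System.Runtime.Intrinsics.X86;
#endif

// Hypothetical helper, not part of MessagePack-CSharp.
public static class IntrinsicsGate
{
    // Reports which code path a formatter would take in this build and on this CPU.
    public static string DescribePath()
    {
#if HARDWARE_INTRINSICS_X86
        // The symbol only says the API is available at compile time.
        // The CPU still has to be checked at run time.
        if (Sse2.IsSupported)
        {
            return "simd";
        }
#endif
        // Fallback used when the symbol is undefined or the CPU lacks SSE2.
        return "scalar";
    }
}
```

Both branches need test coverage, which is why builds without the symbol and non-x86 machines matter for testing.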

pCYSl5EDgo · 2020-07-27T23:13:28Z

I plan to submit further PRs that build on this one.
This is the first, foundational PR.
I added the symbol only as preparation.

AArnott · 2020-07-28T00:14:13Z

Ok, I appreciate you wanting focused PRs, but I want at least some value in each PR. All this one does is add build time. 🙂
So please add the value to this one.
Also we'll need to make sure tests run on netcoreapp2.1 as well as netcoreapp3.1 to test all code paths.
Also if we really are adding x86-only paths to the code, we'll need to add automated testing on a non-x86 platform to make sure we don't regress those.

pCYSl5EDgo · 2020-07-28T00:34:44Z

Thank you for reminding me to run the tests on netcoreapp3.1!
I had forgotten about that.

OK, I'll include one small improvement in this PR.
Thank you!

Need more tuning.

pCYSl5EDgo · 2020-07-29T16:16:43Z

Interim Report

I tried to improve the performance of MessagePackWriter.Write(string) by first checking whether every char of the input string is in the ASCII range.
It turned out slower, because the input chars are iterated twice.
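A scalar sketch of the two-pass shape described above, assuming a hypothetical helper class (the actual experiment used AVX2 and is not shown in this thread). The first loop validates and the second loop copies, so every char is read twice.

```csharp
// Hypothetical illustration of the two-pass approach, not the PR's code.
public static class AsciiFastPath
{
    // Pass 1: check that every char fits in 7 bits.
    public static bool IsAscii(string value)
    {
        foreach (char c in value)
        {
            if (c > 0x7F)
            {
                return false;
            }
        }
        return true;
    }

    // Pass 2: narrow each char to one byte. Returns false when the input is
    // not pure ASCII or the destination is too small.
    public static bool TryEncodeAscii(string value, byte[] destination)
    {
        if (destination.Length < value.Length || !IsAscii(value))
        {
            return false;
        }
        for (int i = 0; i < value.Length; i++)
        {
            destination[i] = (byte)value[i];
        }
        return true;
    }
}
```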

I'll try a different type next.

AArnott · 2020-07-31T03:39:39Z

Thanks for measuring impact rather than just assuming the change makes things faster!

pCYSl5EDgo · 2020-08-01T15:27:32Z

Interim Report

I added the APIs void MessagePackWriter.Write(sbyte[]) and void MessagePackWriter.Write(ReadOnlySpan<sbyte>).

The table below compares the elapsed time of the new hardware-intrinsics code against MessagePack-CSharp v2.1.152.
In the test code, both implementations encoded a 16 MB random sbyte array.

Method	Mean	Error	StdDev
SerializeSimd	41.64 ms	0.810 ms	1.188 ms
SerializeNoSimd	119.85 ms	2.391 ms	2.559 ms

pCYSl5EDgo · 2020-08-03T02:47:38Z

Interim Report

I removed the API void MessagePackWriter.Write(sbyte[]).
I changed the implementation of StandardClassFormatters.

I improved the performance of short[] serialization.

Method	Mean	Error	StdDev
SerializeSimd	74.14 ms	0.467 ms	0.414 ms
SerializeNoSimd	243.51 ms	1.664 ms	1.475 ms
SerializeSimdZero	42.07 ms	0.135 ms	0.119 ms
SerializeNoSimdZero	80.48 ms	1.264 ms	1.183 ms

AArnott

Thanks for contributing. I imagine this took quite a bit of effort. I nevertheless have some concerns that I'm interested in how you'll respond to.

...ssagePack.UnityClient/Assets/Scripts/MessagePack/Formatters/StandardClassLibraryFormatter.cs

AArnott · 2020-08-03T15:19:45Z

...ssagePack.UnityClient/Assets/Scripts/MessagePack/Formatters/StandardClassLibraryFormatter.cs

+                        0, 128, 1, 128, 2, 128, 3, 128, 4, 128, 5, 128, 6, 128, 7, 128,
+                        128, 0, 128, 1, 128, 2, 128, 3, 128, 4, 128, 5, 128, 6, 128, 7,
+                    };
+                    fixed (byte* pShuffle = shuffle)


I'm very impressed by what you wrote here. But I'm also very concerned about maintainability. I have no idea what any of this does and have never seen code like it before. The array above is a total mystery to me as to where it came from, and why "shuffling" is required baffles me. If we keep this, IMO we would need a lot of code comments and links to docs or blogs that explain it.

Why in the world do we need to do all this when we could do a straight up memcpy of sbyte[] to the memory returned from GetSpan?

Why in the world do we need to do all this when we could do a straight up memcpy of sbyte[] to the memory returned from GetSpan?

That would be incorrect.
According to the current MessagePackWriter.Write(sbyte), sbyte values in the range -128 to -33 have to be encoded and must not simply be copied.
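The rule comes from the MessagePack format itself. Values from -32 to 127 fit the single-byte fixint formats, while values from -128 to -33 need the two-byte "int 8" format (0xd0 followed by the value). A minimal scalar sketch, using a hypothetical helper rather than the library's own writer:

```csharp
using System;

// Hypothetical helper that mirrors the MessagePack int encoding rules for sbyte.
public static class Int8Encoding
{
    // Writes one sbyte and returns the number of bytes written (1 or 2).
    public static int WriteInt8(Span<byte> destination, sbyte value)
    {
        if (value >= -32)
        {
            // positive fixint (0..127) or negative fixint (-32..-1):
            // the raw byte is already its own encoding.
            destination[0] = unchecked((byte)value);
            return 1;
        }

        // -128..-33: "int 8" format, header byte 0xd0 followed by the value.
        destination[0] = 0xd0;
        destination[1] = unchecked((byte)value);
        return 2;
    }
}
```

Because the output length depends on each value, a plain memcpy is only correct when every element is at least -32.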

I have no idea what any of this does and have never seen code like it before.

This technique is well known among C/C++ programmers who use SIMD hardware intrinsics.
This is a good opportunity for C# developers to learn and use SIMD programming.

The array above is a total mystery to me as to where it came from, and why "shuffling" is required baffles me.

I will explain where this shuffle table came from in the T4 file.
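As background, the two table rows quoted in the diff can be read with a scalar model of the 128-bit byte shuffle (PSHUFB, exposed in .NET as Ssse3.Shuffle). A control byte with its high bit set (such as 128) yields zero, and otherwise its low four bits select a source byte. The sketch below only demonstrates that rule. How the PR combines the rows is not shown in this excerpt, and the class name is hypothetical.

```csharp
// Hypothetical scalar model of a 16-byte shuffle, for reading the table only.
public static class ShuffleDemo
{
    // The two rows quoted in the review diff.
    public static readonly byte[] Row0 =
        { 0, 128, 1, 128, 2, 128, 3, 128, 4, 128, 5, 128, 6, 128, 7, 128 };
    public static readonly byte[] Row1 =
        { 128, 0, 128, 1, 128, 2, 128, 3, 128, 4, 128, 5, 128, 6, 128, 7 };

    // result[i] is 0 when control[i] has its high bit set,
    // otherwise source[control[i] & 0x0F].
    public static byte[] Shuffle(byte[] source, byte[] control)
    {
        var result = new byte[16];
        for (int i = 0; i < 16; i++)
        {
            result[i] = (control[i] & 0x80) != 0
                ? (byte)0
                : source[control[i] & 0x0F];
        }
        return result;
    }
}
```

Under this model, Row0 spreads the first eight source bytes into the even output positions with zeros in between, and Row1 spreads them into the odd positions. That leaves the zeroed slots free for other data, such as header bytes.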

I look forward to the .tt file update.

That would be incorrect.
According to the current MessagePackWriter.Write(sbyte), sbyte values in the range -128 to -33 have to be encoded and must not simply be copied.

Huh. Good point. I guess when we write out byte[] we skip the attempted compression of each byte because we precede it with a special msgpack 'binary' header, which we don't do for sbyte[], so I guess we're on the hook to properly encode each one. But that's... really unfortunate and I would encourage folks who want good perf to simply cast their sbyte[] as a byte[] (using pointers) for faster and more compact encoding. But I guess I get what you're going for. Thanks for explaining.
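A sketch of the cast-and-write-as-binary idea, under two stated assumptions: it uses MemoryMarshal.AsBytes in place of the raw pointers mentioned above, and it writes only the MessagePack "bin 8" header (0xc4 plus a one-byte length) by hand instead of calling the library. The class name is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of MessagePack-CSharp.
public static class SByteAsBinary
{
    // Encodes the array as MessagePack "bin 8": 0xc4, length, then the raw bytes.
    public static byte[] EncodeAsBin8(sbyte[] values)
    {
        // Reinterpret the sbyte elements as bytes without copying.
        ReadOnlySpan<byte> payload = MemoryMarshal.AsBytes(new ReadOnlySpan<sbyte>(values));
        if (payload.Length > byte.MaxValue)
        {
            throw new NotSupportedException("This sketch handles bin 8 only.");
        }

        var output = new byte[2 + payload.Length];
        output[0] = 0xc4;                 // bin 8 header
        output[1] = (byte)payload.Length; // payload length
        payload.CopyTo(output.AsSpan(2)); // one bulk copy, no per-element branching
        return output;
    }
}
```

The trade-off is that the data round-trips as binary, so the reader has to cast it back to sbyte[] itself.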

simply cast their sbyte[] as a byte[] (using pointers) for faster and more compact encoding.

I did the same thing for personal usage...

I have written comments in the T4 file.
The following list is for reference.

References

Official Intel API Reference

The most useful website for understanding the behavior of SIMD instructions

.NET Hardware Intrinsics API

The sbyte[] formatter is optimized differently than the other types...

The table below was measured with the Int8ArrayBenchmarkMessagePackNoSimdVsMessagePackSimd benchmark.

Method	Mean	Error	StdDev
SerializeSimd	19.59 ms	0.345 ms	0.472 ms
SerializeNoSimd	116.50 ms	1.306 ms	1.221 ms
SerializeSimdZero	10.05 ms	0.152 ms	0.127 ms
SerializeNoSimdZero	56.92 ms	0.635 ms	0.594 ms
SerializeSimdM32	10.11 ms	0.092 ms	0.086 ms
SerializeNoSimdM32	59.48 ms	1.171 ms	2.112 ms
SerializeSimdM33	28.11 ms	0.375 ms	0.333 ms
SerializeNoSimdM33	91.95 ms	0.790 ms	0.701 ms

But that's... really unfortunate and I would encourage folks who want good perf to simply cast their sbyte[] as a byte[] (using pointers) for faster and more compact encoding.

I changed the formatter to cast sbyte[] to byte[] and encode it.
These are the benchmark results.

Method	Mean	Error	StdDev
SerializeSimd_ConvertByteArray	11.05 ms	0.206 ms	0.426 ms
SerializeNoSimd	127.54 ms	2.472 ms	2.064 ms
SerializeSimdZero_ConvertByteArray	11.32 ms	0.222 ms	0.247 ms
SerializeNoSimdZero	64.56 ms	1.243 ms	2.830 ms
SerializeSimdM32_ConvertByteArray	10.89 ms	0.183 ms	0.290 ms
SerializeNoSimdM32	63.10 ms	1.178 ms	1.157 ms
SerializeSimdM33_ConvertByteArray	10.77 ms	0.150 ms	0.133 ms
SerializeNoSimdM33	97.38 ms	0.998 ms	0.885 ms

My PR holds up reasonably well against the version that converts to byte[].

AArnott · 2020-08-03T15:22:07Z

...ssagePack.UnityClient/Assets/Scripts/MessagePack/Formatters/StandardClassLibraryFormatter.cs

@@ -35,6 +40,581 @@ public byte[] Deserialize(ref MessagePackReader reader, MessagePackSerializerOpt
        }
    }

+    public sealed class SByteArrayFormatter : IMessagePackFormatter<sbyte[]>


I'm curious why use sbyte here? Who uses sbyte[]? If there's a perf improvement to be made in writing out an sbyte[] array, wouldn't that also apply to byte[], which is far more popular?

A specialized formatter already exists for byte[]. I think that formatter is already the fastest, so I don't need to improve it.

Implemented : sbyte[], short[]
Work In Progress : int[], ushort[], uint[]

I started with sbyte[] because it seemed easier to implement than the other types.
It let me get used to implementing optimized formatters.
Yes, it was practice.

Next I will write the int[] formatter, which seems harder to implement than sbyte[].

Well, before you go to any more work, I'd like to feel settled on what you've done so far so you don't waste effort if we're not going to take the PR ultimately anyway. I'm not saying we won't... I'm just saying that since you submitted some, let us review this and understand it enough to justify your continued effort here.

src/MessagePack.UnityClient/Assets/Scripts/MessagePack/MessagePackWriter.cs

pCYSl5EDgo · 2020-09-06T18:21:33Z

Added a double[] formatter.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1049 (1909/November2018Update/19H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.401
  [Host]  : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
  LongRun : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT

Job=LongRun  IterationCount=100  LaunchCount=3  
WarmupCount=15

Method	Size	Mean	Error	StdDev	Median
SerializeSingleInstructionMultipleData	64	227.3 ns	3.22 ns	16.52 ns	224.0 ns
SerializeNoSingleInstructionMultipleData	64	604.4 ns	5.13 ns	25.96 ns	596.5 ns
SerializeSingleInstructionMultipleData	1024	1,490.5 ns	18.70 ns	93.75 ns	1,463.4 ns
SerializeNoSingleInstructionMultipleData	1024	9,230.7 ns	354.94 ns	1,827.70 ns	8,519.3 ns
SerializeSingleInstructionMultipleData	16777216	142,284,585.5 ns	2,387,494.64 ns	12,208,105.14 ns	137,715,500.0 ns
SerializeNoSingleInstructionMultipleData	16777216	267,093,020.7 ns	4,500,563.82 ns	23,053,582.08 ns	259,778,500.0 ns

AArnott · 2020-09-12T00:56:20Z

I'm re-reviewing, but so far almost all the diffs I see in the existing projects are style changes. Please revert everything that's unrelated to your perf work. You can submit another PR with the style changes if you feel so inclined and we can weigh those separately.

AArnott · 2020-09-12T02:27:16Z

Never mind, I'll take care of it while I'm reviewing.

Also:
* deleted `IntegerArrayFormatterHelper.cs` which the PR had added but seems to not use.
* replaced MessagePack_2_1_165.dll with the one from the nuget package by that version. The one placed here previously was slightly different and I don't know why, but using the official build seems prudent.

AArnott

Looks good. Thanks for contributing!

AArnott · 2020-09-12T03:09:23Z

src/MessagePack.Experimental/PrimitiveArrayResolver.cs

+
+using MessagePack.Formatters;
+
+namespace MessagePack.Experimental.Resolvers


If we use the actual namespace and type names that we would in the primary library, we can eventually move these types from Experimental to the main library without a binary breaking change in the future. Any concerns with dropping Experimental from the namespace?

I do not have any concerns.
I had not considered the risk of a binary breaking change in the future.
Thank you for dropping Experimental!

AArnott · 2020-09-12T03:12:36Z

But IMO we should target 2.2 with this change, which is only days away from being released anyway.

pCYSl5EDgo · 2020-09-12T03:39:40Z

Thank you for your review, revert and approval!

MessagePack_2_1_165.dll with the one from the nuget package by that version. The one placed here previously was slightly different and I don't know why, but using the official build seems prudent.

I'm worried that the benchmark project will not work.
I made the change because I was following the similar benchmark project "benchmark/SerializerBenchmark".
"benchmark/SerializerBenchmark" contains "MessagePack_1_7_3_6.dll".
"MessagePack_1_7_3_6.dll" is different from the one on NuGet.

The differences between "MessagePack_1_7_3_6.dll" and the official NuGet one:

File name
Assembly name
Module name

According to the official C# documentation, extern alias needs the assemblies to have different names.
I modified the NuGet one to change its assembly and module names.

pCYSl5EDgo · 2020-09-12T05:47:05Z

CI seems to fail at the dotnet restore step.
There seems to be a version dependency problem in MessagePack.Generator.Tests.
CI says that MessagePack.Generator.Tests depends on v3.4.0 while MessagePack.GeneratorCore depends on v3.6.0.
I read both csproj files and could not find a 3.6.0 dependency...

I do not know how to fix the CI failure. :(

AArnott · 2020-09-13T14:42:32Z

I'll take a look at the CI break.

Regarding assembly names, it should not be necessary to change the assembly name if the assembly version is already unique, but I will confirm this.

AArnott · 2020-09-13T15:30:33Z

The CI break resolved itself. I suspect you had retargeted to the v2.2 branch before I merged master into v2.2 yesterday, so the build didn't include the fix I included in #1036 for the package restore failure. That also would explain why you didn't see the failure locally: your branch hadn't actually merged with v2.2 either. Anyway, although this PR currently shows a build failure, I rebuilt the PR in Azure Pipelines and it succeeded so if we push to your PR again, it'll work.

Regarding the assembly name, yes the assembly names must be unique, and yours wasn't because you were building in a v2.1 branch and had checked in messagepack with an assembly version of 2.1.0.0, which matched what was built in that branch. So once your change merges with the v2.2 branch and the assembly version changes to 2.2.0.0, it works. So I'm going to push a change to your PR that merges with v2.2 then reverts the custom build of messagepack.dll so it's just the standard one since at that point we won't need the assembly name change.

This reverts commit dad1c6f.

AArnott · 2020-09-13T15:41:58Z

@neuecc I'm satisfied with this. Are you?

neuecc

good.

pCYSl5EDgo · 2020-09-14T00:16:55Z

CI break

I appreciate your fix.

Assembly Name

Oh, I'll study it again. Thank you for your explanation.

AArnott · 2020-09-14T02:56:06Z

@pCYSl5EDgo do you mind if I squash your PR instead of merge it, given it has 47 commits?

pCYSl5EDgo · 2020-09-14T04:22:04Z

@AArnott
I do not mind it. Thank you.

AArnott · 2020-09-14T12:38:10Z

Thanks for your contribution, @pCYSl5EDgo !

Add Support for .NET Core 3.1

0db4b89

AArnott changed the title ~~Add Support for .NET Core 3.1~~ Add HARDWARE_INTRINSICS_X86 preprocessor symbol to netcoreapp3.1 targeted build Jul 27, 2020

pCYSl5EDgo added 7 commits July 29, 2020 14:27

[update]MessagePackWriter.Write(string) uses AVX2

c992ead

Add netcoreapp3.1 to test target frameworks

c24e5f3

Update target framework to netcoreapp3.1

d0aa263

Add Benchmark project to test hardware intrinsics

d72c589

Change the module and assembly name

d1056cb

This is too slow.

29225ce

Need more tuning.

Speed up but lose in performance

fa18ea8

Rollback

115884f

Faster sbyte[] Serialization

3db2056

pCYSl5EDgo added 5 commits August 2, 2020 00:30

Add Document Comment

d32dd70

Add Write(sbyte[]) to public APIs

87b4748

Modify not using Generic Unmanaged Struct Pointer

d16a0bf

Remove API MessagePackWriter.Write(sbyte[])

d751366

Fix naming rule

494b534

pCYSl5EDgo added 2 commits August 3, 2020 11:49

Make short[] serialization faster

fcc7e81

Remove empty line

ca04a57

AArnott requested changes Aug 3, 2020

pCYSl5EDgo added 3 commits August 4, 2020 15:02

Revert changes in MessagePackWriter.cs

f7dd2e7

Add Hardware Counters

a4ae0a2

Add Alignment of 32

1979e31

Add Comment to float/double[] formatters.

3655060

AArnott approved these changes Sep 12, 2020

pCYSl5EDgo changed the base branch from master to v2.2 September 12, 2020 03:52

pCYSl5EDgo added 3 commits September 12, 2020 13:34

Re-change the assembly/module name of MessagePack_2_1_165.dll

dad1c6f

Remove namespace of "Experimental"

a181ba1

Disable warning CS0436 "Same type name in the primary package"

4f53912

AArnott added 2 commits September 13, 2020 09:40

Merge remote-tracking branch 'upstream/v2.2' into netcoreapp3_1

aadf21d

Revert "Re-change the assembly/module name of MessagePack_2_1_165.dll"

f274fc0

This reverts commit dad1c6f.

AArnott requested a review from neuecc September 13, 2020 15:41

neuecc approved these changes Sep 13, 2020

pCYSl5EDgo changed the title ~~Add HARDWARE_INTRINSICS_X86 preprocessor symbol to netcoreapp3.1 targeted build~~ Add MessagePack.Experimental package which includes SIMD(Single Instruction Multiple Data) accelerated primitive array formatters. Sep 14, 2020

AArnott merged commit bf2ea7a into MessagePack-CSharp:v2.2 Sep 14, 2020

AArnott added this to the v2.2 milestone Sep 14, 2020

AArnott added the enhancement label Sep 14, 2020

pCYSl5EDgo deleted the netcoreapp3_1 branch September 14, 2020 13:06

pCYSl5EDgo mentioned this pull request Jan 24, 2024

.NET 8 Update: Hardware Intrinsics #1744

Closed
