Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression with System.Numerics.Vector when retargeting from .NET 8 to .NET 9 #110223

Closed
pkulikov opened this issue Nov 27, 2024 · 3 comments
Labels

Comments

@pkulikov
Copy link

pkulikov commented Nov 27, 2024

We're observing performance regression when retargeting one of our apps from .NET 8 to .NET 9.
This is the same app with which we observed performance regression one year ago: switching from .NET 7 to .NET 8 then. (For more information about that case, see #94920)

The core calculation is to find the maximum absolute of element-wise difference of two float arrays.

The attached test app (with benchmarks) does that in four ways:

  • With Vector<float>: both safe and unsafe way to instantiate a vector
  • With TensorPrimitives
  • With a for loop

PerformanceRegression.zip

As a year ago, the regression seems to concern System.Numerics.Vector usage.

The benchmark results are as follows:

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5011/22H2/2022Update)
13th Gen Intel Core i7-13850HX, 1 CPU, 28 logical and 20 physical cores
.NET SDK 9.0.100
  [Host] : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
  .NET 8 : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
  .NET 9 : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2

Server=True  

Method Job Runtime Mean Error StdDev
VectorSafe .NET 8 .NET 8.0 369.2 ns 1.75 ns 1.55 ns
VectorUnsafe .NET 8 .NET 8.0 323.8 ns 1.35 ns 1.20 ns
Tensor .NET 8 .NET 8.0 1,088.8 ns 6.26 ns 5.86 ns
ForLoop .NET 8 .NET 8.0 1,427.2 ns 1.58 ns 1.23 ns
VectorSafe .NET 9 .NET 9.0 1,115.9 ns 2.87 ns 2.24 ns
VectorUnsafe .NET 9 .NET 9.0 1,109.8 ns 4.41 ns 3.91 ns
Tensor .NET 9 .NET 9.0 1,091.9 ns 6.90 ns 6.12 ns
ForLoop .NET 9 .NET 9.0 1,393.0 ns 7.12 ns 6.66 ns

Somehow in .NET 9, the difference between a for loop and vectorized code is not as large as one would expect.

/cc: @tannergooding

@pkulikov pkulikov added the tenet-performance Performance related issue label Nov 27, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 27, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Nov 27, 2024
@huoyaoyuan huoyaoyuan added area-System.Numerics and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Nov 27, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

@tannergooding
Copy link
Member

The issue here is "by design" and comes about from needing to ensure Max, MaxMagnitude, MaxMagnitudeNumber, MaxNumber, and the various Min* variants are deterministic across platforms as part of exposing the full set of math APIs on the vector and tensor types. The fix is to use the newer MaxNative or MinNative API which won't guarantee determinism around -0 and NaN, but which instead will do whatever is fastest and remains correct for the underlying platform for all other inputs.

Here is the results on my machine:

Method Job Runtime Mean Error StdDev
VectorSafe .NET 8 .NET 8.0 240.3 ns 4.80 ns 6.73 ns
VectorUnsafe .NET 8 .NET 8.0 198.2 ns 1.61 ns 1.25 ns
Tensor .NET 8 .NET 8.0 1,029.5 ns 16.73 ns 16.43 ns
ForLoop .NET 8 .NET 8.0 1,182.3 ns 23.36 ns 21.85 ns
VectorSafe .NET 9 .NET 9.0 491.1 ns 3.22 ns 2.69 ns
VectorUnsafe .NET 9 .NET 9.0 485.2 ns 3.33 ns 3.11 ns
Tensor .NET 9 .NET 9.0 979.6 ns 7.98 ns 6.23 ns
ForLoop .NET 9 .NET 9.0 1,138.2 ns 22.12 ns 27.98 ns

And for comparison, here is the results when using Vector.MaxNative instead of Vector.Max:

Method Job Runtime Mean Error StdDev
VectorMaxNativeSafe .NET 9 .NET 9.0 202.6 ns 2.33 ns 2.06 ns
VectorMaxNativeUnsafe .NET 9 .NET 9.0 199.5 ns 3.84 ns 4.57 ns
TensorMaxMagnitude .NET 9 .NET 9.0 751.5 ns 14.84 ns 27.14 ns

The TensorMaxMagnitudeTest is then a change to the following:

  TensorPrimitives.Subtract(one, another, workspace);
- var index = TensorPrimitives.IndexOfMaxMagnitude(workspace);
- return MathF.Abs(workspace[index]);
+ return MathF.Abs(TensorPrimitives.MaxMagnitude(workspace));

TensorPrimitives is notably slower because it is touching more memory due to the intermediate buffer required to store the one - another result. It would match or surpass VectorMaxNativeUnsafe if a dedicated TensorPrimitives.MaximumAbsoluteDifference API was provided. -- This is a general problem that exists with any vectorized kernel, where each isolated kernel is fast but when used sequentially they can by several times slower due to repeated memory accesses. The only fix for that is to write a new combined kernel that iterates memory the least number of times possible. Doing that, however, involves writing a lot of code and the long term goal here is to come up with some kind of reusable code infrastructure that makes this much more trivial, likely in the form of something similar to the InvokeSpanSpanIntoSpan helpers that TensorPrimitives is currently using internally to make this all "trivial".

@pkulikov
Copy link
Author

@tannergooding

Thanks! MaxNative "fixes" the performance drop. With that, I think this issue can be closed :)

By the way, on my machine I don't observe TensorMaxMagnitude to be faster.

It would match or surpass VectorMaxNativeUnsafe if a dedicated TensorPrimitives.MaximumAbsoluteDifference API was provided

Yes, would be nice to have it in .NET as one more variant (MaximumDistance) of distance metric (now .NET provides at least two variants: Distance and HammingDistance as far as I see): https://en.wikipedia.org/wiki/Chebyshev_distance

@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants