Moving `Math.Abs(double)` and `Math.Abs(float)` to be implemented in managed code. #14156

Closed · wants to merge 1 commit into master from tannergooding:managed-math-abs

Conversation

@tannergooding (Member) commented Sep 23, 2017

This is part of #14155.

@tannergooding (Member, Author) commented Sep 23, 2017

The managed implementation of Abs is simple enough that it shouldn't cause any problems.

test Windows_NT_x64 perf
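
(For context on the "simple enough" point above: a managed Abs(double) can be as small as clearing the IEEE 754 sign bit, which is why it inlines well. This is a minimal sketch, not necessarily the exact code in the commit.)

```csharp
using System;

static class ManagedAbsSketch
{
    // Sketch only: |x| for a double is the same value with the IEEE 754 sign
    // bit cleared, so a branch-free bit mask is enough. This also behaves
    // correctly for -0.0, infinities, and NaN inputs.
    public static double Abs(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        return BitConverter.Int64BitsToDouble(bits & 0x7FFFFFFFFFFFFFFF);
    }
}
```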

@tannergooding (Member, Author) commented Sep 23, 2017

FYI. @mellinoe

@tannergooding (Member, Author) commented Sep 23, 2017

@dotnet-bot help please

@dotnet dotnet deleted a comment from dotnet-bot Sep 23, 2017

@dotnet dotnet deleted a comment from dotnet-bot Sep 23, 2017

@tannergooding (Member, Author) commented Sep 24, 2017

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

@jkotas (Member) commented Sep 24, 2017

This will make these methods slower when they are not expanded as intrinsics. Related to my comment #14155 (comment).

@tannergooding (Member, Author) commented Sep 24, 2017

@jkotas, I don't believe that is the case; it does not match the numbers I was seeing locally (I will try to get actual numbers and post them sometime tomorrow).

The managed implementation should get inlined by the JIT, whereas the FCALL (when not expanded as an intrinsic) is simple enough that the call overhead ends up erasing whatever benefit the slightly better codegen of the CRT implementation had.

@jkotas (Member) commented Sep 24, 2017

Have you tried what happens when somebody calls this via a delegate, e.g. `Func<double, double> d = Math.Abs`? I would expect it to be quite a bit slower, especially on x86.

@tannergooding (Member, Author) commented Sep 24, 2017

> Have you tried what happens when somebody calls this via a delegate?

I will be sure to include this in the numbers I collect.

@tannergooding (Member, Author) commented Sep 24, 2017

The following are from a local run of the Math Functions benchmark (1000 outer iterations, 5000 inner iterations) for the double overloads

Math.Abs is the existing implementation prior to this change
MyMath.Abs is the new implementation

I have ordered these results from least time to greatest time:

| Test                | nsec/inner iteration (x64) | nsec/inner iteration (x86) |
| ------------------- | -------------------------- | -------------------------- |
| Math.Abs            | 0.841085699890226          | 0.850034026439170          |
| MyMath.Abs          | 0.940543214717660          | 7.315399443037920          |
| Math.Abs Delegate   | 2.293222412081260          | 4.121587809301480          |
| MyMath.Abs Delegate | 3.645559635181860          | 14.87063683587174          |

Notes:

  • All timings were on a Ryzen 1800X with 32GB of RAM in 64-bit.
  • When not used as a delegate and not treated as an intrinsic
    • x64: MyMath.Abs is 59% faster
    • x86: MyMath.Abs is 761% slower
  • When not used as a delegate and treated as an intrinsic, performance is equal (x64 and x86)
  • When used as a delegate and not treated as an intrinsic:
    • x64: MyMath.Abs is 59% slower
    • x86: MyMath.Abs is 261% slower
  • When used as a delegate and using the hardware intrinsics feature for codegen, performance is equal (x64 and x86)
    • The feature is still a WIP and my test was mostly hacked together
  • For all cases of intrinsic use, the timings are about 0.84 nsec/inner iteration (x64 and x86).
  • For x86, the worse timings look to be mostly related to poor codegen
    • We are generating mixed SSE and x87 FPU instructions
    • We are not inlining the Abs call

So I think, at the very least, there are some x86 codegen bugs that need to be filed.
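
For anyone wanting to reproduce rough numbers, a minimal sketch of a measurement loop of this shape is below. The iteration counts come from the description above; the class, the input values, and the reporting are assumptions on my part and not the actual CoreCLR perf harness:

```csharp
using System;
using System.Diagnostics;

class AbsTimingSketch
{
    const int OuterIterations = 1000;  // from the benchmark description above
    const int InnerIterations = 5000;  // from the benchmark description above

    static void Main()
    {
        // The delegate case asked about above: always an indirect call,
        // so neither inlining nor intrinsic expansion can apply.
        Func<double, double> absDelegate = Math.Abs;

        double sink = 0.0;

        var sw = Stopwatch.StartNew();
        for (int outer = 0; outer < OuterIterations; outer++)
        {
            for (int inner = 0; inner < InnerIterations; inner++)
            {
                double value = inner - (InnerIterations / 2.0); // mix of negative and positive inputs
                sink += Math.Abs(value);                        // direct call: can be inlined / expanded as an intrinsic
            }
        }
        sw.Stop();
        Report("direct", sw);

        sw.Restart();
        for (int outer = 0; outer < OuterIterations; outer++)
        {
            for (int inner = 0; inner < InnerIterations; inner++)
            {
                double value = inner - (InnerIterations / 2.0);
                sink += absDelegate(value);                     // delegate call: always indirect
            }
        }
        sw.Stop();
        Report("delegate", sw);

        GC.KeepAlive(sink); // keep the accumulated result alive so the loops are not optimized away
    }

    static void Report(string name, Stopwatch sw)
    {
        double nsPerInner = sw.Elapsed.TotalMilliseconds * 1000000.0 / ((double)OuterIterations * InnerIterations);
        Console.WriteLine($"{name}: {nsPerInner:F4} nsec/inner iteration");
    }
}
```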

@jkotas (Member) commented Sep 24, 2017

I expect that you will get similar results for all platforms that did not get as much codegen investment as x64. Falling back to the C implementation is a better option for these platforms.

I think that the Math methods should be wired to use the C implementation by default. The managed or hardware intrinsic implementation should be opt-in per platform, for platforms where we have deeper codegen investments that allow us to do better than the C implementation. This strategy should work well for all Math methods. Abs is an outlier because it is simpler and you can be more creative with it, but I do not think that is a good reason to invent a different scheme for it.

@tannergooding (Member, Author) commented Sep 25, 2017

> I think that the Math methods should be wired to use the C implementation by default.

@jkotas, wiring up the C runtime implementation for the platform is "easy", but it has a high chance of introducing various bugs, performance differences, input/output differences, etc. (as has already been found just from bringing Linux and OSX on board).

Additionally, the last Math API review meeting (dotnet/corefx#16428) made it fairly clear that we did not want to continue bloating corlib with more math APIs, and instead wanted to provide them in a separate library where they are implemented in managed code using hardware intrinsics with a software fallback.

What is your proposal for resolving these issues so that the underlying implementations stay consistent, performant, and easily updatable?

To me, it still seems like the best way forward is to (for all Math and MathF methods):

  1. Implement a purely software based implementation in managed code
  2. When possible, provide alternate code paths that use hardware intrinsics (in managed code) for better codegen
  3. Optionally, mark the methods as [Intrinsic] and wire together ValueNum support and the like in the JIT
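
As a rough illustration of steps 1 and 2, the pattern could look like the sketch below. The System.Runtime.Intrinsics API shown here was still only a proposal at the time of this discussion, so treat the type and method names as illustrative rather than final:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;       // illustrative: this API shipped later than this discussion
using System.Runtime.Intrinsics.X86;   // illustrative: this API shipped later than this discussion

static class MathSketch
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static double Abs(double value)
    {
        if (Sse2.IsSupported)
        {
            // Step 2: hardware path, clearing the sign bit with a single SSE2 AND.
            Vector128<double> v = Vector128.CreateScalarUnsafe(value);
            Vector128<double> mask = Vector128.Create(BitConverter.Int64BitsToDouble(0x7FFFFFFFFFFFFFFF));
            return Sse2.And(v, mask).ToScalar();
        }

        // Step 1: pure software fallback in managed code.
        return SoftwareFallback(value);
    }

    // Step 3 (not shown): the method could additionally be marked [Intrinsic]
    // so the JIT can recognize it and do its own expansion.

    private static double SoftwareFallback(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        return BitConverter.Int64BitsToDouble(bits & 0x7FFFFFFFFFFFFFFF);
    }
}
```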

From what I can tell, this is roughly what we are already doing in CoreRT (although the pure software implementation is a P/Invoke into the C Runtime for most methods).

This is definitely what Math.Round and MathF.Round are doing now that the code is shared with CoreRT (we do not call into the C runtime for this function on either side). The caveat is that, for the ValueNum support, we effectively have the algorithm duplicated in the runtime as well (see FloatingPointUtils::round).

For reference, some of the currently tracked bugs (at least that I filed) are:

I know other users have logged other bugs in similar areas as well.

Additionally, some users have expressed a desire for fast and precise modes for these math functions (and math operations in general). The math operations in general tend to be precise (which prevents the JIT from performing things like fused multiply-add transformations), while the Math and MathF functions are fast by default (which means they may give up some precision in favor of speed).

I believe this, too, would be most easily provided through a managed implementation, with the runtime aware of the relevant switches (if such a feature were approved and provided).

@tannergooding (Member, Author) commented Sep 25, 2017

There have also been several bugs that have already been fixed/resolved that I did not list.

With this code based on the underlying C runtime, rather than in shared managed code, each runtime (CoreCLR, CoreRT, etc.) has to be updated and kept in sync manually when a workaround is added (and this is easily overlooked or forgotten).

For example, while moving the non-extern methods in System.Math and System.MathF to be shared over the weekend, there were two additional fixes that had to be made because they had been done in one of the runtimes, but not the other(s).

@fanoI commented Sep 26, 2017

The important thing is that the "software based implementation in managed code" is called when neither 2 nor 3 is provided... in Cosmos we are certainly interested in having SSE intrinsics supported, but not for now.
Would it be sufficient to "plug" all the xxx.IsSupported properties so they always return false?
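
(A minimal sketch of what such a "plug" might look like, illustrative only and not the actual CoreCLR or Cosmos code: with IsSupported hard-wired to false, any branch guarded by it becomes dead code and only the managed software fallback remains reachable.)

```csharp
// Illustrative only: a platform that does not implement the hardware
// intrinsics could surface the ISA class with IsSupported always false.
// Any `if (Sse2.IsSupported)` branch is then dead code, so the compiler
// keeps only the managed software fallback.
public static class Sse2
{
    public static bool IsSupported => false;
}
```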

@tannergooding deleted the tannergooding:managed-math-abs branch Jan 17, 2018
