Configurable floating-point behavior (Fast Math) #12753

Open
EgorBo opened this issue May 25, 2019 · 5 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

@EgorBo
Member

EgorBo commented May 25, 2019

As far as I understand, the native code behind Math/MathF is currently compiled with /fp:fast (only on Windows); however, pure C# code behaves more like /fp:precise.
This could be a problem for, e.g., developers of games (or game backends) who want all floating-point operations to be 100% reproducible across all clients and hardware, or consistent with other software compiled in /fp:precise mode, for example online games with client-side physics.

However, most users don't care about this and could gain additional performance from /fp:fast for both native and managed code.

A /fp:fast mode for C# would allow the JIT to apply the following optimizations (inspired by LLVM):

1) a * b + c to fmadd

float z = a * b + c;
float z = a * b - c;
float z = c + a * b;
float z = -c + a * b;

could each be done in a single fused multiply-add instruction (fmadd, or its fmsub variant for the subtraction forms; see https://github.com/dotnet/coreclr/issues/17541) instead of a separate mul + add.
There are many places in the BCL where this could be applied, especially in System.Numerics.*.

Benchmark (Coffee Lake i7-8700K):

| Method | Mean | Ratio |
|--------|-----:|------:|
| Old | 129.41 ns | 1.00 |
| New | 64.95 ns | 0.50 |
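A minimal sketch of why this contraction needs an opt-in, assuming .NET 5 or later (C# 9 top-level statements). `MulThenAdd` is a hypothetical helper; `MathF.FusedMultiplyAdd` is the existing BCL API. The fused form rounds once instead of twice, so it can change results: here the exact product 1 + 2^-11 + 2^-24 lies exactly halfway between two floats, and the two forms disagree.

```csharp
using System;
using System.Runtime.CompilerServices;

// a = 1 + 2^-12 and c = -(1 + 2^-11) are both exactly representable floats.
float a = 1f + 1f / 4096f;
float c = -(1f + 1f / 2048f);

// Two roundings: a * a = 1 + 2^-11 + 2^-24 is a tie that rounds (to even)
// down to 1 + 2^-11, so adding c cancels to exactly 0.
float separate = MulThenAdd(a, a, c);

// One rounding: the exact result 2^-24 is representable, so it survives.
float fused = MathF.FusedMultiplyAdd(a, a, c);

Console.WriteLine($"a * a + c = {separate}, FusedMultiplyAdd = {fused}");

// NoInlining keeps the JIT from constant-folding the expression at the call site.
[MethodImpl(MethodImplOptions.NoInlining)]
static float MulThenAdd(float x, float y, float z) => x * y + z;
```

With strict IEEE semantics the separate multiply and add give 0, while the fused version keeps the 2^-24 that the intermediate rounding discarded.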

2) a / c to a * (1 / c)

float z = a / 1000;
// could be:
float z = a * 0.001f;

See dotnet/coreclr#24584, which currently works only for power-of-two constants (whose reciprocals are exactly representable); in fast-math mode it could handle any constant.

Benchmark:

| Method | Mean | Ratio |
|--------|-----:|------:|
| Old | 403.9 ns | 1.00 |
| New | 297.7 ns | 0.74 |
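The reason dotnet/coreclr#24584 is limited to powers of two can be checked with a short sketch (assuming .NET 5+ for top-level statements; `CountMismatches` is a hypothetical helper). It compares `x / d` with `x * (1 / d)` for the integers 1 to 100000.

```csharp
using System;

// Counts the integers x in [1, count] for which x / divisor and
// x * (1 / divisor) round to different floats.
static int CountMismatches(float divisor, int count)
{
    float reciprocal = 1f / divisor;
    int mismatches = 0;
    for (int i = 1; i <= count; i++)
    {
        float x = i;
        if (x / divisor != x * reciprocal)
            mismatches++;
    }
    return mismatches;
}

// 1 / 8 = 0.125 is exact, so both forms compute the same real value and round it once.
int powerOfTwo = CountMismatches(8f, 100_000);

// 1 / 1000 is not representable; 0.001f is already rounded before the multiply.
int thousand = CountMismatches(1000f, 100_000);

Console.WriteLine($"divisor 8: {powerOfTwo} mismatches; divisor 1000: {thousand} mismatches");
```

With d = 8 the two forms always agree; with d = 1000 some results land one ULP away (x = 9 is one such input).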

3) Comparisons and ternary operations

float z = a > b ? a : b;
float z = a >= b ? a : b;
float z = MathF.Max(a, b);

could all compile to a single vmaxss instruction, and in general the JIT could be less strict about +0.0/-0.0/NaN.
Related: dotnet/coreclr#22965 and dotnet/coreclr#16306
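The sketch below (assuming .NET Core 3.0+ `MathF.Max` semantics and C# 9 top-level statements; `TernaryMax` and `Compare` are illustrative names) shows where the ternary form and `MathF.Max` disagree. `maxss` returns its second source operand when either input is NaN or both are zero, so it matches `a > b ? a : b`, while `MathF.Max` propagates NaN and prefers +0 over -0 and therefore needs extra handling unless those rules are relaxed.

```csharp
using System;

// The ternary from the issue: yields b whenever a > b is false, which
// includes "either operand is NaN" and "both operands are zero".
static float TernaryMax(float a, float b) => a > b ? a : b;

static void Compare(float a, float b) =>
    Console.WriteLine($"a = {a}, b = {b}: ternary = {TernaryMax(a, b)}, MathF.Max = {MathF.Max(a, b)}");

Compare(float.NaN, 1f);   // ternary picks b (1); MathF.Max propagates NaN
Compare(1f, float.NaN);   // ternary picks b (NaN)
Compare(0f, -0f);         // ternary picks b (-0); MathF.Max prefers +0
Compare(-0f, 0f);         // both give +0
```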

4) a - b - a to -b

float z = a - b - a;
// could be just:
float z = -b;
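A quick check of why `a - b - a` cannot be folded to `-b` under strict IEEE semantics (C# 9 top-level statements assumed; `Strict` and `Rewritten` are illustrative names): a large `a` absorbs `b`, an infinite `a` produces NaN, and the sign of a zero result flips.

```csharp
using System;

// C# evaluates a - b - a strictly left to right: (a - b) - a.
static float Strict(float a, float b) => a - b - a;

// The algebraic simplification from the issue.
static float Rewritten(float a, float b) => -b;

(float A, float B)[] cases =
{
    (1e20f, 1f),                  // 1e20f - 1f rounds back to 1e20f, so the result is 0
    (float.PositiveInfinity, 1f), // inf - 1 - inf is NaN
    (0f, 0f),                     // (0 - 0) - 0 is +0, but -b is -0
};

foreach (var (x, y) in cases)
    Console.WriteLine($"a = {x}, b = {y}: strict = {Strict(x, y)}, rewritten = {Rewritten(x, y)}");
```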

5) (a * b) + (a * c) to a * (b + c)

float z = (a * b) + (a * c);
// could be:
float z = a * (b + c);

Benchmark:

| Method | Mean | Ratio |
|--------|-----:|------:|
| Old | 148.76 ns | 1.00 |
| New | 51.31 ns | 0.34 |
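Factoring out `a` also changes overflow behavior, not just rounding. A minimal sketch (C# 9 top-level statements assumed; `Distributed` and `Factored` are illustrative names) with inputs near `float.MaxValue`:

```csharp
using System;
using System.Runtime.CompilerServices;

// NoInlining keeps the JIT from folding these at the call site.
[MethodImpl(MethodImplOptions.NoInlining)]
static float Distributed(float x, float y, float z) => (x * y) + (x * z);

[MethodImpl(MethodImplOptions.NoInlining)]
static float Factored(float x, float y, float z) => x * (y + z);

float a = 2f, b = float.MaxValue, c = -float.MaxValue;

// a * b overflows to +inf and a * c to -inf, so their sum is NaN;
// b + c is exactly 0, so the factored form returns 0.
float distributed = Distributed(a, b, c);
float factored = Factored(a, b, c);

Console.WriteLine($"distributed = {distributed}, factored = {factored}");
```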

6) a * a * a * a to two vmulss

float z = a * a * a * a;

could be done with two vmulss instructions, as (a * a) * (a * a), instead of three.
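Reassociating the product reduces the number of roundings from three to two, so the results are not always bit-identical. This sketch (hypothetical helper names, C# 9 top-level statements) counts how often the two forms disagree on inputs in [1, 2); the exact count is not important, only that it is nonzero.

```csharp
using System;
using System.Runtime.CompilerServices;

// What C# evaluates today: ((a * a) * a) * a, three dependent multiplies.
[MethodImpl(MethodImplOptions.NoInlining)]
static float LeftToRight(float a) => a * a * a * a;

// The reassociated form: square once, then square the square (two multiplies).
[MethodImpl(MethodImplOptions.NoInlining)]
static float SquareOfSquare(float a)
{
    float sq = a * a;
    return sq * sq;
}

const int Count = 100_000;
int mismatches = 0;
for (int i = 0; i < Count; i++)
{
    float v = 1f + i / (float)Count;   // inputs spread over [1, 2)
    if (LeftToRight(v) != SquareOfSquare(v))
        mismatches++;
}

Console.WriteLine($"{mismatches} of {Count} inputs round differently");
```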

7) Combinations of Math calls

float z = MathF.Sin(x) / MathF.Cos(x);
// could be:
float z = MathF.Tan(x);

This one doesn't look very useful on its own, but there may be more such combinations; this needs checking.
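Like the other rewrites, this one is not bit-exact: `MathF.Sin(x) / MathF.Cos(x)` rounds three times, while `MathF.Tan(x)` returns a single result from the platform's math library. A sketch that counts disagreements on inputs in (0, 1], assuming C# 9 top-level statements (the count will vary with the underlying C runtime):

```csharp
using System;

// Compares MathF.Sin(x) / MathF.Cos(x) (three rounded results)
// against MathF.Tan(x) (one library call) on inputs in (0, 1].
const int Count = 10_000;
int mismatches = 0;
for (int i = 1; i <= Count; i++)
{
    float x = i / (float)Count;
    float quotient = MathF.Sin(x) / MathF.Cos(x);
    if (quotient != MathF.Tan(x))
        mismatches++;
}

Console.WriteLine($"{mismatches} of {Count} inputs differ between Sin/Cos and Tan");
```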

See the godbolt and sharplab playgrounds to compare all of these cases between .NET Core and clang/LLVM.

So we could have three modes: precise, mixed (the current behavior), and fast. The mode could be set globally via an environment variable, e.g. COMPlus_FpMode=fast, or per method via an attribute such as [FloatingPointMode(FloatingPointModeOptions.Fast)]. We could also expose a runtime constant, e.g.:

if (FloatingPointMode.IsFastMathEnabled && Sse2.IsSupported)
{
    // simdify only when fast math is allowed
}

The only problem is that we would need a second version of the Math internal calls compiled in /fp:precise mode. If that is a problem, we could have just two modes: mixed and fast.

PS: Since Mono has an LLVM back-end, all of these optimizations can easily be turned on there (but only globally):

mono --aot=llvm,llvmllc="-mcpu=haswell -fp-contract=fast" Program.exe

/cc: @tannergooding @mikedn

category:proposal
theme:floating-point
skill-level:expert
cost:extra-large

@EgorBo
Member Author

EgorBo commented May 26, 2019

Hm... so since dotnet/coreclr#9369 was not merged, the Math internal calls are compiled with /fp:fast on Windows but behave more precisely on macOS/Linux?

Also, I'm not sure whether this is related to /fp:fast, but:

Console.WriteLine(MathF.Asinh(0.48549962f));
Console.WriteLine(MathF.Acos (0.57316583f));
Console.WriteLine(MathF.Log2 (1 / 3.0f));

Output on Windows:

0.46820483
0.96043230
-1.5849625

Output on macOS:

0.46820486
0.96043223
-1.5849624

Both machines report dotnet --version = 3.0.100-preview5-011568.
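One way to quantify these differences is the distance in ULPs (units in the last place) between the two outputs. A minimal sketch, assuming C# 9 top-level statements; `UlpDistance` and `ParseInvariant` are illustrative helpers, and the distance trick relies on adjacent same-sign floats having adjacent bit patterns:

```csharp
using System;
using System.Globalization;

// Distance in ULPs between two finite floats of the same sign: for such values,
// neighboring floats have neighboring IEEE 754 bit patterns.
static int UlpDistance(float x, float y) =>
    Math.Abs(BitConverter.SingleToInt32Bits(x) - BitConverter.SingleToInt32Bits(y));

static float ParseInvariant(string s) => float.Parse(s, CultureInfo.InvariantCulture);

// Windows vs macOS outputs from the comment above.
int asinh = UlpDistance(ParseInvariant("0.46820483"), ParseInvariant("0.46820486"));
int acos = UlpDistance(ParseInvariant("0.96043230"), ParseInvariant("0.96043223"));
int log2 = UlpDistance(ParseInvariant("-1.5849625"), ParseInvariant("-1.5849624"));

Console.WriteLine($"Asinh: {asinh} ulp, Acos: {acos} ulp, Log2: {log2} ulp");
```

For the three pairs above, each distance works out to a single ULP, i.e. the Windows and macOS results are neighboring floats.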

@tannergooding
Member

Also CC @CarolEidt, who I've talked with about this before.

@tannergooding
Member

I think, in general, it would be good to expose something like this long term. Developers have varying needs: sometimes precision is desired (and should be the default), and sometimes speed is desired instead.

I think it should be easy enough to allow these optimizations at a per-method level, and to let users control how System.Math calls behave within a method. What isn't clear is how far that should be taken, such as if methods can opt into the caller's precision control when inlined (which may be desirable for some libraries).

@CarolEidt
Contributor

I had thought we already had an issue along these lines, but I can't find it, so it probably doesn't yet exist.

> What isn't clear is how far that should be taken, such as if methods can opt into the caller's precision control when inlined (which may be desirable for some libraries).

This is the tricky design issue, not just for inlining but also for whether and how these decisions are made across methods, classes, assemblies, etc. I think a reasonable position can be taken and supported, but it will certainly require some design and discussion.

@EgorBo EgorBo changed the title Configurable floating-point behavior Configurable floating-point behavior (Fast Math) Jul 14, 2019
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
6 participants