Configurable floating-point behavior (Fast Math) #24784

Open
EgorBo opened this issue May 25, 2019 · 4 comments

Comments

@EgorBo
Contributor

commented May 25, 2019

As far as I understand, Math/MathF native code is currently compiled with /fp:fast (on Windows only), whereas pure C# code behaves more like /fp:precise.
This could be a problem for, e.g., game developers (or developers of game backends) who want all floating-point operations to be 100% repeatable across all clients and hardware, or to match other software compiled in /fp:precise mode, e.g. online games with client-side physics.

However, most users don't care about this and could gain additional performance from /fp:fast for both native and managed code.

/fp:fast mode for C# could allow us to apply the following optimizations in JIT (inspired by LLVM):

1) a * b + c to fmadd

float z = a * b + c;
float z = a * b - c;
float z = c + a * b;
float z = -c + a * b;

could each be done in a single fused multiply-add instruction (fmadd and its variants, see #17541) instead of mul + add.
There are lots of places in the BCL where this could be applied, especially around System.Numerics.*

Benchmark: (Coffee Lake i7 8700K)

Method  Mean       Ratio
Old     129.41 ns  1.00
New      64.95 ns  0.50
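
To illustrate why this contraction is not allowed in a precise mode, here is a minimal sketch comparing the two forms. It assumes .NET Core 3.0 or later, where MathF.FusedMultiplyAdd is available; the class and method names are made up for the example.

```csharp
using System;

public static class FmaDemo
{
    // Two roundings: the product is rounded to float, then the sum is rounded.
    public static float MulThenAdd(float a, float b, float c) => (float)(a * b) + c;

    // One rounding: a * b + c is computed exactly and rounded once.
    public static float Fused(float a, float b, float c) => MathF.FusedMultiplyAdd(a, b, c);

    // Inputs chosen so that the exact product (1 + 2^-12)^2 = 1 + 2^-11 + 2^-24
    // falls exactly halfway between two floats and gets rounded to 1 + 2^-11.
    public static readonly float A = 1f + 1f / 4096f;     // 1 + 2^-12
    public static readonly float C = -(1f + 1f / 2048f);  // -(1 + 2^-11)
}
```

With these inputs, FmaDemo.MulThenAdd(FmaDemo.A, FmaDemo.A, FmaDemo.C) returns 0, while FmaDemo.Fused(FmaDemo.A, FmaDemo.A, FmaDemo.C) returns 2^-24, so the fused form is actually the more accurate one here, but the results are not bit-identical.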

2) a / c to a * (1 / c)

float z = a / 1000;
// could be:
float z = a * 0.001f;

See #24584, which currently works only for power-of-two constants but could handle any constant in fast math mode.

Benchmark:

Method  Mean      Ratio
Old     403.9 ns  1.00
New     297.7 ns  0.74
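
A rough sketch of why the power-of-two restriction exists: 1 / d is exactly representable only when d is a power of two, so for other constants the multiply can round differently than the divide. The helper below is purely illustrative and just counts how often the two forms disagree.

```csharp
using System;

public static class ReciprocalDemo
{
    // Counts the integers n in [1, limit] for which n / d and n * (1 / d)
    // produce different float results.
    public static int CountMismatches(float d, int limit)
    {
        float reciprocal = 1f / d;
        int mismatches = 0;
        for (int n = 1; n <= limit; n++)
        {
            float quotient = (float)(n / d);
            float product = (float)(n * reciprocal);
            if (quotient != product) mismatches++;
        }
        return mismatches;
    }
}
```

ReciprocalDemo.CountMismatches(8f, 100000) returns 0, whereas ReciprocalDemo.CountMismatches(1000f, 100000) returns a non-zero count (n = 1500 is one such case: 1500 / 1000f is exactly 1.5, but 1500 * 0.001f rounds one step above it).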

3) Comparisons and ternary operations

float z = a > b ? a : b;
float z = a >= b ? a : b;
float z = MathF.Max(a, b);

could generate a single vmaxss instruction and in general be less strict around +0.0/-0.0/NaN.
Related: #22965 and #16306
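
For reference, a small sketch of where the ternary form and MathF.Max disagree today. It assumes .NET Core 3.0 or later, where MathF.Max propagates NaN and prefers +0.0 over -0.0; to my understanding the ternary form matches what maxss/vmaxss computes (second operand returned if either input is NaN or both are zero). The names are illustrative only.

```csharp
using System;

public static class MaxDemo
{
    // -0.0f built from its bit pattern to avoid relying on literal folding.
    public static readonly float NegativeZero = BitConverter.Int32BitsToSingle(int.MinValue);

    // Plain ternary max: returns b whenever the comparison a > b is false,
    // which includes the NaN cases and the +0.0 / -0.0 case.
    public static float TernaryMax(float a, float b) => a > b ? a : b;

    // True only for -0.0f (sign bit set, all other bits clear).
    public static bool IsNegativeZero(float x) => BitConverter.SingleToInt32Bits(x) == int.MinValue;
}
```

MaxDemo.TernaryMax(float.NaN, 1f) returns 1 while MathF.Max(float.NaN, 1f) returns NaN, and MaxDemo.TernaryMax(0f, MaxDemo.NegativeZero) returns -0.0 while MathF.Max(0f, MaxDemo.NegativeZero) returns +0.0. A fast mode would let the JIT ignore these differences.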

4) a - b - a to -b

float z = a - b - a;
// could be just:
float z = -b;
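
A quick sketch of an input where this rewrite changes the result (names are made up for the example): when a is much larger than b, b is absorbed by the first subtraction.

```csharp
public static class ReassociationDemo
{
    // Evaluated left to right, as the C# source is written today.
    public static float AsWritten(float a, float b) => (float)(a - b) - a;

    // What a fast math mode could reduce the expression to.
    public static float Simplified(float a, float b) => -b;
}
```

For a = 8 and b = 3 both return -3, but for a = 1e20f and b = 1f, AsWritten returns 0 (since 1e20f - 1f rounds back to 1e20f) while Simplified returns -1.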

5) (a * b) + (a * c) to a * (b + c)

float z = (a * b) + (a * c);
// could be:
float z = a * (b + c);

Benchmark:

Method  Mean       Ratio
Old     148.76 ns  1.00
New      51.31 ns  0.34
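
Factoring is another rewrite that is only safe when exact IEEE results are not required. A minimal sketch (illustrative names) with an overflow case where the two forms diverge completely:

```csharp
public static class FactoringDemo
{
    // Two multiplies and an add; each intermediate is rounded to float.
    public static float Expanded(float a, float b, float c) => (float)(a * b) + (float)(a * c);

    // One add and one multiply.
    public static float Factored(float a, float b, float c) => a * (float)(b + c);
}
```

For small inputs such as (2, 3, 4) both return 14. For a = 1e30f, b = 1e30f, c = -1e30f, Expanded computes +Infinity + -Infinity = NaN, while Factored computes 1e30f * 0 = 0.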

6) a * a * a * a to two vmulss

float z = a * a * a * a;

could be done in two vmulss instead of three.
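
As a sketch of what the two-multiply form looks like and why it needs a relaxed mode: squaring twice rounds at different points than multiplying left to right, so the last bit can differ. Names are illustrative.

```csharp
public static class PowerDemo
{
    // ((a * a) * a) * a, the order C# evaluates the expression in today.
    public static float ThreeMultiplies(float a) => (float)((float)((float)(a * a) * a) * a);

    // (a * a) * (a * a), reusing the square.
    public static float TwoMultiplies(float a)
    {
        float square = (float)(a * a);
        return (float)(square * square);
    }
}
```

For exactly representable cases such as a = 3 both return 81, but for a = 1f + 1f / 4096f the two results differ by one unit in the last place.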

7) Combinations of Math calls

float z = MathF.Sin(x) / MathF.Cos(x);
// could be:
float z = MathF.Tan(x);

This one doesn't look very useful, but there may be more such combinations; I need to check.

See godbolt and sharplab playgrounds to compare all of these cases between .NET Core and clang/LLVM.

So we could have 3 modes: precise, mixed (current), and fast, set via an environment variable, e.g. COMPlus_FpMode=fast, or via an attribute, [FloatingPointMode(FloatingPointModeOptions.Fast)], to set the mode per method. There could also be a runtime constant, e.g.:

if (FloatingPointMode.IsFastMathEnabled && Sse2.IsSupported)
{
    // simdify only when fast math is allowed
}
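
To make the per-method idea a bit more concrete, here is a purely hypothetical sketch of how the attribute could be declared and read. None of these types exist in .NET; the names just follow the proposal above, and a real implementation would be consumed by the JIT rather than via reflection.

```csharp
using System;
using System.Reflection;

// Hypothetical: the three modes suggested in this issue.
public enum FloatingPointModeOptions { Precise, Mixed, Fast }

// Hypothetical attribute; not part of any shipped .NET API.
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Class)]
public sealed class FloatingPointModeAttribute : Attribute
{
    public FloatingPointModeAttribute(FloatingPointModeOptions mode) => Mode = mode;
    public FloatingPointModeOptions Mode { get; }
}

public static class Physics
{
    [FloatingPointMode(FloatingPointModeOptions.Fast)]
    public static float Lerp(float a, float b, float t) => a + (b - a) * t;
}

public static class FloatingPointModeReader
{
    // Returns the mode requested on a method, or Mixed (the current behavior) if none is set.
    public static FloatingPointModeOptions GetMode(MethodInfo method)
    {
        var attribute = method.GetCustomAttribute<FloatingPointModeAttribute>();
        return attribute?.Mode ?? FloatingPointModeOptions.Mixed;
    }
}
```

Here FloatingPointModeReader.GetMode(typeof(Physics).GetMethod("Lerp")) would report Fast, and any unannotated method would fall back to Mixed.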

The only problem is that we would need a second version of the Math internal calls compiled in /fp:precise mode. If that's a problem, then just two modes: mixed and fast.

PS: Since Mono has an LLVM back end, all of these optimizations can easily be turned on there (but only globally):

mono --aot=llvm,llvmllc="-mcpu=haswell -fp-contract=fast" Program.exe

/cc: @tannergooding @mikedn

UPD I wrote a blog post about different peephole optimizations: https://egorbo.com/peephole-optimizations.html

@EgorBo

Contributor Author

commented May 26, 2019

Hm... so since #9369 was not merged, Math internal calls are compiled with /fp:fast on Windows but have fairly precise behavior on macOS/Linux?

Also, I'm not sure this is related to /fp:fast, but:

Console.WriteLine(MathF.Asinh(0.48549962f));
Console.WriteLine(MathF.Acos (0.57316583f));
Console.WriteLine(MathF.Log2 (1 / 3.0f));

Output on Windows:

0.46820483
0.96043230
-1.5849625

Output on macOS:

0.46820486
0.96043223
-1.5849624
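
One way to quantify differences like these is to measure them in ULPs (units in the last place) rather than in decimal digits. A minimal sketch, assuming .NET Core 2.0 or later for BitConverter.SingleToInt32Bits; the helper names are made up and NaN inputs are not handled.

```csharp
using System;

public static class UlpDemo
{
    // Maps a float's bit pattern to an integer that increases monotonically
    // with the float's value, so that adjacent floats map to adjacent integers.
    private static int ToOrdered(float x)
    {
        int bits = BitConverter.SingleToInt32Bits(x);
        return bits < 0 ? int.MinValue - bits : bits;
    }

    // Number of representable floats between a and b (0 if they are equal).
    public static long UlpDistance(float a, float b) =>
        Math.Abs((long)ToOrdered(a) - ToOrdered(b));
}
```

For example, UlpDemo.UlpDistance(0.46820483f, 0.46820486f) gives the distance between the Windows and macOS results for Asinh. The decimal gaps shown above look consistent with last-bit differences, but I haven't verified that.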

Both OSs have dotnet --version = 3.0.100-preview5-011568

@tannergooding tannergooding added this to the Future milestone May 28, 2019

@tannergooding

Member

commented May 28, 2019

Also CC. @CarolEidt who I've talked with about this before.

@tannergooding

Member

commented May 28, 2019

I think, in general, it would be good to expose something like this long term. Developers have varying needs and sometimes precision is desired (and should be the default) and sometimes speed is desired instead.

I think in general, it should be easy enough to allow optimizations at a per-method level and that for a method users should be able to impact how System.Math calls operate. What isn't clear is how far that should be taken, such as if methods can opt into the caller's precision control when inlined (which may be desirable for some libraries).

@CarolEidt

Member

commented May 28, 2019

I had thought we already had an issue along these lines, but I can't find it, so it probably doesn't yet exist.

> What isn't clear is how far that should be taken, such as if methods can opt into the caller's precision control when inlined (which may be desirable for some libraries).

This is the tricky design issue, and not just for inlining but whether and how these decisions are made across methods, classes, assemblies, etc. I think a reasonable position can be taken and supported, but it will certainly require some design and discussion.

@EgorBo EgorBo changed the title Configurable floating-point behavior Configurable floating-point behavior (Fast Math) Jul 14, 2019
