# Configurable floating-point behavior (Fast Math) #24784

Open
opened this issue May 25, 2019 · 4 comments

Contributor

### EgorBo commented May 25, 2019 • edited

As far as I understand, the native code behind `Math`/`MathF` is currently compiled with `/fp:fast` (on Windows only), whereas pure C# code behaves more like `/fp:precise`.
This could be a problem for, e.g., developers of games (or game backends) who need all floating-point operations to be 100% repeatable across clients and hardware, or to match other software compiled in `/fp:precise` mode. Online games with client-side physics are one example.

However, most users don't care about this and could gain additional performance from `/fp:fast` for both native and managed code.

A `/fp:fast` mode for C# could allow us to apply the following optimizations in the JIT (inspired by LLVM):

### 1) `a * b + c` to `fmadd`

```
float z = a * b + c;
float z = a * b - c;
float z = c + a * b;
float z = -c + a * b;
```

Each of these could be done with a single `fmadd` instruction (see #17541) instead of `mul` + `add`.
There are many places in the BCL where it could be applied, especially around `System.Numerics.*`.

Benchmark (Coffee Lake i7 8700K):

| Method | Mean | Ratio |
|---|---|---|
| Old | 129.41 ns | 1.00 |
| New | 64.95 ns | 0.50 |
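A minimal sketch of why this contraction changes results and therefore needs an opt-in. It assumes .NET Core 3.0 or later, where `MathF.FusedMultiplyAdd` is available; the class and method names (`FmaDemo`, `MulThenAdd`, `Fused`) are made up for illustration.

```csharp
using System;

public static class FmaDemo
{
    // Two roundings: the product is rounded to float, then the sum is rounded again.
    public static float MulThenAdd(float a, float b, float c)
    {
        float product = (float)(a * b); // explicit cast forces single precision
        return (float)(product + c);
    }

    // One rounding: a * b + c is evaluated exactly and rounded once at the end.
    public static float Fused(float a, float b, float c) =>
        MathF.FusedMultiplyAdd(a, b, c);

    public static void Demo()
    {
        float a = 1.0f + 1.0f / 4096;    // 1 + 2^-12
        float c = -(1.0f + 1.0f / 2048); // -(1 + 2^-11)

        // Exactly, a * a = 1 + 2^-11 + 2^-24, which is a tie that rounds to 1 + 2^-11.
        Console.WriteLine(MulThenAdd(a, a, c)); // 0
        Console.WriteLine(Fused(a, a, c));      // 2^-24, roughly 5.96e-8
    }
}
```

The fused result is the more accurate one here, but it is not bit-identical to the unfused one, which is exactly what a "precise" mode would have to preserve.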

### 2) `a / c` to `a * (1 / c)`

```
float z = a / 1000;
// could be:
float z = a * 0.001f;
```

See #24584, which currently works only for power-of-two constants but could handle any constant in fast math mode.

| Method | Mean | Ratio |
|---|---|---|
| Old | 403.9 ns | 1.00 |
| New | 297.7 ns | 0.74 |
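A rough sketch of why the power-of-two restriction exists: a power-of-two reciprocal is exact, while `0.001f` is only the nearest float to 1/1000, so the product can round differently from the quotient. `ReciprocalDemo` and `CountMismatches` are hypothetical helper names, not part of any library.

```csharp
using System;

public static class ReciprocalDemo
{
    // Counts the integers x in [1, limit] for which x / divisor and
    // x * reciprocal round to different float values.
    public static int CountMismatches(float divisor, float reciprocal, int limit)
    {
        int mismatches = 0;
        for (int i = 1; i <= limit; i++)
        {
            float x = i;
            float quotient = (float)(x / divisor);
            float product = (float)(x * reciprocal);
            if (quotient != product)
            {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

`ReciprocalDemo.CountMismatches(8f, 0.125f, 10000)` returns 0 because both forms are exact, whereas `ReciprocalDemo.CountMismatches(1000f, 0.001f, 10000)` is nonzero: for example, `3000f / 1000f` is exactly 3, while `3000f * 0.001f` rounds to the next float above 3.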

### 3) Comparisons and ternary operations

```
float z = a > b ? a : b;
float z = a >= b ? a : b;
float z = MathF.Max(a, b);
```

could generate a single `vmaxss` instruction and in general be less strict around +0.0/-0.0/NaN.
Related: #22965 and #16306
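To illustrate what "less strict around +0.0/-0.0/NaN" would give up, here is a small sketch comparing the ternary form with `MathF.Max`. It assumes .NET Core 3.0 or later (for `float.IsNegative` and the current `MathF.Max` semantics); `MaxDemo` and `TernaryMax` are illustrative names only.

```csharp
using System;

public static class MaxDemo
{
    // Returns b whenever the comparison is false, which includes every NaN case.
    public static float TernaryMax(float a, float b) => a > b ? a : b;

    public static void Demo()
    {
        // Negative zero built from its bit pattern to avoid relying on literal folding.
        float negZero = BitConverter.Int32BitsToSingle(int.MinValue);

        Console.WriteLine(TernaryMax(float.NaN, 1f)); // 1: NaN > 1 is false
        Console.WriteLine(MathF.Max(float.NaN, 1f));  // NaN is propagated

        Console.WriteLine(float.IsNegative(TernaryMax(0f, negZero))); // True: returns -0
        Console.WriteLine(float.IsNegative(MathF.Max(0f, negZero)));  // False: returns +0
    }
}
```

A fast mode could treat all three source forms as interchangeable, whereas a precise mode has to keep these edge cases distinct.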

### 4) `a - b - a` to `-b`

```
float z = a - b - a;
// could be just:
float z = -b;
```
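A short sketch of an input for which this rewrite is observable, assuming single-precision evaluation at every step. `ReassociationDemo`, `Literal` and `Folded` are hypothetical names.

```csharp
using System;

public static class ReassociationDemo
{
    // Evaluates a - b - a exactly as written, rounding after each step.
    public static float Literal(float a, float b)
    {
        float difference = (float)(a - b);
        return (float)(difference - a);
    }

    // The algebraically simplified form.
    public static float Folded(float b) => -b;

    public static void Demo()
    {
        // Floats near 1e8 are spaced 8 apart, so 1e8 - 1 rounds back to 1e8.
        Console.WriteLine(Literal(1e8f, 1f)); // 0
        Console.WriteLine(Folded(1f));        // -1
    }
}
```

In this case the folded form is closer to the mathematically expected answer, but it still differs from what the code as written computes.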

### 5) `(a * b) + (a * c)` to `a * (b + c)`

```
float z = (a * b) + (a * c);
// could be:
float z = a * (b + c);
```

| Method | Mean | Ratio |
|---|---|---|
| Old | 148.76 ns | 1.00 |
| New | 51.31 ns | 0.34 |
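Factoring can also change overflow behavior, not just the last bit. The following sketch picks an extreme input to make that visible; `DistributiveDemo`, `Expanded` and `Factored` are made-up names.

```csharp
using System;

public static class DistributiveDemo
{
    // (a * b) + (a * c), rounding each intermediate to float.
    public static float Expanded(float a, float b, float c)
    {
        float left = (float)(a * b);
        float right = (float)(a * c);
        return (float)(left + right);
    }

    // a * (b + c), rounding each intermediate to float.
    public static float Factored(float a, float b, float c)
    {
        float sum = (float)(b + c);
        return (float)(a * sum);
    }

    public static void Demo()
    {
        float big = float.MaxValue;
        // Both products overflow, giving +Infinity + -Infinity = NaN.
        Console.WriteLine(Expanded(2f, big, -big)); // NaN
        // The sum cancels first, so no overflow occurs.
        Console.WriteLine(Factored(2f, big, -big)); // 0
    }
}
```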

### 6) `a * a * a * a` to two `vmulss`

`float z = a * a * a * a;`

could be done in two `vmulss` instead of three.

### 7) Combinations of Math calls

```
float z = MathF.Sin(x) / MathF.Cos(x);
// could be:
float z = MathF.Tan(x);
```

This one doesn't look very useful, but there may be more cases like it; I need to check.

See godbolt and sharplab playgrounds to compare all of these cases between .NET Core and clang/LLVM.

So we could have three modes: precise, mixed (current) and fast, selected via an environment variable, e.g. `COMPlus_FpMode=fast`, or via an attribute such as `[FloatingPointMode(FloatingPointModeOptions.Fast)]` to set the mode per method. There could also be a runtime constant, e.g.:

```
if (FloatingPointMode.IsFastMathEnabled && Sse2.IsSupported)
{
    // simdify only when fast math is allowed
}
```

The only problem: we would need a second version of the `Math` internal calls compiled in `/fp:precise` mode. If that is a problem, then we could have just two modes: mixed and fast.
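For discussion, the proposed surface could look roughly like the sketch below. Everything here is hypothetical: none of these types exist in .NET, the names are simply taken from the proposal above, and a real implementation would be recognized by the JIT rather than read from the environment in managed code. `Physics.Lerp` is only a placeholder for user code.

```csharp
using System;

// Hypothetical: the three modes named in the proposal.
public enum FloatingPointModeOptions { Precise, Mixed, Fast }

// Hypothetical: per-method opt-in. Here it is inert metadata; the JIT would have to honor it.
[AttributeUsage(AttributeTargets.Method)]
public sealed class FloatingPointModeAttribute : Attribute
{
    public FloatingPointModeAttribute(FloatingPointModeOptions mode) => Mode = mode;
    public FloatingPointModeOptions Mode { get; }
}

public static class FloatingPointMode
{
    // Maps the text of the proposed COMPlus_FpMode variable to a mode.
    // Missing or unknown values fall back to Mixed, i.e. today's behavior.
    public static FloatingPointModeOptions Parse(string value)
    {
        if (string.Equals(value, "fast", StringComparison.OrdinalIgnoreCase))
            return FloatingPointModeOptions.Fast;
        if (string.Equals(value, "precise", StringComparison.OrdinalIgnoreCase))
            return FloatingPointModeOptions.Precise;
        return FloatingPointModeOptions.Mixed;
    }

    // Process-wide mode, read once at startup.
    public static readonly FloatingPointModeOptions Current =
        Parse(Environment.GetEnvironmentVariable("COMPlus_FpMode"));

    public static bool IsFastMathEnabled => Current == FloatingPointModeOptions.Fast;
}

public static class Physics
{
    // A method that would opt into fast math under the proposal.
    [FloatingPointMode(FloatingPointModeOptions.Fast)]
    public static float Lerp(float a, float b, float t) => a + (b - a) * t;
}
```

One open design point this sketch does not address is how a per-method attribute interacts with the process-wide setting when methods are inlined into callers with a different mode.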

PS: Since Mono has an LLVM back end, all of these optimizations can easily be turned on there (but only globally):

```
mono --aot=llvm,llvmllc="-mcpu=haswell -fp-contract=fast" Program.exe
```

UPD: I wrote a blog post about different peephole optimizations: https://egorbo.com/peephole-optimizations.html

Contributor Author

### EgorBo commented May 26, 2019 • edited

hm... so since #9369 was not merged `Math` internal calls are compiled with `/fp:fast` on Windows but have rather precise behavior on macOS/Linux?

Also, not sure it's related to `/fp:fast` but:

```
Console.WriteLine(MathF.Asinh(0.48549962f));
Console.WriteLine(MathF.Acos (0.57316583f));
Console.WriteLine(MathF.Log2 (1 / 3.0f));
```

Output on Windows:

```
0.46820483
0.96043230
-1.5849625
```

Output on macOS:

```
0.46820486
0.96043223
-1.5849624
```

Both OSs have `dotnet --version` = `3.0.100-preview5-011568`
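One way to quantify differences like these is to count how many representable floats lie between the two results (the distance in ULPs). The helper below is a sketch with hypothetical names (`UlpDemo`, `UlpDistance`); it assumes `BitConverter.SingleToInt32Bits`, which is available in .NET Core 2.0 and later.

```csharp
using System;

public static class UlpDemo
{
    // Maps a float to an integer that increases monotonically with the float's value,
    // so that adjacent floats map to adjacent integers (+0 and -0 both map to 0).
    private static int ToOrdered(float x)
    {
        int bits = BitConverter.SingleToInt32Bits(x);
        return bits < 0 ? int.MinValue - bits : bits;
    }

    // Number of representable floats between a and b (0 means bit-identical or +0 vs -0).
    public static long UlpDistance(float a, float b)
    {
        if (float.IsNaN(a) || float.IsNaN(b))
        {
            throw new ArgumentException("NaN has no ULP distance.");
        }
        return Math.Abs((long)ToOrdered(a) - ToOrdered(b));
    }
}
```

For instance, `UlpDemo.UlpDistance(0.46820483f, 0.46820486f)` reports how far apart the two `MathF.Asinh` results above are in units of float spacing, which is easier to compare across platforms than the printed decimals.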

Member

### tannergooding commented May 28, 2019

 I think, in general, it would be good to expose something like this long term. Developers have varying needs and sometimes precision is desired (and should be the default) and sometimes speed is desired instead. I think in general, it should be easy enough to allow optimizations at a per-method level and that for a method users should be able to impact how `System.Math` calls operate. What isn't clear is how far that should be taken, such as if methods can opt into the caller's precision control when inlined (which may be desirable for some libraries).
Member

### CarolEidt commented May 28, 2019

I had thought we already had an issue along these lines, but I can't find it, so it probably doesn't yet exist.

> What isn't clear is how far that should be taken, such as if methods can opt into the caller's precision control when inlined (which may be desirable for some libraries).

This is the tricky design issue, and not just for inlining but whether and how these decisions are made across methods, classes, assemblies, etc. I think a reasonable position can be taken and supported, but it will certainly require some design and discussion.

### EgorBo changed the title from "Configurable floating-point behavior" to "Configurable floating-point behavior (Fast Math)" on Jul 14, 2019
