
Optimization with 'fast math' produces incorrect results and segfaults #1155

Closed
ischoegl opened this issue Dec 8, 2021 · 10 comments · Fixed by #1161

Comments

@ischoegl
Member

ischoegl commented Dec 8, 2021

Problem description

As reported in #1150, compilation with default optimization options for the Intel compiler suite icx/icpx results in incorrect results. The behavior can be reproduced for gcc with the -ffinite-math-only option (which is one of the -ffast-math flags), e.g. for a vanilla gcc toolchain on Ubuntu 20.04:

$ scons build optimize_flags="-O3 -ffinite-math-only"

The option breaks strict IEEE 754 compliance, as it tells the compiler to "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs." There are some instances where Cantera uses NaN internally, which presumably break under 'fast math' optimizations.

Steps to reproduce

  1. Compile as indicated above.
  2. An attempt to run the test suite results in numerous segfaults.
  3. Output for gcc with fast math is shown below (same as for icx in #1150, "Optimized Cantera build with the LLVM-based Intel compilers do not work correctly"):
In [3]: gas2 = ct.Solution('gri30.yaml')

In [4]: gas2.partial_molar_cp
Out[4]: 
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0.])

In [5]: gas2.partial_molar_entropies
Out[5]: 
array([-1.84618157e-12,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06,  5.74342730e+06,  5.74342730e+06,  5.74342730e+06,
        5.74342730e+06])

System information

  • Cantera version: 2.5.1, 2.6.0a3
  • OS: Ubuntu 20.04 (and presumably others)
  • Python/MATLAB/other software versions: gcc / icx

@bryanwweber
Member

@ischoegl and @tpg2114, thanks for digging into this! Does -ffast-math actually improve performance for the standard workloads that we run? If it does, then this is probably worth digging into. Otherwise, it might be best to leave the code as-is.

@ischoegl
Member Author

ischoegl commented Dec 8, 2021

Does the -ffast-math actually improve performance for standard workloads that we run?

Based on my understanding, yes. But since IEEE 754 compliance is not enforced (which is where the performance gains come from), this may make for some 'fun' refactoring. I mainly created this issue to make sure that users are aware of the limitation. It's somewhat puzzling that icx enables -fp-model fast by default. FWIW, here's a blog post that I found informative and an SO answer, but I didn't dig very deep.

@bryanwweber
Member

Based on my understanding, yes.

I'm not asking you to do this, but if anyone decides to pursue this issue, it'd be nice to have concrete benchmarks showing that adding these flags improves real-world performance enough to justify the (presumably) increased code complexity of handling this case.

@ischoegl
Member Author

ischoegl commented Dec 9, 2021

I'm not asking you to do this …

no worries. I’ll pass on this one 😜

@speth
Member

speth commented Dec 21, 2021

I investigated this a bit, using Clang++ on Ubuntu 20.04. The only flag that's part of -ffast-math that causes significant problems is -ffinite-math-only. With the rest of the flags enabled by specifying optimize_flags="-O3 -ffast-math -fno-finite-math-only", I get 4 test failures:

  • python: test_kinetics.TestReaction.test_Blowers_Masel_change_enthalpy
  • python: test_reaction.TestBlowersMaselRate.test_from_parts
  • VCS-LiSi-verbose
  • cxx-kinetics1

The first two, at least, are just an issue of assertEquals being used where assertNear would be a better choice, and I think the latter two can be resolved without too much difficulty.

Enabling -ffinite-math-only requires eliminating any check that relies on isnan working, or even on the comparison nan == <some number> returning false. I made some really crude changes to do this in this commit on my fork: speth@5abdaa8, and did some benchmarking of an ignition delay problem using a couple of different mechanisms. For these tests, I used Eigen for the linear algebra and the vendored copy of Sundials, so this is about the maximum impact that these flags can have, since there is very little outside code. I tested both GRI 3.0 and a larger mechanism with ~400 species. What I found was:

  • Compared to the baseline (-O3), using -O3 -ffast-math -fno-finite-math-only is about 2% faster in terms of time steps per second.
  • Compared to the baseline, -O3 -ffast-math is about 4% faster in terms of time steps per second.
  • The differences in the calculations cause the number of time steps needed for an individual simulation to vary unpredictably, with differences for individual simulations of up to 10% and an average around 2.5%, so it's hard to say that these optimizations necessarily lead to higher performance on a wall-time basis.

Given the unsatisfactory nature of the changes required to support using -ffinite-math-only and the relatively small performance gains, my recommendation is that we add a configuration-time check for whether isnan works correctly and if not, abort compilation with an error message stating that Cantera doesn't work with this flag.

@ischoegl
Member Author

Thanks for looking into this further, @speth! The two non-python tests look familiar, and I agree that assertEquals is easy to fix (should be done regardless).

As an aside, one thing I noticed in my own tests was a plethora of warnings coming out of fmt via AnyMap. In addition, there were also several 'legacy' fmt use cases that caused annoying warnings. Fixing what causes these fmt warnings may be the largest issue on hand, as I don't really like the nuclear option that disables these warnings.

Overall, I agree that it probably doesn't make sense to make this a priority. At the same time, it probably also makes sense to avoid using NaN as a sentinel value (which I have recently used), and to avoid exact / bit-wise equality checks in new code. It should be relatively simple to replace the NaN checks in the new reaction rate evaluators, which probably should happen prior to 2.6. In other words, what I'm arguing for is not to make it a priority to fix, but also not to make it harder to fix going forward.

@speth
Member

speth commented Dec 22, 2021

I don't see any warnings related to fmt with either GCC or Clang, so I guess that's specific to the Intel compiler.

I would argue that most of the ways that we use NaN in Cantera are very reasonable, and that the alternatives are often worse. For instance, in the modification I made to the CachedValue class to run with -ffinite-math-only, the initial cache check value now has to be some arbitrary but finite number. And while it's unlikely that a cached value would be checked against this initial value and return an erroneous result, it's not impossible.

@ischoegl
Member Author

ischoegl commented Dec 22, 2021

I don't see any warnings related to fmt with either GCC or Clang, so I guess that's specific to the Intel compiler.

That sounds plausible.

I would argue that most of the ways that we use NaN in Cantera are very reasonable, and that the alternatives are often worse.

I tend to agree: using NaN as a sentinel is efficient from a coding perspective. But apparently we're forcing the code to check for this possibility, which is less ideal from a computational perspective (although the penalty appears to be small). I also don't think that we need to avoid NaN as an output value, it's just that there probably need to be internal booleans that replace the checks. Overall, I don't think that there's any urgency to 'fix' this issue ... although it probably needs to remain open.

@g3bk47
Contributor

g3bk47 commented Dec 23, 2021

@speth @bryanwweber @ischoegl Although the issue has been closed, I am reporting some additional data: I ran a small test program benchmarking the computation of reaction rates with 16 different compilers/versions, each with and without fast-math. Quick summary: using fast-math, g++ becomes about 15 % faster and the Intel compilers less than 5 % faster. The relative accuracy in my test is within 10^-10 %. However, even without fast-math, final results for reaction rates between g++/clang++ and icpc/icpx are slightly different, which I cannot explain at the moment. For all details, see here:
https://github.com/g3bk47/CanteraCompilerPerformance
Let me know if I should benchmark any other code snippets with this setup.

@ischoegl
Member Author

ischoegl commented Dec 24, 2021

@g3bk47 … thanks for those results! Your comparison suite is impressive. From my perspective, a performance gain of up to 15% would be worth pursuing. While the segfaults and erroneous results are fixed here (i.e. those issues are addressed), I believe this warrants an enhancement request?
