Turn off FMA option for the intel and nvhpc compiler #121
Why do we have this in the machine specific file instead of the generic intel.cmake file?
I guess the options like "--host=cray" and "-march=core-avx2" are specific to Derecho (Cray machine and AMD CPU)?
Can I push that change back to your PR? Move no-fma flag to generic intel.cmake file.
Oh, sorry that I misunderstood your point. You mean we should turn off FMA whenever we use the intel compiler on any machine? That is fine with me, but I am not sure whether other people want to leave it on for their machines (e.g., people who do not use ECT for verification).
Machines like constance and izumi all turn on FMA by default. Moving
Are we running ECT on those systems? It seems the ECT confirms that we should not be using FMA on any system.
I agree that, based on ECT, we should not use FMA for any system (intel, nvhpc, etc.). I do not have access to those systems, but I assume that they will run some regression tests like
If it fails the ECT with FMA on, I'd be inclined to think it should be off
everywhere, no? Not just the systems where we run the ECT. Failing means
it's a statistically different climatology.
- Brian
…On Fri, Sep 15, 2023 at 12:04 PM Jian Sun ***@***.***> wrote:
I agreed that based on ECT, we should not use FMA for any system. I do not
have access to those systems but I assume that they will run some
regressions tests like aux_cam for BFB. If we add FMA to the generic
file, their tests will fail and they may not want that if they do not trust
ECT like we do. This is my concern.
We will mark the PR as answer-changing so that people understand that baselines may fail.
Thanks @jedwards4b and @briandobbins for your comments. That sounds good to me. My last comment is how we could make sure that
A quick test on Derecho by moving If anyone finds an issue with the changes here, they can always modify the flags specifically for their own system.
  endif()
  if (DEBUG)
-   string(APPEND CFLAGS " -O0 -g")
+   string(APPEND CFLAGS " -O0 -g -no-fma")
  endif()
I think that you shouldn't do this separately for DEBUG=TRUE vs DEBUG=FALSE. You can add it once at line 1 or, if you want to make sure it comes after the other flags, add another line after line 10:
string(APPEND CFLAGS " -no-fma")
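The suggestion above can be sketched as follows. This is a minimal, hypothetical excerpt, not the actual intel.cmake: the `-O2` branch and the empty starting value of `CFLAGS` are assumptions made only so the snippet runs on its own (for example with `cmake -P`).

```cmake
# Hypothetical excerpt of a compiler macros file (assumed layout).
set(CFLAGS "")  # assumed starting point for this sketch

if (DEBUG)
  string(APPEND CFLAGS " -O0 -g")
else()
  string(APPEND CFLAGS " -O2")  # illustrative non-debug flag, an assumption
endif()

# Appended once, after the DEBUG/non-DEBUG branches, so both build types get it.
# The leading space matters: string(APPEND) concatenates with no separator.
string(APPEND CFLAGS " -no-fma")

message(STATUS "CFLAGS =${CFLAGS}")
```

Placing the append after the conditional keeps the flag out of each branch, so a later change to the debug or optimized flags cannot accidentally drop it.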
Thanks Jim. I just updated those files.
It looks like the FMA option was also causing my ERP/PEM/ERC tests to fail. I still need to do more ERP/PEM/ERC testing.
That totally makes sense, since changing task count with FMA enabled will change answers.
@jedwards4b can you explain more about why changing task count with FMA enabled will change answers? I thought changing task count just affected the domain decomposition.
I misspoke there - I was thinking about vector math and how using different pelayouts changes the length of the vectors - but that doesn't have anything to do with FMA.
Thanks Jim for your clarification. I assumed Chris was doing the ERP test on Derecho and my understanding was that its failure should not be caused by the FMA option. But it seems that turning off FMA on Derecho passes the ERP test somehow?
Maybe we should confirm that with another test?
Agreed and I think Chris is already working on it? I would like to let Chris update more details here in case I have a misunderstanding.
I reran the prealpha tests over the weekend. The following tests were all failing before FMA was turned off.
Thanks Chris for posting these new results. So turning off FMA does help pass the ERP tests on Derecho somehow. Are there any other types of tests besides ERP that also pass because FMA is disabled?
This PR turns off the FMA option by default for the intel and nvhpc compilers. Based on the ensemble consistency test (ECT), turning on FMA is likely to generate a statistically different climatology.
This ensures that the ECT passes when the test simulations are run with the intel/2023 compiler on Derecho and the results are compared to the baseline generated on Cheyenne with intel/19.1.1. See more discussion here: ESCOMP/CAM#883.
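As a rough illustration of what the description above amounts to, the sketch below picks a no-FMA flag per compiler and appends it to the C and Fortran flag strings. It rests on several assumptions: `COMPILER` and `NO_FMA_FLAG` are hypothetical variable names, appending to `FFLAGS` is assumed rather than shown in this thread, and `-Mnofma` is the NVHPC spelling as I understand it, not something quoted from the PR. The real macros files are organized per compiler and may not branch like this.

```cmake
# Sketch only: COMPILER is a hypothetical variable naming the active compiler.
if (NOT DEFINED COMPILER)
  set(COMPILER "intel")
endif()

if (COMPILER STREQUAL "intel")
  set(NO_FMA_FLAG "-no-fma")   # flag shown in this PR's diff
elseif (COMPILER STREQUAL "nvhpc")
  set(NO_FMA_FLAG "-Mnofma")   # assumed NVHPC equivalent, not quoted in the thread
else()
  set(NO_FMA_FLAG "")          # other compilers are left untouched
endif()

# Apply the flag to both C and Fortran flag strings (FFLAGS is an assumption).
foreach(flagvar CFLAGS FFLAGS)
  if (NOT NO_FMA_FLAG STREQUAL "")
    string(APPEND ${flagvar} " ${NO_FMA_FLAG}")
  endif()
endforeach()

message(STATUS "CFLAGS =${CFLAGS}")
message(STATUS "FFLAGS =${FFLAGS}")
```

Anyone who needs FMA on a particular machine could still override the flags in that machine's own file, as noted earlier in the conversation.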