improve QuantumESPRESSO easyblock by cleaning up and extending configure step + running test suite #3241
Conversation
30dd9fc to 4727bef
Co-authored-by: ocaisa <alan.ocais@cecam.org>
…uild-easyblocks into feature-improve_qe_eblock
@ocaisa Just checked those failures in the bot run. Apparently QE implemented the NPROCS for the test_suite in 7.2, not 7.0. Before that it only ran with
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 3 out of 7 (7 easyconfigs in total)
@boegelbot please test @ generoso
@ocaisa: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1986953955 processed Message to humans: this is just bookkeeping information for me,
@boegelbot please test @ jsc-zen3
@ocaisa: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1988182263 processed Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 7 out of 15 (7 easyconfigs in total)
@Crivella I've got passing tests almost everywhere now (here and in easybuilders/easybuild-easyconfigs#20070), apart from an issue with the software stack for version 7.0 (which has nothing to do with this PR). I'll do a final review now and then merge.
```python
        'ph_ahc_diam',        # Test detects a ! as an energy in baseline
        'tddfpt_magnons_fe',  # Too strict thresholds
        ], "List of test suite targets that are allowed to fail (name can partially match)", CUSTOM],
    'test_suite_threshold': [0.97, "Threshold for test suite success rate", CUSTOM],
```
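To make the semantics of the `test_suite_threshold` parameter concrete, here is a minimal sketch (not the easyblock's actual code; the function name is hypothetical) of how a success-rate threshold like the default 0.97 would be applied to the test-suite results:

```python
# Illustrative sketch: applying a success-rate threshold such as
# 'test_suite_threshold' to the outcome of a test suite run.
def check_threshold(passed, total, threshold=0.97):
    """Return True if the fraction of passing tests meets the threshold."""
    if total == 0:
        return False
    return passed / total >= threshold

# e.g. 194 of 200 tests passing is exactly a 0.97 success rate
print(check_threshold(194, 200))  # True
print(check_threshold(190, 200))  # False (0.95 < 0.97)
```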
I'm still not that comfortable with giving a % threshold here, I'd prefer to give a specific number (with the default being zero). Then we explicitly number the failures in the easyconfig, expecting it to be version (and perhaps toolchain) specific. PyTorch uses:

```python
'max_failed_tests': [0, "Maximum number of failing tests", CUSTOM],
```
With all the testing I've done, I have the exact numbers for almost everything already, I just need to dig them out. We can then add them to the easyconfigs and do a final rerun of the builds. If we do a rerun of the builds, do you think we should add `fast-math` as well to see how we do?
With the flakiness of some tests failing just because the absolute/relative errors are slightly higher than the thresholds set in the baselines, I think this could be tricky.
What I am mostly worried about is that if that number is not carefully curated we might miss some segfaults that could arise.
This is why I previously added the expectedly flaky tests to test_suite_allow_failures
and raised an error (removed in commit f182aea) if a test not in that list failed (most likely it would fail because the calculation does not actually finish)
> With all the testing I've done, I have the exact numbers for almost everything already, I just need to dig them out. We can then add them to the easyconfigs and do a final rerun of the builds. If we do a rerun of the builds, do you think we should add `fast-math` as well to see how we do?
Sure, we could try. Just last night I finished testing an easyconfig for 7.3 (will open a PR for it soon) with the option. I think it should work for all versions just by adding

```python
'extra_cflags': '-ffast-math',
'extra_fflags': '-ffast-math',
'extra_fcflags': '-ffast-math',
'extra_f90flags': '-ffast-math',
```

to the toolchain options
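For reference, in an easyconfig these options would live in the `toolchainopts` dictionary. A minimal sketch (the surrounding easyconfig is omitted, and any other keys are up to the recipe):

```python
# Hypothetical easyconfig fragment: passing fast-math to all the
# compiler front-ends via the toolchain options discussed above.
toolchainopts = {
    'extra_cflags': '-ffast-math',
    'extra_fflags': '-ffast-math',
    'extra_fcflags': '-ffast-math',
    'extra_f90flags': '-ffast-math',
}
```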
> With the flakiness of some tests failing just because the absolute/relative errors are slightly higher than the thresholds set in the baselines, I think this could be tricky. What I am mostly worried about is that if that number is not carefully curated we might miss some segfaults that could arise. This is why I previously added the expectedly flaky tests to test_suite_allow_failures and raised an error (removed in commit f182aea) if a test not in that list failed (most likely it would fail because the calculation does not actually finish)
But I guess we can use the failures
array to check the number of failures without the ignored ones.
I would still leave the threshold on the total number (without the ignored ones) though, as the ignored tests, including relax,
could actually be excluding a non-trivial number of test failures
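The filtering described above could look roughly like this (a sketch, not the easyblock's code; it assumes `failures` is a list of failed test names and that allow-list entries match by substring, as the "name can partially match" help string suggests):

```python
# Sketch: separate unexpected failures from those covered by the
# allow-list, using substring (partial-name) matching.
def unexpected_failures(failures, allowed):
    """Return failures not matched by any allow-list pattern."""
    return [name for name in failures
            if not any(pattern in name for pattern in allowed)]

failures = ['ph_ahc_diam--ph', 'pw_scf--segfault', 'tddfpt_magnons_fe--tddfpt']
allowed = ['ph_ahc_diam', 'tddfpt_magnons_fe']
print(unexpected_failures(failures, allowed))  # ['pw_scf--segfault']
```

The count of unexpected failures could then be compared against a `max_failed_tests`-style limit instead of a percentage.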
Keep in mind that
@boegelbot please test @ jsc-zen3
@ocaisa: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1999632000 processed Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 3 out of 6 (6 easyconfigs in total)
@boegelbot please test @ jsc-zen3
@ocaisa: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... - notification for comment with ID 2005027818 processed Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 3 out of 3 (3 easyconfigs in total)
This has been extensively tested here and via easybuilders/easybuild-easyconfigs#20070. Thanks for all the effort @Crivella!
This PR, opened in relation to issue #3234, changes the behavior of the QuantumESPRESSO easyblock in order to make sure the correct compilation flags are used for the majority of versions from 5.x to 7.x.
Also, the inclusion of flags has been modularized in order to be easier to manage.
The new easyblock has been tested by compiling several QE versions, starting from the latest available `intel` and `foss` recipes and using the `--try-software-version` flag to install different software versions. When a valid version of the `HDF5`, `LibXC` or `ELPA` libraries was not readily available, it was manually disabled from the original config file.

In detail:

NOTE: Due to a problem with compiling and running QE with the `intel` toolchain + `openmp`, `openmp` was disabled when using the `intel2022b` toolchain.

Below are the results of a small ReFrame test (see PR) used to check that all codes are able to reach the end without any errors or segfaults.
(The calculation size was scaled down only to check that all codes reach the `JOB DONE` line; considering how small this calculation was, the timings themselves are not very significant.)
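The sanity check described above can be sketched as follows (an illustrative snippet, not the actual ReFrame test from the PR; QE prints a `JOB DONE` line at the end of a successful run):

```python
# Sketch of the success criterion: a QE run is considered complete
# only if its output contains the final 'JOB DONE' marker.
def job_completed(output_text):
    """Return True if a QE output log contains the 'JOB DONE' line."""
    return any('JOB DONE' in line for line in output_text.splitlines())

good = "     Parallel version (MPI)\n     ...\n     JOB DONE.\n"
bad = "     Parallel version (MPI)\n %%% Error in routine davcio\n"
print(job_completed(good))  # True
print(job_completed(bad))   # False
```

Checking for this marker catches crashed or hung runs even when timings are not meaningful, which is exactly why the calculation size could be scaled down.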