Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce disk I/O operation in cmake #2808

Merged
merged 5 commits into from
Dec 16, 2020
Merged

Conversation

ye-luo
Copy link
Contributor

@ye-luo ye-luo commented Dec 9, 2020

Proposed changes

When generating a test in build folder, all the files in a source test folder are symlinked to the destination.
This is largely unnecessary. Each test only needs a single input xml file. So this PR does selective copy/symlink.
It also keep the test folder much cleaner.

On my workstation with an SSD, cmake time from scratch reduces from 19 sec to 14 sec.

What type(s) of changes does this code introduce?

  • Build related changes

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

Bora, Ryzen-box

Checklist

  • Yes. This PR is up to date with current the current state of 'develop'

@prckent
Copy link
Contributor

prckent commented Dec 9, 2020

A cursory reading suggests this adds an unstated set of naming requirements to efficiently setup tests. This complicates the setting up of new tests, and I am also not sure what this optimization means for our more complex tests. Generally I am only in favor of improving the ease of setting up tests. 1/3 speedup is a nice improvement, but not revolutionary and the complexity is now higher. Is the tradeoff going to be worth it?

@ye-luo
Copy link
Contributor Author

ye-luo commented Dec 9, 2020

A cursory reading suggests this adds an unstated set of naming requirements to efficiently setup tests. This complicates the setting up of new tests, and I am also not sure what this optimization means for our more complex tests. Generally I am only in favor of improving the ease of setting up tests. 1/3 speedup is a nice improvement, but not revolutionary and the complexity is now higher. Is the tradeoff going to be worth it?

Currently, when I see a an xml file, I really need to spend seconds to figure out if a file is a control file or wavefunction file. In addition, each testing folder has all the input files of 20 testing cases. This is the what I care most.
before the change:

build_gnu_real/tests/solids/diamondC_2x1x1_pp/deterministic-diamondC_2x1x1_pp-vmc_sdj-1-1$ ls
C.BFD.xml                                       pwscf.pwscf.h5                   qmc_short_sdbatch_vmcbatch_4b.in.xml
CMakeLists.txt                                  qmc_long_excited.in.xml          qmc_short_vmcbatch_4b_asynctwf.in.xml
det_qmc_short_delayedupdate.in.xml              qmc_long.in.xml                  qmc_short_vmcbatch_4b.in.xml
det_qmc_short_excited.in.xml                    qmc_long_vmc_dmc_excited.in.xml  qmc_short_vmcbatch_dmcbatch_4r.in.xml
det_qmc_short.in.xml                            qmc_long_vmc_dmc.in.xml          qmc_short_vmcbatch_dmcbatch.in.xml
det_qmc_short_sdbatch_vmcbatch_mwalkers.in.xml  qmc_long_vmc_dmc_reconf.in.xml   qmc_short_vmcbatch.in.xml
det_qmc_short_vmcbatch.in.xml                   qmc-ref                          qmc_short_vmc_dmc_4r.in.xml
det_qmc_short_vmcbatch_mwalkers.in.xml          qmc_short_delayedupdate.in.xml   qmc_short_vmc_dmc_excited.in.xml
det_qmc_short_vmc_dmc_excited.in.xml            qmc_short_excited.in.xml         qmc_short_vmc_dmc.in.xml
det_qmc_short_vmc_dmc.in.xml                    qmc_short_hybridrep.in.xml       qmc_short_vmc_dmc_reconf.in.xml
dft-inputs                                      qmc_short.in.xml

Most people cannot figure out easily which in.xml was used for running the test. After this PR.

build_gnu_real_fast/tests/solids/diamondC_2x1x1_pp/deterministic-diamondC_2x1x1_pp-vmc_sdj-1-1$ ls
C.BFD.xml  det_qmc_short.in.xml  pwscf.pwscf.h5  qmc-ref

I think we do have enough flexibility, qmc-ref softlink is always added.

I would say it motivates mandating more strict naming rules for files. Most unexpected file names are from old tests. File names from converters are standardized. Now making a new test requires naming and placing files more carefully, I see it positive push to keep testing folder more organized. For example, Ni.opt_L2.xml it can be a NLPP file or it also looks like a calculation input XML to me.

The speed-up is more of a by-product. If I check timing on invoking on pre-existing build folders, the timing is 10 sec vs 16 second. Most time are spent in the python script adding test labels.

@prckent
Copy link
Contributor

prckent commented Dec 9, 2020

I think it would be helpful to have a known current/accurate (but short!) write up of test requirements and how to add new tests in the developer section of the manual. Then we could all discuss from a point of common knowledge.

I agree on the test labels, although they are definitely another topic. A different strategy for the test setup and python interpreter invocations and file checks could lead to large speedups.

@prckent
Copy link
Contributor

prckent commented Dec 9, 2020

If there are filename requirements, they should be checked for where possible and error trapped. e.g. No XML file -> hard abort with error.

@ye-luo
Copy link
Contributor Author

ye-luo commented Dec 9, 2020

If there are filename requirements, they should be checked for where possible and error trapped. e.g. No XML file -> hard abort with error.

Developers need to agree naming rules and then add checks against them by QMCPACK.

@ye-luo
Copy link
Contributor Author

ye-luo commented Dec 16, 2020

@prckent I added "Integration Tests" in the manual with some high level summary and all the contents of https://github.com/QMCPACK/qmcpack/wiki/How-to-add-tests Let me know if more things are needed to complete this PR.

@prckent
Copy link
Contributor

prckent commented Dec 16, 2020

Confirming that I can build the docs locally with my combination of sphinx tools. I get the usual errors about duplicate citations and a note about the bullet lists in methods.rst, also not new.

  • The documentation problems appear to be an issue with a major new version of sphinxcontrib-bibtex that is not backwards compatible.
  • Per Example for using v2.X with Read the Docs mcmtroffaes/sphinxcontrib-bibtex#214, there is not yet a convenient way of using the 2.0 sphinxcontrib-bibtex version with readthedocs. e.g. We would have to cache a build artifact, which is not practical for us.
  • One workaround is to force use of <2.0. This will have to be addressed separately, so for now we will have to test doc updates by hand. Since lots of projects use these combinations of dependencies, hopefully there will be a simple path forward that also allows use of the 2.x version and its improved features.

Copy link
Contributor

@prckent prckent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ye!

I expanded the documentation a little. Once this is in the wiki page can be deleted.

@ye-luo ye-luo merged commit 6e593f1 into QMCPACK:develop Dec 16, 2020
@ye-luo ye-luo deleted the fast-cmake branch December 16, 2020 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants