Nexus: support supercell twists in PySCF workflows#5073
Conversation
|
Ready to proceed. |
|
could you suggest a reviewer? |
|
Yes, already flagged @anbenali for this. |
Oh thanks. I missed it. |
prckent
left a comment
There was a problem hiding this comment.
Noting that I am currently running the examples locally. These appear to be running OK. PySCF convergence tolerance is high so they take some minutes.
ntest_nexus_pyscf_input failed on me, but I have yet to investigate further
1452: Test name : pyscf_input
1452: Test sublabel : test_write
1452: Test exception: "AssertionError: "
1452: Test backtrace:
1452: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 478, in run
1452: self.operation()
1452: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 1210, in pyscf_input
1452: nunit('write')
1452: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 349, in nunit
1452: run_external_unit_test(test_name,unit_test)
1452: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 388, in run_external_unit_test
1452: unit_test()
1452: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/tests/unit/test_pyscf_input.py", line 577, in test_write
1452: assert(text_eq(text,ref_text))
1452: ^^^^^^^^^^^^^^^^^^^^^^
1452:
1452: Test status: fail
Hopefully unrelated, but note that my installation of graphviz can't produce PNGs, so this trips up a few tests all the time.
|
c4q does not complete successfully for me in both example cases. Additionally, Nexus doesn't notice the error and keeps checking status, this time overnight. (I have seen this on other occasions, so my guess is that there is some changed handling or error messages or signals that are not being caught by the workstation/"wsNN" infrastructure, perhaps with openmpi runs). These were run on nitrogen2 with the nightly test configuration for gcc "new"+openmpi. i.e. reasonably new versions of all software including python (3.11.9) installed via spack. Note the broken scf.h5 link. PySCF is 2.5.0 in this case. Happy to poke further -- this could well be completely unrelated to Nexus and something to do with the converter or a PySCF version dependency etc. c4q.err |
|
For the nexus test failure, please add two print statements after this line to investigate: For the examples added with this PR, These features have bug-fixes needed at the QMCPACK/QMCPACK-converter levels (I've made Anouar aware already). This PR implements the Nexus-side features needed to drive these workflows, but does not guarantee that QMCPACK and its converters function properly. |
|
Thanks Jaron -- the situation is clear to me now. @anbenali How far off are the updates to savetoqmcpack? I put PySCF 2.6.2 in spack so can easily test the latest version. In would be nice to have patch so that Nexus support can be tested, but perhaps there are puzzles to solve? |
|
At least one difference is that one entry in newly generated sp_kpoints array is not quite zero: Selected output from via Note that this is failing in GitHub actions CI as well -- the centos builds show the same issue. Presumably either something is not zeroed out or rounding is slightly different and a numerical tolerance is needed. |
| [0, 1]]) | ||
| for n,kp in enumerate(sp_kpoints): | ||
| savetoqmcpack(cell,mf,'scf.twistnum_{{}}'.format(str(n).zfill(3)),kmesh=tiling,kpts=kpts[sp_kmap[n]],sp_twist=kp) | ||
| #end for |
There was a problem hiding this comment.
In order for this to work with kpoints in a ncsf style, we need to also pass the sp_kmap. The reason is that when running N kpts for example, mf method has N mo_energies and N mo_coeffs. but the tiling will only set a subset of kpts. we need then to use the index of the kptks provided by the sp_kmap.
savetoqmcpack(cell,mf,'scf.twistnum_{{}}'.format(str(n).zfill(3)),kmesh=tiling,kpts=kpts[sp_kmap[n]],sp_twist=kp,kmap=sp_kmap[n])
|
Everything is working on my end. I fixed the files that did not work. Please check if I forgot something. |
|
Test this please |
|
^^^ To get some more variety in the test configurations run. Hopefully only the floating point rounding/tolerance issue needs to be tackled. Thanks Anouar. Great to have this path supported. |
There was a problem hiding this comment.
(Noting for ease of seeing the status on the PR summary)
The k-points tolerance issue should be addressed - besides CentOS failures, presumably this is what is causing the failures on sulfur (Intel RHEL9), since the same test is failing.
Definitely not needed for this PR, but from the point of view of reading CI failures, it would be very helpful to replace all instances of assert(text_eq(a,b)) with a custom function that gave actionable info by printing the failure and the a,b before aborting.
|
While driving. I realized that the QMC input requires explicitly the twist
coordinate to compute the phase to apply to the orbital. In the current
qmc input from nexus we get a twist number. I am not sure if that is
accessed in the H5 file or of of it os even stored. I can look at it in the
afternoon.
…On Thu, Jul 11, 2024, 8:21 AM Paul R. C. Kent ***@***.***> wrote:
***@***.**** requested changes on this pull request.
(Noting for ease of seeing the status on the PR summary)
The k-points tolerance issue should be addressed - besides CentOS
failures, presumably this is what is causing the failures on sulfur
(RHEL9), since the same test is failing.
Definitely not needed for this PR, but from the point of view of reading
CI failures, it would be very helpful to replace all instances of
assert(text_eq(a,b)) with a custom function that gave actionable info by
printing the failure and the a,b before aborting.
—
Reply to this email directly, view it on GitHub
<#5073 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF4XVQ2HFRLRQRMTTDU4BSLZL2BHLAVCNFSM6AAAAABKGISPDWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCNZRHA4DOMRYGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
@anbenali There is an indentation bug (or more) in PyscfToQmcpack.py. I got the following running the diamond prim example: |
|
Interestingly what happens in that example is that on my system the pyscf output step fails as above, then the converter step runs and also fails, since the HDF5 files is not present. Somehow though, Nexus does catch the failures and the converter step runs forever. Maybe something in the workstation class (?) |
|
@jtkrogel: there are still errors to be fixed for the kpts case. Then if you have 2 twists generated, we pass the following "title='scf.twistnum_{}'.format(str(n).zfill(3))' But then it sends also Obviously scf.h5 does not exist as well. After fixing these by hand to push the simulation, unfortunately it still fails at the vmc with: I Think it would be faster for you to track the error. |
|
After discussion with Anoaur, I've confirmed that Nexus is now fully functional in this PR. Anouar plans to push another fix to PyscfToQmcpack.py. I will fix the unit test failure issue (which doesn't reproduce on my laptop) in a week or so after this PR goes in and I am back from vacation. This will require some updates to the Nexus testing system itself, as requested by Paul. |
|
If anyone wants to look at the test failure in the interim, it's likely that this will fix the issue (adds a 1e-8 absolute tolerance in addition to the default 1e-6 relative tolerance): |
|
Didn't know about that tolerance option. Will try it. If updating the tolerance that way works, we can probably merge this PR. The default relative tolerance is likely unreliable with zero as one of the values. We can't merge something that is known to break a deterministic test (recommended installation test 'ctest -L deterministic') and the CI without putting in a bypass (expected failure). This in turn would forgive other errors, so while PRs don't have to have a complete feature (and probably shouldn't if a large feature), they do need to pass in CI somehow. |
|
I remember some discussion long time ago that twistnum is error-prone and explicit twist coordinates are preferred. Can nexus move away from twistnum? |
|
Test this please |
There was a problem hiding this comment.
With the tolerance change, CI passes, so we can merge this. There are still homework items to work on in later PRs, but these look to be tied to the converter not Nexus.
Summarizing the current status:
- ctest -L deterministic passes . Core ntest_nexus* tests are good.
- nexus/examples/qmcpack/rsqmc_pyscf/01_h2o_hf_qmc works (was untouched)
- nexus/examples/qmcpack/rsqmc_pyscf/02_diamond_hf_qmc/diamond_pp_hf_gamma.py seems OK. Got through multiple rounds of wavefunction optimization before I halted it.
- nexus/examples/qmcpack/rsqmc_pyscf/02_diamond_hf_qmc/diamond_pp_hf_twistavg_prim.py aborts at c4q step. Error is not caught but c4q.err has
ERROR HDF5 read failure in hdf_archive::read numAO(???) - nexus/examples/qmcpack/rsqmc_pyscf/02_diamond_hf_qmc/diamond_pp_hf_twistavg.py Completed the conversion, generating 3 scf.twistnum_00n.h5 files and successfully started VMC. Interrupted at this point.
I can provide exact version numbers of all libraries used should that be helpful.
Proposed changes
This PR adds support for arbitrary supercell twists/twist grids in workflows involving PySCF and QMCPACK.
This PR is now ready for review.
What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
Laptop, Improv at ALCF
Checklist
Path out of WIP