-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test TeraChem MPI interface #62
Conversation
4712bae
to
cd0f1af
Compare
cd0f1af
to
8cd4bd9
Compare
Codecov Report
@@ Coverage Diff @@
## master #62 +/- ##
==========================================
+ Coverage 63.87% 68.93% +5.05%
==========================================
Files 37 38 +1
Lines 5645 5676 +31
==========================================
+ Hits 3606 3913 +307
+ Misses 2039 1763 -276
|
This reverts commit fb5ee80.
4b0b644
to
9e250ec
Compare
9e250ec
to
abada1d
Compare
@@ -14,6 +14,9 @@ env: | |||
CURL_OPTS: -S --no-silent | |||
CODECOV_OPTIONS: -Z -X coveragepy -X xcode | |||
|
|||
# FFLAGS for building ABIN, applicable for most jobs | |||
ABIN_FFLAGS: -O0 -fopenmp -Wall --coverage -ffpe-trap=invalid,zero,overflow,denormal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only job where we don't use this is the optimized_build
Also check for the received count for each MPI_Recv call in TCServerMock. MPI fails automatically if it receives more then specified, but allows to receive less than the buffer capacity so we need to check that manually. I wonder what else should we be checking.
96bdb4b
to
cc4181f
Compare
2c6a972
to
811bf8c
Compare
There's a bug in hydra_nameserver where it crashes when multiple TC servers call MPI_Unpublish_name pmodels/mpich#5058 Hence, we're passing TC port names to ABIN via files.
49b2ede
to
ff87382
Compare
I'm getting segfaults from MPI_Comm_connect, maybe a bug in MPICH? On the other hand, we try to parallelize the TC initialization, sending number of atoms and atom types at the beginning.
da4a78d
to
bdbed14
Compare
@suchanj feel free to spend as much or as little time on this. :-) I tried to write up the general gist in the PR description, feel free to ask if anything is not clear. In the meantime I will have to resolve conflicts with |
Conflicts: codecov.yml src/abin.F90 src/arrays.F90 src/force_abin.F90 src/force_tera.F90 src/force_terash.F90 src/forces.F90 src/init.F90 src/io.F90 src/modules.F90 src/read_cmdline.F90 tests/test.sh
What a chonky PR! Description nicely sums it up, a great step forward. |
There's a remaining TODO related to REMD, but I'll fix that in a separate PR. |
While the Amber MPI interface that we use for classical MD expects coordinates in angstromgs, the FMS interface expects bohr, which I didn't notice when I was refactoring this code. I really need to write those tests for the TC-SH interface.
While the Amber MPI interface that we use for classical MD expects coordinates in angstroms, the FMS interface expects bohrs, which I didn't notice when I was refactoring this code. I really need to write those tests for the TC-SH interface.
The primary goal here was to write tests for the TeraChem MPI interface implemented in
src/force_tera.F90
. There's also a separate interface for Surface Hopping defined inforce_terash.F90
, but that will be handled in a follow-up PR.I also ended up doing a little refactor and separated some common functions from
force_tera.F90
totera_mpi_api.F90
. The advantage is that they can now be more easily reused inforce_terash.F90
andforce_tera.F90
andforce_terash.F90
are now independent. I also spend quite some time improving the error handling so the interface should now be more robust and easier to debug.The tricky thing here is of course the TC part. We don't want to run real TC as part of our tests so I essentially ended up extracting the TeraChem side of the API from TeraChem source code and we compile and run it as a fake TC server as part of the tests (see key files below). Moreover, the TC server is compiled together with the q-TIP4P/Fw force field so it returns real forces and energies (of course only for pure water). I reused the same code that is part of ABIN itself and that we use in other tests (
pot='mmwater'
). Thanks to this trick, I was able to generate the test reference files (*ref
) by simply running ABIN with qTIP4P force field, and that way I could verify that the fake TC server works correctly and returns correct energies and forces.There are several tests:
tests/TERAPI
: The most common use case, classical MD.tests/TERAPI-PIMD
: Mostly the same as above but with several PI beads. The only difference in terms of the API is that we pass a different scratch directory name for each bead so that each bead has it's separate scratch dir. (the fake TC server does not really populate the scratch directory. Instead, for testing purposes it creates a file with the same name as the scratch directory).tests/TERAPI-PIMD-PARALLEL
: Same as above, but we connect to several TC servers so that we can run different beads in parallel. This is controlled by thenteraservers
variable in the ABIN input.tests/TERAPI-FAILS
: This is in fact a collection of different tests verifying different failure modes and error handling. Thanks to these tests I was able to achieve almost 100% test coverage (see Codecov report).There are a lot of new files, but most of them are new test files. In any case, it's probably easier to check out this branch locally instead of browsing it in GitHub UI.
Key files:
src.force_tera.F90
andsrc/tera_mpi_api.F90
tests/tc_mpi_api.cpp
,tests/tc_mpi_api.h
- TeraChem side of the APItests/TERAPI*/tc_server.cpp
, the actual code for the server that uses the API and acts as TeraChem (executes in an MD loop).test.sh
file. Common BASH helper function were extracted totests/test_tc_server_utils.sh
TODO