Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test TeraChem MPI interface #62

Merged
merged 74 commits into from
Apr 28, 2021
Merged

Test TeraChem MPI interface #62

merged 74 commits into from
Apr 28, 2021

Conversation

danielhollas
Copy link
Contributor

@danielhollas danielhollas commented Feb 3, 2021

The primary goal here was to write tests for the TeraChem MPI interface implemented in src/force_tera.F90. There's also a separate interface for Surface Hopping defined in force_terash.F90, but that will be handled in a follow-up PR.

I also ended up doing a little refactor and separated some common functions from force_tera.F90 to tera_mpi_api.F90. The advantage is that they can now be more easily reused in force_terash.F90 and force_tera.F90 and force_terash.F90 are now independent. I also spend quite some time improving the error handling so the interface should now be more robust and easier to debug.

The tricky thing here is of course the TC part. We don't want to run real TC as part of our tests so I essentially ended up extracting the TeraChem side of the API from TeraChem source code and we compile and run it as a fake TC server as part of the tests (see key files below). Moreover, the TC server is compiled together with the q-TIP4P/Fw force field so it returns real forces and energies (of course only for pure water). I reused the same code that is part of ABIN itself and that we use in other tests (pot='mmwater'). Thanks to this trick, I was able to generate the test reference files (*ref) by simply running ABIN with qTIP4P force field, and that way I could verify that the fake TC server works correctly and returns correct energies and forces.

There are several tests:

  • tests/TERAPI: The most common use case, classical MD.
  • tests/TERAPI-PIMD: Mostly the same as above but with several PI beads. The only difference in terms of the API is that we pass a different scratch directory name for each bead so that each bead has it's separate scratch dir. (the fake TC server does not really populate the scratch directory. Instead, for testing purposes it creates a file with the same name as the scratch directory).
  • tests/TERAPI-PIMD-PARALLEL: Same as above, but we connect to several TC servers so that we can run different beads in parallel. This is controlled by the nteraservers variable in the ABIN input.
  • tests/TERAPI-FAILS : This is in fact a collection of different tests verifying different failure modes and error handling. Thanks to these tests I was able to achieve almost 100% test coverage (see Codecov report).

There are a lot of new files, but most of them are new test files. In any case, it's probably easier to check out this branch locally instead of browsing it in GitHub UI.

Key files:

  • src.force_tera.F90 and src/tera_mpi_api.F90
  • tests/tc_mpi_api.cpp, tests/tc_mpi_api.h - TeraChem side of the API
  • tests/TERAPI*/tc_server.cpp, the actual code for the server that uses the API and acts as TeraChem (executes in an MD loop).
  • each test is driven by it's own test.sh file. Common BASH helper function were extracted to tests/test_tc_server_utils.sh

TODO

  • Verify that everything works with actual TeraChem, both on Argon and Neon clusters.
  • Verify parallelization over multiple TC servers (with real TeraChem)
  • Figure out how to handle REMD in combination with TeraChem

@danielhollas danielhollas added the testing Any changes to Github Actions or testing scripts. label Feb 3, 2021
@danielhollas danielhollas self-assigned this Feb 3, 2021
@danielhollas danielhollas force-pushed the test-terapi branch 2 times, most recently from 4712bae to cd0f1af Compare February 3, 2021 20:39
@codecov
Copy link

codecov bot commented Feb 3, 2021

Codecov Report

Merging #62 (4131a6a) into master (c6a7c67) will increase coverage by 5.05%.
The diff coverage is 77.98%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #62      +/-   ##
==========================================
+ Coverage   63.87%   68.93%   +5.05%     
==========================================
  Files          37       38       +1     
  Lines        5645     5676      +31     
==========================================
+ Hits         3606     3913     +307     
+ Misses       2039     1763     -276     
Impacted Files Coverage Δ
src/force_abin.F90 64.06% <ø> (ø)
src/force_terash.F90 0.00% <0.00%> (ø)
src/init.F90 66.13% <88.88%> (+4.41%) ⬆️
src/force_tera.F90 95.52% <97.05%> (+95.52%) ⬆️
src/tera_mpi_api.F90 97.66% <97.66%> (ø)
src/abin.F90 83.06% <100.00%> (-0.14%) ⬇️
src/arrays.F90 69.09% <100.00%> (ø)
src/forces.F90 83.83% <100.00%> (+1.19%) ⬆️
src/io.F90 71.05% <100.00%> (+42.10%) ⬆️
src/modules.F90 56.46% <100.00%> (+1.15%) ⬆️
... and 7 more

@danielhollas danielhollas force-pushed the test-terapi branch 2 times, most recently from 4b0b644 to 9e250ec Compare February 4, 2021 04:58
@@ -14,6 +14,9 @@ env:
CURL_OPTS: -S --no-silent
CODECOV_OPTIONS: -Z -X coveragepy -X xcode

# FFLAGS for building ABIN, applicable for most jobs
ABIN_FFLAGS: -O0 -fopenmp -Wall --coverage -ffpe-trap=invalid,zero,overflow,denormal
Copy link
Contributor Author

@danielhollas danielhollas Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only job where we don't use this is the optimized_build

src/force_tera.F90 Outdated Show resolved Hide resolved
Also check for the received count for each MPI_Recv call
in TCServerMock. MPI fails automatically if it receives
more then specified, but allows to receive less than the
buffer capacity so we need to check that manually.
I wonder what else should we be checking.
There's a bug in hydra_nameserver where
it crashes when multiple TC servers call MPI_Unpublish_name
pmodels/mpich#5058

Hence, we're passing TC port names to ABIN via files.
@danielhollas danielhollas changed the title WIP: Test TC-MPI interface Test TeraChem MPI interface Feb 23, 2021
@danielhollas danielhollas marked this pull request as ready for review March 22, 2021 02:54
@danielhollas
Copy link
Contributor Author

danielhollas commented Mar 22, 2021

@suchanj feel free to spend as much or as little time on this. :-) I tried to write up the general gist in the PR description, feel free to ask if anything is not clear.

In the meantime I will have to resolve conflicts with master and do a bit of testing with real TeraChem on NEON.

danielhollas and others added 2 commits March 22, 2021 12:50
 Conflicts:
	codecov.yml
	src/abin.F90
	src/arrays.F90
	src/force_abin.F90
	src/force_tera.F90
	src/force_terash.F90
	src/forces.F90
	src/init.F90
	src/io.F90
	src/modules.F90
	src/read_cmdline.F90
	tests/test.sh
@suchanj
Copy link
Contributor

suchanj commented Mar 23, 2021

What a chonky PR! Description nicely sums it up, a great step forward.

@danielhollas danielhollas merged commit c86dc82 into master Apr 28, 2021
@danielhollas
Copy link
Contributor Author

There's a remaining TODO related to REMD, but I'll fix that in a separate PR.

@danielhollas danielhollas deleted the test-terapi branch May 30, 2021 18:34
danielhollas added a commit that referenced this pull request Feb 4, 2022
While the Amber MPI interface that we use for classical MD
expects coordinates in angstromgs, the FMS interface expects bohr,
which I didn't notice when I was refactoring this code.
I really need to write those tests for the TC-SH interface.
danielhollas added a commit that referenced this pull request Feb 10, 2022
While the Amber MPI interface that we use for classical MD
expects coordinates in angstroms, the FMS interface expects bohrs,
which I didn't notice when I was refactoring this code.
I really need to write those tests for the TC-SH interface.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing Any changes to Github Actions or testing scripts.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants