Re-enable REMD with TC-MPI interface #68
    write (*, *)
    if (iqmmm == 3 .or. pot == 'mm') write (*, nml=qmmm)
    write (*, *)
    if (inose >= 1) then
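Unrelated cleanup. In general, I am getting rid of one-line if statements because they don't play nicely with code coverage: a line-based coverage tool marks the whole line as executed whenever the condition is evaluated, even if the body never runs (as the Codecov warning below shows).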
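A minimal sketch of this cleanup, written as a self-contained Fortran module. The condition is copied from the diff above. Everything else is hypothetical and only there to make the example compile: the module, the subroutine `print_qmmm_input`, the `printed` flag, and the single namelist member `example_value`. This is not ABIN's actual code.

```fortran
module qmmm_print_demo
   implicit none
   ! Hypothetical stand-in for the real qmmm namelist members (illustration only)
   integer :: example_value = 10
   namelist /qmmm/ example_value
contains
   ! Prints the qmmm namelist when iqmmm == 3 or pot == 'mm'.
   ! The hypothetical 'printed' flag reports whether the WRITE ran.
   subroutine print_qmmm_input(iqmmm, pot, printed)
      integer, intent(in) :: iqmmm
      character(len=*), intent(in) :: pot
      logical, intent(out) :: printed

      printed = .false.
      ! Block IF instead of a one-line IF: the WRITE gets its own line,
      ! so line-based coverage shows whether the body actually executed,
      ! not just whether the condition was evaluated.
      if (iqmmm == 3 .or. pot == 'mm') then
         write (*, nml=qmmm)
         printed = .true.
      end if
   end subroutine print_qmmm_input
end module qmmm_print_demo
```

With the block form, a test that never takes the branch leaves the `write` line marked as missed instead of silently counting it as covered.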
    #ifdef USE_MPI
    call MPI_Barrier(MPI_COMM_WORLD, ierr)
    write (*, '(A,I0,A,I0)') 'MPI rank: ', my_rank, ' PID: ', GetPID()
    call MPI_Barrier(MPI_COMM_WORLD, ierr)
Just for nicer output.
    @@ -0,0 +1,37 @@
I took this file and the reference files from the REMD test.
Codecov Report
    @@            Coverage Diff             @@
    ##           master      #68      +/-   ##
    ==========================================
    + Coverage   68.93%   68.97%   +0.03%
    ==========================================
      Files          38       38
      Lines        5676     5689      +13
    ==========================================
    + Hits         3913     3924      +11
    - Misses       1763     1765       +2
Force-pushed from a98d266 to af80eba.
@suchanj, can you take a look? The only (small) functional change is in …
@suchanj, I am trying to test this on NEON, but with no luck. :-( It looks like some bad interaction between MPI and SGE, but this is the first time I've seen this. I've tried multiple MPICH versions, also with no luck. Have you encountered this before? I was only able to test on …
@danielhollas, I haven't seen this error before. It is also not very specific. Well, this functionality is not critical, so we might wait for the new cluster to be installed. I reported the SGE CUDA error on n26. (PS: I also remember having minor doubts about future r.terabin functionality based on previous commits; I just don't remember why.)
@suchanj, thanks. I'm going to merge this, but I will still need to figure out the NEON problem. The issue is that I am not able to run normal MD there either. It might also be caused by TeraChem being compiled with a different MPICH version; I am not sure.
This worked before but was removed in the previous commit. Here I add it back, together with a test.
The current approach is a bit hacky and requires the number of TC servers to be exactly equal to the number of replicas,
since each MPI process connects to its own TC server.
This is untenable for a larger number of replicas, so we might need to improve upon this in the future.
(Note that for technical reasons, nteraservers must be 1 in the ABIN input. Though maybe we should just ignore the setting and set it automatically in the code?)
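The constraint described above can be written down as a simple consistency check. This is a hypothetical sketch, not code from this PR: the module, the function `remd_tc_config_ok`, and the argument `n_tc_servers` (the number of TeraChem servers actually launched) are assumptions. Only `nreplica` and `nteraservers` correspond to settings mentioned in the text.

```fortran
module remd_tc_check
   implicit none
contains
   ! Hypothetical sanity check, not ABIN's actual code.
   ! In the current REMD + TC-MPI setup, every MPI process (one per replica)
   ! connects to exactly one TeraChem server of its own, so:
   !   * nteraservers in the ABIN input must be 1, and
   !   * the number of running TC servers must equal nreplica.
   logical function remd_tc_config_ok(nreplica, n_tc_servers, nteraservers) result(ok)
      integer, intent(in) :: nreplica, n_tc_servers, nteraservers
      ok = (nteraservers == 1) .and. (n_tc_servers == nreplica)
   end function remd_tc_config_ok
end module remd_tc_check
```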