Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning and broadcasting _replica_thermodynamic_states #562

Merged
merged 4 commits into from Mar 30, 2022

Conversation

ijpulidos
Copy link
Contributor

@ijpulidos ijpulidos commented Mar 29, 2022

Description

Fixes #449 by ensuring that _mix_replicas returns the _replica_thermodynamic_states. Also broadcasting it to the MPI context, according to mpiplus broadcasting

Todos

  • Implement feature / fix bug
  • Add tests
  • Update documentation as needed
  • Update changelogNotable points that this PR has either accomplished or will accomplish.

Status

  • Ready to go

Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this appears to have been the cause of the bug! Thanks for catching this.

At some point in the future, we may want to extend the mpiplus API to make run() more consistent---the inconsistency of

self._do_action_1()
self._result = self._do_action_2()

bugs me a bit.

If we could later figure out how to extend the API of mpiplus so we can just have

self._do_action_1()
self._do_action_2()
self._do_action_3()

that would be optimal.

Alternatively, we could refactor to do

self._result1 = self._do_action_1()
self._result2 = self._do_action_2()
self._result3 = self._do_action_3()

but we would want to make sure the methods consistently always return new arguments without side-effects or cleanly update and distribute state no matter which solution we choose. Perhaps we can open an issue for that?

Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be sure to update the revision history and mark this as an important bugfix?

We should cut a bugfix release after this.

@jchodera jchodera added this to the 0.21.3 milestone Mar 29, 2022
@jchodera
Copy link
Member

Alternatively, we could rework self._propagate_replicas() to assume that the self._replica_thermodynamic_states is only correct on node 0 and to pass the appropriate state index along with each replica index, but I imagine this is what was happening originally and the non-obvious pattern here caused the propagation and mixing implementations to drift apart from each other.

@zhang-ivy
Copy link
Contributor

Confirming that this PR fixes the mixing problem for 1) ala dipeptide in solvent t-repex and 2) apo barstar h-repex.
@ijpulidos : Have you double checked whether the existing mixing tests are now passing? I think they may not have been failing as expected before.. so we may want add ala dipeptide in solvent t-repex as a test to test for this problem in the future?

@mikemhenry
Copy link
Contributor

Looking at our schedule CI, it looks like there may be an issue with the windows openmm builds

@mikemhenry
Copy link
Contributor

Now that we have the changelog notes added, do we want to get this merged and a bug fix cut? This would also get our real time anaylsis logging on conda-forge as well.

@mikemhenry
Copy link
Contributor

@ijpulidos is this ready to merge? It looks good RE what we talked about at our meeting!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MPI bug when multiple GPUs are used per calculation
4 participants