Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix equilibration detection #610

Merged
merged 14 commits into from Aug 1, 2022
Merged

Fix equilibration detection #610

merged 14 commits into from Aug 1, 2022

Conversation

zhang-ivy
Copy link
Contributor

@zhang-ivy zhang-ivy commented Jul 27, 2022

Description

Changes:

  • Do not allow user to specify the statistical_inefficiency without n_equilibration_iterations
  • When n_equilibration_iterations is specified and statistical_inefficiency is not, make sure that the final n_equilibration_iterations includes the t0 selected from the time origin analysis (aka i_t[i_max] here)
  • Always discard the first time origin in _get_equilibration_data_per_sample()
  • Add API point for specifying max_subset when creating the a MultiStateSamplerAnalyzer

For more context on why these changes are necessary, see #609 (comment)

Fixes #609

Todos

  • Implement feature / fix bug
  • Add tests
  • Update documentation as needed
  • Update changelog to summarize changes in behavior, enhancements, and bugfixes implemented in this PR

Status

  • Ready to go

@zhang-ivy
Copy link
Contributor Author

zhang-ivy commented Jul 27, 2022

I've tested these changes locally by checking that:

  • An exception is raise when the user specifies the statistical_inefficiency without n_equilibration_iterations
  • When n_equilibration_iterations is specified and statistical_inefficiency is not, the final n_equilibration_iterations includes the t0 selected from the time origin analysis (aka i_t[i_max] here)
  • The arrays returned by _get_equilibration_data_per_sample() are of length max_subset - 1
  • The API point for specifying max_subset when creating the a MultiStateSamplerAnalyzer works and does change the max_subset in _get_equilibration_data_per_sample()
  • When using pymbar's detectEquilibration() to generate the same data as _get_equilibration_data() + get_equilibration_data_per_sample(), I get the same results

@zhang-ivy zhang-ivy added this to the 0.21.5 milestone Jul 27, 2022
docs/releasehistory.rst Outdated Show resolved Hide resolved
Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks!

docs/releasehistory.rst Outdated Show resolved Hide resolved
openmmtools/multistate/utils.py Outdated Show resolved Hide resolved
@zhang-ivy
Copy link
Contributor Author

@mikemhenry @ijpulidos : I just pushed changes to address Mike and John's feedback. Can you re-review and merge?

Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. We should try to come up with some sort of test that can help us validate the logic. I know some of these tests can be hard (it probably requires a trajectory if we want to test the MultiStateAnalyzer, for example) but maybe some others don't need a full trajectory. We had a conversation with @zhang-ivy on the possibility of sharing some of her notebooks and then, hopefully I can come up with some sort of test based on them.

@jchodera
Copy link
Member

We should try to come up with some sort of test that can help us validate the logic.

I think we have some tests that run the multistate sampler that run on the GPU. Perhaps we can just add an analysis step that checks to make sure that t0 is not 0 or 1?

@zhang-ivy
Copy link
Contributor Author

Note that the gpu tests have been disabled:

Screen Shot 2022-08-01 at 9 43 57 AM

@zhang-ivy
Copy link
Contributor Author

I think we have some tests that run the multistate sampler that run on the GPU. Perhaps we can just add an analysis step that checks to make sure that t0 is not 0 or 1?

I just pushed a commit that adds an analysis step checking to make sure n_equilibration_iterations is > 1 in TestHarmonicOscillatorsMultiStateSampler()!

@zhang-ivy
Copy link
Contributor Author

I just pushed changes that also check:

  • specifying only statistical_inefficiency (without n_equilibration_iterations) raises an exception
  • specifying only n_equilibration_iterations=10 (without statistical_inefficiency) returns n_equilibration_iterations > 10 (because the equilibration detection analysis will be run and added to the user specified value)
  • specifying both statistical_inefficiency and n_equilibration_iterations returns the user specified values

@zhang-ivy
Copy link
Contributor Author

@mikemhenry @ijpulidos : Can you review again and merge this?

@mikemhenry
Copy link
Contributor

mikemhenry commented Aug 1, 2022

Looking at it now!
RE GPU Tests: I've disable them until we figure out which ones we want to run, since right now we run all of them which is a waste of AWS money, but not if we think running all the tests are worth it.

@mikemhenry mikemhenry merged commit 7034e41 into main Aug 1, 2022
@mikemhenry mikemhenry deleted the fix-equilibration branch August 1, 2022 21:40
Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks for going through this. Just a minor non-blocking suggestion.

openmmtools/tests/test_sampling.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adapt _get_equilibration_data_per_sample() to discard the first 1% of samples by default
4 participants