Fix protein mutation repex consistency tests #1054
Conversation
I think we'll want to skip the charge changing test for now -- @ijpulidos do you know how to do this?
Looks good, just some suggested changes to skip the charge change test and a question about the change of the padding.
perses/tests/test_repex.py (outdated)

```diff
@@ -39,6 +37,7 @@ def test_RESTCapableHybridTopologyFactory_repex_neutral_mutation():
         "1",
         "2",
         mutant_name.upper(),
+        padding=1.7 * unit.nanometers,
```
Why is it that we need to increase the default padding here? I'm thinking that would make the test take more time to run.
I decreased it back down and left a TODO to increase it once we move to testing against openmm >= 7.8
Co-authored-by: Iván Pulido <ivanpulido@protonmail.com>
Looks good! We have to double check how long the tests will take on our GPU CI and that the charge changing test is not getting run.
Just wanted to say that, according to our latest GPU CI workflow, we are correctly testing just the neutral mutation. It is passing and taking ~5.5 hours in total, but the test itself is using ~4.5 hours.
This is odd. I was wondering why it took so long, so I ran the test on lilac on a GTX 1080 Ti, using … to print a timestamp for each step. It's been stuck at …

@zhang-ivy : Is it possible this is actually taking an hour? Any idea why?
Hm, I think this is a red herring. If I disable the endstates in the test, it goes silent after the previous line. Some of our code must be disabling or changing the logging, so I'm not getting any progress information even when requesting it via

```python
hss.setup(n_states=12, temperature=300 * unit.kelvin, t_max=300 * unit.kelvin,
          storage_file=reporter, minimisation_steps=0, endstates=True)
hss.energy_context_cache = cache.ContextCache(capacity=None, time_to_live=None, platform=platform)
hss.sampler_context_cache = cache.ContextCache(capacity=None, time_to_live=None, platform=platform)

# Run simulation
hss.extend(n_iterations)
```

Any ideas why, @ijpulidos or @mikemhenry?
@jchodera : Yes there is something weird going on with the logger. When I run the neutral mutations repex test, the per iteration repex logger results don't show for me either. I had to adapt the script to set up a logger (with level as DEBUG) to get the per iteration results.
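For reference, a minimal sketch of the kind of DEBUG-level logger setup described above (the `openmmtools.multistate` logger name is an assumption; any relevant logger could be configured the same way):

```python
import logging

# Configure a root handler at DEBUG so per-iteration repex output is not
# swallowed by a library that lowers or redirects logging.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)

# Explicitly raise the (assumed) multistate sampler logger to DEBUG as well,
# in case the library has already attached its own handler/level.
logging.getLogger("openmmtools.multistate").setLevel(logging.DEBUG)
```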
Note that the neutral mutation repex test runs both ALA->THR and THR->ALA for 3000 iterations each (with 50 steps per iteration). While the test takes ~4.5 hours total, this would mean that each leg takes ~2 hours, which seems reasonable to me? How long would you expect each leg to take?
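Sanity-checking those figures with simple arithmetic (the ~2 hours per leg is the approximation quoted above; the remainder of the 4.5 hours is assumed to be setup and the second leg):

```python
# Back-of-envelope per-iteration cost, using the figures quoted above.
n_iterations = 3000
hours_per_leg = 2.0  # approximate time per mutation leg
seconds_per_iteration = hours_per_leg * 3600 / n_iterations
print(f"~{seconds_per_iteration:.1f} s per iteration")  # ~2.4 s
```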
For small molecules in solvent, I thought I had been seeing ~30 min execution times before.

Instead of looking at the overall time (which is not very informative), I really want to see the breakdown of where time is being spent once we get past the first iteration.
@jchodera : I posted the logs (for the last 2 iterations) of a charge changing mutation (ALA->ARG) as well as a neutral mutation (THR->ALA) here -- you should be able to see the breakdown there.
Hmm, I don't think we've finished writing the tests for small molecule repex consistency yet, so I'm not sure where that would be from, but maybe @dominicrufa knows which data you're referring to?
I had been referring to timings of small molecules in solvent from my own previous experiments.

Two things worry me from the logs:

1. The energy matrix computation is way too long at over 3s. This should be less than 1s. We need to look into why this is happening. Are long range dispersion corrections slowing things down?
2. The replica propagation time is nearly 7s. This seems way too long for a small solvent box. Do you have this calculation staged on disk somewhere?
@jchodera : For the neutral (THR->ALA) mutation, which uses n_replicas=12, propagating replicas takes ~2 seconds and computing the energy matrix takes < 1 second. However, running 5000 iterations still takes ~3.5 hours, which you mentioned is much longer than expected. Is 2 seconds still much longer than you'd expect for the replica propagation time? If not, then I'm not sure what else would be making these so slow.

For the charge changing (ALA->ARG) mutation, which uses n_replicas=36, propagating replicas and computing the energy matrix take longer, as you mentioned above. I think the reason is that we are using so many more replicas.

Also, since the solvation procedure has changed in openmm, I just want to double check that I'm using the appropriate amount of solvent. I'm using padding = 1.7 nm. Here is solvated capped ALA: [image not shown]
@jchodera : I've run the experiments that we talked about yesterday to investigate what's causing the slow repex tests for the ALA->THR dipeptide. The propagation time is computed for running 50 steps (with a 4 fs timestep) 12 times (i.e., for 12 replicas), and V1, V2, V3 are defined as:

V1:

```python
mcmc.LangevinSplittingDynamicsMove(timestep=4.0 * unit.femtoseconds,
                                   collision_rate=1.0 / unit.picosecond,
                                   n_steps=50,
                                   reassign_velocities=True,
                                   n_restart_attempts=20,
                                   splitting="V R R R O R R R V",
                                   constraint_tolerance=1e-06)
```

V2 (aka version using LangevinSplittingDynamicsMove with simpler splitting and higher constraint tolerance):

```python
mcmc.LangevinSplittingDynamicsMove(timestep=4.0 * unit.femtoseconds,
                                   collision_rate=1.0 / unit.picosecond,
                                   n_steps=50,
                                   reassign_velocities=True,
                                   n_restart_attempts=20,
                                   splitting="V R O R V",
                                   constraint_tolerance=1e-05)
```

V3 (aka version using LangevinDynamicsMove):

```python
mcmc.LangevinDynamicsMove(timestep=4.0 * unit.femtoseconds,
                          collision_rate=1.0 / unit.picosecond,
                          n_steps=50,
                          reassign_velocities=True,
                          n_restart_attempts=20)
```

Note that LangevinSplittingDynamicsMove uses the openmmtools BAOAB integrator, whereas LangevinDynamicsMove uses the openmm LangevinIntegrator.

Results:
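As an aside, the cost difference between the V1 and V2 splitting strings can be made concrete by counting substeps (a rough sketch: each "R" is a position update with constraint projection, "V" a velocity update, and "O" the stochastic substep, so more substeps generally means more work per MD step):

```python
# Count substeps in an openmmtools-style Langevin splitting string.
def substep_counts(splitting):
    ops = splitting.split()
    return {op: ops.count(op) for op in ("V", "R", "O")}

v1 = substep_counts("V R R R O R R R V")  # {'V': 2, 'R': 6, 'O': 1}
v2 = substep_counts("V R O R V")          # {'V': 2, 'R': 2, 'O': 1}
```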
Takeaways:
Do you agree with my above takeaways?

Final thing to note: in the logs I posted before, I was seeing that the replica propagation time (when using V1) was taking closer to 2 seconds, whereas in the above table I report 0.75 s for the propagation time. This is because I've found that generating the htf in the same script as running repex causes the repex times to be >2x slower. I wonder if there is some thread contention that happens here.

@ijpulidos : Can you help me adapt the repex consistency tests (protein mutation and small molecule) given the above takeaways? And could you investigate why using the same script to generate the htf and run repex yields 2x slower repex iteration times, and whether there is a workaround we can use to prevent the slowdown?
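One possible workaround sketch for the same-script slowdown, assuming (unverified) that state left over from htf generation is what slows the parent process; the inline `print` stands in for a hypothetical htf-generation script:

```python
import subprocess
import sys

# Hypothetical workaround: run the expensive setup step in a child process
# so any thread pools or GPU context state it creates cannot linger into
# the process that runs repex afterwards.
result = subprocess.run(
    [sys.executable, "-c", "print('htf built')"],  # stand-in for the real setup
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # htf built
```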
I spoke with @dominicrufa about implementing a … In general, I think we want to set … The other slowdowns (generating the htf in the same script, as well as the difference between hybrid propagation and repex) are things we might want to look into as well later on.
@jchodera : A new …
We should use a BAOAB-based integrator, which means you cannot use … It may be easier just to modify … Fixing the velocities resume bug would enable us to disable velocity reassignment every iteration, which is likely slowing down sampling significantly.
Description
This PR introduces the following changes:

For speed up:
- Set `reassign_velocities` to False

For improved convergence:
- `n_iterations` = 3000 for the neutral mutations and 3000 for the charge changing mutations
- `n_replicas` = 36 for the charge changing test
- `minimisation_steps` = 0, which means that the minimization will run until the energy is below a tolerance (instead of running for 100 steps, which is the default in `HybridRepexSampler`)

Other:
- Change the solvent padding to be 1.7 nm (which is necessary for the nightly dev builds of openmm). Note this may make the tests slower if we are testing against OpenMM <= 7.7. EDIT: We are currently testing against openmm 7.7. We should not bump up the solvent padding until we switch the GPU tests to test against openmm >= 7.8.

Motivation and context
This PR changes some of the parameters in the protein mutation repex consistency tests to ensure convergence (DDG < 6 * dDDG) and speed up the tests.
Resolves #1044
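The consistency criterion above can be sketched as a simple check. This assumes the forward and reverse legs are expected to cancel, which is how such round-trip consistency tests are typically framed; all numbers below are made up for illustration, not real results:

```python
# Hypothetical round-trip consistency check: the forward (e.g. ALA->THR)
# and reverse (THR->ALA) free energy estimates should cancel to within
# 6 standard deviations of the combined uncertainty.
ddg_forward = 1.2   # kT, illustrative
ddg_reverse = -1.0  # kT, illustrative
dddg = 0.5          # combined uncertainty, illustrative

discrepancy = abs(ddg_forward + ddg_reverse)
consistent = discrepancy < 6 * dddg
print(consistent)  # True for these illustrative numbers
```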
How has this been tested?
Tested on lilac.
Change log