
Accelerate sampling #8

Open · pgrinaway opened this issue Apr 27, 2015 · 19 comments
@pgrinaway
Member

From my runs last night, it looks like this is very slow on the whole dataset. I'll profile and figure out what could be done (possibly parallelize the reweighting step?). Not sure what priority this should be, though.

@kyleabeauchamp
Collaborator

Maybe merge your current pull request and I can look from there

@pgrinaway
Member Author

FYI not blaming pymbar here--I think it's just that there are lots of molecules to reweight.

@jchodera
Member

jchodera commented Apr 27, 2015 via email

@pgrinaway
Member Author

I'm profiling now. Also, I agree that pymbar is not the slow part. I was just saying that the step where everything is reweighted is likely the slow step, since there are a lot of molecules.

@pgrinaway
Member Author

OK, so I did some profiling; here is the truncated output, sorted by cumulative time:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.069    0.069 1715.558 1715.558 parameterize-using-database.py:19(<module>)
     1548    0.146    0.000 1681.290    1.086 utils.py:557(hydration_energy)
     1548    6.280    0.004 1681.106    1.086 utils.py:425(compute_hydration_energy)
        1    0.000    0.000 1678.388 1678.388 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/MCMC.py:199(sample)
        1    0.000    0.000 1678.385 1678.385 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/Model.py:227(sample)
        1    0.008    0.008 1678.333 1678.333 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/MCMC.py:281(_loop)
14534/10447    0.035    0.000 1678.013    0.161 {method 'get' of 'pymc.LazyFunction.LazyFunction' objects}
5139/4575    0.016    0.000 1677.923    0.367 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py:465(get_value)
7774/5857    0.013    0.000 1674.708    0.286 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/Container.py:539(get_value)
7774/5857    0.024    0.000 1674.698    0.286 {method 'run' of 'pymc.Container_values.DCValue' objects}
     1500    0.017    0.000 1653.737    1.102 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/StepMethods.py:480(step)
     3100    0.015    0.000 1653.674    0.533 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/Node.py:25(logp_of_set)
     3000    0.005    0.000 1653.669    0.551 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/StepMethods.py:302(logp_plus_loglike)
     9395    0.019    0.000 1653.660    0.176 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py:904(get_logp)
   495886    2.747    0.000 1565.275    0.003 /Users/grinawap/anaconda/lib/python2.7/site-packages/simtk/openmm/openmm.py:4941(getState)
   495886    0.474    0.000 1561.037    0.003 /Users/grinawap/anaconda/lib/python2.7/site-packages/simtk/openmm/openmm.py:4934(_getStateAsLists)
   495886 1558.519    0.003 1560.563    0.003 {_openmm.Context__getStateAsLists}
     3105    0.078    0.000   46.974    0.015 /Users/grinawap/anaconda/lib/python2.7/site-packages/simtk/openmm/openmm.py:5060(__init__)
     3105   46.764    0.015   46.764    0.015 {_openmm.new_Context}
        1    0.000    0.000   27.775   27.775 utils.py:565(prepare_database)
        1    0.023    0.023   27.337   27.337 utils.py:179(generate_simulation_data)
      600    0.004    0.000   26.968    0.045 /Users/grinawap/anaconda/lib/python2.7/site-packages/simtk/openmm/openmm.py:11896(step)
      600   26.956    0.045   26.956    0.045 {_openmm.LangevinIntegrator_step}

@pgrinaway
Member Author

Also, here was the command that I used:

python -m cProfile -o prof_out parameterize-using-database.py --types parameters/gbsa-amber-mbondi2.types \
 --parameters parameters/gbsa-amber-mbondi2.parameters \
--iterations 100 -o MCMC_100_4model.h5 \
 --database /Users/grinawap/freesolv/database.pickle \
 --mol2 /Users/grinawap/freesolv/tripos_mol2 --subset 3
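
For the record, a dump like prof_out can be inspected with the standard-library pstats module; this is just a minimal sketch of how to reproduce a table like the one above:

import pstats

# Load the cProfile dump written by "-o prof_out" and print the 20 most
# expensive entries, sorted by cumulative time (same ordering as the table above).
stats = pstats.Stats("prof_out")
stats.sort_stats("cumulative").print_stats(20)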

@kyleabeauchamp
Collaborator

As @pgrinaway noted, the time spent in getState suggests that energy calculations are the rate-limiting step here.

@pgrinaway
Member Author

Ok, so I profiled again using Instruments in Xcode to take a closer look at what is going on. As suspected, the biggest consumer of instructions is calcForcesAndEnergy at ~1.7 trillion instructions. 1.6T of those are the result of ReferenceCalcCustomGBForceKernel, so I'm not sure whether switching to the CPU platform would have an effect. However, the CPU platform CustomForces do JIT their code, right? Perhaps there would be a gain.

Alternatively:

  1. Use OpenCL on CPU (see the sketch after this list)
  2. Create lots of contexts on the GPU and calculate simultaneously
  3. Stream conformations in more wisely as @kyleabeauchamp suggested.
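
A minimal sketch of what option 1 (or just the CPU platform) could look like; the System and positions are assumed to come from the existing setup in utils.py, and which OpenCL platform index corresponds to a CPU driver is machine-dependent:

from simtk import openmm, unit

def single_point_energy(system, positions, platform_name='CPU', properties=None):
    # Sketch only: evaluate one energy on the requested OpenMM platform.
    # The CPU and OpenCL platforms JIT-compile CustomForce expressions at
    # Context creation, so the Context should be reused across evaluations.
    integrator = openmm.VerletIntegrator(1.0 * unit.femtoseconds)
    platform = openmm.Platform.getPlatformByName(platform_name)
    if properties:
        context = openmm.Context(system, integrator, platform, properties)
    else:
        context = openmm.Context(system, integrator, platform)
    context.setPositions(positions)
    return context.getState(getEnergy=True).getPotentialEnergy()

# e.g. option 1, OpenCL running on the CPU (platform index 0 is just a guess here):
# single_point_energy(system, positions, 'OpenCL', {'OpenCLPlatformIndex': '0'})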

@kyleabeauchamp
Collaborator

So if the energy calculation is rate limiting, it's possible that we could do something like this:

  1. Save MDTraj trajectory object in memory
  2. Send MDTraj frames to sander via Python API

There is an example of this in pytraj:

http://nbviewer.ipython.org/github/pytraj/pytraj/blob/master/note-books/post_processing_energies_pytraj_pysander.ipynb
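
Roughly what that might look like (API names follow the linked pysander notebook from memory, so treat this as a sketch; file names and the igb choice are placeholders):

import mdtraj as md
import sander   # pysander, ships with AmberTools

traj = md.load('molecule.nc', top='molecule.prmtop')   # placeholder file names
gb_options = sander.gas_input(5)                       # e.g. igb=5; pick the GB model of interest

# Set sander up once, then re-feed coordinates frame by frame
# (MDTraj stores nm, sander expects Angstroms).
sander.setup('molecule.prmtop', traj.xyz[0] * 10.0, None, gb_options)
energies = []
for frame in traj.xyz:
    sander.set_positions(frame * 10.0)
    energy, forces = sander.energy_forces()
    energies.append(energy.tot)
sander.cleanup()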

@jchodera
Member

We're not currently using Amber's sander here, but maybe we should be, since the true SASA model these GB models were parameterized against is implemented there.

It might be interesting to use this strategy to allow either the OpenMM or sander backends to be used for energy computation.

@jchodera
Member

I seem to recall that the CustomGBForce object is being created anew each time the energy is called here. This might be streamlined by simply resetting the particle parameters and then doing updateParametersInContext if this was limiting, but it sounds like the actual GB energy calculations are the slow part.

The OpenMM Reference platform implementation of GB is really, really slow, but for small molecules, it always seemed competitive with the CPU platform, possibly because my implementation was forcing the CPU platform to recompile the code each time? I haven't extensively benchmarked this in a while, though.
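
A rough sketch of that caching idea, assuming the per-particle parameters are (charge, radius, scalingFactor); the real layout has to match whatever the CustomGBForce in utils.py defines:

def update_gb_parameters(force, context, particle_parameters):
    # force must be the very same CustomGBForce object that sits inside the
    # System the cached Context was created from; otherwise OpenMM complains
    # that the Force is not part of that Context.
    for index, (charge, radius, scale) in enumerate(particle_parameters):
        force.setParticleParameters(index, [charge, radius, scale])
    force.updateParametersInContext(context)

# Re-evaluating the energy afterwards is then just:
# energy = context.getState(getEnergy=True).getPotentialEnergy()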

@pgrinaway
Member Author

New updates:

OpenMM::ThreadBody(void*) 53.2% CPU time //calculating energies & related tasks
OpenMM::CustomGBForceImpl::initialize(OpenMM::ContextImpl&) 31.1% CPU

It seems nearly 31% of the CPU's time was spent compiling the CustomGBForce kernels--I hadn't realized (though the reason is fairly obvious now that I think of it) that CustomForces are compiled at context initialization time. I'd imagine we could recover that 31% by modifying the PyMC code to use arrays, and then using a distributed computing framework to hold on to contexts and prevent recompilation. That scheme would also let us distribute the remaining expense.

@jchodera
Member

What about just caching the Context objects and doing a force.updateParametersInContext() call?

Actually, are you able to profile the Reference platform? That would be usefully informative too!

@pgrinaway
Member Author

What about just caching the Context objects and doing a force.updateParametersInContext() call?

Yeah. I had previously imagined there would be some roadblock to this (I had imagined refactoring the model to pass the hydration_energy functions an array of molecules, rather than one, and return an array, allowing us to use some distributed computing system). But it would be easy to just stick the context in the entry to test performance, so I'll give that a shot.

Actually, are you able to profile the Reference platform? That would be usefully informative too!

Yep! The stuff above from earlier is Reference, but I'll also run the new code with Reference and post the results. (Ultimately I'll see if Instruments will output a format that is nice for displaying here.)
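
For concreteness, the array-based interface mentioned above might look roughly like this (argument lists are illustrative; compute_hydration_energy is the existing per-molecule routine in utils.py):

import numpy as np
from multiprocessing import Pool

def _energy_task(args):
    # Each worker can hold on to its own cached Contexts between calls, so
    # kernels are compiled once per worker rather than once per energy call.
    molecule, parameters = args
    return compute_hydration_energy(molecule, parameters)

def hydration_energies(molecules, parameters, pool=None):
    # Batch version of hydration_energy: one call in, an array of energies out.
    if pool is None:
        return np.array([compute_hydration_energy(m, parameters) for m in molecules])
    return np.array(pool.map(_energy_task, [(m, parameters) for m in molecules]))

# e.g. pool = Pool(4); energies = hydration_energies(database_molecules, current_parameters, pool)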

@pgrinaway
Member Author

Ok, profiled the Reference platform. Relevant breakdown:

OpenMM::ContextImpl::calcForcesAndEnergy(...) 88.8% CPU time, ~1.143T instructions

OpenMM::ReferenceCalcCustomGBForceKernel::execute(...) 88.1%, ~1.141T of the above instructions

I'll try caching + CPU now.

@jchodera
Member

Note that I'm not sure if updateParametersInContext() triggers a recompile of the force kernel for the CPU platform.

@jchodera
Member

(I had imagined refactoring the model to pass the hydration_energy functions an array of molecules, rather than one, and return an array, allowing us to use some distributed computing system).

This may yet be the best idea. Let's chat about this tomorrow?

@pgrinaway
Member Author

Yeah, that sounds like a good plan. I think that is probably the best in the long run too, because it will let us distribute the energy computations. Trying to cache the Context object is resulting in weirdness where it complains that it doesn't have the Force in the relevant Context, so a discussion about the most efficient way forward might be best before going too deep into the rabbit hole.

@jchodera
Member

jchodera commented May 20, 2015 via email
