
Accelerate sampling #8

Open · pgrinaway opened this issue Apr 27, 2015 · 19 comments
@pgrinaway
Member

From my runs last night, it looks like this is very slow on the whole dataset. I'll profile and figure out what could be done (possibly parallelize the reweighting step?). Not sure what priority this should be, though.

@kyleabeauchamp
Collaborator

Maybe merge your current pull request and I can look from there

@pgrinaway
Member Author

FYI not blaming pymbar here--I think it's just that there are lots of molecules to reweight.

@jchodera
Member

jchodera commented Apr 27, 2015 via email

@pgrinaway
Member Author

I'm profiling now. Also, I agree that pymbar is not the slow part. I was just saying that the step where everything is reweighted is likely the slow step, since there are a lot of molecules.

@pgrinaway
Member Author

OK, so I did some profiling; here is the truncated output, sorted by cumulative time:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.069    0.069 1715.558 1715.558 parameterize-using-database.py:19(<module>)
     1548    0.146    0.000 1681.290    1.086 utils.py:557(hydration_energy)
     1548    6.280    0.004 1681.106    1.086 utils.py:425(compute_hydration_energy)
        1    0.000    0.000 1678.388 1678.388 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/MCMC.py:199(sample)
        1    0.000    0.000 1678.385 1678.385 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/Model.py:227(sample)
        1    0.008    0.008 1678.333 1678.333 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/MCMC.py:281(_loop)
14534/10447    0.035    0.000 1678.013    0.161 {method 'get' of 'pymc.LazyFunction.LazyFunction' objects}
5139/4575    0.016    0.000 1677.923    0.367 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py:465(get_value)
7774/5857    0.013    0.000 1674.708    0.286 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/Container.py:539(get_value)
7774/5857    0.024    0.000 1674.698    0.286 {method 'run' of 'pymc.Container_values.DCValue' objects}
     1500    0.017    0.000 1653.737    1.102 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/StepMethods.py:480(step)
     3100    0.015    0.000 1653.674    0.533 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/Node.py:25(logp_of_set)
     3000    0.005    0.000 1653.669    0.551 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/StepMethods.py:302(logp_plus_loglike)
     9395    0.019    0.000 1653.660    0.176 /Users/grinawap/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py:904(get_logp)
   495886    2.747    0.000 1565.275    0.003 /Users/grinawap/anaconda/lib/python2.7/site-packages/simtk/openmm/openmm.py:4941(getState)
   495886    0.474    0.000 1561.037    0.003 /Users/grinawap/anaconda/lib/python2.7/site-packages/simtk/openmm/openmm.py:4934(_getStateAsLists)
   495886 1558.519    0.003 1560.563    0.003 {_openmm.Context__getStateAsLists}
     3105    0.078    0.000   46.974    0.015 /Users/grinawap/anaconda/lib/python2.7/site-packages/simtk/openmm/openmm.py:5060(__init__)
     3105   46.764    0.015   46.764    0.015 {_openmm.new_Context}
        1    0.000    0.000   27.775   27.775 utils.py:565(prepare_database)
        1    0.023    0.023   27.337   27.337 utils.py:179(generate_simulation_data)
      600    0.004    0.000   26.968    0.045 /Users/grinawap/anaconda/lib/python2.7/site-packages/simtk/openmm/openmm.py:11896(step)
      600   26.956    0.045   26.956    0.045 {_openmm.LangevinIntegrator_step}

@pgrinaway
Member Author

Also, here was the command that I used:

python -m cProfile -o prof_out parameterize-using-database.py --types parameters/gbsa-amber-mbondi2.types \
 --parameters parameters/gbsa-amber-mbondi2.parameters \
--iterations 100 -o MCMC_100_4model.h5 \
 --database /Users/grinawap/freesolv/database.pickle \
 --mol2 /Users/grinawap/freesolv/tripos_mol2 --subset 3
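
For the record, a dump like prof_out can be inspected with the standard-library pstats module; this is just a minimal sketch of how to reproduce a table like the one above:

import pstats

# Load the cProfile dump written by "-o prof_out" and print the 20 most
# expensive entries, sorted by cumulative time (same ordering as the table above).
stats = pstats.Stats("prof_out")
stats.sort_stats("cumulative").print_stats(20)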

@kyleabeauchamp
Collaborator

As @pgrinaway noted, the time spent in getState suggests that energy calculations are the rate-limiting step here.

@pgrinaway
Member Author

Ok, so I profiled again using Instruments in Xcode to take a closer look at what is going on. As suspected, the biggest consumer of instructions is calcForcesAndEnergy at ~1.7 trillion instructions. 1.6T of those are the result of ReferenceCalcCustomGBForceKernel, so I'm not sure whether switching to the CPU platform would have an effect. However, the CPU platform CustomForces do JIT their code, right? Perhaps there would be a gain.

Alternatively:

  1. Use OpenCL on CPU (see the sketch after this list)
  2. Create lots of contexts on the GPU and calculate simultaneously
  3. Stream conformations in more wisely as @kyleabeauchamp suggested.
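
A minimal sketch of what option 1 (or just the CPU platform) could look like; the System and positions are assumed to come from the existing setup in utils.py, and which OpenCL platform index corresponds to a CPU driver is machine-dependent:

from simtk import openmm, unit

def single_point_energy(system, positions, platform_name='CPU', properties=None):
    # Sketch only: evaluate one energy on the requested OpenMM platform.
    # The CPU and OpenCL platforms JIT-compile CustomForce expressions at
    # Context creation, so the Context should be reused across evaluations.
    integrator = openmm.VerletIntegrator(1.0 * unit.femtoseconds)
    platform = openmm.Platform.getPlatformByName(platform_name)
    if properties:
        context = openmm.Context(system, integrator, platform, properties)
    else:
        context = openmm.Context(system, integrator, platform)
    context.setPositions(positions)
    return context.getState(getEnergy=True).getPotentialEnergy()

# e.g. option 1, OpenCL running on the CPU (platform index 0 is just a guess here):
# single_point_energy(system, positions, 'OpenCL', {'OpenCLPlatformIndex': '0'})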

@kyleabeauchamp
Collaborator

So if the energy calculation is rate limiting, it's possible that we could do something like this:

  1. Save MDTraj trajectory object in memory
  2. Send MDTraj frames to sander via Python API

There is an example of this in pytraj:

http://nbviewer.ipython.org/github/pytraj/pytraj/blob/master/note-books/post_processing_energies_pytraj_pysander.ipynb
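
Roughly what that might look like (API names follow the linked pysander notebook from memory, so treat this as a sketch; file names and the igb choice are placeholders):

import mdtraj as md
import sander   # pysander, ships with AmberTools

traj = md.load('molecule.nc', top='molecule.prmtop')   # placeholder file names
gb_options = sander.gas_input(5)                       # e.g. igb=5; pick the GB model of interest

# Set sander up once, then re-feed coordinates frame by frame
# (MDTraj stores nm, sander expects Angstroms).
sander.setup('molecule.prmtop', traj.xyz[0] * 10.0, None, gb_options)
energies = []
for frame in traj.xyz:
    sander.set_positions(frame * 10.0)
    energy, forces = sander.energy_forces()
    energies.append(energy.tot)
sander.cleanup()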

@jchodera
Member

We're not currently using Amber's sander here, but maybe we should be, since the true SASA model these GB models were parameterized against is implemented there.

It might be interesting to use this strategy to allow either the OpenMM or sander backends to be used for energy computation.

@jchodera
Member

I seem to recall that the CustomGBForce object is being created anew each time the energy is called here. This might be streamlined by simply resetting the particle parameters and then doing updateParametersInContext if this was limiting, but it sounds like the actual GB energy calculations are the slow part.

The OpenMM Reference platform implementation of GB is really, really slow, but for small molecules, it always seemed competitive with the CPU platform, possibly because my implementation was forcing the CPU platform to recompile the code each time? I haven't extensively benchmarked this in a while, though.
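
A rough sketch of that caching idea, assuming the per-particle parameters are (charge, radius, scalingFactor); the real layout has to match whatever the CustomGBForce in utils.py defines:

def update_gb_parameters(force, context, particle_parameters):
    # force must be the very same CustomGBForce object that sits inside the
    # System the cached Context was created from; otherwise OpenMM complains
    # that the Force is not part of that Context.
    for index, (charge, radius, scale) in enumerate(particle_parameters):
        force.setParticleParameters(index, [charge, radius, scale])
    force.updateParametersInContext(context)

# Re-evaluating the energy afterwards is then just:
# energy = context.getState(getEnergy=True).getPotentialEnergy()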

@pgrinaway
Member Author

New updates:

OpenMM::ThreadBody(void*) 53.2% CPU time //calculating energies & related tasks
OpenMM::CustomGBForceImpl::initialize(OpenMM::ContextImpl&) 31.1% CPU

It seems nearly 31% of the CPU's time was spent compiling the CustomGBForce kernels--I hadn't realized (though the reason is fairly obvious now that I think of it) that CustomForces are compiled at context initialization time. I'd imagine we could recover that 31% by modifying the PyMC code to use arrays, and then using a distributed computing framework to hold on to contexts and prevent recompilation. That scheme would also let us distribute the remaining expense.

@jchodera
Member

What about just caching the Context objects and doing a force.updateParametersInContext() call?

Actually, are you able to profile the Reference platform? That would be usefully informative too!

@pgrinaway
Member Author

What about just caching the Context objects and doing a force.updateParametersInContext() call?

Yeah. I had previously imagined there would be some roadblock to this (I had imagined refactoring the model to pass the hydration_energy functions an array of molecules, rather than one, and return an array, allowing us to use some distributed computing system). But it would be easy to just stick the context in the entry to test performance, so I'll give that a shot.

Actually, are you able to profile the Reference platform? That would be usefully informative too!

Yep! The stuff above from earlier is Reference, but I'll also run the new code with Reference and post the results. (Ultimately I'll see if Instruments will output a format that is nice for displaying here.)
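
For concreteness, the array-based interface mentioned above might look roughly like this (argument lists are illustrative; compute_hydration_energy is the existing per-molecule routine in utils.py):

import numpy as np
from multiprocessing import Pool

def _energy_task(args):
    # Each worker can hold on to its own cached Contexts between calls, so
    # kernels are compiled once per worker rather than once per energy call.
    molecule, parameters = args
    return compute_hydration_energy(molecule, parameters)

def hydration_energies(molecules, parameters, pool=None):
    # Batch version of hydration_energy: one call in, an array of energies out.
    if pool is None:
        return np.array([compute_hydration_energy(m, parameters) for m in molecules])
    return np.array(pool.map(_energy_task, [(m, parameters) for m in molecules]))

# e.g. pool = Pool(4); energies = hydration_energies(database_molecules, current_parameters, pool)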

@pgrinaway
Member Author

Ok, profiled the Reference platform. Relevant breakdown:

OpenMM::ContextImpl::calcForcesAndEnergy(...) 88.8% CPU time, ~1.143T instructions

OpenMM::ReferenceCalcCustomGBForceKernel::execute(...) 88.1%, ~1.141T of the above instructions

I'll try caching + CPU now.

@jchodera
Member

Note that I'm not sure if updateParametersInContext() triggers a recompile of the force kernel for the CPU platform.

@jchodera
Member

(I had imagined refactoring the model to pass the hydration_energy functions an array of molecules, rather than one, and return an array, allowing us to use some distributed computing system).

This may yet be the best idea. Let's chat about this tomorrow?

@pgrinaway
Member Author

Yeah, that sounds like a good plan. I think that is probably the best in the long run too, because it will let us distribute the energy computations. Trying to cache the Context object is resulting in weirdness where it complains that it doesn't have the Force in the relevant Context, so a discussion about the most efficient way forward might be best before going too deep into the rabbit hole.

@jchodera
Member

jchodera commented May 20, 2015 via email
