Parallel trajectory analysis (revisited) #618

Conversation

mattiafelice-palermo
Contributor

With reference to #617
MDAnalysis Discussion --> link
Branch --> link

Aim

Speed up trajectory analysis by spreading the computation over multiple threads. Slices of the trajectory are assigned to each thread, which then runs its own AnalysisBase instances.

Current syntax (may change)

In order to run a parallel analysis, just invoke the run() method with the optional arguments parallel=True and nthreads=<nthreads>.

import MDAnalysis as mda
from MDAnalysisTests.datafiles import DCD, PSF
import electromagnetism # some analysis module

universe = mda.Universe(PSF, DCD)
selection = universe.select_atoms('resname TIP3')

el = electromagnetism.TotalDipole(universe=universe, selection=selection)
el.run(parallel=True, nthreads=4)

To do

  • Implement a run(parallel=True) method in the AnalysisBase class so that single analysis jobs can be parallelized without using the ParallelProcessor class.
  • Print the progressbar to stderr so that stdout can be redirected to a file
  • Create a progressbar with percentage of completion, a configuration counter and an ETA estimate (a sketch follows this list).
  • Make the progressbar optional
  • Prepare tests
  • Add documentation to modules, classes and methods once the design of the class is stable.
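
A minimal sketch of such a progress bar (illustration only, not the branch's lib.log code): percentage of completion, configuration counter and ETA estimate, written to stderr so that stdout stays redirectable.

import sys
import time

def progressbar(i, n, t0, width=40):
    # fraction of configurations done and an elapsed-time based ETA
    frac = (i + 1) / float(n)
    eta = (time.time() - t0) * (1.0 - frac) / frac
    bar = '#' * int(frac * width)
    sys.stderr.write('\r[{0:<{w}}] {1:5.1f}%  cfg {2}/{3}  ETA {4:6.1f}s'.format(
        bar, 100.0 * frac, i + 1, n, eta, w=width))
    sys.stderr.flush()

t0 = time.time()
for i in range(294):        # e.g. one tick per analysed configuration
    time.sleep(0.01)        # stand-in for the per-frame analysis work
    progressbar(i, 294, t0)
sys.stderr.write('\n')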

Benchmarks

Benchmark performed on a system of N = 250k atoms (294 configurations, size ~1 GB).

[scaling benchmark figure]

Efficiency computed as the ratio between core speed (cfg s⁻¹ thread⁻¹) and the speed of a serial analysis.
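
Spelling that definition out (one reading; here v denotes throughput in configurations per second and n the number of threads):

E(n) = \frac{v_{\mathrm{parallel}}(n)\,/\,n}{v_{\mathrm{serial}}}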

Compatibility issues

Due to the implementation of the parallelization, AnalysisBase needs to take a Universe object as an argument.

Last update: Fri Jan 22 11:34:18 CET 2016

…ent test analysis (electromagnetism.py), added parallel_jobs.py module, added test for the parallel_jobs.py module
…s has its own universe, eliminating the problem of accessing to arrays that have been already deallocated. Adapted base.py, parallel_jobs.py and test_parallel_jobs.py according to PEP8 style.
@richardjgowers richardjgowers self-assigned this Jan 13, 2016
@@ -96,10 +92,14 @@ def run(self):
        self._prepare()
        for i, ts in enumerate(
                self._trajectory[self.start:self.stop:self.step]):
            self._ts = ts
Member

Is there a reason why this line has to go? Could we call _single_frame with ts as the argument?
ie.

for i, ts in enumerate(self._traj[etc]):
    self._single_frame(ts)

Contributor Author

I agree, calling _single_frame with ts as the argument looks much cleaner.

@richardjgowers
Member

This is looking good, and the efficiency is much higher than I was expecting.

@mattiafelice-palermo
Contributor Author

Yes! If you or anyone else has an analysis class that you think is easy to adapt (by adding the __iadd__ method as shown in #617), let me know if you get similar results.

@mattiafelice-palermo
Contributor Author

Hi everybody!

So, as @richardjgowers suggested, I modified the AnalysisBase class so that the run() method can be invoked with the optional arguments parallel=True and nthreads=<nthreads>. I tried benchmarking whether two analyses run with run(parallel=True) are slower than computing them in batch with ParallelProcessor, but did not find a relevant difference - even though this seems strange to me. When running analyses with one of our internal programs, reading DCDs usually takes a non-negligible amount of time. Are we 100% sure that when running the _single_frame() methods of two AnalysisBase objects within the same for ts in trajectory loop, the coordinates are not re-read? If so, then I would opt for removing the ParallelProcessor class and just leaving the AnalysisBase run(parallel=True) method.

Let me know what you think!

P.S.: I also added a basic progressbar in a utils folder within package/MDAnalysis/analysis; I hope that is fine.

@@ -0,0 +1,102 @@
""" Add docstring later
Member

This should all go into lib.log --- there's already ProgressMeter. ProgressBar would nicely complement it.

Contributor Author

Done, moved code to lib.log and also modified the bar with threading so that it supports working with serial jobs. Hopefully that's what you meant.

@kain88-de
Member

BTW did any of you have a look at joblib for this?

            self._conclude()

        else:
            if threads is None:
Member

I'd prefer it if all this code were contained inside _parallel_run; the run function can just end up as:

def run(self, parallel=False, nthreads=None):
    if not parallel:
        self._serial_run()
    else:
        self._parallel_run(nthreads)

So _serial_run just contains the "original" run code, and _parallel_run has your new code.

Contributor Author

Makes sense, it's much cleaner this way. I modified the code accordingly.

…ppropriare serial or parallel method. Adapted AnalysisBase and ParallelProcessor to run with new progressbar, which now works also for serial analyses
@mattiafelice-palermo
Contributor Author

@kain88-de seems like a very interesting lib, thanks for pointing it out! It could probably solve the problem of allocating/deallocating arrays with positions/masses etc. when using a single DCDReader object, which we discussed a while ago (?), even though the current implementation circumvents this by having each AnalysisBase object initialize and use its own DCDReader object.
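
For illustration, a minimal self-contained version of the joblib idea (not the branch's code; the interleaved-slice scheme and the helper function here are hypothetical):

import MDAnalysis as mda
from MDAnalysisTests.datafiles import DCD, PSF
from joblib import Parallel, delayed

def analyse_slice(offset, n_jobs):
    # every worker opens its own Universe, and therefore its own DCDReader
    u = mda.Universe(PSF, DCD)
    protein = u.select_atoms('protein')
    # interleaved frame slice: offset, offset + n_jobs, offset + 2*n_jobs, ...
    return [protein.center_of_mass() for ts in u.trajectory[offset::n_jobs]]

n_jobs = 4
partial_results = Parallel(n_jobs=n_jobs)(
    delayed(analyse_slice)(i, n_jobs) for i in range(n_jobs))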

@richardjgowers richardjgowers added this to the 0.14 milestone Jan 20, 2016
@richardjgowers
Member

Hey Matti,

This is looking good, I think the next step is to fix all the things we've broken:

https://travis-ci.org/MDAnalysis/mdanalysis/jobs/103233684

I think these are because you changed the signature of _setup_frames to want a Universe not a Reader.

You should be able to run those tests locally by going to <your repo>/testsuite/MDAnalysisTests/analysis and running nosetests

…ich now takes a universe and not a trajectory, and requires method _single_frame to take a timestep as argument
@mattiafelice-palermo
Contributor Author

Good day Richard,

I modified the modules that conflicted with the new signatures of the _setup_frames and _single_frame methods. I ran nosetests without any errors.

There is one big issue, though. I noticed that e.g. the rdf analysis takes as input two selections, g1 and g2, from which the Universe is accessed through g1.universe. I'm pretty sure this is not going to work with the parallel implementation. This is because when creating deep copies of the original analysis object, the self._universe/self._trajectory attributes are stripped away by the __getstate__ method. Then, each copied object is initialized again from the topology and coordinate filenames. If you run the analysis in parallel, the selections you passed as input are still copies of the original ones, which still reference the original universe, which is not the one being used in the copied object. Thus, the frames won't be cycled in the for ts in trajectory loop, because the loop cycles over the new Universe and not the original one. The nosetests pass because, I think, they just work on a single frame, the first one, so no cycling is done. I did not test this issue with the new code, but I already stumbled on it some time ago when trying to pass selections as input.

Anyway, this issue can easily be solved by making sure that selections are made WITHIN the analysis object, solely in the _prepare method, which is called after each copied object is initialized (see the sketch below). The user would then be asked to pass a "selection string" and not the selection itself when initializing an analysis object. I understand this would require some changes to the existing code, but it's the only easy approach that came to my mind. Any ideas?
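
A minimal sketch of that "selection string" convention, assuming the branch's Universe-based AnalysisBase and _single_frame(ts) signatures (the class and attribute names here are illustrative, not the branch's actual code):

from MDAnalysis.analysis.base import AnalysisBase

class SelectionStringAnalysis(AnalysisBase):     # hypothetical example class
    def __init__(self, universe, selection_string, **kwargs):
        self._universe = universe
        self._selection_string = selection_string   # store the string, not the AtomGroup
        self._setup_frames(universe, **kwargs)      # assumes the new Universe-based signature

    def _prepare(self):
        # Built only here, after each copied object has re-created its own
        # Universe, so the AtomGroup always references the local Universe.
        self._selection = self._universe.select_atoms(self._selection_string)

    def _single_frame(self, ts):
        pass    # per-frame work on self._selection would go here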

@richardjgowers
Member

Yeah I had the same thought too. I think a way around this is to store the AtomGroups within the class not as AtomGroups, but as indices (use ag.indices to get this). Then when the objects are initialised again you use the indices to slice the "local" version of Universe.atoms and you've reconstructed the AtomGroups.

Would this work?

@mattiafelice-palermo
Contributor Author

I see what you mean, but still we do not know a priori how many selections an analysis object is going to need, nor do we know which names they were given by the developer of a new analysis object; thus we wouldn't know how to "reference" them when we want to use the indices to slice a local version of Universe.atoms. Don't you think?

@richardjgowers
Member

Maybe, but if we stored them as a list called self._ags or something, then we could put something like:

# Storing
try:
    ag_indices = [ag.indices for ag in self._ags]
except AttributeError:  # catches when _ags didn't exist
    pass

# Rebuilding
self._ags = [self.universe.atoms[ag_ids] for ag_ids in ag_indices]

So then InterRDF would use self._ags[0] & self._ags[1] instead of g1 & g2
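
For context, a sketch of where that storing/rebuilding could live, assuming the pickling scheme described above (Universe stripped in __getstate__ and rebuilt from file names in each copy); the class and attribute names are hypothetical:

import MDAnalysis as mda

class PicklableAnalysis(object):    # illustrative only, not the branch's class
    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop('_universe', None)    # the Universe is rebuilt in each copy
        if '_ags' in state:
            # keep plain index arrays instead of AtomGroups
            state['_ag_indices'] = [ag.indices for ag in state.pop('_ags')]
        return state

    def __setstate__(self, state):
        ag_indices = state.pop('_ag_indices', None)
        self.__dict__.update(state)
        # each copy re-creates its own Universe from the stored file names
        self._universe = mda.Universe(self._topology_file, self._trajectory_file)
        if ag_indices is not None:
            self._ags = [self._universe.atoms[ids] for ids in ag_indices]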

@mattiafelice-palermo
Contributor Author

@mnmelo Thank you very much! I think the shared-memory approach is definitely the way to go in the future. For now, in my opinion, it would perhaps be better to test the present working version so that we can get a feel for how it works with the existing analysis routines and spot any issues we overlooked.

Anyway, I'm just a contributor to the project; I guess the final word on how to proceed should come from a core dev. @richardjgowers what do you think?

@mnmelo
Member

mnmelo commented Jan 21, 2016

Yup, let's see what others say. Now would be a good time to strike while the iron is hot (and admittedly, it'd push me towards actually contributing to the parallelization, which I have been lazily putting off!)

@richardjgowers
Member

Maybe simple tests for the parallel analysis. So for the serial AnalysisBase there's the stupid FrameAnalysis which just records frame numbers....

We could make a FrameSummer(AnalysisBase) which just does sum(ts.frame for ts in u.trajectory). Then try and find race conditions by doing that in parallel. Then to check that correct frames are being selected, extend this to sum(ts.frame for ts in u.trajectory[start:stop:step]), with a variety of slice indices.

Edit:
And maybe run FrameAnalysis in parallel and compare against the serial result, to make sure that the list gets concatenated in the correct order, so you get [0, 1, 2, 3, 4, 5, 6] and not [4, 5, 6, 0, 1, 2, 3]
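
A rough sketch of what such a FrameSummer could look like, assuming the branch's Universe-based AnalysisBase, the _single_frame(ts) signature and an __iadd__-based merge as in #617 (all names illustrative):

from MDAnalysis.analysis.base import AnalysisBase

class FrameSummer(AnalysisBase):
    # Toy analysis for tests: sums the frame indices it visits.
    def __init__(self, universe, **kwargs):
        self._universe = universe
        self._setup_frames(universe, **kwargs)   # assumes the new Universe-based signature
        self.total = 0

    def _single_frame(self, ts):
        self.total += ts.frame

    def __iadd__(self, other):
        # merge the partial sum from another thread's instance (as in #617)
        self.total += other.total
        return self

A test would then run it with parallel=True, nthreads=2 and compare .total against the serial reference sum(ts.frame for ts in u.trajectory[start:stop:step]) for a few different slices.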

@dotsdl
Member

dotsdl commented Jan 21, 2016

I think once #363 is finished, we can then do for example with three threads:

Universe1                       Universe2                             Universe3
   |                               |                                     |
   |_______________________________|_____________________________________|
                                   |
                                   v
                                Topology

where each of our Universes shares literally the same Topology object, but they each have their own of everything else, including their own trajectory Reader so they can visit frames independently. I think that's the thing that would save the lion's share of memory and would require very little initialization. Universe objects are very thin in the new scheme; the Topology object is where the actual system information is contained.

In fact, @richardjgowers, we could even start playing with shared Topology objects now, since everything should work as is except for needing the ability to add a Topology object to a Universe in place of a topology file.
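
An illustration of that layout (hypothetical at the time of this discussion: building a Universe directly from a Topology object was exactly the missing piece mentioned above):

import MDAnalysis as mda
from MDAnalysisTests.datafiles import DCD, PSF

u1 = mda.Universe(PSF, DCD)
shared_topology = u1._topology    # the Topology object to be shared

# Each worker Universe would reuse the same Topology but own its trajectory
# Reader, so the three of them can sit on different frames independently.
workers = [mda.Universe(shared_topology, DCD) for _ in range(3)]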

@mnmelo
Member

mnmelo commented Jan 22, 2016

@dotsdl While encapsulation is certainly a good thing, I think it brings little extra to this case. We may have a lighter Universe, but the Topology must then also somehow make its way to the parallel workers. We end up with the same question, whether to pass the Topology by pickling or by shared memory. Or am I missing your point?

@mattiafelice-palermo
Contributor Author

Ok, so while we decide on how to share the topology between the processes, I might start working on the tests suggested by Richard, so we can make sure future versions of the parallel code work properly. On a side note, perhaps in this case it makes more sense to use a shared-memory paradigm, even though if the new Universe objects are really "thin", passing copies of the objects between the main process and the workers shouldn't impact performance much?

@richardjgowers
Member

Yeah the tests should be independent of how we choose to make it work

@mattiafelice-palermo
Contributor Author

I edited FrameAnalysis and the tests so that it now also works in parallel. The order of the frames is checked by the tests. I didn't get the last part about FrameSummer though. I think that as long as the parallelization strategy works in "read-only" mode, no race conditions should happen, right?

@richardjgowers
Member

I was trying to expose the race conditions that we were having at the start of this issue, when we only had 1 reader for many threads. You're right that our current strategy will avoid this, but as we play with different ideas, we need to know we've not accidentally gone backwards.

Maybe it's a little overkill for now though.

@mattiafelice-palermo
Contributor Author

So do you think we should wait for the topology refactor or perhaps make this first working version available and then modify it if future changes affect it? In the second case, I might start working on docstrings.

@richardjgowers
Member

I think we should finish #670, merge the new version of develop into this branch, and finish this before the refactor. The refactor won't break anything here; it will just make progressing easier. So having this as an experimental starting point is a good idea.

@dotsdl
Member

dotsdl commented Feb 18, 2016

Retarget to 0.15?

@mnmelo
Member

mnmelo commented Feb 18, 2016

I vote so.

@kain88-de
Member

Yeah sure

@dotsdl dotsdl modified the milestones: Topology refactor - 0.15, 0.14 Feb 18, 2016
@dotsdl
Member

dotsdl commented Feb 18, 2016

Retargeted. I think with the discussion in #719 this will need to bake a bit longer than we have before 0.14 is released.

@mattiafelice-palermo
Contributor Author

Fine by me :) I'll be available once we decide to start working on this again!

@orbeckst
Member

Once we pick it up again: a good candidate for parallelization is the density module: it's slow but pleasingly parallel, and you can get bragging rights if you are faster than, e.g., VMD's VolMap.

@orbeckst
Member

See also https://www.mdanalysis.org/pmda/

@orbeckst orbeckst added the close? Evaluate if issue/PR is stale and can be closed. label Sep 28, 2018
@orbeckst
Member

With @yuxuanzhuang's serialization of universes now possible since PR #2723, this approach should be re-evaluated in a new PR.
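
For reference, a quick sketch of what that enables (a plain pickle round-trip of a Universe, in versions that include #2723):

import pickle

import MDAnalysis as mda
from MDAnalysisTests.datafiles import DCD, PSF

u = mda.Universe(PSF, DCD)
u_copy = pickle.loads(pickle.dumps(u))   # independent Universe with its own Reader state

u_copy.trajectory[5]                     # move only the copy to frame 5; the original stays put
print(u.trajectory.frame, u_copy.trajectory.frame)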

@orbeckst orbeckst closed this Aug 16, 2020