How is pytraj different from MDAnalysis and mdtraj? #445

Closed
hainm opened this issue Jun 12, 2015 · 18 comments

hainm commented Jun 12, 2015

Note: this is from my experience using MDAnalysis and mdtraj. Since I spend most of my time playing with pytraj, there will be some bias toward this package. If my information about MDAnalysis or mdtraj is not correct, please feel free to comment here.

Each sub-section below will be filled in more over time.

hainm commented Jun 12, 2015

Design

  • MDAnalysis focuses on frame-based iteration. Only a single frame is loaded into memory for each calculation.
  • mdtraj focuses on numpy vectorization. All data are loaded into memory (it does have iterload, but my feeling is that iterload is not really the main point of mdtraj).
  • pytraj has both an on-disk TrajectoryIterator and an in-memory Trajectory (see the sketch below).
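
As a minimal sketch of the pytraj side (the filenames here are placeholders), the two entry points look like:

import pytraj as pt

# on-disk: frames are read from disk only when they are accessed
traj_on_disk = pt.iterload(traj_filename, topology_filename)

# in-memory: all frames are loaded up front
traj_in_memory = pt.load(traj_filename, topology_filename)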

hainm commented Jun 12, 2015

Code quality

  • mdtraj > (pytraj, MDAnalysis)
    • currently pytraj's code is still poorly organized and has a lot of issues

hainm commented Jun 12, 2015

Easy to use?

  • (pytraj, mdtraj) > MDAnalysis
    • It took me a very long time to get used to MDAnalysis's workflow. This is partly because I knew mdtraj and wrote pytraj before learning MDAnalysis.

hainm commented Jun 12, 2015

number of analyses

  • number: (pytraj, MDAnalysis) > mdtraj
    • pytraj thanks cpptraj's developers for more than a hundred types of analyses.
  • membrane: MDAnalysis > (pytraj, mdtraj). No doubt about this, because becksteinlab is one of the core developer groups there.

hainm commented Jun 12, 2015

Flexibility

I don't have much experience with MDAnalysis because it's not really straightforward to use. I need to spend more time playing with it.

As for pytraj vs mdtraj, I think pytraj has more flexibility. Users can apply any pytraj action to a trajectory, a frame, a list of trajectories, a frame_iter, a chunk_iter, or a list/tuple mixing different types, without worrying about allocating an array to hold the data.

This is an example from mdtraj (link)

import mdtraj as md

first_frame = md.load_frame(traj_filename, 0)  # reference frame for the RMSD
rmsds = []
for chunk in md.iterload(traj_filename, chunk=100):
    rmsds.append(md.rmsd(chunk, first_frame))
    print(chunk, '\n', chunk.time)

This is from pytraj.
Note: there is a bug in pyca.rmsd that does not follow pytraj's spirit, so I am using calc_radgyr here.

In [4]: import pytraj.common_actions as pyca

In [5]: pyca.calc_radgyr(traj)
Out[5]:
<pytraj.datasets.DataSet_double: size=10, name=RoG_00000, legend=RoG_00000[Max], 
aspect=Max, dtype=double, format=%12.4lf>
values: [ 26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993]

In [6]: pyca.calc_radgyr([traj, traj])
Out[6]:
<pytraj.datasets.DataSet_double: size=20, name=RoG_00002, legend=RoG_00002[Max], 
aspect=Max, dtype=double, format=%12.4lf>
values: [ 26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993
  26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993]

In [7]: pyca.calc_radgyr([traj, traj, traj.chunk_iter(chunk=4)])
Out[7]:
<pytraj.datasets.DataSet_double: size=30, name=RoG_00004, legend=RoG_00004[Max], 
aspect=Max, dtype=double, format=%12.4lf>
values: [ 26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993
  26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993
  26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993]

hainm commented Jun 12, 2015

Better trajectory iterator

(I will add a note about MDAnalysis later.)

With mdtraj, md.iterload returns a Python generator object:

In [4]: import mdtraj as md
In [5]: from mdtraj.testing import get_fn; traj_filename = get_fn('frame0.h5')
In [6]: md.iterload(traj_filename, chunk=100)
Out[6]: <generator object iterload at 0x7feea2f583a8>

while pytraj.io.iterload returns a TrajectoryIterator object. I think this object has many advantages over a plain Python generator (and thanks to Jason Swails for this name suggestion; it's great).

In [7]: import pytraj.io
In [8]: traj = pytraj.io.load_sample_data("tz2")
In [9]: traj
Out[9]:
<pytraj.TrajectoryIterator with 10 frames: <Topology with 1692 mols, 1704 residues, 5293 atoms, 
5300 bonds, PBC with box type = ortho>>

TrajectoryIterator behaves like a regular in-memory Trajectory, but data are only loaded into memory when needed.

And the user never needs to explicitly rewind it after iterating:

In [10]: traj.n_frames
Out[10]: 10

In [11]: for idx, frame in enumerate(traj):
   ....:     pass
   ....:

In [12]: assert idx == 9

In [13]: for idx, frame in enumerate(traj):
   ....:     pass
   ....:

In [14]: assert idx == 9

  • The user can also append a new trajectory from disk to a TrajectoryIterator without using much additional memory (because the data stay on disk):
In [21]: traj
Out[21]:
<pytraj.TrajectoryIterator with 10 frames: <Topology with 1692 mols, 1704 residues, 5293 atoms, 
5300 bonds, PBC with box type = ortho>>


In [22]: traj.load(fname)

In [23]: traj
Out[23]:
<pytraj.TrajectoryIterator with 20 frames: <Topology with 1692 mols, 1704 residues, 5293 atoms, 
5300 bonds, PBC with box type = ortho>>

hainm commented Jun 12, 2015

parallel

  • All three packages support OpenMP for some analyses.
  • But pytraj explicitly supports MPI. By "explicitly" I mean the user does not need to write complicated code or understand MPI deeply.

just

# mpi_example.py
from mpi4py import MPI
import pytraj as pt
from pytraj.parallel import map as pymap
import pytraj.common_actions as pyca

comm = MPI.COMM_WORLD
# each rank opens the same on-disk trajectory (the filenames are placeholders)
traj = pt.iterload(traj_filename, topology_filename)
total_arr = pymap(comm, pyca.calc_molsurf, traj, "@CA", top=traj.top, dtype='ndarray', root=0)
if comm.rank == 0:
    print(total_arr)

and then

mpirun -n 4 python mpi_example.py

pytraj's map will automatically split traj into 4 chunks and perform the calculation. traj can be an in-memory Trajectory, an on-disk TrajectoryIterator, or even a list/tuple of them. Since there is minimal communication among nodes, the user should get an embarrassingly parallel ~4x speedup.
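
For example, extending mpi_example.py above with a list of trajectories (a sketch only; it assumes pymap accepts a list as described, and the second filename is a placeholder):

traj0 = pt.iterload(traj_filename, topology_filename)
traj1 = pt.iterload(another_traj_filename, topology_filename)
total_arr = pymap(comm, pyca.calc_molsurf, [traj0, traj1], "@CA",
                  top=traj0.top, dtype='ndarray', root=0)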

Note: you will not always get an embarrassingly parallel speedup. See Jason's comments below.

swails commented Jun 12, 2015

What's the MPI speedup?

hainm commented Jun 12, 2015

@swails

It's in the last sentence: "Since there is minimal communication among nodes, the user should get an embarrassingly parallel ~4x speedup." This is true at least for pyca.calc_molsurf on a large trajectory.

hainm commented Jun 12, 2015

I also have a notebook comparing pytraj with MPI on 8 cores against mdtraj with OpenMP on 8 cores. The code looks ugly, but it shows that although mdtraj has a very fast RMSD calculation, data I/O is the bigger deal, especially for large trajectories:

http://nbviewer.ipython.org/github/pytraj/pytraj/blob/master/note-books/parallel/rmsd_mpi.ipynb

swails commented Jun 12, 2015

> @swails
>
> It's in the last sentence: "Since there is minimal communication among nodes, the user should get an embarrassingly parallel ~4x speedup." This is true at least for pyca.calc_molsurf on a large trajectory.

That's only true for something that's compute-bound. I/O bound applications are another story altogether -- your speedups stop as soon as the SATA bandwidth becomes saturated (I don't know when this is, which is why I asked, and it will vary from machine to machine). For highly parallel filesystems (e.g., Lustre), you can probably get good speedup (unless the filesystem sucks or your data is very poorly striped). On a standard desktop, scaling will top out as the I/O bus fills up.

Also, a lot of the most computationally demanding analyses are not embarrassingly parallel (FFTs, 2D-RMSD, and clustering to name a few). They can be parallelized, but not without communication (and in that case you may be better off going with SMP, which I think is why Dan dropped MPI for non-REMD analyses in favor of OpenMP).

hainm commented Jun 12, 2015

> That's only true for something that's compute-bound. I/O bound applications are another story altogether -- your speedups stop as soon as the SATA bandwidth becomes saturated (I don't know when this is, which is why I asked, and it will vary from machine to machine). For highly parallel filesystems (e.g., Lustre), you can probably get good speedup (unless the filesystem sucks or your data is very poorly striped). On a standard desktop, scaling will top out as the I/O bus fills up.

I still have very limited knowledge about MPI, but I don't know if I/O is really a big problem here. For example, if I want to use 8 cores, I create one TrajectoryIterator on each node (very little memory) and load only a single Frame at a time for the calculation (like calc_rmsd, calc_radgyr, ...). I think with the current implementation in pytraj, it's just like running 8 cpptraj trajectory jobs in parallel. There is almost zero communication.

Of course, pytraj only supports embarrassingly parallel workloads (it's good for H-bond searches too). And pytraj does not aim at tens or even hundreds of cores; that seems like overkill. I think a 4-8x speedup is still cool.

> Also, a lot of the most computationally demanding analyses are not embarrassingly parallel (FFTs, 2D-RMSD, and clustering to name a few). They can be parallelized, but not without communication (and in that case you may be better off going with SMP, which I think is why Dan dropped MPI for non-REMD analyses in favor of OpenMP).

You are right. I should have mentioned in the previous section that the user should only expect a speedup if there is minimal communication among nodes. I think pytraj has very good parallel scaling for any cpptraj Action (79 actions in total in cpptraj) as long as the Action on each Frame is independent of the other Frames.

The ones you mention are in Analysis classes, and most of them (AFAIK) require pairwise Frame calculations. I agree that these are very difficult to parallelize with MPI.

swails commented Jun 12, 2015

> I still have very limited knowledge about MPI, but I don't know if I/O is really a big problem here. For example, if I want to use 8 cores, I create one TrajectoryIterator on each node (very little memory) and load only a single Frame at a time for the calculation (like calc_rmsd, calc_radgyr, ...). I think with the current implementation in pytraj, it's just like running 8 cpptraj trajectory jobs in parallel. There is almost zero communication.

You're still getting hung up on communication. In MD engines (sander, pmemd, Gromacs, you name it), 99.9% of the time those programs run is spent doing calculations with the processor. In that case, the more a processor has to wait to be filled with data (communication overhead), the worse the scaling is.

But with trajectory analysis, you are filling data from disk, which is orders of magnitude slower than filling data from RAM (which is probably an order of magnitude slower than filling registers from cache, but I digress). The issue is that a lot of trajectory analysis spends more time reading a single frame from disk than it spends computing an RMSD (or at least it spends a similar amount of time).

So it doesn't matter how little pytraj-MPI communicates, parallel scaling will halt as soon as frame reading becomes as expensive as the calculation (or close to it) and N+1 threads can't retrieve data faster from disk than N threads. Basically, I know communication is not a bottleneck (since there's none of it), but there are other potential bottlenecks than just MPI messages (in this case, disk I/O). But maybe disks have gotten a lot better, or your machine is particularly good, or...
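
One rough way to check this on a given machine is to time frame iteration by itself against the full calculation; a minimal sketch (the filenames are placeholders for a large trajectory):

import time
import pytraj as pt
import pytraj.common_actions as pyca

traj = pt.iterload(traj_filename, topology_filename)  # placeholders; use a large trajectory

start = time.time()
for frame in traj:          # iterating an on-disk TrajectoryIterator: time is dominated by reading frames
    pass
read_time = time.time() - start

start = time.time()
pyca.calc_radgyr(traj)      # reads the frames again, plus the actual calculation
calc_time = time.time() - start

print(read_time, calc_time)  # if read_time is close to calc_time, the job is I/O bound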

hainm commented Jun 12, 2015

Thanks, I see. I will read more about this.

hainm commented Jun 16, 2015

@swails I just remembered that pymmpbsa uses MPI too. What's its scaling?

hainm commented Jun 17, 2015

Universal

pytraj can use different engines ('pytraj' by default, 'mdanalysis', 'mdtraj') for loading files into pytraj Trajectory or TrajectoryIterator objects:

import pytraj as pt
pt.io.load(filename, topology_name) # default `pytraj`
pt.io.load(filename, topology_name, engine='mdtraj')
pt.io.iterload(filename, topology_name, engine='mdanalysis')

hainm commented Jun 20, 2015

use different engines for loading data

  • cpptraj
import pytraj as pt
traj = pt.iterload(traj_filename, topology_filename)
  • mdtraj
In [13]: from mdtraj.testing import get_fn
In [14]: fn = get_fn("frame0.h5")
In [15]: import pytraj as pt
In [16]: traj = pt.load(fn, engine='mdtraj')
  • mdanalysis
In [18]: from MDAnalysisTests import datafiles
In [19]: t = pt.iterload(datafiles.DCD, datafiles.PSF, engine='mdanalysis')

hainm commented Aug 3, 2015

parallel - 2

It's very easy to embarrassingly parallelize a user's analysis function.

just

def new_func(traj, mask, *args, **kwd):
    ...

and call pytraj.pmap

import pytraj as pt
pt.pmap(new_func, traj, mask, *args, n_cores=4, **kwd)

pytraj will split the work across the 4 available cores.
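
For comparison, here is the same pattern with one of the built-in actions used earlier in this thread (a minimal sketch; it assumes pt.pmap forwards the mask string to the action, and it reuses the tz2 sample data loaded above):

import pytraj as pt
import pytraj.common_actions as pyca

# small built-in sample, as in the TrajectoryIterator examples above
traj = pt.io.load_sample_data("tz2")
data = pt.pmap(pyca.calc_radgyr, traj, '@CA', n_cores=4)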

Advantages:

  • No extra library is needed, since pytraj uses the built-in multiprocessing module.
  • embarrassingly parallel speedup
  • pytraj uses out-of-core calculation, which means it uses very little memory.

hainm closed this as completed Sep 9, 2015