How is pytraj different from MDAnalysis and mdtraj? #445

Closed
hainm opened this issue Jun 12, 2015 · 18 comments

hainm commented Jun 12, 2015

Note: this is from my experience using MDAnalysis and mdtraj. Since I spend most of my time playing with pytraj, there will be some bias toward this package. If my information about MDAnalysis or mdtraj is not correct, please feel free to comment here.

Each sub-section below will be filled in more over time.

hainm commented Jun 12, 2015

Design

  • MDAnalysis focuses on frame-based iteration. Only a single frame is loaded into memory for each calculation.
  • mdtraj focuses on numpy vectorization. All data are loaded into memory (it does have iterload, but my feeling is that iterload is not really the main point of mdtraj).
  • pytraj has both an on-disk TrajectoryIterator and an in-memory Trajectory (see the sketch below).
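
As a minimal sketch of the pytraj side (the filenames here are placeholders), the two entry points look like:

import pytraj as pt

# on-disk: frames are read from disk only when they are accessed
traj_on_disk = pt.iterload(traj_filename, topology_filename)

# in-memory: all frames are loaded up front
traj_in_memory = pt.load(traj_filename, topology_filename)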

hainm commented Jun 12, 2015

Code quality

  • mdtraj > (pytraj, MDAnalysis)
    • currently pytraj's code is still poorly organized and has a lot of issues

hainm commented Jun 12, 2015

Easy to use?

  • (pytraj, mdtraj) > MDAnalysis
    • It took me a very long time to get used to MDAnalysis's workflow. This is partly because I knew mdtraj and wrote pytraj before learning MDAnalysis.

hainm commented Jun 12, 2015

number of analyses

  • number: (pytraj, MDAnalysis) > mdtraj
    • pytraj thanks cpptraj's developers for more than a hundred types of analyses.
  • membrane: MDAnalysis > (pytraj, mdtraj). No doubt about this, because becksteinlab is one of the core developer groups there.

hainm commented Jun 12, 2015

Flexibility

I don't have much experience with MDAnalysis because it's not really straightforward to use. I need to spend more time playing with it.

As for pytraj vs mdtraj, I think pytraj has more flexibility. Users can apply any pytraj action to a trajectory, a frame, a list of trajectories, a frame_iter, a chunk_iter, or a list/tuple mixing different types, without worrying about allocating an array to hold the data.

This is an example from mdtraj (link)

import mdtraj as md

first_frame = md.load_frame(traj_filename, 0)  # reference frame for the RMSD
rmsds = []
for chunk in md.iterload(traj_filename, chunk=100):
    rmsds.append(md.rmsd(chunk, first_frame))
    print(chunk, '\n', chunk.time)

This is from pytraj.
Note: there is a bug in pyca.rmsd that does not follow pytraj's spirit, so I am using calc_radgyr here.

In [4]: import pytraj.common_actions as pyca

In [5]: pyca.calc_radgyr(traj)
Out[5]:
<pytraj.datasets.DataSet_double: size=10, name=RoG_00000, legend=RoG_00000[Max], 
aspect=Max, dtype=double, format=%12.4lf>
values: [ 26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993]

In [6]: pyca.calc_radgyr([traj, traj])
Out[6]:
<pytraj.datasets.DataSet_double: size=20, name=RoG_00002, legend=RoG_00002[Max], 
aspect=Max, dtype=double, format=%12.4lf>
values: [ 26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993
  26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993]

In [7]: pyca.calc_radgyr([traj, traj, traj.chunk_iter(chunk=4)])
Out[7]:
<pytraj.datasets.DataSet_double: size=30, name=RoG_00004, legend=RoG_00004[Max], 
aspect=Max, dtype=double, format=%12.4lf>
values: [ 26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993
  26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993
  26.66003587  16.50015787  17.08691826  16.55618147  15.18575198
  15.86350294  14.08403482  14.65317853  14.48540094  13.69095993]

hainm commented Jun 12, 2015

Better trajectory iterator

(I will add a note about MDAnalysis later.)

With mdtraj, md.iterload returns a Python generator object:

In [4]: import mdtraj as md
In [5]: from mdtraj.testing import get_fn; traj_filename = get_fn('frame0.h5')
In [6]: md.iterload(traj_filename, chunk=100)
Out[6]: <generator object iterload at 0x7feea2f583a8>

while pytraj.io.iterload returns a TrajectoryIterator object. I think this object has many advantages over a plain Python generator (and thanks to Jason Swails for this name suggestion; it's great).

In [7]: import pytraj.io
In [8]: traj = pytraj.io.load_sample_data("tz2")
In [9]: traj
Out[9]:
<pytraj.TrajectoryIterator with 10 frames: <Topology with 1692 mols, 1704 residues, 5293 atoms, 
5300 bonds, PBC with box type = ortho>>

TrajectoryIterator behaves like a regular in-memory Trajectory, but data are only loaded into memory when needed.

And the user never needs to explicitly rewind it after iterating:

In [10]: traj.n_frames
Out[10]: 10

In [11]: for idx, frame in enumerate(traj):
   ....:     pass
   ....:

In [12]: assert idx == 9

In [13]: for idx, frame in enumerate(traj):
   ....:     pass
   ....:

In [14]: assert idx == 9

  • The user can also append a new trajectory from disk to a TrajectoryIterator without using much additional memory (because the data stay on disk):
In [21]: traj
Out[21]:
<pytraj.TrajectoryIterator with 10 frames: <Topology with 1692 mols, 1704 residues, 5293 atoms, 
5300 bonds, PBC with box type = ortho>>


In [22]: traj.load(fname)

In [23]: traj
Out[23]:
<pytraj.TrajectoryIterator with 20 frames: <Topology with 1692 mols, 1704 residues, 5293 atoms, 
5300 bonds, PBC with box type = ortho>>

hainm commented Jun 12, 2015

parallel

  • All three packages support OpenMP for some analyses.
  • But pytraj explicitly supports MPI. By "explicitly" I mean the user does not need to write complicated code or understand MPI deeply.

just

# mpi_example.py
from mpi4py import MPI
import pytraj as pt
from pytraj.parallel import map as pymap
import pytraj.common_actions as pyca

comm = MPI.COMM_WORLD
# each rank opens the same on-disk trajectory (the filenames are placeholders)
traj = pt.iterload(traj_filename, topology_filename)
total_arr = pymap(comm, pyca.calc_molsurf, traj, "@CA", top=traj.top, dtype='ndarray', root=0)
if comm.rank == 0:
    print(total_arr)

and then

mpirun -n 4 python mpi_example.py

pytraj's map will automatically split traj into 4 chunks and perform the calculation. traj can be an in-memory Trajectory, an on-disk TrajectoryIterator, or even a list/tuple of them. Since there is minimal communication among nodes, the user should get an embarrassingly parallel ~4x speedup.
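
For example, extending mpi_example.py above with a list of trajectories (a sketch only; it assumes pymap accepts a list as described, and the second filename is a placeholder):

traj0 = pt.iterload(traj_filename, topology_filename)
traj1 = pt.iterload(another_traj_filename, topology_filename)
total_arr = pymap(comm, pyca.calc_molsurf, [traj0, traj1], "@CA",
                  top=traj0.top, dtype='ndarray', root=0)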

Note: you will not always get an embarrassingly parallel speedup. See Jason's comments below.

swails commented Jun 12, 2015

What's the MPI speedup?

hainm commented Jun 12, 2015

@swails

It's in the last sentence: "Since there is minimal communication among nodes, the user should get an embarrassingly parallel ~4x speedup." This is true at least for pyca.calc_molsurf on a large trajectory.

hainm commented Jun 12, 2015

I also have a notebook comparing pytraj with MPI on 8 cores against mdtraj with OpenMP on 8 cores. The code looks ugly, but it shows that although mdtraj has a very fast RMSD calculation, data I/O is the bigger deal, especially for large trajectories:

http://nbviewer.ipython.org/github/pytraj/pytraj/blob/master/note-books/parallel/rmsd_mpi.ipynb

swails commented Jun 12, 2015

> @swails
>
> It's in the last sentence: "Since there is minimal communication among nodes, the user should get an embarrassingly parallel ~4x speedup." This is true at least for pyca.calc_molsurf on a large trajectory.

That's only true for something that's compute-bound. I/O bound applications are another story altogether -- your speedups stop as soon as the SATA bandwidth becomes saturated (I don't know when this is, which is why I asked, and it will vary from machine to machine). For highly parallel filesystems (e.g., Lustre), you can probably get good speedup (unless the filesystem sucks or your data is very poorly striped). On a standard desktop, scaling will top out as the I/O bus fills up.

Also, a lot of the most computationally demanding analyses are not embarrassingly parallel (FFTs, 2D-RMSD, and clustering to name a few). They can be parallelized, but not without communication (and in that case you may be better off going with SMP, which I think is why Dan dropped MPI for non-REMD analyses in favor of OpenMP).

hainm commented Jun 12, 2015

> That's only true for something that's compute-bound. I/O bound applications are another story altogether -- your speedups stop as soon as the SATA bandwidth becomes saturated (I don't know when this is, which is why I asked, and it will vary from machine to machine). For highly parallel filesystems (e.g., Lustre), you can probably get good speedup (unless the filesystem sucks or your data is very poorly striped). On a standard desktop, scaling will top out as the I/O bus fills up.

I still have very limited knowledge about MPI, but I don't know if I/O is really a big problem here. For example, if I want to use 8 cores, I create one TrajectoryIterator on each node (very little memory) and load only a single Frame at a time for the calculation (like calc_rmsd, calc_radgyr, ...). I think with the current implementation in pytraj, it's just like running 8 cpptraj trajectory jobs in parallel. There is almost zero communication.

Of course, pytraj only supports embarrassingly parallel workloads (it's good for H-bond searches too). And pytraj does not aim at tens or even hundreds of cores; that seems like overkill. I think a 4-8x speedup is still cool.

> Also, a lot of the most computationally demanding analyses are not embarrassingly parallel (FFTs, 2D-RMSD, and clustering to name a few). They can be parallelized, but not without communication (and in that case you may be better off going with SMP, which I think is why Dan dropped MPI for non-REMD analyses in favor of OpenMP).

You are right. I should have mentioned in the previous section that the user should only expect a speedup if there is minimal communication among nodes. I think pytraj has very good parallel scaling for any cpptraj Action (79 actions in total in cpptraj) as long as the Action on each Frame is independent of the other Frames.

The ones you mention are in Analysis classes, and most of them (AFAIK) require pairwise Frame calculations. I agree that these are very difficult to parallelize with MPI.

swails commented Jun 12, 2015

> I still have very limited knowledge about MPI, but I don't know if I/O is really a big problem here. For example, if I want to use 8 cores, I create one TrajectoryIterator on each node (very little memory) and load only a single Frame at a time for the calculation (like calc_rmsd, calc_radgyr, ...). I think with the current implementation in pytraj, it's just like running 8 cpptraj trajectory jobs in parallel. There is almost zero communication.

You're still getting hung up on communication. In MD engines (sander, pmemd, Gromacs, you name it), 99.9% of the time those programs run is spent doing calculations with the processor. In that case, the more a processor has to wait to be filled with data (communication overhead), the worse the scaling is.

But with trajectory analysis, you are filling data from disk, which is orders of magnitude slower than filling data from RAM (which is probably an order of magnitude slower than filling registers from cache, but I digress). The issue is that a lot of trajectory analysis spends more time reading a single frame from disk than it spends computing an RMSD (or at least it spends a similar amount of time).

So it doesn't matter how little pytraj-MPI communicates, parallel scaling will halt as soon as frame reading becomes as expensive as the calculation (or close to it) and N+1 threads can't retrieve data faster from disk than N threads. Basically, I know communication is not a bottleneck (since there's none of it), but there are other potential bottlenecks than just MPI messages (in this case, disk I/O). But maybe disks have gotten a lot better, or your machine is particularly good, or...
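
One rough way to check this on a given machine is to time frame iteration by itself against the full calculation; a minimal sketch (the filenames are placeholders for a large trajectory):

import time
import pytraj as pt
import pytraj.common_actions as pyca

traj = pt.iterload(traj_filename, topology_filename)  # placeholders; use a large trajectory

start = time.time()
for frame in traj:          # iterating an on-disk TrajectoryIterator: time is dominated by reading frames
    pass
read_time = time.time() - start

start = time.time()
pyca.calc_radgyr(traj)      # reads the frames again, plus the actual calculation
calc_time = time.time() - start

print(read_time, calc_time)  # if read_time is close to calc_time, the job is I/O bound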

hainm commented Jun 12, 2015

Thanks, I see. I will read more about this.

hainm commented Jun 16, 2015

@swails I just remembered that pymmpbsa uses MPI too. What's its scaling?

hainm commented Jun 17, 2015

Universal

pytraj can use different engines ('pytraj' by default, 'mdanalysis', 'mdtraj') for loading files into pytraj Trajectory or TrajectoryIterator objects:

import pytraj as pt
pt.io.load(filename, topology_name) # default `pytraj`
pt.io.load(filename, topology_name, engine='mdtraj')
pt.io.iterload(filename, topology_name, engine='mdanalysis')

hainm commented Jun 20, 2015

use different engines for loading data

  • cpptraj
import pytraj as pt
traj = pt.iterload(traj_filename, topology_filename)
  • mdtraj
In [13]: from mdtraj.testing import get_fn
In [14]: fn = get_fn("frame0.h5")
In [15]: import pytraj as pt
In [16]: traj = pt.load(fn, engine='mdtraj')
  • mdanalysis
In [18]: from MDAnalysisTests import datafiles
In [19]: t = pt.iterload(datafiles.DCD, datafiles.PSF, engine='mdanalysis')

hainm commented Aug 3, 2015

parallel - 2

It's very easy to embarrassingly parallelize a user's analysis function.

just

def new_func(traj, mask, *args, **kwd):
    ...

and call pytraj.pmap

import pytraj as pt
pt.pmap(new_func, traj, mask, *args, n_cores=4, **kwd)

pytraj will split the work across the 4 available cores.
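
For comparison, here is the same pattern with one of the built-in actions used earlier in this thread (a minimal sketch; it assumes pt.pmap forwards the mask string to the action, and it reuses the tz2 sample data loaded above):

import pytraj as pt
import pytraj.common_actions as pyca

# small built-in sample, as in the TrajectoryIterator examples above
traj = pt.io.load_sample_data("tz2")
data = pt.pmap(pyca.calc_radgyr, traj, '@CA', n_cores=4)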

Advantages:

  • No extra library is needed, since pytraj uses the built-in multiprocessing module.
  • embarrassingly parallel speedup
  • pytraj uses out-of-core calculation, which means it uses very little memory.

hainm closed this as completed Sep 9, 2015