## `pytraj` vs `mdtraj`

I want to measure how fast (or slow) `pytraj` is so that I can optimize its speed further.
This comparison is not entirely fair, since my knowledge of `mdtraj` is limited and the two packages are designed differently (`cpptraj` uses double precision, while `mdtraj` uses (mostly?) float32).
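To see what the precision difference means for coordinates, the stdlib-only sketch below (not code from either package) rounds a double-precision value to float32 with `struct` and prints the error. float32 keeps about 7 significant decimal digits, so for a coordinate around 123 Å the rounding error is a few millionths of an Å, which may show up as small differences in the last digits of results from the two packages.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

# a coordinate (in Angstrom) kept in double precision, as cpptraj does
x64 = 123.456789012345
# the same value stored in single precision, as mdtraj (mostly) does
x32 = to_float32(x64)

print(x64, x32, abs(x64 - x32))
```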

### Note 1: The speed comparison might be very different on your computer, since mdtraj uses OpenMP for some calculations (rmsd, ...) and I did not set the number of cores to be used.
* sometimes `mdtraj.rmsd` is extremely fast (>10 times faster than pytraj)
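One way to make the timings more reproducible is to pin the number of OpenMP threads before the library is imported. `OMP_NUM_THREADS` is the standard OpenMP environment variable and is read when the OpenMP runtime starts, so it has to be set before `import mdtraj`; whether a particular mdtraj build honors it depends on how it was compiled, which I assume but have not checked. In the notebook, `%env OMP_NUM_THREADS=1` does the same thing.

```python
import os

# Standard OpenMP setting; it must be set before the OpenMP runtime is loaded,
# i.e. before importing mdtraj (assumption: mdtraj was built with OpenMP).
os.environ["OMP_NUM_THREADS"] = "1"

# import mdtraj as md   # import only after setting the variable
print("OpenMP threads requested:", os.environ["OMP_NUM_THREADS"])
```

Running the notebook once with 1 thread and once with all cores would show how much of mdtraj's speed comes from OpenMP.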
### Note 2: I also added a `pytraj` vs `MDAnalysis` comparison at the end of this page
### See also 
* [multiple_analysises.ipynb](multiple_analysises.ipynb)
* [speed_test_2_trajs.ipynb](speed_test_2_trajs.ipynb)
* [performance_tips.ipynb](performance_tips.ipynb)
* [parallel/rmsd_mpi.ipynb](parallel/rmsd_mpi.ipynb)

In [1]:
from pytraj.__version__ import __version__ as p_version
print (p_version)
from mdtraj.version import version as m_version
print (m_version)

0.1.2.dev3
1.4.0.dev0.dev-bb6923f


## loading data

In [2]:
# data is stored on local disk (netcdf, generated from a TIP3P REMD run)

from pytraj import io
import mdtraj as md

top_name = "../tests/data/nogit/remd/myparm.parm7"
filename = "../tests/data/nogit/remd/remd.x.000"

!du -sh "../tests/data/nogit/remd/remd.x.000"
print ()
!head "../tests/data/nogit/remd/myparm.parm7"

200M	../tests/data/nogit/remd/remd.x.000

%VERSION  VERSION_STAMP = V0001.000  DATE = 03/13/14  02:23:52                  
%FLAG TITLE                                                                     
%FORMAT(20a4)                                                                   
default_name                                                                    
%FLAG POINTERS                                                                  
%FORMAT(10I8)                                                                   
   17443      16   17165     287     603     386    1257    1129       0       0
   25521    5666     287     386    1129      61     139     179      32       1
       0       0       0       0       0       0       0       2      24       0
       0


In [3]:
# load the whole traj into memory as a FrameArray in pytraj
# I don't use %timeit here because it's terribly slow on my computer

%time io.load(filename, top_name)[:]
# using cpptraj's class (Trajin_Single): no frames are loaded yet
%time io.load(filename, top_name) # the data is actually still on disk :D

%time md.load_netcdf(filename, top=top_name)

# loading into a pytraj FrameArray loses this round (2.54 s vs 2.24 s wall time), but not by much

CPU times: user 2.09 s, sys: 449 ms, total: 2.54 s
Wall time: 2.54 s
CPU times: user 191 ms, sys: 14 ms, total: 205 ms
Wall time: 205 ms
CPU times: user 1.71 s, sys: 330 ms, total: 2.04 s
Wall time: 2.24 s


<mdtraj.Trajectory with 1000 frames, 17443 atoms, 5666 residues, and unitcells at 0x2aaad0477e10>
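Since `%timeit` was too slow here, a middle ground outside IPython is the standard `timeit.repeat`, which reproduces the "best of 3" that `%timeit` reports. The helper name `best_of` is hypothetical, and the lambda below is only a placeholder workload standing in for a call such as `io.load(filename, top_name)[:]`.

```python
import timeit

def best_of(func, repeat=3, number=1):
    """Return the fastest per-call time (in seconds) over `repeat` runs,
    each run calling `func` `number` times, like %timeit's "best of 3"."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number

# placeholder workload; replace with e.g. lambda: io.load(filename, top_name)[:]
t = best_of(lambda: sum(range(100_000)), repeat=3, number=5)
print("best of 3: %.3f ms per loop" % (t * 1e3))
```

Keeping `number=1` for slow calls such as a full trajectory load keeps the total run time close to `repeat` loads.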

In [4]:
fa = io.load(filename, top_name)[:10]
m_traj = md.load_netcdf(filename, top=top_name)[:10]
assert fa.n_frames == m_traj.n_frames

## indexing

In [5]:
print("single frame")
%timeit fa[5]
%timeit m_traj[5]

print ("")
print ("slice")
%timeit fa[0:10:2]
%timeit m_traj[0:10:2]

single frame
10000 loops, best of 3: 165 µs per loop
10 loops, best of 3: 137 ms per loop

slice
10 loops, best of 3: 25 ms per loop
10 loops, best of 3: 137 ms per loop


## iterating

In [6]:
%timeit for frame in fa: pass

# mdtraj does not have a Frame object; iterating gives Trajectory objects with a single frame
%timeit for traj in m_traj: pass

# iterating over Frames in pytraj is super fast because we use a C++ vector and just move the Frame pointer
# mdtraj is quite slow here (because it iterates over a numpy array?), but its strength is numpy arrays: once a chunk
# of the traj is loaded into a numpy array, everything is very fast

The slowest run took 4.37 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 11.4 µs per loop
1 loops, best of 3: 1.34 s per loop
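The comments above attribute the gap to what each iteration step does: handing out a reference to an existing frame is nearly free, while building a new object per frame (as a single-frame `Trajectory` is) costs an allocation and a copy. The toy sketch below, independent of both libraries and with hypothetical names, contrasts the two patterns; it illustrates the explanation rather than reproducing either package's internals.

```python
import timeit

# 1000 toy "frames", each a flat list of 3 * n_atoms coordinates
n_frames, n_atoms = 1000, 100
frames = [[float(i)] * (3 * n_atoms) for i in range(n_frames)]

def iter_by_reference(frames):
    # yields the stored frame objects themselves: no copy
    for f in frames:
        yield f

def iter_by_copy(frames):
    # builds a new object for every step, similar in spirit to
    # slicing out a one-frame Trajectory
    for f in frames:
        yield list(f)

t_ref = min(timeit.repeat(lambda: [None for _ in iter_by_reference(frames)], repeat=3, number=10))
t_copy = min(timeit.repeat(lambda: [None for _ in iter_by_copy(frames)], repeat=3, number=10))
# t / 10 seconds per pass, i.e. t * 100 milliseconds
print("by reference: %.2f ms, by copy: %.2f ms" % (t_ref * 100, t_copy * 100))
```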


## iterload

In [6]:
# reimport files so I don't need to re-run this notebook
fname = "../tests/data/nogit/remd/remd.000.nc" # I had to rename the file extension from ".x.000" to ".000.nc" so mdtraj can load it
topname = "../tests/data/nogit/remd/myparm.parm7" # I had to rename the file from ".top" to ".parm7" so mdtraj can load it
import mdtraj as md
import pytraj.io as io

def iterload_pytraj():
    traj = io.load(fname, topname) # we use TrajReadOnly, which does not load any frame into memory when we call "io.load"
    for chunk in traj.chunk_iter(start=0, chunk=100):
        pass
    
def iterload_mdtraj():
    for chunk in md.iterload(fname, chunk=100, top=topname):
        pass
    
%timeit iterload_pytraj()
%timeit iterload_mdtraj()

1 loops, best of 3: 2.53 s per loop
1 loops, best of 3: 3.21 s per loop
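Both `traj.chunk_iter(chunk=100)` and `md.iterload(..., chunk=100)` follow the same idea: read at most `chunk` frames at a time, so memory use is bounded by one chunk instead of the whole trajectory. A generic sketch of that pattern follows; `iter_chunks` is a hypothetical helper, and neither package necessarily implements it this way.

```python
from itertools import islice

def iter_chunks(frames, chunk=100):
    """Yield lists of at most `chunk` consecutive items from any iterable,
    so only one chunk is held in memory at a time."""
    it = iter(frames)
    while True:
        block = list(islice(it, chunk))
        if not block:
            return
        yield block

# 1000 frame indices (the size of this trajectory), split into chunks of 100
sizes = [len(c) for c in iter_chunks(range(1000), chunk=100)]
print(len(sizes), sizes[0], sizes[-1])  # 10 100 100
```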


## calc_rmsd

In [7]:
f0 = fa[0]

print ("")
print ("rmsd from pytraj, no parallel")
%timeit [frame.rmsd(f0) for frame in fa]

print ("")
print ("rmsd from mdtraj, no parallel")
%timeit md.rmsd(m_traj, m_traj, 0, parallel=False)

print ("")
print ("rmsd from mdtraj, with parallel")
%timeit md.rmsd(m_traj, m_traj, 0, parallel=True)

# mdtraj's rmsd can be very fast (sometimes >10 times faster, see Note 1). It uses openmp (?) and float32, while we use float64

# let's try converting pytraj's double-precision traj to float32
import numpy as np

class FakeTraj:
    # minimal traj-like object holding only the topology and coordinates
    def __init__(self, traj, astype='float64'):
        self.top = traj.top.copy()
        if astype == 'float64':
            self.xyz = traj.xyz
        elif astype == 'float32':
            self.xyz = traj.xyz.astype(np.float32)
        else:
            raise ValueError("astype must be 'float64' or 'float32'")

ftraj32 = FakeTraj(fa, astype='float32')
ftraj64 = FakeTraj(fa)

print ("")
print ("compare speed between float32 and float64 for FakeTraj, no parallel")
%timeit md.rmsd(ftraj32, ftraj32, 0, parallel=False)
%timeit md.rmsd(ftraj64, ftraj64, 0, parallel=False)

# using float32 should make the rmsd calculation >= 3 times faster (not visible in this run)

print ("")
print ("compare speed between float32 and float64 for FakeTraj, with parallel")
%timeit md.rmsd(ftraj32, ftraj32, 0)
%timeit md.rmsd(ftraj64, ftraj64, 0)

# conclusion: use float32 + openmp to speed up calculations in pytraj

# rmsd for a single frame
f1 = fa[1]
m_traj1 = m_traj[0]

print ("")
print ("rmsd for single frame, pytraj, no parallel")
%timeit f1.rmsd(f0)
print ("rmsd for single frame, mdtraj, no parallel")
%timeit md.rmsd(m_traj1, m_traj1, 0, parallel=False)
print ("rmsd for single frame, mdtraj, with parallel")
%timeit md.rmsd(m_traj1, m_traj1, 0, parallel=True)


rmsd from pytraj, no parallel
100 loops, best of 3: 16.9 ms per loop

rmsd from mdtraj, no parallel
10 loops, best of 3: 131 ms per loop

rmsd from mdtraj, with parallel
1 loops, best of 3: 276 ms per loop

compare speed between float32 and float64 for FakeTraj, no parallel
10 loops, best of 3: 129 ms per loop
10 loops, best of 3: 131 ms per loop

compare speed between float32 and float64 for FakeTraj, with parallel
1 loops, best of 3: 329 ms per loop
1 loops, best of 3: 262 ms per loop

rmsd for single frame, pytraj, no parallel
1000 loops, best of 3: 1.59 ms per loop
rmsd for single frame, mdtraj, no parallel
10 loops, best of 3: 132 ms per loop
rmsd for single frame, mdtraj, with parallel
1 loops, best of 3: 306 ms per loop
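For reference, here is a minimal RMSD between two conformations in plain Python. Note that this sketch does no centering or superposition: `md.rmsd` superimposes the structures first, and I assume pytraj's `frame.rmsd` also fits by default, as cpptraj's `rms` does. The numbers are therefore only comparable for frames that are already aligned.

```python
import math

def rmsd_no_fit(a, b):
    """RMSD between two conformations given as lists of (x, y, z) tuples.
    No centering or superposition is performed."""
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(sq / len(a))

ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]   # ref shifted by 1 along z
print(rmsd_no_fit(ref, moved))  # 1.0
```

A fitting RMSD (e.g. Kabsch or QCP superposition) would report 0 for this pure translation, which is why the fit matters when comparing against `md.rmsd`.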


## calc_radgyr

In [8]:
%timeit fa.calc_radgyr()
%timeit md.compute_rg(m_traj)

# pytraj/cpptraj is slightly faster here (21.7 ms vs 25.6 ms)

10 loops, best of 3: 21.7 ms per loop
10 loops, best of 3: 25.6 ms per loop
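What is being timed here is, per frame, the square root of the weighted mean squared distance of the atoms from their weighted center. A plain-Python sketch for one frame follows; whether each package uses mass weights by default is not checked here, so the weights are an explicit parameter of this hypothetical `radgyr` helper.

```python
import math

def radgyr(coords, weights=None):
    """Radius of gyration of one frame. coords: list of (x, y, z);
    weights: e.g. atomic masses, or None for equal weights."""
    if weights is None:
        weights = [1.0] * len(coords)
    total = sum(weights)
    # weighted center (the center of mass if the weights are masses)
    center = [sum(w * c[k] for w, c in zip(weights, coords)) / total for k in range(3)]
    # weighted mean squared distance from that center
    msd = sum(w * sum((c[k] - center[k]) ** 2 for k in range(3))
              for w, c in zip(weights, coords)) / total
    return math.sqrt(msd)

print(radgyr([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 1.0
```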


## calc_dssp

In [9]:
## calc_dssp
import numpy as np
%timeit fa.calc_dssp(dtype='ndarray')

10 loops, best of 3: 21.3 ms per loop


In [10]:
%timeit md.compute_dssp(m_traj)

1 loops, best of 3: 1.65 s per loop


## calc_COM

In [11]:
%timeit fa.calc_COM()
%timeit md.compute_center_of_mass(m_traj)

# similar speed (21.2 ms vs 29.7 ms)

10 loops, best of 3: 21.2 ms per loop
10 loops, best of 3: 29.7 ms per loop
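The center of mass computed per frame here is the mass-weighted average position of the atoms. A small sketch for a single frame, using a toy water molecule (approximate masses and geometry, for illustration only); `center_of_mass` is a hypothetical helper, not either package's API.

```python
def center_of_mass(coords, masses):
    """Mass-weighted average position of one frame."""
    total = sum(masses)
    return tuple(sum(m * c[k] for m, c in zip(masses, coords)) / total
                 for k in range(3))

# toy water: O at the origin, two H placed symmetrically in the xy-plane
coords = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]
masses = [16.00, 1.008, 1.008]
com = center_of_mass(coords, masses)
print(com)
```

By symmetry the x component is 0 and the center sits slightly toward the hydrogens along y.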


## calc_distance

In [12]:
%timeit fa.calc_distance("@1 @300")

indices = np.array([[0, 299],])
%timeit md.compute_distances(m_traj, indices)

# mdtraj is much faster for this single calculation (~70 times faster here, openmp?).
# Not sure how it compares with a mask like :2-100@CB,CA ...

100 loops, best of 3: 24.9 ms per loop
1000 loops, best of 3: 359 µs per loop
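The two calls measure the same atom pair: cpptraj masks such as `@1 @300` number atoms from 1, while mdtraj's `indices` are 0-based, hence `[[0, 299]]`. A small sketch of that conversion plus the distance itself; `mask_pair_to_indices` is a hypothetical helper that only handles the simple `@<number> @<number>` form, not general cpptraj mask syntax such as `:2-100@CB,CA`.

```python
import math

def mask_pair_to_indices(mask):
    """Convert a simple cpptraj-style '@i @j' mask (1-based atom numbers)
    to a 0-based index pair as used by mdtraj.compute_distances."""
    first, second = mask.split()
    return int(first.lstrip("@")) - 1, int(second.lstrip("@")) - 1

def distance(coords, i, j):
    """Euclidean distance between atoms i and j (0-based) of one frame."""
    return math.dist(coords[i], coords[j])

print(mask_pair_to_indices("@1 @300"))  # (0, 299)
```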


## calc_psi

In [13]:
%timeit fa.calc_multidihedral("psi")
%timeit md.compute_psi(m_traj)

# mdtraj is slightly faster here (54.6 ms vs 64.7 ms). Not sure about mask selection

10 loops, best of 3: 64.7 ms per loop
10 loops, best of 3: 54.6 ms per loop


## calc_phi

In [14]:
%timeit fa.calc_multidihedral("phi")
%timeit md.compute_phi(m_traj)

# same trend as calc_psi

10 loops, best of 3: 65.6 ms per loop
10 loops, best of 3: 54.6 ms per loop
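Both `calc_multidihedral("phi")` and `md.compute_phi` boil down to the same per-frame operation: a torsion angle from four atom positions. Below is a self-contained sketch using the common atan2 formulation (not necessarily what either package does internally). It returns degrees, while `md.compute_phi`/`md.compute_psi` return radians, so convert with `math.radians` before comparing values.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees, in [-180, 180]) defined by four points."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    n = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)   # unit vector along the central bond
    # components of b0 and b2 perpendicular to the central bond
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0
```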


In [15]:
# search all dihedrals?

%timeit fa.calc_multidihedral() # search all in pytraj/cpptraj

%timeit md.compute_chi1(m_traj)
%timeit md.compute_chi2(m_traj)
%timeit md.compute_chi3(m_traj)
%timeit md.compute_chi4(m_traj) 
%timeit md.compute_phi(m_traj)
%timeit md.compute_psi(m_traj)

10 loops, best of 3: 184 ms per loop
1 loops, best of 3: 270 ms per loop
1 loops, best of 3: 328 ms per loop
1 loops, best of 3: 216 ms per loop
10 loops, best of 3: 108 ms per loop
10 loops, best of 3: 55.2 ms per loop
10 loops, best of 3: 55.5 ms per loop


## saving files

In [16]:
# netcdf

%timeit fa.save("fa.nc", overwrite=True)
%timeit m_traj.save("m_traj.nc")

10 loops, best of 3: 21.2 ms per loop
100 loops, best of 3: 14.9 ms per loop


In [17]:
# dcd

%timeit fa.save("fa.dcd", overwrite=True)
%timeit m_traj.save("m_traj.dcd")

100 loops, best of 3: 6.66 ms per loop
100 loops, best of 3: 9.23 ms per loop


In [18]:
# binpos

%timeit fa.save("fa.binpos", overwrite=True)
%timeit m_traj.save("m_traj.binpos")

100 loops, best of 3: 6.75 ms per loop
10 loops, best of 3: 34.5 ms per loop


In [19]:
# xtc (not sure whether cpptraj supports it)

%timeit fa.save("fa.xtc", overwrite=True)
%timeit m_traj.save("m_traj.xtc")

1 loops, best of 3: 396 ms per loop
10 loops, best of 3: 20.6 ms per loop


## `pytraj` vs `MDAnalysis`

In [2]:
# I need to rename my files because `MDAnalysis` needs the correct extension to read a file
# since `MDAnalysis` iterates frame by frame, I will use TrajReadOnly in `pytraj` for a fairer comparison, because
# `TrajReadOnly` does not load all frames into memory and the user needs to iterate to get the frames
import netCDF4 as netcdf
top_name = "../tests/data/nogit/remd/myparm.top"
filename = "../tests/data/nogit/remd/remd.000.ncdf"
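Instead of renaming the data files so that `MDAnalysis` (or `mdtraj`) can guess the format from the extension, a symbolic link with the expected extension leaves the originals untouched. A stdlib sketch follows; `link_with_ext` is a hypothetical helper, the demo uses a throwaway placeholder file, and `os.symlink` needs a filesystem that supports links (on Windows it may require extra privileges).

```python
import os
import tempfile

def link_with_ext(path, ext, outdir):
    """Create (or reuse) a symlink to `path` in `outdir` whose name ends
    with `ext`, and return the link path."""
    link = os.path.join(outdir, os.path.basename(path) + ext)
    if not os.path.lexists(link):
        os.symlink(os.path.abspath(path), link)
    return link

# demo with a throwaway file standing in for "remd.x.000"
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "remd.x.000")
with open(src, "w") as fh:
    fh.write("placeholder")
link = link_with_ext(src, ".ncdf", tmp)
print(os.path.basename(link))  # remd.x.000.ncdf
```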

In [3]:
# need to reimport modules so I don't have to run this notebook from the beginning (which is really slow)
from pytraj import io
from MDAnalysis import Universe

In [4]:
%timeit traj = io.load(filename, top_name)
%timeit u = Universe(top_name, filename) # note the opposite file order: topology first

# `pytraj` loads the file more than 3 times faster (208 ms vs 773 ms)

1 loops, best of 3: 208 ms per loop
1 loops, best of 3: 773 ms per loop


In [5]:
# load again, since variables assigned inside %timeit are not kept
traj = io.load(filename, top_name)
u = Universe(top_name, filename) # note the opposite file order: topology first

In [6]:
## iterating

%timeit for frame in traj: pass
%timeit for frame in u.trajectory: pass

# pytraj is about 3 times faster. If using FrameArray, iterating happens in <1 ms.

1 loops, best of 3: 985 ms per loop
1 loops, best of 3: 2.93 s per loop


In [8]:
## iterating over a strided slice of frames
%timeit for frame in traj(10, 999, 10): pass # about twice as fast
%timeit for frame in u.trajectory[10:999:10]: pass

10 loops, best of 3: 130 ms per loop
1 loops, best of 3: 308 ms per loop
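I assume `traj(10, 999, 10)` follows Python's slice convention (stop excluded), matching `u.trajectory[10:999:10]`; under that assumption the visited frame indices can be checked with `range`:

```python
# frame indices visited with start=10, stop=999 (excluded), stride=10
frames = range(10, 999, 10)
print(len(frames), frames[0], frames[-1])  # 99 10 990
```

So both loops touch 99 of the 1000 frames, about a tenth of the frames in the full iteration above.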


## Note: I still need to check how MDAnalysis does the calculations (rmsd, ...) to update this comparison