## `pytraj` vs `mdtraj`

I want to measure how fast (or slow) `pytraj` is compared with `mdtraj`, so that I can optimize its speed further.
This comparison is not entirely fair because my knowledge of `mdtraj` is limited, and the two packages are designed differently (`cpptraj` uses double precision while `mdtraj` uses (mostly?) float32).
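
To make the precision difference concrete, here is a minimal sketch using only the standard library. It rounds a double precision value to float32 and reports the error; the coordinate value and the helper name `to_float32` are made up for illustration and do not come from either package.

```python
import struct

def to_float32(x):
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

coord = 123.456789            # hypothetical coordinate value
coord32 = to_float32(coord)   # same value after a float32 round trip
error = abs(coord32 - coord)

print("bytes per value: float32 = %d, float64 = %d"
      % (struct.calcsize("f"), struct.calcsize("d")))
print("float64: %.10f" % coord)
print("float32: %.10f" % coord32)
print("abs error: %.2e" % error)
```

float32 keeps roughly 7 significant decimal digits, so it halves the memory traffic at the cost of a small rounding error per coordinate.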

In [1]:
from pytraj.__version__ import __version__ as p_version
print (p_version)
from mdtraj.version import version as m_version
print (m_version)

0.1.2.dev3
1.4.0.dev0.dev-bb6923f


## loading data

In [2]:
# data is stored on local disk, generated from a TIP3P REMD simulation, netcdf format

from pytraj import io
import mdtraj as md

top_name = "../tests/data/nogit/remd/myparm.parm7"
filename = "../tests/data/nogit/remd/remd.x.000"

!du -sh "../tests/data/nogit/remd/remd.x.000"
print ()
!head "../tests/data/nogit/remd/myparm.parm7"

200M	../tests/data/nogit/remd/remd.x.000

%VERSION  VERSION_STAMP = V0001.000  DATE = 03/13/14  02:23:52                  
%FLAG TITLE                                                                     
%FORMAT(20a4)                                                                   
default_name                                                                    
%FLAG POINTERS                                                                  
%FORMAT(10I8)                                                                   
   17443      16   17165     287     603     386    1257    1129       0       0
   25521    5666     287     386    1129      61     139     179      32       1
       0       0       0       0       0       0       0       2      24       0
       0


In [3]:
# load the whole traj into memory as a FrameArray in pytraj
# I don't use %timeit here because it's terribly slow on my computer

%time io.load(filename, top_name)[:]
# using cpptraj's class (Trajin_Single)
%time io.load(filename, top_name) # the data is actually still on disk :D

%time md.load_netcdf(filename, top=top_name)

# pytraj FrameArray is slightly slower than mdtraj here (2.52 s vs 2.3 s wall time), but not that slow

Topology is None
CPU times: user 2 s, sys: 521 ms, total: 2.52 s
Wall time: 2.52 s
CPU times: user 189 ms, sys: 13 ms, total: 202 ms
Wall time: 202 ms
CPU times: user 1.8 s, sys: 299 ms, total: 2.1 s
Wall time: 2.3 s


<mdtraj.Trajectory with 1000 frames, 17443 atoms, 5666 residues, and unitcells at 0x2aaacf0627f0>
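
As a rough check on what "loading into memory" costs, the coordinate array size can be estimated from the frame and atom counts reported above. This is only back-of-the-envelope arithmetic; the helper name `coords_nbytes` is hypothetical, and box information and per-object overhead are ignored.

```python
def coords_nbytes(n_frames, n_atoms, bytes_per_value):
    """Bytes needed for an (n_frames, n_atoms, 3) coordinate array."""
    return n_frames * n_atoms * 3 * bytes_per_value

# counts taken from the mdtraj.Trajectory repr above
n_frames, n_atoms = 1000, 17443

size_f32 = coords_nbytes(n_frames, n_atoms, 4)  # float32: 4 bytes per value
size_f64 = coords_nbytes(n_frames, n_atoms, 8)  # float64: 8 bytes per value

for label, size in (("float32", size_f32), ("float64", size_f64)):
    print("%s: %.1f MiB" % (label, size / 2**20))
```

The float32 estimate comes out at about 200 MiB, which is in line with the 200M that `du` reports for the file, while a double precision copy needs twice that.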

In [4]:
fa = io.load(filename, top_name)[:10]
m_traj = md.load_netcdf(filename, top=top_name)[:10]

Topology is None
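
The timings in this notebook come from IPython's `%time` and `%timeit`. Outside IPython, a similar "best of 3" number can be obtained with the standard `timeit` module. The sketch below is one possible way to do it; `best_of` and `toy_workload` are illustrative names, and the workload is a stand-in rather than a real analysis call.

```python
import timeit

def best_of(func, repeat=3, number=10):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number

def toy_workload():
    # stand-in for an analysis call such as an RMSD calculation
    return sum(i * i for i in range(10000))

per_call = best_of(toy_workload)
print("%d loops, best of %d: %.3f ms per loop" % (10, 3, per_call * 1e3))
```

Fixing `number` to a small value keeps the total run time predictable, which helps when a single call is already slow.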


## calc_rmsd

In [5]:
f0 = fa[0]

print ("")
print ("rmsd from pytraj, no parallel")
%timeit [frame.rmsd(f0) for frame in fa]

print ("")
print ("rmsd from mdtraj, no parallel")
%timeit md.rmsd(m_traj, m_traj, 0, parallel=False)

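print ("")
# note: the label below says "pytraj", but this call times mdtraj's md.rmsd with parallel=True
print ("rmsd from pytraj, with parallel")
%timeit md.rmsd(m_traj, m_traj, 0, parallel=True)

# mdtraj has a very fast rmsd calculation (roughly 6.6 times faster without parallel, roughly 10 times with parallel). they use "openmp" (?) and float32 while we use float64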

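# let's try converting pytraj's double precision traj to float32
class FakeTraj:
    """Thin wrapper holding a copy of the topology and the coordinates in the requested precision."""
    def __init__(self, traj, astype='float64'):
        import numpy as np
        
        self.top = traj.top.copy()
        if astype == 'float64':
            # keep the original double precision coordinates
            self.xyz = traj.xyz
        elif astype == 'float32':
            # convert the coordinates to single precision
            self.xyz = traj.xyz.astype(np.float32)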
        
ftraj32 = FakeTraj(fa, astype='float32')
ftraj64 = FakeTraj(fa)

print ("")
print ("compare speed between float32 and float64 for FakeTraj, no parallel")
%timeit md.rmsd(ftraj32, ftraj32, 0, parallel=False)
%timeit md.rmsd(ftraj64, ftraj64, 0, parallel=False)

# using float32 makes the rmsd calculation about 2.8 times faster without parallel (1.47 ms vs 4.05 ms)

print ("")
print ("compare speed between float32 and float64 for FakeTraj, with parallel")
%timeit md.rmsd(ftraj32, ftraj32, 0)
%timeit md.rmsd(ftraj64, ftraj64, 0)

# conclusion: use float32 + openmp to speed up calculations in pytraj

# rmsd for single frame
f1 = fa[1]
m_traj1 = m_traj[0]

print ("")
print ("rmsd for single frame, pytraj, no parallel")
%timeit f1.rmsd(f0)
print ("rmsd for single frame, mdtraj, no parallel")
%timeit md.rmsd(m_traj1, m_traj1, 0, parallel=False)
print ("rmsd for single frame, mdtraj, with parallel")
%timeit md.rmsd(m_traj1, m_traj1, 0, parallel=True)


rmsd from pytraj, no parallel
100 loops, best of 3: 9.91 ms per loop

rmsd from mdtraj, no parallel
The slowest run took 14.28 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 1.51 ms per loop

rmsd from pytraj, with parallel
1000 loops, best of 3: 973 µs per loop

compare speed between float32 and float64 for FakeTraj, no parallel
1000 loops, best of 3: 1.47 ms per loop
100 loops, best of 3: 4.05 ms per loop

compare speed between float32 and float64 for FakeTraj, with parallel
1000 loops, best of 3: 897 µs per loop
100 loops, best of 3: 4 ms per loop

rmsd for single frame, pytraj, no parallel
1000 loops, best of 3: 996 µs per loop
rmsd for single frame, mdtraj, no parallel
1000 loops, best of 3: 267 µs per loop
rmsd for single frame, mdtraj, with parallel
1000 loops, best of 3: 435 µs per loop
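
The speedup factors quoted in the comments can be recomputed directly from the timings printed above. The snippet below is plain arithmetic on those numbers; the dictionary keys and the `speedup` helper are labels chosen for this sketch.

```python
# Timings copied from the %timeit output above, in milliseconds.
timings_ms = {
    "pytraj serial": 9.91,
    "mdtraj serial": 1.51,
    "mdtraj parallel": 0.973,  # printed under a "pytraj" label, but the call is md.rmsd
    "float32 serial": 1.47,
    "float64 serial": 4.05,
}

def speedup(slow, fast):
    """How many times faster `fast` is than `slow`."""
    return timings_ms[slow] / timings_ms[fast]

comparisons = [
    ("pytraj serial", "mdtraj serial"),
    ("pytraj serial", "mdtraj parallel"),
    ("float64 serial", "float32 serial"),
]
for slow, fast in comparisons:
    print("%s vs %s: %.1fx faster" % (fast, slow, speedup(slow, fast)))
```

With only 10 frames per run these ratios are noisy, so they are best read as orders of magnitude rather than precise factors.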


## calc_radgyr

In [6]:
%timeit fa.calc_radgyr()
%timeit md.compute_rg(m_traj)

# pytraj/cpptraj is slightly faster here (23.8 ms vs 26.9 ms)

10 loops, best of 3: 23.8 ms per loop
10 loops, best of 3: 26.9 ms per loop
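
For reference, the quantity being timed here can be written in a few lines of plain Python. This is a generic sketch of a (optionally mass-weighted) radius of gyration for one frame, not the code either package runs; whether each library weights by mass by default is not checked here, and the function name is illustrative.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Radius of gyration of one frame.

    coords: list of (x, y, z) tuples
    masses: list of weights; equal weights are used if None
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # weighted center of the coordinates
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total
              for k in range(3)]
    # weighted mean squared distance from that center
    sq = sum(m * sum((c[k] - center[k]) ** 2 for k in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / total)

# four points on the corners of a square centered at the origin
square = [(1, 1, 0), (1, -1, 0), (-1, 1, 0), (-1, -1, 0)]
rg = radius_of_gyration(square)
print("Rg = %.5f" % rg)
```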


## calc_dssp

In [7]:
## calc_dssp
import numpy as np
%timeit fa.calc_dssp(dtype='ndarray')

10 loops, best of 3: 24.1 ms per loop


In [8]:
%timeit md.compute_dssp(m_traj)

1 loops, best of 3: 1.66 s per loop


## calc_COM

In [9]:
%timeit fa.calc_COM()
%timeit md.compute_center_of_mass(m_traj)

# almost the same

10 loops, best of 3: 25.3 ms per loop
10 loops, best of 3: 30.4 ms per loop


## calc_distance

In [10]:
%timeit fa.calc_distance("@1 @300")

indices = np.array([[0, 299],])
%timeit md.compute_distances(m_traj, indices)

# mdtraj is much faster for a single distance (about 61 times faster). Not sure how it compares when a mask such as :2-100@CB,CA is used

10 loops, best of 3: 22.9 ms per loop
1000 loops, best of 3: 375 µs per loop
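
Note that the two calls above address the same atom pair in different conventions: the Amber mask `@1 @300` counts atoms from 1, while the index array `[[0, 299]]` counts from 0. A small sketch of the conversion and of the distance itself, using hypothetical helper names and made-up coordinates:

```python
import math

def amber_to_zero_based(atom_numbers):
    """Convert 1-based Amber atom numbers to 0-based array indices."""
    return [n - 1 for n in atom_numbers]

def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

pair = amber_to_zero_based([1, 300])
print(pair)  # [0, 299]

# made-up coordinates, only to exercise the distance function
print(distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))  # 5.0
```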


## calc_psi

In [11]:
%timeit fa.calc_multidihedral("psi")
%timeit md.compute_psi(m_traj)

# mdtraj is slightly faster here (59.7 ms vs 69.9 ms). Not sure about mask selection

10 loops, best of 3: 69.9 ms per loop
10 loops, best of 3: 59.7 ms per loop


## calc_phi

In [12]:
%timeit fa.calc_multidihedral("phi")
%timeit md.compute_phi(m_traj)

# same result as calc_psi

10 loops, best of 3: 69.9 ms per loop
10 loops, best of 3: 59.6 ms per loop


In [13]:
# search all dihedrals?

%timeit fa.calc_multidihedral() # search all in pytraj/cpptraj

%timeit md.compute_chi1(m_traj)
%timeit md.compute_chi2(m_traj)
%timeit md.compute_chi3(m_traj)
%timeit md.compute_chi4(m_traj) 
%timeit md.compute_phi(m_traj)
%timeit md.compute_psi(m_traj)

10 loops, best of 3: 195 ms per loop
1 loops, best of 3: 294 ms per loop
1 loops, best of 3: 350 ms per loop
1 loops, best of 3: 231 ms per loop
10 loops, best of 3: 117 ms per loop
10 loops, best of 3: 59.1 ms per loop
10 loops, best of 3: 58.8 ms per loop
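
All of the phi, psi and chi timings above boil down to the same geometric operation applied to different sets of four atoms. The following is a generic sketch of a dihedral angle from four points in plain Python. It is not taken from either library, it returns radians, and the sign convention may differ from what a given package reports.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in radians defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # unit vector along the central bond
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.atan2(y, x)

cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
right = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(math.degrees(cis), abs(math.degrees(trans)), abs(math.degrees(right)))
```

The three test geometries correspond to a cis, a trans and a perpendicular arrangement, which makes the function easy to sanity-check by hand.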


## saving files

In [14]:
# netcdf

%timeit fa.save("fa.nc", overwrite=True)
%timeit m_traj.save("m_traj.nc")

10 loops, best of 3: 20.6 ms per loop
The slowest run took 4.95 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 12.7 ms per loop


In [15]:
# dcd

%timeit fa.save("fa.dcd", overwrite=True)
%timeit m_traj.save("m_traj.dcd")

The slowest run took 4.02 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 6.71 ms per loop
The slowest run took 7.53 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 8.41 ms per loop


In [16]:
# binpos

%timeit fa.save("fa.binpos", overwrite=True)
%timeit m_traj.save("m_traj.binpos")

The slowest run took 5.82 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 6.67 ms per loop
10 loops, best of 3: 33.6 ms per loop


In [17]:
# xtc (not sure whether cpptraj supports this format)

%timeit fa.save("fa.xtc", overwrite=True)
%timeit m_traj.save("m_traj.xtc")

1 loops, best of 3: 402 ms per loop
10 loops, best of 3: 20.5 ms per loop
