Why FrameArray and TrajReadOnly? #262
Comments
Hi, this is for historical reasons. :D I started by wrapping cpptraj's Trajin_Single first (TrajReadOnly). But later I needed another class to hold a chunk of data in memory, so I created FrameArray (similar to FrameArray in cpptraj, but pytraj has more control over it).
The only reason I want to expose TrajReadOnly in
While playing with TrajReadOnly, I wanted to slice it, like traj[0:20:2], to get a chunk of frames. There
from pytraj import io
import mdtraj as md
# load the whole traj into memory as a FrameArray in pytraj
# I don't use %timeit because it's terribly slow on my computer
%time io.load(filename, top_name)[:] # FrameArray
# using cpptraj's class (Trajin_Single)
%time io.load(filename, top_name) # TrajReadOnly
%time md.load_netcdf(filename, top=top_name) # mdtraj's Trajectory
# pytraj fa: -1 (but not that slow)
CPU times: user 2.09 s, sys: 449 ms, total: 2.54 s # FrameArray
Wall time: 2.54 s
CPU times: user 191 ms, sys: 14 ms, total: 205 ms # TrajReadOnly
Wall time: 205 ms
CPU times: user 1.71 s, sys: 330 ms, total: 2.04 s # mdtraj
Wall time: 2.24 s
You can see that FrameArray and mdtraj have similar loading times (1.7-2.1 s of user CPU time), which is much slower
For FrameArray, I also like what it offers:
iterating over frames is really fast http://nbviewer.ipython.org/github/pytraj/pytraj/blob/master/note-books/speed_test_2_trajs.ipynb
iterating and slicing are fast because no data is copied (not like
Why not an object like
For example, I am experimenting
So in summary,
But you're right that a simple API is much more robust than trying to make things very complicated. I will think more about the design. ( and actually thanks. Hai |
btw, I created http://nbviewer.ipython.org/github/pytraj/pytraj/blob/master/note-books/index.ipynb PS: (for example. even with their official website, the demo plot looks terrible Hai |
Ah, I really don't like the names FrameArray and TrajReadOnly. :D ( |
oh, two more points about
you simply just call: To the best of my limited knowledge about frame iterator? it's simple: The main idea is to use both
|
OK, so correct me if I'm wrong, but it seems to me that the advantage of
Do you get a
>>> import numpy as np
>>> arr = np.arange(10)
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> arr[2:8] = 0
>>> arr
array([0, 1, 0, 0, 0, 0, 0, 0, 8, 9])
Since
Anyway, I digress. Let me suggest something for pytraj here:
This is more consistent with Python's data model, IMO (an iterator -- like a generator -- cannot be assigned to because there's nothing to assign to, since it's not in memory). Consider:
>>> iterable = [x for x in range(10)]
>>> iterable[8] = 0
>>> iterable
[0, 1, 2, 3, 4, 5, 6, 7, 0, 9]
>>> iterable = (x for x in range(10))
>>> iterable[8] = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object does not support item assignment
The way you've described it to me, this is the difference between
It's easy to get started. Publication-quality plots are more challenging, but you can uncover the layers of complexity in matplotlib as you need them, and through it all their object model is consistent (Artists, Patches, Line2Ds, Axes, etc.). A As for the layout -- you are looking for the |
A little clarification --
Consider:
Python 3.3.5 (default, Aug 24 2014, 10:02:17)
[GCC 4.7.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> range(10)
range(0, 10)
>>> range(10)[5]
5 |
Compare that to the much more limited pure generator:
Python 3.3.5 (default, Aug 24 2014, 10:02:17)
[GCC 4.7.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> (x for x in range(10))
<generator object <genexpr> at 0x7f0e9bdeacd0>
>>> (x for x in range(10))[5]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable
So it's a subscriptable generator -- I would still stress this as an iterator. Calling it |
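To make the "subscriptable iterator" idea concrete, here is a minimal pure-Python sketch. The class `LazyFrames` and its `_read_frame` helper are hypothetical names invented for this illustration; they are not part of pytraj or cpptraj, and the "frame" is just a placeholder tuple standing in for data read from disk.

```python
class LazyFrames:
    """Hypothetical stand-in for an on-disk trajectory (not a pytraj class)."""

    def __init__(self, n_frames):
        self.n_frames = n_frames

    def _read_frame(self, index):
        # Placeholder for a disk read: a "frame" here is just a small tuple.
        return (index, index * 0.5)

    def __len__(self):
        return self.n_frames

    def __iter__(self):
        # Yield one frame at a time, so only one frame is alive per step.
        for index in range(self.n_frames):
            yield self._read_frame(index)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing materializes only the requested frames into a list.
            return [self._read_frame(i) for i in range(*key.indices(self.n_frames))]
        if key < 0:
            key += self.n_frames
        if not 0 <= key < self.n_frames:
            raise IndexError("frame index out of range")
        return self._read_frame(key)


frames = LazyFrames(10)
first_chunk = frames[0:6:2]  # loads frames 0, 2 and 4 only
```

Because the class defines no `__setitem__`, `frames[0] = ...` raises a `TypeError`, much like the generator example above: indexing and slicing work, but there is nothing in memory to assign to.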
@swails: you know that I am really bad at naming stuff. For example, I used to have a method called
Hai |
The name should describe what it's good for. I like |
Yes. (Python 3 uses lots of iterators for memory reasons :D, so I follow this.)
Yes. It's a you can look here and find that
This only happens if you don't break the "contiguous" memory layout. Please check this |
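For reference, the NumPy behavior this part of the thread seems to be about can be checked directly. This is a sketch using plain NumPy arrays only, with a made-up array shape; it says nothing about how pytraj lays out its own memory.

```python
import numpy as np

xyz = np.zeros((10, 4, 3))  # assumed shape: 10 frames, 4 atoms, 3 coordinates

strided = xyz[0:10:2]       # basic slice: a view onto the same buffer
picked = xyz[[1, 5, 8]]     # index list ("fancy" indexing): a new array

strided[0, 0] = [1.0, 2.0, 3.0]  # writes through to xyz[0, 0]
picked[0, 0] = [9.0, 9.0, 9.0]   # only changes the copy

print(np.shares_memory(xyz, strided))  # True
print(np.shares_memory(xyz, picked))   # False
print(strided.flags["C_CONTIGUOUS"])   # False: the view skips every other frame
```

A strided slice stays a view even though it is no longer contiguous, while selecting arbitrary frames with an index list forces NumPy to copy.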
I am experiencing
This is very promising. However, I still need to figure out what I really need. And honestly I don't know what the advantage is of using numpy's
To update
fa[0].xyz[:] = numpy_xyz_array # fa is a `FrameArray`
Another point, since
import mdtraj as md
md.rmsd(fa, fa, 0) # fa is a `FrameArray` object
I would love to have this too, but it's impossible with the current implementation of cpptraj. As I said before, the memory layout is different. |
great. noted.
Yes, noted.
Noted. But what about we can index overall, I think this is a great suggestion (as Hai |
I see. Great.
I made
import chemistry as chem
parm = chem.load_file(parm_name)
while in |
I just realized that slicing
https://github.com/hainm/pytraj/blob/master/pytraj/FrameArray.pyx#L428
(but of course, it's impossible to slice and strip atoms at the same time to get a
fa['@CA'] # returns a copy since we slice and strip all but @CA atoms |
I can wait to change the names, so just push few hundreds of files to pytraj/pytraj :D Hai |
Oops, I just checked and it turns out that slicing a FrameArray (fa[:3]) returns a copy. :D For a single Frame, fa[0] will return a |
I disagree. A large number of analyses require RMSD alignment, centering and imaging, or just stripping out the solvent altogether. If I recall details of previous git log messages (they've become too numerous for me to continue following them), this can't be done with a
FWIW, git is unique in that it separates "content" (i.e., all of the lines of code, git calls them 'blobs') from the files that they're in ("tree" information). So renaming, moving, copying, etc. different files will not actually weigh down a git repository, since a lot of the content does not change.
I'm not sure how pytraj is using C++ behind-the-scenes, but if you pass a As far as stripping, even |
Hi, I should have been clearer (blame my English). I meant
Yes, I agree with this. But
I think you're thinking about
for frame in TrajectoryIterator():
    frame.xyz[:] = some_thing_cool # note: `xyz` here is a numpy array view of the frame coords
This approach is the same as
traj.autoimage()
traj.strip_atoms('!CA') # in-place strip of atoms; this is different from traj['@CA']
traj.fit_to(ref)
and any
In [57]: traj = io.load_sample_data()[:]
In [58]: traj[0, 0]
Out[58]: array([ 3.32577000e+00, 1.54790900e+00, -1.60000000e-06])
In [59]: for frame in traj: frame[0] = [2., 3., 4.]
In [60]: traj[0, 0] # traj[0, 0] was updated when we updated `frame`
Out[60]: array([ 2., 3., 4.]) |
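The same in-place pattern can be reproduced with a plain NumPy array, which may help show why the `[:]` in `frame.xyz[:] = ...` matters. This is only an analogy under the assumption that each frame behaves like a NumPy view; `traj_xyz` is a made-up array, not a pytraj object.

```python
import numpy as np

# Assumed toy data: 2 frames, 3 atoms, 3 coordinates.
traj_xyz = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)

for frame_xyz in traj_xyz:          # each frame_xyz is a view of one frame
    frame_xyz[0] = [2.0, 3.0, 4.0]  # in-place write: updates traj_xyz

for frame_xyz in traj_xyz:
    frame_xyz = frame_xyz + 1.0     # rebinds the loop name to a new array;
                                    # traj_xyz is left untouched

print(traj_xyz[0, 0])  # atom 0 of frame 0 now holds 2, 3, 4
```

Assigning into the view changes the parent array, while rebinding the loop variable creates a new array and leaves the trajectory data alone.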
No, |
Yes. in But I might be wrong since my understanding about C++ is limited too. |
@swails |
On Wed, May 6, 2015 at 9:04 PM, Hai Nguyen notifications@github.com wrote:
Jason is right about the vector passing in C++. This creates a copy:
MyFunction(std::vector myvec) {}
While this will just pass the reference with no copy:
MyFunction(std::vector const& myvec) {}
The segfault seems odd to me. I'm not 100% clear on how pytraj is sitting
Daniel R. Roe, PhD |
It's exactly the same concept -- you are passing by value rather than reference (since you declared the So in response: yes, that makes a copy. |
ah, I see. I will try to see if I can use Hai |
after trial and error + google + stackoverflow, It's likely that If I understand correctly, if using this is my idea about slicing Trajectory to give a
I guess I need to create Hai |
ah, I make new update: Hai |
Finally I can get a
In [2]: from pytraj import io
In [3]: from pytraj.testing import aa_eq
In [4]: # load sample to `Trajectory` (tz2.ortho.*)
In [5]: traj = io.load_sample_data("tz2")[:]
In [6]: mylist = [1, 5, 8]
In [7]: # slicing to get 3 frames. `aa_eq` = `assert_almost_equal` to 7 decimal places
In [8]: fa0 = traj[mylist]
In [9]: fa1 = traj[mylist]
In [10]: aa_eq(fa0.xyz, fa1.xyz)
In [11]: aa_eq(fa0.xyz, traj[mylist].xyz)
In [12]: # update fa0 and make sure fa1, traj are updated too
In [13]: fa0[0, 0] = [100., 101., 102.]
In [14]: assert fa1[0, 0, 0] == fa0[0, 0, 0] == 100.
In [15]: assert traj[1, 0, 0] == 100.
In [16]: # result
In [17]: print (fa0[0 , 0], fa1[0 , 0], traj[1 , 0])
[ 100. 101. 102.] [ 100. 101. 102.] [ 100. 101. 102.] |
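One way to get this kind of sharing with an arbitrary index list is to keep one array per frame and hand out references to the same frame objects. The sketch below assumes that design; `FrameList` is a hypothetical class written for this illustration and is not how pytraj's `Trajectory` is actually implemented.

```python
import numpy as np


class FrameList:
    """Hypothetical container holding one coordinate array per frame."""

    def __init__(self, frames):
        self.frames = list(frames)

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, key):
        if isinstance(key, (list, tuple)):
            # New container, same frame objects: no coordinates are copied.
            return FrameList(self.frames[i] for i in key)
        if isinstance(key, slice):
            return FrameList(self.frames[key])
        return self.frames[key]


# Assumed toy data: 10 frames of 4 atoms each.
traj = FrameList(np.zeros((4, 3)) for _ in range(10))
mylist = [1, 5, 8]
fa0 = traj[mylist]
fa1 = traj[mylist]

fa0[0][0] = [100.0, 101.0, 102.0]  # update atom 0 of the first selected frame
print(fa1[0][0], traj[1][0])       # both see the update
```

The trade-off of this layout is that the frames are no longer one contiguous block, so a single `xyz` array covering the whole trajectory would have to be assembled by copying.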
Just tested slicing speed and
In [8]: fa
Out[8]:
<Trajectory with 1000 frames, 17443 atoms/frame>
In [9]: m_traj
Out[9]: <mdtraj.Trajectory with 1000 frames, 17443 atoms, 5666 residues, and unitcells
at 0x2aaab94eb320>
In [10]: s = slice(0, 1000, 5)
In [11]: %timeit fa[s]
10 loops, best of 3: 44.1 ms per loop
In [12]: %timeit m_traj[s]
10 loops, best of 3: 179 ms per loop |
Cython is so cool. I just implemented very fast slicing by adding more types for
In [5]: fa = io.load("remd.000.nc", top="myparm.parm7")
In [6]: fa
Out[6]:
<pytraj.Trajectory with 1000 frames: <Topology with 5634 mols, 5666 residues, 17443 atoms, 17452 bonds, PBC with box type = truncoct>>
In [7]: s = slice(0, 1000, 5)
In [8]: %timeit fa._fast_slice (s)
The slowest run took 4.24 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 27.4 µs per loop
@swails I highly recommend using Cython if you ever want to speed up ParmEd. It's good for Hai |
I am closing this since the names were changed. |
I think this is somewhat related to one of your comments in #260, but I wonder what the rationale is to have FrameArray and TrajReadOnly as two separate objects storing the same information.
This adds complexity to the underlying API (that does not exist in mdtraj), and this complexity is not alleviated by having convenience methods to translate between them, IMO. For a library application like this one (or mdtraj), I've found that those in which I can easily memorize the entire object model and data flow design are the easiest (and most enjoyable) to use. This includes even very complex packages, like pandas and matplotlib (they try hard to keep their object models clean and simple). I may have to look up what methods to use to do a certain task, but knowing the object model and data flow makes it easy to know where to look.
Unless I'm missing something, there's nothing that TrajReadOnly can do (functionally) that FrameArray cannot, correct? The only difference is that the former is immutable. So what's the point of TrajReadOnly? Why not always use FrameArray and just get rid of the other one? It's easy to make FrameArray immutable... just don't change it. I don't see why this has to be enforced by the library, and is rather against the design principles of Python itself (i.e., "We're all consenting adults").