Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trajectory slicing should be completely Pythonic #918

Closed
richardjgowers opened this issue Jul 29, 2016 · 23 comments
Closed

Trajectory slicing should be completely Pythonic #918

richardjgowers opened this issue Jul 29, 2016 · 23 comments

Comments

@richardjgowers
Copy link
Member

Expected behaviour

Slicing a trajectory should follow all the rules of slicing a Python list

Actual behaviour

Some slices raise errors (with appropriate error messages)

Code to reproduce the behaviour

import MDAnalysis as mda

u = mda.Universe(top, trj)

In [22]: a = range(10)

In [23]: a[:1000]
Out[23]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [24]: u.trajectory[:1000]
Out[24]: <DCDReader /home/richard/.local/lib/python2.7/site-packages/MDAnalysisTests-0.14.1.dev0-py2.7.egg/MDAnalysisTests/data/adk_dims.dcd with 98 frames of 3341 atoms>

In [25]: u.trajectory[1000::-1]

IndexError: Frame start/stop outside of the range of the trajectory.

In [26]: a[1000::-1]
Out[26]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Currently version of MDAnalysis:

(run python -c "import MDAnalysis as mda; print(mda.__version__)")

@richardjgowers
Copy link
Member Author

So more a question, trajectory slicing should always just work like slicing a list right?

@richardjgowers richardjgowers mentioned this issue Jul 29, 2016
7 tasks
@orbeckst
Copy link
Member

The common trick is to actually slice a list or array like you did in your example and then fancy-index the trajectory... In principle we're doing this already in coordinates.ProtoReader._sliced_iter() but check_slice_indices() is using its own rules. It's a bit more costly to create and slice an array range(self.n_frames) but then you can be sure of the results.

@orbeckst
Copy link
Member

And yes, I am also of the opinion that trajectory slicing should just behave like array slicing and indexing.

@orbeckst
Copy link
Member

orbeckst commented Jul 30, 2016

How about

def get_slice_frame_indices(self, trjslice):
    frame_indices = np.arange(self.n_frames)[trjslice]
    if not frame_indices:
        raise IndexError("Slice {0} results in an empty trajectory.".format(trjslice))
    return frame_indices

def _sliced_iter(self, start, stop, step):
    for i in self.get_slice_frame_indices(slice(start, stop, step)):
         yield self._read_frame(i)

(without docs and defensive programming but you should get the idea...)

@jbarnoud
Copy link
Contributor

frame_indices = np.arange(self.n_frames)[trjslice]

That can be quite a huge array. It will be slow to build, and memory costly. Removing that kind of constructs in the auxiliary readers improved the perf by 2 orders of magnitude, I worry that having them here would hurt a lot.

@kain88-de
Copy link
Member

Yeah use a generator instead.

@orbeckst
Copy link
Member

Fair enough, although you only need to do it once and even 1M frames will only generate a 2MB array (I think our ints are 16bit).

Alternatively: find the NumPy code that does the slicing.

Oliver Beckstein
email: orbeckst@gmail.com

Am Jul 30, 2016 um 0:07 schrieb Jonathan Barnoud notifications@github.com:

frame_indices = np.arange(self.n_frames)[trjslice]

That can be quite a huge array. It will be slow to build, and memory costly. Removing that kind of constructs in the auxiliary readers improved the perf by 2 orders of magnitude, I worry that having them here would hurt a lot.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.

@kain88-de
Copy link
Member

kain88-de commented Jul 30, 2016

Why can't we use range as a generator to provide us the required integers? _get_slice_frame_indices needs some small defensive guards but otherwise it should be possible to do.

def _get_slice_frame_indices(self, trjslice):
    # have some guards here.
    return range(trjslice)

@orbeckst
Copy link
Member

Excellent idea – catch fancy indexing first and then hand over to six.range / xrange.

On 30 Jul, 2016, at 10:32, Max Linke notifications@github.com wrote:

Why can we use range as a generator to provide us the required integers? _get_slice_frame_indices needs some small defensive guards but otherwise it should be possible to do.

def _get_slice_frame_indices(self, trjslice
):

have some guards here.

return range(trjslice)

Oliver Beckstein * orbeckst@gmx.net
skype: orbeckst * orbeckst@gmail.com

@orbeckst
Copy link
Member

orbeckst commented Jul 31, 2016

Btw, to make the range(trjslice) approach work one needs to set

start = start if start is not None else trajectory.n_frames

(It does not handle stop=None.)

I think that this is all that is needed.

@orbeckst
Copy link
Member

orbeckst commented Sep 14, 2016

Stupid question, PR #916 by @jdetle says that it fixed this issue. Is this correct? Can we properly close this issue?

EDIT: Oops, sorry, can't read. The PR said "fixed #914". Apologies

@jdetle
Copy link
Contributor

jdetle commented Sep 14, 2016

Partially, negative stepping requires a new and improved DCD reader IIRC.

@orbeckst
Copy link
Member

orbeckst commented Sep 14, 2016

Yeah, that's right, so #922 needs to be referenced (implement backwards reading in DCDReader).

EDIT: I should think before I type.

Thus, there isn't really any big hurdle here. Just the usual write the code and test (and give proper error messages for any reader that cannot do random access yet) – once #314 is solved, all readers can just do this.

@orbeckst orbeckst changed the title Trajectory slicing Trajectory slicing should be completely Pythonic Sep 14, 2016
@orbeckst
Copy link
Member

The original report by @richardjgowers asks for trajectory slicing working like list slicing.

However, wouldn't it be even better if it worked like numpy array slicing, i.e., including fancy indexing and boolean indexing? The only difference to array slicing is that we wouldn't return a view or copy but still an iterator.

@richardjgowers
Copy link
Member Author

I think I've implemented boolean slicing...

On Wed, 14 Sep 2016, 6:02 p.m. Oliver Beckstein, notifications@github.com
wrote:

The original report by @richardjgowers https://github.com/richardjgowers
asks for trajectory slicing working like list slicing.

However, wouldn't it be even better if it worked like numpy array
slicing
, i.e., including fancy indexing and boolean indexing? The only
difference to array slicing is that we wouldn't return a view or copy but
still an iterator.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#918 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AI0jB3AHuo0M5vVtIditOxlyYiMhRyM-ks5qqCingaJpZM4JYkvG
.

@richardjgowers
Copy link
Member Author

@orbeckst yeah so rather than pythonic, we're aiming for numpythonic slicing (like we do with Groups too)

So thinking about this wrt #239 it would make sense if the slicing returned a generator with its own file handle, so

traj1 = u.trajectory[::2]
traj2 = u.trajectory[::4]

Returned generators that were independent of each other, I think currently they won't play nice with each other.

@orbeckst
Copy link
Member

Separate file handles would be great because then it would be easier to implement analysis that compares a trajectory with itself.

For parallel threaded analysis it would also be great (if on can get around the GIL).

Oliver Beckstein
email: orbeckst@gmail.com

Am Sep 15, 2016 um 2:04 schrieb Richard Gowers notifications@github.com:

@orbeckst yeah so rather than pythonic, we're aiming for numpythonic slicing (like we do with Groups too)

So thinking about this wrt #239 it would make sense if the slicing returned a generator with its own file handle, so

traj1 = u.trajectory[::2]
traj2 = u.trajectory[::4]
Returned generators that were independent of each other, I think currently they won't play nice with each other.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@shobhitagarwal1612
Copy link
Contributor

I would like to handle this error. I have understood what the bug is, but I am not sure where to start from. Please guide me.

@jbarnoud
Copy link
Contributor

Isn't this fixed already?

import MDAnalysis as mda
from MDAnalysisTests.datafiles import PSF, DCD

u = mda.Universe(PSF, DCD)
for ts in u.trajectory[8::-1]:
    print(ts.frame)

prints just as expected:

8
7
6
5
4
3
2
1
0

Or did I miss something?

@jbarnoud
Copy link
Contributor

It seems to be fixed by #1081. Sorry @shobhitagarwal1612, you will have to find an other issue to tackle 😃

@richardjgowers
Copy link
Member Author

Maybe we should check there's a test somewhere where we go through all sorts of slices and check a numpy array and a reader react the same way

@shobhitagarwal1612
Copy link
Contributor

shobhitagarwal1612 commented Jan 29, 2017

@jbarnoud
when i try running u.trajectory[99::-1]
it throws the below exception

Traceback (most recent call last):
File "", line 1, in
File "/home/shobhit/Desktop/PythonProjects/mdanalysis/package/MDAnalysis/coordinates/base.py", line 1221, in getitem
frame.start, frame.stop, frame.step)
File "/home/shobhit/Desktop/PythonProjects/mdanalysis/package/MDAnalysis/coordinates/base.py", line 1314, in check_slice_indices
"Frame start/stop outside of the range of the trajectory.")
IndexError: Frame start/stop outside of the range of the trajectory.

Shouldn't it return the complete list in reverse order according to the Python docs?

@jbarnoud
Copy link
Contributor

@shobhitagarwal1612 Indeed! This is what I missed! I though I could close an issue for free...

We have our culprit here: https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/base.py#L1312
I guess the function should return the right start and stop based on nframes rather than raising an issue. This will affect AnalysisBase, though; it is fine by me, but maybe other see an issue at the analysis not complaining when start and stop are out of bounds.

So fixing the issue is a matter of a very little change, but also some tests.

@jbarnoud jbarnoud reopened this Jan 29, 2017
shobhitagarwal1612 added a commit to shobhitagarwal1612/mdanalysis that referenced this issue Jan 29, 2017
shobhitagarwal1612 added a commit to shobhitagarwal1612/mdanalysis that referenced this issue Jan 30, 2017
shobhitagarwal1612 added a commit to shobhitagarwal1612/mdanalysis that referenced this issue Jan 30, 2017
jbarnoud pushed a commit that referenced this issue Feb 10, 2017
Makes trajectory slicing completely pythonic #918 

Original commit messages:

* Trajectory slicing made completely pythonic #918
* Adds test cases
Fixes some logical error
* Merged develop branch with forked develop branch
* Adds test cases
Fixes some logical error
* Merged develop branch with forked develop branch
* Trajectory slicing made completely pythonic #918
* Adds test cases
Fixes some logical error
* Merged develop branch with forked develop branch
* Adds some more test cases for pythonic slicing
* Fixed minor bugs
* Reduced the lines of code for the pythonic slicing logic
The tests which no longer raises an Error are removed
The values for the removed tests are moved into the test generator.
* Fixes import order
* Refractored analysis/base.py
Added warning docstring to check_slice_indices
* Modifies AUTHORS and CHANGELOG
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants