Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change frame numbering from 1-based to 0-based #310

Closed
dotsdl opened this issue Jun 15, 2015 · 34 comments
Closed

Change frame numbering from 1-based to 0-based #310

dotsdl opened this issue Jun 15, 2015 · 34 comments

Comments

@dotsdl
Copy link
Member

dotsdl commented Jun 15, 2015

Subtask for #252:

We want to make frame numbering start from 0 so that indexing with, say:

u = mda.Universe(GRO, XTC)
u.trajectory[0]

gives frame 0 instead of frame 1. This change will likely break lots of code that includes ts.frame + 1 and ts.frame - 1 and the like; will need to go through the code with a fine-toothed comb for these instances, using the tests as markers for trouble.

@dotsdl dotsdl self-assigned this Jun 15, 2015
@dotsdl dotsdl added this to the 0.11 milestone Jun 15, 2015
@dotsdl
Copy link
Member Author

dotsdl commented Jun 16, 2015

I've started this process on issue-310. The general strategy is:

  • Look for ts.frame references which initialize its value; check that on first read this results in the first frame having label 0.
  • Run tests. Identify areas hardcoded for 1-based indexing and fix; fix anything else needed to ensure 0-based indexing maintained.
  • Ensure docs feature proper .. versionchanged labels where changes occurred; check that doc references to 1-based indexing are updated/expunged.

@dotsdl
Copy link
Member Author

dotsdl commented Jun 16, 2015

I need help from someone familiar with the DCD pyrex code. I'm having trouble parsing it, so if someone wants to push a commit to this branch with the needed fixes to indexing there I'd be grateful.

@richardjgowers
Copy link
Member

some changes to Reader's iters just got pushed so make sure you're working with them, it should make your life easier too.

Something that might trip you up, I've seen a lot of Readers using the fact that frame is 1 based to know the index of the next frame ( in 0 based)... I think PDB was the culprit, but then this was used as a template for many others.

@dotsdl
Copy link
Member Author

dotsdl commented Jun 16, 2015

Something that might trip you up, I've seen a lot of Readers using the fact that frame is 1 based to know the index of the next frame ( in 0 based)... I think PDB was the culprit, but then this was used as a template for many others.

Thanks for the tip. I've also noticed that many of them initialize ts.frame to 0 with the anticipation that when the first frame gets read it gets incremented. For now I'm going to try and solve this with initializing ts.frame to -1 and see how far this gets things. A cleaner solution may be possible.

@orbeckst
Copy link
Member

@dotsdl , can you link to the cython/C DCD code that you need some insights on? The DCD reader itself uses the C code. The specialized DCD correl/timeseries uses a re-implementation of the DCD C code in cython... at least if memory serves (@nmichaud did all the hard work a long time ago...).

@richardjgowers
Copy link
Member

So if everything is passing, all that needs to be done is making sure that the tests for each Reader are actually checking that:

  • starting at frame = 0
  • iterate properly (add 1 each frame)
  • restart from 0 when the trajectory is reiterated over
  • getitem numbers frames correctly (trajectory[i])

The testing for coordinate Readers is pretty haphazard currently, there should really be some sort of overarching _TestReader which all Readers have to go through that includes this.. but that's probably another Issue to this.

@dotsdl
Copy link
Member Author

dotsdl commented Jun 18, 2015

@orbeckst I believe the relevant code is in MDAnalysis/package/src/dcd/dcd.c. Once I've finished checking that all the other Readers are behaving as expected (coordinate tests all pass, but need to see if they actually measure what's expected), I'll start poking at it myself. But if anyone else beats me to it, I'll be happy (ping @nmichaud ). It's probably one or two well-placed edits.

@richardjgowers
Copy link
Member

Hey @dotsdl

I pushed what is probably the the commit I'm least proud of ever, but it should make DCD work! The tests work before and after my hack (I checked it manually, gives 0 based now)

I'm getting a fail in test_analysis...
FAIL: Check for sustained resolution of Issue 188.
Not sure if you get that too?

If you want I can write the tests I proposed above

@dotsdl
Copy link
Member Author

dotsdl commented Jun 18, 2015

Running the tests now. I just pushed the changes needed to make the current set of tests meaningful for 0-based DCD frame reading. It looks like this hack works; I've got no better solutions, even after poking the pyrex code for a little while now.

@dotsdl
Copy link
Member Author

dotsdl commented Jun 18, 2015

I'm about to start checking each reader test to ensure they do those tests you propose. If you want to check, too, I appreciate the help. You're welcome to start with the DCD reader. :D

@orbeckst
Copy link
Member

FYI: DCD.c is pure C, no cython/pyrex. The Python objects are hand-coded.
Do we need to change the underlying C code? (If we can keep it then it would make it somewhat easier to merge in changes from the upstream VMD molfile plugin sources... but admittedly, that's not high priority.)

@dotsdl
Copy link
Member Author

dotsdl commented Jun 18, 2015

We don't need to change it all, at least not with the fix @richardjgowers introduced, where we just decrement the frame number each time a frame is read.

@orbeckst
Copy link
Member

Ok, good. Sorry, didn't have time to dig into the commit.

On 18 Jun, 2015, at 14:43, David Dotson wrote:

We don't need to change it all, at least not with the fix @richardjgowers introduced, where we just decrement the frame number each time a frame is read.

@dotsdl
Copy link
Member Author

dotsdl commented Jun 19, 2015

Okay, we now have 0-based frames for all readers except for the TRZ reader. Do TRZs store a frame number internally? I can't figure out where the tests are getting these frame numbers for that reader.

@richardjgowers
Copy link
Member

@dotsdl Yeah the frame is stored in the trajectory, fixed now.

@dotsdl
Copy link
Member Author

dotsdl commented Jun 20, 2015

@richardjgowers awesome. We're almost there! I just realized I mispoke: the mol2 reader doesn't appear to update frame information at all on iteration. I noticed this in 610e300 but failed to mention it in my last post. I'm pulling my hair out trying to figure out why this is the case. Ideas?

@richardjgowers
Copy link
Member

Hehe, this one was pretty funny. MOL2 was creating a new Timestep object in each read, so was returning a Timestep with all the correct info. The problem was the tests were checking against the original Timestep (which wasn't touched in the read). So you were holding a pointer to a Timestep that MOL2 wasn't using any more..

I've fixed this so that MOL2 reuses the same Timestep throughout (in line with other Readers) and tests now pass

@richardjgowers
Copy link
Member

It might be something worth writing a test for... something like

ts = u.trajectory.ts
ts2 = u.trajectory.next()
assert ts is ts2

@dotsdl
Copy link
Member Author

dotsdl commented Jun 20, 2015

Okay...there are just a few tests that are failing yet. One has to do with streamio on the XYZReader, which I don't know how to fix. There are also failures from test_reader_api.py that shouldn't be failing, but are. Strangely, for the _TestReader.test_context test, changing:

assert_equal(l, 0)

back to

assert_equal(l, 1)

results in the value of l changing! How???

@dotsdl
Copy link
Member Author

dotsdl commented Jun 20, 2015

In other words, for

assert_equal(l, 0)

we get

======================================================================
FAIL: test_context (MDAnalysisTests.test_reader_api.TestSingleFrameReader)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/alter/Library/mdanalysis/MDAnalysis/testsuite/MDAnalysisTests/test_reader_api.py", line 110, in test_context
    assert_equal(l, 0)
  File "/usr/lib/python2.7/site-packages/numpy/testing/utils.py", line 334, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: 1
 DESIRED: 0

----------------------------------------------------------------------

But with:

assert_equal(l, 1)

we get

======================================================================
FAIL: test_context (MDAnalysisTests.test_reader_api.TestMultiFrameReader)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/alter/Library/mdanalysis/MDAnalysis/testsuite/MDAnalysisTests/test_reader_api.py", line 110, in test_context
    assert_equal(l, 1)
  File "/usr/lib/python2.7/site-packages/numpy/testing/utils.py", line 334, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: 0
 DESIRED: 1

----------------------------------------------------------------------

What is going on here?!

@richardjgowers
Copy link
Member

For those tests, I wrote 2 "reader" classes at the top of the test, these will have to be changed to give 0 based frame numbers.

@richardjgowers
Copy link
Member

Ok I've updated the Reader API tests to now expect and use 0 based frames throughout

@dotsdl
Copy link
Member Author

dotsdl commented Jun 21, 2015

Awesome. The only test we need to fix now is the test_XYZReader in test_streamio.py. @orbeckst, I noticed a note in the XYZReader(line 270 in coordinates/XYZ.py that indicates self._read_next_timestep() upon init is rather fragile. I'll defer to you on what the best fix here looks like.

@richardjgowers
Copy link
Member

I think I've squashed it now. Reading numframes was messing with the file handle

@dotsdl
Copy link
Member Author

dotsdl commented Jun 21, 2015

Great! Running all tests now. If they all pass, mind if I squash, apply any missing :versionchanged: notes to the docs, and then submit the PR?

@richardjgowers
Copy link
Member

Yeah if you could that would be awesome. I'm keen to get this through so I can finish #239

@richardjgowers
Copy link
Member

Hmmm ok, so the fix I wrote works in 2.7, but not 2.6, because the bz2 doesn't support context management in 2.6? I'll rewrite the fix, but it shouldn't affect all the fun you're having writing docs

@dotsdl
Copy link
Member Author

dotsdl commented Jun 21, 2015

I'll rewrite the fix, but it shouldn't affect all the fun you're having writing docs

Heh. :D

dotsdl added a commit that referenced this issue Jun 21, 2015
*Special thanks to @richardjgowers for lots of work on troublesome
Readers.*

Went through each Reader class; changed frame to 0-based.

Fixed tests relating to 0 based frame numbering

Added 0 based frames to DCD

Modified tests for DCDs.

Added 0-based frame tests to DMS, GMS test classes.

Added new tests for MOL2 reader checking frame indexing.

Finished tests for XYZ reader frames indices.

Fixed TRZ Reader frame numbering. TRZReader now reports Time from
Timestep not based on dt * frame.

Fixed MOL2 reader 0-based numbering. Now reuses Timesteps (like other
Readers) Now has _reopen method which "rewinds" the frame
numbering

The helanal analysis could use a redesign, but in the meantime fixed to
ensure that it works as expected regarding trajectory times.

Fixed test_reader_api tests to work with 0-based frames.

Updated Reader API checks for 0 based operation

Fixed XYZReader not rewinding properly on checking numframes

Added final .. versionchanged directives to classes, module docs
dotsdl added a commit that referenced this issue Jun 21, 2015
*Special thanks to @richardjgowers for lots of work on troublesome
Readers.*

Went through each Reader class; changed frame to 0-based.

Fixed tests relating to 0 based frame numbering

Added 0 based frames to DCD

Modified tests for DCDs.

Added 0-based frame tests to DMS, GMS test classes.

Added new tests for MOL2 reader checking frame indexing.

Finished tests for XYZ reader frames indices.

Fixed TRZ Reader frame numbering. TRZReader now reports Time from
Timestep not based on dt * frame.

Fixed MOL2 reader 0-based numbering. Now reuses Timesteps (like other
Readers) Now has _reopen method which "rewinds" the frame
numbering

The helanal analysis could use a redesign, but in the meantime fixed to
ensure that it works as expected regarding trajectory times.

Fixed test_reader_api tests to work with 0-based frames.

Updated Reader API checks for 0 based operation

Fixed XYZReader not rewinding properly on checking numframes

Added final .. versionchanged directives to classes, module docs
@dotsdl
Copy link
Member Author

dotsdl commented Jun 21, 2015

So as not to disrupt your work, I've squashed the commits for this issue and pushed them to issue-310-fin. If you push your final bits to the original issue-310 branch, I'll happily merge and squash them into the new one.

dotsdl added a commit that referenced this issue Jun 21, 2015
*Special thanks to @richardjgowers for lots of work on troublesome
Readers.*

Went through each Reader class; changed frame to 0-based.

Fixed tests relating to 0 based frame numbering

Added 0 based frames to DCD

Modified tests for DCDs.

Added 0-based frame tests to DMS, GMS test classes.

Added new tests for MOL2 reader checking frame indexing.

Finished tests for XYZ reader frames indices.

Fixed TRZ Reader frame numbering. TRZReader now reports Time from
Timestep not based on dt * frame.

Fixed MOL2 reader 0-based numbering. Now reuses Timesteps (like other
Readers) Now has _reopen method which "rewinds" the frame
numbering

The helanal analysis could use a redesign, but in the meantime fixed to
ensure that it works as expected regarding trajectory times.

Fixed test_reader_api tests to work with 0-based frames.

Updated Reader API checks for 0 based operation

Fixed XYZReader not rewinding properly on checking numframes

Added final .. versionchanged directives to classes, module docs

Fixed 2.6 XYZReader
dotsdl added a commit that referenced this issue Jun 21, 2015
*Special thanks to @richardjgowers for lots of work on troublesome
Readers.*

Went through each Reader class; changed frame to 0-based.

Fixed tests relating to 0 based frame numbering

Added 0 based frames to DCD

Modified tests for DCDs.

Added 0-based frame tests to DMS, GMS test classes.

Added new tests for MOL2 reader checking frame indexing.

Finished tests for XYZ reader frames indices.

Fixed TRZ Reader frame numbering. TRZReader now reports Time from
Timestep not based on dt * frame.

Fixed MOL2 reader 0-based numbering. Now reuses Timesteps (like other
Readers) Now has _reopen method which "rewinds" the frame
numbering

The helanal analysis could use a redesign, but in the meantime fixed to
ensure that it works as expected regarding trajectory times.

Fixed test_reader_api tests to work with 0-based frames.

Updated Reader API checks for 0 based operation

Fixed XYZReader not rewinding properly on checking numframes

Added final .. versionchanged directives to classes, module docs

Fixed 2.6 XYZReader
richardjgowers added a commit that referenced this issue Jun 22, 2015
@orbeckst
Copy link
Member

Please check all the docs and update accordingly, e.g. http://www.mdanalysis.org/mdanalysis/package/doc/html/documentation_pages/coordinates/init.html#id13

@orbeckst orbeckst reopened this Jul 13, 2015
@dotsdl
Copy link
Member Author

dotsdl commented Jul 14, 2015

All right. I'm back in business. I'll take care of this.

@orbeckst
Copy link
Member

@dotsdl , is there anything else to be done here than the docs? If not then this should be an easy "close"...

@dotsdl
Copy link
Member Author

dotsdl commented Jul 20, 2015

Just the docs. Need to go through with a comb. Doing this today in addition to #309.

@dotsdl
Copy link
Member Author

dotsdl commented Jul 21, 2015

Closed in 5bdbb47

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants