Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timestep API interface discussion (was: should Timestep.time start with 0.0?) #252

Closed
dotsdl opened this issue Apr 14, 2015 · 14 comments
Closed

Comments

@dotsdl
Copy link
Member

dotsdl commented Apr 14, 2015

The coordinates.Reader.time property gives the frame number (1-based) multiplied by the timestep (dt) between frames of the trajectory (determined from the first two frames of the trajectory for xdrfiles). However, the result of this is that the time of the first frame is shifted by dt from what is commonly found in trajectory files, where the first frame (t = 0) is usually written.

In other words, where universe.trajectory.time would be more accurate to give 0.0, with a dt = 1000.0 ps it would instead give 1000.0 for the time of the first frame.

It might be a bit disruptive to existing code, but should this be changed? It becomes less relevant if time becomes a common attribute for all Readers (#250), but then again, it also offers a point for confusion.

@richardjgowers
Copy link
Member

I think this makes sense. The obvious exception should be that if a Timestep has its own record of time (eg the simulation didn't start at t=0) then this should override everything.

The only thing I can think of is that this is technically something that should "belong" to the Timestep, a Reader doesn't have an inherent time, but a Timestep does. So all the handling should be done there... so probably all Reader should do is access Timestep.time... ie as a Timestep, I try and return my record of time, but failing that I fudge one for the Reader.

So something like:

class Reader
    @property
    def time(self):
        return self.ts.time

class Timestep
    @property
    def time(self):
        try:
            return self.ts.time
        except AttributeError:
            return self.ts.frame * self.ts.dt

I'd consider the current implementation of Reader.time a bug worth changing.

@dotsdl
Copy link
Member Author

dotsdl commented Apr 14, 2015

So all the handling should be done there... so probably all Reader should do is access Timestep.time... ie as a Timestep, I try and return my record of time, but failing that I fudge one for the Reader.

I agree that Reader.time should just return whatever Reader.ts.time gives. Since ts.time should be present across all Timestep objects (#250), is there ever an instance where we would ever be compelled to fudge a time? The current implementation assumes a constant dt, which will be completely off if a frame is missing or if the timestep varies across the trajectory (perhaps rare, but still).

In short, are there trajectory formats that don't include time information? If so, how to handle them cleanly?

@richardjgowers
Copy link
Member

Yeah I think I was half asleep when I wrote that, if .time is missing then I'm not sure how .dt would be populated...

I think it'd just be single frame stuff that doesn't have time.

@mnmelo
Copy link
Member

mnmelo commented Apr 15, 2015

I'm sure we'll run into exceptional cases. For instance, one can load multiple single-frame .pdb's or .gro's using the Chain Reader.
What time should be then made up?

Also, with the Chain Reader, will we reset the time back to 0 when multiple trajectories are loaded and the iteration goes from one to the next?

@orbeckst
Copy link
Member

IIRC, the original idea for having Reader.time in the reader was to make clear that this is something that the reader guesses and not something read from the trajectory. However, this seems pretty confusing and we should rather strive for a consistent interface and simply doing the best we can under the hood.

There are a bunch of formats like XYZ or multi-PDB, which are common lowest common denominator — and in fact not even DCD stores the time in a frame. You have to compute it from the header pretty much as frame_number * dt.

@orbeckst orbeckst added the API label Apr 16, 2015
@orbeckst
Copy link
Member

It sounds like a good idea to me to make Timestep responsible for the time, similar to what it already does for dimensions.

We could get rid of Reader.time for the 1.0 release but in the meantime deprecate trajectory.time (while just returning whatever Timestep.time says).

@orbeckst orbeckst removed the question label Apr 21, 2015
@orbeckst
Copy link
Member

I propose to

  1. introduce universally Timestep.time (as part of the API):
    • If the trajectory records time, it will contain the current frame's time (in MDA units)
    • If the trajectory records time between frames dt and frame numbers N, it will calculate the time as N * dt.
      • Note: we need to decide if frame numbers should start at 0 or 1!
      • We could also introduce a new optional trajectory keyword time_offset (default 0) which could be used to start a trajectory at a later time, i.e. Timestep.time = N * dt + Reader.time_offset.
    • If the trajectory does not contain time information then log a warning and pretend that dt = 1 and use the N * dt calculated time.
  2. deprecate Reader.time
  3. let the ChainReader try to use the trajectory time or the calculated time but if this fails (e.g. time for step N+1 is less than time for step N) use time_offset with some heuristic guesses to appropriately glue trajectories together.

The original question ("should trajectory time start at 0?") really boils down to "should frame numbering be 0 or 1 based". Currently it is supposed to be 1-based but this always creates confusion with the trajectory indexing and slicing notation so I would be in favor of harmonizing frame numbers and slicing/indexing and making it 0-based. I am not really sure, though, if there are deeper/annoying consequences to that decision.

Comments?

Once we have a sensible plan, we should open a few separate issues on sub-tasks. For instance, something like time_offset could be added later and the ChainReader can be handled separately once all other parts are in place. Just changing frame numbers from 1 to 0-based deserves its own issue.

@richardjgowers richardjgowers added this to the 0.11 milestone Apr 21, 2015
@richardjgowers
Copy link
Member

I think all the above sounds good. Changing frame numbering to 0 based makes sense in that trajectory[0] gives frame 0.

So if we're moving time into Timestep, then ChainReader will likely need to pass some sort of time_offset to Timestep.. (another reason why having a better constructor would be useful #250)

class Timestep
    def __init__():
        self.time_offset = kwargs.pop('time_offset', 0)

    @property
    def time(self):
        return self.frame * self.dt + self.time_offset

Then it's confusing if you need a frame offset or a time offset (ie are all the dts constants across your trajectories?).. but that's a headache for another Issue.

I'm putting this into 0.11 (the big API break).

@dotsdl
Copy link
Member Author

dotsdl commented Apr 26, 2015

If the trajectory records time between frames dt and frame numbers N, it will calculate the time as N * dt.

Are there trajectory formats that do this? That is, that assume a constant dt between each frame stored?

@richardjgowers
Copy link
Member

Dcd I think
On 26 Apr 2015 08:16, "David Dotson" notifications@github.com wrote:

If the trajectory records time between frames dt and frame numbers N, it
will calculate the time as N * dt.

Are there trajectory formats that do this? That is, that assume a constant
dt between each frame stored?


Reply to this email directly or view it on GitHub
#252 (comment)
.

@orbeckst
Copy link
Member

Yes, DCD does not store time stamps and instead you need to calculate time = frame * dt where dt is stored in the DCD header.

As I haven't heard more on this, I have summarized the conclusions and we should go ahead with the following changes:

Frame numbers

Frame numbers will start at 0 (i.e. zero-based instead of the current 0.9.2 practice of being 1-based).

This will harmonize slice notation with the frame numbering.

We'll need to be sure to have the tests working properly and go through the code with a fine comb because there might be various places where 1 is substracted/added....

Handling of time in Timestep

  1. We will introduce a new optional trajectory keyword time_offset (default 0) to each Reader and the Timestep. I is used to start a trajectory at a later time. The Reader needs to pass the keyword's value to its Timestep, e.g.

    class Timestep
    def __init__():
        self.time_offset = kwargs.pop('time_offset', 0)
    
    @property
    def time(self):
        return self.frame * self.dt + self.time_offset

    Note that in the above example the time property calculates the time from the frame number and the dt (time per frame) but in other cases it might take the time directly from a time stamp in the trajectory. In any case, time_offset is always added.

  2. introduce a new optional trajectory keyword dt (default None) to Timestep so that the Timestep may compute a time as time = frame * dt (see next point).

  3. introduce universally Timestep.time (as part of the API):

    • If the trajectory records time, it will contain the current frame's time (in MDA units)
    • If the trajectory records time between frames dt and frame numbers N, it will calculate the time as N * dt.
    • If the trajectory does not contain time information then log a warning and pretend that dt = 1 and use the N * dt calculated time.
  4. deprecate Reader.time (for 1.0)

  5. update ChainReader

    • Let the ChainReader try to use the trajectory time or the calculated time.
    • But if this fails (e.g. time for step N+1 is less than time for step N) use time_offset with some heuristic guesses to appropriately glue trajectories together.

Further actions

Please open issue for relevant sub-tasks such as

  • implement time_offset
  • implement Timestep.time
  • change frame numbering from 1 to 0 based (test cases!!)
  • make the ChainReader work with the new system

Add relevant issues number to this comment or mention issue #252.
Close this issue once all sub-issues have been raised.

Updates to this comment

  • 2015-05-20: added keyword dt for Timestep (obviously needed)

@orbeckst orbeckst assigned orbeckst and dotsdl and unassigned orbeckst May 21, 2015
@orbeckst
Copy link
Member

@dotsdl – can I please leave it to you to follow-up on this issue (raise new ones and monitor implementation)? I suggest that you copy the API proposal from my last comment to the wiki, raise the sub-issues as needed and reference the new wiki page. Once the issues are raised, close this one.

Thanks,
Oliver

@dotsdl
Copy link
Member Author

dotsdl commented May 24, 2015

Yes. Setting aside tomorrow (Memorial Day) to hammer on this, and probably the related issue #250.

@orbeckst
Copy link
Member

I think I can close this issue because @dotsdl split it nicely into sub-issues #306, #307, #308, #309, and #310.

All these issues link back here so we can always come back to the discussion.

@orbeckst orbeckst changed the title Trajectory.time property: should it give 0.0 ps for first frame? Timestep API interface discussion (was: should Timestep.time start with 0.0?) Jul 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants