Trajectory reader Timestep objects give inconsistent interfaces #250
Comments
I like the idea of setting this in stone too. The docs for trajectory readers set out a strict format, so should Timestep. Stuff related to this already: |
I agree, this needs to be sorted out, especially with a view to API-freeze in 1.0. Technically, the Trajectory API already contains an API for Timestep but it's simply not good ;-p. Can you guys (@dotsdl , @richardjgowers ) start drafting the API, starting with a discussion here? Comments from everyone welcome, of course! |
More than happy to hammer this out. Will work on a draft API given what we currently see across our existing readers. |
So if we're changing the Timestep API, I guess I can list all the things about it that annoy me: The init constructor is confusing, you can init one with either
It would make more sense to choose a single one of these methods (integer probably), and add the others as classmethod constructors, i.e. new_ts = Timestep.from_timestep(old_ts). Should we add the ability to have velocities & forces to base.Timestep? This would make it a lot more reusable in other classes. It would be nice if subclasses of Timestep could use most of the base constructor and look like:
class OtherTimestep(Timestep):
    def __init__(self, arg, **kwargs):
        # super() takes the class first, then the instance
        super(OtherTimestep, self).__init__(arg, **kwargs)
        self.my_specific_thing = 1.0
        self.my_specific_thing2 = 2.0 |
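The proposal above (one integer constructor plus classmethod alternatives) could be sketched roughly as follows. This is only an illustration and not the code on any branch: the attribute names n_atoms, frame and positions, and the use of plain lists in place of coordinate arrays, are assumptions made to keep the example dependency-free.

```python
class Timestep(object):
    """Hypothetical base class with a single, integer-only constructor."""

    def __init__(self, n_atoms, **kwargs):
        self.n_atoms = n_atoms
        self.frame = 0
        # Plain nested lists stand in for coordinate arrays in this sketch.
        self.positions = [[0.0, 0.0, 0.0] for _ in range(n_atoms)]

    @classmethod
    def from_timestep(cls, other, **kwargs):
        """Alternate constructor: copy another Timestep, possibly of another type."""
        ts = cls(other.n_atoms, **kwargs)
        ts.frame = other.frame
        ts.positions = [list(xyz) for xyz in other.positions]  # copy, don't share
        return ts


class OtherTimestep(Timestep):
    """Format-specific subclass that reuses the base constructor."""

    def __init__(self, n_atoms, **kwargs):
        super(OtherTimestep, self).__init__(n_atoms, **kwargs)
        self.my_specific_thing = 1.0


old_ts = Timestep(2)
old_ts.positions[0] = [1.0, 2.0, 3.0]
new_ts = OtherTimestep.from_timestep(old_ts)
```

Because from_timestep is a classmethod, calling it on a subclass returns an instance of that subclass, so each format gets the alternate constructor without reimplementing it.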
I second Richard's call for less overloading of the constructor and for a broader Base Timestep. Actually, should we let Base have all possible attributes (not just velocities and forces)? That would definitely help abstract the Reader. |
I think there's lots of stuff that won't be common to all Timesteps, eg TRZTimestep has .temperature, but it's confusing/misleading if this is always left as 0.0 for every other format. Ideally the constructor for base.Timestep should read exactly like the API specifications. |
+1 for @richardjgowers 's proposal (in the spirit of PEP 20 "There should be one-- and preferably only one --obvious way to do it."). I don't like the assignment to an underscored variable as the official way to get coordinates into a Timestep:
ts = Timestep(len(coords))
ts._positions = coords
I would rather have an explicit setter or make positions a first class citizen (probably through a managed attribute) so that we can do
ts = Timestep(len(coords))
ts.positions = coords
ts.velocities = v  # some magic has to happen to allocate the arrays...
ts.forces = f      # ... here too
or use optional kwargs:
ts = Timestep(len(coords), positions=coords, velocities=v, forces=f)
I also wouldn't clutter The API should really contain a sensible minimal subset of all possible features and user code will have to deal gracefully with information that is not available; that's much safer and cleaner than us trying to guess defaults for everything (see PEP 20 "In the face of ambiguity, refuse the temptation to guess."). We should, however, be consistent in how this missing data is communicated. |
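One way the "managed attribute" idea above might look is sketched below. It is an assumption-laden illustration, not the project's implementation: the helper _managed, the choice to raise NoDataError for missing data, and the use of nested lists instead of arrays are all hypothetical choices made for this example.

```python
class NoDataError(Exception):
    """Raised when a Timestep lacks the requested data (name taken from this thread)."""


def _managed(name):
    """Build a property that stores its value on the private attribute '_<name>'."""
    private = "_" + name

    def getter(self):
        value = getattr(self, private)
        if value is None:
            raise NoDataError("This Timestep has no " + name)
        return value

    def setter(self, new):
        # Allocation happens here: copy the input into a fresh n_atoms x 3 list.
        rows = [[float(x) for x in xyz] for xyz in new]
        if len(rows) != self.n_atoms or any(len(r) != 3 for r in rows):
            raise ValueError("expected %d rows of 3 values" % self.n_atoms)
        setattr(self, private, rows)

    return property(getter, setter)


class Timestep(object):
    positions = _managed("positions")
    velocities = _managed("velocities")
    forces = _managed("forces")

    def __init__(self, n_atoms, positions=None, velocities=None, forces=None):
        self.n_atoms = n_atoms
        self._positions = self._velocities = self._forces = None
        # Optional kwargs go through the same setters as later assignment.
        if positions is not None:
            self.positions = positions
        if velocities is not None:
            self.velocities = velocities
        if forces is not None:
            self.forces = forces


coords = [[0, 0, 0], [1, 1, 1]]
ts = Timestep(len(coords))
ts.positions = coords
ts2 = Timestep(len(coords), positions=coords, velocities=coords)
```

With this layout both spellings from the comment work, and reading an attribute that was never set fails loudly instead of returning a guessed default.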
Also solves adding velocities/forces to Timesteps without them (Issue #213)
Ok I've pushed a branch which has a look at what the new Timestep constructor could look like. Feedback very welcome! Highlights are:
Hopefully this will make subclasses of base.Timestep a lot smaller, so enforcing the API should be easier. @orbeckst I played with the idea of having a ._data attribute which was a dict, then overrode __getattr__/__setattr__ to use this. It could work but I think it got confusing very fast. The class structure of a Python object is already a dict, so if we put a dict holding all the observables inside the Timestep class, then it's confusing because we've got a nested dict. I.e. do I access Timestep.data['observable'] or Timestep.observable? This does mean that "users" of Timesteps will have to do some error checking on what is passed to them. E.g.:
def write_something(ts):
    try:
        pressure = ts.pressure
        temperature = ts.temperature
    except AttributeError:
        raise NoDataError("The Timestep you gave me lacked info") |
On 22 Apr, 2015, at 03:45, Richard Gowers wrote:
My thinking went along the lines that Timestep.data would be an empty dict-like object in base.Timestep and the API will define that anything in data is optional. Anything in Timestep itself is mandatory. This data object should be something that allows attrib or keyword access interchangeably:
or Maybe it gets too confusing but it would make the API definition straightforward. |
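A data object that allows attribute or keyword access interchangeably, as described in the comment above, could be sketched like this. The class name AttrDict and the example keys are hypothetical; the only point illustrated is that mandatory items live on Timestep itself while optional ones live in data.

```python
class AttrDict(dict):
    """Hypothetical dict subclass whose keys are also readable as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


class Timestep(object):
    def __init__(self, n_atoms):
        # Mandatory API: plain attributes on the Timestep itself.
        self.n_atoms = n_atoms
        self.frame = 0
        # Optional, format-specific extras: anything stored in data.
        self.data = AttrDict()


ts = Timestep(10)
ts.data['temperature'] = 300.0   # keyword-style access
ts.data.pressure = 1.0           # attribute-style access
```

Under this split, user code can test for optional data with `'temperature' in ts.data` rather than guessing whether a default value is meaningful.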
Ahh, yeah it does make the namespace a bit cleaner. I've put in a .data dict that stores anything non essential. I've then also made __getattr__ look in the data dict if the attribute isn't found in the class namespace. So the namespace is kind of like Timestep
So Timestep.temperature will work because it looks in Timestep.data['temperature'] automatically. So still todo:
@property
def positions(self):
    # Only hand out positions that belong to the current frame
    if self.frame == self._pos_source:
        return self._positions
    else:
        raise NoDataError

@positions.setter
def positions(self, new):
    self._positions[:] = new
    self._pos_source = self.frame |
Here is the API spec I propose we adopt, given the discussion here and a review of the existing docs. I agree that the guiding principle should be keeping it as minimal as possible.
This puts even relatively common elements such as One (minor) exception is that some trajectories may not store actual time information, in which case I think it appropriate to yield Also, I think these basic elements ( |
I think the problem with putting vels & forces into data is that they then can't (easily) be managed properties. eg what I've suggested for them f8e2099#diff-99815fce73651366394eb2dd8fe2f872R227 With .time, it might be better to raise NoDataError, but I'm not 100% on that either. It would be really annoying to do something like [ts.time for ts in u.trajectory] and get [None, None, None, ....] Otherwise I agree with all the above, if you want you can work off the branch I started which has the new constructors? |
I'm not sold on it, either. I think |
Ah...I see your point. However, elements in |
I'd say that would work but would be overcomplicated, but I'll let someone else weigh in. I just noticed you've put _unitcell in data too, all my above complaints apply to that as well, (it's very common and needs special handling) |
There are a number of common and high-profile data structures in Updated
Removed obsolete Timestep class.
Changed PDBReader to subclass SingleFrameReader.
Cleaned up multiframe Readers.
Moved Reader API tests into own file.
(defines size of Timestep)
base.Timestep can now have velocities and forces through keywords 'velocities' and 'forces' [False by default].
Added alternate constructors to Timestep:
from_timestep(other_timestep) - Allows construction of a Timestep from another Timestep of different type
from_coordinates(pos, velocities=None, forces=None) - Allows construction from set of existing coordinates
All other Timesteps updated to reflect new capabilities of base.Timestep.
Still TODO: Docs! Travis check. Sanity check!
@dotsdl and @richardjgowers : is one of you working on this one? I updated the API specs in my comment above. "All" (cough, cough) that needs to be done is to implement it... Or should this wait until @dotsdl 's Timestep hacking is done? |
I'm a bit bottlenecked at the moment (preparing some last minute things for MDSynthesis for SciPyConf). I'm happy to do it, but it won't be this week I'm afraid. |
I don't mind taking this on |
Cheers! |
Readers now always return ts.frame = 0 on first frame. Fixed INPCRD and DLPoly Config giving -1 as first frame (should be 0).
In general, moved to relative import style across coordinates modules.
Timesteps now read their native frame into "_frame". This replaces what was "step" in some Readers.
Timestep.data dictionary added for formats with nonstandard data for their Timestep.
Writing a Timestep with XTC/TRR Writer no longer modifies the Timestep passed.
All timesteps now use has_positions, has_velocities and has_forces flags. TRR uses these flags to manage when data is missing
I'm going to roll #213 into this too |
Good idea. |
@richardjgowers close the issue once you have added the docs. Many thanks!!! |
The interfaces for Timestep objects for the various trajectory readers are inconsistent, making it difficult to use them in a format-agnostic way. As a specific example, the coordinates.xdrfile.core.Timestep object used for XTC and TRR files features a time attribute giving the time of the current frame, while the coordinates.DCD.Timestep object used for DCDs doesn't have one at all.
If some trajectory formats are more information rich than others, then it makes sense for accessors to vary across them. However, we should probably hammer out what accessors should be present for all formats, and clearly indicate which are not universal.