Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trajectory writer interface is inconsistent #206

Open
GoogleCodeExporter opened this issue Apr 4, 2015 · 12 comments
Open

trajectory writer interface is inconsistent #206

GoogleCodeExporter opened this issue Apr 4, 2015 · 12 comments

Comments

@GoogleCodeExporter
Copy link

The interface to our trajectory writers is inconsistent and should be fixed.

  • Not all of them accept the same options (or fail if one provides an unrecognized one), which is bad if one wants to write general code that can write out any format understood by MDAnalysis.
  • The docs for MDAnalysis.coordinates.core. writer() (the dispatch function that selects a writer based on file extension and user input) says, for instance, that "some writers will have special arguments that have to be looked up" but also that there a number of common arguments are common with specific universal meanings for all writers.
  • Specifically, "delta is length of time between two frames, in ps [1.0]". The DCDWriter, for instance, breaks this promise and instead expects delta to be the integrator timestep in AKMA units. The TRZWriter takes a clue from DCDWriter and also uses delta for integrator timestep. The Gromacs XTC/TRR writer on the other hand use delta as the time between saved frames in ps.

The Trajectory API is actually vague on what Writers have to accept. This should be tightened and synced with the docs to writer().

At a minimum

  • all writers need to treat start, stop, step, delta in the same manner
  • all writers should gracefully deal with unknown parameters, i.e. implement catch-all **kwargs

Opinions?

Original issue reported on code.google.com by orbeckst on 20 Jan 2015 at 7:02

@GoogleCodeExporter
Copy link
Author

Original comment by orbeckst on 20 Jan 2015 at 7:03

  • Added labels: Maintainability, Milestone-Release1.0, Usability

@GoogleCodeExporter
Copy link
Author

The testing for Writers is pretty haphazard right now, which is arguably how they've ended up diverging.

After the next release I can write some tests like I did for Timesteps where there's a mixin test for Writer and then each format.

Timestep tests:
https://code.google.com/p/mdanalysis/source/browse/testsuite/MDAnalysisTests/test_coordinates.py?name=develop#2778

Original comment by richardjgowers on 22 Jan 2015 at 4:00

@orbeckst orbeckst modified the milestone: 1.0 Apr 9, 2015
@orbeckst orbeckst added the API label Apr 13, 2015
@richardjgowers
Copy link
Member

So I've bumped into this a little when playing with the ncdf Writer (Issue #257). This was previously not deciding on whether to write velocities/forces on creation, but instead deciding when passed its first Timestep to write. So previously:

W = mda.Writer('out.ncdf', numatoms=100)
# at this point, W hasn't created the header of the file, and isn't commited to writing velocities

W.write(ts)
# once passed a Timestep, it looked if _velocities were present, then wrote the header with velocities and forces fields

And now:

W = mda.Writer('out.ncdf', numatoms=100, velocities=True, forces=True)

# W is now commited to writing velocities and forces

W.write(ts)  # this will now expect Velocities and Forces

Which I think gives much better control, eg you can choose to write a new trajectory without vels and forces if you want. Treatment of velocities and forces is something that needs defining in the API however.

@orbeckst
Copy link
Member

orbeckst commented May 2, 2015

  • For formats that either contain velocities forces or not I agree with @richardjgowers 's approach to decide on what to do when creating the Writer instance.
  • For formats like TRR where each Timestep can decide on its own what it wants to contain, we cannot do that. It has to depend on what is passed to the Timestep – or @mnmelo might correct me here?

Perhaps we can combine this by allowing

  • positions=True|False|None
  • velocities=True|False|None
  • forces=True|False|None

with the following meanings:

  • True: "you must provide this for each Timestep or I raise a ValueError"
  • False: "you must not provide this for each Timestep or I raise a ValueError"
  • None: "you may provide this for each Timestep and I will save it"

Writers that do not allow the per-Timestep decision should raise an exception (ValueError) on init if they get fed None instead of either True or False. The defaults would be set to something sensible...

Would this make sense?

@orbeckst orbeckst self-assigned this May 2, 2015
@mnmelo
Copy link
Member

mnmelo commented May 2, 2015

Yes, you're right @orbeckst, TRR writers can assume nothing from the start (see #162). I didn't realize earlier I this discussion this could also be a problem.

The following is a common setup for GROMACS run outputs:
Positions every x frames;
Velocities every v frames;
Forces every f frames;
Where typically v and f are the same but larger than x (and a multiple of it).

In this case there'll be frames only with positions, and others with all three properties. The TRR format only writes what's there, i.e., if there's no velocities/forces only positions get written to file; no space is reserved for the missing properties.

A somewhat more uncommon, but equally possible and valid, situation is to have all three x v f steps different and not multiple of each other. In this case there might be frames with any combination of positions velocities and forces.

If the user wants to replicate such a behavior either:
The writer is initialized with the step number for each property;
Or it must be flexible enough to let arbitrary properties be set on the fly.

The second option is also the most faithful to the format, which doesn't even require any sort of regular spacing, and even allows for frames without any positions/velocities/forces.

@richardjgowers
Copy link
Member

So this has cropped up with #350 and #308 with delta and dt being used interchangeably in different modules.

The official API (which is based on DCD) defines delta as the MD integrator step (~1fs) and dt as the time step between saved frames (~1ps).
DCD & TRZ Writer used "dt" like this

TRR/XTC/NCDF used "delta" for time between frames.

I'm changing the "delta" writers to use "dt" instead.

richardjgowers added a commit that referenced this issue Jul 19, 2015
fixed, skip_timestep, periodic, skip and delta no longer required for
Readers (Issue #350)

All writers now refer to time between steps as "dt" (was previously
delta in XTC, TRR and NCDF Writers) (Issue #206)

Timesteps now have a default dt of 1.0 ps (and issue warning when using
this)

Timestep attribute time now tries to return data['time'] first, then dt
* frame, using the default value of 1.0 ps if necessary. (Issue #308)
@orbeckst
Copy link
Member

I think somewhere in these long discussions we decided to use "dt" for the time between frames.

delta can then be one the data variables because it's not provided by all readers.

On 19 Jul, 2015, at 06:46, Richard Gowers wrote:

So this has cropped up with #350 and #308 with delta and dt being used interchangeably in different modules.

The official API (which is based on DCD) defines delta as the MD integrator step (~1fs) and dt as the time step between saved frames (~1ps).

DCD & TRZ Writer used "dt" like this

TRR/XTC/NCDF used "delta" for time between frames.

I'm changing the "delta" writers to use "dt" instead.

Oliver Beckstein * orbeckst@gmx.net
skype: orbeckst * orbeckst@gmail.com

@richardjgowers
Copy link
Member

Yeah the problem is the Writers wanted differently named kwargs for this

@richardjgowers
Copy link
Member

Ok, I'm raising this issue from the dead, we're (@micaela-matta) trying to write a TXYZ Writer... So currently Writer.write() expects either a Timestep, AtomGroup or Universe. For ascii formats especially, sticking to a Timestep is pretty lame for data, and also there's some weird hacks for Writers which need this.

Is it OK for Writers to just throw a failure if they get a Timestep and really can't make a sensible file from one? (ie a pdb file from just a timestep will be ugly)

@kain88-de
Copy link
Member

Can't we just deprecate the Timestep? I find it cleaner to give a atomgroup or universe anyway.

@richardjgowers
Copy link
Member

@kain88-de I'd prefer that yeah. Timestep is in AtomGroup, but not vice versa

@orbeckst
Copy link
Member

ok

@lilyminium lilyminium mentioned this issue Jun 5, 2020
6 tasks
@richardjgowers richardjgowers modified the milestones: 1.0, 2.0 Jun 6, 2020
@richardjgowers richardjgowers removed this from the 2.0 milestone May 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants