Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add .write to trajectory reader #1037

Closed
kain88-de opened this issue Oct 19, 2016 · 12 comments
Closed

Add .write to trajectory reader #1037

kain88-de opened this issue Oct 19, 2016 · 12 comments

Comments

@kain88-de
Copy link
Member

kain88-de commented Oct 19, 2016

Like the ag.write it would be convenient to have a trajectory.write method. This method would always write the complete trajectory with all atoms. To write only a subset the more general Writer still has to be used. This makes conversion of trajectory formats or concatenating trajectories very easy.

# convert
mda.Universe(PDB).trajectory.write('traj_in_dcd.dcd')
# concatenate
mda.Universe(PSF, (DCD1, DCD2, DCD3)).write('concate.dcd')

It is also a good fit to write a MemoryReader back to disk.

Proposed API

All trajectory readers get a new method Reader.write() that allows one to write the trajectory to disk (possibly in a different format):

u.trajectory.write("new.dcd")

The semantics should evoke the idea that a trajectory is considered the object to be written.

Optional slicing kwargs

kwargs to determine which frames to write

  • start
  • stop
  • step

To be used as

u.trajectory.write("new.dcd", start=5000, step=1000)

Optional atoms kwarg

What to do when a writer needs more information than available to the Reader #1037 (comment) ? Add an optional kwarg atoms to Reader.write(filename, atoms=<AtomGroup-like>) that can be used to extract all the necessary additional information.

The behavior should be:

  1. If the writer has all information, write the trajectory; if atoms is provided, only write those atoms as a subselection. This will allow for a very concise way to strip trajectories of water: u.trajectory.write("protein_only.xtc", atoms=u.select_atoms("protein")).
  2. If the writer needs additional information and...
    • no atoms provided: raise a ValueError and tell the user to supply the atomgroup or Universe in atoms (e.g., u.trajectory.write("new.xtc", atoms=u))
    • atoms is provided: write the trajectory for the given atoms.

Using the atoms argument to write subselections is in line with @kain88-de 's idea #1037 (comment) .

Progress meter

And it would be nice if we could also have an optional progress meter #928 (quiet=False, default quiet=True... or verbose=False once we solve #903).

Basic implementation

Implementation could look similar to the following (without all the detailed error checking and checking for atoms and no ProgressMeter):

class BaseReader(...):
    ...
    def write(self, filename, atoms=None, start=None, stop=None, step=None):
        assert self == atoms.universe.trajectory
        with mda.Writer(filename, n_atoms=atoms.n_atoms) as W:
            for ts in self[start:stop:step]:
                 W.write(atoms)

History

@richardjgowers
Copy link
Member

The headache I can see with this is for trajectory writers that require more than the Reader contains, (eg PDBWriter will want resnames, DCDReader doesn't have these). To make this work (with identical API) for all cases, Universe should be passed around as the 'thing' to be written. I'm fairly certain that Readers don't have a link to the Universe currently, so this will need to be added (in a way that doesn't cause circular reference problems with deleting).

@kain88-de
Copy link
Member Author

Well we would also write a more generic write/copy function in the MDAnalysis namespace that wraps MDAnalysis.Writer

def write(ag, fname):
    with mda.Writer(fname, n_atoms=ag.n_atoms) as w:
        for ts in ag.universe.trajectory:
            w.write(ag)


# call it with
mda.write(mda.Universe(PSF, DCD), 'test.pdb))

That is easier to implement and would allow to write sub selections.

@kain88-de
Copy link
Member Author

To be more specific and avoid confusions we could also name the generic function write_trajectory.

@orbeckst
Copy link
Member

mda.write() seems like a cheap and rather useful function.

I still like the concise semantics of u.trajectory.write("new.xtc"). I haven't really thought it through but my feeling is we should avoid adding the Universe to the Reader. We could start by just raising an error if we have insufficient information and then tell the user to use MDAnalysis.write(universe, "new.xtc") instead. That would still be pretty clean.

@kain88-de
Copy link
Member Author

We could start by just raising an error if we have insufficient information

The writers will already complain them self if a TimeStep object isn't enough with an exception. So that would just work.

@richardjgowers
Copy link
Member

I think write_trajectory might be a better name, Writer.write currently just does a single frame, it would be bad if trajectory.write got put in a for ts in u.traj loop

@orbeckst
Copy link
Member

The way that people will commonly access this method as is u.trajectory.write(). It really feels redundant to me to repeat "trajectory" in the method name again as u.trajectory.write_trajectory(). I think here we can take advantage of the OO design in that it also provides semantic clarity.

(And in any case, we are already overloading AtomGroup.write() to either write coordinates or selections.)

Oliver Beckstein
email: orbeckst@gmail.com

Am Oct 20, 2016 um 3:15 schrieb Richard Gowers notifications@github.com:

I think write_trajectory might be a better name, Writer.write currently just does a single frame, it would be bad if trajectory.write got put in a for ts in u.traj loop


You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.

@sseyler
Copy link
Contributor

sseyler commented Oct 20, 2016

I like u.trajectory.write("new.xtc") and agree with @orbeckst in that it's semantically clear and succinct. For a trajectory writing function in the MDA namespace, MDAnalysis.write_trajectory(universe, "new.xtc") makes sense to me.

@orbeckst
Copy link
Member

orbeckst commented Oct 26, 2016

Optional atoms kwarg

@richardjgowers asked what to do when a writer needs more information than available to the Reader #1037 (comment). I suggest to add an optional kwarg atoms to Reader.write(filename, atoms=<AtomGroup-like>) that can be used to extract all the necessary additional information.

The behavior should be:

  1. If the writer has all information, write the trajectory and ignore atoms; if atoms is provided, only write those atoms as a subselection. This will allow for a very concise way to strip trajectories of water: u.trajectory.write("protein_only.xtc", atoms=u.select_atoms("protein")).
  2. If the writer needs additional information and...
    • no atoms provided: raise a ValueError and tell the user to supply the atomgroup or Universe in atoms (e.g., u.trajectory.write("new.xtc", atoms=u))
    • atoms is provided (and atoms.positions == Reader.ts.positions): write the trajectory for the given atoms.

Using the atoms argument to write subselections is in line with @kain88-de 's idea #1037 (comment) .

Progress meter

And it would be nice if we could also have an optional progress meter #928 (quiet=False, default quiet=True... or verbose=False once we solve #903).

EDIT: Changed behavior to use atoms for subselections if provided.

@orbeckst
Copy link
Member

orbeckst commented Dec 8, 2016

I updated the original comment with the specs. Hopefully that makes it easier to implement.

Differences to what @kain88-de originally proposed

  • allow subselections with atoms kwarg (we need it anyway in many cases)
  • added start, stop, step kwargs
  • only implement Reader.write() and not Universe.write() because the latter is ambiguous, especially in the context of serializing whole universes Persistent topologies in JSON #643.

@orbeckst
Copy link
Member

orbeckst commented Dec 8, 2016

I am not sure if the following is a good idea but I'll throw it out nevertheless:

u.trajectory[start:stop:step].write("new.dcd")

It would mean overloading the Generator object. Probably not a very pythonic thing to do.

@mnmelo
Copy link
Member

mnmelo commented Dec 9, 2016

Conceptually, the simplicity and readability of that slice construct seem quite pythonic to me. I really like it!
Then again, I'm not very shy when it comes to hacking the python object model...

jbarnoud added a commit that referenced this issue Jun 12, 2018
This is a proof of concept that replaces the iterators returned by the
proto reader __getitem__ by an iterable class. That class has a __len__
method to adress #1894 and can have a write method to adress #1037.

In the current implementation, each type of input for the proto reader's
__getitem__ returns a different class. This allow to test for the
correct behavior only once. Ideally, the iterables should be imutable to
avoid inconsistencies.
jbarnoud added a commit that referenced this issue Jul 3, 2018
This is a proof of concept that replaces the iterators returned by the
proto reader __getitem__ by an iterable class. That class has a __len__
method to adress #1894 and can have a write method to adress #1037.

In the current implementation, each type of input for the proto reader's
__getitem__ returns a different class. This allow to test for the
correct behavior only once. Ideally, the iterables should be imutable to
avoid inconsistencies.
jbarnoud added a commit that referenced this issue Jul 7, 2018
See #1037

Not having access to the topology, the method cannot write PDB files or
similar formats.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants