Add `.write` to trajectory reader #1037

kain88-de · 2016-10-19T10:09:48Z

Like the ag.write it would be convenient to have a trajectory.write method. This method would always write the complete trajectory with all atoms. To write only a subset the more general Writer still has to be used. This makes conversion of trajectory formats or concatenating trajectories very easy.

# convert
mda.Universe(PDB).trajectory.write('traj_in_dcd.dcd')
# concatenate
mda.Universe(PSF, (DCD1, DCD2, DCD3)).write('concate.dcd')

It is also a good fit to write a MemoryReader back to disk.

Proposed API

All trajectory readers get a new method Reader.write() that allows one to write the trajectory to disk (possibly in a different format):

u.trajectory.write("new.dcd")

The semantics should evoke the idea that a trajectory is considered the object to be written.

Optional slicing kwargs

kwargs to determine which frames to write

start
stop
step

To be used as

u.trajectory.write("new.dcd", start=5000, step=1000)

Optional `atoms` kwarg

What to do when a writer needs more information than available to the Reader #1037 (comment) ? Add an optional kwarg atoms to Reader.write(filename, atoms=<AtomGroup-like>) that can be used to extract all the necessary additional information.

The behavior should be:

If the writer has all information, write the trajectory; if atoms is provided, only write those atoms as a subselection. This will allow for a very concise way to strip trajectories of water: u.trajectory.write("protein_only.xtc", atoms=u.select_atoms("protein")).
If the writer needs additional information and...
- no atoms provided: raise a ValueError and tell the user to supply the atomgroup or Universe in atoms (e.g., u.trajectory.write("new.xtc", atoms=u))
- atoms is provided: write the trajectory for the given atoms.

Using the atoms argument to write subselections is in line with @kain88-de 's idea #1037 (comment) .

Progress meter

And it would be nice if we could also have an optional progress meter #928 (quiet=False, default quiet=True... or verbose=False once we solve #903).

Basic implementation

Implementation could look similar to the following (without all the detailed error checking and checking for atoms and no ProgressMeter):

class BaseReader(...):
    ...
    def write(self, filename, atoms=None, start=None, stop=None, step=None):
        assert self == atoms.universe.trajectory
        with mda.Writer(filename, n_atoms=atoms.n_atoms) as W:
            for ts in self[start:stop:step]:
                 W.write(atoms)

History

2016-12-08 @orbeckst added proposed API from Add .write to trajectory reader #1037 (comment) and added slicing kwargs

The text was updated successfully, but these errors were encountered:

richardjgowers · 2016-10-19T10:15:12Z

The headache I can see with this is for trajectory writers that require more than the Reader contains, (eg PDBWriter will want resnames, DCDReader doesn't have these). To make this work (with identical API) for all cases, Universe should be passed around as the 'thing' to be written. I'm fairly certain that Readers don't have a link to the Universe currently, so this will need to be added (in a way that doesn't cause circular reference problems with deleting).

kain88-de · 2016-10-19T11:10:41Z

Well we would also write a more generic write/copy function in the MDAnalysis namespace that wraps MDAnalysis.Writer

def write(ag, fname):
    with mda.Writer(fname, n_atoms=ag.n_atoms) as w:
        for ts in ag.universe.trajectory:
            w.write(ag)


# call it with
mda.write(mda.Universe(PSF, DCD), 'test.pdb))

That is easier to implement and would allow to write sub selections.

kain88-de · 2016-10-19T13:08:27Z

To be more specific and avoid confusions we could also name the generic function write_trajectory.

orbeckst · 2016-10-19T23:45:03Z

mda.write() seems like a cheap and rather useful function.

I still like the concise semantics of u.trajectory.write("new.xtc"). I haven't really thought it through but my feeling is we should avoid adding the Universe to the Reader. We could start by just raising an error if we have insufficient information and then tell the user to use MDAnalysis.write(universe, "new.xtc") instead. That would still be pretty clean.

kain88-de · 2016-10-20T10:07:32Z

We could start by just raising an error if we have insufficient information

The writers will already complain them self if a TimeStep object isn't enough with an exception. So that would just work.

richardjgowers · 2016-10-20T10:15:07Z

I think write_trajectory might be a better name, Writer.write currently just does a single frame, it would be bad if trajectory.write got put in a for ts in u.traj loop

orbeckst · 2016-10-20T16:04:45Z

The way that people will commonly access this method as is u.trajectory.write(). It really feels redundant to me to repeat "trajectory" in the method name again as u.trajectory.write_trajectory(). I think here we can take advantage of the OO design in that it also provides semantic clarity.

(And in any case, we are already overloading AtomGroup.write() to either write coordinates or selections.)

Oliver Beckstein
email: orbeckst@gmail.com

Am Oct 20, 2016 um 3:15 schrieb Richard Gowers notifications@github.com:

I think write_trajectory might be a better name, Writer.write currently just does a single frame, it would be bad if trajectory.write got put in a for ts in u.traj loop

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.

sseyler · 2016-10-20T16:32:16Z

I like u.trajectory.write("new.xtc") and agree with @orbeckst in that it's semantically clear and succinct. For a trajectory writing function in the MDA namespace, MDAnalysis.write_trajectory(universe, "new.xtc") makes sense to me.

orbeckst · 2016-10-26T00:37:26Z

Optional `atoms` kwarg

@richardjgowers asked what to do when a writer needs more information than available to the Reader #1037 (comment). I suggest to add an optional kwarg atoms to Reader.write(filename, atoms=<AtomGroup-like>) that can be used to extract all the necessary additional information.

The behavior should be:

If the writer has all information, write the trajectory ~~and ignore atoms~~; if atoms is provided, only write those atoms as a subselection. This will allow for a very concise way to strip trajectories of water: u.trajectory.write("protein_only.xtc", atoms=u.select_atoms("protein")).
If the writer needs additional information and...
- no atoms provided: raise a ValueError and tell the user to supply the atomgroup or Universe in atoms (e.g., u.trajectory.write("new.xtc", atoms=u))
- atoms is provided ~~(and atoms.positions == Reader.ts.positions)~~: write the trajectory for the given atoms.

Using the atoms argument to write subselections is in line with @kain88-de 's idea #1037 (comment) .

Progress meter

And it would be nice if we could also have an optional progress meter #928 (quiet=False, default quiet=True... or verbose=False once we solve #903).

EDIT: Changed behavior to use atoms for subselections if provided.

orbeckst · 2016-12-08T22:31:40Z

I updated the original comment with the specs. Hopefully that makes it easier to implement.

Differences to what @kain88-de originally proposed

allow subselections with atoms kwarg (we need it anyway in many cases)
added start, stop, step kwargs
only implement Reader.write() and not Universe.write() because the latter is ambiguous, especially in the context of serializing whole universes Persistent topologies in JSON #643.

orbeckst · 2016-12-08T22:49:21Z

I am not sure if the following is a good idea but I'll throw it out nevertheless:

u.trajectory[start:stop:step].write("new.dcd")

It would mean overloading the Generator object. Probably not a very pythonic thing to do.

mnmelo · 2016-12-09T11:49:54Z

Conceptually, the simplicity and readability of that slice construct seem quite pythonic to me. I really like it!
Then again, I'm not very shy when it comes to hacking the python object model...

This is a proof of concept that replaces the iterators returned by the proto reader __getitem__ by an iterable class. That class has a __len__ method to adress #1894 and can have a write method to adress #1037. In the current implementation, each type of input for the proto reader's __getitem__ returns a different class. This allow to test for the correct behavior only once. Ideally, the iterables should be imutable to avoid inconsistencies.

See #1037 Not having access to the topology, the method cannot write PDB files or similar formats.

Fixes #1037

kain88-de added enhancement Component-Readers Component-Writers labels Oct 19, 2016

orbeckst added the GSOC Starter label Oct 20, 2016

orbeckst added the API label Oct 26, 2016

orbeckst mentioned this issue Oct 26, 2016

[Feature Request] Write a trajectory from an AtomGroup and an Array #915

Open

kain88-de mentioned this issue Oct 28, 2016

Should we allow adding additional information to a MemoryReader #1057

Open

orbeckst mentioned this issue May 14, 2018

trajectory slice operator doesn't return generator with a len function. #1894

Closed

jbarnoud mentioned this issue Jun 12, 2018

Proof of concept trajectory class iterators #1934

Merged

4 tasks

jbarnoud added a commit that referenced this issue Jul 7, 2018

Partially functional write for frame iterable

1e8c11b

See #1037 Not having access to the topology, the method cannot write PDB files or similar formats.

jbarnoud added a commit that referenced this issue Jul 8, 2018

Allow ag.write to write trajectory and selected frames

0bb2bac

Fixes #1037

richardjgowers closed this as completed in 1ef8010 Jul 12, 2018

richardjgowers assigned jbarnoud Jul 12, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `.write` to trajectory reader #1037

Add `.write` to trajectory reader #1037

kain88-de commented Oct 19, 2016 •

edited by orbeckst

Loading

richardjgowers commented Oct 19, 2016

kain88-de commented Oct 19, 2016

kain88-de commented Oct 19, 2016

orbeckst commented Oct 19, 2016

kain88-de commented Oct 20, 2016

richardjgowers commented Oct 20, 2016

orbeckst commented Oct 20, 2016

sseyler commented Oct 20, 2016

orbeckst commented Oct 26, 2016 •

edited

Loading

orbeckst commented Dec 8, 2016

orbeckst commented Dec 8, 2016

mnmelo commented Dec 9, 2016

Add .write to trajectory reader #1037

Add .write to trajectory reader #1037

Comments

kain88-de commented Oct 19, 2016 • edited by orbeckst Loading

Proposed API

Optional slicing kwargs

Optional atoms kwarg

Progress meter

Basic implementation

History

richardjgowers commented Oct 19, 2016

kain88-de commented Oct 19, 2016

kain88-de commented Oct 19, 2016

orbeckst commented Oct 19, 2016

kain88-de commented Oct 20, 2016

richardjgowers commented Oct 20, 2016

orbeckst commented Oct 20, 2016

sseyler commented Oct 20, 2016

orbeckst commented Oct 26, 2016 • edited Loading

Optional atoms kwarg

Progress meter

orbeckst commented Dec 8, 2016

orbeckst commented Dec 8, 2016

mnmelo commented Dec 9, 2016

Add `.write` to trajectory reader #1037

Add `.write` to trajectory reader #1037

kain88-de commented Oct 19, 2016 •

edited by orbeckst

Loading

Optional `atoms` kwarg

orbeckst commented Oct 26, 2016 •

edited

Loading

Optional `atoms` kwarg