MDAnalysis CG / MARTINI awareness #507

tylerjereddy · 2015-10-23T11:22:23Z

Just thought it would be a good idea to 'bookmark' this idea from one of the hackathon teams I was coaching a bit at the recent CECAM 2015 workshop in Germany. I hope they will follow through with a pull request at some point.

In short, there are almost certainly better ways for MDAnalysis to deal with coarse-grained (CG) simulation data (i.e., MARTINI). Of course, MDAnalysis atoms objects aren't really atoms in a CG context, but rather represent the basic fundamental units in the coordinate file (i.e., a single set of Cartesian coordinates isn't necessarily an actual atom, but can also be a representative particle). While I've gotten rather used to doing this 'mental translation' of nomenclature, and there are other things like dihedral angles that also mean something completely different if you are dealing with a CG context, there are at least some system properties that could probably be handled in a clearly 'better' way without too much effort. This is probably most clearly demonstrated with a simple example of the two main properties focused on by the team -- charge and mass:

import MDAnalysis
u = MDAnalysis.Universe('popc-1500-CG-phosphates.gro') #only contains CG PO4 particles
sel = u.select_atoms('bynum 1:10') #select first 10 phosphate particles in system 
sel.atoms.names
>>> array(['PO4', 'PO4', 'PO4', 'PO4', 'PO4', 'PO4', 'PO4', 'PO4', 'PO4', 'PO4'], 
      dtype='|S3')
sel.atoms.masses
>>> array([ 30.974,  30.974,  30.974,  30.974,  30.974,  30.974,  30.974,
        30.974,  30.974,  30.974])
sel.atoms.charges
>>> array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

MDAnalysis is inferring the masses based on the element corresponding to the first character in the atom name string--which is phosphorus here. Most CG MARTINI particles have masses of 72.0 (4 * water, based on the built-in general 4:1 mapping used in the forcefield), although some (i.e., ring particles in some amino acids) have smaller masses. The charges are not being handled at all really (should all be -1 here).

While I do agree it would be a good idea to return the correct values, I do have to concede that in this case I've never actually wanted to probe the masses or charges of CG particles, and the masses in particular don't strike me as having any immediately obvious analytical value.

The team was also considering automatic parsing of the forcefield files to reduce manual intervention (and maintenance, moving forward), and automatically inferring the type of coarse grained forcefield (as well as allowing a keyword argument specification to Universe). Those are probably quite ambitious ideas! I think @orbeckst also mentioned that some of the underlying infrastructure may be subject to reorganization at some point soon.

The text was updated successfully, but these errors were encountered:

mnmelo · 2015-10-23T11:28:31Z

I agree that having a cleverer MDAnalysis in this respect might be helpful to the uninitiated, but it seems these cases can be solved by passing in the .tpr instead of the .gro for the topology. If you're coming from GROMACS, you'll have faced the same issues there.

But I don't mean to say we should stick to the same limitations as GROMACS. If we can find a way to have a (tunable) smart behavior, then I'm all for it. ('Tunable' because we need to be careful about the assumptions we make of a system from atom/residue names alone...)

richardjgowers · 2015-10-23T15:25:04Z

I think all the guessers for mass/type are currently hard coded to expect proteins. It would be cool to make this more flexible so you could load a universe and say u = mda.Universe(..., type='martini'). This would then choose a different set of lookup tables in the back.

With Martini having a finite well defined set of particles, it should be a good place to start.

mnmelo · 2015-10-23T16:35:57Z

This sounds like a good way to me.

tylerjereddy · 2015-10-23T16:54:51Z

Yeah, that's what they had started doing.

orbeckst · 2015-10-24T01:03:50Z

#315 and #104 are related to a more flexible auto-detection of atom types.

Generally, it would be nice if you could easily say "select protein" or "select lipid" or "select nucleic" without having to care about your forcefield.

(Note that there are already some corner cases that are arguably bugs: if you select a protein with an amino acid such as glutamate or leucine bound as a substrate, the protein selection will add the substrate to the "protein"... perhaps we should do a fragment check for "protein" and introduce another selection such as "aminoacid" for the old behavior).)

jbarnoud · 2015-10-26T09:15:49Z

I like what @richardjgowers proposes. By giving more context to the Universe we allow it to be smarter with atom and residue names. Yet, how would the Universe manage conflicts between informations read from the topology and from the given context? For instance, if I use Martini but I changed the mass of a Martini bead (for any weird reason), then the mass read from the TPR will differ from the expected Martini mass. I guess, MDAnalysis should use the one from the TPR, right?

Also, instead of something like u = mda.Universe(..., type='martini'), I would suggest something like

universe_type = mda.UniverseTypeMartini()
u = mda.Universe(..., type=universe_type)

This would allow more flexibility for the enduser. For instance, if I use a custom lipid I can add it to the list of known lipid residue by overwrite or inherit from mda.UniverseTypeMartini. Also we can have something like mda.UniverseType.from_file() to store force field context in files.

#315 and #104 suggest the use of user wide RC files. I really dislike this idea as it is extremely error prone. Indeed, the expected behaviour of MDAnalysis will change from user to user meaning a script that works for me may not work for a colleague. Worst, my RC file may do assumptions on what system I work with (like assuming I only work with Martini simulations) that are easily forgot when working on a different system. Having the context passed as an object would allow to create the instance from a file; this would fix #315 and #104, without having the issues with RC file as the files are explicitly loaded.

richardjgowers · 2015-10-26T15:46:16Z

I think using the guess_* functions will always be a last option when parsing.

I think your idea of UniverseType will boil down to being a set of dicts for different attributes (ie mapping type to mass etc), so should be very possible. Rather than external .rcs I'd rather have a set of built in types, which are easy to hack into.. something like

mytype = mda.UniverseType.MARTINI

mytype.mass['this guy'] = 100.0

u = mda.Universe(... type=mytype)

But then these aren't really types, they're more like forcefields. If we actually included forcefields in mda it would open up a lot of cool possibilities (along with less cool headaches :D)

jbarnoud · 2015-10-26T15:53:52Z

On 26/10/15 16:46, Richard Gowers wrote:

I think using the guess_* functions will always be a last option when
parsing.

I think your idea of |UniverseType| will boil down to being a set of
dicts for different attributes (ie mapping type to mass etc), so
should be very possible. Rather than external .rcs I'd rather have a
set of built in types, which are easy to hack into.. something like

|mytype = mda.UniverseType.MARTINI mytype.mass['this guy'] = 100.0 u =
mda.Universe(... type=mytype) |

But then these aren't really types, they're more like forcefields. If
we actually included forcefields in mda it would open up a lot of cool
possibilities (along with less cool headaches :D)

—
Reply to this email directly or view it on GitHub
#507 (comment).

Indeed, having force fields in mda requires a lot of effort to stay up
to date. But if not for force field, what do you call types?

orbeckst · 2015-10-26T17:44:16Z

On 26 Oct, 2015, at 02:15, Jonathan Barnoud wrote:

#315 and #104 suggest the use of user wide RC files. I really dislike this idea as it is extremely error prone. Indeed, the expected behaviour of MDAnalysis will change from user to user meaning a script that works for me may not work for a colleague. Worst, my RC file may do assumptions on what system I work with (like assuming I only work with Martini simulations) that are easily forgot when working on a different system. Having the context passed as an object would allow to create the instance from a file; this would fix #315 and #104, without having the issues with RC file as the files are explicitly loaded.

You're making a good point regarding RC files. Perhaps we need a more nuanced discussion ta #315 � I think there are some settings (the stuff in core.flags) that can definitely be put into RC files (although my impression is that no-one actually changes flags and then we might just be doing a lot of work for nothing).

I still think that a user should be able to have a say in how detection works. For the H-bond analysis we are following the example that you laid out: if it's a big change: subclass [1]. It seems worthwhile to discuss, which pattern works best for the user without falling into the trap of unintentionally broken installations.

[1] https://pythonhosted.org/MDAnalysis/documentation_pages/analysis/hbond_analysis.html#detection-of-hydrogen-bonds

tylerjereddy added enhancement usability Component-Readers hackathon coarse graining labels Oct 23, 2015

orbeckst added the Component-Configuration label Oct 24, 2015

jbarnoud mentioned this issue Dec 15, 2015

What should be a bond in Gromacs topologies? #588

Open

orbeckst added the Component-Topology label Dec 15, 2015

jbarnoud mentioned this issue Aug 17, 2016

How to use information in topology files #942

Closed

richardjgowers removed the hackathon label Nov 28, 2017

orbeckst mentioned this issue Jul 30, 2018

Feature: AtomGroupGroup #1861

Open

lilyminium mentioned this issue Mar 16, 2020

Add contexts to guess attributes better (especially elements and masses) #2630

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MDAnalysis CG / MARTINI awareness #507

MDAnalysis CG / MARTINI awareness #507

tylerjereddy commented Oct 23, 2015

mnmelo commented Oct 23, 2015

richardjgowers commented Oct 23, 2015

mnmelo commented Oct 23, 2015

tylerjereddy commented Oct 23, 2015

orbeckst commented Oct 24, 2015

jbarnoud commented Oct 26, 2015

richardjgowers commented Oct 26, 2015

jbarnoud commented Oct 26, 2015

orbeckst commented Oct 26, 2015

MDAnalysis CG / MARTINI awareness #507

MDAnalysis CG / MARTINI awareness #507

Comments

tylerjereddy commented Oct 23, 2015

mnmelo commented Oct 23, 2015

richardjgowers commented Oct 23, 2015

mnmelo commented Oct 23, 2015

tylerjereddy commented Oct 23, 2015

orbeckst commented Oct 24, 2015

jbarnoud commented Oct 26, 2015

richardjgowers commented Oct 26, 2015

jbarnoud commented Oct 26, 2015

orbeckst commented Oct 26, 2015