Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDAnalysis CG / MARTINI awareness #507

Open
tylerjereddy opened this issue Oct 23, 2015 · 9 comments
Open

MDAnalysis CG / MARTINI awareness #507

tylerjereddy opened this issue Oct 23, 2015 · 9 comments

Comments

@tylerjereddy
Copy link
Member

Just thought it would be a good idea to 'bookmark' this idea from one of the hackathon teams I was coaching a bit at the recent CECAM 2015 workshop in Germany. I hope they will follow through with a pull request at some point.

In short, there are almost certainly better ways for MDAnalysis to deal with coarse-grained (CG) simulation data (i.e., MARTINI). Of course, MDAnalysis atoms objects aren't really atoms in a CG context, but rather represent the basic fundamental units in the coordinate file (i.e., a single set of Cartesian coordinates isn't necessarily an actual atom, but can also be a representative particle). While I've gotten rather used to doing this 'mental translation' of nomenclature, and there are other things like dihedral angles that also mean something completely different if you are dealing with a CG context, there are at least some system properties that could probably be handled in a clearly 'better' way without too much effort. This is probably most clearly demonstrated with a simple example of the two main properties focused on by the team -- charge and mass:

import MDAnalysis
u = MDAnalysis.Universe('popc-1500-CG-phosphates.gro') #only contains CG PO4 particles
sel = u.select_atoms('bynum 1:10') #select first 10 phosphate particles in system 
sel.atoms.names
>>> array(['PO4', 'PO4', 'PO4', 'PO4', 'PO4', 'PO4', 'PO4', 'PO4', 'PO4', 'PO4'], 
      dtype='|S3')
sel.atoms.masses
>>> array([ 30.974,  30.974,  30.974,  30.974,  30.974,  30.974,  30.974,
        30.974,  30.974,  30.974])
sel.atoms.charges
>>> array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

MDAnalysis is inferring the masses based on the element corresponding to the first character in the atom name string--which is phosphorus here. Most CG MARTINI particles have masses of 72.0 (4 * water, based on the built-in general 4:1 mapping used in the forcefield), although some (i.e., ring particles in some amino acids) have smaller masses. The charges are not being handled at all really (should all be -1 here).

While I do agree it would be a good idea to return the correct values, I do have to concede that in this case I've never actually wanted to probe the masses or charges of CG particles, and the masses in particular don't strike me as having any immediately obvious analytical value.

The team was also considering automatic parsing of the forcefield files to reduce manual intervention (and maintenance, moving forward), and automatically inferring the type of coarse grained forcefield (as well as allowing a keyword argument specification to Universe). Those are probably quite ambitious ideas! I think @orbeckst also mentioned that some of the underlying infrastructure may be subject to reorganization at some point soon.

@mnmelo
Copy link
Member

mnmelo commented Oct 23, 2015

I agree that having a cleverer MDAnalysis in this respect might be helpful to the uninitiated, but it seems these cases can be solved by passing in the .tpr instead of the .gro for the topology. If you're coming from GROMACS, you'll have faced the same issues there.

But I don't mean to say we should stick to the same limitations as GROMACS. If we can find a way to have a (tunable) smart behavior, then I'm all for it. ('Tunable' because we need to be careful about the assumptions we make of a system from atom/residue names alone...)

@richardjgowers
Copy link
Member

I think all the guessers for mass/type are currently hard coded to expect proteins. It would be cool to make this more flexible so you could load a universe and say u = mda.Universe(..., type='martini'). This would then choose a different set of lookup tables in the back.

With Martini having a finite well defined set of particles, it should be a good place to start.

@mnmelo
Copy link
Member

mnmelo commented Oct 23, 2015

This sounds like a good way to me.

@tylerjereddy
Copy link
Member Author

Yeah, that's what they had started doing.

@orbeckst
Copy link
Member

#315 and #104 are related to a more flexible auto-detection of atom types.

Generally, it would be nice if you could easily say "select protein" or "select lipid" or "select nucleic" without having to care about your forcefield.

(Note that there are already some corner cases that are arguably bugs: if you select a protein with an amino acid such as glutamate or leucine bound as a substrate, the protein selection will add the substrate to the "protein"... perhaps we should do a fragment check for "protein" and introduce another selection such as "aminoacid" for the old behavior).)

@jbarnoud
Copy link
Contributor

I like what @richardjgowers proposes. By giving more context to the Universe we allow it to be smarter with atom and residue names. Yet, how would the Universe manage conflicts between informations read from the topology and from the given context? For instance, if I use Martini but I changed the mass of a Martini bead (for any weird reason), then the mass read from the TPR will differ from the expected Martini mass. I guess, MDAnalysis should use the one from the TPR, right?

Also, instead of something like u = mda.Universe(..., type='martini'), I would suggest something like

universe_type = mda.UniverseTypeMartini()
u = mda.Universe(..., type=universe_type)

This would allow more flexibility for the enduser. For instance, if I use a custom lipid I can add it to the list of known lipid residue by overwrite or inherit from mda.UniverseTypeMartini. Also we can have something like mda.UniverseType.from_file() to store force field context in files.

#315 and #104 suggest the use of user wide RC files. I really dislike this idea as it is extremely error prone. Indeed, the expected behaviour of MDAnalysis will change from user to user meaning a script that works for me may not work for a colleague. Worst, my RC file may do assumptions on what system I work with (like assuming I only work with Martini simulations) that are easily forgot when working on a different system. Having the context passed as an object would allow to create the instance from a file; this would fix #315 and #104, without having the issues with RC file as the files are explicitly loaded.

@richardjgowers
Copy link
Member

I think using the guess_* functions will always be a last option when parsing.

I think your idea of UniverseType will boil down to being a set of dicts for different attributes (ie mapping type to mass etc), so should be very possible. Rather than external .rcs I'd rather have a set of built in types, which are easy to hack into.. something like

mytype = mda.UniverseType.MARTINI

mytype.mass['this guy'] = 100.0

u = mda.Universe(... type=mytype)

But then these aren't really types, they're more like forcefields. If we actually included forcefields in mda it would open up a lot of cool possibilities (along with less cool headaches :D)

@jbarnoud
Copy link
Contributor

On 26/10/15 16:46, Richard Gowers wrote:

I think using the guess_* functions will always be a last option when
parsing.

I think your idea of |UniverseType| will boil down to being a set of
dicts for different attributes (ie mapping type to mass etc), so
should be very possible. Rather than external .rcs I'd rather have a
set of built in types, which are easy to hack into.. something like

|mytype = mda.UniverseType.MARTINI mytype.mass['this guy'] = 100.0 u =
mda.Universe(... type=mytype) |

But then these aren't really types, they're more like forcefields. If
we actually included forcefields in mda it would open up a lot of cool
possibilities (along with less cool headaches :D)


Reply to this email directly or view it on GitHub
#507 (comment).

Indeed, having force fields in mda requires a lot of effort to stay up
to date. But if not for force field, what do you call types?

@orbeckst
Copy link
Member

On 26 Oct, 2015, at 02:15, Jonathan Barnoud wrote:

#315 and #104 suggest the use of user wide RC files. I really dislike this idea as it is extremely error prone. Indeed, the expected behaviour of MDAnalysis will change from user to user meaning a script that works for me may not work for a colleague. Worst, my RC file may do assumptions on what system I work with (like assuming I only work with Martini simulations) that are easily forgot when working on a different system. Having the context passed as an object would allow to create the instance from a file; this would fix #315 and #104, without having the issues with RC file as the files are explicitly loaded.

You're making a good point regarding RC files. Perhaps we need a more nuanced discussion ta #315 � I think there are some settings (the stuff in core.flags) that can definitely be put into RC files (although my impression is that no-one actually changes flags and then we might just be doing a lot of work for nothing).

I still think that a user should be able to have a say in how detection works. For the H-bond analysis we are following the example that you laid out: if it's a big change: subclass [1]. It seems worthwhile to discuss, which pattern works best for the user without falling into the trap of unintentionally broken installations.

[1] https://pythonhosted.org/MDAnalysis/documentation_pages/analysis/hbond_analysis.html#detection-of-hydrogen-bonds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants