Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Guide to new Topology format
The new (as of Dec 2015) topology format requires that topology parsers return a single Topology object, where previously they returned a list of Atom instances.
Below is an annotated version of the parse for GRO files.
Topology object contains many tables of data, called
*Groups later refer to.
Atomnames is an
Attributes facilitate access to arrays of data. For an
AtomGroup accessing the names, this simply applies
AtomGroup.indices as a mask to the array, but it also handles the
import numpy as np from ..lib.util import openany from ..core.topologyattrs import ( Resids, Resnames, Atomids, Atomnames, ) from ..core.topology import Topology from .base import TopologyReader, squash_by
All Parsers must still inherit from mda.topology.base.TopologyReader and define a parse method
class GROParser(TopologyReader) def parse(self): """Return the *Topology* object for this file""" # Gro has the following columns # resid, resname, name, index, (x,y,z) with openany(self.filename, 'r') as inf: inf.readline() n_atoms = int(inf.readline()) # Allocate arrays # This isn't necessary, appending to lists is also # possible where the number of atoms isn't known beforehand. # But ultimately all data must be stored in np.arrays resids = np.zeros(n_atoms, dtype=np.int32) # String data must use dtype=object so that any size string # can be stored. resnames = np.zeros(n_atoms, dtype=object) names = np.zeros(n_atoms, dtype=object) indices = np.zeros(n_atoms, dtype=np.int32) for i in xrange(n_atoms): line = inf.readline() resids[i] = int(line[:5]) resnames[i] = line[5:10].strip() names[i] = line[10:15].strip() indices[i] = int(line[15:20])
Whereas previously each
Atom kept a record of its
resname, all Residue properties are now stored on a per-Residue basis, meaning that only one record of Residue name is ever kept.
Considering a trivial case of 4 atoms:
- Atom 1, Resid 3, Resname 'A'
- Atom 2, Resid 3, Resname 'A'
- Atom 3, Resid 4, Resname 'B'
- Atom 4, Resid 4, Resname 'B'
squash_by function reduces this information to the following:
- Atom 1, Resindex 0
- Atom 2, Resindex 0
- Atom 3, Resindex 1
- Atom 4, Resindex 1
Represented in the array
And then a separate record of information per-Residue
- Resindex 0, Resid 3, Resname 'A'
- Resindex 1, Resid 4, Resname 'B'
Represented in the arrays
residx, new_resids, (new_resnames,) = squash_by(resids, resnames) # new_resids is len(residues) # so resindex 0 has resid new_resids atomnames = Atomnames(names) atomids = Atomids(indices) residueids = Resids(new_resids) residuenames = Resnames(new_resnames)
Topology is created by declaring its size in
passing a list of the
Attributes, and the relationship between atom indices and residue indices, and residue indices and segment indices.
top = Topology(n_atoms, len(new_resids), 0, attrs=[atomnames, atomids, residueids, residuenames], atom_resindex=residx, residue_segindex=None) return top