Guide to new Topology format

Richard Gowers edited this page Dec 8, 2015 · 1 revision

The new (as of Dec 2015) topology format requires that topology parsers return a single Topology object, where previously they returned a list of Atom instances.

Below is an annotated version of the parse for GRO files.

The Topology object contains many tables of data, called Attributes, which *Groups later refer to. For example, Atomnames is an Attribute. Attributes facilitate access to arrays of data. For an AtomGroup accessing the names, this simply applies AtomGroup.indices as a mask to the array, but it also handles the ResidueGroup and SegmentGroup cases.

import numpy as np

from ..lib.util import openany
from ..core.topologyattrs import (
from ..core.topology import Topology
from .base import TopologyReader, squash_by

All Parsers must still inherit from mda.topology.base.TopologyReader and define a parse method

class GROParser(TopologyReader)
    def parse(self):
        """Return the *Topology* object for this file"""
        # Gro has the following columns
        # resid, resname, name, index, (x,y,z)
        with openany(self.filename, 'r') as inf:
            n_atoms = int(inf.readline())

            # Allocate arrays
            # This isn't necessary, appending to lists is also
            # possible where the number of atoms isn't known beforehand.
            # But ultimately all data must be stored in np.arrays
            resids = np.zeros(n_atoms, dtype=np.int32)
            # String data must use dtype=object so that any size string
            # can be stored.
            resnames = np.zeros(n_atoms, dtype=object)
            names = np.zeros(n_atoms, dtype=object)
            indices = np.zeros(n_atoms, dtype=np.int32)

            for i in xrange(n_atoms):
                line = inf.readline()
                resids[i] = int(line[:5])
                resnames[i] = line[5:10].strip()
                names[i] = line[10:15].strip()
                indices[i] = int(line[15:20])

Whereas previously each Atom kept a record of its resname, all Residue properties are now stored on a per-Residue basis, meaning that only one record of Residue name is ever kept.

Considering a trivial case of 4 atoms:

  • Atom 1, Resid 3, Resname 'A'
  • Atom 2, Resid 3, Resname 'A'
  • Atom 3, Resid 4, Resname 'B'
  • Atom 4, Resid 4, Resname 'B'

The squash_by function reduces this information to the following:

  • Atom 1, Resindex 0
  • Atom 2, Resindex 0
  • Atom 3, Resindex 1
  • Atom 4, Resindex 1

Represented in the array residx

And then a separate record of information per-Residue

  • Resindex 0, Resid 3, Resname 'A'
  • Resindex 1, Resid 4, Resname 'B'

Represented in the arrays new_resids and new_resnames

        residx, new_resids, (new_resnames,) = squash_by(resids, resnames)

        # new_resids is len(residues)
        # so resindex 0 has resid new_resids[0]
        atomnames = Atomnames(names)
        atomids = Atomids(indices)
        residueids = Resids(new_resids)
        residuenames = Resnames(new_resnames)

Finally, the Topology is created by declaring its size in n_atoms, n_residues and n_segments, passing a list of the Attributes, and the relationship between atom indices and residue indices, and residue indices and segment indices.

        top = Topology(n_atoms, len(new_resids), 0,
                       attrs=[atomnames, atomids, residueids, residuenames],

        return top

Project Information

GNU GPL v2 code license
Labels: python, molecular dynamics, analysis, DCD, CHARMM, LAMMPS, NAMD, Gromacs, computer simulation, atoms, coordinates, trajectory, XTC, Library, object-oriented
Core Developers


Release Notes
Guide for Developers
Google Summer of Code

Code of Conduct


@mdanalysis on Twitter
Downloads (PyPi)
Mailing Lists:
User discussion group
Developer mailing list

Clone this wiki locally
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.