# Selecting on secondary structure in MDAnalysis

## Problem

**How can I select regions with secondary structure in MDAnalysis?**

For instance, get all helical regions.

**Key Idea:** 

Create a new topology attribute `ss` and assign secondary structure (from DSSP) to this attribute.



## Layers of MDAnalysis

![MDA layers](./mda-layers.jpg)

* **Topology**: constant properties
* **Trajectory**: time-varying properties

Topology in MDAnalysis is organized in **TopologyAttributes**: [User Guide: The Topology System](https://userguide.mdanalysis.org/stable/topology_system.html)

* "arrays" of properties
* associated with an `Atom` or a container in the hierarchy (`Residue` or `Segment`)
* indexed by the atom index, residue index, or segment index
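
To make the indexing idea concrete, here is a toy sketch in plain Python. It is not MDAnalysis code, and the names `residue_ss` and `atom_resindex` are made up for illustration. It assumes that a residue-level attribute is stored as one value per residue and reaches the atoms through each atom's residue index.

```python
# Toy model (not MDAnalysis code): three residues, seven atoms.
residue_ss = ["-", "H", "H"]            # one value per residue, indexed by residue index
atom_resindex = [0, 0, 1, 1, 1, 2, 2]   # residue index of each atom

# A residue-level value is expanded to atoms by looking up each atom's residue.
atom_ss = [residue_ss[r] for r in atom_resindex]

# "Atoms in helical residues" is then a filter over atom indices.
helix_atoms = [i for i, s in enumerate(atom_ss) if s == "H"]

print(atom_ss)
print(helix_atoms)
```

This is why a per-residue attribute can later be used to select atoms: every atom inherits the value of the residue it belongs to.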

## Approach

1. compute secondary structure using DSSP
2. create a new TopologyAttribute *ss*
3. assign DSSP results to *ss*
4. select

## Packages

In [1]:
import MDAnalysis as mda
print(mda.__version__)

2.10.0


In [2]:
from MDAnalysis.analysis.dssp import DSSP

Data file

In [3]:
from MDAnalysisTests.datafiles import PDB

## Universe

In [4]:
u = mda.Universe(PDB)
u.guess_TopologyAttrs(context='default', to_guess=['elements'])

protein = u.select_atoms("protein")



## Secondary structure with DSSP

Secondary structure for initial frame (even if it's the only frame, `frames=[0]` is safe):

In [5]:
dssp = DSSP(protein).run(frames=[0])



Get assignment for frame 0:

In [6]:
dssp.results.dssp[0]

array(['-', '-', 'E', 'E', 'E', 'E', 'E', '-', '-', '-', '-', '-', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', '-', '-',
       'E', 'E', 'E', '-', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', '-', '-', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', '-', '-', '-', '-', '-', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', '-', '-', '-', 'H', 'H', 'H', '-',
       '-', '-', 'E', 'E', 'E', 'E', '-', '-', '-', '-', '-', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', '-', '-', '-', '-', '-',
       'E', 'E', 'E', 'E', 'E', 'E', '-', '-', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', '-', '-', 'E', 'E', '-', '-', '-', '-', '-', '-',
       '-', '-', 'E', 'E', '-', '-', '-', 'E', '-', '-', '-', '-', '-',
       '-', 'E', '-', '-', '-', '-', '-', '-', 'E', '-', '-', '-', '-',
       'H', 'H', 'H', '-', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H
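
For a quick overview of such an assignment, the per-residue codes can be tallied or joined into a single string. The sketch below uses only the standard library and a short hypothetical sample (`sample` is illustrative and is not the output above). With real results one would presumably pass `dssp.results.dssp[0]` instead.

```python
from collections import Counter

# Hypothetical sample of per-residue DSSP codes (H: helix, E: sheet, -: other).
sample = ["-", "-", "E", "E", "E", "-", "H", "H", "H", "H", "-"]

counts = Counter(sample)      # number of residues per code
ss_string = "".join(sample)   # compact one-line representation

print(counts["H"], counts["E"], counts["-"])
print(ss_string)
```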

## New TopologyAttr

Attributes are attached to a topology; see [User Guide: The Topology System](https://userguide.mdanalysis.org/stable/topology_system.html).

Secondary structure is a *residue*-level attribute, so the new class derives from `ResidueAttr`.


In [7]:
from MDAnalysis.core.topologyattrs import ResidueAttr
import numpy as np

In [8]:
# Just *defining* this class registers it with MDAnalysis
# and makes "ss" available for Universe.add_TopologyAttr()

class SecondaryStructure(ResidueAttr):
    """Per-residue secondary structure identifier.

    - H: helix (any)
    - E: sheet
    - -: other/no structure
    """
    attrname = "ss"
    singular = "ss"
    dtype = "U1"

    @staticmethod
    def _gen_initial_values(na, nr, ns):
        # initialize with "-" for each residue
        return np.full(nr, "-", dtype=SecondaryStructure.dtype)
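
Why defining the class is enough can be illustrated with a toy registry. This is only a sketch of the general "register on definition" pattern, not MDAnalysis's actual implementation, which uses its own internal machinery. `Attr` and `SecondaryStructureToy` are hypothetical names, and `__init_subclass__` is chosen here for brevity.

```python
class Attr:
    """Toy base class that records every subclass under its `attrname`."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once, at class-definition time, for each subclass.
        Attr.registry[cls.attrname] = cls


class SecondaryStructureToy(Attr):
    attrname = "ss"


# The class can now be looked up by its string name alone.
print(Attr.registry["ss"].__name__)
```

In the same spirit, the string `"ss"` is all that has to be passed on once the real class has been defined.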


We add the new attribute to the universe with

In [9]:
u.add_TopologyAttr("ss")

Now fill the attribute with the DSSP results for the protein residues (any non-protein residues keep the initial value `-`)

In [10]:
protein.residues.ss = dssp.results.dssp[0]

## Select all helices

In [11]:
helices = u.select_atoms("ss H")

In [12]:
helices.residues.ss

array(['H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
       'H', 'H', 'H', 'H', 'H', 'H'], dtype='<U1')
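
The selection returns all helical residues in one group. To work with individual helices, the per-residue codes can be split into contiguous runs. The following is a minimal sketch using only the standard library. The helper `contiguous_runs` is a hypothetical name, and the input is the first 30 codes of the frame-0 assignment shown earlier, written as a string.

```python
from itertools import groupby


def contiguous_runs(codes, target="H"):
    """Return half-open (start, stop) index ranges where codes == target."""
    runs = []
    start = 0
    for code, group in groupby(codes):
        length = sum(1 for _ in group)
        if code == target:
            runs.append((start, start + length))
        start += length
    return runs


# First 30 codes of the frame-0 assignment shown earlier.
codes = "--EEEEE-----HHHHHHHHHHHH--EEE-"

print(contiguous_runs(codes, "H"))
print(contiguous_runs(codes, "E"))
```

Assuming the codes are in the same order as `protein.residues`, a slice such as `protein.residues[start:stop]` should then give the residues of a single helix or strand.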

## Select beta sheets

In [13]:
sheets = u.residues[u.residues.ss == "E"].atoms