# Introduction to MDAnalysis

In this notebook we introduce various features of MDAnalysis.

You can follow along or work through it in your own time later. We will explain most of what is shown here in more detail in the following tutorials.

## Background: Adenylate kinase
As an example we will analyze the enzyme *adenylate kinase*. It catalyzes the reaction ATP + AMP $\rightleftharpoons$ 2 ADP. It undergoes a *conformational transition* between a closed conformation (e.g. PDB code 1AKE) and an open conformation (e.g. 4AKE) [1], even in the absence of substrate.

Sampling large conformational transitions is challenging with standard equilibrium MD. We therefore used an enhanced sampling method ("dynamic importance sampling", DIMS) to generate transitions between closed and open apo AdK [2, 3], in addition to "brute force" equilibrium MD (on PSC Anton).



1. S. L. Seyler and O. Beckstein. Sampling of large conformational transitions: Adenylate kinase as a testing ground. Molec. Simul., 40(10–11):855–877, 2014. doi: [10.1080/08927022.2014.919497](https://doi.org/10.1080/08927022.2014.919497)
2. O. Beckstein, E. J. Denning, J. R. Perilla, and T. B. Woolf. Zipping and unzipping of adenylate kinase: Atomistic insights into the ensemble of open ↔ closed transitions. J. Mol. Biol., 394(1):160–176, 2009. doi: [10.1016/j.jmb.2009.09.009](https://doi.org/10.1016/j.jmb.2009.09.009)
3. S. L. Seyler, A. Kumar, M. F. Thorpe, and O. Beckstein. Path similarity analysis: A method for quantifying macromolecular pathways. PLoS Comput Biol, 11(10):e1004568, 2015. doi: [10.1371/journal.pcbi.1004568](https://doi.org/10.1371/journal.pcbi.1004568)

## Setup 

### Load packages 

In [15]:
import MDAnalysis as mda
import numpy as np
import matplotlib.pyplot as plt
import nglview as nv

In [14]:
print(mda.__version__)

0.19.2


### Load data

* AdK equilibrium trajectory: `adk` (from `adk = fetch_adk_equilibrium()`)
* transition between closed and open AdK from DIMS MD: `PSF`, `DCD`

In [4]:
from MDAnalysisData.datasets import fetch_adk_equilibrium
from MDAnalysis.tests.datafiles import PSF, DCD

adk = fetch_adk_equilibrium()

## PDB structures 

#### Closed conformation

Fetch the structure from the PDB (or fall back to a local copy of the PDB file).

In [27]:
try:
    uc = mda.fetch_mmtf('1AKE')
except IOError:
    uc = mda.Universe("../data/1ake.pdb")

In [32]:
uc

<Universe with 3816 atoms>

In [33]:
nv.show_mdanalysis(uc)

NGLWidget()

In [34]:
pclosed = uc.select_atoms("protein and segid A")
pclosed

<AtomGroup with 1661 atoms>

In [35]:
nv.show_mdanalysis(pclosed)

NGLWidget()

In [38]:
print(pclosed[:10])

<AtomGroup [<Atom 1: N of type N of resname MET, resid 1 and segid A and altLoc >, <Atom 2: CA of type C of resname MET, resid 1 and segid A and altLoc >, <Atom 3: C of type C of resname MET, resid 1 and segid A and altLoc >, <Atom 4: O of type O of resname MET, resid 1 and segid A and altLoc >, <Atom 5: CB of type C of resname MET, resid 1 and segid A and altLoc >, <Atom 6: CG of type C of resname MET, resid 1 and segid A and altLoc >, <Atom 7: SD of type S of resname MET, resid 1 and segid A and altLoc >, <Atom 8: CE of type C of resname MET, resid 1 and segid A and altLoc >, <Atom 9: N of type N of resname ARG, resid 2 and segid A and altLoc >, <Atom 10: CA of type C of resname ARG, resid 2 and segid A and altLoc >]>


#### Open conformation

In [26]:
try:
    uo = mda.fetch_mmtf('4AKE')
except IOError:
    uo = mda.Universe("../data/4ake.pdb")

In [28]:
nv.show_mdanalysis(uo)

NGLWidget()

In [39]:
popen = uo.select_atoms("protein and segid A")

In [40]:
nv.show_mdanalysis(popen)

NGLWidget()

#### Visualize open and closed together 

First superimpose the open conformation on the closed one.

In [41]:
from MDAnalysis.analysis.align import alignto

In [43]:
alignto(popen, pclosed)

atoms:    N_ref=1661, N_traj=1656
but we attempt to create a valid selection (use strict=True to disable this heuristic).


(23.805724332251824, 7.208983484994985)
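The tuple returned by `alignto` holds the RMSD before and after the superposition. To illustrate what those two numbers mean, here is a minimal NumPy sketch of an optimal rigid-body fit on synthetic coordinates. It uses the Kabsch (SVD) algorithm as a stand-in; MDAnalysis has its own implementation, and the helper names `rmsd` and `superimpose` are hypothetical, not part of the MDAnalysis API.

```python
import numpy as np


def rmsd(a, b):
    """Root-mean-square deviation between two (N, 3) coordinate arrays."""
    diff = a - b
    return float(np.sqrt((diff * diff).sum() / len(a)))


def superimpose(mobile, ref):
    """Return a copy of `mobile` rotated and translated onto `ref` (Kabsch)."""
    mobile_center = mobile.mean(axis=0)
    ref_center = ref.mean(axis=0)
    P = mobile - mobile_center          # centered mobile coordinates
    Q = ref - ref_center                # centered reference coordinates
    H = P.T @ Q                         # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Flip one axis if needed so that the result is a proper rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (R @ P.T).T + ref_center


# Synthetic data (an assumption for illustration): the "mobile" structure
# is the reference rotated by 40 degrees about z and then shifted.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))
theta = np.deg2rad(40.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
mobile = ref @ rot.T + np.array([5.0, -3.0, 2.0])

old_rmsd = rmsd(mobile, ref)            # before superposition
fitted = superimpose(mobile, ref)
new_rmsd = rmsd(fitted, ref)            # after superposition
print(old_rmsd, new_rmsd)
```

In this toy case the two structures are identical up to a rigid-body motion, so the RMSD after the fit drops to essentially zero. For open and closed AdK a residual RMSD remains after fitting, which reflects the genuine conformational difference.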

In [50]:
popen.segments.segids = 'O'
pclosed.segments.segids = 'C'

In [51]:
merged = mda.Merge(pclosed, popen)

In [54]:
merged.segments

<SegmentGroup with 2 segments>

In [55]:
nv.show_mdanalysis(merged)

NGLWidget()

## Analysis of DIMS trajectory

In [58]:
PSF, DCD

('/Users/oliver/anaconda3/envs/workshop/lib/python3.6/site-packages/MDAnalysisTests/data/adk.psf',
 '/Users/oliver/anaconda3/envs/workshop/lib/python3.6/site-packages/MDAnalysisTests/data/adk_dims.dcd')

In [60]:
u = mda.Universe(PSF, DCD)

In [62]:
u.atoms

<AtomGroup with 3341 atoms>

In [66]:
u.atoms.residues

<ResidueGroup with 214 residues>

In [69]:
print(u.atoms.residues)

<ResidueGroup [<Residue MET, 1>, <Residue ARG, 2>, <Residue ILE, 3>, ..., <Residue ILE, 212>, <Residue LEU, 213>, <Residue GLY, 214>]>


In [67]:
u.atoms.segments

<SegmentGroup with 1 segment>

In [70]:
print(u.atoms.segments)

<SegmentGroup [<Segment 4AKE>]>


### Visualization 

In [68]:
for attr in ("altLocs", "icodes", "occupancies", "tempfactors"):
    u.add_TopologyAttr(attr)

(We add several attributes, with default values, to the `Universe` because `nv.show_mdanalysis()` currently expects them: it reads MDAnalysis data as PDB format. If you skip this step, you get a harmless warning that can be ignored.)

In [61]:
nv.show_mdanalysis(u)

NGLWidget(count=98)

### Quantify conformational transition 

In [63]:
ca = u.select_atoms("protein and name CA")
ca

<AtomGroup with 214 atoms>

In [64]:
ca.atoms

<AtomGroup with 214 atoms>

In [65]:
ca.residues

<ResidueGroup with 214 residues>

Experimental FRET labels: distances
