# Day 1, Lecture 1

**Topic: Molecules**

## MDAnalysis Installation

```bash
conda install -y -c conda-forge mdanalysis mdanalysistests mdanalysisdata
```

In [1]:
import MDAnalysis as mda

## Documentation

https://docs.mdanalysis.org/stable/index.html

https://userguide.mdanalysis.org/stable/index.html

## NGLView installation

```bash
conda install -y -c conda-forge nglview
jupyter-nbextension enable nglview --py --sys-prefix
jupyter notebook .
```

In [2]:
import nglview as nv



### Background: Adenylate kinase
As an example we will analyze the enzyme *adenylate kinase*. It catalyzes the reaction ATP + AMP $\rightleftharpoons$ 2 ADP. It undergoes a *conformational transition* between a closed conformation (e.g. PDB code 1AKE) and an open conformation (e.g. 4AKE) [1], even in the absence of substrate.


<div>
<img src="figures/angle_defs.png" alt="AdK conformations (from [2])" width="500"/>
</div>


## Loading in data

To begin with, you can fetch data directly from the RCSB PDB.

In [3]:
adk_closed = mda.fetch_mmtf('1AKE')
nv.show_mdanalysis(adk_closed)

NGLWidget()

Alternatively, you can load data in from your own file.

In [4]:
adk_open = mda.Universe("4ake.pdb")
nv.show_mdanalysis(adk_open)

NGLWidget()

Finally, you can load files provided by the MDAnalysisTests and MDAnalysisData packages. We will do this for the remainder of the tutorial because it is the most convenient option.

In [5]:
from MDAnalysis.tests.datafiles import PSF, DCD
closed_to_open = mda.Universe(PSF, DCD)
nv.show_mdanalysis(closed_to_open)

NGLWidget(max_frame=97)

In [6]:
from MDAnalysisData.datasets import fetch_adk_equilibrium
equilibrium_data = fetch_adk_equilibrium()
equilibrium = mda.Universe(equilibrium_data.topology, equilibrium_data.trajectory)
nv.show_mdanalysis(equilibrium)

NGLWidget(max_frame=4186)

## Selections

In [7]:
segment_a = adk_closed.select_atoms("protein and segid A")
nv.show_mdanalysis(segment_a)

NGLWidget()

In [8]:
print(segment_a[:5])

<AtomGroup [<Atom 1: N of type N of resname MET, resid 1 and segid A and altLoc >, <Atom 2: CA of type C of resname MET, resid 1 and segid A and altLoc >, <Atom 3: C of type C of resname MET, resid 1 and segid A and altLoc >, <Atom 4: O of type O of resname MET, resid 1 and segid A and altLoc >, <Atom 5: CB of type C of resname MET, resid 1 and segid A and altLoc >]>


In [9]:
print(segment_a.atoms.resids)

[  1   1   1 ... 214 214 214]


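Selections can also be distance-based. The example below selects all water atoms within 5.0 Å of the protein. A selection like this depends on the coordinates, so it is evaluated once, when `select_atoms` is called; this leads into UpdatingAtomGroups, which we will cover next time.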

In [10]:
solvshell = adk_closed.select_atoms("resname HOH and around 5.0 protein")
solvshell

<AtomGroup with 372 atoms>
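To make the idea behind a distance-based selection concrete, here is a minimal NumPy sketch of a cutoff test. It is not how MDAnalysis implements `around`; the function name `within_cutoff` and the toy coordinates are hypothetical, and treating the cutoff as inclusive is an assumption of this sketch.

```python
import numpy as np

def within_cutoff(candidates, reference, cutoff):
    """Return a boolean mask that is True for every candidate point
    lying within `cutoff` of at least one reference point."""
    # Pairwise displacement vectors, shape (n_candidates, n_reference, 3)
    diff = candidates[:, np.newaxis, :] - reference[np.newaxis, :, :]
    # Pairwise distances, shape (n_candidates, n_reference)
    dist = np.linalg.norm(diff, axis=-1)
    # Keep a candidate if any reference point is close enough
    return (dist <= cutoff).any(axis=1)

# Hypothetical toy coordinates in Å (not taken from 1AKE)
reference = np.array([[0.0, 0.0, 0.0],
                      [10.0, 0.0, 0.0]])
candidates = np.array([[3.0, 0.0, 0.0],    # 3 Å from the first reference point
                       [5.0, 6.0, 0.0],    # about 7.8 Å from both reference points
                       [10.0, 0.0, 4.0]])  # 4 Å from the second reference point

mask = within_cutoff(candidates, reference, cutoff=5.0)
shell = candidates[mask]
```

This brute-force version builds the full distance matrix, which is fine for a toy example but scales poorly with system size.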

## Fundamental data structures

### AtomGroup

In [11]:
protein = adk_closed.atoms.select_atoms("protein")
protein

<AtomGroup with 3317 atoms>

In [12]:
print(protein[:5])

<AtomGroup [<Atom 1: N of type N of resname MET, resid 1 and segid A and altLoc >, <Atom 2: CA of type C of resname MET, resid 1 and segid A and altLoc >, <Atom 3: C of type C of resname MET, resid 1 and segid A and altLoc >, <Atom 4: O of type O of resname MET, resid 1 and segid A and altLoc >, <Atom 5: CB of type C of resname MET, resid 1 and segid A and altLoc >]>


In [13]:
ag = protein + solvshell
ag

<AtomGroup with 3689 atoms>

### Residues and Segments

In [14]:
protein.residues[10:50]

<ResidueGroup with 40 residues>

In [15]:
print(protein.residues[10:50])

<ResidueGroup [<Residue ALA, 11>, <Residue GLY, 12>, <Residue LYS, 13>, ..., <Residue GLN, 48>, <Residue ALA, 49>, <Residue LYS, 50>]>


In [16]:
protein.residues[0].resname

'MET'

In [17]:
print(protein.segments)

<SegmentGroup [<Segment A>, <Segment B>]>


### Atom data as NumPy arrays

In [18]:
protein.names

array(['N', 'CA', 'C', ..., 'C', 'O', 'OXT'], dtype=object)

In [19]:
protein.charges

array([0., 0., 0., ..., 0., 0., 0.])

In [20]:
protein.positions

array([[26.981, 53.977, 40.085],
       [26.091, 52.849, 39.889],
       [26.679, 52.163, 38.675],
       ...,
       [24.173,  7.911, -3.276],
       [24.73 ,  8.496, -4.208],
       [23.962,  8.474, -2.196]], dtype=float32)
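Because these attributes are plain NumPy arrays, the usual NumPy tools apply to them directly. The sketch below rebuilds the first three atoms shown above (N, CA, C of MET 1) by hand so that it runs without any data files; with a real AtomGroup you would use `protein.names` and `protein.positions` instead.

```python
import numpy as np

# First three atoms of the protein, copied from the outputs above
names = np.array(["N", "CA", "C"], dtype=object)
positions = np.array([[26.981, 53.977, 40.085],
                      [26.091, 52.849, 39.889],
                      [26.679, 52.163, 38.675]], dtype=np.float32)

# Unweighted mean position (center of geometry) of these three atoms
center = positions.mean(axis=0)

# A boolean mask built from one array can index another array of the same length
ca_positions = positions[names == "CA"]
```

For a whole group, `AtomGroup.center_of_geometry()` gives the unweighted mean position directly.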

## Working example

In [21]:
import numpy as np

Experimental FRET labels: distances

<div>
<img src="figures/fret_distances_adk.png" alt="FRET distances" width="250"/>
</div>


* I52 - K145
* A55 - V169
* A127 - A194

Calculate the C$_\beta$ distances as proxies for the distances between the FRET labels.

In [22]:
beta = closed_to_open.select_atoms("name CB")

donors = beta.select_atoms("resname ILE and resid 52",
                           "resname ALA and resid 55",
                           "resname ALA and resid 127")
acceptors = beta.select_atoms("resname LYS and resid 145",
                              "resname VAL and resid 169",
                              "resname ALA and resid 194")

In [23]:
r = donors.positions - acceptors.positions
r

array([[ 18.454721 , -20.825457 ,  13.870202 ],
       [ -5.15199  ,  -7.7556076,  -8.505607 ],
       [-22.466576 , -18.11174  , -13.550823 ]], dtype=float32)

In [24]:
d = np.linalg.norm(r, axis=1)  # plain Euclidean norm per row; periodic boundary conditions (PBC) are ignored here
d

array([31.091139, 12.611019, 31.881138], dtype=float32)
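The norm above ignores periodic boundary conditions. As a rough sketch of what a PBC-aware version could look like, the function below applies the minimum-image convention for an orthorhombic box. The function name `pair_distances` and the 40 Å cubic box are hypothetical, chosen only to show the effect; they are not properties of this trajectory. The displacement vectors are copied from the output of cell 23.

```python
import numpy as np

def pair_distances(r, box=None):
    """Row-wise lengths of displacement vectors `r` (shape (n, 3)).

    If `box` (three orthorhombic edge lengths) is given, each component is
    first wrapped to its nearest periodic image (minimum-image convention).
    """
    r = np.asarray(r, dtype=np.float64)
    if box is not None:
        box = np.asarray(box, dtype=np.float64)
        # Shift each component by a whole number of box lengths
        r = r - box * np.round(r / box)
    return np.linalg.norm(r, axis=1)

# Donor-acceptor displacement vectors from the output above
r = np.array([[ 18.454721 , -20.825457 ,  13.870202 ],
              [ -5.15199  ,  -7.7556076,  -8.505607 ],
              [-22.466576 , -18.11174  , -13.550823 ]])

d_plain = pair_distances(r)                          # same as np.linalg.norm(r, axis=1)
d_pbc = pair_distances(r, box=[40.0, 40.0, 40.0])    # hypothetical cubic box
```

With this box, the first and third distances shrink because one component of each vector is longer than half a box length, while the second is unchanged. MDAnalysis has its own PBC-aware routines in `MDAnalysis.lib.distances` (for example `calc_bonds`, which accepts a `box` argument).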

## References

1. S. L. Seyler and O. Beckstein. Sampling of large conformational transitions: Adenylate kinase as a testing ground. Molec. Simul., 40(10–11):855–877, 2014. doi: [10.1080/08927022.2014.919497](https://doi.org/10.1080/08927022.2014.919497)
2. O. Beckstein, E. J. Denning, J. R. Perilla, and T. B. Woolf. Zipping and unzipping of adenylate kinase: Atomistic insights into the ensemble of open ↔ closed transitions. J Mol Biol, 394(1):160–176, 2009. doi: [10.1016/j.jmb.2009.09.009](https://doi.org/10.1016/j.jmb.2009.09.009)
3. S. L. Seyler, A. Kumar, M. F. Thorpe, and O. Beckstein. Path similarity analysis: A method for quantifying macromolecular pathways. PLoS Comput Biol, 11(10):e1004568, 2015. doi: [10.1371/journal.pcbi.1004568](https://doi.org/10.1371/journal.pcbi.1004568)