### Analysis of ubiquitin simulations

Here you will analyse the results from a short (1 nanosecond) simulation of ubiquitin, investigating the question "how similar is the MD trajectory to the crystal and NMR structures?"

The notebook illustrates:

1. The use of the Python MDTraj module (see http://www.mdtraj.org)
2. The use of the Python matplotlib module for plotting (see http://www.matplotlib.org)
3. The use of the Python MDPlus module for Principal Component Analysis (see https://claughton.bitbucket.io/pypcazip/)

---

First we import the necessary Python modules and prepare for plotting:

In [None]:
import mdtraj as mdt
from matplotlib import pyplot as plt
%matplotlib inline

Load the trajectories. The X-ray structure is, in effect, a single-frame trajectory:

In [None]:
trajfile = 'ubq_1ns.xtc' # The MD trajectory in GROMACS .xtc format
topfile = 'ubq.pdb' # A PDB format file that matches the MD trajectory
xray = '1ubq.pdb' # The PDB file for the X-ray structure, downloaded from the Protein Data Bank
nmr = '1d3z.pdb' # The PDB file for the NMR structures (ten models), downloaded from the Protein Data Bank

mdtraj = mdt.load(trajfile, top=topfile)
xtraj = mdt.load(xray)
ntraj = mdt.load(nmr)
print(mdtraj)
print(xtraj)
print(ntraj)

The print statements make it clear that these three trajectory objects differ from each other. The MD and NMR trajectories contain a different number of atoms from the X-ray trajectory, because the X-ray structure does not include hydrogen atoms. The X-ray structure has more residues, though, because it also contains coordinates for crystallographic waters. Finally, the MD and X-ray trajectories include unit cell data, but the NMR one does not.

---

Before we can compare these trajectories, we have to make them compatible with each other. A simple approach is to reduce them to just protein backbone atoms. In the cell below we define a simple function that strips a trajectory down to these atoms, and then apply it to each trajectory:

In [None]:
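def to_backbone(traj):
    '''
    Reduce an MDTraj trajectory to its protein backbone atoms only.
    Unit cell information is not carried over to the new trajectory.
    '''
    sel = traj.topology.select('backbone')  # indices of the backbone atoms
    topsel = traj.topology.subset(sel)      # reduced topology for those atoms
    xyzsel = traj.xyz[:, sel]               # every frame, selected atoms only
    newtraj = mdt.Trajectory(xyzsel, topsel)
    return newtraj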

mdsel = to_backbone(mdtraj)
xsel = to_backbone(xtraj)
nsel = to_backbone(ntraj)
print(mdsel)
print(xsel)
print(nsel)

The output from the print statements should confirm that things are better now: all three trajectories contain the same number of atoms, and none of them carries unit cell data.
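MDTraj stores coordinates in `traj.xyz`, a NumPy array of shape `(n_frames, n_atoms, 3)` in nanometres, so the `traj.xyz[:, sel]` step in `to_backbone()` keeps every frame but only the selected atoms. The NumPy sketch below mimics that step on a toy list of `(residue name, atom name)` pairs with random coordinates; the atoms and data are invented for illustration. It also shows why a name-only filter is not enough: water oxygens in PDB files are often named `O`, which is why MDTraj's `backbone` keyword only matches atoms in protein residues.

```python
import numpy as np

# Toy "topology": (residue name, atom name) pairs, invented for illustration
atoms = [("MET", "N"), ("MET", "CA"), ("MET", "C"), ("MET", "O"),
         ("MET", "CB"), ("MET", "H"), ("HOH", "O")]
BACKBONE_NAMES = {"N", "CA", "C", "O"}

# Keep backbone-named atoms, excluding water (a crude protein check for this toy case)
sel = np.array([i for i, (res, name) in enumerate(atoms)
                if name in BACKBONE_NAMES and res != "HOH"])

# Fake trajectory: 5 frames, one row per atom, x/y/z columns
rng = np.random.default_rng(0)
xyz = rng.normal(size=(5, len(atoms), 3))

# Same slicing as in to_backbone(): all frames, selected atoms only
xyz_sel = xyz[:, sel]
print(sel)            # [0 1 2 3]
print(xyz_sel.shape)  # (5, 4, 3)
```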

Let's start with something simple: calculate the RMSD of each snapshot in the MD trajectory from a) the first frame in the MD trajectory, b) the X-ray structure, and c) the first model in the NMR trajectory (note that MDTraj reports RMSDs in nanometers):

In [None]:
mrmsd = mdt.rmsd(mdsel, mdsel[0])
xrmsd = mdt.rmsd(mdsel, xsel)
nrmsd = mdt.rmsd(mdsel, nsel[0])
plt.plot(mrmsd, label='From MD start')
plt.plot(xrmsd, label='From X-ray')
plt.plot(nrmsd, label='From NMR model 1')
plt.xlabel('Frame')
plt.ylabel('RMSD (nm)')
plt.legend()

From the plot you should be able to draw the following conclusions:

1. The MD trajectory quickly drifts away from the starting structure
2. It remains closer to its starting structure than to either the X-ray or the NMR structure
3. It seems to be getting slightly closer to the X-ray and NMR structures
4. It seems to be getting marginally closer to the X-ray structure than to NMR model 1
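`mdt.rmsd()` reports the RMSD after optimally superimposing each frame onto the reference, so overall translation and rotation do not count as differences. The NumPy sketch below illustrates that idea for a single pair of structures using the Kabsch algorithm. It is a generic illustration of the technique, not MDTraj's own (faster) implementation, and the test coordinates are random.

```python
import numpy as np

def kabsch_rmsd(a, b):
    """RMSD between coordinate sets a and b, each of shape (n_atoms, 3),
    after removing translation and applying the best-fit rotation (Kabsch)."""
    a = a - a.mean(axis=0)                  # centre both structures on the origin
    b = b - b.mean(axis=0)
    h = a.T @ b                             # 3x3 covariance matrix
    u, s, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation (reflection)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T # optimal rotation matrix
    a_fit = a @ r.T                         # rotate every atom of a onto b
    return np.sqrt(((a_fit - b) ** 2).sum(axis=1).mean())

# A random "structure", then a rotated and translated copy of it
rng = np.random.default_rng(1)
ref = rng.normal(size=(20, 3))
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = ref @ rot_z.T + np.array([1.0, -2.0, 0.5])

naive = np.sqrt(((moved - ref) ** 2).sum(axis=1).mean())  # no superposition
print(naive, kabsch_rmsd(moved, ref))  # large value vs. essentially zero
```

Without superposition the rigid-body move alone produces a large RMSD; after fitting, it vanishes, which is the behaviour you want when comparing conformations.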

---

OK, but what about the other models in the NMR trajectory?

**EXERCISE**: 

**Write code in the cell below (and add others if you need them) to find out if the MD trajectory gets closer to one of the other NMR models (hint: you could adapt the code in the cell above to do it by a trial-and-error method, but you might be able to come up with something more elegant)**

In [None]:
# write your code here:


---
Now we will use Principal Component Analysis (PCA) to get a more holistic view of the way the MD simulation samples conformational space.

In the cell below we load the PCA module, and concatenate the individual trajectories together into one big one, so we do PCA on everything together:

In [None]:
from MDPlus.analysis import pca
all_sel = xsel + nsel + mdsel # Note the order in which we combine the trajectories, we will need to remember this
p = pca.fromtrajectory(all_sel)
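The MDPlus internals are not reproduced here; instead, the sketch below shows generic coordinate PCA in NumPy on a concatenated set of snapshots: flatten each snapshot to a 3N-vector, subtract the mean structure, diagonalise the covariance matrix, and project. It assumes the snapshots have already been superimposed on a common reference (PCA tools for MD data typically do this themselves), uses random stand-in data, and the function name `simple_pca` is hypothetical. The projections are laid out the way `p.projs` is used in this notebook: row `k` holds every snapshot's projection onto PC `k+1`.

```python
import numpy as np

def simple_pca(xyz):
    """PCA of coordinates with shape (n_frames, n_atoms, 3).
    Returns eigenvalues (largest first), eigenvectors (as columns) and
    projections with shape (n_components, n_frames)."""
    x = xyz.reshape(len(xyz), -1)        # one 3N-vector per snapshot
    x = x - x.mean(axis=0)               # remove the average structure
    cov = np.cov(x, rowvar=False)        # 3N x 3N covariance matrix
    evals, evecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(evals)[::-1]      # re-sort, largest variance first
    evals, evecs = evals[order], evecs[:, order]
    projs = (x @ evecs).T                # row k = projections onto PC k+1
    return evals, evecs, projs

# Random stand-ins, combined in the same order as all_sel: X-ray, NMR, MD
rng = np.random.default_rng(2)
n_atoms = 10
xray_like = rng.normal(size=(1, n_atoms, 3))
nmr_like = rng.normal(size=(10, n_atoms, 3))
md_like = rng.normal(size=(50, n_atoms, 3))
combined = np.concatenate([xray_like, nmr_like, md_like])

evals, evecs, projs = simple_pca(combined)
print(projs.shape)  # (30, 61): 3N components, one column per snapshot
```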

Let's look at the eigenvalues from the PCA:

In [None]:
print(p.evals)
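Each eigenvalue is the variance along one PC, so dividing the cumulative sum of the eigenvalues by their total shows how much of the overall variance the first few PCs capture. The sketch below uses made-up eigenvalues; to apply it here you would substitute `p.evals`, assuming it is a one-dimensional array sorted from largest to smallest.

```python
import numpy as np

evals = np.array([4.0, 2.0, 1.0, 0.5, 0.5])  # illustrative values only
frac = evals / evals.sum()                    # fraction of variance per PC
cumfrac = np.cumsum(frac)                     # running total over PC1..PCk

for k, (f, c) in enumerate(zip(frac, cumfrac), start=1):
    print(f"PC{k}: {f:.1%} of variance, cumulative {c:.1%}")
# PC1: 50.0% of variance, cumulative 50.0%
# PC2: 25.0% of variance, cumulative 75.0%
# ...
```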

---

**EXERCISE**: 

**Produce a plot of these eigenvalues, perhaps as bars instead of lines. The matplotlib documentation will help.**

In [None]:
# write your code here:


The eigenvalues suggest that a 2D plot in the (PC1, PC2) subspace should capture most of the variance in the data. The projections are in the `p.projs` attribute: `p.projs[0]` holds the projection of every snapshot onto PC1, `p.projs[1]` the projection onto PC2, and so on. Remembering the order in which we combined the trajectories prior to PCA, we know that the first value in each of these corresponds to the X-ray structure, the next ten to the NMR models, and the rest to the MD snapshots, so we can separate them accordingly:

In [None]:
plt.plot(p.projs[0][0], p.projs[1][0], 'Dc', label='X-ray structure')
plt.plot(p.projs[0][1:11], p.projs[1][1:11], 'og', label='NMR structures')
plt.plot(p.projs[0][11:], p.projs[1][11:], ',m', label='MD snapshots')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
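Hard-coding the boundaries `0`, `1:11` and `11:` works because we know the size of each trajectory. A slightly more robust sketch derives the slices from the frame counts instead. With MDTraj these counts would come from `xsel.n_frames`, `nsel.n_frames` and `mdsel.n_frames`; the MD frame count below is only a placeholder, and the helper name `block_slices` is hypothetical.

```python
from itertools import accumulate

def block_slices(lengths):
    """Return one slice per block for data concatenated in the given order."""
    ends = list(accumulate(lengths))  # cumulative end index of each block
    starts = [0] + ends[:-1]          # each block starts where the previous ended
    return [slice(s, e) for s, e in zip(starts, ends)]

# X-ray (1 frame), NMR (10 models), MD (placeholder frame count)
xray_idx, nmr_idx, md_idx = block_slices([1, 10, 101])
print(xray_idx, nmr_idx, md_idx)
# slice(0, 1, None) slice(1, 11, None) slice(11, 112, None)
```

The slices can then be used directly, e.g. `p.projs[0][nmr_idx]`, so the plotting code keeps working if the number of MD frames changes.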

**_EXERCISES_**:

1. **Try looking in different subspaces - e.g. the PC1/PC3 plane - is the relationship between the scatter of the MD snapshots, and the positions of the X-ray and NMR structures, any different?**

2. **Sometimes the difference between MD snapshots and an experimental structure is dominated by the high mobility of a few residues at the N- and/or C-termini of the protein. Experiment with repeating all the above analyses with a few terminal residues omitted. To do this you just need to adjust the selection string in the `to_backbone()` function; for example, you could try something like 'backbone and resid 5 to 71'.**
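In MDTraj's selection language, `resid` counts residues from zero in the order they appear in the topology, whereas `resSeq` uses the residue numbers written in the PDB file. If you prefer to think in terms of how many residues to drop from each end, a small helper like the hypothetical `trimmed_backbone` below can build the selection string. It assumes a single protein chain of `n_residues` residues (76 for ubiquitin) that comes first in the topology, as the crystallographic waters do not count.

```python
def trimmed_backbone(n_residues, n_term=0, c_term=0):
    """Build an MDTraj selection string for backbone atoms, dropping n_term
    residues from the N-terminus and c_term residues from the C-terminus.
    Uses zero-based 'resid' indices; assumes one chain of n_residues residues."""
    first = n_term
    last = n_residues - 1 - c_term
    if first > last:
        raise ValueError("trimming would remove every residue")
    return f"backbone and resid {first} to {last}"

print(trimmed_backbone(76, n_term=5, c_term=5))  # backbone and resid 5 to 70
```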