# 11. Working with MDAnalysis

MDAnalysis is one of the more popular libraries for processing structures and trajectories.

To use MDAnalysis, you must first import the relevant functions you want to use. For the purposes
of this tutorial, let's put together all you have learned so far into one final exercise.

In this exercise, you will need to calculate, plot, and save to file the change in the phi and psi
dihedrals of alanine dipeptide from a given trajectory.

## 11.1 The MDAnalysis Universe module

To work with structures and trajectories in MDAnalysis, you must first create an MDAnalysis Universe.

To do this, import `Universe` from MDAnalysis:

In [None]:
from MDAnalysis import Universe

You then load the structure file into the Universe with:

In [None]:
alanine = Universe('ALA.pdb')

**Note**: We could have also used `import MDAnalysis`, in which case we would have to use `MDAnalysis.Universe` in place of `Universe`. The `from ... import ...` syntax allows us to import only a single name (such as a class, a function, or a sub-module) from a module, rather than the full thing, which can be convenient in various circumstances.
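The two import styles can be compared using Python's standard `math` module, which stands in for MDAnalysis here so that the snippet runs without any extra files. It is only meant to illustrate the syntax.

```python
# Style 1: import the whole module, then reach into it with the dot syntax
import math
root_a = math.sqrt(16.0)

# Style 2: import one name from the module and use it directly
from math import sqrt
root_b = sqrt(16.0)

# Both styles end up calling exactly the same function
print(root_a, root_b)
print(math.sqrt is sqrt)
```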

A Universe is a 'class', a Python object we haven't talked about yet; it essentially allows us to group together a bunch of related information and the functions for working with it. A Universe stores all the information about the atoms in your structure, including:

- Names
- Types
- Positions (i.e. coordinates)
- Bonds

and so on. This is very useful if you are trying to measure distances or dihedrals. When we add a trajectory later on, the Universe will also store information such as the current frame number. We call `alanine` an 'instance' of the Universe class.
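As a rough illustration of classes and instances, the sketch below defines a toy class. `ToyMolecule` and everything inside it are made up for this example and say nothing about how MDAnalysis implements `Universe`.

```python
class ToyMolecule:
    """A made-up class that groups atom data with a function that uses it."""

    def __init__(self, names, positions):
        # Attributes: information stored on each instance
        self.names = names
        self.positions = positions

    def n_atoms(self):
        # Method: a function that works with the stored information
        return len(self.names)


# 'water' is an instance of the ToyMolecule class
water = ToyMolecule(
    names=['OW', 'HW1', 'HW2'],
    positions=[(0.00, 0.00, 0.00), (0.96, 0.00, 0.00), (-0.24, 0.93, 0.00)],
)

print(type(water))
print(water.names)      # attribute access: instance.attribute
print(water.n_atoms())  # method call: instance.method()
```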

You can check the different properties of your universe using MDAnalysis. The general syntax for this is `universename.attributename`; some properties (attributes) are themselves classes (e.g. `alanine.atoms` groups all the information about our atoms) and will have their own attributes. For example, if you want to check the names of the atoms of your structure, you can type

alanine.atoms.names

In [None]:
alanine.atoms.names

###### How would you find the values of the Phi and Psi dihedral angles of alanine dipeptide, in degrees or radians?

Hint: the Phi dihedral is calculated from atoms CLP, NL, CA, and CRP, and the Psi dihedral is calculated using atoms NR, CRP, CA, and NL.

For this we can make use of the built-in functions associated with the Universe class (its 'methods'). We can use `select_atoms(selection)` to isolate a group of atoms that match `selection`; the syntax of the selection strings is generally the same as for VMD (which you may have used on day 1 of this workshop). `alanine.select_atoms(selection)` will give us an 'atom group' instance. We can then use this instance's `dihedral` attribute to turn four atoms into a 'dihedral' instance, and finally calling `value()` on this instance will return the (current) value of that dihedral angle in degrees.

In [None]:
# numpy is needed for the degrees-to-radians conversion below
import numpy

print(type(alanine))
print(type(alanine.select_atoms('name CLP NL CA CRP')))
print(type(alanine.select_atoms('name CLP NL CA CRP').dihedral))

# value() returns the dihedral angle in degrees
phi = alanine.select_atoms('name CLP NL CA CRP').dihedral.value()
psi = alanine.select_atoms('name NR CRP CA NL').dihedral.value()
print('The Phi dihedral is {:.2f} in degrees and {:.2f} in radians'.format(phi, numpy.deg2rad(phi)))
print('The Psi dihedral is {:.2f} in degrees and {:.2f} in radians'.format(psi, numpy.deg2rad(psi)))
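To see what a dihedral angle measures geometrically, the following sketch computes one directly from four points with NumPy. The function name `dihedral_degrees` and the coordinates are made up for illustration. This is not MDAnalysis's own implementation, just one common formula for the signed angle between the plane through the first three points and the plane through the last three.

```python
import numpy as np


def dihedral_degrees(p0, p1, p2, p3):
    """Return the dihedral angle (in degrees) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))

    # The three bond vectors along the chain p0 -> p1 -> p2 -> p3
    b0 = p1 - p0
    b1 = p2 - p1
    b2 = p3 - p2

    # Normals to the two planes (p0, p1, p2) and (p1, p2, p3)
    n1 = np.cross(b0, b1)
    n2 = np.cross(b1, b2)

    # arctan2 gives a signed angle in the range -180 to 180 degrees
    x = np.dot(n1, n2)
    y = np.linalg.norm(b1) * np.dot(b0, n2)
    return np.degrees(np.arctan2(y, x))


# Four made-up points: the last one is rotated a quarter turn about the central bond
angle = dihedral_degrees([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1])
print('{:.1f} degrees'.format(angle))
```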

###### How would you plot the change in Phi and Psi (in two plots, or both in the same plot)?

In order to do this, let's break the process down into small steps.

You will first need to load the trajectory. The format to do this is:

universe_name = Universe(PDBfile, TRJfile)

In our case, the PDB file is called 'ALA.pdb' and the TRJ file is called 'ALA.xtc'.

In [None]:
# Exercise 11.1.1: Create a Universe called 'ala_trajectory' using the PDB file 'ALA.pdb' 
# and the trajectory file 'ALA.xtc'

Now that you have a Universe with your trajectory, we can access another feature of the Universe: trajectory data.

Let's try something simple now. Write a quick 'for' loop that prints the frame number.

You can tell Python to print the frame number using:

`ala_trajectory.trajectory.frame`


In [None]:
# Exercise 11.1.2: Print the current frame number.

A `Universe.trajectory` acts somewhat like a list that stores each frame of a simulation along with various information about it, so we can loop through it the same way we did for lists above. As we go through each frame, the coordinates of each atom, stored in `Universe.atoms`, will be updated.

Remind yourself of how to construct a 'for' loop.

Now, loop through the trajectory.

In [None]:
# when using MDAnalysis, each frame is conventionally called a 'timestep' or 'ts'
for ts in ala_trajectory.trajectory:
    # end=' ' stops print from starting a new line every time,
    # so this doesn't take up too much space
    print(ts.frame, end=' ')

That's half the work done!

Now, use the same logic as in the dihedral calculation above to print the phi and psi dihedral angles for each frame.

In [None]:
# Exercise 11.1.3: Print the frame number and at least one of the dihedrals.


Now that you can access both the frame number and the phi and psi dihedrals, it's time to plot them.

Let's write a little bit more this time. First, import pyplot from matplotlib. Then create three empty 
lists (frames, all_phi, and all_psi, for example). Then fill each of those lists with the relevant values.

Make sure you comment your code at every step, just so you don't forget what the code is doing!
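The list-filling pattern looks like this when sketched with made-up numbers in place of trajectory data. The names `steps` and `squares` are arbitrary; in the exercise you would instead loop over the trajectory and append the frame number and the dihedral values.

```python
# Start with empty lists
steps = []
squares = []

# Loop over some values and append one entry to each list per iteration
for step in range(5):
    steps.append(step)
    squares.append(step ** 2)

# Both lists now have the same length, ready to be used as x and y values
print(steps)
print(squares)
```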

In [None]:
# We need pyplot from matplotlib to plot the dihedrals

# Let's create a variable with the length of our trajectory

# Create three empty lists

# iterate through the trajectory
    
    # calculate phi and psi
    
    # append frame number, phi, and psi to the lists


# Check the contents of the lists

Nearly there!

Now plot the time series of the dihedrals.

You can plot a line graph using

pyplot.plot(x_value, y_value)

then show the plot using pyplot.show()

In [None]:
%matplotlib inline
# Exercise 11.1.5: Plot your data here

Optional exercise: Make your plot presentable.

You can use `pyplot.xlim()`, `pyplot.ylim()`, `pyplot.xlabel()`, and `pyplot.ylabel()` to adjust your plot.

`pyplot.title()` adds a title.

Check the help for these functions for more details!

## 11.2 Analysing a protein simulation

On day 1 of this workshop you ran some molecular dynamics simulations of an HIV-1 protease protein using GROMACS.
Let's now look at how you could use MDAnalysis, and an associated Python visualisation library, nglview, to analyse the trajectory you generated.

### Visualising the trajectory using nglview

The nglview library is a Python widget for visualising simulation trajectories, achieving a similar task to the VMD program that you will have used on day 1. One of the interesting advantages of nglview is that it interfaces directly with analysis packages such as MDAnalysis and runs within Jupyter notebooks.

Let's see how we can use nglview to visualise an MDAnalysis universe object.

First we need to create a universe (let's call it protein) from the simulation output files "pre_md.pdb" and "md_cent.xtc".

_Note: we have pre-aligned the trajectory to the first frame for you so as to remove any motions related to translation._

In [None]:
# Exercise 11.2.1: Let's load a universe named protein

Next, let's load nglview and use its `show_mdanalysis` function to load the MDAnalysis universe:

In [None]:
import nglview
protein_view = nglview.show_mdanalysis(protein)

By default this pre-sets the nglviewer to show the protein in the cartoon representation. Let's add a few options to colour the protein by secondary structure, show water oxygens and change the background colour

In [None]:
# Let's update the cartoon representation to colour the protein by secondary structure
protein_view.update_cartoon(color='sstruc')

# We then add a transparent hyperball representation of the water oxygens 
#(play with the opacity value, see what you get)
protein_view.add_hyperball('SOL and not hydrogen', opacity=0.4)

# Let's change the display a little bit
protein_view.parameters = dict(camera_type='orthographic', clip_dist=0)

# Set the background colour to black
protein_view.background = 'black'

# Call protein_view to visualise the trajectory
protein_view

The nglview output can be controlled in the following way:

- play / pause button: play the trajectory 
- double click window: enter or exit full screen mode 
- left mouse button: rotate system 
- middle mouse wheel: zoom in/out 
- right mouse button: translate system 


As can be seen from the trajectory, the HIV-1 protease structure does indeed move, but by how much? In the next section we will see how we can use MDAnalysis to quantify backbone fluctuations.

### Calculating the root-mean-square deviation

In order to gain a quantitative description of how the HIV-1 protease moves in our simulation we can calculate the root-mean-square deviation (RMSD) of the protein backbone.

The RMSD gives us an idea of how 'stable' our protein is when compared to our starting, static, structure. The lower the RMSD is the more stable we can say our protein is. 

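The RMSD as a function of time, $\rho (t)$, can be defined by the following equation:

\begin{equation}
\rho (t) = \sqrt{\frac{1}{N}\sum^N_{i=1}w_i\big(\mathbf{x}_i(t) - \mathbf{x}^{\text{ref}}_i\big)^2}
\end{equation}

Here $N$ is the number of atoms in the selection, $\mathbf{x}_i(t)$ is the position of atom $i$ at time $t$, $\mathbf{x}^{\text{ref}}_i$ is its position in the reference structure, and $w_i$ is the weight given to atom $i$.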
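To make the formula concrete, here is a minimal NumPy sketch of the RMSD between two sets of coordinates, assuming every weight $w_i$ is 1 and that no superposition is carried out. The function name `simple_rmsd` and the coordinates are made up for illustration.

```python
import numpy as np


def simple_rmsd(coords, ref_coords):
    """RMSD between two (N, 3) coordinate arrays, with all weights set to 1."""
    diff = np.asarray(coords, dtype=float) - np.asarray(ref_coords, dtype=float)
    # Squared displacement of each atom, averaged over the N atoms
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))


# Made-up reference structure with three atoms
reference = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])

# Move every atom by the same vector of length 5
moved = reference + np.array([3.0, 4.0, 0.0])

print(simple_rmsd(reference, reference))
print(simple_rmsd(moved, reference))
```

Because every atom in `moved` was shifted by the same vector, the shape of the molecule has not changed at all. This is why a trajectory is usually superimposed onto the reference before the RMSD is interpreted.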

Luckily, MDAnalysis has its own built-in tools to calculate this, which we can import as we did before.

In [None]:
from MDAnalysis.analysis.rms import RMSD as rmsd

In order to calculate the RMSD for every frame in our trajectory we will need:

- A reference structure
- A universe object
- A selection of atoms

In our case the reference structure will be the HIV-1 protease structure in the first frame.

Our universe object will be the 'protein' object we defined above.

For our selection we will use the backbone atoms.

In [None]:
ref = Universe('pre_md.pdb', 'md_cent.xtc')

# Set the ref trajectory to the first frame
ref.trajectory[0]

Due to the way that GROMACS post processes the trajectory file we need to edit it slightly before running our RMSD.

This is done by aligning all frames to the reference structure. 

In [None]:
from MDAnalysis.analysis import align

protein = Universe('pre_md.pdb', 'md_cent.xtc')
align_strucs = align.AlignTraj(protein, ref, select="backbone", weights="mass", in_memory=True, verbose=True)

R = align_strucs.run()

You will have noticed that we stored the result of `run()` in the variable 'R'. We can now access the RMSD values:

In [None]:
# Note: in MDAnalysis 2.0 and later the same array is also available as R.results.rmsd
rmsd_data = R.rmsd

Really, we'd like to visualise how the RMSD changes over time; this can be done in the same way as in Exercise 11.1.5.

Take a look at the 'rmsd_data' variable (it's a numpy array) and try plotting it below as a line plot of RMSD against time.


In [None]:
# Exercise 11.2.2: Plot the RMSD data for the HIV-1 protease system.

# Make sure to add appropriate axis titles.
# What does the RMSD tell you about the protein?
# [If you have time] What happens when you calculate the RMSD using more atoms (i.e. not just the backbone)?

### Calculating the root-mean-square fluctuation

To look at how each residue fluctuates about its average position we can use the closely related measurement of root-mean-square fluctuation (RMSF).

The RMSF for an atom $i$, $\rho_i$, is given by:

\begin{equation}
\rho_i = \sqrt{\big\langle(\mathbf{x}_i - \langle \mathbf{x}_i \rangle )^2 \big\rangle }
\end{equation}

where the angle brackets denote an average over the frames of the trajectory.
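A per-atom RMSF can be sketched in a few lines of NumPy: for each atom, take the average position over all frames, then the root of the mean squared distance from that average. The function name `simple_rmsf` and the tiny two-frame 'trajectory' are made up for illustration, and the frames are assumed to be already aligned.

```python
import numpy as np


def simple_rmsf(positions):
    """Per-atom RMSF from an array of shape (n_frames, n_atoms, 3)."""
    positions = np.asarray(positions, dtype=float)
    # Average position of each atom over all frames
    mean_positions = positions.mean(axis=0)
    # Squared distance of each atom from its own average, frame by frame
    sq_dev = np.sum((positions - mean_positions) ** 2, axis=2)
    # Average over frames, then take the square root: one value per atom
    return np.sqrt(sq_dev.mean(axis=0))


# Made-up trajectory: 2 frames, 2 atoms
trajectory = np.array([
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],  # frame 0
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],   # frame 1
])

# The first atom never moves; the second swings between x = -1 and x = +1
print(simple_rmsf(trajectory))
```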

In [None]:
from MDAnalysis.analysis.rms import RMSF as rmsf

In [None]:
# Reset the trajectory to the first frame
protein.trajectory[0]

# We will need to select the alpha carbons only (atoms named 'CA')
calphas = protein.select_atoms('name CA')
rmsf_calc = rmsf(calphas, verbose=True).run()

In [None]:
# Exercise 11.2.3: Plot the RMSF data for the HIV-1 protease system.
# Tip: in order to plot the resids you will need to access them through the rmsf_calc object

# Make sure to add appropriate axis titles.
# Which parts of the protein have a high RMSF? Can you locate these on the protein structure?
# [If you have time] What happens when you calculate the RMSF using more atoms (i.e. not just the alpha carbons)?