# Normal Mode Analysis (NMA)
### University of California, Berkeley - Spring 2024

The goal of today’s lecture is to present Normal Mode Analysis (NMA) simulations of macromolecules and show how to run them using the Python programming language. In this lecture, the `ProDy` package is used for performing simulations and visualizations.

The following concepts are covered in this notebook:

* __Introduction__
* __ProDy__
* __GNM example with ProDy__

## Introduction
A normal mode of an oscillating system is a pattern of motion in which all parts of the system move sinusoidally with the same frequency and with a fixed phase relation. The free motion described by the normal modes takes place at fixed frequencies. These fixed frequencies of the normal modes of a system are known as its natural frequencies or resonant frequencies. A physical object, such as a building, bridge, or molecule, has a set of normal modes and their natural frequencies that depend on its structure, materials and boundary conditions. In music, normal modes of vibrating instruments (strings, air pipes, drums, etc.) are called "harmonics" or "overtones".

The most general motion of a system is a superposition of its normal modes. The modes are normal in the sense that they can move independently, that is to say that an excitation of one mode will never cause motion of a different mode. In mathematical terms, normal modes are orthogonal to each other.
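
As a minimal illustration of these statements (not part of the ProDy workflow), the sketch below assumes a toy system of two unit masses connected to each other and to two walls by identical springs. The matrix `K` and all variable names are hypothetical and chosen only for this example. The eigenvectors of the stiffness matrix are the normal modes, and their dot product shows that they are orthogonal.

```python
import numpy as np

# Stiffness matrix of a toy system (an assumption for illustration):
# wall - spring - mass - spring - mass - spring - wall, with k = 1 and m = 1.
K = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# For unit masses, the eigenvalues of K are the squared angular frequencies
# and the eigenvectors (columns of `modes`) are the normal modes.
omega_sq, modes = np.linalg.eigh(K)
frequencies = np.sqrt(omega_sq)

# Orthogonality: the dot product of two different modes vanishes.
overlap = modes[:, 0] @ modes[:, 1]

print("squared frequencies:", omega_sq)
print("overlap between the two modes:", overlap)
```

The slower mode moves both masses in the same direction, while the faster mode moves them in opposite directions.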

## [ProDy](http://prody.csb.pitt.edu/)

### Introduction

ProDy is a free and open-source Python package for protein structural dynamics analysis. It is designed as a flexible and responsive API suitable for interactive usage and application development.

### Structure analysis
ProDy has fast and flexible PDB and DCD file parsers, and powerful and customizable atom selections for contact identification, structure comparisons, and rapid implementation of new methods.

### Dynamics analysis
In this lecture, we are going to use ProDy for Normal Mode Analysis (NMA), which can be performed using any of the following models:

* Anisotropic network model (ANM)
* Gaussian network model (GNM)
* ANM/GNM with distance and property dependent force constants

Here, we will perform a Gaussian network model (GNM) analysis on ubiquitin.

It's worth mentioning that dynamics from experimental datasets, theoretical models and simulations can be visualized using [NMWiz](http://prody.csb.pitt.edu/nmwiz/). Normal Mode Wizard (NMWiz) is a VMD plugin designed for visual comparative analysis of normal mode data, i.e. modes may come from principal component, essential dynamics, normal mode analysis or may be any vector describing a molecular motion.


### Installation
You can install `ProDy` package using the following commands in your terminal:


1. Installing to your base Python installation:
```bash
pip install prody matplotlib
```

1. Installing using Conda (Recommended):
```bash
conda create -n nma
conda activate nma
conda install ipykernel matplotlib conda-forge::prody
```  
2. If using JupyterLab, register the environment as a kernel:   
```bash
python -m ipykernel install --user --name=nma
```
3. Make sure to activate your environment as a kernel in your notebook (top right)


### Getting Started
Ok great! Now let's start using `ProDy` :)
To start using ProDy and turn on this notebook's interactive mode, run the following cell. It imports everything from the ProDy package and enables interactive mode.

In [None]:
from prody import *
from pylab import *

ion() # turns interactive mode on

## Gaussian Network Model (GNM) Analysis with ProDy

This example shows how to perform GNM calculations using an X-ray structure of `ubiquitin`. `Ubiquitin` is a small, 76-amino acid, regulatory protein that was discovered in 1975. It's present in all eukaryotic cells, directing the movement of important proteins in the cell, participating in both the synthesis of new proteins and the destruction of defective proteins.

We will obtain a GNM instance that stores the Kirchhoff matrix and the normal mode data describing the intrinsic dynamics of the protein structure. GNM instances and individual normal modes (Mode) can be used as input to functions in the `prody.dynamics` module.

We'll go through the analysis step by step. 

### 1. Parsing PDB file
Ok, let's parse the protein's PDB structure file with `ProDy`. This can be done using `parsePDB()` function from `ProDy`. The function will accept an identifier as the PDB ID. If the PDB file with the given ID exists in the local directory, it will load the file locally. Otherwise, the PDB file will be downloaded automatically.

![Ubiquitin 3D structure](https://proteopedia.org/wiki/images/c/c4/Lysubq.png)

Reference: https://proteopedia.org/wiki/index.php/Image:Lysubq.png

In [None]:
ubiquitin = parsePDB('1aar')

In [None]:
ubiquitin

This file contains 2 chains, each with a flexible C-terminus (residues 71-76). We only want to use the Cα atoms of the first 70 residues from chain A, so we select them:

In [None]:
c_alphas = ubiquitin.select('calpha and chain A and resnum < 71')

In [None]:
c_alphas

`ProDy` provides comprehensive documentation of its atom selection syntax [here](http://prody.csb.pitt.edu/manual/reference/atomic/select.html#selections), where you can learn more about the available selections.

One term worth knowing here is __Cα__, the central point in the backbone of every amino acid. The alpha carbon (α-carbon or Cα) connects the amino group to the acidic carboxyl group, which is what gives amino acids their name. The alpha carbon also serves as the point of attachment for the sidechains of 19 out of the 20 amino acids used in protein building.

![](https://static.wikia.nocookie.net/foldit/images/e/ea/Backbone_overview_group.stickpolarh.png/revision/latest?cb=20180103000343)

Figure Reference: https://static.wikia.nocookie.net/foldit/images/e/ea/Backbone_overview_group.stickpolarh.png/revision/latest?cb=20180103000343

### Build Kirchhoff Matrix
First, let's create a `GNM` object.

In [None]:
gnm = GNM(name='Ubiquitin')

In [None]:
gnm

We can build the Kirchhoff matrix from the selected atoms using the `gnm.buildKirchhoff()` method. The method accepts the coordinates as its first argument and two additional parameters: `cutoff`, the cutoff distance (in Å) for pairwise interactions, and `gamma`, the spring constant. You can leave both parameters at their default values and move on.

In [None]:
gnm.buildKirchhoff(coords=c_alphas, cutoff=10.0, gamma=1.0)

Now that the Kirchhoff matrix is built, we can get a copy of it using the `gnm.getKirchhoff()` method.

In [None]:
k_matrix = gnm.getKirchhoff()

In [None]:
k_matrix

__NOTE__: If you have already calculated the Kirchhoff matrix and want to set it manually, you can use `gnm.setKirchhoff()` method.
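
To make the roles of `cutoff` and `gamma` concrete, here is a minimal NumPy sketch of how a GNM Kirchhoff matrix can be assembled from coordinates. The function `kirchhoff_sketch` and the toy coordinates are hypothetical names introduced for illustration. This is not ProDy's implementation, and details such as whether the cutoff comparison is inclusive are assumptions.

```python
import numpy as np

def kirchhoff_sketch(coords, cutoff=10.0, gamma=1.0):
    """Toy GNM Kirchhoff matrix (illustrative, not ProDy's own code).

    Off-diagonal entries are -gamma for node pairs closer than `cutoff`
    and 0 otherwise; each diagonal entry is gamma times the number of contacts.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise distance matrix between all nodes.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumption: a pair is in contact when its distance is <= cutoff.
    contacts = dist <= cutoff
    np.fill_diagonal(contacts, False)
    kirchhoff = -gamma * contacts.astype(float)
    # Each row must sum to zero, so the diagonal balances the off-diagonals.
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    return kirchhoff

# Four nodes on a line, 5 Å apart; with a 7 Å cutoff only neighbours interact.
toy_coords = [[0, 0, 0], [5, 0, 0], [10, 0, 0], [15, 0, 0]]
toy_kirchhoff = kirchhoff_sketch(toy_coords, cutoff=7.0)
print(toy_kirchhoff)
```

Because every row sums to zero, such a matrix always has at least one zero eigenvalue, which corresponds to the trivial mode.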

### It's time to calculate the normal modes! :)
Now that we have the Kirchhoff matrix, calculating the normal modes can be done simply by using the `gnm.calcModes()` method.

By default, 20 non-zero (non-trivial) modes and 1 trivial mode are calculated, and the trivial mode is not retained. To calculate a different number of non-zero modes or to keep the zero modes, change the `n_modes` and `zeros` parameters of the method, e.g. try `gnm.calcModes(50, zeros=True)`. The `turbo` parameter, which is `True` by default, selects a faster but more memory-intensive way of calculating the modes.

In [None]:
gnm.calcModes(n_modes=20, zeros=False, turbo=True)

Ok. The modes are ready. You can get the results using the following methods:

* `gnm.getEigvals()`: Get Eigenvalues
* `gnm.getEigvecs()`: Get Eigenvectors
* `gnm.getCovariance()`: Get Covariance matrix. Note that covariance matrices are calculated using the available modes in the model, which is the slowest 20 modes in this case. If the user calculates `M` modes, these `M` modes will be used in calculating the covariance matrix.

In [None]:
gnm.getEigvals()

In [None]:
gnm.getEigvals().shape

In [None]:
gnm.getEigvecs()

In [None]:
gnm.getEigvecs().shape

In [None]:
gnm.getCovariance()

In [None]:
gnm.getCovariance().shape
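
The relationship between modes and the covariance matrix described above can be sketched with plain NumPy. The example below assumes a small hand-written Kirchhoff matrix (a chain of four nodes) and a hypothetical helper `covariance_from_modes`; it illustrates the general idea of summing mode contributions weighted by inverse eigenvalues and is not ProDy's own code.

```python
import numpy as np

# Toy Kirchhoff matrix for a chain of four nodes (an assumption for illustration).
kirchhoff = np.array([[ 1., -1.,  0.,  0.],
                      [-1.,  2., -1.,  0.],
                      [ 0., -1.,  2., -1.],
                      [ 0.,  0., -1.,  1.]])

# Eigenvalues come back in ascending order, so the slowest modes are first.
eigvals, eigvecs = np.linalg.eigh(kirchhoff)

# Discard the trivial mode, whose eigenvalue is numerically zero.
nonzero = eigvals > 1e-8
eigvals, eigvecs = eigvals[nonzero], eigvecs[:, nonzero]

def covariance_from_modes(eigvals, eigvecs, n_modes):
    """Sum of outer(v_k, v_k) / lambda_k over the first n_modes modes."""
    vals = eigvals[:n_modes]
    vecs = eigvecs[:, :n_modes]
    # Dividing by `vals` scales column k of `vecs` by 1 / lambda_k.
    return (vecs / vals) @ vecs.T

cov_all = covariance_from_modes(eigvals, eigvecs, n_modes=3)
cov_slowest = covariance_from_modes(eigvals, eigvecs, n_modes=1)
print(cov_all.shape, cov_slowest.shape)
```

With all non-zero modes included, the result equals the pseudo-inverse of the Kirchhoff matrix; with fewer modes it is an approximation dominated by the slowest motions.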

To access an individual mode, you can index the `gnm` object. Normal mode indices start from 0, so the slowest mode has index 0.

In [None]:
slowest_mode = gnm[0]

In [None]:
slowest_mode.getEigval()

In [None]:
slowest_mode.getEigvec()

### Hinge sites
Hinge sites can be identified from all calculated modes using the global `calcHinges()` function. This function accepts the `gnm` object as its first argument.

In [None]:
hinges = calcHinges(modes=gnm)

In [None]:
len(hinges)

In [None]:
hinges[:20]

### Short question! Calculate the hinges of the slowest mode

In [None]:
### YOUR CODE GOES HERE
slowest_hinge = calcHinges(modes=slowest_mode)
slowest_hinge

These numbers correspond to node indices in the `gnm` object, which does not know anything about the original atoms. To get the residue numbers corresponding to these hinges, we can index the `resnums` array with the hinges list as follows:

In [None]:
# 1. get a copy of residue numbers
resnums = c_alphas.getResnums() 

In [None]:
# 2. calculate hinge site of the mode of interest (here we used the 2nd mode)
mode2_hinges = calcHinges(gnm[1])

In [None]:
# 3. get residue numbers corresponding to these hinges
resnums[mode2_hinges]
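
For intuition about what a hinge is, the sketch below treats a hinge candidate as a position where the mode shape changes sign, and then maps node indices to residue numbers by indexing, as in the cells above. The helper `sign_change_sites` and the toy data are hypothetical, and ProDy's `calcHinges()` may apply additional criteria and report different sites.

```python
import numpy as np

def sign_change_sites(mode_shape):
    """Indices i where the mode shape changes sign between node i and node i + 1.

    A simplified stand-in for hinge detection (an assumption for illustration).
    """
    signs = np.sign(np.asarray(mode_shape, dtype=float))
    return np.nonzero(signs[:-1] * signs[1:] < 0)[0]

# Toy mode shape over eight nodes and the matching residue numbers.
toy_mode = np.array([0.4, 0.3, 0.1, -0.2, -0.3, -0.1, 0.2, 0.5])
toy_resnums = np.arange(1, 9)

sites = sign_change_sites(toy_mode)   # node indices
site_resnums = toy_resnums[sites]     # residue numbers at those nodes
print(sites, site_resnums)
```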

### Visualization!
All of the `ProDy` visualization functions are prefixed with __`show...()`__. Let’s use some of them to plot data:

#### Contact Map

In [None]:
showContactMap(gnm)

#### Cross Correlations

In [None]:
showCrossCorr(gnm)
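
A cross-correlation map is commonly obtained by normalizing the covariance matrix so that each entry lies between -1 and 1. The sketch below shows that normalization on a small made-up covariance matrix; `normalize_covariance` and `toy_cov` are hypothetical names, and the code is an illustration rather than ProDy's implementation.

```python
import numpy as np

def normalize_covariance(cov):
    """Return C_ij = cov_ij / sqrt(cov_ii * cov_jj)."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

# Made-up covariance matrix for three nodes (illustrative values only).
toy_cov = np.array([[ 4.0, 1.0, -1.0],
                    [ 1.0, 1.0,  0.0],
                    [-1.0, 0.0,  1.0]])
toy_corr = normalize_covariance(toy_cov)
print(toy_corr)
```

Positive entries indicate nodes that move in the same direction, negative entries indicate anticorrelated motion, and the diagonal is always 1.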

#### Slow mode shape
By default, hinge sites are marked with __red__ stars in the mode shape plot; this can be turned off by setting `hinges=False`. The option `zero=True` draws a reference line at zero.

In [None]:
showMode(slowest_mode, hinges=True, zero=True)
grid()

#### Square fluctuations

In [None]:
showSqFlucts(slowest_mode, hinges=True)
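
As a rough guide to what this plot shows, the square fluctuation of a node contributed by one mode can be written as the squared eigenvector component divided by the eigenvalue, and contributions from several modes add up. The sketch below uses made-up eigenvalues and eigenvectors; `square_fluctuations` is a hypothetical helper and the scaling used by ProDy may differ.

```python
import numpy as np

def square_fluctuations(eigvals, eigvecs):
    """Per-node sum over modes of (eigenvector component)**2 / eigenvalue."""
    eigvals = np.asarray(eigvals, dtype=float)
    eigvecs = np.asarray(eigvecs, dtype=float)
    # Column k of eigvecs is mode k; dividing by eigvals scales each column.
    return (eigvecs ** 2 / eigvals).sum(axis=1)

# Two made-up orthonormal modes (columns) over two nodes.
toy_eigvals = np.array([0.5, 2.0])
toy_eigvecs = np.array([[0.6,  0.8],
                        [0.8, -0.6]])

flucts_slowest = square_fluctuations(toy_eigvals[:1], toy_eigvecs[:, :1])
flucts_all = square_fluctuations(toy_eigvals, toy_eigvecs)
print(flucts_slowest, flucts_all)
```

The slowest mode has the smallest eigenvalue and therefore contributes the most to the total fluctuations.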

#### Protein structure bipartition
Given a GNM mode, protein structure can be partitioned into two parts that move with respect to each other. The function `showProtein()` can take a __GNM mode__ as input and visualize the bipartition.

In [None]:
showProtein(c_alphas, mode=gnm[0])

## Deliverable
Explore the Protein Data Bank, find a protein .pdb file of your choice, and perform normal mode analysis on it. For ease, duplicate this notebook under a different name. Using the new .pdb, run all the cells as before and generate new plots. Analyze your results in a write-up of at least 250 words, not including figure names or descriptions. Support your analysis with your plots, referring to them by figure number and describing them in detail. Submit as a .pdf to bCourses.  

Adapted from: https://github.com/Naghipourfar/molecular-biomechanics/