## CCP-BioSim Training Course:
### Using PCA for the Analysis of MD Simulations

The aim of this notebook is to illustrate methods we can use to assess convergence and sampling in MD trajectories. You will compare and contrast the most basic and widely used method, RMSD analysis, with more sophisticated PCA-based approaches. You will apply the approaches to two common scenarios: firstly, comparing the dynamics of a protein in the presence and absence of a bound ligand, and secondly, evaluating sampling and convergence in an ensemble of independent, replicate MD trajectories of a protein.

In the process, you will see how iPython (Jupyter) notebooks help with interactive data exploration and analysis.

### Prerequisites

We assume: 

1. you have a basic understanding of how to use an iPython (Jupyter) notebook.

2. you already have a basic understanding of PCA methods. For more information about **pyPcazip**, the PCA package used here, see [here](https://bitbucket.org/ramonbsc/pypcazip) and the paper [Shkurti et al., SoftwareX](http://dx.doi.org/10.1016/j.softx.2016.04.002).

**pyPcazip** uses a Python library called **MDPlus**, and that is mainly what you will see in action here. For a guide to the **MDPlus** API, see [here](http://pypcazip.readthedocs.io/en/latest/).
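As a quick refresher on the two ideas this course contrasts, the sketch below computes a per-frame RMSD and a PCA for a small trajectory using only numpy. It does not use the **MDPlus** API. The trajectory is synthetic (one large collective motion plus small random noise), and every variable name is invented for illustration. The frames are assumed to be already aligned, so no least-squares fitting is done.

```python
import numpy as np

# Synthetic "trajectory" (NOT real MD data): 200 frames, 10 atoms, xyz coordinates.
rng = np.random.default_rng(0)
n_frames, n_atoms = 200, 10
reference = rng.normal(size=(n_atoms, 3))

# One collective motion of large amplitude, plus small random noise.
mode = rng.normal(size=(n_atoms, 3))
mode /= np.linalg.norm(mode)
amplitude = np.sin(np.linspace(0, 4 * np.pi, n_frames))
noise = 0.05 * rng.normal(size=(n_frames, n_atoms, 3))
traj = reference + 2.0 * amplitude[:, None, None] * mode + noise

# RMSD of each frame from the first frame (frames assumed pre-aligned).
diff = traj - traj[0]
rmsd = np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))

# PCA: flatten each frame to a 3N vector, remove the mean structure,
# then take the SVD of the centred coordinate matrix.
X = traj.reshape(n_frames, -1)
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

variance = S ** 2 / (n_frames - 1)      # variance along each principal component
explained = variance / variance.sum()   # fraction of the total variance
projections = X_centered @ Vt.T         # coordinates of each frame in PC space

print('Fraction of variance in PC1: {:.3f}'.format(explained[0]))
```

RMSD reduces each frame to a single number, whereas the projections keep a separate coordinate for each principal component. That extra information is what the PCA-based analyses later in the course rely on.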

## Part 0: Check your environment
First, we need to check that your Python environment has everything you need installed.

Run the code in the cell below to do this.

In [None]:
all_good = True
try:
    import numpy as np
except ImportError:
    print('Error - you need to install numpy')
    all_good = False
try:
    from MDPlus.core import Fasu, Cofasu
    from MDPlus.analysis import mapping, pca
except ImportError:
    print('Error - you need to install pyPcazip')
    all_good = False
try:
    import matplotlib.pyplot as plt
    %matplotlib inline
except ImportError:
    print('Error - you need to install matplotlib')
    all_good = False
    
try:
    import nglview as nv
except ImportError:
    print('Error - you need to install nglview')
    all_good = False

files = ['wt_ca.binpos', 'wt_ca.pdb', 'irhy_ca.binpos', 'irhy_ca.pdb', '1rhw_prot.pdb']
files += ['rep{}/1rhw.md1.nc'.format(i + 1) for i in range(8)]
import os
for file in files:
    if not os.path.exists(file):
        print('Error - can\'t find data file {}'.format(file))
        all_good = False
if all_good:
    print('Success - you seem to have everything ready to go.')

If running the cell above led to error messages, please fix these before you try to go any further.
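If it is not obvious which package is missing, you can also list every missing package in one pass without importing anything. The sketch below uses the standard-library `importlib.util.find_spec`. The helper name `find_missing` is hypothetical, and the module list simply mirrors the imports in the check cell (**MDPlus** is the module that pyPcazip provides).

```python
import importlib.util

def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for some broken or partially installed packages.
            spec = None
        if spec is None:
            missing.append(name)
    return missing

# Top-level modules used by the check cell.
required = ['numpy', 'MDPlus', 'matplotlib', 'nglview']
print('Missing modules:', find_missing(required))
```

An empty list means that all four modules can be found. It does not guarantee that they import cleanly, so the full check cell remains the definitive test.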
____

[Next: Part 1: Comparison of trajectories of wild-type neuraminidase and of the I233R/H275Y (IRHY) double mutant.](./PCA_analysis_of_MD_simulations-1.ipynb)