# Propka analysis of a protein structure

This tutorial illustrates how to use [Propka](https://github.com/jensengroup/propka-3.1) to analyse the pKas of acidic and basic residues in the structure of the cysteine protease cruzain (PDB code 2oz2).

`Propka` is a command-line application and you will work with it in the terminal window. This Jupyter notebook window provides the step-by-step instructions, and also the molecular structure viewer.

In addition to `Propka` itself, the tutorial makes use of two further Python packages:
 - [nglview](http://nglviewer.org/nglview/latest/): A package to embed a molecular viewer in a Jupyter notebook.
 - [mdtraj](http://www.mdtraj.org): A very useful trajectory analysis package.
 
## Prerequisites:

You must have the Python packages `propka`, `nglview` and `mdtraj` installed.

All can be installed using `pip`:

    pip install mdtraj
    pip install nglview
    pip install propka


You also need a copy of the PDB file `2oz2.pdb` in the `./data` directory below the one you launch this notebook from. If it is not already there, download it from the [PDB website](http://www.rcsb.org/structure/2oz2).
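
If you would rather fetch the file from Python than from the browser, the sketch below shows one way to do it with the standard library only. The download address pattern and the helper names `pdb_url` and `fetch_pdb` are assumptions made for this illustration, so check the address against the PDB website if the download fails.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: RCSB serves PDB-format files at this address pattern.
RCSB_URL = 'https://files.rcsb.org/download/{}.pdb'


def pdb_url(pdb_code):
    """Build the (assumed) download URL for a four-character PDB code."""
    return RCSB_URL.format(pdb_code.lower())


def fetch_pdb(pdb_code, directory='data'):
    """Download the structure into `directory` unless it is already there."""
    target = Path(directory) / '{}.pdb'.format(pdb_code.lower())
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(pdb_url(pdb_code), str(target))
    return target


# Example call (needs network access), left commented out on purpose:
# fetch_pdb('2oz2')
```

Because `fetch_pdb` returns early when the file already exists, it is safe to call it every time the notebook is run.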

-----

In this first cell we check we have everything we need to run the tutorial:

In [None]:
OK = True
import shutil
import os.path as op
pdb_file = 'data/2oz2.pdb'
try:
    import mdtraj as mdt
except ImportError:
    print('Error: you do not seem to have the MDTraj Python package installed')
    OK = False

# shutil.which searches the PATH for the propka3 executable without needing a shell
if shutil.which('propka3') is None:
    print('Error: you do not seem to have Propka installed')
    OK = False
    
try:
    import nglview as nv
except ImportError:
    print('Error: you don\'t seem to have nglview installed - use pip or similar to get it then try again.')
    OK = False

if not op.exists(pdb_file):
    print('Error: you don\'t seem to have the data file {} in this directory.'.format(pdb_file))
    OK = False

if not OK:
    print('This notebook will not work until you fix these issues')
else:
    print('Success: you seem to have all the packages installed that are needed.')

## Part 1: visualisation of the protein.

In this cell we load the PDB file and visualise it using `nglview`:

In [None]:
traj = mdt.load(pdb_file)
view = nv.show_mdtraj(traj)
view

If you look carefully at the structure above, you will see that it contains two copies of the protein, because that is what was observed in the crystal structure's unit cell. At the interface between them you can see the two copies of the ligand molecule. In the PDB file one copy of the protein is labelled chain A and the other chain C. We will analyse each of these independently.
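
To check which chain labels a PDB file actually uses, you can read them straight from the `ATOM` and `HETATM` records. The sketch below relies on the fixed-column layout of the PDB format (chain identifier in column 22, residue number in columns 23 to 26). The function name `residues_per_chain` is made up for this example, and the sample lines are a toy fragment, not taken from 2oz2.

```python
def residues_per_chain(pdb_lines):
    """Count distinct residues per chain ID from ATOM/HETATM records."""
    chains = {}
    for line in pdb_lines:
        if line.startswith(('ATOM', 'HETATM')):
            chain_id = line[21]                    # column 22: chain identifier
            residue = (line[17:20], line[22:27])   # name, number + insertion code
            chains.setdefault(chain_id, set()).add(residue)
    return {chain: len(residues) for chain, residues in chains.items()}


# Toy records for illustration only (not real 2oz2 data)
sample = [
    'ATOM      1  N   ALA A   1',
    'ATOM      2  CA  ALA A   1',
    'ATOM      3  N   GLY A   2',
    'ATOM      4  N   ALA C   1',
]
print(residues_per_chain(sample))

# With the real file (counts include ligands and waters listed as HETATM):
# with open('data/2oz2.pdb') as handle:
#     print(residues_per_chain(handle))
```

Note that the counts cover every residue-like group in a chain, so they will be larger than the number of amino acids if the chain also lists ligands or water molecules.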

## Part 2: pKa prediction using Propka

Propka is a command-line tool, not a Python function, so in the terminal window type:

    propka3 data/2oz2.pdb -c A
    
At the end of the output is a table of the predicted pKa of each acidic and basic residue. Which of Asp57 and Asp60 is predicted to be protonated at physiological pH?
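
To turn a predicted pKa into a protonation state, you can use the Henderson-Hasselbalch relation: the fraction of a group that is protonated at a given pH is 1 / (1 + 10^(pH - pKa)). The sketch below applies it, assuming a physiological pH of 7.4. The function name and the example pKa values are illustrative and are not Propka results for this structure.

```python
def fraction_protonated(pka, ph=7.4):
    """Henderson-Hasselbalch: fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative pKa values only (not predictions for 2oz2)
for label, pka in [('pKa well below the pH', 4.0),
                   ('pKa equal to the pH', 7.4),
                   ('pKa one unit above the pH', 8.4)]:
    print('{}: {:.3f} protonated'.format(label, fraction_protonated(pka)))
```

A group whose pKa equals the pH is half protonated. As a rule of thumb, an acidic residue is treated as protonated when its predicted pKa is above the pH of interest.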

There is another acidic residue with an unusually high pKa – which is it? From examination of the structure, can you explain why?

If you run the code in the next cell, it will change the view of the protein in the cell above to zoom in on this region to help you answer this question.

In [None]:
view.clear_representations()
view.add_cartoon(':A and protein')
view.add_representation('ball+stick', ':A and acidic')
view.add_representation('label', '57:A.CB')
view.add_representation('label', '60:A.CB')
view.add_representation('label', '50:A.CB')
view.center('50:A')

Back in the terminal window, repeat the analysis on the other protein chain, and compare the results:

    propka3 data/2oz2.pdb -c C
   
You will probably notice some differences, though not large ones. If you run the code in the cell below, it will reset the molecule view above to zoom in on the relevant region:

In [None]:
view.clear_representations()
view.add_cartoon(':B and protein')
view.add_representation('ball+stick', ':B and acidic')
view.add_representation('label', '57:B.CB')
view.add_representation('label', '60:B.CB')
view.add_representation('label', '50:B.CB')
view.center('50:B')
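
If you want to compare the two chains more systematically than by eye, one option is to save the terminal output of each run to a text file (for example with shell redirection) and parse the summary table. The sketch below is only a starting point. It assumes the summary rows follow a line reading `SUMMARY OF THIS PREDICTION` and have the form `RES NUM CHAIN pKa model-pKa`, so adjust the parser if your Propka version prints something different. The function names are made up for this example, and the numbers in the sample text are invented, not results for 2oz2.

```python
def parse_pka_summary(text):
    """Map (residue name, residue number) to predicted pKa.

    Assumption: summary rows follow a 'SUMMARY OF THIS PREDICTION' line
    and read 'RES NUM CHAIN pKa model-pKa'; adjust if your output differs.
    """
    pkas = {}
    in_summary = False
    for line in text.splitlines():
        if line.strip().startswith('SUMMARY OF THIS PREDICTION'):
            in_summary = True
            continue
        fields = line.split()
        if not in_summary or len(fields) < 5:
            continue
        try:
            pkas[(fields[0], int(fields[1]))] = float(fields[3])
        except ValueError:
            continue  # column header or other non-data line
    return pkas


def compare_chains(pkas_first, pkas_second):
    """Return (residue, pKa in first, pKa in second, shift) for shared residues."""
    shared = sorted(set(pkas_first) & set(pkas_second), key=lambda res: res[1])
    return [(res, pkas_first[res], pkas_second[res],
             round(pkas_second[res] - pkas_first[res], 2)) for res in shared]


# Made-up numbers for illustration only; they are NOT Propka results for 2oz2.
chain_a_text = """SUMMARY OF THIS PREDICTION
       Group      pKa  model-pKa   ligand atom-type
   ASP  10 A     3.50       3.80
   GLU  20 A     6.00       4.50
"""
chain_c_text = """SUMMARY OF THIS PREDICTION
       Group      pKa  model-pKa   ligand atom-type
   ASP  10 C     3.50       3.80
   GLU  20 C     6.50       4.50
"""
for row in compare_chains(parse_pka_summary(chain_a_text),
                          parse_pka_summary(chain_c_text)):
    print(row)
```

Residues are matched by name and number only, which works here because the two chains are copies of the same protein.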

## Exercise

What about the histidine residues in this structure? Are they predicted to be neutral or protonated? Adapt the code in the cell above to zoom in on other "interesting" regions of the protein.


## Summary

The microenvironment inside a protein is a very different place from bulk water, and so the true pKas of ionizable groups can be significantly different from what you might expect. Using a tool like `Propka` is a vital step in preparing a protein for biomolecular simulation.