<span style="float:right"><a href="http://moldesign.bionano.autodesk.com/" target="_blank" title="About">About</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="https://forum.bionano.autodesk.com/c/Molecular-Design-Toolkit" target="_blank" title="Forum">Forum</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="https://github.com/autodesk/molecular-design-toolkit/issues" target="_blank" title="Issues">Issues</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://bionano.autodesk.com/MolecularDesignToolkit/explore.html" target="_blank" title="Tutorials">Tutorials</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://autodesk.github.io/molecular-design-toolkit/" target="_blank" title="Documentation">Documentation</a></span>
![Molecular Design Toolkit](img/Top.png)
<br>

<center><h1>Example 4: The Dynamics of HIV Protease bound to a small molecule </h1> </center>

This notebook downloads a co-crystallized protein / small molecule ligand structure from [the Protein Data Bank (PDB)](http://www.rcsb.org/pdb/home/home.do) and prepares it for molecular dynamics simulation.

 - _Author_: [Aaron Virshup](https://github.com/avirshup), Autodesk Research<br>
 - _Created on_: August 9, 2016
 - _Tags_: HIV Protease, small molecule, ligand, drug, PDB, MD

In [None]:
import moldesign as mdt
import moldesign.units as u

Contents
=======
---
   - [I. The crystal structure](#I.-The-crystal-structure)
     - [A. Download and visualize](#A.-Download-and-visualize)
     - [B. Try assigning a forcefield](#B.-Try-assigning-a-forcefield)
   - [II. Parameterizing a small molecule](#II.-Parameterizing-a-small-molecule)
     - [A. Isolate the ligand](#A.-Isolate-the-ligand)
     - [B. Assign bond orders and hydrogens](#B.-Assign-bond-orders-and-hydrogens)
     - [C. Generate forcefield parameters](#C.-Generate-forcefield-parameters)
   - [III. Prepping the protein](#III.-Prepping-the-protein)
     - [A. Strip waters](#A.-Strip-waters)
     - [B. Histidine](#B.-Histidine)
   - [IV. Prep for dynamics](#IV.-Prep-for-dynamics)
     - [A. Assign the forcefield](#A.-Assign-the-forcefield)
     - [B. Attach and configure simulation methods](#B.-Attach-and-configure-simulation-methods)
     - [D. Equilibrate the protein](#D.-Equilibrate-the-protein)


## I. The crystal structure

First, we'll download and investigate the [3AID crystal structure](http://www.rcsb.org/pdb/explore.do?structureId=3aid).

### A. Download and visualize

In [None]:
protease = mdt.from_pdb('3AID')
protease

In [None]:
protease.draw()

### B. Try assigning a forcefield

This structure is not ready for MD: the command below will raise a `ParameterizationError` exception. After running it, click on the **Errors/Warnings** tab to see why.

In [None]:
newmol = mdt.assign_forcefield(protease)

You should see 3 errors: 
 1. The residue name `ARQ` was not recognized
 1. Atom `HD1` in residue `HIS69`, chain `A` was not recognized
 1. Atom `HD1` in residue `HIS69`, chain `B` was not recognized
 
(There's also a warning about bond distances, but these can generally be fixed with an energy minimization before running dynamics.)

We'll start by tackling the small molecule "ARQ".

## II. Parameterizing a small molecule
We'll use GAFF (the General AMBER Force Field) to create force field parameters for the small ligand.

### A. Isolate the ligand
Click on the ligand to select it, then we'll use that selection to create a new molecule.

In [None]:
sel = mdt.widgets.ResidueSelector(protease)
sel

In [None]:
drugres = mdt.Molecule(sel.selected_residues[0])
drugres.draw2d(width=700)

### B. Assign bond orders and hydrogens
A PDB file provides only limited information: the crystal structure doesn't indicate bond orders, hydrogen locations, or formal charges. We can add those with the `add_missing_data` tool:

In [None]:
drugmol = mdt.add_missing_data(drugres)
drugmol.draw(width=500)

In [None]:
drugmol
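
To see why bond orders have to be settled before hydrogens can be placed, here is a minimal sketch in plain Python. It is not how `add_missing_data` works internally; the `TYPICAL_VALENCE` table and the `missing_hydrogens` helper are hypothetical names introduced for illustration, and the valence rule is a textbook simplification that only covers neutral C, N, and O atoms.

```python
# Typical valences of neutral atoms (a textbook simplification, assumed here).
TYPICAL_VALENCE = {"C": 4, "N": 3, "O": 2}


def missing_hydrogens(element, bond_orders):
    """Estimate how many hydrogens a neutral atom still needs.

    bond_orders lists the orders of the bonds to its heavy-atom neighbors.
    """
    used = sum(bond_orders)
    return max(TYPICAL_VALENCE[element] - used, 0)


# The same carbon with the same two neighbors needs a different number of
# hydrogens depending on how the bond orders are assigned.
print(missing_hydrogens("C", [2, 1]))  # one double + one single bond -> 1
print(missing_hydrogens("C", [1, 1]))  # two single bonds -> 2
```

Because a crystal structure records only heavy-atom positions, both assignments are compatible with the same coordinates, which is why bond orders and hydrogens are assigned together.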

### C. Generate forcefield parameters

We'll next generate forcefield parameters using this ready-to-simulate structure.

**NOTE**: for computational speed, we use the `gasteiger` charge model. This is not advisable for production work! `am1-bcc` or `esp` are far likelier to produce sensible results.

In [None]:
drug_parameters = mdt.parameterize(drugmol, charges='gasteiger')

## III. Prepping the protein

Section II dealt with getting forcefield parameters for an unknown small molecule. Next, we'll prep the other part of the structure.

### A. Strip waters

Waters in crystal structures are usually stripped from a simulation as artifacts of the crystallization process. Here, we'll remove the waters from the protein structure.

In [None]:
dehydrated = mdt.Molecule([atom for atom in protease.atoms if atom.residue.type != 'water'])

### B. Histidine
Histidine is notoriously tricky because it can exist in no fewer than three different protonation states at biological pH (7.4): the "delta-protonated" form, referred to with residue name `HID`; the "epsilon-protonated" form, aka `HIE`; and the doubly-protonated form `HIP`, which has a +1 charge. Unfortunately, crystallography usually cannot distinguish between these three.

Luckily, these histidines are pretty far from the ligand binding site, so their protonation is unlikely to affect the dynamics. We'll therefore use the `guess_histidine_states` function to assign a reasonable starting guess.

In [None]:
mdt.guess_histidine_states(dehydrated)
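
The three residue names differ only in which ring nitrogens carry a proton. The sketch below maps the ring protons present in a residue to the Amber-style name, assuming the usual atom names `HD1` (on the delta nitrogen) and `HE2` (on the epsilon nitrogen). `histidine_state` is a hypothetical helper written for this illustration; it is not the logic used by `guess_histidine_states`.

```python
def histidine_state(atom_names):
    """Return the Amber-style residue name implied by the ring protons present."""
    has_delta = "HD1" in atom_names      # proton on the delta nitrogen (ND1)
    has_epsilon = "HE2" in atom_names    # proton on the epsilon nitrogen (NE2)
    if has_delta and has_epsilon:
        return "HIP"                     # doubly protonated, +1 charge
    if has_delta:
        return "HID"
    if has_epsilon:
        return "HIE"
    return None                          # no ring N-H present: cannot tell from names


ring_atoms = {"CG", "ND1", "CD2", "CE1", "NE2"}
print(histidine_state(ring_atoms | {"HD1"}))         # HID
print(histidine_state(ring_atoms | {"HE2"}))         # HIE
print(histidine_state(ring_atoms | {"HD1", "HE2"}))  # HIP
```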

## IV. Prep for dynamics

With these problems fixed, we can successfully assign a forcefield and set up the simulation.

### A. Assign the forcefield
Now that we have parameters for the drug and have dealt with histidine, the forcefield assignment will succeed:

In [None]:
sim_mol = mdt.assign_forcefield(dehydrated, parameters=drug_parameters)

### B. Attach and configure simulation methods

Armed with the forcefield parameters, we can connect an energy model to compute energies and forces, and an integrator to create trajectories:

In [None]:
sim_mol.set_energy_model(mdt.models.OpenMMPotential, implicit_solvent='obc', cutoff=8.0*u.angstrom)
sim_mol.set_integrator(mdt.integrators.OpenMMLangevin, timestep=2.0*u.fs)
sim_mol.configure_methods()
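
For intuition about what a Langevin integrator does at each timestep, here is a generic single-particle sketch in reduced units using the BAOAB splitting. It is an illustration only and makes no claim about the scheme OpenMM uses internally; `langevin_step`, `mean_square_velocity`, and all parameter values are assumptions chosen for the example.

```python
import math
import random


def langevin_step(x, v, force, mass, dt, gamma, kT, rng):
    """Advance one particle by one BAOAB Langevin step (reduced units)."""
    v += 0.5 * dt * force(x) / mass   # B: half-step velocity kick from the force
    x += 0.5 * dt * v                 # A: half-step position drift
    c = math.exp(-gamma * dt)         # O: friction damps v, noise re-thermalizes it
    v = c * v + math.sqrt((1.0 - c * c) * kT / mass) * rng.gauss(0.0, 1.0)
    x += 0.5 * dt * v                 # A: second half-step drift
    v += 0.5 * dt * force(x) / mass   # B: second half-step kick
    return x, v


def mean_square_velocity(n_steps=100_000, seed=0):
    """Simulate a harmonic oscillator and return the time average of v**2."""
    rng = random.Random(seed)
    x, v, total = 1.0, 0.0, 0.0
    for _ in range(n_steps):
        x, v = langevin_step(x, v, lambda q: -q, mass=1.0, dt=0.05,
                             gamma=1.0, kT=1.0, rng=rng)
        total += v * v
    return total / n_steps


# Equipartition predicts <v**2> = kT / mass = 1.0 for these parameters.
print(mean_square_velocity())
```

The friction and noise terms together act as the thermostat: with `kT` set to zero the same step only removes kinetic energy, while at finite `kT` the velocity distribution relaxes toward the target temperature.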

### D. Equilibrate the protein
The next series of cells first minimizes the crystal structure to remove clashes, then heats the system to 300 K.

In [None]:
mintraj = sim_mol.minimize()
mintraj.draw()
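
The idea behind minimizing away a clash can be shown with two Lennard-Jones particles placed too close together. This is a toy steepest-descent sketch in reduced units, not the minimizer that `sim_mol.minimize()` runs; the function names, step size, and displacement cap are assumptions made for the illustration.

```python
def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dU/dr for U(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r, step=1e-3, max_move=0.01, n_steps=1000):
    """Steepest descent on the pair distance with a cap on each move.

    The cap keeps the huge repulsive force of a clash from flinging the
    pair far apart in a single step.
    """
    for _ in range(n_steps):
        move = -step * lj_gradient(r)
        move = max(-max_move, min(max_move, move))
        r += move
    return r


r_start = 0.9                        # closer than the energy minimum: a steric clash
r_final = minimize_distance(r_start)
print(r_start, round(r_final, 4))    # the minimum lies at 2**(1/6) * sigma
```

Starting dynamics from the clashing geometry would convert that large repulsive energy into kinetic energy, which is why the minimization comes first.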

In [None]:
traj = sim_mol.run(40*u.ps)
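
As a quick sanity check on the cost of this run, the number of integration steps follows from the run length and the 2 fs timestep configured in section IV.B. The sketch uses plain integers in femtoseconds rather than `moldesign.units` so that it runs on its own; the variable names are illustrative.

```python
FS_PER_PS = 1000        # femtoseconds per picosecond

timestep_fs = 2         # integrator timestep set in section IV.B
run_length_ps = 40      # simulation length passed to sim_mol.run

n_steps = run_length_ps * FS_PER_PS // timestep_fs
print(n_steps)          # 20000
```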

In [None]:
viewer = traj.draw(display=True)
viewer.autostyle()