# Molecular Docking with AutoDock Vina

Inspired by the COVID-19 pandemic, in this exercise we will perform a docking assay of **nirmatrelvir** onto the binding site of a bat coronavirus MPro. The nirmatrelvir/ritonavir combination is Pfizer's new drug Paxlovid, which is used to treat COVID-19.

For our exercise, we will install and use **MGLtools** to prepare the target protein files, **OpenBabel** to prepare the ligand files, **AutoDock Vina** for the docking procedure and **py3Dmol** to set up the search grid and analyze the results. We will also use **ProDy** to align the protein to a crystal structure.

This exercise was adapted from [Lab 06 of IBM3202 Molecular Modeling and Simulation](https://github.com/pb3lab/ibm3202/blob/master/tutorials/lab06_docking.ipynb) of the Institute for Biological and Medical Engineering at Pontificia Universidad Católica de Chile, docking to MPro instead of HIV protease.

<figure>
<center>
<img src='https://raw.githubusercontent.com/pb3lab/ibm3202/master/images/docking_02.png' />
<figcaption>FIGURE 2. General steps of molecular docking. First, the target protein and ligand or ligands are parameterized. Then, the system is prepared by setting up the search grid. Once the docking calculation is performed, ligand poses are scored based on a given energy function. Lastly, the results of the computational search are processed and compared against experimental data for validation.<br><i>Taken from Pars Silico (en.parssilico.com).</i></figcaption></center>
</figure>

# Part 0 – Downloading and Installing the required software

We must install several pieces of software to perform this tutorial. Namely:
- **py3Dmol** for visualization of the protein structure and setting up the search grid.
- **miniconda**, a free minimal installer of **conda** for software package and environment management.
- **OpenBabel** for parameterization of our ligand(s).
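- **MGLtools** for converting our target protein and ligand into the PDBQT format used by AutoDock (adding Gasteiger charges to the ligand).
- **AutoDock Vina** for the docking process.
- **ProDy** for aligning our model to a crystal structure when analyzing the docking results.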

First, we will install the packages that are available through pip.

In [None]:
#Installing py3Dmol using pip
!pip install py3Dmol
!pip install kora

Next, we install the conda package manager with condacolab. Running this cell restarts the kernel; once it has restarted, continue from the following cell.

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install_miniconda()

Finally, we will install MGLtools, OpenBabel, AutoDock Vina and ProDy with conda.

In [None]:
#Install MGLtools, OpenBabel, AutoDock Vina and ProDy from the conda-forge and bioconda channels
!conda install -c conda-forge -c bioconda mgltools openbabel zlib vina prody --yes

# Part I – Preparing the Receptor for AutoDock

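The first step in a molecular docking procedure is to obtain a structure of the target protein. While a high-quality homology model is used in some cases, most docking studies start from an experimentally solved (X-ray, NMR or cryo-EM) three-dimensional structure. For this exercise, we will use a homology model of MPro: we download the aligned model (`bestmodel_aligned.pdb`) and a PQR version of it (`k5f10lg2t7.pqr`), which already carries the atomic charges that we will keep for docking.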

In [None]:
!wget https://raw.githubusercontent.com/CCBatIIT/modelingworkshop/main/labs-complete/1-2/bestmodel_aligned.pdb
!wget https://raw.githubusercontent.com/CCBatIIT/modelingworkshop/main/labs-complete/2-1/k5f10lg2t7_20220306/k5f10lg2t7.pqr

Now we will use **MGLtools** to convert the pqr file to the **PDBQT** format used by AutoDock.

In [None]:
# Convert the .pqr file into a .pdbqt file, merging and deleting non-polar hydrogens
# and lone pairs (-U nphs_lps) while keeping the charges already in the PQR file (-C)
!prepare_receptor4.py -r k5f10lg2t7.pqr -o receptor.pdbqt -C -U nphs_lps -v
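
Because `-C` keeps the charges from the PQR file instead of adding Gasteiger charges, and `-U nphs_lps` merges the charges of the removed non-polar hydrogens into their heavy atoms, the net charge of `receptor.pdbqt` should still come out close to an integer. The sketch below is our own helper, not part of MGLtools. It assumes the usual PDBQT layout, in which each atom record ends with the partial charge followed by the AutoDock atom type, and it sums the charges and counts the atom types:

```python
from collections import Counter


def summarize_pdbqt(pdbqt_text):
    """Return (total partial charge, Counter of AutoDock atom types).

    Assumes each ATOM/HETATM record ends with the partial charge followed
    by the AutoDock atom type, separated by whitespace.
    """
    total_charge = 0.0
    types = Counter()
    for line in pdbqt_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            fields = line.split()
            total_charge += float(fields[-2])  # partial charge column
            types[fields[-1]] += 1             # AutoDock atom type column
    return total_charge, types


# Usage on the receptor prepared above:
# with open("receptor.pdbqt") as fh:
#     charge, types = summarize_pdbqt(fh.read())
# print(f"Net charge: {charge:.2f} e")
# print(types.most_common())
```

If the net charge is far from an integer, the charges were probably not carried over from the PQR file as intended.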

Let's look at the structure using **py3Dmol**

In [None]:
import py3Dmol

viewer = py3Dmol.view()
viewer.addModel(open('k5f10lg2t7.pqr','r').read(), 'pdb')
viewer.setStyle({'cartoon':{}})
viewer.zoomTo()

# Part II – Preparing the Ligand for AutoDock

We now need to prepare the ligand for our docking analysis, **nirmatrelvir**. This drug is an MPro inhibitor used to treat COVID-19 that helps reduce the viral load. Here, we will attempt to predict the binding pose of nirmatrelvir in the binding site of MPro.


First, we will define the connectivity of nirmatrelvir using a SMILES string. The **Simplified Molecular-Input Line-Entry System** (SMILES) is a text notation that represents a chemical structure in a form that a computer can read. The notation for the different bond types and for stereochemistry is as follows:

- `-` for single bonds (e.g. `C-C`, or simply `CC`, for ethane, CH3CH3)
- `=` for double bonds (e.g. `C=C` for ethylene, CH2=CH2)
- `#` for triple bonds (e.g. `C#N` for hydrogen cyanide, HC≡N)
- `:` for aromatic bonds (e.g. `c1:c:c:c:c:c1`, usually written `c1ccccc1`, for benzene)
- `.` for disconnected structures (e.g. `[Na+].[Cl-]` for sodium chloride)
- `/` and `\` for double-bond stereoisomers (e.g. `F/C=C/F` for trans-1,2-difluoroethylene and `F/C=C\F` for cis-1,2-difluoroethylene)
- `@` and `@@` for tetrahedral stereocenters, which distinguish enantiomers (e.g. `N[C@@H](C)C(=O)O` for L-alanine and `N[C@H](C)C(=O)O` for D-alanine)

The SMILES string can be downloaded from [PubChem](https://pubchem.ncbi.nlm.nih.gov/compound/155903259).
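
To make the notation concrete, here is a toy tokenizer that counts the explicitly written atoms in a SMILES string. It is our own illustration, not part of OpenBabel or any library. Bond symbols, ring-closure digits and branches are skipped, and hydrogens are implicit, so they are not counted. Applied to nirmatrelvir, it should recover the heavy-atom part of the molecular formula C23H32F3N5O4:

```python
import re
from collections import Counter

# Toy tokenizer: bracket atoms first, then two-letter organic-subset atoms
# (Br, Cl) before one-letter ones, then aromatic (lowercase) atoms.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(smiles):
    """Count explicitly written atoms by element (implicit H not counted)."""
    counts = Counter()
    for token in ATOM_PATTERN.findall(smiles):
        if token.startswith("["):
            symbol = BRACKET_ELEMENT.match(token).group(1)
        else:
            symbol = token
        counts[symbol.capitalize()] += 1  # aromatic 'c' counts as C
    return counts


nirmatrelvir = ("N#C[C@H](C[C@@H]1CCNC1=O)NC(=O)[C@H]1N(C[C@H]2[C@@H]1C2(C)C)"
                "C(=O)[C@H](C(C)(C)C)NC(=O)C(F)(F)F")
print(heavy_atom_counts(nirmatrelvir))
```

This does not validate the syntax, ring closures or stereochemistry. For real work, rely on OpenBabel (as this notebook does) or a similar toolkit.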

In [None]:
nirmatrelvir_SMILES = "N#C[C@H](C[C@@H]1CCNC1=O)NC(=O)[C@H]1N(C[C@H]2[C@@H]1C2(C)C)C(=O)[C@H](C(C)(C)C)NC(=O)C(F)(F)F"
# Write the SMILES string to a file that OpenBabel can read
with open('nirmatrelvir.smiles', 'w') as f:
    f.write(nirmatrelvir_SMILES + '\n')

Next, we will use this SMILES string to construct and parameterize a three-dimensional structure of nirmatrelvir in **PDBQT** format for molecular docking. We will use **OpenBabel** (`obabel`) to convert the SMILES into a 3D **MOL2** file while simultaneously performing an energy minimization with the Generalized Amber Force Field (**GAFF**). Then, we will use **MGLtools** to parameterize the ligand with **Gasteiger** partial charges.

  Please note that we are generating a ligand in which **all torsions are active** during the docking procedure.

In [None]:
#Convert nirmatrelvir from SMILES into a 3D MOL2 file and energy-minimize the conformer with the GAFF force field (steepest descent, up to 10000 steps)
#Then, prepare the ligand for docking with the AutoDock (MGLtools) script
!obabel nirmatrelvir.smiles -O nirmatrelvir.mol2 --gen3d --best --canonical --minimize --ff GAFF --steps 10000 --sd
!prepare_ligand4.py -l nirmatrelvir.mol2 -o nirmatrelvir.pdbqt -U nphs_lps -v

Let's look at our ligand using py3Dmol

In [None]:
viewer = py3Dmol.view()
viewer.addModel(open('nirmatrelvir.mol2','r').read(), 'mol2')
viewer.setStyle({'stick':{}})
viewer.zoomTo()

**You are all set with your ligand!** Now, we move on to setting up the molecular docking experiment.

# Part III – Setting up and Performing Molecular Docking with AutoDock

The search space for molecular docking on a given target protein must be defined with a **grid box**. This grid box is usually centered on the binding, active or allosteric site of the target protein, and it must be large enough that **all binding-site residues lie inside the grid box**.

  Here, we will use **py3Dmol** to visually inspect the protein structure in cartoon representation and to draw a grid box. The position and size of the grid box are defined by the coordinates of its center and by its dimensions along x, y and z.

  To guide the search for the optimal dimensions and coordinates of the grid box, we will also show residues His41 and Cys145, which form the catalytic dyad.

  The script that defines the visualizer, which we call **ViewProtGrid** (the `viewprotgrid` function), is first loaded into **Colab** with the following code:

In [None]:
#These definitions will enable loading our protein and then
#drawing a box with a given size and centroid on the cartesian space
#This box will enable us to set up the system coordinates for the simulation
#
#ACKNOWLEDGE: This script is largely based on the one created by Jose Manuel 
#Napoles Duarte, Physics Teacher at the Chemical Sciences Faculty of the 
#Autonomous University of Chihuahua (https://github.com/napoles-uach)
#
#First, we define the grid box
def definegrid(view,cx,cy,cz,szx,szy,szz):
  view.addBox({'center':{'x':cx,'y':cy,'z':cz},'dimensions': {'w':szx,'h':szy,'d':szz},'color':'blue','opacity': 0.8})

#Next, we define how the protein will be shown in py3Dmol
#Atoms within 7 Å of the selected residues are also shown as green sticks
def viewprot(view,prot_PDBfile,resids):
  with open(prot_PDBfile, 'r') as fh:
    mol1 = fh.read()
  view.addModel(mol1,'pdb')
  view.setStyle({'cartoon': {'color':'spectrum'}})
  close_to_resi = {'within':{'distance':'7', 'sel':{'resi':resids}}}
  view.addStyle(close_to_resi,{'stick':{'colorscheme':'greenCarbon'}})
  view.addSurface(py3Dmol.VDW, {'opacity':0.6, 'color':'grey'}, {})

#Lastly, we combine the box grid and protein into a single viewer
def viewprotgrid(prot_PDBfile,resids,cx=0,cy=0,cz=0,szx=10,szy=10,szz=10):
  mol_view = py3Dmol.view(1000,600)
  viewprot(mol_view,prot_PDBfile,resids)
  definegrid(mol_view,cx,cy,cz,szx,szy,szz)
  mol_view.addArrow({'start': {'x':cx, 'y':cy, 'z':cz},
                  'end': {'x':cx+szx, 'y':cy, 'z':cz},
                  'radius': 0.5, 'color': 'red'})
  mol_view.addArrow({'start': {'x':cx, 'y':cy, 'z':cz},
                  'end': {'x':cx, 'y':cy+szy, 'z':cz},
                  'radius': 0.5, 'color': 'green'})
  mol_view.addArrow({'start': {'x':cx, 'y':cy, 'z':cz},
                  'end': {'x':cx, 'y':cy, 'z':cz+szz},
                  'radius': 0.5, 'color': 'blue'})
  mol_view.setBackgroundColor('0xffffff')
  mol_view.zoomTo()
  mol_view.show() 


2. Now, we will use ViewProtGrid to visualize the protein, the binding-site residues and a grid box whose size and position can be adjusted with sliders from *ipywidgets*. Edit the viewer by giving the location of the receptor file in the *prot_PDBfile* variable (e.g. 'receptor.pdbqt') and the residues that you want to show in the *resids* variable.


Examples of how to use the *prot_PDBfile* variable:
>prot_PDBfile = ['receptor.pdbqt'] (if the file is in the current directory)

>prot_PDBfile = [singlepath/'receptor.pdbqt'] (if the file is in a directory stored in a `pathlib.Path` variable named singlepath)


Examples of how to use the *resids* variable:

>resids = [41] shows residue 41 (His41).

>resids = [41, 145] adds a dropdown menu from which you can pick residue 41 or 145; each is shown separately.

In [None]:
from ipywidgets import interact,fixed,IntSlider
import ipywidgets
interact(viewprotgrid,
# --> ADD YOUR PDB LOCATION AND FILENAME HERE
         prot_PDBfile = ['receptor.pdbqt'],
# --> ADD THE RESIDUES YOU WANT TO VISUALIZE HERE
         resids = [1, 2, 3],
         cx=ipywidgets.IntSlider(min=-100,max=100, step=1),
         cy=ipywidgets.IntSlider(min=-100,max=100, step=1),
         cz=ipywidgets.IntSlider(min=-100,max=100, step=1),
         szx=ipywidgets.IntSlider(min=0,max=30, step=1),
         szy=ipywidgets.IntSlider(min=0,max=30, step=1),
         szz=ipywidgets.IntSlider(min=0,max=30, step=1))
print("Box directions are x (red), y (green), z (blue)")
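
As a cross-check on the slider values, you can also estimate the box from the coordinates of the binding-site residues, for example His41 and Cys145 of the catalytic dyad. The sketch below uses helpers of our own, not part of py3Dmol or Vina. It reads the fixed-column coordinates that PDB and PDBQT files share and returns a center and a size padded on each side. The 5 Å padding is an arbitrary assumption, and you may need a larger box so that the whole ligand fits and can rotate:

```python
def residue_coordinates(pdb_text, resids, chain=None):
    """Collect (x, y, z) of ATOM/HETATM records whose residue number is in resids.

    Uses the fixed PDB columns (chain 22, resSeq 23-26, x 31-38, y 39-46,
    z 47-54), which PDBQT files share.
    """
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if chain is not None and line[21] != chain:
            continue
        if int(line[22:26]) in resids:
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def box_around(coords, padding=5.0):
    """Center = midpoint of the bounding box; size = extent plus padding on each side."""
    xs, ys, zs = zip(*coords)
    center = tuple((min(v) + max(v)) / 2 for v in (xs, ys, zs))
    size = tuple(max(v) - min(v) + 2 * padding for v in (xs, ys, zs))
    return center, size


# Usage on the receptor used in the viewer above:
# with open("receptor.pdbqt") as fh:
#     center, size = box_around(residue_coordinates(fh.read(), {41, 145}))
# print("center:", center, "size:", size)
```

The sliders only accept integers, so round the printed values before comparing them with the box drawn in the viewer.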

## --> Take a screenshot

Take a screenshot that shows your grid box around the MPro binding site and save it in your `05-docking` folder as `grid.png`.

3. Now, we will generate a configuration file for **AutoDock Vina**. The configuration file specifies the target protein and ligand files, as well as the position and dimensions of the grid box that defines the search space.

  For the grid box, use the box center (cx, cy, cz) and size (szx, szy, szz) values that you chose in the previous step.

  The following is an example of a standard **AutoDock Vina configuration file**, including the most commonly edited options:


```
#CONFIGURATION FILE

#INPUT OPTIONS 
receptor = [target protein pdbqt file]
ligand = [ligand pdbqt file]
flex = [flexible residues in receptor in pdbqt format] 

#SEARCH SPACE CONFIGURATIONS 
#Center of the box (coordinates x, y and z)
center_x = [value] 
center_y = [value]
center_z = [value]
#Size of the box (dimensions in x, y and z) 
size_x = [value]
size_y = [value]
size_z = [value]

#OUTPUT OPTIONS 
#out = [output pdbqt file for all conformations]
#log = [output log file for binding energies]

#OTHER OPTIONS 
cpu = [value] # more CPUs reduce the computation time
exhaustiveness = [value] # exhaustiveness of the global search (roughly proportional to run time), default is 8
num_modes = [value] # maximum number of binding modes to generate, default is 9
energy_range = [value] # maximum energy difference between the best binding mode and the worst one displayed (kcal/mol), default is 3
seed = [value] # explicit random seed, not required
```

The following script will create this file for our docking procedure. **You will need to add the position (cx, cy, cz) and dimensions (szx, szy, szz) of your grid box.**


In [None]:
with open("config_singledock","w") as f:
  f.write("#CONFIGURATION FILE (options not used are commented) \n")
  f.write("\n")
  f.write("#INPUT OPTIONS \n")
  f.write("receptor = receptor.pdbqt \n")
  f.write("ligand = nirmatrelvir.pdbqt \n")
  f.write("#flex = [flexible residues in receptor in pdbqt format] \n")
  f.write("#SEARCH SPACE CONFIGURATIONS \n")
  f.write("#Center of the box (values cx, cy and cz) \n")
# -->CHANGE THE FOLLOWING DATA WITH YOUR BOX CENTER COORDINATES  
  f.write("center_x = 0 \n")
  f.write("center_y = 0 \n")
  f.write("center_z = 0 \n")
# -->CHANGE THE FOLLOWING DATA WITH YOUR BOX DIMENSIONS
  f.write("#Size of the box (values szx, szy and szz) \n")
  f.write("size_x = 20 \n")
  f.write("size_y = 20 \n")
  f.write("size_z = 20 \n")
#MORE OPTIONS
  f.write("#OUTPUT OPTIONS \n")
  f.write("#out = \n")
  f.write("#log = \n")
  f.write("\n")
  f.write("#OTHER OPTIONS \n")
  f.write("#cpu =  \n")
  f.write("#exhaustiveness = \n")
  f.write("#num_modes = \n")
  f.write("#energy_range = \n")
  f.write("#seed = ")
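
The cell above hard-codes each line. A minimal alternative, assuming you keep the box center and size in Python tuples, writes the same Vina keys from variables, so each number is edited only once. The function name `write_vina_config` is our own and is not part of Vina:

```python
def write_vina_config(path, receptor, ligand, center, size, **options):
    """Write an AutoDock Vina configuration file.

    center and size are (x, y, z) tuples in angstroms; extra keyword
    arguments (e.g. exhaustiveness=8, num_modes=9, seed=42) are written
    as 'key = value' lines.
    """
    lines = [
        "#INPUT OPTIONS",
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        "#SEARCH SPACE CONFIGURATIONS",
    ]
    lines += [f"center_{axis} = {value}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {value}" for axis, value in zip("xyz", size)]
    if options:
        lines.append("#OTHER OPTIONS")
        lines += [f"{key} = {value}" for key, value in options.items()]
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")


# Usage (replace the numbers with your own box):
# write_vina_config("config_singledock", "receptor.pdbqt", "nirmatrelvir.pdbqt",
#                   center=(0, 0, 0), size=(20, 20, 20), exhaustiveness=8)
```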

We are now ready to **perform our first molecular docking with AutoDock Vina**.

  Once you execute the code below, Vina will show a progress bar (if it is running as expected). **This calculation should not take longer than 5 min**.

  Note that we define the output filename on the command line (`--out output.pdbqt`) rather than in the configuration file.

In [None]:
#Executing AutoDock Vina with our configuration file
!vina --config config_singledock --out output.pdbqt


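Once docking finishes, Vina prints a table with the predicted binding affinity (kcal/mol) of each pose. We then split the poses into separate PDB files using **OpenBabel**. The files are numbered from 1 (`nirmatrelvir_dock1.pdb`), which corresponds to the top-ranked, lowest-energy pose.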

# --> Write a short answer here

What is your lowest (most negative) binding free energy, in kcal/mol? Convert it to a dissociation constant, in molar.
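
If the Vina score is treated as a binding free energy, the conversion uses ΔG = RT ln Kd, so Kd = exp(ΔG / RT), with R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹ and a 1 M standard state. The sketch below assumes T = 298.15 K (25 °C). Keep in mind that Vina's score is an empirical estimate, so the resulting Kd is only a rough, order-of-magnitude guess. For example, -8.0 kcal/mol corresponds to roughly 1.4 × 10⁻⁶ M:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_affinity(delta_g, temperature=298.15):
    """Dissociation constant (M) from a binding free energy (kcal/mol).

    Uses dG = RT ln Kd, i.e. Kd = exp(dG / RT), with a 1 M standard state.
    """
    return math.exp(delta_g / (R_KCAL * temperature))


# Replace -8.0 with the best affinity reported by Vina
print(f"Kd = {kd_from_affinity(-8.0):.2e} M")
```

A more negative affinity gives a smaller Kd, i.e. tighter binding. At 298 K, each additional -1.36 kcal/mol lowers Kd about tenfold.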

In [None]:
#We need to convert our Autodock Vina results from pdbqt into pdb
#For this, we use babel
!obabel -ipdbqt output.pdbqt -opdb -O nirmatrelvir_dock.pdb -m

# Part IV - Visualizing docking poses

The docking result can now be visualized! For comparison, we will superpose the crystal structure 7VH8 onto our receptor model.

In [None]:
import prody

# Download the crystal structure 7VH8 and drop the water molecules
xtal_PDB = prody.parsePDB('7vh8')
mobile = xtal_PDB.select('not water')
# Chain B of our homology model is the reference for the superposition
receptor_PDB = prody.parsePDB('bestmodel_aligned.pdb', chain='B')
prody.matchAlign(mobile, receptor_PDB, overlap=40) # matchAlign(mobile, target, **kwargs) moves mobile onto target
protein = mobile.select('protein')
ligand = mobile.select('not protein')
prody.writePDB('protein_aligned.pdb', protein)
prody.writePDB('ligand_aligned.pdb', ligand)

In [None]:
#View docking results: receptor (purple), top-ranked docked pose (pink),
#and the aligned 7VH8 protein and ligand (yellow)
import py3Dmol

def read(path):
  with open(path, 'r') as fh:
    return fh.read()

mol_view = py3Dmol.view(1000,600)
mol_view.addModel(read('receptor.pdbqt'),'pdb')
mol_view.setStyle({'cartoon': {'color':'purple'}})
mol_view.addStyle({'within':{'distance':'7', 'sel':{'resi':41}}},{'stick':{'colorscheme':'purpleCarbon'}})
# {'model': -1} selects the most recently added model
mol_view.addModel(read('nirmatrelvir_dock1.pdb'),'pdb')
mol_view.setStyle({'model':-1},{'stick':{'colorscheme':'pinkCarbon'}})
mol_view.setBackgroundColor('0xffffff')
mol_view.zoomTo({'model':-1})
mol_view.addModel(read('protein_aligned.pdb'),'pdb')
mol_view.setStyle({'model':-1}, {'cartoon': {'color':'yellow'}})
mol_view.addModel(read('ligand_aligned.pdb'),'pdb')
mol_view.setStyle({'model':-1},{'stick':{'colorscheme':'yellowCarbon'}})
mol_view.show()

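If you have time, look more closely at how your docked pose overlaps the 7VH8 ligand, for example by rotating the view or by loading the other docked poses (`nirmatrelvir_dock2.pdb`, and so on).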

## --> Take a screenshot

Take a screenshot that shows the ligand in the MPro binding site and save it in your `05-docking` folder as `docked.png`.

# --> Write a short answer here

How does your docked pose compare to the ligand in the crystal structure?
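
One crude, quantitative way to answer is the distance between the heavy-atom centroids of the docked pose and the crystal ligand. A distance of a few ångströms or less suggests both occupy the same region of the pocket. The sketch below uses helpers of our own. It assumes fixed-column PDB files, taking the element from columns 77-78 and falling back to the atom name. Note that `ligand_aligned.pdb` holds every non-protein, non-water atom selected from 7VH8, so check that it contains only the ligand you want. A proper comparison would be a symmetry-aware heavy-atom RMSD, which needs an atom-to-atom correspondence (tools such as OpenBabel's `obrms` handle this):

```python
import math


def heavy_atom_coords(pdb_text):
    """Coordinates of non-hydrogen ATOM/HETATM records (fixed PDB columns)."""
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Element symbol from columns 77-78, falling back to the atom name
        name = line[12:16].strip()
        element = line[76:78].strip() or name.lstrip("0123456789")[:1]
        if element.upper() == "H":
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def centroid_distance(pdb_text_a, pdb_text_b):
    """Distance (angstroms) between the heavy-atom centroids of two structures."""
    return math.dist(centroid(heavy_atom_coords(pdb_text_a)),
                     centroid(heavy_atom_coords(pdb_text_b)))


# Usage with the files written earlier in this notebook:
# with open('nirmatrelvir_dock1.pdb') as a, open('ligand_aligned.pdb') as b:
#     print(f"Centroid distance: {centroid_distance(a.read(), b.read()):.2f} Å")
```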

**And this is the end of the docking tutorial!**