#Lab.06 / IBM3202 – Molecular Docking on AutoDock

###Theoretical aspects

As more and more protein structures are determined experimentally using X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, molecular docking is increasingly used as a tool in **drug discovery**.

**Molecular docking simulations** explore the potential binding poses of small molecules on the **binding site** of a target protein for which an experimentally determined structure is available. 
Docking against protein targets generated by **comparative modelling** also becomes possible for proteins whose structures are yet to be solved.

Thus, the **_druggability_** of different compounds and their binding affinity on a given protein target can be calculated for further lead optimization processes.

<figure>
<center>
<img src='https://raw.githubusercontent.com/pb3lab/ibm3202/master/images/docking_01.png'/>
<figcaption>FIGURE 1. In molecular docking, binding is evaluated in two steps: A) Energetics of the transition of the unbound states of ligand and target towards the conformations of the bound complex; and B) energetics of protein-ligand binding in these conformations. <br> Huey R et al (2007) <i>J Comput Chem 28(6), 1145-1152.</i></figcaption></center>
</figure>

Molecular docking programs run a **search algorithm** in which varying conformations of a given ligand, typically generated using Monte Carlo or genetic algorithms, are iteratively evaluated until convergence to an energy minimum is reached. Finally, an **affinity scoring function** estimates a ΔG (binding free energy in kcal/mol) as the sum of several energetic contributions (electrostatics, van der Waals, desolvation, etc.), and this estimate is used to rank the candidate poses.
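
To make the ranking step concrete, the following minimal sketch scores a few candidate poses as a weighted sum of energy terms and sorts them from most to least favorable. The term names, weights and per-pose values are invented for illustration; they are not the coefficients used by AutoDock or AutoDock Vina.

```python
# Illustrative only: the weights and per-pose energy terms are made-up values,
# not the actual AutoDock/Vina scoring function.
WEIGHTS = {"vdw": 1.0, "electrostatics": 1.0, "desolvation": 1.0}

def score_pose(terms, weights=WEIGHTS):
    """Estimate a binding energy (kcal/mol) as a weighted sum of terms."""
    return sum(weights[name] * value for name, value in terms.items())

def rank_poses(poses):
    """Return (pose_name, score) pairs sorted from most to least favorable."""
    scored = [(name, score_pose(terms)) for name, terms in poses.items()]
    return sorted(scored, key=lambda pair: pair[1])

poses = {
    "pose_A": {"vdw": -6.0, "electrostatics": -1.5, "desolvation": 0.5},
    "pose_B": {"vdw": -4.0, "electrostatics": -0.5, "desolvation": 0.25},
    "pose_C": {"vdw": -7.0, "electrostatics": -2.0, "desolvation": 1.0},
}
ranking = rank_poses(poses)
for name, score in ranking:
    print(f"{name}: {score:.2f} kcal/mol")
```

The more negative the estimated energy, the better the pose ranks, which is the same convention used by the docking results later in this session.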

##Experimental Overview

Typically, in this laboratory session we dock a dimeric analog of the plastic polymer PET (2PET) onto the binding site of PET hydrolase from *I. sakaiensis*, a PET-degrading enzyme discovered in Japan in 2016 whose three-dimensional structure was solved by our lab [Fecker T et al (2018) *Biophys J 114 (6), 1302-1312*].

However, inspired by the COVID-19 pandemic, in this laboratory session we will perform a docking assay of **indinavir**, an active component of the **antiretroviral therapy to treat HIV**, onto the binding site of its target protein, **HIV-2 protease**.

For our laboratory session, we will install and use **MGLtools** (and alternatively **pdb2pqr**) to prepare the target protein files, **OpenBabel** to prepare the ligand files, **AutoDock Vina** for the docking procedure and **py3Dmol** to establish the appropriate search grid configuration and analyze the results.

<figure>
<center>
<img src='https://raw.githubusercontent.com/pb3lab/ibm3202/master/images/docking_02.png' />
<figcaption>FIGURE 2. General steps of molecular docking. First, the target protein and ligand or ligands are parameterized. Then, the system is prepared by setting up the search grid. Once the docking calculation is performed, ligand poses are scored based on a given energy function. Lastly, the computational search is processed and compared against experimental data for validation. <br><i>Taken from Pars Silico (en.parssilico.com).</i></figcaption></center>
</figure>

#Part 0 – Downloading and Installing the required software

Before we start, you must first **remember to start the hosted runtime in Google Colab**.

Then, we must install several pieces of software to perform this tutorial. Namely:
- **py3Dmol** for visualization of the protein structure and setting up the search grid.
- **miniconda**, a free minimal installer of **conda** for software package and environment management.
- **OpenBabel** for parameterization of our ligand(s).
- **MGLtools** for parameterization of our target protein using Gasteiger charges.
- **pdb2pqr** for parameterization of our protein using the AMBER ff99 forcefield.
- **Autodock Vina** for the docking process

After several tests, the following installation instructions are the best way of setting up **Google Colab** for this laboratory session.

1. We will first install py3Dmol as follows:

In [5]:
import os
from google.colab import drive
drive.mount('/content/drive')

workshop_dir = '/content/drive/MyDrive/2022S_MODELING_WORKSHOP/'
if not os.path.isdir(workshop_dir):
  os.mkdir(workshop_dir)

os.chdir(workshop_dir)
if not os.path.isdir(workshop_dir + 'modelingworkshop'):
  !git clone https://github.com/CCBatIIT/modelingworkshop

lab_dir = workshop_dir + '2022S_MM_SESSION_2_2' #SPECIFIC TO EACH LAB
if not os.path.isdir(lab_dir):
  os.mkdir(lab_dir)
os.chdir(lab_dir)

#Installing py3Dmol using pip
!pip install py3Dmol
#We will also install kora for using RDkit
!pip install kora

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/bin/bash: /usr/local/lib/libtinfo.so.5: no version information available (required by /bin/bash)
Traceback (most recent call last):
  File "/usr/local/bin/pip", line 7, in <module>
    from pip import main
ImportError: No module named pip
/bin/bash: /usr/local/lib/libtinfo.so.5: no version information available (required by /bin/bash)
Traceback (most recent call last):
  File "/usr/local/bin/pip", line 7, in <module>
    from pip import main
ImportError: No module named pip


In [2]:
#Importing py3Dmol for safety
import py3Dmol

2. Then, we will install PDB2PQR using apt-get as follows:

In [3]:
#Installing pdb2pqr using apt-get
!apt-get install -y pdb2pqr

Reading package lists...
Building dependency tree...
Reading state information...
The following package was automatically installed and is no longer required:
  libnvidia-common-470
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  python-decorator python-networkx python-pkg-resources python-yaml
Suggested packages:
  apbs python-matplotlib python-pydotplus python-scipy python-pygraphviz
  | python-pydot python-setuptools
The following NEW packages will be installed:
  pdb2pqr python-decorator python-networkx python-pkg-resources python-yaml
0 upgraded, 5 newly installed, 0 to remove and 39 not upgraded.
Need to get 1,431 kB of archives.
After this operation, 7,952 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 python-decorator all 4.1.2-1 [9,300 B]
Get:2 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 python-networkx all 1.11-1ubuntu3 [804 kB]
Get:3 http://archive.ubuntu.com/ubuntu b

In [4]:
#Checking that pdb2pqr was properly installed
!pdb2pqr --help

Usage: pdb2pqr [options] PDB_PATH PQR_OUTPUT_PATH

This module takes a PDB file as input and performs optimizations before
yielding a new PQR-style file in PQR_OUTPUT_PATH. If PDB_PATH is an ID it will
automatically be obtained from the PDB archive.

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit

  Manditory options:
    One of the following options must be used.

    --ff=FIELD_NAME     The forcefield to use - currently amber, charmm,
                        parse, tyl06, peoepb and swanson are supported.
    --userff=USER_FIELD_FILE
                        The user created forcefield file to use. Requires
                        --usernames overrides --ff
    --clean             Do no optimization, atom addition, or parameter
                        assignment, just return the original PDB file in
                        aligned format. Overrides --ff and --userff

  General options:
    --nodebump     

3. And then we will install conda to be able to install MGLtools and OpenBabel

In [5]:
#Install conda using the new conda-colab library
!pip install -q condacolab
import condacolab
condacolab.install_miniconda()

#Install MGLtools and OpenBabel from
#the bioconda repository
!conda install -c conda-forge -c bioconda mgltools openbabel zlib --yes

⏬ Downloading https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:28
🔁 Restarting kernel...
Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.9.2
  latest version: 4.11.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment locati

4. Finally, we will download the Autodock Vina program from the Scripps website and make an alias to use it during this session

In [2]:
%%bash
#Download and extract Autodock Vina from SCRIPPS
#(%%bash must be the first line of the cell for the cell magic to work)
#Then, in the next cell, we set up an alias for vina to be treated as a native binary
wget http://vina.scripps.edu/download/autodock_vina_1_1_2_linux_x86.tgz
tar xzvf autodock_vina_1_1_2_linux_x86.tgz

bash: /usr/local/lib/libtinfo.so.5: no version information available (required by bash)
wget: /usr/local/lib/libuuid.so.1: no version information available (required by wget)
URL transformed to HTTPS due to an HSTS policy
--2022-03-06 17:55:00--  https://vina.scripps.edu/download/autodock_vina_1_1_2_linux_x86.tgz
Resolving vina.scripps.edu (vina.scripps.edu)... 192.26.252.19
Connecting to vina.scripps.edu (vina.scripps.edu)|192.26.252.19|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-03-06 17:55:01 ERROR 404: Not Found.

tar (child): autodock_vina_1_1_2_linux_x86.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now


In [7]:
#%alias is an IPython line magic; afterwards, vina can be called as %vina
%alias vina /content/autodock_vina_1_1_2_linux_x86/bin/vina
%alias vina_split /content/autodock_vina_1_1_2_linux_x86/bin/vina_split

**⚠️WARNING:** This tutorial will soon be updated to the new Autodock Vina v1.2.2, which can be found [here](https://github.com/ccsb-scripps)

Once these software installation processes are completed, we are ready to perform our experiments
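
Before moving on, it can help to confirm that the command-line tools are actually reachable. The sketch below uses `shutil.which` from the Python standard library; the list of executable names is an assumption based on the tools installed above and may need adjusting for your installation. Note that `vina` is set up as an IPython alias rather than placed on the `PATH`, so it is not included here.

```python
import shutil

# Executable names are assumptions based on the tools installed above;
# adjust them if your installation exposes different names.
REQUIRED_TOOLS = ["obabel", "pdb2pqr", "prepare_receptor4.py", "prepare_ligand4.py"]

def find_tools(names):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}

status = find_tools(REQUIRED_TOOLS)
for name, path in status.items():
    print(f"{name:24s} {'OK: ' + path if path else 'NOT FOUND'}")
```

Any tool reported as `NOT FOUND` points to an installation step that should be repeated before continuing.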

#Part 1 – Downloading and Preparing the Receptor for AutoDock

1. The first step in a molecular docking procedure is to have a structure of a given target protein. While in some cases a high-quality comparative model will be used, most cases start with an experimentally (X-ray, NMR, cryoEM) solved three-dimensional structure. 

  In such cases, a given target protein structure is downloaded from the **Protein Data Bank (PDB)** (https://www.rcsb.org/pdb) using a given accession ID. For example, the PET hydrolase solved by our lab has the accession ID 6ANE.

  For this tutorial, we will use the structure of the HIV-2 protease, solved using X-ray crystallography and deposited in the PDB with the accession ID 1HSG. We can directly download this structure from the PDB using *wget* and extracting it using *gzip*:

In [3]:
#Download a protein structure from the PDB using wget and extract it using gzip
#!wget http://www.rcsb.org/pdb/files/[PDB_ID].pdb.gz
#!gzip -d [PDB_ID].pdb.gz 
#DO IT YOURSELF!
!wget http://www.rcsb.org/pdb/files/1HSG.pdb.gz
!gzip -d 1HSG.pdb.gz 

/bin/bash: /usr/local/lib/libtinfo.so.5: no version information available (required by /bin/bash)
wget: /usr/local/lib/libuuid.so.1: no version information available (required by wget)
--2022-03-06 17:57:10--  http://www.rcsb.org/pdb/files/1HSG.pdb.gz
Resolving www.rcsb.org (www.rcsb.org)... 128.6.159.248
Connecting to www.rcsb.org (www.rcsb.org)|128.6.159.248|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.rcsb.org/pdb/files/1HSG.pdb.gz [following]
--2022-03-06 17:57:10--  https://www.rcsb.org/pdb/files/1HSG.pdb.gz
Connecting to www.rcsb.org (www.rcsb.org)|128.6.159.248|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://files.rcsb.org/download/1HSG.pdb.gz [following]
--2022-03-06 17:57:10--  https://files.rcsb.org/download/1HSG.pdb.gz
Resolving files.rcsb.org (files.rcsb.org)... 128.6.158.70
Connecting to files.rcsb.org (files.rcsb.org)|128.6.158.70|:443... connected.
HTTP request sen
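
The same download can also be sketched in pure Python, which may be convenient when several structures are needed. The URL pattern is the one that the redirects in the `wget` output above end up at; the helper names `rcsb_pdb_url` and `fetch_pdb` are hypothetical and chosen only for this example.

```python
import gzip
import shutil
import urllib.request

def rcsb_pdb_url(pdb_id):
    """Build the gzipped-PDB download URL for a 4-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a valid 4-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb.gz"

def fetch_pdb(pdb_id, out_dir="."):
    """Download and decompress a PDB entry; returns the path of the .pdb file."""
    gz_path = f"{out_dir}/{pdb_id.upper()}.pdb.gz"
    pdb_path = gz_path[:-3]  # strip the ".gz" suffix
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), gz_path)
    with gzip.open(gz_path, "rb") as src, open(pdb_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return pdb_path

print(rcsb_pdb_url("1hsg"))
# fetch_pdb("1HSG")  # uncomment to download (requires network access)
```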

2. X-ray diffraction does not discriminate between electron density coming from static protein atoms and that coming from ordered water molecules, meaning that most protein structures solved by X-ray diffraction also include so-called **crystallographic waters** (check the non-bonded red dots on the protein structure below). These molecules are not important for our particular docking simulation, so we have to remove them.
<figure>
<center>
<img src='https://raw.githubusercontent.com/pb3lab/ibm3202/master/images/docking_03.png' />
<figcaption>FIGURE 3. Cartoon representation of HIV-2 protease dimer (PDB accession ID 1HSG), with its N-to-C-terminal residues colored from blue to red in rainbow spectrum, showing the crystallographic waters as red spheres.</figcaption></center>
</figure>

  Typically, this can be easily done by extracting all of the lines from the PDB file that start with **"ATOM"**, as this is how all of the atoms that belong to amino and nucleic acid residues are labeled. In contrast, the atoms from ligands, water molecules and other non-protein/non-nucleic residues are commonly labeled as **"HETATM"**. Also, the different chains of an oligomer are separated by a **"TER"** string, which is important to keep in our case.

  The following **python script** will first create a folder in which we will store all data related to our molecular docking experiment. Then, it will extract all lines matching the string "ATOM" (for the protein atoms) or "TER" (for the chain separations) into a separate PDB file for further processing. Please take a good look at it.

In [4]:
#This script will create a folder called "single-dock" for our experiment
#Then, it will print all "ATOM" and "TER" lines from a given PDB into a new file

#Let's make a folder first. We need to import the os and pathlib libraries
#(imports must be repeated here if the kernel was restarted after installing conda)
import os
from pathlib import Path

#Then, we define the path of the folder we want to create.
#NOTE: lab_dir is defined in the first cell of Part 0; re-run that cell if
#this variable is no longer defined
singlepath = Path(lab_dir) / "single-dock"

#Now, we create the folder using the os.mkdir() command
#The if conditional is just to check whether the folder already exists
#In which case, python returns an error
if os.path.exists(singlepath):
  print("path already exists")
else:
  os.mkdir(singlepath)
  print("path was successfully created")

#Now we assign a variable "protein" with the name and extension of our pdb
protein = "1HSG.pdb"

#And we use the following script to selectively print the lines that contain the
#string "ATOM" and "TER" into a new file inside our recently created folder
with open(protein,'r') as f, open(singlepath / "1HSG_prot.pdb","w") as g:
  for line in f:
    row = line.split()
    if not row:
      continue
    if row[0] == "ATOM":
      g.write(line)
    elif row[0] == "TER":
      g.write("TER\n")
  g.write("END")
print("file successfully created")
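
As a complement to the script above, the sketch below summarizes which record types and which HETATM residues a PDB file contains, which is a quick way to see how many waters and ligand atoms are about to be discarded. It reads the record name and residue name from their fixed columns instead of using `split()`, since fields in PDB files can run together (for example, when atom serial numbers reach five digits). The sample lines are synthetic, with made-up coordinates, and only mimic the layout of a real file.

```python
from collections import Counter

# Synthetic PDB-style lines (coordinates are made up) used only to show the idea.
SAMPLE_PDB = """\
ATOM      1  N   PRO A   1      10.000  10.000  10.000  1.00 20.00           N
ATOM      2  CA  PRO A   1      11.000  10.000  10.000  1.00 20.00           C
TER       3      PRO A   1
HETATM    4  C1  MK1 B 902      12.000  10.000  10.000  1.00 20.00           C
HETATM    5  O   HOH A 301      13.000  10.000  10.000  1.00 20.00           O
HETATM    6  O   HOH A 302      14.000  10.000  10.000  1.00 20.00           O
END
"""

def summarize_records(pdb_text):
    """Count record types and HETATM residue names using fixed PDB columns."""
    records, het_residues = Counter(), Counter()
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # columns 1-6: record name
        if not record:
            continue
        records[record] += 1
        if record == "HETATM":
            het_residues[line[17:20].strip()] += 1  # columns 18-20: residue name
    return records, het_residues

records, het_residues = summarize_records(SAMPLE_PDB)
print(dict(records))
print(dict(het_residues))
```

On the real structure, the same function could be applied to the text of `1HSG.pdb`, for instance with `summarize_records(open("1HSG.pdb").read())`.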

3. Once we have printed out the "ATOM" and "TER" lines of the parent PDB file, we have a new file that contains the coordinates of our protein target.

  However, for AutoDock to perform a molecular docking experiment, the protein target must contain information about the partial charges of each atom and atom types that are compatible with AutoDock. Such format is referred to as **PDBQT**, a modification of the PDB format that also includes **charges (q)** and **AutoDock-specific atom types (t)** in two extra columns at the end of the now PDBQT file.

  Lastly, the protein target must contain **all polar hydrogens**. Most protein structures have no hydrogens included, meaning that we must add them. 

  This is the part of the tutorial where you have **two different options** to proceed with your experiment!

3. a) Add the polar hydrogens of your protein and parameterize it with **Gasteiger** charges and atom types using **MGLtools** (this is the canonical option for the majority of AutoDock users)

In [None]:
#Parameterizing and adding Gasteiger charges into our protein using MGLtools
!prepare_receptor4.py -r $singlepath/1HSG_prot.pdb -o $singlepath/1HSG_prot.pdbqt -A hydrogens -U nphs_lps -v

3. b) Add the polar hydrogens of your protein and parameterize it based on the pKa of each amino acid at pH 7.4 with the **AMBER ff99** force field using **pdb2pqr**, followed by deletion of non-polar hydrogens and conversion into a **PDBQT** file using **MGLtools**.

  In this case, pdb2pqr generates an intermediate **PQR** file, a modification of the PDB format which allows users to add charge and radius parameters to existing PDB data. This information is then left unaltered by **MGLtools**.

In [None]:
#First, using pdb2pqr to parameterize our receptor with AMBER99ff, maintaining
#the chain IDs and setting up the receptor at a pH of 7.4
!pdb2pqr --ff=amber --chain --with-ph=7.4 --verbose $singlepath/1HSG_prot.pdb $singlepath/1HSG_prot.pqr

#Then, convert the .pqr file into a .pdbqt file while deleting non-polar
#hydrogens but without changing the AMBER parameters added to the protein
!prepare_receptor4.py -r $singlepath/1HSG_prot.pqr -o $singlepath/1HSG_prot.pdbqt -C -U nphs_lps -v
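
Whichever option you choose, it is worth reading the resulting PDBQT file back to check that charges and atom types were written. The sketch below relies on the fact that, in PDBQT atom lines, the last two whitespace-separated fields are the partial charge (q) and the AutoDock atom type (t). The sample lines and their charge values are synthetic and only illustrate the layout; `HD` is the AutoDock type for a polar (hydrogen-bond donor) hydrogen.

```python
# Synthetic PDBQT-style lines; coordinates and charges are made-up values.
SAMPLE_PDBQT = """\
ATOM      1  N   PRO A   1      10.000  10.000  10.000  1.00 20.00    -0.066 N
ATOM      2  HN  PRO A   1      10.500  10.500  10.000  1.00 20.00     0.275 HD
ATOM      3  CA  PRO A   1      11.000  10.000  10.000  1.00 20.00     0.180 C
TER
"""

def read_charges_and_types(pdbqt_text):
    """Return (atom_name, partial_charge, autodock_type) for each atom line."""
    atoms = []
    for line in pdbqt_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            fields = line.split()
            name = line[12:16].strip()  # columns 13-16: atom name
            # The last two fields are the partial charge and the atom type
            atoms.append((name, float(fields[-2]), fields[-1]))
    return atoms

atoms = read_charges_and_types(SAMPLE_PDBQT)
total_charge = sum(charge for _, charge, _ in atoms)
print(atoms)
print(f"total charge: {total_charge:.3f}")
```

Comparing the total charge and the per-atom charges obtained with options 3a and 3b is one simple way to see how the two parameterizations differ.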

**You are all set with your target protein!** 

>Before we move onto preparing the ligand for molecular docking, please consider the following questions:
- Why is it important to add hydrogens for the purposes of our docking simulations?
- Why are we only adding polar hydrogens?
- Do you believe that using different force fields could have an effect on your docking results? (this is something you can actually test!)



#Part 2 – Downloading and Preparing the Ligand for AutoDock

1. We now need to prepare the ligand that we will use for our docking analysis. In our case, we will use **Indinavir**. This drug is a protease inhibitor used as a component of the antiretroviral therapy to treat HIV/AIDS, helping to decrease the viral load. This time, we will attempt to predict the docking pose of indinavir onto the binding site of the HIV-2 protease.

  We will first start by creating a folder in which we will store our ligands for molecular docking.

In [None]:
#Let's make a folder first. We need to import the os and path library
import os
from pathlib import Path

#We will first create a path for all ligands that we will use in this tutorial
#Notice that the HOME folder for a hosted runtime in colab is /content/
ligandpath = Path("/content/ligands/")

#Now, we create the folder using the os.mkdir() command
#The if conditional is just to check whether the folder already exists
#In which case, python returns an error
if os.path.exists(ligandpath):
  print("ligand path already exists")
if not os.path.exists(ligandpath):
  os.mkdir(ligandpath)
  print("ligand path was successfully created")

2. Now, we will download indinavir from the **DrugBank** database (*Nucleic Acids Res. 2006; 34, D668-72*). This is a comprehensive, freely accessible, online database containing information on drugs and drug targets. You can actually check the detailed chemical, pharmacological and pharmaceutical information on Indinavir [in this DrugBank link](https://www.drugbank.ca/drugs/DB00224).

  We will download this ligand in SMILES format to continue with its preparation for molecular docking



In [None]:
#Downloading Indinavir from the DrugBank database in SMILES format
!wget https://www.drugbank.ca/structures/small_molecule_drugs/DB00224.smiles -P $ligandpath

3. Hey! But what is a SMILES format? Well, the **Simplified Molecular-Input Line-Entry System** (SMILES) is a text notation that allows a user to represent a chemical structure in a way that can be used by the computer. The elemental notation for different types of bonds between different atoms is as follows:

  \-	for single bonds (eg. C-C or CC is CH3CH3)

  \=	for double bonds (eg. C=C for CH2=CH2)

  \#	for triple bonds (eg. C#N for C≡N)

  \:	for aromatic bonds (eg. c:1:c:c:c:c:c1 or c1ccccc1 for benzene)

  \. for disconnected structures (eg. Na.Cl for NaCl)

  / and \ for double bond stereoisomers (eg. F/C=C/F for trans-1,2-difluoroethylene and F/C=C\F for cis-1,2-difluoroethylene)

  @ and @@ for enantiomers (eg. N\[C@@H](C)C(=O)O for L-alanine and N\[C@H](C)C(=O)O for D-alanine)

  **Let's take a look at the SMILES of Indinavir!**

In [None]:
#Print the SMILES of indinavir to see what it is all about
print((ligandpath / "DB00224.smiles").read_text())
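
To connect the notation above with an actual string, the following sketch performs a crude character-level scan of a SMILES and counts a few of the symbols just described. It is not a SMILES parser (a cheminformatics toolkit such as RDKit should be used for any real analysis), and the function name `smiles_symbol_summary` is hypothetical.

```python
import re

def smiles_symbol_summary(smiles):
    """Crude character-level scan of a SMILES string (not a real parser)."""
    return {
        "double_bonds": smiles.count("="),
        "triple_bonds": smiles.count("#"),
        # "@" and "@@" each mark one stereocenter, so match them as one token
        "stereocenters": len(re.findall(r"@{1,2}", smiles)),
        # "." separates disconnected structures
        "disconnected_parts": smiles.count(".") + 1,
    }

l_alanine = smiles_symbol_summary("N[C@@H](C)C(=O)O")
salt = smiles_symbol_summary("[Na+].[Cl-]")
nitrile = smiles_symbol_summary("C#N")
print(l_alanine)
print(salt)
print(nitrile)
```

Applying the same function to the indinavir SMILES printed above gives a quick feel for how many double bonds and stereocenters the molecule has.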

In [None]:
#Use the following viewer to load your SMILES as a 3D molecule
import py3Dmol
import kora.install.rdkit
from rdkit import Chem
from rdkit.Chem import AllChem

def MolTo3DView(mol, size=(300, 300), style="stick", surface=False, opacity=0.5):
    assert style in ('line', 'stick', 'sphere', 'cartoon')
    mblock = Chem.MolToMolBlock(mol)
    viewer = py3Dmol.view(width=size[0], height=size[1])
    viewer.addModel(mblock, 'mol')
    viewer.setStyle({style:{}})
    if surface:
        viewer.addSurface(py3Dmol.SAS, {'opacity': opacity})
    viewer.zoomTo()
    return viewer

from ipywidgets import interact,fixed,IntSlider
import ipywidgets

def smi2conf(smiles):
    '''Convert SMILES to rdkit.Mol with 3D coordinates'''
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None:
        mol = Chem.AddHs(mol)
        AllChem.EmbedMolecule(mol)
        AllChem.MMFFOptimizeMolecule(mol, maxIters=200)
        return mol
    else:
        return None

@interact
def smi2viewer(smi='CC=O'):
    try:
        conf = smi2conf(smi)
        return MolTo3DView(conf).show()
    except Exception:
        return None

4. Now, we will take this SMILES format and use it to construct and parameterize a three-dimensional structure of Indinavir in **PDBQT** format for its use in molecular docking. As with the receptor, we also have different options to prepare our ligand for molecular docking:

4. a) Use the program **babel** to convert the SMILES into a **MOL2** file without any extra work (such as searching for best conformers) except for setting the protonation state to pH 7.4, and then use **MGLtools** to parameterize the ligand using **Gasteiger** partial charges (this is the canonical option for the majority of AutoDock users).

  Please note that we are generating a ligand in which **all torsions are active** during the docking procedure.

In [None]:
#Converting Indinavir from SMILES into a 3D MOL2 format
!obabel $ligandpath/DB00224.smiles -O indinavir.mol2 --gen3d best -p 7.4 --canonical
#Parameterizing and adding Gasteiger charges into our ligand using MGLtools
#Adding -z leads to a rigid ligand without any torsions
!prepare_ligand4.py -l indinavir.mol2 -o $singlepath/indinavir.pdbqt -U nphs_lps -v
#NOTE: for some reason, MGLtools does not recognize the ligand when inside a different folder
#Here we are deleting the temporary MOL2 file required for generating the PDBQT file
os.remove("indinavir.mol2")

4. b) Use the program **babel** to convert the SMILES into a 3D **MOL2** file while simultaneously performing an energy minimization using the Generalized Amber Force Field (**GAFF**). Then, use **MGLtools** to parameterize the ligand using **Gasteiger** partial charges.

  Please note that we are generating a ligand in which **all torsions are active** during the docking procedure.

In [None]:
#Converting Indinavir from SMILES into a 3D MOL2 format and perform an energy minimization of the conformer using the GAFF forcefield
#Then, prepare ligand for docking using the Autodock script
!obabel $ligandpath/DB00224.smiles -O indinavir.mol2 --gen3d --best --canonical --minimize --ff GAFF --steps 10000 --sd
!prepare_ligand4.py -l indinavir.mol2 -o $singlepath/indinavir.pdbqt -U nphs_lps -v
os.remove("indinavir.mol2")

4. c) Use the program **babel** to  convert the SMILES into a 3D **MOL2** file while simultaneously performing a weighted rotor search for the lowest energy conformer using the Generalized Amber Force Field (**GAFF**). Then, use **MGLtools** to parameterize the ligand using **Gasteiger** partial charges.

  Please note that we are generating a ligand in which **all torsions are active** during the docking procedure.

In [None]:
#Converting Indinavir from SMILES into a 3D MOL2 format and perform a weighted rotor search for lowest energy conformer
#Then, prepare ligand for docking using the Autodock script
!obabel $ligandpath/DB00224.smiles -O indinavir.mol2 --gen3d --best --canonical --conformers --weighted --nconf 50 --ff GAFF
!prepare_ligand4.py -l indinavir.mol2 -o $singlepath/indinavir.pdbqt -U nphs_lps -v
os.remove("indinavir.mol2")
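
To check how flexible the prepared ligand actually is, the torsion tree written into the ligand PDBQT can be inspected. In this sketch, each `BRANCH` record is taken to open one rotatable bond that is treated as flexible, and the `TORSDOF` record gives the number of torsional degrees of freedom. The sample text is a synthetic two-torsion fragment, not the indinavir file.

```python
# Synthetic ligand PDBQT fragment (made-up atoms) with two rotatable bonds.
SAMPLE_LIGAND = """\
REMARK  2 active torsions:
ROOT
ATOM      1  C1  LIG     1       0.000   0.000   0.000  0.00  0.00     0.100 C
ENDROOT
BRANCH   1   2
ATOM      2  O1  LIG     1       1.400   0.000   0.000  0.00  0.00    -0.300 OA
BRANCH   2   3
ATOM      3  C2  LIG     1       2.100   1.200   0.000  0.00  0.00     0.200 C
ENDBRANCH   2   3
ENDBRANCH   1   2
TORSDOF 2
"""

def torsion_summary(pdbqt_text):
    """Count BRANCH records and read the TORSDOF value of a ligand PDBQT."""
    branches, torsdof = 0, None
    for line in pdbqt_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "BRANCH":
            branches += 1
        elif fields[0] == "TORSDOF":
            torsdof = int(fields[1])
    return branches, torsdof

branches, torsdof = torsion_summary(SAMPLE_LIGAND)
print(f"BRANCH records: {branches}, TORSDOF: {torsdof}")
```

For the ligand prepared above, the function could be called on `(singlepath / "indinavir.pdbqt").read_text()`; a ligand prepared as rigid would be expected to show no `BRANCH` records.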

**You are all set with your ligand!** Now, we move onto setting up the molecular docking experiment

#Part 3 – Setting up and Performing Molecular Docking with AutoDock

1. As explained in the lectures, it is necessary to define the search space for molecular docking on a given target protein through the use of a **grid box**. This grid box is usually centered within the binding, active or allosteric site of the target protein and its size will be sufficiently large such that **all binding residues are placed inside the grid box**.

  Here, we will make use of **py3Dmol** to visually inspect the protein structure in cartoon representation and to draw a grid box. The position and size of the grid box will be defined by the coordinates of its centroid and by its dimensions in x, y and z.

  To better guide the search for the optimal dimensions and coordinates of the grid box, we will also show the residues Val32, Ile47 and Val82 of HIV-2 protease.

  The script that defines the visualizer, which we called **ViewProtGrid**, is first loaded into **Colab** with the following lines of code

In [None]:
#These definitions will enable loading our protein and then
#drawing a box with a given size and centroid on the cartesian space
#This box will enable us to set up the system coordinates for the simulation
#
#HINT: The active site of the HIV-2 protease is near the beta strands in green
#
#ACKNOWLEDGE: This script is largely based on the one created by Jose Manuel 
#Napoles Duarte, Physics Teacher at the Chemical Sciences Faculty of the 
#Autonomous University of Chihuahua (https://github.com/napoles-uach)
#
#First, we define the grid box
def definegrid(object,bxi,byi,bzi,bxf,byf,bzf):
  object.addBox({'center':{'x':bxi,'y':byi,'z':bzi},'dimensions': {'w':bxf,'h':byf,'d':bzf},'color':'blue','opacity': 0.6})

#Next, we define how the protein will be shown in py3Dmol
#Note that we are also adding a style representation for active site residues
def viewprot(object,prot_PDBfile,resids):
  mol1 = open(prot_PDBfile, 'r').read()
  object.addModel(mol1,'pdb')
  object.setStyle({'cartoon': {'color':'spectrum'}})
  object.addStyle({'resi':resids},{'stick':{'colorscheme':'greenCarbon'}})

#Lastly, we combine the box grid and protein into a single viewer
def viewprotgrid(prot_PDBfile,resids,bxi,byi,bzi,bxf=10,byf=10,bzf=10):
  mol_view = py3Dmol.view(1000,600)
  definegrid(mol_view,bxi,byi,bzi,bxf,byf,bzf)
  viewprot(mol_view,prot_PDBfile,resids)
  mol_view.setBackgroundColor('0xffffff')
  mol_view.zoomTo()
  mol_view.show() 
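
Besides adjusting the box by eye, a starting guess for its center and dimensions can be computed from the coordinates of the binding-site atoms. The sketch below builds an axis-aligned box around a set of points and adds a margin on every side. The coordinates are made-up placeholders rather than atoms from 1HSG, and the 4 Å padding is an arbitrary assumption to tune for your system.

```python
def box_from_coords(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box enclosing the coordinates.

    padding (in angstroms) is added on every side; 4.0 is an arbitrary choice.
    """
    xs, ys, zs = zip(*coords)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size

# Made-up coordinates standing in for the atoms of the binding-site residues
site_atoms = [(10.0, 20.0, 0.0), (16.0, 26.0, 4.0), (12.0, 22.0, 8.0)]
center, size = box_from_coords(site_atoms)
print("center (bxi, byi, bzi):", center)
print("size   (bxf, byf, bzf):", size)
```

The resulting center corresponds to the `bxi`, `byi` and `bzi` arguments of the viewer, and the size to `bxf`, `byf` and `bzf`, so the estimate can be refined visually afterwards.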


2. Now, we will use our ViewProtGrid to visualize the protein, binding site residues and a grid box of variable size and position that we can manipulate using a slider through *ipywidgets*. You have to edit this viewer by indicating the location of the PDB file in the *prot_PDBfile* variable (e.g. singlepath/'1HSG_prot.pdb') and the residues that you want to show from the PDB in the *resids* variable.


Examples of how to use the *prot_PDBfile* variable
>prot_PDBfile = ['1HSG_prot.pdb'] (if the PDB file is in the current path)

>prot_PDBfile = [singlepath/'1HSG_prot.pdb'] (if the PDB file is in a path defined as singlepath)


Examples of how to use the *resids* variable

>resids = [82] shows a single residue in position 82

>resids = [82,83,84] shows residues 82, 83 or 84 separately, which you can select in the viewer

>resids = [(82,83,84)] shows residue 82, 83 and 84 in the same visualization

>resids = ['82-84'] shows residue range 82-84 in the same visualization

**NOTE:** This code fails when attempting to show two non-consecutive residues in the same visualization.


In [None]:
from ipywidgets import interact,fixed,IntSlider
import ipywidgets
interact(viewprotgrid,
#ADD YOUR PDB LOCATION AND FILENAME HERE
         prot_PDBfile = [CHANGEME],
#ADD THE RESIDUES YOU WANT TO VISUALIZE HERE
         resids = [CHANGEME],
         bxi=ipywidgets.IntSlider(min=-100,max=100, step=1),
         byi=ipywidgets.IntSlider(min=-100,max=100, step=1),
         bzi=ipywidgets.IntSlider(min=-100,max=100, step=1),
         bxf=ipywidgets.IntSlider(min=0,max=30, step=1),
         byf=ipywidgets.IntSlider(min=0,max=30, step=1),
         bzf=ipywidgets.IntSlider(min=0,max=30, step=1))

3. Now, we will generate a configuration file for **Autodock**. As expected, the configuration file contains information about the target protein and ligand, as well as the position and dimensions of the grid box that defines the search space.

  For defining the grid box, you will use the box center coordinates and dimensions that you defined manually in the previous step.

  The following is an example file of a standard **Autodock configuration file**, including all possible variables that can be edited:


```
#CONFIGURATION FILE

#INPUT OPTIONS 
receptor = [target protein pdbqt file]
ligand = [ligand pdbqt file]
flex = [flexible residues in receptor in pdbqt format] 

#SEARCH SPACE CONFIGURATIONS 
#Center of the box (coordinates x, y and z)
center_x = [value] 
center_y = [value]
center_z = [value]
#Size of the box (dimensions in x, y and z) 
size_x = [value]
size_y = [value]
size_z = [value]

#OUTPUT OPTIONS 
#out = [output pdbqt file for all conformations]
#log = [output log file for binding energies]

#OTHER OPTIONS 
cpu = [value] # more cpus reduces the computation time
exhaustiveness = [value] # search time for finding the global minimum, default is 8
num_modes = [value] # maximum number of binding modes to generate, default is 9
energy_range = [value] # maximum energy difference between the best binding mode and the worst one displayed (kcal/mol), default is 3
seed = [value] # explicit random seed, not required
```

The following script will create this file for our docking procedure. **You will need to add the position and dimensions of your grid box**


In [None]:
with open(singlepath / "config_singledock","w") as f:
  f.write("#CONFIGURATION FILE (options not used are commented) \n")
  f.write("\n")
  f.write("#INPUT OPTIONS \n")
  f.write("receptor = 1HSG_prot.pdbqt \n")
  f.write("ligand = indinavir.pdbqt \n")
  f.write("#flex = [flexible residues in receptor in pdbqt format] \n")
  f.write("#SEARCH SPACE CONFIGURATIONS \n")
  f.write("#Center of the box (values bxi, byi and bzi) \n")
#CHANGE THE FOLLOWING DATA WITH YOUR BOX CENTER COORDINATES  
  f.write("center_x = CHANGEME \n")
  f.write("center_y = CHANGEME \n")
  f.write("center_z = CHANGEME \n")
#CHANGE THE FOLLOWING DATA WITH YOUR BOX DIMENSIONS
  f.write("#Size of the box (values bxf, byf and bzf) \n")
  f.write("size_x = CHANGEME \n")
  f.write("size_y = CHANGEME \n")
  f.write("size_z = CHANGEME \n")
  f.write("#OUTPUT OPTIONS \n")
  f.write("#out = \n")
  f.write("#log = \n")
  f.write("\n")
  f.write("#OTHER OPTIONS \n")
  f.write("#cpu =  \n")
  f.write("#exhaustiveness = \n")
  f.write("#num_modes = \n")
  f.write("#energy_range = \n")
  f.write("#seed = ")
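
Since a forgotten `CHANGEME` is an easy mistake to make, a small check of the configuration file before launching the docking can save time. The sketch below parses `key = value` lines, ignores comments, and reports missing or non-numeric grid box entries. The helper names and the list of required keys are choices made for this example, and the numeric values in the sample are placeholders.

```python
REQUIRED_KEYS = ["receptor", "ligand", "center_x", "center_y", "center_z",
                 "size_x", "size_y", "size_z"]

def parse_config(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            options[key] = value
    return options

def check_config(options):
    """Return a list of problems found in the parsed options."""
    problems = [f"missing: {key}" for key in REQUIRED_KEYS if key not in options]
    for key in REQUIRED_KEYS[2:]:  # the six grid box entries must be numbers
        if key in options:
            try:
                float(options[key])
            except ValueError:
                problems.append(f"not a number: {key} = {options[key]}")
    return problems

# Placeholder values, with one entry deliberately left unedited
sample = """\
#INPUT OPTIONS
receptor = 1HSG_prot.pdbqt
ligand = indinavir.pdbqt
center_x = CHANGEME
center_y = 20
center_z = 5.5
size_x = 20
size_y = 20
size_z = 20
#out =
"""
options = parse_config(sample)
problems = check_config(options)
print(problems if problems else "configuration looks complete")
```

For the file written above, the text could be read with `(singlepath / "config_singledock").read_text()` and passed through the same two functions.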

4. Lastly, we will move into the folder that we created for the docking experiment and **perform our first molecular docking with AutoDock Vina**.

  Once you execute the lines of code shown below, AutoDock Vina will display a progress bar if it is running as expected. **This simulation should not take longer than 5 min**.
  
  Note that the filenames of the output and log files are passed on the command line rather than set in the configuration file.

In [None]:
#Changing directory to the single docking folder
os.chdir(singlepath)
#Executing AutoDock Vina with our configuration file
%vina --config config_singledock --out output.pdbqt --log log.txt
#Exiting the execution directory
os.chdir("/content/")
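
Besides the log file, AutoDock Vina writes the score of every pose into the output PDBQT file, on lines that start with `REMARK VINA RESULT:` (binding affinity in kcal/mol, followed by the lower-bound and upper-bound RMSD relative to the best pose). As a minimal sketch, the function below collects these values; the name `read_vina_scores` and the demo text are illustrative assumptions, and the numbers in the demo are made up rather than real results.

```python
def read_vina_scores(pdbqt_text):
    """Return a list of (affinity, rmsd_lb, rmsd_ub) tuples, one per pose."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Everything after the colon is three whitespace-separated numbers
            fields = line.split(":", 1)[1].split()
            scores.append((float(fields[0]), float(fields[1]), float(fields[2])))
    return scores


# Demo with made-up values that only mimic the layout of a Vina output file
demo_text = """MODEL 1
REMARK VINA RESULT:      -9.6      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:      -9.1      1.512      2.304
ENDMDL
"""
for pose, (affinity, rmsd_lb, rmsd_ub) in enumerate(read_vina_scores(demo_text), start=1):
    print(f"pose {pose}: {affinity} kcal/mol (RMSD l.b. {rmsd_lb}, u.b. {rmsd_ub})")
```

In the notebook, the same function could be applied to the contents of `output.pdbqt` in the docking folder once the run has finished.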

5. Once the molecular docking has finished running, we will compare the docking poses with the experimentally solved pose for indinavir. In fact, the structure of HIV-2 protease that you downloaded at the beginning of this tutorial was solved with indinavir bound to it.

  The following lines of code are similar to those we used to extract the 'ATOM' lines of the PDB file, but now we extract the lines containing **'MK1'**, the residue name of the ligand in this PDB file.

In [None]:
#Here, we will be extracting Indinavir, which is present in the structure of
#HIV-2 protease (yes! this is a simulation with experimental validation!)
#The approach is similar to printing the ATOM and TER lines, but we are using
#the residue name given to the ligand in the experimentally solved structure: MK1
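protein = "1HSG.pdb"

#Both files are opened in a 'with' block so that they are closed automatically
with open(protein,'r') as f, open(singlepath/"xtal_ligand.pdb","w") as g:
  for line in f:
    row = line.split()
    #Keep only the lines that contain the residue name MK1
    if "MK1" in row:
      g.write(line)
  g.write("END")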

6. We also need the different docking poses generated by the molecular docking simulation. We will split these poses into separate PDB files using **Open Babel**. The files are numbered starting from 1 (e.g. `indinavir_dock1.pdb`), which corresponds to the lowest-energy pose.

In [None]:
#We need to convert our Autodock Vina results from pdbqt into pdb
#For this, we use Open Babel (obabel); -m writes one file per pose
!obabel -ipdbqt $singlepath/output.pdbqt -opdb -O $singlepath/indinavir_dock.pdb -m

7. Finally, we create another visualizer (**ViewDocking**) to load our protein, any docking pose of our choice and the experimentally solved binding pose of indinavir.

In [None]:
#We finally create a visualization of the protein as cartoon,
#the lowest-energy docking pose with its carbons in green
#and the experimental binding pose with its carbons in gray
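def viewdocking(protein_name,ligand_name,exp_name):
  #width and height are passed by keyword (the first positional argument is a query)
  mview = py3Dmol.view(width=800, height=400)
  #Read the three PDB files, closing each one after reading
  with open(protein_name, 'r') as f:
    mol1 = f.read()
  with open(ligand_name, 'r') as f:
    mol2 = f.read()
  with open(exp_name, 'r') as f:
    mol3 = f.read()
  #Model 0: protein as cartoon
  mview.addModel(mol1,'pdb')
  mview.setStyle({'model':0},{'cartoon': {'color':'spectrum'}})
  #Model 1: docking pose as sticks with green carbons
  mview.addModel(mol2,'pdb')
  mview.setStyle({'model':1},{'stick':{'colorscheme':'greenCarbon'}})
  #Model 2: experimental pose as sticks with default (gray) carbons
  #Selecting by model avoids restyling a docked pose that also carries resn MK1
  mview.addModel(mol3,'pdb')
  mview.setStyle({'model':2},{'stick':{}})
  mview.setBackgroundColor('0xffffff')
  mview.zoomTo()
  mview.show()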

8. The ViewDocking visualizer can then be used as indicated below.

In [None]:
#View docking results
#viewdocking('protein_file','docked_ligand_file','exp_ligand_file')
#DO IT YOURSELF!


How similar is your docking pose when compared to the experimentally solved one?

**📚HOMEWORK:** Remember that redocking, i.e. a molecular docking simulation in which the ligand bound to the target is used as the starting conformation for the docking procedure, is commonly used as a control. Based on this information, plan a control simulation using the ligand that you just extracted in this tutorial.

>```
>#Example of preparation of the experimentally solved ligand pose for redocking
>
>os.chdir(singlepath)
>!pythonsh /usr/local/bin/prepare_ligand4.py -l xtal_ligand.pdb -o $singlepath/xtal_ind.pdbqt -A hydrogens -U nphs_lps -v
>
>#Then, you can essentially use the same configuration files.
>```
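
Since the redocking control can reuse the same search space, one option is to copy the existing configuration and change only its `ligand` line. The snippet below is a minimal sketch of that idea under stated assumptions: `retarget_config` is a hypothetical helper, the demo configuration is abbreviated, and the box values in it are placeholders rather than recommended settings.

```python
def retarget_config(config_text, ligand_file):
    """Return a copy of a Vina configuration with the ligand entry replaced."""
    new_lines = []
    replaced = False
    for line in config_text.splitlines():
        # Match only an active "ligand = ..." entry, not commented-out lines
        if "=" in line and line.split("=", 1)[0].strip() == "ligand":
            new_lines.append(f"ligand = {ligand_file}")
            replaced = True
        else:
            new_lines.append(line)
    if not replaced:
        raise ValueError("no 'ligand = ...' entry found in the configuration")
    return "\n".join(new_lines) + "\n"


# Demo with an abbreviated configuration (box values are placeholders)
demo_config = """#INPUT OPTIONS
receptor = 1HSG_prot.pdbqt
ligand = indinavir.pdbqt
#flex = [flexible residues in receptor in pdbqt format]
center_x = 0.0
size_x = 20.0
"""
print(retarget_config(demo_config, "xtal_ind.pdbqt"))
```

The resulting text could be written to a second file (for instance a hypothetical `config_redock`) and passed to Vina with `--config`, using different `--out` and `--log` filenames so that the first results are not overwritten.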

 

**And this is the end of the sixth tutorial!** If you want to download your results, you can compress them into a zip file for manual download.

In [None]:
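#The zip file is written to /content/ so that the download below finds it
#regardless of the current working directory
!zip -r /content/singledocking.zip $singlepath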
#By default, automatic download is enabled through the following lines
#but you need to disable your adblocker in order for it to work
from google.colab import files
files.download("/content/singledocking.zip")