# Molecular Docking on AutoDock

**Authors**:
- *Creation*: Engelberger F, Galaz-Davison P, Bravo G, Rivera M, Ramírez-Sarmiento CA   
- *Updates*: Dr Antonia Mey (antonia.mey@ed.ac.uk) and Rohan Gorantla

This notebook was adapted from the original *Lab.06 / IBM3202 – Molecular Docking on AutoDock*. It forms part of an excellent bioinformatics series which can be found [here](https://github.com/pb3lab/ibm3202). The original was released under an MIT license.
<a rel="license" href="https://opensource.org/licenses/MIT"><img alt="MIT Licence" src="https://img.shields.io/badge/License-MIT-yellow.svg" align="right"/></a> 

*Citation*: Engelberger F, Galaz-Davison P, Bravo G, Rivera M, Ramírez-Sarmiento CA (2021) Developing and Implementing Cloud-Based Tutorials that Combine Bioinformatics Software, Interactive Coding and Visualization Exercises for Distance Learning on Structural Bioinformatics. J Chem Educ 98(5): 1801-1807. DOI: [10.1021/acs.jchemed.1c00022](https://pubs.acs.org/doi/10.1021/acs.jchemed.1c00022)
<br></br>
Updated material is available under a Creative Commons BY-NC-SA 4.0 license.
<a rel="license" href="https://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons Licence" style="width=50" src="https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png" title='This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.' align="right"/></a>

**Learning Objectives:**
  - Learn how to create a protein-ligand complex from a crystal structure and a SMILES string
  - Prepare a protein for docking with AutoDock Vina
  - Prepare a series of ligands for docking
  - Understand docking results by computing RMSDs and comparing scores against experimental results

**Jupyter cheat sheet:**
- to run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>;
- to get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>.

<div class="alert alert-info"><b> Remember: variables persist between cells</b> 
    
Be aware that it is the order of execution of cells that is important in a Jupyter notebook, not the <em>order</em> in which they appear. Python will remember <em>all</em> the code that was run previously, including any variables you have defined, irrespective of the order in the notebook. Therefore if you define variables lower down the notebook and then (re)run cells further up, those defined further down will still be present. </div> 

## Table of Contents
[0.   Setup](#setup)   
[1.   Theoretical background](#theory)   
[2.   Experimental overview and preparation steps](#experiment)   
[3.   Scoring and docking of the crystal structure ligand](#xray)   
[4.   Redocking the ligand into the protein](#docking)   
[5.   Docking from SMILES strings](#smiles)   
[6.   Conclusion](#conclusion)

## 0. Google Colab setup
<a id='setup'></a>
<div class="alert alert-warning">
<b>Attention:</b> Please only run the cells in this section if you are using Colab!</div>

This tutorial has many dependencies for it to work. They are:
- **biopython** for manipulation of the PDB files
- **py3Dmol** to visualise the protein structure and set up the search grid.
- **miniconda**, a free minimal installer of **conda** for software package and environment management.
- **OpenBabel** for parameterization of our ligand(s).
- **MGLtools** are used to parameterize our target protein using Gasteiger charges.
- **pdb2pqr** for parameterization of our protein using the AMBER ff99 forcefield.
- **Autodock Vina** for the docking process

The following installation instructions are the best way of setting up *Google Colab* for this laboratory session.

### 0.1 Setup conda colab

In [None]:
!if [ -n "$COLAB_RELEASE_TAG" ]; then git clone https://github.com/CCPBioSim/CCP5_Simulation_of_BioMolecules; fi
import os
os.chdir(f"CCP5_Simulation_of_BioMolecules{os.sep}3_Docking")

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

### 0.2 Check conda colab works

In [None]:
import condacolab
condacolab.check()

### 0.3 Install py3Dmol for visualisation and pdb2pqr for parametrisation

In [None]:
!pip install py3Dmol
!pip install pdb2pqr

### 0.4 Creating and installing the rest of the environment

In [None]:
# Step 1: Create the env.yml file
env_yaml = """
channels:
  - conda-forge

dependencies:
  - python >= 3.8
  - pip
  - tqdm
  - loguru
  - fsspec
  - s3fs
  - gcsfs
  - joblib

  # Scientific
  - pandas = 2.1
  - numpy
  - scipy

  # Chemistry
  - rdkit
  - datamol
  - openmm >=7.6.0
  - pdbfixer >=1.8
  - openmmforcefields
  - mdtraj
  - mdanalysis

  # Docking
  - openbabel
  - smina
  - qvina
  - fpocket
  - vina

  # Viz
  - matplotlib
  - ipywidgets = 7.7.2
  - nglview
"""

# Write the environment configuration to a file
with open("environment.yml", "w") as file:
    file.write(env_yaml)


And now we can install the environment we have created using mamba. Please be patient! **The next two cells may take some time to run**.

In [None]:
%%capture
!mamba env update -n base -f environment.yml

In [None]:
%%capture
!mamba install -c conda-forge -c bioconda mgltools openbabel zlib ncurses --yes

In [None]:
# Importing py3Dmol for safety
import py3Dmol

In [None]:
# Checking that pdb2pqr was properly installed
!pdb2pqr30 -h | awk 'NR==1{if($1=="usage:") print "PDB2PQR successfully installed"; else if($1!="usage:") print "Something went wrong. Please install again"}'

Now you should have a working environment and we can actually start thinking about running docking!

## 1. Theoretical background
<a id='theory'></a>

Many protein structures are determined experimentally using X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, and more recently predicted computationally with tools such as AlphaFold. These structures are the starting point for molecular docking, which is an essential tool in **drug discovery**.

**Molecular docking** explores the potential binding poses of small molecules in the **binding site** of a target protein for which reliable atomic coordinates are available.

Thus, the **_druggability_** of different compounds and their binding affinity for a given protein target can be estimated. This can help identify lead compounds from large molecular libraries.

<figure>
<center>
<img src='https://raw.githubusercontent.com/pb3lab/ibm3202/master/images/docking_01.png'/>
<figcaption>FIGURE 1. In molecular docking, binding is evaluated in two steps: A) Energetics of the transition of the unbound states of ligand and target towards the conformations of the bound complex; and B) energetics of protein-ligand binding in these conformations. <br> Huey R et al (2007) <i>J Comput Chem 28(6), 1145-1152.</i></figcaption></center>
</figure>

Molecular docking programs run a **search algorithm** in which varying conformations of a given ligand, typically generated using Monte Carlo or genetic algorithms, are iteratively evaluated until convergence to an energy minimum is reached. An **affinity scoring function** then estimates a ΔG (binding free energy in kcal/mol) as the sum of several energetic contributions (electrostatics, van der Waals, desolvation, etc.), and this estimate is used to rank the candidate poses.
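To make the search-and-score idea concrete, here is a minimal sketch of a Metropolis Monte Carlo search over a single torsion angle. Everything in it is an illustrative assumption: the two energy terms, their weights, and the function names `toy_score` and `monte_carlo_search` are made up for this example and are not the functional form or search strategy used by AutoDock Vina.

```python
import math
import random

def toy_score(angle_deg):
    """Toy scoring function: a sum of two made-up energy terms (kcal/mol)."""
    angle = math.radians(angle_deg)
    # Three-fold torsional barrier with minima at -60, 60 and 180 degrees
    torsion_term = 1.5 * (1.0 + math.cos(3.0 * angle))
    # Hypothetical favourable contact with the protein near 60 degrees
    contact_term = -2.0 * math.exp(-((angle_deg - 60.0) / 30.0) ** 2)
    return torsion_term + contact_term

def monte_carlo_search(n_steps=2000, step_size=15.0, kT=0.6, seed=0):
    """Perturb the pose, accept or reject the move, and remember the best pose."""
    rng = random.Random(seed)
    angle = rng.uniform(-180.0, 180.0)
    energy = toy_score(angle)
    best_angle, best_energy = angle, energy
    for _ in range(n_steps):
        # Random perturbation, wrapped back into [-180, 180)
        trial = (angle + rng.uniform(-step_size, step_size) + 180.0) % 360.0 - 180.0
        trial_energy = toy_score(trial)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            angle, energy = trial, trial_energy
            if energy < best_energy:
                best_angle, best_energy = angle, energy
    return best_angle, best_energy

best_angle, best_energy = monte_carlo_search()
print(f"best angle: {best_angle:.1f} deg, score: {best_energy:.2f} kcal/mol")
```

A real docking program searches over the position, orientation and all rotatable bonds of the ligand at once, but the loop structure (propose, score, accept or reject) is the same idea.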

## 2. Experimental Overview and Preparation steps
<a id='experiment'></a>

We will use data from the [protein ligand benchmark dataset](https://github.com/openforcefield/protein-ligand-benchmark) for our docking workshop. In particular, we will use the protein *MCL1* which is a well-established cancer target. To find out more about the protein-ligand benchmark check out the publication on it:   
Hahn, D., Bayly, C., Boby, M. L., Bruce Macdonald, H., Chodera, J., Gapsys, V., Mey, A., Mobley, D., Perez Benito, L., Schindler, C., Tresadern, G., & Warren, G. (2022). Best Practices for Constructing, Preparing, and Evaluating Protein-Ligand Binding Affinity Benchmarks. [Living Journal of Computational Molecular Science, 4(1), 1497.](https://doi.org/10.33011/livecoms.4.1.1497)

For this workshop we use **MGLtools** (and alternatively **pdb2pqr**) to prepare the target protein files, **OpenBabel** to prepare the ligand files, **AutoDock Vina** for the docking procedure and **py3Dmol** to establish the appropriate search grid configuration and analyze the results.

<figure>
<center>
<img src='https://raw.githubusercontent.com/pb3lab/ibm3202/master/images/docking_02.png' />
<figcaption>FIGURE 2. General steps of molecular docking. First, the target protein and ligand or ligands are parameterized. Then, the system is prepared by setting up the search grid. Once the docking calculation is performed, ligand poses are scored based on a given energy function. Lastly, the computational search is processed and compared against experimental data for validation <br><i>Taken from Pars Silico (en.parssilico.com).</i></figcaption></center>
</figure>

### 2.1 Downloading and Preparing the Receptor for AutoDock

The first step in a molecular docking procedure is to have a structure of a given target protein. While in some cases you may want to use an AlphaFold structure, most cases start with an experimentally (X-ray, NMR, cryoEM) solved three-dimensional structure.

In such a scenario, a given target protein structure is downloaded from the **Protein Data Bank (PDB)** (https://www.rcsb.org/pdb) using its accession ID. For example, for MCL1 the entry 4HW3 is a good structure with a ligand bound to the protein. The code in this notebook works with the MCL1 entry 6O6F.

We can download the structure from the PDB using **biopython**:

In [None]:
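# Downloading the PDB files using biopython
import os
from Bio.PDB import PDBList

pdbid = ['6o6f']
pdbl = PDBList()
for s in pdbid:
    # Biopython saves the file in the current folder as "pdb<id>.ent"
    pdbl.retrieve_pdb_file(s, pdir='.', file_format="pdb", overwrite=True)
    # Rename it to the more familiar "<id>.pdb"
    os.rename("pdb" + s + ".ent", s + ".pdb")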

In the case of X-ray diffraction, this experimental strategy does not discriminate between electron density coming from static protein atoms or water molecules, meaning that most protein structures solved by X-ray diffraction also include so-called **crystallographic waters** (check the non-bonded red dots on the protein structure below). These molecules are not important for our particular docking exercise and we will remove them. However, water-mediated binding can be important, and in such cases water molecules may be retained for docking.
<figure>
<center>
<img src='https://github.com/CCPBioSim/CCP5_Simulation_of_BioMolecules/blob/main/3_Docking/resources/4HW3.png?raw=true'/>
<figcaption>FIGURE 3. Cartoon representation of MCL1 (PDB accession ID 4hw3), with its N-to-C-terminal residues colored according to order and disorder</figcaption></center>
</figure>

  Typically, this can be easily done by extracting all of the lines from the PDB file that start with **"ATOM"**, as this is how all of the atoms that belong to amino and nucleic acid residues are termed. In contrast, the atoms from ligands, water molecules and other non-protein/non-nucleic residues are commonly referred to as **"HETATM"**. Also, the different chains of an oligomer are separated by a **"TER"** string, which is important to keep in our case.
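As a minimal sketch of this record-based filtering, the snippet below keeps only the lines that start with "ATOM" or "TER". The function name `keep_protein_records` and the tiny hand-written PDB fragment (including its coordinates) are assumptions made purely for illustration.

```python
def keep_protein_records(pdb_text):
    """Return only the ATOM and TER records of a PDB file given as text."""
    kept = [
        line for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "TER"))
    ]
    return "\n".join(kept) + "\n"

# A tiny hand-written fragment in PDB layout, used only for illustration
example = """\
HEADER    ILLUSTRATIVE FRAGMENT
ATOM      1  N   GLY A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  GLY A   1      11.639   6.071  -5.147  1.00  0.00           C
TER       3      GLY A   1
HETATM    4  O   HOH A 101      14.000   2.000   1.000  1.00  0.00           O
END
"""

cleaned = keep_protein_records(example)
print(cleaned)
```

The water molecule (a "HETATM" record) and the header lines are dropped, while the chain terminator is kept.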

  The code cells in the next subsection first create a folder in which we will store all data related to our molecular docking experiment. They then use MDAnalysis to select the "ATOM" records of the protein chain of interest and write them into a separate PDB file for further processing. Please take a good look at them.

#### Cleaning the pdb file and getting it ready for docking
We want to carry out the following steps on this PDB file:
1. Select only chain B of the protein
2. Get the ligand out
3. Prepare the protein and ligand to score the X-ray structure pose

We will be using a tool called MDAnalysis for this. For now just read the code and see if you can understand it. In [Session 5](../5_Analysis_MDAnalysis) you will learn more about this Python package.

In [None]:
from pathlib import Path
import MDAnalysis as mda

# Then, we define the path of the folder we want to create.
# Notice that the HOME folder for a hosted runtime in colab is /content/
docking_path = Path("/content/single-dock/")
#docking_path = Path("single_dock")

# Now, we create the folder using the os.mkdir() command.
# os.mkdir() raises an error if the folder already exists,
# so we check for it first with the if conditional
if os.path.exists(docking_path):
    print("path already exists")
else:
    os.mkdir(docking_path)
    print("path was successfully created")

# Now we assign a variable "protein" with the name and extension of our pdb
protein = "6o6f.pdb"

# we create a universe with the protein
u = mda.Universe(protein)
prot_sel = u.select_atoms("record_type ATOM and segid B")
prot_sel.write(os.path.join(docking_path, "6o6f_prot_chainB.pdb"))


In [None]:
# We will first create a path for all ligands that we will use in this tutorial
# Notice that the HOME folder for a hosted runtime in colab is /content/
ligandpath = Path("ligands/")

# Now, we create the folder using the os.mkdir() command
# The if conditional is just to check whether the folder already exists
# In which case, python returns an error
if os.path.exists(ligandpath):
    print("ligand path already exists")
else:
    os.mkdir(ligandpath)
    print("ligand path was successfully created")

# Now we select the ligand from the PDB record; we know the ligand is called LOD
lig_sel = u.select_atoms("segid B and resname LOD")
lig_sel.write(os.path.join(ligandpath, "ligand.pdb"))

We now have our ligand and protein files separated.
However, for AutoDock Vina to perform a molecular docking experiment, the target protein file must contain atom types compatible with AutoDock for evaluating different types of interaction, as well as partial charges for evaluating electrostatic interactions. This information is stored in a **PDBQT** file, a modification of the PDB format that adds **charges (q)** and **AutoDock-specific atom types (t)** in two extra columns at the end of each atom record. It is worth noting, however, that AutoDock Vina **ignores the user-supplied partial charges**, as it has its own way of dealing with electrostatic interactions.
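To see where the two extra columns sit, the sketch below builds one PDBQT-style atom record and reads the charge and atom type back from it. The column layout is an assumption based on the usual AutoDock convention (the PDB fields up to column 66, then the partial charge and the atom type); the helper names, the coordinates and the charge value are hypothetical and only serve the illustration.

```python
def format_pdbqt_atom(serial, name, resname, chain, resid, xyz, charge, atype):
    """Format one atom record with charge (q) and atom type (t) appended."""
    x, y, z = xyz
    return (
        f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resid:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}    "
        f"{charge:6.3f} {atype:<2s}"
    )

def read_charge_and_type(pdbqt_line):
    """Read the two PDBQT-specific fields that follow the first 66 columns."""
    charge, atype = pdbqt_line[66:].split()
    return float(charge), atype

# Hypothetical serine hydroxyl oxygen; "OA" is the AutoDock type for an
# oxygen that can accept hydrogen bonds. All numbers are made up.
line = format_pdbqt_atom(1, "OG", "SER", "B", 172,
                         (-28.112, -6.540, -20.331), -0.398, "OA")
print(line)
print(read_charge_and_type(line))
```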

  Lastly, the protein target must contain **all polar hydrogens**, such that hydrogen bonds can be formed between the target protein and the ligand. Most protein structures have no hydrogens included, meaning that we must add them.

  This is the part of the tutorial where you have **two different options** to proceed with your experiment!

#### Adding hydrogens according to pKa to the protein

Add the polar hydrogens of your protein and parameterize it based on the pKa of each amino acid at pH 7.4 with the **AMBER ff99** force field using **pdb2pqr**, followed by deletion of non-polar hydrogens and conversion into a **PDBQT** file using **MGLtools**.

  In this case, pdb2pqr generates an intermediate **PQR** file, a modification of the PDB format which allows users to add charge and radius parameters to existing PDB data. This information is then left unaltered during the use of **MGLtools**.

In [None]:
!pdb2pqr30 --ff AMBER --titration-state-method propka --with-ph 7.4 $docking_path/6o6f_prot_chainB.pdb $docking_path/6o6f_prot_chainB.pqr

#### Using MDAnalysis to write out the pdbqt file for docking

In [None]:
u = mda.Universe(os.path.join(docking_path, "6o6f_prot_chainB.pqr"))
asel = u.select_atoms("all")
asel.write(os.path.join(docking_path, "6o6f.pdbqt"))

In [None]:
# Now we need to get rid of the first two lines of the file written by MDAnalysis
filename = os.path.join(docking_path, "6o6f.pdbqt")
try:
    # Read all lines from the file
    with open(filename, 'r') as infile:
        lines = infile.readlines()

    # Write everything except the first two lines back to the same file
    with open(filename, 'w') as outfile:
        outfile.writelines(lines[2:])
except OSError as e:
    print(f"Could not rewrite {filename}: {e}")

This will keep the protein side chains rigid in the docking. If you want to use a flexible receptor you will have to use the following MGLtools script:

```
!prepare_receptor4.py -r $docking_path/6o6f_prot.pqr -o $docking_path/6o6f_prot.pdbqt -C -U nphs_lps -v
```

<div class="alert alert-success">
<b>Task 1</b>. Here are some questions to think about:

- Why are hydrogens important when you want to run docking?   
- Why are we only adding polar hydrogens?    
- If the partial charges are ignored by Autodock Vina, how can these different strategies affect my docking results? (this is something you can actually test!)
- Should we ignore the water?
</div>

### 2.2 Preparing the Ligand for AutoDock Vina

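We already have the ligand written out to a pdb file. To make this file work with Vina we need to wield some magic.
First we load the pdb file, add hydrogens, compute partial charges and write it out as a mol2 file using the Python API of Open Babel. We then convert the mol2 file to PDBQT with the MGLtools script `prepare_ligand4.py`.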

In [None]:
from openbabel import pybel
mol = next(pybel.readfile('pdb', os.path.join(ligandpath,'ligand.pdb')))
mol.addh()
mol.calccharges()
mol.write("mol2", "ligand.mol2",overwrite=True)

In [None]:
!prepare_ligand4.py -l ligand.mol2 -o $docking_path/ligand.pdbqt -U nphs_lps -v

We are now ready to score the ligand in the binding site.

## 3. Scoring and Docking of the Crystal Structure Ligand
<a id='xray'></a>

### 3.1 Defining the docking grid

To define the docking grid we find the centre of geometry of the ligand, using MDAnalysis again.

In [None]:
# Load the mol2 file
u = mda.Universe('ligand.mol2')

In [None]:
# We select all the atoms and then compute the centre of geometry
sel = u.select_atoms('all')
grid_centre = sel.center_of_geometry()
print(grid_centre)
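The centre of geometry is simply the unweighted mean of the atomic coordinates. The sketch below computes it in plain Python and also estimates a box, centred on that point, that encloses every ligand atom. The helper names, the three example coordinates and the 5 Å padding are assumptions for illustration, not recommended settings.

```python
def centre_of_geometry(coords):
    """Unweighted mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def box_size_around(coords, centre, padding=5.0):
    """Edge lengths of a box centred on `centre` that contains all atoms,
    with `padding` (in Angstrom) added on every side."""
    return tuple(
        2.0 * (max(abs(c[i] - centre[i]) for c in coords) + padding)
        for i in range(3)
    )

# Hypothetical ligand coordinates in Angstrom
ligand_coords = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0), (4.0, 2.0, 3.0)]

centre = centre_of_geometry(ligand_coords)
box = box_size_around(ligand_coords, centre)
print("centre:", centre)
print("box size:", box)
```

A box that only just contains the crystal pose leaves little room for the search, which is why some padding is added here.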

### 3.2 Visualising the docking grid
Below is a set of predefined functions to help with the visualisation of the ligand, the protein and a docking box. You can pick the centre of geometry of your ligand and change the box size to see if the ligand will fit into the box or not.  

In [None]:
def definegrid(object,bxi,byi,bzi,bxf,byf,bzf):
  object.addBox({'center':{'x':bxi,'y':byi,'z':bzi},'dimensions': {'w':bxf,'h':byf,'d':bzf},'color':'red','opacity': 0.6})

# Next, we define how the protein will be shown in py3Dmol
# There is an option to add a style representation for active site residues
def viewprot(object,prot_PDBfile, resids):
    mol1 = open(prot_PDBfile, 'r').read()
    #mol2 = open(lig_file, 'r').read()
    object.addModel(mol1,'pdb')
    object.setStyle({'cartoon': {'color':'lightblue'}})
    object.addStyle({'resi':resids},{'stick':{'colorscheme':'greenCarbon'}})
def viewlig(object,lig_file):
    mol1 = open(lig_file, 'r').read()
    object.addModel(mol1,'mol2')
    object.addStyle({'resname':'LOD401'},{'stick':{'colorscheme':'cyanCarbon'}})

# Lastly, we combine the box grid and protein into a single viewer
def viewprotgrid(prot_PDBfile,ligand_file,resids,bxi,byi,bzi,bxf=10,byf=10,bzf=10):
    mol_view = py3Dmol.view(1000,1000,viewergrid=(1,2))
    definegrid(mol_view,bxi,byi,bzi,bxf,byf,bzf)
    viewprot(mol_view,prot_PDBfile,resids)
    viewlig(mol_view,ligand_file)
    mol_view.setBackgroundColor('0xffffff')
    mol_view.rotate(90, {'x':0,'y':1,'z':0},viewer=(0,1));
    mol_view.zoomTo()
    mol_view.show()

In [None]:
viewprotgrid(prot_PDBfile = '6o6f.pdb', ligand_file='ligand.mol2',resids = 20,
         bxi=-29.32, # here you put your centre of geometry information you have computed before
         byi=-7.88,
         bzi=-19.98,
         bxf=15,   #This sets the size of the box
         byf=15,
         bzf=15)

<div class="alert alert-success">
<b>Task 2.</b> Play around with the docking grid. Try different values for the box centre and size and see how the box covers the protein and the ligand.
</div>

### 3.3 Scoring the X-ray structure of MCL1
Now we can do the actual scoring of the ligand crystal structure!

In [None]:
from vina import Vina

v = Vina(sf_name='vina')

# Setting the protein
v.set_receptor(os.path.join(docking_path,'6o6f.pdbqt'))
# Setting the ligand
v.set_ligand_from_file(os.path.join(docking_path,'ligand.pdbqt'))

# Now we use the grid centre we computed before
v.compute_vina_maps(center=grid_centre, box_size=[20, 20, 20])

# Score the current pose
energy = v.score()
print('Score before minimization: %.3f (kcal/mol)' % energy[0])

# Locally minimize the current pose
energy_minimized = v.optimize()
print('Score after minimization : %.3f (kcal/mol)' % energy_minimized[0])
v.write_pose(os.path.join(docking_path,'ligand_minized.pdbqt'), overwrite=True)

### 3.4 Visualising the protein and ligand before and after minimization

Converting the minimized pose to a PDB file using MDAnalysis.

In [None]:
u = mda.Universe(os.path.join(docking_path,'ligand_minized.pdbqt'))
sel = u.select_atoms('all')
sel.write('minimized.pdb')

In [None]:
mol_view = py3Dmol.view(1000,1000)
mol1 = open('minimized.pdb', 'r').read()
mol2 = open(os.path.join('ligand.mol2'), 'r').read()
mol_view.addModel(mol1,'pdb')
mol_view.addModel(mol2,'mol2')
mol_view.setStyle({'stick':{'colorscheme':'cyanCarbon'}})
mol_view.setBackgroundColor('0xffffff')
mol_view.rotate(90, {'x':0,'y':1,'z':0},viewer=(0,1));
mol_view.zoomTo()
mol_view.show()

## 4. Redocking the ligand into the protein  
<a id='docking'></a>

Now we get to **run the actual docking calculation**!
Up until now we only took the crystal pose and scored it using the scoring function and allowed it to minimize its energy a little bit.
Now we want to run Vina to generate multiple poses, see how they score, what their RMSD is and how they differ in location to the protein binding site.
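A common way to compare a docked pose with the crystal pose is the root-mean-square deviation (RMSD) of the atomic positions. The sketch below is a minimal version under two assumptions: both poses list the same atoms in the same order, and no superposition is applied because both poses already share the receptor's coordinate frame. It does not correct for molecular symmetry, and the coordinates are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two poses given as lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both poses must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical two-atom "ligand": the docked pose is the crystal pose
# translated by (0, 3, 4), so every atom has moved by exactly 5 Angstrom
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 3.0, 4.0), (1.0, 3.0, 4.0)]

pose_rmsd = rmsd(crystal_pose, docked_pose)
print(f"RMSD: {pose_rmsd:.2f} A")
```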



### 4.1 Docking parameter options
You have several possible variables that can be edited. A very relevant variable is the **exhaustiveness**, i.e., the number of independent runs starting from random conformations, and therefore the amount of computational effort during molecular docking.

The other important one is setting the location of the docking grid correctly which you do with:
```
v.compute_vina_maps(center=grid_centre, box_size=[20, 20, 20])
```
The centre here is just the geometric centre of the ligand in your X-ray structure.

```
#OPTIONS
cpu = [value] # more cpus reduces the computation time
exhaustiveness = [value] # search time for finding the global minimum, default is 8
num_modes = [value] # maximum number of binding modes to generate, default is 9
energy_range = [value] # maximum energy difference between the best binding mode and the worst one displayed (kcal/mol), default is 3
seed = [value] # explicit random seed, not required
```
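The option names listed above are the ones used by the command-line `vina` program, which can read them from a plain-text configuration file passed with `--config`. As a rough sketch, the hypothetical helper below (`vina_config_text` is not part of Vina) assembles such a file. The file names and grid centre are the ones used elsewhere in this notebook, while the seed is an arbitrary example value.

```python
ALLOWED_OPTIONS = ("cpu", "exhaustiveness", "num_modes", "energy_range", "seed")

def vina_config_text(receptor, ligand, centre, box_size, **options):
    """Build the text of a Vina configuration file from Python values."""
    unknown = set(options) - set(ALLOWED_OPTIONS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    # One centre and one size entry per axis
    for axis, c, s in zip("xyz", centre, box_size):
        lines.append(f"center_{axis} = {c}")
        lines.append(f"size_{axis} = {s}")
    # Optional search settings, written in a fixed order
    for key in ALLOWED_OPTIONS:
        if key in options:
            lines.append(f"{key} = {options[key]}")
    return "\n".join(lines) + "\n"

config = vina_config_text(
    "6o6f.pdbqt", "ligand.pdbqt",
    centre=(-29.32, -7.88, -19.98),
    box_size=(20, 20, 20),
    exhaustiveness=32, num_modes=10, energy_range=3, seed=42,
)
print(config)
```

In the Python API used in this notebook the same choices are passed as arguments instead, for example `exhaustiveness` and `n_poses` in `v.dock(...)`.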

### 4.2 Running the docking

In [None]:
v = Vina(sf_name='vina')

# Setting the protein
v.set_receptor(os.path.join(docking_path,'6o6f.pdbqt'))
# Setting the ligand
v.set_ligand_from_file(os.path.join(docking_path,'ligand.pdbqt'))

# Now we use the grid centre we computed before
v.compute_vina_maps(center=grid_centre, box_size=[20, 20, 20])

# Dock the ligand
v.dock(exhaustiveness=32, n_poses=10)
v.write_poses(os.path.join(docking_path,'docked_ligand.pdbqt'), n_poses=3, energy_range=6, overwrite=True)

In [None]:
from openbabel import pybel
mol = next(pybel.readfile("pdbqt", os.path.join(docking_path,"docked_ligand.pdbqt")))
pybelmol = pybel.Molecule(mol)
pybelmol.write("pdb", os.path.join(docking_path,"docked.pdb"), overwrite=True)

The cell above converts the first docked pose into a PDB file so that we can visualise it with py3Dmol. Depending on your settings you will have one or multiple poses. You can use MDAnalysis or obabel to convert all of them!

In [None]:
# If you want to convert our Autodock Vina results from pdbqt into many pdbs you can use this
# (docked_ligand.pdbqt was written to the docking folder in the cell above)
# However, the command-line obabel option does not seem to work on colab
!obabel -ipdbqt $docking_path/docked_ligand.pdbqt -opdb -O $ligandpath/docked.pdb -m
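If the command-line conversion is not available, the multi-pose output can also be inspected with plain Python. Vina writes each pose between a `MODEL` and an `ENDMDL` line and records its score in a `REMARK VINA RESULT:` line. The sketch below splits such a file into poses and reads the scores; the helper names are hypothetical, and the example text is hand-written with made-up scores and the atom records left out.

```python
def split_poses(pdbqt_text):
    """Split multi-model PDBQT text into one text block per pose."""
    poses, block = [], []
    for line in pdbqt_text.splitlines():
        if line.startswith("MODEL"):
            block = [line]
        elif line.startswith("ENDMDL"):
            block.append(line)
            poses.append("\n".join(block) + "\n")
            block = []
        elif block:
            block.append(line)
    return poses

def vina_score(pose_block):
    """Return the score (kcal/mol) stored in the REMARK VINA RESULT line."""
    for line in pose_block.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            return float(line.split()[3])
    return None

# Hand-written stand-in for a docking output file (scores are made up)
example_output = """\
MODEL 1
REMARK VINA RESULT:    -8.100      0.000      0.000
ROOT
ENDROOT
ENDMDL
MODEL 2
REMARK VINA RESULT:    -7.400      1.852      2.967
ROOT
ENDROOT
ENDMDL
"""

poses = split_poses(example_output)
scores = [vina_score(p) for p in poses]
print(scores)
```

Each element of `poses` could then be written to its own file and converted for visualisation.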

### 4.3 Visualising the different docked poses

In [None]:
view = py3Dmol.view()
view.setBackgroundColor('white')
# Loading the target protein as PDB file
view.addModel(open(docking_path/'6o6f_prot_chainB.pdb', 'r').read(),'pdb')
view.setStyle({'cartoon': {'color':'spectrum'}})
view.zoomTo()
# Loading the docking pose
view.addModel(open(docking_path/'docked.pdb', 'r').read(),'pdb')
view.setStyle({'model':1}, {'stick':{'colorscheme':'greenCarbon'}})
# Loading the experimentally solved binding mode
view.addModel(open(ligandpath/'ligand.pdb', 'r').read(),'pdb')
view.setStyle({'resn':'LOD'},{'stick':{}})
view.show()

## 5. Docking from SMILES strings
<a id='smiles'></a>

This is now a team effort. The idea is to pick a ligand and run the docking protocol not from a known X-ray structure, but from a SMILES string. Below is an overview of the task.

<div class="alert alert-success">
<b>Task 3.</b> It is now your turn to dock MCL1 inhibitors and find the most potent inhibitor! This task comprises the following steps:

1. Pick a SMILES string.
2. Prepare the string for docking. This involves converting it to a mol2 file using --gen3d, minimizing the pose using GAFF and then creating the pdbqt file for docking. For all of this you can use openbabel!
3. Dock the ligand into the already prepared MCL1 protein using the same grid as for the existing crystal structure ligand.
4. Report back your docking score for your ligand!
</div>

### 5.1 What is the SMILES format?

The **Simplified Molecular-Input Line-Entry System** (SMILES) is a text notation that allows a user to represent a chemical structure in a way that can be used by the computer. The elemental notation for different types of bonds between different atoms is as follows:

  \-	for single bonds (e.g., C-C or CC is CH3CH3)

  \=	for double bonds (e.g., C=C for CH2=CH2)

  \#	for triple bonds (e.g., C#N for C≡N)

  \:	for aromatic bonds (e.g., C1:C:C:C:C:C:1 or c1ccccc1 for benzene)

  \. for disconnected structures (e.g., \[Na+].\[Cl-] for NaCl)

  / and \ for double bond stereoisomers (e.g., F/C=C/F for trans-1,2-difluoroethylene and F/C=C\F for cis-1,2-difluoroethylene)

  @ and @@ for enantiomers (e.g., N\[C@@H](C)C(=O)O for L-alanine and N\[C@H](C)C(=O)O for D-alanine)
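To get a feel for how these symbols combine, the sketch below splits a SMILES string into atom, bond, branch and ring-closure tokens with a regular expression. It is a deliberately simplified tokenizer written for this illustration (it only knows the common organic-subset atoms and the symbols listed above) and is no substitute for a real parser such as the one in RDKit or Open Babel.

```python
import re

# Bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# bond symbols, branches and ring-closure digits
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|[-=#:./\\]|\(|\)|%\d{2}|\d)"
)

def tokenize(smiles):
    """Split a SMILES string into tokens; fail if any character is unknown."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens

def count_explicit_bonds(smiles):
    """Count the explicitly written single, double, triple and aromatic bonds."""
    return sum(1 for token in tokenize(smiles) if token in ("-", "=", "#", ":"))

print(tokenize("C#N"))
print(tokenize("N[C@@H](C)C(=O)O"))
print(count_explicit_bonds("F/C=C/F"))
```

Note that the chirality marks `@` and `@@` only appear inside square brackets, so they stay part of the bracket-atom token.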


### 5.2. Pick a ligand to work with
You can find all SMILES files in the directory `CCP5_Simulation_of_BioMolecules/3_Docking/resources/`.

A summary of all the ligands, plus experimental data can be found [here](https://docs.google.com/document/d/161CXaFY18dWeg2i804n67TJHnN_q_KJCDShP8x8rCiM/edit?usp=sharing).

Now pick one of the ligands to work with and put your name in the Google Doc to keep track of who is working on which ligand!
Below is a rough sketch of the workflow with `ligand 27`.



In [None]:
# The working directory is already CCP5_Simulation_of_BioMolecules/3_Docking (see the setup section)
!head resources/lig_27.smi

Here is some code to help you visualise the ligand! This will show a graphical interface with a text cell at the top. Pasting your SMILES string in the cell will visualise the molecule using the code below.

In [None]:
# Use the following Viewer to load your SMILES as a 3D molecule
import py3Dmol
import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem

def MolTo3DView(mol, size=(300, 300), style="stick", surface=False, opacity=0.5):
    assert style in ('line', 'stick', 'sphere', 'cartoon')
    mblock = Chem.MolToMolBlock(mol)
    # Use the requested viewer size (width, height) in pixels
    viewer = py3Dmol.view(width=size[0], height=size[1])
    viewer.addModel(mblock, 'mol')
    viewer.setStyle({style:{}})
    if surface:
        viewer.addSurface(py3Dmol.SAS, {'opacity': opacity})
    viewer.zoomTo()
    return viewer

from ipywidgets import interact,fixed,IntSlider
import ipywidgets

def smi2conf(smiles):
    '''Convert SMILES to rdkit.Mol with 3D coordinates'''
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None:
        mol = Chem.AddHs(mol)
        AllChem.EmbedMolecule(mol)
        AllChem.MMFFOptimizeMolecule(mol, maxIters=200)
        return mol
    else:
        return None

@interact
def smi2viewer(smi='CC=O'):
    try:
        conf = smi2conf(smi)
        return MolTo3DView(conf).show()
    except:
        return None

### 5.3. Constructing parameters from a SMILES string

Now, we will take this SMILES format and use it to construct and parameterize a three-dimensional structure of Lig27 in **PDBQT** format for its use in molecular docking. We can follow the same strategy as before making use of babel.

Use the program **babel** to convert the SMILES into a **MOL2** file without any extra work (such as searching for best conformers) except for setting the protonation state to pH 7.4, and then use **MGLtools** to parameterize the ligand using **Gasteiger** partial charges (this is the canonical option for the majority of AutoDock users). Please note that we are generating a ligand in which **all torsions are active** during the docking procedure.

In [None]:
from openbabel import pybel
smiles = 'O=C([O-])c1[nH]c2ccccc2c1CCCOc1ccccc1' #fixme, find the one for the right ligand
mol = pybel.readstring("smi", smiles)
mol.addh()
mol.make3D()
mol.write("mol2", "lig_27.mol2", overwrite=True) #fix me correct filename

In [None]:
!prepare_ligand4.py -l lig_27.mol2 -o $docking_path/lig_27.pdbqt -U nphs_lps -v
# NOTE: for some reason, MGLtools does not recognize the ligand when inside a different folder
# Uncomment the next line to delete the temporary MOL2 file used for generating the PDBQT file
# os.remove("lig_27.mol2")

### 5.4. Now run the docking again
Here is where you will have to write your own code to get this to work (copying from above).

In [None]:
v = Vina(sf_name='vina')

# Setting the protein
v.set_receptor(os.path.join(docking_path,'6o6f.pdbqt'))
# Setting the ligand


# Now we use the grid centre we computed before because we are still docking into the same binding site


# Dock the ligand


`v.energies()` will give you back information on your different poses, with one row per pose. Each row has 5 columns containing the following information:
`columns=[total energy, intermolecular contribution, intramolecular contribution, torsional contribution, intramolecular energy of the best pose]`

### 5.5. Visualise your docked pose using the code from above
If you want to know what the correct binding pose is for this series of ligands you can check out Lig27's correct binding pose and visualise it against your docked structure. You can find the X-ray structure pose for ligand 27 in `resources/xtal-pose/lig27_pose.pdb`

**Remember this is how you convert your pdbqt file to multipdb**
```
#We need to convert our Autodock Vina results from pdbqt into pdb
#For this, we use babel
!obabel -ipdbqt docked.pdbqt -opdb -O output_dock.pdb -m
```

In [None]:
## your visualisation code here



<div class="alert alert-success">
<b>Task 4.</b> What could you have done to improve the docking?
</div>

### 5.6. Getting improved starting point for docking from your SMILES string

#### 5.6.1. Parameters with an energy minimisation

You can also convert the SMILES into a 3D **MOL2** file while simultaneously performing an energy minimization using the Generalized Amber Force Field (**GAFF**). Then, use **MGLtools** to parameterize the ligand using **Gasteiger** partial charges. Please note that we are generating a ligand in which **all torsions are active** during the docking procedure.

```
from biobb_chemistry.babelm.babel_minimize import babel_minimize
prop = {
    'criteria': 1e-6,
    'method': 'cg',
    'force_field': 'GAFF'
}
babel_minimize(input_path='lig27.mol2',
                output_path='minlig27.mol2',
                properties=prop)


# Preparing the ligand for docking
!prepare_ligand4.py -l minlig27.mol2 -o $docking_path/lig27.pdbqt -U nphs_lps -v
os.remove("minlig27.mol2")
```

#### 5.6.2. Parameters with an energy minimisation and conformer generation

Use the program **babel** to  convert the SMILES into a 3D **MOL2** file while simultaneously performing a weighted rotor search for the lowest energy conformer using the Generalized Amber Force Field (**GAFF**). Then, use **MGLtools** to parameterize the ligand using **Gasteiger** partial charges. Please note that we are generating a ligand in which **all torsions are active** during the docking procedure.

```
# Converting lig27 from SMILES into a 3D MOL2 format and perform a weighted rotor search for lowest energy conformer
# Then, prepare ligand for docking using the Autodock script
!obabel $ligandpath/lig27.smi -O lig27.mol2 --gen3d --best --canonical --conformers --weighted --nconf 50 --ff GAFF
!prepare_ligand4.py -l lig27.mol2 -o $docking_path/lig27.pdbqt -U nphs_lps -v
os.remove("lig27.mol2")
```

#### 5.6.3. Defining flexible atoms in the receptor binding site
The last obvious way to improve docking results is to allow the side chains of the binding site residues to be flexible during docking.
To achieve this you can use the `prepare_receptor4.py` script. However, bear in mind that it is written for Python 2.7, and **maybe using a different docking tool** is an overall better solution!

**And this is the end of the docking tutorial!** If you want to download your results, you can compress them into a zip file for manual download.

In [None]:
# Write the archive to /content so that the download path below matches
!zip -r /content/singledocking.zip $docking_path
# By default, automatic download is enabled through the following lines
# but you need to disable your adblocker in order for it to work
from google.colab import files
files.download("/content/singledocking.zip")

## 6. Conclusion
<a id='conclusion'></a>

<div class="alert alert-info">
    <b>Key points:</b>   

- Docking can provide a binding pose and an associated score.
- Vina is an open-source tool that can be used for docking.
- Vina can be used through a mixture of command line tools and a Python API.
- There are many docking tools available; the docking process is broadly similar for all tools, however, there may be easier variants that have a full Python integration.
- Be aware that scores from docking are not always very reliable. 
</div>

---