# Docking Preparation

* Use the RCSB Search API to find a protein structure for docking.
* Prepare protein and ligand structures for input into AutoDock Vina.

Structure files that are ready for docking are stored in a format called PDBQT, which is used by AutoDock Vina.
The PDBQT format is similar to the PDB format, but it includes additional information such as partial atomic charges and AutoDock atom types.
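
As a rough illustration of that additional information, the sketch below builds one fixed-width PDBQT atom record and reads the charge and atom type back out. The column positions (partial charge in columns 71-76, atom type in columns 78-79) are an assumption based on the commonly documented PDBQT layout and should be checked against the AutoDock documentation. The helper names and the example atom are hypothetical.

```python
# Hypothetical helpers illustrating the assumed PDBQT column layout.
# Columns 1-66 follow the PDB ATOM record; the charge and atom type follow.
PDBQT_ATOM_FORMAT = (
    "{record:<6}{serial:>5} {name:<4}{altloc:1}{resname:>3} {chain:1}"
    "{resseq:>4}{icode:1}   {x:>8.3f}{y:>8.3f}{z:>8.3f}"
    "{occupancy:>6.2f}{bfactor:>6.2f}    {charge:>6.3f} {atom_type:<2}"
)


def format_pdbqt_atom(**fields):
    """Build one fixed-width PDBQT atom line from keyword fields."""
    return PDBQT_ATOM_FORMAT.format(**fields)


def read_charge_and_type(line):
    """Return (partial charge, atom type) from the assumed columns."""
    return float(line[70:76]), line[77:79].strip()


# Made-up atom used only to show the layout.
example_line = format_pdbqt_atom(
    record="ATOM", serial=1, name=" N", altloc="", resname="ILE",
    chain="A", resseq=16, icode="", x=18.0, y=5.5, z=8.7,
    occupancy=1.0, bfactor=20.0, charge=-0.066, atom_type="N",
)
print(example_line)
print(read_charge_and_type(example_line))
```

Everything up to the temperature factor is laid out as in a PDB `ATOM` record, which is why many PDB-aware tools can read most of a PDBQT file.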

Preparing the structures involves the following steps:

1. Download a structure of an EC 7.1.1.8 protein (quinol-cytochrome-c reductase) with our ligand of interest bound from the PDB.
2. Isolate the protein in the structure (strip any water molecules and ligands).
3. Add hydrogens to the protein and clean up the structure.
4. Create a PDBQT file for our protein.
5. Create PDBQT files for our ligands.

### Python Libraries

We will add one more library to our list: [MDAnalysis](https://www.mdanalysis.org/).
MDAnalysis is a Python library that was written for analyzing molecular dynamics (MD) simulations.
However, it can also read and write many molecular file formats, and it has a selection language for isolating particular parts of molecules.

| Library       | Abbreviation | Purpose |
|:--------------|:------------:|:--------|
| rcsbsearchapi | N/A          | functions for searching the Protein Data Bank based on the mmCIF dictionary |
| requests      | N/A          | access web URLs - used with APIs for databases |
| os            | N/A          | operating system functions - handling file paths and directories |
| nglview       | nv           | for viewing molecular structures |
| MDAnalysis    | mda          | molecular dynamics library - used for reading/writing files and selecting atoms |

### Command Line Software Used in this Notebook

We will also use a few command line scripts and utilities in this notebook.
Usually, these would be executed in the terminal, but we will run them from the Jupyter interface.
We will use them to prepare our structures for docking calculations.

| Software         | Purpose |
|:-------------|:---------|
| pdb2pqr      | adding hydrogens and missing atoms to protein, adjusting for pH |
| meeko        | preparing ligands for docking |

## Finding and saving a protein structure

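To find an appropriate protein for docking, we will use the RCSB PDB Search API.
The first part of our query selects structures by Enzyme Commission (EC) number.
The second part of our query is a text search for the ligand name.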

In [None]:
from rcsbsearchapi.search import TextQuery
from rcsbsearchapi import rcsb_attributes as attrs

ECnumber = "7.1.1.8"

q1 = attrs.rcsb_polymer_entity.rcsb_ec_lineage.id == ECnumber  # looking for quinol-cytochrome-c reductase
q2 = TextQuery("AZO")

query = q1 & q2               # combining the two queries into one

results = list(query())
print(results)

In [None]:
pdb_id = results[0].lower() # Get the PDB ID and convert to lowercase
print(pdb_id)
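
Indexing `results[0]` fails with an `IndexError` if the search returns nothing. A minimal sketch of a guard is shown below. The helper name `pick_first_pdb_id` and the example IDs are hypothetical placeholders, not real search output.

```python
def pick_first_pdb_id(results):
    """Return the first PDB ID in lowercase, or raise if nothing was found."""
    if not results:
        raise ValueError(
            "The search returned no structures; check the EC number and ligand name."
        )
    return results[0].lower()


# Placeholder IDs standing in for the list returned by the query.
example_results = ["1ABC", "2XYZ"]
print(pick_first_pdb_id(example_results))
```

Taking the first hit is a convenience for this tutorial. For real work you would usually inspect the full list and choose a structure by resolution, organism, or completeness.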

We have now identified the PDB ID of a quinol-cytochrome-c reductase with our ligand of interest bound.

We will now create a directory called `protein_structures` to save the structure in.

In [None]:
import os # for making directories
import requests

# make a directory for pdb files
protein_directory = "protein_structures"
os.makedirs(protein_directory, exist_ok=True)

pdb_request = requests.get(f"https://files.rcsb.org/download/{pdb_id}.pdb")
pdb_request.status_code

In [None]:
with open(f"{protein_directory}/{pdb_id}.pdb", "w+") as f:
    f.write(pdb_request.text)
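
A status code of 200 means the request succeeded. Calling `pdb_request.raise_for_status()` turns any error status into an exception instead of relying on a visual check. As an extra safeguard, the sketch below checks that the downloaded text looks like a PDB file before writing it. The function names are hypothetical, and the check is deliberately crude.

```python
from pathlib import Path


def looks_like_pdb(text):
    """Return True if the text contains at least one ATOM or HETATM record."""
    return any(
        line.startswith(("ATOM  ", "HETATM")) for line in text.splitlines()
    )


def save_pdb_text(text, directory, pdb_id):
    """Write PDB text to <directory>/<pdb_id>.pdb after a basic sanity check."""
    if not looks_like_pdb(text):
        raise ValueError(f"Content for {pdb_id} does not look like a PDB file.")
    path = Path(directory) / f"{pdb_id}.pdb"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path
```

In the notebook this could be called as `save_pdb_text(pdb_request.text, protein_directory, pdb_id)`.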

## Visualizing the protein structure

Before we start to work with the molecule, we should investigate the structure.
We will use a library called MDAnalysis to first process the PDB file, then visualize it with a library called NGLView.

MDAnalysis is a Python library that is used to process molecular dynamics trajectories and other 3D molecular structure files.
The core object in MDAnalysis is a "Universe", which corresponds to a molecular system.
We can load a PDB file into MDAnalysis, then do things like measure distances in the structure or isolate particular parts.

In [None]:
import MDAnalysis as mda

# Load into MDA universe
u = mda.Universe(f"{protein_directory}/{pdb_id}.pdb")
u

We can inspect the structure visually using a library called NGLView.
NGLView is a molecular visualizer made to work on the web and in Jupyter notebooks (see https://www.youtube.com/watch?v=6QHWhycMuXc).

In [None]:
import nglview as nv
view = nv.show_mdanalysis(u)
view

This view looks a bit messy, so we will isolate the protein and ligand for viewing.
MDAnalysis has a human-readable [selection syntax](https://docs.mdanalysis.org/stable/documentation_pages/selections.html)
that allows us to isolate parts of the structure. We take the MDAnalysis Universe (the variable `u`) and use its `select_atoms` method.
Inside the call, we describe what we want to select.

We will create separate variables for the protein and the ligand. We can select all protein residues in MDAnalysis by passing the word `"protein"` to `select_atoms`. Then, we select the ligand using `"resname AZO"`. This corresponds to the ligand's residue name in the PDB file we downloaded.
We can also select the waters in the structure by using `"resname HOH"`.

In [None]:
# Select protein atoms, the ligand (residue name AZO), and crystal waters
protein = u.select_atoms("protein")
ligand = u.select_atoms("resname AZO")
water = u.select_atoms("resname HOH")

water
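
If a ligand selection comes back empty, it helps to check which residue names the file actually contains. The sketch below counts `HETATM` records per residue name using plain Python, based on the fixed-width PDB columns (residue name in columns 18-20). The function name and the tiny example structure, including the residue name `LIG`, are made up for illustration.

```python
from collections import Counter


def count_hetero_residue_names(pdb_text):
    """Count HETATM records per residue name (PDB columns 18-20)."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            counts[line[17:20].strip()] += 1
    return counts


# Tiny made-up example; real files contain many more records.
example_pdb = "\n".join([
    "ATOM      1  N   ILE A  16      18.000   5.500   8.700  1.00 20.00           N",
    "HETATM    2  C1  LIG A 301      10.000  10.000  10.000  1.00 20.00           C",
    "HETATM    3  C2  LIG A 301      11.000  10.000  10.000  1.00 20.00           C",
    "HETATM    4  O   HOH A 401      15.000  12.000   9.000  1.00 20.00           O",
])
print(count_hetero_residue_names(example_pdb))
```

For the downloaded file, the same function can be applied to the text read from `f"{protein_directory}/{pdb_id}.pdb"`.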

After selecting parts of the structure, we can add them individually to an NGLView view.
In the cell below, we visualize the protein's surface colored by hydrophobicity.
We add the waters from the crystal structure in a spacefill representation and the ligand in a ball-and-stick representation.

In [None]:
view = nv.show_mdanalysis(protein)
view.clear_representations()
view.add_representation("surface", colorScheme="hydrophobicity")
lig_view = view.add_component(ligand)
lig_view.add_representation("ball+stick")
water_view = view.add_component(water)
water_view.add_representation("spacefill")
view

To perform a docking calculation, we need to isolate the protein from the ligand.
The starting structure contains extra molecules such as ligands and water.
You might also notice from the visualization that the structure does not include hydrogen atoms.
There are also some missing atoms, and some protein residues have alternate locations marked, just like the ligand.

For docking, we will remove these extra molecules and keep only the protein.
Alternate locations of residues also need to be removed.
We will use MDAnalysis to remove the extra molecules and save the starting protein structure as a new file.

In [None]:
# Write protein to new PDB file
protein.write(f"{protein_directory}/protein_{pdb_id}.pdb")

<div class="alert alert-block alert-danger">
<strong>Protein Charge</strong>  

After you save the protein in the cell above, you will see a warning that formal charge information is not set for the protein.
This warning appears because MDAnalysis did not find formal charge data in the PDB file and used a default value instead.
This is not a concern, because we will adjust the protonation states of the residues using PDB2PQR in the next steps.
</div>


## Fixing the protein structure

Now that we have isolated the protein, we need to make sure that hydrogens are added correctly and that any missing atoms are fixed.

To fix the protein, we will use a specialized program called PDB2PQR that is made for working with biomolecules like proteins.
The advantage of PDB2PQR is that it checks the protein for missing atoms and for atoms with multiple occupancy, picks positions, and adds the missing atoms.
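
To make "multiple occupancy" concrete: in a PDB file, an atom with alternate locations appears several times, distinguished by a one-letter flag in column 17. The sketch below is not what PDB2PQR does internally. It is a simple text-level filter, under the assumption that keeping location `A` is acceptable, and the function name and example atoms are hypothetical.

```python
def keep_primary_altloc(pdb_text, keep="A"):
    """Drop ATOM/HETATM records whose alternate-location flag is not `keep`.

    Records without a flag are always kept. The flag itself is left in place.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            altloc = line[16:17].strip()  # column 17 in 1-based PDB numbering
            if altloc and altloc != keep:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Made-up records: one serine side-chain atom with two alternate locations.
example_pdb = "\n".join([
    "ATOM      1  CB ASER A  20      10.000  10.000  10.000  0.60 20.00           C",
    "ATOM      2  CB BSER A  20      10.500  10.200   9.800  0.40 20.00           C",
    "ATOM      3  CA  GLY A  21      12.000  11.000  10.000  1.00 20.00           C",
])
print(keep_primary_altloc(example_pdb))
```

Always keeping location `A` ignores occupancy values, so a dedicated tool remains the better choice for production work.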

<div class="alert alert-block alert-success">
<strong>More complicated fixes: PDBFixer</strong>  

Another popular software for fixing PDB files is called [PDBFixer](https://github.com/openmm/pdbfixer).
</div>

We will use the command-line interface of PDB2PQR.

In [None]:
! pdb2pqr --pdb-output={protein_directory}/protein_h.pdb --pH=7.4 {protein_directory}/protein_{pdb_id}.pdb {protein_directory}/protein_{pdb_id}.pqr --whitespace
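
Outside of Jupyter, the same command can be launched from a plain Python script with `subprocess`. The sketch below reuses only the flags from the command above. The function names are hypothetical, and `run_pdb2pqr` assumes that the `pdb2pqr` executable is on the `PATH`.

```python
import subprocess


def build_pdb2pqr_command(input_pdb, output_pqr, output_pdb, ph=7.4):
    """Assemble the pdb2pqr argument list used in this notebook."""
    return [
        "pdb2pqr",
        f"--pdb-output={output_pdb}",
        f"--pH={ph}",
        str(input_pdb),
        str(output_pqr),
        "--whitespace",
    ]


def run_pdb2pqr(input_pdb, output_pqr, output_pdb, ph=7.4):
    """Run pdb2pqr and raise CalledProcessError if it exits with an error."""
    command = build_pdb2pqr_command(input_pdb, output_pqr, output_pdb, ph)
    return subprocess.run(command, check=True)


# Show the command without running it; the file names are placeholders.
print(" ".join(build_pdb2pqr_command("protein.pdb", "protein.pqr", "protein_h.pdb")))
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.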

## Saving a protein PDBQT File

The PDB2PQR program outputs two files, a PDB file and a PQR file. The PDB file is similar to the PDB files we worked with before, except that it contains hydrogens.
The PQR file is another molecular file format that is similar to a PDB file, but it contains information about atomic radii and atomic charges.

For use with AutoDock Vina, our protein file should be in the PDBQT format. PDBQT is a specialized file format used by AutoDock Vina and other AutoDock tools. Like the PQR format, the PDBQT format can also contain partial charges. We will load our PQR file and use MDAnalysis to write a PDBQT file.

We will use AutoDock Vina with the "vina" scoring function. The vina scoring function does not use charges during docking, so we could also have converted the PDB file without charges to a PDBQT file. However, some scoring functions do use partial charges.

In [None]:
# make a directory for pdbqt files
pdbqt_directory = "pdbqt"
os.makedirs(pdbqt_directory, exist_ok=True)

u = mda.Universe(f"{protein_directory}/protein_{pdb_id}.pqr")
u.atoms.write(f"{pdbqt_directory}/{pdb_id}.pdbqt")

The PDBQT file generated by MDAnalysis includes two lines at the start of the structure that AutoDock Vina does not accept.
These lines start with "TITLE" and "CRYST1". To resolve this, the following cell replaces these record names with "REMARK", which AutoDock Vina accepts.

You can also use different software to write your PDBQT file.
[OpenBabel](https://openbabel.org/index.html) is a popular choice.

In [None]:
# Read in the just-written PDBQT file, replace text, and write back
with open(f"{pdbqt_directory}/{pdb_id}.pdbqt", 'r') as file:
    file_content = file.read()

# Replace 'TITLE' and 'CRYST1' with 'REMARK'
file_content = file_content.replace('TITLE', 'REMARK').replace('CRYST1', 'REMARK')

# Write the modified content back to the file
with open(f"{pdbqt_directory}/{pdb_id}.pdbqt", 'w') as file:
    file.write(file_content)
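
The replacement above changes every occurrence of the text "TITLE" or "CRYST1", wherever it appears in the file. A slightly stricter variant, sketched below with a hypothetical function name, only relabels lines that begin with one of those record names and leaves all other lines untouched.

```python
def relabel_header_records(pdbqt_text, records=("TITLE", "CRYST1")):
    """Rename the leading record name to REMARK on matching lines only."""
    fixed = []
    for line in pdbqt_text.splitlines():
        for record in records:
            if line.startswith(record):
                line = "REMARK" + line[len(record):]
                break
        fixed.append(line)
    return "\n".join(fixed) + "\n"


# Made-up header lines followed by one atom record.
example = "\n".join([
    "TITLE     example structure",
    "CRYST1    1.000    1.000    1.000  90.00  90.00  90.00 P 1           1",
    "ATOM      1  N   ILE A  16      18.000   5.500   8.700  1.00 20.00    -0.066 N",
])
print(relabel_header_records(example))
```

For the header lines this gives the same result as the cell above. The only difference is that text further along a line is never modified.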

## Ligand Preparation

When preparing small molecule PDBQT files, we could also use MDAnalysis or other tools.
However, there is a specialized program for small molecules and docking called [meeko](https://github.com/forlilab/Meeko).
Using meeko will make it easier to visualize our results later.
Note that when using meeko, your ligands should already have hydrogens added.

We will use the command line for meeko, as we did for PDB2PQR.
You can also use the Python API, but the command line is simpler for common tasks like converting an SDF file to a PDBQT file.

In the cell below, we execute commands that convert the ligands we prepared in the `molecule_manipulation` notebook to PDBQT files.

In [None]:
# Use meeko to prepare small molecules - using meeko helps us visualize them later.
! mk_prepare_ligand.py -i ligands_to_dock/13U.sdf -o pdbqt/13U.pdbqt
! mk_prepare_ligand.py -i ligands_to_dock/13U_modified_methyl.sdf -o pdbqt/13U_modified_methyl.pdbqt
! mk_prepare_ligand.py -i ligands_to_dock/13U_modified_N.sdf -o pdbqt/13U_modified_N.pdbqt
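
With more than a handful of ligands, writing one command per file becomes tedious. The sketch below builds the same `mk_prepare_ligand.py -i ... -o ...` command for every SDF file in a directory. The function name is hypothetical, and the commands are only assembled here, not executed.

```python
from pathlib import Path


def build_meeko_commands(ligand_dir, output_dir):
    """Return one mk_prepare_ligand.py command per SDF file in ligand_dir."""
    ligand_dir = Path(ligand_dir)
    if not ligand_dir.is_dir():
        return []
    commands = []
    for sdf_file in sorted(ligand_dir.glob("*.sdf")):
        pdbqt_file = Path(output_dir) / f"{sdf_file.stem}.pdbqt"
        commands.append(
            ["mk_prepare_ligand.py", "-i", str(sdf_file), "-o", str(pdbqt_file)]
        )
    return commands


# Print the commands for the directories used in this notebook.
for command in build_meeko_commands("ligands_to_dock", "pdbqt"):
    print(" ".join(command))
```

Each command list can then be passed to `subprocess.run(command, check=True)` to perform the conversion.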