In [15]:
import sys
print(sys.executable)

C:\Users\carlo\anaconda3\envs\vina-env\python.exe


In [19]:
from rcsbsearchapi.search import TextQuery
from rcsbsearchapi import rcsb_attributes as attrs

    which contains all the same functionalities as rcsbsearchapi and more! New features will only be added to the new rcsb-api package.
    For more details, see https://github.com/rcsb/py-rcsbsearchapi/issues/51.
  from rcsbsearchapi.search import TextQuery


In [21]:
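ECnumber = "7.2.2.13"  # EC number for the Na+/K+-exchanging ATPase

q1 = attrs.rcsb_polymer_entity.rcsb_ec_lineage.id == ECnumber  # entities with this EC number
q2 = TextQuery('ADP')  # full-text search for ADP
query = q1 & q2  # combine both queries with AND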

results = list(query())
print(results)

['8IJL', '3WGU', '3WGV', '4HQJ', '8D3U', '1MO8', '1MO7', '7E21']


In [22]:
pdb_id = results[3].lower() ## Get the PDB ID from the list and convert it to lowercase
print(pdb_id)

4hqj
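Picking `results[3]` depends on the order of the search results, and that order may change as new entries are deposited. Below is a minimal sketch that selects the entry by its ID instead. `choose_pdb_id` is a hypothetical helper, not part of `rcsbsearchapi`.

```python
def choose_pdb_id(results, wanted):
    """Return `wanted` in lowercase if it is among the search results.

    Hypothetical helper: selects an entry by ID instead of by list position.
    """
    matches = [entry for entry in results if entry.upper() == wanted.upper()]
    if not matches:
        raise ValueError(f"{wanted} not found in search results: {results}")
    return matches[0].lower()


# Search results copied from the output above
results = ['8IJL', '3WGU', '3WGV', '4HQJ', '8D3U', '1MO8', '1MO7', '7E21']
pdb_id = choose_pdb_id(results, "4HQJ")
print(pdb_id)  # 4hqj
```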


In [11]:
import os  # for making directories
import requests

# Make a directory for PDB files
protein_directory = "protein_structures"
os.makedirs(protein_directory, exist_ok=True)

# Download the PDB file for the selected entry from RCSB
pdb_request = requests.get(f"https://files.rcsb.org/download/{pdb_id}.pdb")
pdb_request.status_code

200

In [25]:
with open(f"{protein_directory}/{pdb_id}.pdb", "w") as f:
    # Write the downloaded PDB text to a file named after the PDB ID
    f.write(pdb_request.text)
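The two cells above write whatever `requests` returned, so a failed request (for example, a mistyped ID) would save an error page as if it were a structure. A minimal sketch that combines download and save, assuming the standard library's `urllib.request` instead of `requests`: `urlopen` raises `HTTPError` for an unsuccessful response, so nothing is written in that case. The helper names are hypothetical.

```python
import os
import urllib.request

RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download"


def pdb_download_url(pdb_id):
    """Build the RCSB download URL for a PDB-format file (hypothetical helper)."""
    return f"{RCSB_DOWNLOAD_URL}/{pdb_id.lower()}.pdb"


def save_pdb_text(text, pdb_id, directory):
    """Write PDB text to <directory>/<pdb_id>.pdb and return the path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{pdb_id.lower()}.pdb")
    with open(path, "w") as f:
        f.write(text)
    return path


def download_pdb(pdb_id, directory="protein_structures"):
    """Download a PDB file; urlopen raises HTTPError (e.g. for a 404) before anything is saved."""
    with urllib.request.urlopen(pdb_download_url(pdb_id)) as response:
        text = response.read().decode("utf-8")
    return save_pdb_text(text, pdb_id, directory)
```

For example, `download_pdb(pdb_id)` would save `protein_structures/4hqj.pdb` for the entry selected above.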

## Visualizing the protein structure

In [27]:
import MDAnalysis as mda

# Load into MDA universe
u = mda.Universe(f'{protein_directory}/{pdb_id}.pdb')
u

<Universe with 20571 atoms>

In [32]:
import nglview as nv

# Create and show an NGLView widget from an MDAnalysis universe
view = nv.show_mdanalysis(u)
view



NGLWidget()

In [38]:
# Select protein, ligand, and water atoms
protein = u.select_atoms("protein")
ligandA = u.select_atoms("resname ADP")
water = u.select_atoms("resname HOH")

water

<AtomGroup with 0 atoms>

In [40]:
print("ligandA:", len(ligandA))
print("water:", len(water))

ligandA: 54
water: 0


After selecting parts of our structure, we can add them individually to an NGLView view.
In the cell below, we show the protein's surface colored by hydrophobicity and add the ligand in a ball-and-stick representation.
This structure has no water records (the `water` selection above is empty), so there are no crystallographic waters to display.

In [44]:
view = nv.show_mdanalysis(protein)
view.clear_representations()
view.add_representation('surface', color_scheme="hydrophobicity")
lig_viewA = view.add_component(ligandA)
lig_viewA.add_representation("ball+stick")

view

NGLWidget()

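If you rotate this structure, you will be able to see the bound ADP.
You may notice that the ligand seems to appear twice: the `ligandA` selection contains 54 atoms, twice the 27 heavy atoms of a single ADP molecule.
A ligand can appear twice when the crystal structure contains more than one copy of the protein-ligand complex, or when the file records alternate locations for the ligand atoms.
A PDB file with alternate locations contains entries like the following in the ligand section (this excerpt shows a ligand named `13U`):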

```
HETATM 1673  C14A13U A 501      18.144  -9.216  12.088  0.61 24.22           C  
ANISOU 1673  C14A13U A 501     1755   4793   2654   1752    148   1233       C  
HETATM 1674  C14B13U A 501      18.147  -8.840  11.672  0.39 24.46           C  
ANISOU 1674  C14B13U A 501     2583   4283   2430   1765    353   1279       C  
HETATM 1675  O32A13U A 501      18.209  -8.355  11.186  0.61 24.38           O  
ANISOU 1675  O32A13U A 501     2354   5394   1514   2217    238    919       O
```

This PDB structure provides [**alternate locations**](https://proteopedia.org/wiki/index.php/Alternate_locations) for each ligand atom. 
These occur when the experimental data supports multiple positions for the same atom.
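In the excerpt above, atom `C14` of residue `13U` appears twice. The letter right after the atom name (`A` in `C14A13U`, `B` in `C14B13U`) marks the alternate location, and the occupancy column (0.61 and 0.39) gives the fraction of the crystal in which the atom occupies each position.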
Alternate locations can also occur in the protein with some residues.
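The alternate-location letter and the occupancy sit in fixed columns of each PDB atom record (altLoc in column 17, occupancy in columns 55-60). The sketch below uses only the standard library to read these fields from the `HETATM` lines in the excerpt and keep the highest-occupancy position of each atom. It only illustrates the file format. The helper names are hypothetical, and this is not how MDAnalysis handles alternate locations internally.

```python
# HETATM records copied from the excerpt above (ANISOU lines omitted)
PDB_LINES = [
    "HETATM 1673  C14A13U A 501      18.144  -9.216  12.088  0.61 24.22           C  ",
    "HETATM 1674  C14B13U A 501      18.147  -8.840  11.672  0.39 24.46           C  ",
    "HETATM 1675  O32A13U A 501      18.209  -8.355  11.186  0.61 24.38           O  ",
]


def parse_atom_record(line):
    """Extract a few fixed-column fields from an ATOM/HETATM record."""
    return {
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "altloc": line[16].strip(),       # column 17: alternate location ('' if blank)
        "resname": line[17:20].strip(),   # columns 18-20: residue name
        "chain": line[21],                # column 22: chain ID
        "resid": int(line[22:26]),        # columns 23-26: residue number
        "occupancy": float(line[54:60]),  # columns 55-60: occupancy
    }


def keep_highest_occupancy(records):
    """Keep one record per atom: the alternate location with the highest occupancy."""
    best = {}
    for rec in records:
        key = (rec["chain"], rec["resid"], rec["resname"], rec["name"])
        if key not in best or rec["occupancy"] > best[key]["occupancy"]:
            best[key] = rec
    return list(best.values())


records = [parse_atom_record(line) for line in PDB_LINES]
for rec in keep_highest_occupancy(records):
    print(rec["name"], rec["altloc"], rec["occupancy"])
# C14 A 0.61
# O32 A 0.61
```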

<div class="alert alert-block alert-success">
<strong>Selecting alternate locations using MDAnalysis</strong>  
    
By checking the [documentation page](https://docs.mdanalysis.org/stable/documentation_pages/selections.html) for MDAnalysis selections, we can see that MDAnalysis is prepared for this scenario. We will want to use the `altloc` keyword. This keyword is described as:

> altLoc alternative-location

> a selection for atoms where alternative locations are available, which is often the case with high-resolution crystal structures e.g. resid 4 and resname ALA and altLoc B selects only the atoms of ALA-4 that have an altLoc B record.

If you wanted to use MDAnalysis to select for a particular ligand location, you could use:

```python
ligand_A = u.select_atoms("resname 13U and altLoc A")
ligand_B = u.select_atoms("resname 13U and altLoc B")
```
</div>

To perform a docking calculation, we have to isolate the protein.
This starting structure contains extra molecules, such as ligands and, in many structures, water.
From the visualization, you will also notice that the structure does not include hydrogen atoms.
If you examine the PDB file, you will also see that some atoms are missing, and some protein residues may have alternate locations, just like the ligand.

For docking, we will want to remove all of these extra molecules and only keep the protein.
We will also want to remove any alternate locations of residues.
We will use MDAnalysis to remove these extra molecules and save our starting protein structure as a new file.
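To make concrete what this cleanup involves, here is a hedged plain-text sketch; it is not what MDAnalysis does internally. It keeps only `ATOM` records, which drops `HETATM` ligands and waters, and keeps atoms whose alternate-location field is blank or `A`. Keeping `A` is an assumption; choosing the highest-occupancy location is another common choice. Note that modified residues such as selenomethionine (`MSE`) are also stored as `HETATM` and would be dropped too. `strip_to_protein` is a hypothetical helper, and the example records are made up.

```python
def strip_to_protein(pdb_text, keep_altloc="A"):
    """Keep ATOM records whose alternate location is blank or `keep_altloc`.

    HETATM records (ligands, waters, and modified residues such as MSE)
    and all other record types except TER and END are dropped.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ATOM":
            altloc = line[16] if len(line) > 16 else " "
            if altloc in (" ", keep_altloc):
                # Blank out the altLoc column so a single location remains
                kept.append(line[:16] + " " + line[17:])
        elif record in ("TER", "END"):
            kept.append(line)
    return "\n".join(kept) + "\n"


# Made-up records: one plain atom, one atom with altLocs A/B, and one water
example = "\n".join([
    "ATOM      1  N   SER A   1      10.000  20.000  30.000  1.00 10.00",
    "ATOM      2  OG ASER A   1      11.000  21.000  31.000  0.60 10.00",
    "ATOM      3  OG BSER A   1      11.500  21.500  31.500  0.40 10.00",
    "HETATM    4  O   HOH A 101       5.000   5.000   5.000  1.00 10.00",
    "END",
])
print(strip_to_protein(example))
```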

In [46]:
# Write protein to new PDB file
protein.write(f"{protein_directory}/protein_{pdb_id}.pdb")



<div class="alert alert-block alert-danger">
<strong>Protein Charge</strong>  

After saving the protein in the cell above, you may see a warning about information for formal charges not being set in the protein. 
This warning appears because MDAnalysis did not find specific formal charge data in the PDB file and used a default value instead. 
This is not a concern for us because we will adjust the protonation states of different residues using PDB2PQR in the next steps. 
</div>


## Fixing the protein structure

Now that we've isolated our protein, we need to add hydrogens and repair any missing atoms.

To fix the protein, we will use PDB2PQR, a specialized program for preparing biomolecules such as proteins.
PDB2PQR checks the protein for missing atoms and multiple occupancies (alternate locations), chooses one position for each atom, rebuilds missing atoms, and adds hydrogens.

<div class="alert alert-block alert-success">
<strong>More complicated fixes: PDBFixer</strong>  

Another popular tool for fixing PDB files is [PDBFixer](https://github.com/openmm/pdbfixer). PDBFixer is an open-source tool developed by the OpenMM team to fix common problems in Protein Data Bank (PDB) files before they are used in molecular simulations. It can remove unwanted molecules such as water, add missing heavy atoms to incomplete residues, and add hydrogen atoms where needed.

PDBFixer can be especially useful when there are missing loops or residues. In this workshop, our protein is not missing any residues, but many proteins from the PDB might require more adjustment.

To see an example of preparing proteins with PDBFixer, see this [recent YouTube video](https://www.youtube.com/watch?v=pwfKE6wPaMg) posted by the Open Forcefield Initiative. In this video, the presenter first uses PDBFixer, then PDB2PQR to adjust protonation.

</div>

We will use the command-line interface of PDB2PQR. This means that you would usually type the command below into your terminal.
You can run command-line commands in a Jupyter notebook by putting a `!` in front of the command.
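The `!` prefix is Jupyter shorthand. If you prefer to stay in Python, a minimal sketch is to build the same argument list and pass it to `subprocess.run`. Using `sys.executable` (printed at the top of this notebook) runs the `pdb2pqr` module from the active environment. The flags are copied from the command below; `pdb2pqr_args` and `run_pdb2pqr` are hypothetical helper names.

```python
import subprocess
import sys


def pdb2pqr_args(in_pdb, out_pqr, out_pdb, ph=7.4):
    """Build the PDB2PQR command used in this notebook as an argument list."""
    return [
        sys.executable, "-m", "pdb2pqr",
        f"--pdb-output={out_pdb}",
        f"--pH={ph}",
        in_pdb,
        out_pqr,
        "--whitespace",
    ]


def run_pdb2pqr(in_pdb, out_pqr, out_pdb, ph=7.4):
    """Run PDB2PQR; check=True raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(pdb2pqr_args(in_pdb, out_pqr, out_pdb, ph), check=True)


# Usage matching the cell below (not executed here):
# run_pdb2pqr("protein_structures/protein_4hqj.pdb",
#             "protein_structures/protein_4hqj.pqr",
#             "protein_structures/protein_4hqj.pdb")
```

Passing a list rather than a single string avoids shell quoting issues with paths that contain spaces.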

In [1]:
!python -m pdb2pqr --pdb-output=protein_structures/protein_4hqj.pdb --pH=7.4 protein_structures/protein_4hqj.pdb protein_structures/protein_4hqj.pqr --whitespace

INFO:PDB2PQR v3.6.1: biomolecular structure conversion software.
INFO:Please cite:  Jurrus E, et al.  Improvements to the APBS biomolecular solvation software suite.  Protein Sci 27 112-128 (2018).
INFO:Please cite:  Dolinsky TJ, et al.  PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res 35 W522-W525 (2007).
INFO:Checking and transforming input arguments.
INFO:Loading topology files.
INFO:Loading molecule: protein_structures/protein_4hqj.pdb
ERROR:Error parsing line: invalid literal for int() with base 10: ''
ERROR:<REMARK     2>
ERROR:Truncating remaining errors for record type:REMARK

ERROR:['REMARK']
INFO:Setting up molecule.
INFO:Created biomolecule object with 2598 residues and 20359 atoms.
INFO:Setting termini states for biomolecule chains.
INFO:Loading forcefield.
INFO:Loading hydrogen topology definitions.
INFO:Attempting to repair 2 missing atoms in biomolecule.
INFO:Added atom OXT to residue SER E 47 

## Saving a protein PDBQT File

The PDB2PQR program outputs two files, a PDB file and a PQR file. The PDB file is similar to PDB files we have worked with before, except that it contains hydrogens.
The PQR file is another molecular file format that is similar to a PDB, but contains information about atomic radii and atomic charges.
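Because the PDB2PQR command above used `--whitespace`, the fields of each PQR atom record are separated by whitespace. The last two fields are the atomic charge (in units of e) and the radius (in Å). The sketch below reads them and sums the charges. The record is made up for illustration, and `parse_pqr_line` and `total_charge` are hypothetical helpers.

```python
def parse_pqr_line(line):
    """Return (atom_name, charge, radius) from a whitespace-separated PQR atom record."""
    fields = line.split()
    if not fields or fields[0] not in ("ATOM", "HETATM"):
        raise ValueError(f"not an atom record: {line!r}")
    atom_name = fields[2]
    charge, radius = float(fields[-2]), float(fields[-1])
    return atom_name, charge, radius


def total_charge(pqr_text):
    """Sum the charges of all ATOM/HETATM records in a PQR file's text."""
    total = 0.0
    for line in pqr_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("ATOM", "HETATM"):
            total += parse_pqr_line(line)[1]
    return total


# Made-up PQR record (values are illustrative, not taken from 4hqj)
line = "ATOM      1  N   SER A   1      10.000  20.000  30.000 -0.3000 1.8240"
print(parse_pqr_line(line))  # ('N', -0.3, 1.824)
```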

For use with AutoDock Vina, we need our protein file to be in the "PDBQT" format. PDBQT is a specialized file format used by AutoDock Vina and other AutoDock tools. Like the PQR format, the PDBQT format can also contain partial charges. We will load our PQR file and use MDAnalysis to write a PDBQT file.

<div class="alert alert-block alert-success">
<strong>What information do we need in the PDBQT file?</strong>

We'll be using AutoDock Vina with the "vina" scoring function (this will be explained in more detail in the next notebook). The vina scoring function doesn't use charges to dock, so we could have also used the PDB file without charges to convert to a PDBQT file. However, some scoring functions do use partial charges.

</div>

In [25]:
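# Make a directory for PDBQT files
import os
import MDAnalysis as mda

# Redefine names from earlier cells so this cell also runs after a kernel restart
protein_directory = "protein_structures"
pdb_id = "4hqj"

pdbqt_directory = "pdbqt"  # directory to write PDBQT files to
os.makedirs(pdbqt_directory, exist_ok=True)

# Load the PQR file written by PDB2PQR and write it out as PDBQT
u = mda.Universe(f"{protein_directory}/protein_{pdb_id}.pqr")
u.atoms.write(f"{pdbqt_directory}/{pdb_id}.pdbqt")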



The PDBQT file generated by MDAnalysis includes two lines at the start of the structure that AutoDock Vina doesn't accept. 
These lines start with "TITLE" and "CRYST1". To resolve this, the following cell replaces these lines with "REMARK", which is acceptable to AutoDock Vina.

You could also use different software to write your PDBQT file.
[OpenBabel](https://openbabel.org/index.html) is a popular choice. However, we are using MDAnalysis here for consistency with the rest of the workshop and to limit the number of libraries we are using.

In [27]:
# Read in the just-written PDBQT file, replace text, and write back
with open(f"{pdbqt_directory}/{pdb_id}.pdbqt", 'r') as file:
    file_content = file.read()

# Replace 'TITLE' and 'CRYST1' with 'REMARK'
file_content = file_content.replace('TITLE', 'REMARK').replace('CRYST1', 'REMARK')

# Write the modified content back to the file
with open(f"{pdbqt_directory}/{pdb_id}.pdbqt", 'w') as file:
    file.write(file_content)
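The `str.replace` calls in the cell above change every occurrence of `TITLE` or `CRYST1` anywhere in the file. A slightly more careful sketch only rewrites lines that *start* with those record names. `neutralize_header_records` is a hypothetical helper, and the example text is made up.

```python
def neutralize_header_records(pdbqt_text, records=("TITLE", "CRYST1")):
    """Turn lines that start with the given record names into REMARK lines."""
    fixed = []
    for line in pdbqt_text.splitlines(keepends=True):
        for record in records:
            if line.startswith(record):
                line = "REMARK" + line[len(record):]
                break
        fixed.append(line)
    return "".join(fixed)


example = (
    "TITLE     example title\n"
    "CRYST1    1.000    1.000    1.000\n"
    "ATOM      1  N   SER A   1\n"
)
print(neutralize_header_records(example))
```

In the notebook, you would read the PDBQT file, pass its contents through this function, and write the result back, as the cell above does.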

## Ligand Preparation

When preparing small molecule PDBQT files, you could have also chosen to use MDAnalysis or other tools.
However, we are going to use a special program for small molecules and docking called [meeko](https://github.com/forlilab/Meeko).
We choose to use meeko for our ligands because it will allow us to more easily visualize our results later.
Note that when using meeko, ligands should have hydrogens added already.

We are using the command line for meeko, similar to PDB2PQR. 
You could also choose to use the Python API for this, but the command line is simpler for common tasks like converting an SDF to a PDBQT.

In the cell below, we run a command that converts the ligand we prepared in the `molecule_manipulation` notebook from an SDF file to a PDBQT file.

In [31]:
# Use meeko to prepare small molecules - using meeko helps us visualize them later.
! mk_prepare_ligand.py -i ligands/resADP.sdf -o pdbqt/resADP.pdbqt
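If you have several ligands in SDF format, one option is to loop over them and call the same `mk_prepare_ligand.py -i ... -o ...` command once per file. The directory names below follow the cell above. `prepare_all_ligands` is a hypothetical helper, and with `dry_run=True` it only returns the commands without running them. Running a `.py` script directly through `subprocess` without a shell may not work on every platform (for example, on Windows); check this in your own environment.

```python
import subprocess
from pathlib import Path


def prepare_all_ligands(sdf_dir="ligands", out_dir="pdbqt", dry_run=True):
    """Build (and optionally run) one mk_prepare_ligand.py command per SDF file."""
    out_path = Path(out_dir)
    commands = []
    for sdf in sorted(Path(sdf_dir).glob("*.sdf")):
        pdbqt = out_path / (sdf.stem + ".pdbqt")
        commands.append(["mk_prepare_ligand.py", "-i", str(sdf), "-o", str(pdbqt)])
    if not dry_run:
        out_path.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```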

In [None]:
## python prepare_ligand4.py -l resBUF.pdb -o resBUF.pdbqt -v  (run from Lib/site-packages)
