# Enzyme Commission Class with Ligands
This notebook is intended to help users find ligands for docking studies. Here are the steps in the process. The searches use the `rcsbsearchapi` package, and the downloads use `requests`.
1. Find PDB structures of a given Enzyme Commission class.
2. Select those structures that contain bound small molecules with molecular weights between 300 and 800.
3. Output a list of those ligands.
4. Save the ligand structures to the `ligands` folder.

Markdown cells are inserted between the code cells to explain each step.

In [110]:
# Import the components of rcsbsearchapi needed for this search
from rcsbsearchapi import rcsb_attributes as attrs

## Making queries
To make a query with `rcsbsearchapi`, you first need to know what you are looking for. I find it helpful to write this out by hand sometimes. Here are the characteristics I am looking for in ligands that bind to proteins of a specific Enzyme Commission (EC) class.
- EC class. I will focus on the EC class for trypsin, 3.4.21.4, but any class should work.
- Ligands. I am looking for ligands that are larger than a single atom (e.g., a potassium ion) or a buffer molecule (e.g., phosphate) and roughly drug-sized, so I will aim for a molecular weight between 300 and 800 (on the order of 20 to 60 heavy atoms).
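As a rough check on what the 300 to 800 window means in terms of molecule size, the sketch below converts molecular weight to an approximate heavy-atom count. The figure of 13.5 Da per heavy atom is an assumption: an average that folds in the hydrogens attached to carbon, nitrogen, and oxygen in typical ligands. Heavy halogens such as bromine make the real count lower for a given weight.

```python
# ASSUMPTION: about 13.5 Da per heavy atom on average, including attached hydrogens.
# This is a rule of thumb for C/N/O-rich ligands, not an exact conversion.
AVG_DA_PER_HEAVY_ATOM = 13.5

def approx_heavy_atoms(mol_weight, da_per_atom=AVG_DA_PER_HEAVY_ATOM):
    """Return an approximate heavy-atom count for a given molecular weight."""
    return round(mol_weight / da_per_atom)

# Lower and upper limits used in the query below
for mw in (300, 800):
    print(mw, "Da ->", approx_heavy_atoms(mw), "heavy atoms (approx.)")
```

With these assumptions, the window corresponds to roughly 22 to 59 heavy atoms.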

In [123]:
# There will be three components to the query, which will be labeled q1, q2 and q3.

ECnumber = "3.4.21.4"     # We will use this variable again later

q1 = attrs.rcsb_polymer_entity.rcsb_ec_lineage.id == ECnumber  # looking for trypsins
q2 = attrs.chem_comp.formula_weight >= 300                       # setting the lower limit for molecular weight
q3 = attrs.chem_comp.formula_weight <= 800                       # setting the upper limit for molecular weight

query = q1 & q2 & q3              # combining the three queries into one

resultL = list(query())           # assign the results of the query to a list variable

print(resultL[0:10])              # list the first 10 results

len(resultL)

['1AQ7', '1AUJ', '1AZ8', '1BJV', '1BTW', '1BTX', '1BTZ', '1C1S', '1C1T', '1C2D']


180

### Finding the ligands

This query returned the PDB entries for trypsins (EC 3.4.21.4) that contain ligands with molecular weights between 300 and 800, and the `print` statement shows the first 10 of them. The last statement

`len(resultL)`

tells us how many PDB entries contain ligands of that size. To get the list of the ligands themselves, we run the same query with a different return type. A PDB entry represents the entire structure (protein plus ligands), whereas chemical components such as ligands are returned with the `mol_definition` return type, so we pass `"mol_definition"` to the query.

In [127]:
molResultL = list(query("mol_definition"))
print("There are",len(molResultL), "ligands for EC Number", ECnumber, "in this list.")
molResultL[0:10]

There are 112 ligands for EC Number 3.4.21.4 in this list.


['0CA', '0CB', '0KV', '0ZG', '0ZW', '0ZX', '0ZY', '10U', '11U', '12U']
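Step 3 of the plan is to output a list of the ligands. One minimal way to keep that list, sketched below, is to write the IDs to a plain-text file, one per line. The helper `save_ligand_list` and the file name are hypothetical illustrations, not part of `rcsbsearchapi`; in the notebook you would pass `molResultL` instead of the three sample IDs.

```python
import os

def save_ligand_list(chem_ids, path):
    """Write one chemical component ID per line to `path`; return the number written."""
    chem_ids = list(chem_ids)                 # accept any iterable, e.g. molResultL
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)    # create the output folder if needed
    with open(path, "w") as out:
        for chem_id in chem_ids:
            out.write(chem_id + "\n")
    return len(chem_ids)

# Illustration with three IDs from the output above; the file name is hypothetical.
n = save_ligand_list(["0CA", "0CB", "0KV"], "ligands/ligand_list_EC_3.4.21.4.txt")
print(n, "ligand IDs written")
```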

### Where can we go to download the ligand files?

Before downloading the files for the ligands bound to trypsin, make sure the two query cells above have been run, so that `molResultL` holds the ligand IDs we want.

Once this is done, the next step is to decide exactly what we want to download. Ligand files in the PDB are available in several formats. A full list with descriptions can be found in the [Small Molecule File table](https://www.rcsb.org/docs/programmatic-access/file-download-services#small-molecule-files) on the [RCSB PDB File Download Services page](https://www.rcsb.org/docs/programmatic-access/file-download-services), which is reproduced below.

![Small molecule file formats that can be downloaded from the RCSB PDB](images/SmallMoleculeFilesTable.png)

From this table, we want the ligand files in mol2 format, which we will later convert to another format called `pdbqt` for docking.

### How do we download the ligand files?

There are several options for downloading files; we will use the Python library `requests`. In the following cells, we will import `requests`, download a single file from the RCSB PDB with `requests.get`, and check that the file was downloaded and saved to the ligands folder correctly. If that works, we'll use a `for` loop to download all of the files in our `molResultL` list to the ligands folder.

In [128]:
import requests

In [129]:
# Download one of the files from our list: 11U_ideal.mol2 (ligand 11U)

res11U_mol2 = requests.get('https://files.rcsb.org/ligands/download/11U_ideal.mol2')

In [130]:
# check to see that the file downloaded properly. A status code of 200 means everything is okay.

res11U_mol2.status_code

200

In [131]:
# To really be sure, let's look at the file line by line. First we write the downloaded content to a file.
import os

os.makedirs("ligands", exist_ok=True)   # create the ligands folder if it does not exist yet

with open("ligands/res11u.mol2", "wb") as file:
    file.write(res11U_mol2.content)

In [132]:
# Now we read the file back and print it with line numbers to make sure it downloaded properly. As an
# alternative, we could open the ligands folder in the Jupyter file browser and click on res11u.mol2.

# Use the same lowercase file name that was written above (paths are case-sensitive on Linux)
with open('ligands/res11u.mol2', 'r') as file1:
    Lines = file1.readlines()

# Print each line with its number, stripping the trailing newline character
for count, line in enumerate(Lines, start=1):
    print("Line{}: {}".format(count, line.strip()))

Line1: @<TRIPOS>MOLECULE
Line2: 11U
Line3: 59    61     0     0     0
Line4: SMALL
Line5: NO_CHARGES
Line6: 
Line7: @<TRIPOS>ATOM
Line8: 1 C1          2.4220    0.4070    0.3360 C.2       1 11U_ideal         0.0000
Line9: 2 O1          2.0060   -0.6420    0.7800 O.2       1 11U_ideal         0.0000
Line10: 3 C2          3.8690    0.5350   -0.0630 C.3       1 11U_ideal         0.0000
Line11: 4 N1          4.5590   -0.7380    0.1810 N.3       1 11U_ideal         0.0000
Line12: 5 C3          5.9760   -0.6510   -0.1970 C.3       1 11U_ideal         0.0000
Line13: 6 C4          6.7790   -0.0680    0.9670 C.3       1 11U_ideal         0.0000
Line14: 7 C5          8.2550    0.0240    0.5730 C.3       1 11U_ideal         0.0000
Line15: 8 C6          8.7810   -1.3740    0.2400 C.3       1 11U_ideal         0.0000
Line16: 9 C7          7.9780   -1.9570   -0.9250 C.3       1 11U_ideal         0.0000
Line17: 10 C8          6.5020   -2.0480   -0.5310 C.3       1 11U_ideal         0.0000
Line18: 11 
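Reading the printout confirms the file by eye. A programmatic alternative is to parse the mol2 header. In the Tripos mol2 format, the line after the molecule name starts with the atom and bond counts (59 and 61 for 11U above), and each `@<TRIPOS>ATOM` record carries a SYBYL atom type in its sixth column. The hypothetical helper `summarize_mol2` below is a minimal sketch that uses those fields to report atom, bond, and heavy-atom counts, and raises an error if the text is not mol2 at all (for example, an error page saved from a failed download). The sample is a hand-made three-atom fragment, not a real ligand.

```python
def summarize_mol2(text):
    """Return (name, n_atoms, n_bonds, n_heavy) for a single-molecule Tripos mol2 string."""
    lines = [line.strip() for line in text.splitlines()]
    if "@<TRIPOS>MOLECULE" not in lines:
        raise ValueError("not a mol2 file: no @<TRIPOS>MOLECULE record")
    i = lines.index("@<TRIPOS>MOLECULE")
    name = lines[i + 1]                       # first line of the record: molecule name
    counts = lines[i + 2].split()             # second line: num_atoms num_bonds ...
    n_atoms, n_bonds = int(counts[0]), int(counts[1])

    # Count ATOM records whose SYBYL type (6th column) is not hydrogen ("H", "H.spc", ...)
    n_heavy = 0
    in_atoms = False
    for line in lines:
        if line.startswith("@<TRIPOS>"):
            in_atoms = (line == "@<TRIPOS>ATOM")   # only count inside the ATOM section
            continue
        fields = line.split()
        if in_atoms and len(fields) >= 6 and fields[5].split(".")[0] != "H":
            n_heavy += 1
    return name, n_atoms, n_bonds, n_heavy

# Hand-made three-atom fragment (C, O, H) in mol2 layout
sample = """@<TRIPOS>MOLECULE
DEMO
3 2 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
1 C1 0.000 0.000 0.000 C.3 1 DEMO 0.0000
2 O1 1.400 0.000 0.000 O.3 1 DEMO 0.0000
3 H1 -0.500 0.900 0.000 H 1 DEMO 0.0000
@<TRIPOS>BOND
1 1 2 1
2 1 3 1
"""
print(summarize_mol2(sample))   # ('DEMO', 3, 2, 2)
```

In the notebook, `summarize_mol2(open('ligands/res11u.mol2').read())` would report 59 atoms and 61 bonds for 11U, matching Line3 of the output above, and the heavy-atom count gives a second check against the size criterion chosen earlier.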

### Downloading all of the ligands using a for loop

Now that we know the process works, we will use a `for` loop to download the entire list of ligands (all 112) in a single cell. Here are the steps we will take:

1. Define a variable, `baseUrl`, holding the URL where the ligand files are located. The URL lacks only the name of the specific ligand file.
2. Set up a `for` loop that goes through each item (as `ChemID`) in the `molResultL` list generated above.
3. Build the file name from the ligand ID (the 3-character `ChemID` followed by `_ideal.mol2`) and assign it to the variable `cFile`.
4. Assign the full download URL to `cFileUrl`. It consists of `baseUrl` (defined in the first line of the cell) followed by the file name stored in `cFile`.
5. Set the local path (`cFileLocal`) where the file will be written in the ligands folder, using `os.path.join`.
6. Download the data from the RCSB PDB with `requests.get`.
7. Write the file using a `with open(...)` block.

If all goes according to plan, this should download all of the ligands on our list to the ligands folder.
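Steps 1 to 5 can be checked before any network traffic. The minimal sketch below uses a hypothetical helper, `plan_downloads`, that builds the URL and local path for each ligand ID so the list can be inspected first (for example, `plan_downloads(molResultL)[:3]`). It assumes the base URL and the `_ideal.mol2` naming used in this notebook. The URL is joined with plain string concatenation, which keeps URL handling separate from file-system paths built with `os.path.join`.

```python
import os

# Base URL and "_ideal.mol2" suffix taken from the download cells in this notebook
BASE_URL = "https://files.rcsb.org/ligands/download/"

def plan_downloads(chem_ids, folder="ligands", base_url=BASE_URL):
    """Return (chem_id, url, local_path) tuples for each ligand ID, without downloading."""
    plan = []
    for chem_id in chem_ids:
        file_name = f"{chem_id}_ideal.mol2"           # step 3: file name from the ligand ID
        url = base_url + file_name                     # step 4: URL built by string concatenation
        local_path = os.path.join(folder, file_name)   # step 5: path inside the ligands folder
        plan.append((chem_id, url, local_path))
    return plan

for chem_id, url, local_path in plan_downloads(["0CA", "11U"]):
    print(chem_id, url, local_path)
```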


In [133]:
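import os        # needed for os.path.join and os.makedirs
import requests

baseUrl = "https://files.rcsb.org/ligands/download/"
os.makedirs("ligands", exist_ok=True)             # make sure the output folder exists

for ChemID in molResultL:
    cFile = f"{ChemID}_ideal.mol2"                # e.g. 11U_ideal.mol2
    cFileUrl = baseUrl + cFile                    # a URL, so join with plain string concatenation
    cFileLocal = os.path.join("ligands", cFile)   # local path inside the ligands folder
    response = requests.get(cFileUrl)
    if response.status_code != 200:               # report and skip failed downloads
        print("Download failed for", ChemID, "with status code", response.status_code)
        continue
    with open(cFileLocal, "wb") as file:
        file.write(response.content)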
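If a request fails, a ligand file may be missing or may contain an error message instead of mol2 data. The minimal sketch below, assuming the `<ID>_ideal.mol2` naming used above, defines a hypothetical helper `find_missing_ligands` that lists the IDs whose file is absent or lacks the `@<TRIPOS>MOLECULE` record. The demo runs in a temporary folder with stand-in files; in the notebook you would call `find_missing_ligands(molResultL)`.

```python
import os
import tempfile

def find_missing_ligands(chem_ids, folder="ligands"):
    """Return IDs whose <ID>_ideal.mol2 file is absent or does not look like mol2."""
    missing = []
    for chem_id in chem_ids:
        path = os.path.join(folder, f"{chem_id}_ideal.mol2")
        if not os.path.isfile(path):
            missing.append(chem_id)          # file was never written
            continue
        with open(path, "r", errors="replace") as f:
            if "@<TRIPOS>MOLECULE" not in f.read():
                missing.append(chem_id)      # file exists but holds no mol2 molecule record
    return missing

# Demo in a temporary folder: one mol2-like file, one stand-in error page, one absent file
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "0CA_ideal.mol2"), "w") as f:
    f.write("@<TRIPOS>MOLECULE\n0CA\n")
with open(os.path.join(demo, "0KV_ideal.mol2"), "w") as f:
    f.write("Not Found\n")
print(find_missing_ligands(["0CA", "0CB", "0KV"], demo))   # ['0CB', '0KV']
```

Any IDs it returns can simply be run through the download loop again.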

### Selected ligands

For our next notebook, we are going to select and modify one of the ligands from the list. Any of them could be used, but we will be using [BRV (5-amino-2,4,6-tribromobenzene-1,3-dicarboxylic acid)](https://www.rcsb.org/ligand/BRV).