# Enzyme Commission Class with Ligands

[BRENDA Home Page](https://www.brenda-enzymes.org/index.php).

Look for ligands that bind to quinol-cytochrome-c reductase, using the `rcsbsearchapi` package.

1. Find PDB structures of a given Enzyme Commission class.
2. Select those structures that contain bound small molecules with molecular weights between 300 and 800.
3. Output a list of those ligands.
4. Save the ligand structures to the "ligands_for_EC_class_#" folder.
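
Step 4 names the output folder after the EC class. A minimal sketch of how such a folder could be created is shown here. The helper name `ligand_folder_for` and its `root` parameter are hypothetical, introduced only for illustration; the cells in this notebook write to a folder simply called `ligands`.

```python
import os
import tempfile

def ligand_folder_for(ec_number, root="."):
    """Create (if needed) and return a folder named after one EC class.

    Hypothetical helper: not part of rcsbsearchapi or of this notebook.
    """
    folder = os.path.join(root, f"ligands_for_EC_class_{ec_number}")
    os.makedirs(folder, exist_ok=True)   # no error if the folder already exists
    return folder

# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    folder = ligand_folder_for("7.1.1.8", root=tmp)
    print(os.path.basename(folder))
```

Passing `exist_ok=True` lets the cell be rerun without failing when the folder is already there.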

### Libraries

| Library | Abbreviation | Purpose |
|:--------|:------------:|:--------|
| rcsbsearchapi | N/A | functions for searching the Protein Data Bank based on the mmCIF dictionary |
| os | N/A | operating system functions for handling file paths and directories |
| requests | N/A | access APIs for databases |
| rdkit | rdkit | an open-source cheminformatics toolkit |
| rdkit.Chem | Chem | a subset of rdkit that supports file string to structure conversions |
| rdkit.Chem.AllChem | AllChem | a subset of rdkit.Chem that supports energy optimization |
| rdkit.Chem.Draw | Draw | a subset of rdkit that supports chemical drawing in Python |
| vina | vina | AutoDock Vina software for Python and Jupyter notebooks |



In [None]:
# Import the components of rcsbsearchapi needed for this search
from rcsbsearchapi import rcsb_attributes as attrs

## Making queries
Use `rcsbsearchapi`.
- EC Class. Quinol-cytochrome-c reductase is EC 7.1.1.8.
- Ligands. Look for ligands that are larger than a single atom (e.g., a potassium ion) or a buffer molecule (e.g., phosphate). To target ligands of roughly 10-30 heavy atoms, aim for a molecular weight between 300 and 800.
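
To make the size criteria concrete, here is a rough sketch that estimates the formula weight and heavy-atom count from a molecular formula string. It assumes a space-separated formula such as `C22 H17 N3 O5` (the formula of azoxystrobin, used purely as an illustrative input) and a small hand-written table of approximate atomic masses. The function names are hypothetical; the search itself relies on the `formula_weight` value stored by the RCSB PDB, not on this calculation.

```python
import re

# Approximate standard atomic masses for a few common elements (assumed table).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45}

def parse_formula(formula):
    """Turn a string like 'C22 H17 N3 O5' into a dict of element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts

def formula_weight(counts):
    """Sum the atomic masses; raises KeyError for elements not in the table."""
    return sum(ATOMIC_MASS[element] * n for element, n in counts.items())

def heavy_atom_count(counts):
    """Count every atom that is not hydrogen."""
    return sum(n for element, n in counts.items() if element != "H")

counts = parse_formula("C22 H17 N3 O5")
weight = formula_weight(counts)
print(round(weight, 2), heavy_atom_count(counts), 300 <= weight <= 800)
```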

In [None]:
# There will be three components to the query, which will be labeled q1, q2 and q3.

ECnumber = "7.1.1.8"     # We will use this variable again later

q1 = attrs.rcsb_polymer_entity.rcsb_ec_lineage.id == ECnumber    # looking for quinol-cytochrome-c reductase structures (EC 7.1.1.8)
q2 = attrs.chem_comp.formula_weight >= 300                       # setting the lower limit for molecular weight
q3 = attrs.chem_comp.formula_weight <= 800                       # setting the upper limit for molecular weight

query = q1 & q2 & q3              # combining the three queries into one

resultL = list(query())           # assign the results of the query to a list variable

print(resultL[0:10])              # list the first 10 results

print("There are", len(resultL), "quinol-cytochrome-c reductase structures that contain ligands in the RCSB PDB.")

### Finding the ligands

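This query returned the list of PDB entries for quinol-cytochrome-c reductases (EC 7.1.1.8) that contain ligands with molecular weights between 300 and 800.
The first 10 of these results were printed using `print(resultL[0:10])`. Running the same query with the `mol_definition` return type lists the ligands themselves rather than the PDB entries.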

In [None]:
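# Rerun the same query, this time asking for ligand (molecular definition) IDs instead of PDB entries
molResultL = list(query("mol_definition"))
print("There are", len(molResultL), "ligands for EC Number", ECnumber, "in this list. Here is a list of the first 10 ligands.")
molResultL[0:10]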

### Download the ligand files

Use a Python library called `requests`. Import the library, download a single file from the RCSB PDB using the `requests.get` function, and check that the file downloaded properly to the ligands folder. If that is successful, use a `for` loop to download all of the files in the `molResultL` list to the ligands folder.

In [None]:
import requests  # to enable us to pull files from the PDB
import os        # to enable us to create a directory to store the files

In [None]:
# Download one of the files from our list: AZO_ideal.sdf

resAZO_sdf = requests.get('https://files.rcsb.org/ligands/download/AZO_ideal.sdf')

In [None]:
# check to see that the file downloaded properly. A status code of 200 means everything is okay.

resAZO_sdf.status_code

In [None]:
# To really be sure, let's look at the file one line at a time. First we write the downloaded content to a file.

# make a ligands folder for our results
os.makedirs("ligands", exist_ok=True)

with open("ligands/resAZO.sdf", "w+") as file:
    file.write(resAZO_sdf.text)

In [None]:
# Now we use these commands to read the file and make sure it downloaded properly. As an alternative, we
# could go to the ligands folder in our Jupyter desktop and click on resAZO.sdf to make sure it looks correct.

# The with statement closes the file automatically once it has been read.
with open('ligands/resAZO.sdf', 'r') as file1:
    file_text = file1.read() # This reads in the file as a string.

print(file_text)
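
Reading the file by eye works for one ligand. For many files, a programmatic check may be more practical. The sketch below assumes the downloaded SDF uses the V2000 molfile layout, in which the fourth line holds the atom and bond counts, the structure block ends with `M  END`, and each record ends with `$$$$`. The function `summarize_sdf` and the one-atom sample record are hypothetical and only illustrate the idea.

```python
def summarize_sdf(text):
    """Return basic facts about the first record of a V2000 SDF string."""
    lines = text.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("not a V2000 molfile record")
    n_atoms = int(lines[3][0:3])      # columns 1-3 of the counts line
    n_bonds = int(lines[3][3:6])      # columns 4-6 of the counts line
    atom_lines = lines[4:4 + n_atoms]
    # In an atom line the element symbol follows the x, y, z coordinates.
    elements = [line.split()[3] for line in atom_lines]
    return {
        "name": lines[0].strip(),
        "atoms": n_atoms,
        "bonds": n_bonds,
        "elements": elements,
        "complete": "M  END" in lines and "$$$$" in lines,
    }

# A hand-written one-atom record used only to exercise the function.
sample = "\n".join([
    "methane",
    "  hypothetical-sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "$$$$",
])
summary = summarize_sdf(sample)
print(summary)
```

Calling `summarize_sdf(file_text)` on the downloaded text would give the same kind of summary. A truncated download, or an error page saved in place of the file, would raise `ValueError` or report `complete` as `False`.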

### Downloading all of the ligands using a for loop

Use a `for` loop to download the entire list of ligands (all 112) in a single cell.

1. Define a variable, baseUrl, for the URL where the ligand files are located. The URL only lacks the specific name of the ligand file.
2. Set up a `for` loop to go through each of the items (as ChemID) in the molResultL list that was generated above.
3. Assign the filename based on a variable (the 3-letter name of the ligand as ChemID followed by \_ideal.sdf) to the variable cFile.
4. Assign the full URL (as cFileUrl) to use to download the data from the RCSB PDB API. Notice that the URL will consist of the baseUrl (defined in the first line of the cell) followed by the name of the file we just defined, which is now assigned to the variable, cFile.
5. Set the local path (as cFileLocal) where the file will be written in the ligands folder.
6. Use the API call via `requests.get` to download the data from the RCSB PDB.
7. Write the file using the `with open` function.

In [None]:
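baseUrl = "https://files.rcsb.org/ligands/download/"

os.makedirs("ligands", exist_ok=True)              # make sure the target folder exists

for ChemID in molResultL:
    cFile = f"{ChemID}_ideal.sdf"                  # file name for this ligand
    cFileUrl = baseUrl + cFile                     # full download URL
    cFileLocal = os.path.join("ligands", cFile)    # local path for the file
    response = requests.get(cFileUrl)
    if response.status_code != 200:                # skip ligands that fail to download
        print("Could not download", cFile, "- status code", response.status_code)
        continue
    with open(cFileLocal, "w+") as file:
        file.write(response.text)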
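
After a bulk download it can be useful to confirm that every ligand actually reached the folder. The following sketch lists IDs whose file is absent or empty. The function name `missing_ligand_files` is hypothetical, and the IDs `AAA` and `BBB` in the demonstration are placeholders rather than results from the query.

```python
import os
import tempfile

def missing_ligand_files(chem_ids, folder, suffix="_ideal.sdf"):
    """Return the IDs whose file is absent from the folder or has zero size."""
    missing = []
    for chem_id in chem_ids:
        path = os.path.join(folder, f"{chem_id}{suffix}")
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(chem_id)
    return missing

# Demonstrate with a temporary folder holding one good file and one empty file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "AZO_ideal.sdf"), "w") as handle:
        handle.write("placeholder content\n")
    open(os.path.join(tmp, "AAA_ideal.sdf"), "w").close()   # empty file
    still_missing = missing_ligand_files(["AZO", "AAA", "BBB"], tmp)
    print(still_missing)
```

In the notebook, `missing_ligand_files(molResultL, "ligands")` would return the ligands worth retrying.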

### Selected ligands

We will use one of these ligands, [AZO: METHYL (2Z)-2-(2-{[6-(2-CYANOPHENOXY)PYRIMIDIN-4-YL]OXY}PHENYL)-3-METHOXYACRYLATE](https://www.rcsb.org/ligand/AZO).

<div class="alert alert-block alert-warning">
<h3>Exercise</h3>

To go a bit deeper with these tools, use the [BRENDA Enzyme Database](https://www.brenda-enzymes.org/) to find the EC# for alcohol dehydrogenase (or look for an enzyme that interests you). How many structures have ligands with molecular weights between 400 and 700? How many unique ligands are bound to these structures?

Note: You can enter only the upper levels of an EC Class to identify more ligands. This exercise can be repeated with any EC#. If you have time, try a broader search where you use only 2 or 3 levels, e.g., 3.4 or 3.4.21, and see what you find.
</div>
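
One way to see why a partial EC number matches more structures: assuming the `rcsb_ec_lineage.id` attribute stores every level of an entity's EC classification, a full EC number expands into a list of progressively broader classes, and a query on any of them would match. The helper `ec_lineage` below is hypothetical and only illustrates that expansion.

```python
def ec_lineage(ec_number):
    """Expand an EC number into its hierarchy, broadest class first."""
    parts = ec_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

lineage = ec_lineage("7.1.1.8")
print(lineage)
```

Under that assumption, setting `ECnumber = "7.1.1"` would match every enzyme whose lineage includes `7.1.1`, not only EC 7.1.1.8.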

In [None]:
### Solution

ECnumber = "1.1.1.1"     # We will use this variable again later

q1 = attrs.rcsb_polymer_entity.rcsb_ec_lineage.id == ECnumber  # looking for alcohol dehydrogenases
q2 = attrs.chem_comp.formula_weight >= 400                      # setting the lower limit for molecular weight
q3 = attrs.chem_comp.formula_weight <= 700                     # setting the upper limit for molecular weight

query = q1 & q2 & q3              # combining the three queries into one

ResultL = list(query("entry"))
molResultL = list(query("mol_definition"))
print("There are", len(ResultL), "structures from EC Number", ECnumber, "that have bound ligands with molecular weights between 400 and 700.")
print("There are", len(molResultL), "unique ligands for structures with EC Number", ECnumber, "in this list. Here is a list of the", len(molResultL), "ligands.")
molResultL