# PDBe Aggregated API - A step-by-step example

This Jupyter Notebook provides step-by-step instructions for querying the PDBe Aggregated API and retrieving information on predicted binding sites, macromolecular interaction interfaces and observed ligands for the protein Thrombin, using the Python 3 programming language.

# Step 1 - Import necessary dependencies

In order to query the API, import the `requests` library.

In [1]:
import requests

# Step 2 - Choose a UniProt accession and the necessary API endpoints

Every API endpoint is keyed on an identifier that the user must provide. For this example, we will use API endpoints that are keyed on a UniProt accession.

The UniProt accession of Thrombin is "P00734".

For this example, we are interested in the functional annotations of Thrombin, which are provided to PDBe-KB [1] by consortium partner resources such as P2Rank [2] and canSAR [3]. We are also interested in all the macromolecular interaction interface residues of Thrombin, as calculated by the PDBe PISA service [4], and all the observed ligand binding sites, as calculated by Arpeggio [5].

In order to retrieve this (and any other) information, users should study the documentation page of the PDBe Aggregated API: [https://pdbe.org/graph-api](https://pdbe.org/graph-api)

We set the variables below for the UniProt accession of Thrombin, and the API endpoint URLs we will use.

In [2]:
ACCESSION = "P00734"
ANNOTATIONS_URL = f"https://www.ebi.ac.uk/pdbe/graph-api/uniprot/annotations/{ACCESSION}"
INTERACTIONS_URL = f"https://www.ebi.ac.uk/pdbe/graph-api/uniprot/interface_residues/{ACCESSION}"
LIGANDS_URL = f"https://www.ebi.ac.uk/pdbe/graph-api/uniprot/ligand_sites/{ACCESSION}"
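
The three URLs share the same prefix and differ only in the endpoint name, so they could also be assembled by a small helper. The following is a minimal sketch of that idea; `BASE_URL` and `build_url` are hypothetical names introduced here for illustration and are not part of the original notebook or of the API itself.

```python
# Hypothetical constant: the common prefix of the three endpoint URLs used above
BASE_URL = "https://www.ebi.ac.uk/pdbe/graph-api/uniprot"


def build_url(endpoint, accession):
    """Join the shared prefix, an endpoint name and a UniProt accession into a URL."""
    return f"{BASE_URL}/{endpoint}/{accession}"


# Reproduces the same string as ANNOTATIONS_URL for Thrombin
example_url = build_url("annotations", "P00734")
print(example_url)
```

Building the URLs this way keeps the accession in one place, which may help if the same analysis is repeated for another protein.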

# Step 3 - Define helper functions

We will define a few helper functions to avoid code repetition when retrieving data from the API.

In [3]:
def get_data(accession, url):
    """
    Helper function to get the data from an API endpoint using an accession as key
    
    :param accession: String; a UniProt accession (only used in the error message)
    :param url: String; a URL to an API endpoint
    :return: Response object or None
    """

    try:
        return requests.get(url)
    except requests.exceptions.RequestException as err:
        # Network-level failures (e.g. connection errors) end up here
        print("There was an error while retrieving the data for %s: %s" % (accession, err))
        return None
        
def parse_data(data):
    """
    Helper function to parse a response object as JSON
    
    :param data: Response object; data to be parsed
    :return: JSON object
    :raises ValueError: if there is no response or the status code is not 200
    """
    
    # Return the parsed JSON if the status code is 200 and raise an error if not
    if data is not None and data.status_code == 200:
        return data.json()
    else:
        raise ValueError('No data received')

# Step 4 - Get annotations data

We will use the annotations API endpoint (defined as `ANNOTATIONS_URL`) to get the functional annotations for Thrombin (defined as `ACCESSION`).

In [4]:
annotations_data = parse_data(get_data(ACCESSION, ANNOTATIONS_URL))

We then filter the data for the predicted binding sites annotations provided by P2rank and canSAR.

In [5]:
all_predicted_ligand_binding_residues = list()

for provider_data in annotations_data[ACCESSION]["data"]:
    if provider_data["accession"] in ["p2rank", "cansar"]:
        residues = [x["startIndex"] for x in provider_data["residues"]]
        all_predicted_ligand_binding_residues.extend(residues)

These are the residues which are annotated as predicted ligand binding sites: 

In [6]:
print(all_predicted_ligand_binding_residues)

[136, 237, 246, 251, 265, 273, 324, 329, 330, 331, 332, 333, 334, 336, 372, 383, 386, 388, 389, 390, 391, 396, 400, 406, 407, 410, 413, 414, 415, 417, 434, 436, 459, 493, 506, 507, 510, 511, 530, 541, 549, 565, 566, 568, 572, 574, 585, 589, 590, 591, 596, 597, 605, 613, 615, 617]
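
If two providers annotate the same residue, extending a list as above would record that residue twice. The sketch below shows one way to collect the residues into a sorted list without duplicates. The function name `collect_residues` and the example payload are invented for illustration; the payload only mimics the nesting accessed in the code above (`data`, `accession`, `residues`, `startIndex`) and is not real API output.

```python
def collect_residues(payload, accession, providers):
    """Return the sorted, de-duplicated residue numbers annotated by the given providers."""
    residues = set()
    for provider_data in payload[accession]["data"]:
        if provider_data["accession"] in providers:
            residues.update(x["startIndex"] for x in provider_data["residues"])
    return sorted(residues)


# Made-up payload for illustration only; it is NOT real API output
example_payload = {
    "P00734": {
        "data": [
            {"accession": "p2rank", "residues": [{"startIndex": 388}, {"startIndex": 406}]},
            {"accession": "cansar", "residues": [{"startIndex": 406}, {"startIndex": 136}]},
            {"accession": "other_provider", "residues": [{"startIndex": 1}]},
        ]
    }
}

example_residues = collect_residues(example_payload, "P00734", {"p2rank", "cansar"})
print(example_residues)  # [136, 388, 406]
```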


# Step 5 - Get interaction interfaces data

We will use the interaction interfaces API endpoint (defined as `INTERACTIONS_URL`) to get all the macromolecular interaction interface residues of Thrombin (defined as `ACCESSION`).

In [7]:
interactions_data = parse_data(get_data(ACCESSION, INTERACTIONS_URL))

We then list the macromolecular interaction partners of Thrombin:

In [8]:
interaction_partner_names = list()
for item in interactions_data[ACCESSION]["data"]:
    interaction_partner_names.append(item["name"])
print(interaction_partner_names)

['Prothrombin', 'Hirudin variant-1', 'Proteinase-activated receptor 1', 'Other', 'DNA', 'Tsetse thrombin inhibitor', 'Hirudin variant-2 (Fragment)', 'Hirudin-2', 'Salivary anti-thrombin peptide anophelin', 'Thrombomodulin', 'Heparin cofactor 2', 'Thrombin inhibitor madanin 1', 'Antithrombin-III', 'Staphylocoagulase (Fragment)', 'Thrombininhibitor', 'AGAP008004-PA', 'Pancreatic trypsin inhibitor', 'Uncharacterized protein avahiru', 'RNA', 'Fibrinogen alpha chain', 'Glia-derived nexin', 'Fibrinogen gamma chain', 'Hirudin-2B', 'Variegin', 'Proteinase-activated receptor 4', 'Plasma serine protease inhibitor', 'Hirudin-3A', 'Vitamin K-dependent protein C', 'Platelet glycoprotein Ib alpha chain', 'Hirullin-P18', 'BIVALIRUDIN C-terminus fragment', 'Coagulation factor V', "Hirudin-2'", "Hirudin-3B'", 'D-phenylalanyl-L-prolyl-N~5~-[amino(iminio)methyl]-D-ornithyl-L-cysteinamide', 'Kininogen-1', 'D-phenylalanyl-L-prolyl-N~5~-[amino(iminio)methyl]-D-ornithyl-D-threoninamide', 'Hirudin-PA', "Hirud

We can see it has many interaction partners, and several of them are variants of Hirudin, a natural inhibitor of Thrombin. We will use `Hirudin variant-1` for the next steps of this example.
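
To pick out the Hirudin variants programmatically rather than by eye, the partner names can be filtered by substring. The sketch below uses a short hand-picked subset of the names printed above, not the full list, and the variable names are hypothetical.

```python
# A short subset of the partner names printed above, for illustration only
example_partner_names = [
    "Prothrombin",
    "Hirudin variant-1",
    "Proteinase-activated receptor 1",
    "Hirudin variant-2 (Fragment)",
    "Hirudin-2",
    "Thrombomodulin",
]

# Keep only the names that mention Hirudin
hirudin_partners = [name for name in example_partner_names if "Hirudin" in name]
print(hirudin_partners)
# ['Hirudin variant-1', 'Hirudin variant-2 (Fragment)', 'Hirudin-2']
```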

# Step 6 - Compare the interaction interface residues between Thrombin and Hirudin (variant-1)

We compare the predicted ligand binding site residues with the interaction interface residues of Thrombin that interact with Hirudin (variant-1).

In [9]:
interface_residues_with_hirudin = list()

for item in interactions_data[ACCESSION]["data"]:
    if item["name"] == "Hirudin variant-1":
        interacting_residues = [x["startIndex"] for x in item["residues"] if x["startIndex"] in all_predicted_ligand_binding_residues]
        interface_residues_with_hirudin.extend(interacting_residues)

We can see that 9 residues, all located in the region between GLU388 and GLY591, both interact with Hirudin and are predicted to bind small molecules:

In [10]:
print(interface_residues_with_hirudin)

[388, 406, 434, 541, 565, 566, 568, 589, 591]
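
The comparison above is in essence a set intersection. The following sketch expresses it that way. The helper name `shared_residues` is hypothetical, and the two input lists are small illustrative examples rather than the full data retrieved from the API.

```python
def shared_residues(predicted, interface):
    """Return the sorted residue numbers present in both collections."""
    return sorted(set(predicted) & set(interface))


# Small illustrative inputs, not the full lists retrieved from the API
example_predicted = [136, 388, 406, 434]
example_interface = [388, 406, 500]

example_shared = shared_residues(example_predicted, example_interface)
print(example_shared)  # [388, 406]
```

Using sets avoids scanning the whole list of predicted residues for every interface residue, and it also removes any duplicates from the result.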


### Summary of the results so far

Using the PDBe Aggregated API we could retrieve all the residues of Thrombin which are predicted to bind small molecules. We then retrieved the data on macromolecular interactions between Thrombin and other proteins/peptides. We could see that Thrombin interacts with several variants of Hirudin.

Next, we compared the predicted ligand binding sites with the interaction interface residues and saw that there is a region on the sequence of Thrombin where several potential target residues can be found.

# Step 7 - Retrieving observed ligand binding sites

Next, we use the ligand sites API endpoint (defined as `LIGANDS_URL`) to retrieve all the observed ligand binding residues of Thrombin (defined as `ACCESSION`).

In [11]:
ligands_data = parse_data(get_data(ACCESSION, LIGANDS_URL))

In [12]:
ligand_list = list()

for ligand in ligands_data[ACCESSION]["data"]:
    for residue in ligand["residues"]:
        if residue["startIndex"] in interface_residues_with_hirudin:
            ligand_list.append(ligand["accession"])
            break
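
The loop with `break` above keeps a ligand as soon as one of its residues is found among the interface residues. The same logic can be sketched with the built-in `any()`. The helper name `ligands_at_residues`, the placeholder ligand identifiers and the example payload are all invented for illustration; the payload only mimics the nesting accessed above and is not real API output.

```python
def ligands_at_residues(payload, accession, target_residues):
    """Return the identifiers of ligands that contact at least one target residue."""
    targets = set(target_residues)
    return [
        ligand["accession"]
        for ligand in payload[accession]["data"]
        if any(residue["startIndex"] in targets for residue in ligand["residues"])
    ]


# Made-up payload with placeholder ligand identifiers; NOT real API output
example_ligands = {
    "P00734": {
        "data": [
            {"accession": "LIG_A", "residues": [{"startIndex": 388}, {"startIndex": 999}]},
            {"accession": "LIG_B", "residues": [{"startIndex": 12}]},
            {"accession": "LIG_C", "residues": [{"startIndex": 591}]},
        ]
    }
}

example_matches = ligands_at_residues(example_ligands, "P00734", [388, 591])
print(example_matches)  # ['LIG_A', 'LIG_C']
```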

Finally, we compare the ligands found in the PDB with the annotations and interaction interfaces we have collated in the previous steps. We find that there are indeed many small molecules, such as TYS, MRD and P6G, that interact with the Thrombin residues which form the macromolecular interaction interface with Hirudin (variant-1).

In [13]:
print("There are %i ligands observed in PDB that bind to this interface" % len(ligand_list))

There are 279 ligands observed in PDB that bind to this interface


In [14]:
print("These are the Chemical Component identifiers of the ligands:")
print(ligand_list)

These are the Chemical Component identifiers of the ligands:
['8K2', 'FQI', 'TYS', 'DPN', '71F', 'BAM', 'WCE', 'HBD', 'OJK', 'DKQ', '02N', 'Y4L', 'SZ4', 'C2A', 'ABN', 'APA', 'BEN', 'ESI', 'PRL', 'BT3', 'BT2', 'BZT', 'C2D', 'BAI', 'BAH', 'BAB', '897', '896', '501', '4ND', 'R11', 'DKK', 'I26', 'I25', 'I50', 'C1M', '382', 'L03', '121', 'BMZ', '130', '696', '132', '166', '167', 'GR1', 'L02', 'CR9', 'D6Y', 'NLI', '120', '81A', 'C02', 'C7M', 'C5M', 'C4M', 'C3M', 'UIR', 'UIB', 'F25', 'ESH', '348', 'UIP', 'FSN', 'SHY', 'R56', '0IT', 'L86', 'T76', '1ZV', 'MRQ', 'ODB', 'G44', 'QQW', 'QQE', 'N6H', 'QQK', 'QQT', 'QQ5', 'QQN', 'BT1', 'BPP', 'T42', 'MUQ', '0NW', 'GR4', 'ALZ', 'SJR', 'C24', '165', '2OJ', '2FN', '00R', 'IH3', 'MUZ', 'GAH', 'T19', 'PHW', 'PHV', '34P', 'P05', 'GOZ', 'M6Q', 'LXW', 'MJK', '3SP', 'O5Z', 'J5K', '99P', 'P97', 'CDO', 'B03', 'B01', 'MM9', 'M6S', 'M4Z', 'MVF', 'MEL', 'M67', '45S', 'S49', 'S00', 'S04', '46U', '45U', 'KDQ', 'M32', 'EU5', 'BJA', 'S28', 'M41', 'M34', 'WX5', 'TIF', 

## References

* [1] PDBe-KB consortium (2020). PDBe-KB: a community-driven resource for structural and functional annotations. Nucleic Acids Res, 48(D1), D344-D353. doi:10.1093/nar/gkz853
* [2] Krivák, R., & Hoksza, D. (2018). P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. J Cheminform, 10(1), 39. doi:10.1186/s13321-018-0285-8
* [3] Coker, E. A. et al. (2019). canSAR: update to the cancer translational research and drug discovery knowledgebase. Nucleic Acids Res, 47(D1), D917-D922. doi:10.1093/nar/gky1129
* [4] Krissinel, E., & Henrick, K. (2007). Inference of macromolecular assemblies from crystalline state. J Mol Biol, 372(3), 774-797. doi:10.1016/j.jmb.2007.05.022
* [5] Jubb, H. C. et al. (2017). Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. J Mol Biol, 429(3), 365-371. doi:10.1016/j.jmb.2016.12.004