# PDBe API Training

### PDBe Predicted models for a given protein

This tutorial shows how to retrieve, programmatically, all experimental and predicted models available for a given protein.

## Introduction
The 3D-Beacons Network facilitates the aggregation of coordinate files and metadata for both experimental and theoretical protein models. It encompasses a wide range of state-of-the-art and specialized model providers, as well as data from the Protein Data Bank (PDB).

Model providers:
* PDBe
* SWISS-MODEL
* AlphaFold DB
* Genome3D
* SASBDB
* AlphaFill
* ModelArchive
* Protein Ensemble Database

For more information, visit https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/

## Setup

First, import the code needed to query the API and reformat the results.

Run the cell below by pressing the play button.

In [1]:
import sys
sys.path.insert(0,'..')
from tutorial_utilities.api_modules import get_url
import pandas as pd
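
The `get_url` helper comes from the tutorial's own `tutorial_utilities` package, and its implementation is not shown here. As a rough sketch only, a helper of this kind could be written with the standard library as below, assuming it performs an HTTP GET and returns the parsed JSON body. The name `fetch_json` and the `timeout` parameter are illustrative and are not part of the tutorial code.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=30):
    """Hypothetical stand-in for get_url: GET a URL and parse the JSON body.

    Assumption: the real helper may add retries, logging or error handling
    that this sketch leaves out.
    """
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # so a failed request surfaces as an exception instead of bad data
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

If you swap this in for `get_url`, call it with the full summary URL for your accession; the rest of the tutorial only relies on the returned dictionary.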

---

## Obtaining the data

The following code retrieves all structures available in 3D-Beacons for a single UniProt accession.

We will retrieve entries for human acetylcholinesterase, which has the UniProt accession P22303.

In [2]:
BASE_URL = "https://www.ebi.ac.uk/pdbe/"
PDBEKB_3DBEACONS_URL = BASE_URL + "pdbe-kb/3dbeacons/api/uniprot/summary/"

def get_all_models(uniprot_accession):
    """
    Get all models for a given UniProt accession.

    Returns a list of dictionaries, one per entity in each model, combining
    the model-level summary fields with the entity-level fields.
    """
    # Example of a lambda function: a small anonymous function defined inline.
    # This one keeps only the keys of dictionary x that are listed in y.
    dictfilt = lambda x, y: dict([(i, x[i]) for i in x if i in set(y)])
    # The summary URL already ends with "/", so no extra separator is needed
    url = f"{PDBEKB_3DBEACONS_URL}{uniprot_accession}.json"

    data = get_url(url=url)
    data_to_ret = []
    structures = data['structures']

    for row in structures:
        my_row = row['summary']
        # Example of a list comprehension to quickly create a list:
        # every model-level key except the nested 'entities' list
        necc_rows = [keys for keys in my_row.keys() if keys != 'entities']
        necc_rows = dictfilt(my_row, necc_rows)

        for item in my_row['entities']:
            # Example of a dictionary comprehension to quickly create a dictionary:
            # merge the model-level fields with the fields of this entity
            dict3 = {k: v for d in (necc_rows, item) for k, v in d.items()}
            data_to_ret.append(dict3)

    return data_to_ret
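
To see what this flattening does without calling the API, the sketch below applies the same idea to a small hand-written payload. The `mock_response` dictionary is a hypothetical, heavily trimmed stand-in for the real response, and `flatten_structures` is an illustrative name, not part of the tutorial utilities.

```python
# Hypothetical, heavily trimmed response: real summaries carry many more fields
mock_response = {
    "structures": [
        {
            "summary": {
                "model_identifier": "4m0e",
                "provider": "PDBe",
                "entities": [
                    {"entity_type": "POLYMER", "identifier": "P22303"},
                    {"entity_type": "NON-POLYMER", "identifier": "NO3"},
                ],
            }
        }
    ]
}


def flatten_structures(data):
    """Return one flat dictionary per entity, repeating the model-level fields."""
    rows = []
    for structure in data["structures"]:
        summary = structure["summary"]
        # Model-level fields: everything except the nested 'entities' list
        model_fields = {k: v for k, v in summary.items() if k != "entities"}
        for entity in summary["entities"]:
            # Entity fields are merged last, so they win on a key clash
            rows.append({**model_fields, **entity})
    return rows


rows = flatten_structures(mock_response)
for row in rows:
    print(row)
```

Each model therefore contributes one row per entity, which is why the same `model_identifier` can appear on several consecutive rows of the resulting table.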

In [3]:
uniprot_accession = 'P22303'
result = get_all_models(uniprot_accession)

df_models = pd.DataFrame(result)
df_models.head()

Unnamed: 0,model_identifier,model_category,model_url,model_format,model_type,model_page_url,provider,number_of_conformers,ensemble_sample_url,ensemble_sample_format,...,confidence_avg_local_score,oligomeric_state,preferred_assembly_id,entity_type,entity_poly_type,identifier,identifier_category,description,chain_ids,oligomeric_state_confidence
0,4m0e,EXPERIMENTALLY DETERMINED,https://www.ebi.ac.uk/pdbe/static/entry/4m0e_u...,MMCIF,,https://www.ebi.ac.uk/pdbe/entry/pdb/4m0e,PDBe,,,,...,,,,POLYMER,POLYPEPTIDE(L),P22303,UNIPROT,Acetylcholinesterase,"[B, A]",
1,4m0e,EXPERIMENTALLY DETERMINED,https://www.ebi.ac.uk/pdbe/static/entry/4m0e_u...,MMCIF,,https://www.ebi.ac.uk/pdbe/entry/pdb/4m0e,PDBe,,,,...,,,,NON-POLYMER,,NO3,CCD,NITRATE ION,[B],
2,4m0e,EXPERIMENTALLY DETERMINED,https://www.ebi.ac.uk/pdbe/static/entry/4m0e_u...,MMCIF,,https://www.ebi.ac.uk/pdbe/entry/pdb/4m0e,PDBe,,,,...,,,,NON-POLYMER,,EDO,CCD,"1,2-ETHANEDIOL","[B, A]",
3,4m0e,EXPERIMENTALLY DETERMINED,https://www.ebi.ac.uk/pdbe/static/entry/4m0e_u...,MMCIF,,https://www.ebi.ac.uk/pdbe/entry/pdb/4m0e,PDBe,,,,...,,,,NON-POLYMER,,1YL,CCD,Dihydrotanshinone I,"[B, A]",
4,4m0e,EXPERIMENTALLY DETERMINED,https://www.ebi.ac.uk/pdbe/static/entry/4m0e_u...,MMCIF,,https://www.ebi.ac.uk/pdbe/entry/pdb/4m0e,PDBe,,,,...,,,,NON-POLYMER,,NAG,CCD,2-acetamido-2-deoxy-beta-D-glucopyranose,[B],


You can filter the models by model category.

3D-Beacons defines four model categories:
1. EXPERIMENTALLY DETERMINED
2. CONFORMATIONAL ENSEMBLE
3. TEMPLATE-BASED
4. AB-INITIO

Let's look at the number of rows available for each model category in this example. Each row is one entity within a model, so a model with several entities is counted more than once.

In [4]:
df_models['model_category'].value_counts()

model_category
EXPERIMENTALLY DETERMINED    382
TEMPLATE-BASED                39
AB-INITIO                     10
Name: count, dtype: int64
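
To count each model once rather than once per entity, one option is to drop repeated models before counting. The sketch below uses a small hand-made DataFrame in the same shape as `df_models`; it assumes that `provider` together with `model_identifier` is enough to identify a model, which is an assumption rather than something the API guarantees.

```python
import pandas as pd

# Hand-made stand-in for df_models: two entity rows for each of two models
df_demo = pd.DataFrame(
    [
        {"model_identifier": "4m0e", "provider": "PDBe",
         "model_category": "EXPERIMENTALLY DETERMINED"},
        {"model_identifier": "4m0e", "provider": "PDBe",
         "model_category": "EXPERIMENTALLY DETERMINED"},
        {"model_identifier": "P22303", "provider": "AlphaFill",
         "model_category": "TEMPLATE-BASED"},
        {"model_identifier": "P22303", "provider": "AlphaFill",
         "model_category": "TEMPLATE-BASED"},
    ]
)

# Keep the first row of every (provider, model_identifier) pair
unique_models = df_demo.drop_duplicates(subset=["provider", "model_identifier"])

# Count unique models, not entity rows, per category
models_per_category = unique_models["model_category"].value_counts()
print(models_per_category)
```

Replacing `df_demo` with `df_models` gives the per-category model counts for the protein queried above.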

Let's get all the template-based models for this protein.

In [5]:
df_template_models = df_models[df_models['model_category'] == 'TEMPLATE-BASED']
df_template_models.head()

Unnamed: 0,model_identifier,model_category,model_url,model_format,model_type,model_page_url,provider,number_of_conformers,ensemble_sample_url,ensemble_sample_format,...,confidence_avg_local_score,oligomeric_state,preferred_assembly_id,entity_type,entity_poly_type,identifier,identifier_category,description,chain_ids,oligomeric_state_confidence
382,P22303_36-574:1f8u.1.A,TEMPLATE-BASED,https://swissmodel.expasy.org/3d-beacons/unipr...,PDB,ATOMIC,https://swissmodel.expasy.org/repository/unipr...,SWISS-MODEL,,,,...,0.91,MONOMER,,POLYMER,POLYPEPTIDE(L),P22303,UNIPROT,model based on template 1f8u.1.A: ACETYLCHOLIN...,[A],
383,P22303_39-602:6i2t.1.B,TEMPLATE-BASED,https://swissmodel.expasy.org/3d-beacons/unipr...,PDB,ATOMIC,https://swissmodel.expasy.org/repository/unipr...,SWISS-MODEL,,,,...,0.78,MONOMER,,POLYMER,POLYPEPTIDE(L),P22303,UNIPROT,model based on template 6i2t.1.B: Cholinesterase,[B],
384,P22303_39-605:6i2t.1.A,TEMPLATE-BASED,https://swissmodel.expasy.org/3d-beacons/unipr...,PDB,ATOMIC,https://swissmodel.expasy.org/repository/unipr...,SWISS-MODEL,,,,...,0.773,MONOMER,,POLYMER,POLYPEPTIDE(L),P22303,UNIPROT,model based on template 6i2t.1.A: Cholinesterase,[A],
386,P22303,TEMPLATE-BASED,https://alphafill.eu/v1/aff/P22303,MMCIF,,https://alphafill.eu/model?id=P22303,AlphaFill,,,,...,,,,POLYMER,POLYPEPTIDE(L),,,Acetylcholinesterase,[A],
387,P22303,TEMPLATE-BASED,https://alphafill.eu/v1/aff/P22303,MMCIF,,https://alphafill.eu/model?id=P22303,AlphaFill,,,,...,,,,NON-POLYMER,,,,ZINC ION,"[B, C, D, L, Q]",


You can also filter this data by provider and experimental method.

Let's select all structures provided by PDBe that were solved by `X-RAY DIFFRACTION`.

In [6]:
df_xray_models = df_models[
    (df_models['provider'] == 'PDBe')
    & (df_models['experimental_method'] == 'X-RAY DIFFRACTION')
]
df_xray_models.head()

Unnamed: 0,model_identifier,model_category,model_url,model_format,model_type,model_page_url,provider,number_of_conformers,ensemble_sample_url,ensemble_sample_format,...,confidence_avg_local_score,oligomeric_state,preferred_assembly_id,entity_type,entity_poly_type,identifier,identifier_category,description,chain_ids,oligomeric_state_confidence
0,4m0e,EXPERIMENTALLY DETERMINED,https://www.ebi.ac.uk/pdbe/static/entry/4m0e_u...,MMCIF,,https://www.ebi.ac.uk/pdbe/entry/pdb/4m0e,PDBe,,,,...,,,,POLYMER,POLYPEPTIDE(L),P22303,UNIPROT,Acetylcholinesterase,"[B, A]",
1,4m0e,EXPERIMENTALLY DETERMINED,https://www.ebi.ac.uk/pdbe/static/entry/4m0e_u...,MMCIF,,https://www.ebi.ac.uk/pdbe/entry/pdb/4m0e,PDBe,,,,...,,,,NON-POLYMER,,NO3,CCD,NITRATE ION,[B],
2,4m0e,EXPERIMENTALLY DETERMINED,https://www.ebi.ac.uk/pdbe/static/entry/4m0e_u...,MMCIF,,https://www.ebi.ac.uk/pdbe/entry/pdb/4m0e,PDBe,,,,...,,,,NON-POLYMER,,EDO,CCD,"1,2-ETHANEDIOL","[B, A]",
3,4m0e,EXPERIMENTALLY DETERMINED,https://www.ebi.ac.uk/pdbe/static/entry/4m0e_u...,MMCIF,,https://www.ebi.ac.uk/pdbe/entry/pdb/4m0e,PDBe,,,,...,,,,NON-POLYMER,,1YL,CCD,Dihydrotanshinone I,"[B, A]",
4,4m0e,EXPERIMENTALLY DETERMINED,https://www.ebi.ac.uk/pdbe/static/entry/4m0e_u...,MMCIF,,https://www.ebi.ac.uk/pdbe/entry/pdb/4m0e,PDBe,,,,...,,,,NON-POLYMER,,NAG,CCD,2-acetamido-2-deoxy-beta-D-glucopyranose,[B],
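
When several column filters are combined, writing the boolean mask by hand becomes repetitive. As a sketch, the hypothetical helper `filter_models` below builds the mask from keyword arguments; the demo DataFrame and its `demo-em` row are invented for illustration and are not real 3D-Beacons output.

```python
import pandas as pd


def filter_models(df, **criteria):
    """Keep the rows where every named column equals the requested value."""
    mask = pd.Series(True, index=df.index)
    for column, value in criteria.items():
        # Combine each equality test with the running mask (logical AND)
        mask &= (df[column] == value)
    return df[mask]


# Invented demo data in the same shape as df_models
df_demo = pd.DataFrame(
    [
        {"model_identifier": "4m0e", "provider": "PDBe",
         "experimental_method": "X-RAY DIFFRACTION"},
        {"model_identifier": "demo-em", "provider": "PDBe",
         "experimental_method": "ELECTRON MICROSCOPY"},
        {"model_identifier": "P22303", "provider": "AlphaFill",
         "experimental_method": None},
    ]
)

xray = filter_models(
    df_demo, provider="PDBe", experimental_method="X-RAY DIFFRACTION"
)
print(xray)
```

The same call works on `df_models`, provided the column names passed as keywords exist in the DataFrame; a misspelled column raises a `KeyError`.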
