# PDBe API Training

### PDBe Predicted models for a given protein

This tutorial shows how to programmatically obtain all the experimental and predicted models available for a given protein.

## Introduction
The 3D-Beacons Network facilitates the aggregation of coordinate files and metadata for both experimental and theoretical protein models. It encompasses a wide range of state-of-the-art and specialized model providers, as well as data from the Protein Data Bank (PDB).

Model providers:
* PDBe
* SWISS-MODEL
* AlphaFold DB
* Genome3D
* SASBDB
* AlphaFill
* ModelArchive
* Protein Ensemble Database

For more information, visit https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/

## Setup

First, we import the code required to query the API and reformat the results.

Run the cell below by pressing the play button.

In [None]:
import sys
sys.path.insert(0,'..')
from tutorial_utilities.api_modules import get_url
import pandas as pd
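
The `get_url` helper comes from the tutorial's own `tutorial_utilities` package, whose source is not shown here. If that package is not available, a stand-in along the following lines may be enough. This is only a sketch: `fetch_json` is a hypothetical name, it assumes the endpoint returns a JSON body, and the real helper may handle errors differently.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=30):
    """Download a URL and parse the response body as JSON.

    Hypothetical stand-in for get_url; not the tutorial's implementation.
    """
    # urlopen raises urllib.error.HTTPError for HTTP error status codes
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With this in place, `fetch_json(url)` could be used wherever the tutorial calls `get_url(url=url)`, assuming both return the parsed JSON as a dictionary.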

---

## Obtaining the data

The following code retrieves all the structures available in 3D-Beacons for a single UniProt accession.

We will retrieve entries for human acetylcholinesterase, which has the UniProt accession P22303.

In [None]:
BASE_URL = "https://www.ebi.ac.uk/pdbe/"
PDBEKB_3DBEACONS_URL = BASE_URL + "pdbe-kb/3dbeacons/api/uniprot/summary/"

def get_all_models(uniprot_accession):
    """
    Get all models for a given UniProt accession.

    Returns a list of dictionaries, one per entity of each model,
    each combining the model summary fields with the entity fields.
    """
    # Example of a lambda function: a small anonymous function defined inline.
    # This one keeps only the keys of dictionary x that are listed in y.
    dictfilt = lambda x, y: dict([(i, x[i]) for i in x if i in set(y)])
    # PDBEKB_3DBEACONS_URL already ends with "/", so no extra separator is needed
    url = f"{PDBEKB_3DBEACONS_URL}{uniprot_accession}.json"

    data = get_url(url=url)
    data_to_ret = []
    structures = data['structures']

    for row in structures:
        my_row = row['summary']
        # Example of a list comprehension to quickly create a list:
        # every summary key except 'entities'
        necc_rows = [keys for keys in my_row.keys() if keys != 'entities']
        necc_rows = dictfilt(my_row, necc_rows)

        for item in my_row['entities']:
            # Example of a dictionary comprehension to quickly create a dictionary:
            # merge the summary fields with the fields of one entity
            dict3 = {k: v for d in (necc_rows, item) for k, v in d.items()}
            data_to_ret.append(dict3)

    return data_to_ret
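
To see what the flattening step does without calling the API, the same logic can be applied to a small hand-written payload. The sketch below is illustrative only: `flatten_summaries` is a hypothetical name, and `sample_data` is made up to mimic just the `structures`, `summary` and `entities` keys used above, so its values are not real API output.

```python
# Hypothetical payload with invented values; only the nesting matters here.
sample_data = {
    "structures": [
        {
            "summary": {
                "model_identifier": "model-1",
                "provider": "Provider A",
                "entities": [
                    {"entity_type": "POLYMER", "chain_ids": ["A"]},
                    {"entity_type": "NON-POLYMER", "chain_ids": ["B"]},
                ],
            }
        },
        {
            "summary": {
                "model_identifier": "model-2",
                "provider": "Provider B",
                "entities": [{"entity_type": "POLYMER", "chain_ids": ["A"]}],
            }
        },
    ]
}


def flatten_summaries(data):
    """Return one flat dictionary per entity, repeating the summary fields."""
    rows = []
    for structure in data["structures"]:
        summary = structure["summary"]
        # Summary fields shared by every entity of this model
        shared = {k: v for k, v in summary.items() if k != "entities"}
        for entity in summary["entities"]:
            # Entity fields are merged on top of the shared summary fields
            rows.append({**shared, **entity})
    return rows


rows = flatten_summaries(sample_data)
print(len(rows))  # 3: two entities for model-1, one for model-2
print(rows[0])
```

Each model therefore appears once per entity in the result, which is worth keeping in mind when counting rows later on.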

In [None]:
uniprot_accession = 'P22303'
result = get_all_models(uniprot_accession)

df_models = pd.DataFrame(result)
df_models.head()

You can filter the models by model category.

3D-Beacons provides four different categories of models:
1. EXPERIMENTALLY DETERMINED
2. CONFORMATIONAL ENSEMBLE
3. TEMPLATE-BASED
4. AB-INITIO

Let's look at the number of rows in each of these model categories for this example. Because `get_all_models` returns one row per entity, a model with several entities contributes more than one row to these counts.

In [None]:
df_models['model_category'].value_counts()
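
Since `get_all_models` appends one row per entity, `value_counts()` counts rows rather than distinct models. One way to count distinct models is to group by category and count unique identifiers. The sketch below uses a made-up toy table and assumes that a `model_identifier` column identifies each model.

```python
import pandas as pd

# Toy data with invented identifiers; "m1" has two entity rows.
toy = pd.DataFrame({
    "model_identifier": ["m1", "m1", "m2", "m3"],
    "model_category": [
        "EXPERIMENTALLY DETERMINED",
        "EXPERIMENTALLY DETERMINED",
        "TEMPLATE-BASED",
        "AB-INITIO",
    ],
})

# Number of rows per category (one row per entity)
row_counts = toy["model_category"].value_counts()

# Number of distinct models per category
model_counts = toy.groupby("model_category")["model_identifier"].nunique()

print(row_counts)
print(model_counts)
```

On real data the same expressions could be applied to `df_models`, provided the column names match.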

Let's get all the template-based models for this protein.

In [None]:
df_template_models = df_models[df_models['model_category'] == 'TEMPLATE-BASED']
df_template_models.head()
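
To select several categories at once, `Series.isin` can replace the equality test. The sketch below runs on a made-up toy table; treating `TEMPLATE-BASED` and `AB-INITIO` together as "predicted" models is an assumption made for illustration.

```python
import pandas as pd

# Toy data with invented identifiers, one row per category
toy = pd.DataFrame({
    "model_identifier": ["m1", "m2", "m3", "m4"],
    "model_category": [
        "EXPERIMENTALLY DETERMINED",
        "TEMPLATE-BASED",
        "AB-INITIO",
        "CONFORMATIONAL ENSEMBLE",
    ],
})

# Assumed grouping of categories, chosen for this example only
predicted_categories = ["TEMPLATE-BASED", "AB-INITIO"]

# Keep the rows whose category is in the list
df_predicted = toy[toy["model_category"].isin(predicted_categories)]
print(df_predicted["model_identifier"].tolist())
```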

You can also filter this data by provider and experimental method.

Let's select all the structures determined by `X-RAY DIFFRACTION` and provided by PDBe.

In [None]:
df_xray_models = df_models[
    ( df_models['provider'] == 'PDBe' ) 
    & 
    ( df_models['experimental_method']=='X-RAY DIFFRACTION' )
]
df_xray_models.head()
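
When the same kind of filter is needed repeatedly, the boolean masks can be built in a small helper. This is a sketch under assumptions: `filter_models` is a hypothetical name, and the toy table below is made up to keep the example self-contained.

```python
import pandas as pd


def filter_models(df, **criteria):
    """Keep the rows where every named column equals the requested value."""
    # Start with a mask that accepts every row, then narrow it per criterion
    mask = pd.Series(True, index=df.index)
    for column, value in criteria.items():
        mask &= df[column] == value
    return df[mask]


# Toy data with invented identifiers
toy = pd.DataFrame({
    "model_identifier": ["m1", "m2", "m3"],
    "provider": ["PDBe", "PDBe", "SWISS-MODEL"],
    "experimental_method": ["X-RAY DIFFRACTION", "ELECTRON MICROSCOPY", None],
})

xray = filter_models(
    toy, provider="PDBe", experimental_method="X-RAY DIFFRACTION"
)
print(xray["model_identifier"].tolist())
```

Rows with a missing value in a filtered column never match, because comparing a missing value with `==` yields `False`.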