# PDBe API Training

### PDBe Interactions

This tutorial will guide you through searching PDBe programmatically.


First we will import the code that does the work.
Run the cell below by pressing the green play button.

In [1]:
from pprint import pprint
import sys
sys.path.insert(0,'..') # to ensure the below import works in all Jupyter notebooks
from python_modules.api_modules import run_sequence_search, pandas_dataset, get_url_with_accession, pdbe_kb_interacting_residues_api


Now we are ready to run the sequence search we did in the last module.

We will use the sequence of Cyclin-dependent kinase 2 (UniProt accession P24941) as the example query.

In [2]:
sequence_to_search = """
MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDTETEGVPSTAIREISLLKELNH
PNIVKLLDVIHTENKLYLVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHS
HRVLHRDLKPQNLLINTEGAIKLADFGLARAFGVPVRTYTHEVVTLWYRAPEILLGCKYY
STAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSF
PKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRL"""

filter_list = ['pfam_accession', 'pdb_id', 'molecule_name', 'ec_number',
               'uniprot_accession_best', 'tax_id']

search_results = run_sequence_search(sequence_to_search,
                                     filter_terms=filter_list,
                                     number_of_rows=1000
                                     )

Number of results 1000
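The query sequence above is written as a multi-line string, so it contains newline characters. Whether `run_sequence_search` strips these internally is not shown in this notebook, so the following is only a minimal sketch of how a sequence could be normalised before it is submitted. The function name `normalise_sequence` and the short example fragment are hypothetical and are not part of the tutorial's modules.

```python
def normalise_sequence(raw: str) -> str:
    """Remove all whitespace (including newlines) and upper-case the residues.

    Hypothetical helper for illustration; not part of python_modules.api_modules.
    """
    return "".join(raw.split()).upper()


# A short fragment laid out across several lines, like the query above.
example = """
MENFQKVEKI
GEGTYGVVYK
"""

cleaned = normalise_sequence(example)
print(cleaned, len(cleaned))
```

Checking the length of the cleaned sequence is a cheap way to confirm that nothing was lost or added when copying a sequence from UniProt.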


In [3]:
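# Convert the search results into a pandas DataFrame
df = pandas_dataset(search_results)
# Keep only hits with more than 50% sequence identity to the query
df = df.query('percentage_identity > 50')
# Count the non-null values per column for each UniProt accession, most structures first
group_by_uniprot = df.groupby('uniprot_accession_best').count().sort_values('pdb_id', ascending=False)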

How many UniProt accessions were there?

In [4]:
len(group_by_uniprot)

29

In [5]:
group_by_uniprot

| uniprot_accession_best | ec_number | entity_id | entry_entity | molecule_name | pdb_id | pfam_accession | tax_id | e_value | percentage_identity |
|---|---|---|---|---|---|---|---|---|---|
| P24941 | 413 | 413 | 413 | 413 | 413 | 413 | 413 | 413 | 413 |
| P20248 | 0 | 95 | 95 | 95 | 95 | 95 | 95 | 95 | 95 |
| P30274 | 0 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
| P06493 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| P14635 | 0 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| Q00535 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Q15078 | 0 | 5 | 5 | 5 | 5 | 3 | 5 | 5 | 5 |
| K9J4F7 | 0 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| P17157 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Q07785 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
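Some rows in the table above show 0 under `ec_number` (or a lower number under `pfam_accession`) even though the other columns agree. This is consistent with how pandas `count()` works: it counts non-null values per column, so a missing value in one column lowers only that column's count. The sketch below illustrates this on a small made-up DataFrame. The PDB IDs, EC numbers and identities are illustrative values, not real search results.

```python
import pandas as pd

# Toy data standing in for the search results (all values are made up).
toy = pd.DataFrame({
    "uniprot_accession_best": ["P24941", "P24941", "P20248", "P20248", "P06493"],
    "pdb_id": ["1aaa", "1aab", "1aab", "1aac", "1aad"],
    "ec_number": ["2.7.11.22", "2.7.11.22", None, None, "2.7.11.22"],
    "percentage_identity": [100.0, 100.0, 98.0, 45.0, 65.0],
})

# Same steps as in the tutorial: filter by identity, then count per accession.
filtered = toy.query("percentage_identity > 50")
counts = (
    filtered.groupby("uniprot_accession_best")
    .count()
    .sort_values("pdb_id", ascending=False)
)

# size() counts rows per group regardless of missing values.
sizes = filtered.groupby("uniprot_accession_best").size()

print(counts)
print(sizes)
```

If the goal is simply the number of hits per accession, `size()` avoids the ambiguity, because it reports one number per group rather than one per column.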


Get the UniProt accession from the first row of the results.

In [6]:
first_uniprot = df['uniprot_accession_best'].iloc[0]

first_uniprot

'P24941'
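Note that `df['uniprot_accession_best'].iloc[0]` returns the accession in the first row of the filtered results, which is not necessarily the accession with the most structures. Here the two happen to coincide. The sketch below, using made-up data, shows the difference and one way to select the most frequent accession instead.

```python
import pandas as pd

# Toy data (made up) in which the first row is not the most frequent accession.
toy = pd.DataFrame({
    "uniprot_accession_best": ["P06493", "P24941", "P24941"],
    "pdb_id": ["1aaa", "1aab", "1aac"],
})

# Accession in the first row, as done in the tutorial.
first_row_accession = toy["uniprot_accession_best"].iloc[0]

# Accession that appears in the largest number of rows.
most_frequent = toy["uniprot_accession_best"].value_counts().idxmax()

print(first_row_accession, most_frequent)
```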

Get the compounds (ligands) that interact with this UniProt accession.

In [7]:
url = pdbe_kb_interacting_residues_api
print(url)
data = get_url_with_accession(url=url, accession=first_uniprot)

https://www.ebi.ac.uk/pdbe/graph-api/uniprot/ligand_sites/
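The implementation of `get_url_with_accession` is not shown in this notebook. As a rough sketch, and assuming it simply appends the accession to the base URL printed above and returns the decoded JSON, the request could look like the following. The function names `build_accession_url` and `fetch_json` are hypothetical, and the network call is left commented out so the example runs offline.

```python
import json
from urllib.request import urlopen


def build_accession_url(base_url: str, accession: str) -> str:
    """Join a base API URL and an accession with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + accession.strip()


def fetch_json(url: str, timeout: float = 30.0):
    """Fetch a URL and decode the response body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


base = "https://www.ebi.ac.uk/pdbe/graph-api/uniprot/ligand_sites/"
example_url = build_accession_url(base, "P24941")
print(example_url)

# Uncomment to make the actual request (requires network access):
# data = fetch_json(example_url)
```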
