# PDBe API Training

### PDBe Interactions

This tutorial will guide you through searching PDBe programmatically.


First we will import the code which will do the work.
Run the cell below by pressing the green play button.

In [3]:
import sys
sys.path.insert(0,'..') # to ensure the below import works in all Jupyter notebooks
from python_modules.api_modules import run_sequence_search, pandas_dataset, get_ligand_site_data


Now we are ready to run the sequence search we did in the last module.

We will search with an example sequence taken from UniProt accession P24941,
Cyclin-dependent kinase 2.

In [4]:
sequence_to_search = """
MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDTETEGVPSTAIREISLLKELNH
PNIVKLLDVIHTENKLYLVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHS
HRVLHRDLKPQNLLINTEGAIKLADFGLARAFGVPVRTYTHEVVTLWYRAPEILLGCKYY
STAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSF
PKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRL"""

filter_list = ['pfam_accession', 'pdb_id', 'molecule_name', 'ec_number',
               'uniprot_accession_best', 'tax_id']

search_results = run_sequence_search(sequence_to_search,
                                     filter_terms=filter_list,
                                     number_of_rows=1000
                                     )

Number of results 1000
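The sequence above is written as a triple-quoted string, so it contains newline characters. Whether `run_sequence_search` strips them is not shown in this notebook. As a minimal sketch, assuming you want to normalise a sequence yourself before searching, a hypothetical helper (`clean_sequence` is not part of `api_modules`) could look like this:

```python
def clean_sequence(raw: str) -> str:
    """Remove all whitespace (including newlines) and upper-case the sequence.

    Hypothetical helper for illustration; not part of the tutorial modules.
    """
    return "".join(raw.split()).upper()


# The first 22 residues of the sequence used above, split over two lines.
example = """
MENFQKVEKIG
EGTYGVVYKAR
"""

cleaned = clean_sequence(example)
print(cleaned, len(cleaned))
```

Checking `len()` of the cleaned sequence is a quick way to confirm nothing was lost when pasting.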


In [5]:
df = pandas_dataset(search_results)
df = df.query('percentage_identity > 50')
group_by_uniprot = df.groupby('uniprot_accession_best').count().sort_values('pdb_id', ascending=False)
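To see what the filter and grouping steps do, here is a small sketch on an invented DataFrame. The column names match the ones used above, but the rows and the accessions `ACC_A` and `ACC_B` are made up for illustration.

```python
import pandas as pd

# Toy search results: invented values, same column names as the real data.
toy = pd.DataFrame({
    "pdb_id": ["1aaa", "1bbb", "1ccc", "1ddd", "1eee"],
    "uniprot_accession_best": ["ACC_A", "ACC_A", "ACC_B", "ACC_A", "ACC_B"],
    "percentage_identity": [100.0, 98.5, 62.0, 45.0, 30.0],
})

# Keep only hits with more than 50% sequence identity.
filtered = toy.query("percentage_identity > 50")

# Count the remaining rows per UniProt accession, largest group first.
counts = (
    filtered.groupby("uniprot_accession_best")
    .count()
    .sort_values("pdb_id", ascending=False)
)
print(counts)
```

Two of the five toy rows fall below the identity threshold, so `ACC_A` keeps two rows and `ACC_B` keeps one.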

How many UniProt accessions were there?

In [6]:
len(group_by_uniprot)

29

In [7]:
group_by_uniprot

uniprot_accession_best,ec_number,entity_id,entry_entity,molecule_name,pdb_id,pfam_accession,tax_id,e_value,percentage_identity
P24941,418,418,418,418,418,418,418,418,418
P20248,0,96,96,96,96,96,96,96,96
P30274,0,16,16,16,16,16,16,16,16
P06493,10,10,10,10,10,10,10,10,10
P14635,0,8,8,8,8,8,8,8,8
Q00535,6,6,6,6,6,6,6,6,6
Q15078,0,5,5,5,5,3,5,5,5
K9J4F7,0,5,5,5,5,5,5,5,5
P17157,4,4,4,4,4,4,4,4,4
Q07785,4,4,4,4,4,4,4,4,4
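Some accessions in the table show `0` under `ec_number`. This is probably because `count()` only counts non-missing values, so a protein without an EC number contributes nothing to that column. The sketch below uses invented rows to show the difference between `count()`, `size()` (all rows) and `nunique()` (distinct values); the accessions `ACC_X` and `ACC_Y` and the EC number are placeholders.

```python
import pandas as pd

# Invented rows: ACC_Y has no EC number, ACC_X appears twice in one entry.
toy = pd.DataFrame({
    "uniprot_accession_best": ["ACC_X", "ACC_X", "ACC_Y", "ACC_Y"],
    "pdb_id": ["1aaa", "1aaa", "1bbb", "1ccc"],
    "ec_number": ["1.1.1.1", "1.1.1.1", None, None],
})

grouped = toy.groupby("uniprot_accession_best")

non_null = grouped.count()                    # non-missing values per column
rows = grouped.size()                         # all rows, missing or not
unique_entries = grouped["pdb_id"].nunique()  # distinct PDB entries

print(non_null)
print(rows)
print(unique_entries)
```

If you want the number of distinct PDB entries rather than the number of matching rows, `nunique()` on `pdb_id` is the safer choice.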


Get the UniProt accession from the first row of the filtered results.

In [8]:
uniprot_accession = df['uniprot_accession_best'].iloc[0]

uniprot_accession

'P24941'
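Note that `.iloc[0]` returns the accession in the first row of `df`, which is not necessarily the accession with the most hits. A small sketch with invented data shows the difference; `ACC_A` and `ACC_B` are placeholder accessions.

```python
import pandas as pd

# Invented rows: the first row belongs to the less frequent accession.
toy = pd.DataFrame({
    "pdb_id": ["1aaa", "1bbb", "1ccc"],
    "uniprot_accession_best": ["ACC_B", "ACC_A", "ACC_A"],
})

# Accession in the first row of the table.
first_row = toy["uniprot_accession_best"].iloc[0]

# Accession with the most rows: first index label of the sorted group counts.
by_count = (
    toy.groupby("uniprot_accession_best")
    .count()
    .sort_values("pdb_id", ascending=False)
)
most_frequent = by_count.index[0]

print(first_row, most_frequent)
```

In the search above both approaches would give `P24941`, since it is both the first row's accession and the top of the grouped table.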

Get the compounds which interact with this UniProt accession.

In [13]:
ligand_data = get_ligand_site_data(uniprot_accession=uniprot_accession)
df2 = pandas_dataset(ligand_data)

https://www.ebi.ac.uk/pdbe/graph-api/uniprot/ligand_sites/P24941
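The URL printed above suggests that `get_ligand_site_data` calls the PDBe graph API with the accession appended to the path. As a rough sketch of doing the same thing by hand, the functions below build that URL and fetch it with the standard library. `ligand_sites_url` and `fetch_ligand_sites` are hypothetical names, and the shape of the returned JSON is not shown in this notebook, so inspect it before relying on any keys.

```python
import json
from urllib.request import urlopen

# Base path taken from the URL printed by the cell above.
BASE_URL = "https://www.ebi.ac.uk/pdbe/graph-api/uniprot/ligand_sites/"


def ligand_sites_url(uniprot_accession: str) -> str:
    """Build the ligand-sites URL for one UniProt accession."""
    return BASE_URL + uniprot_accession.strip()


def fetch_ligand_sites(uniprot_accession: str, timeout: float = 30.0):
    """Download and decode the JSON response (requires network access)."""
    with urlopen(ligand_sites_url(uniprot_accession), timeout=timeout) as response:
        return json.load(response)


url = ligand_sites_url("P24941")
print(url)
```

Calling `fetch_ligand_sites("P24941")` would perform the actual request; it is left uncalled here so the cell runs without a network connection.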


In [14]:
df2.head()


,startIndex,endIndex,startCode,endCode,indexType,interactingPDBEntries,allPDBEntries,ligand_accession,ligand_name,ligand_num_atoms,uniprot_accession,interation_ratio
0,11,11,GLY,GLY,UNIPROT,"{'pdbId': '3qhr', 'entityId': 1, 'chainIds': '...","3qhr,3qhw,4i3z,4ii5",ADP,ADENOSINE-5'-DIPHOSPHATE,27,P24941,0.75
1,13,13,GLY,GLY,UNIPROT,"{'pdbId': '3qhr', 'entityId': 1, 'chainIds': '...","3qhr,3qhw,4i3z,4ii5",ADP,ADENOSINE-5'-DIPHOSPHATE,27,P24941,0.5
2,15,15,TYR,TYR,UNIPROT,"{'pdbId': '3qhr', 'entityId': 1, 'chainIds': 'A'}","3qhr,3qhw,4i3z,4ii5",ADP,ADENOSINE-5'-DIPHOSPHATE,27,P24941,0.25
3,16,16,GLY,GLY,UNIPROT,"{'pdbId': '3qhr', 'entityId': 1, 'chainIds': '...","3qhr,3qhw,4i3z,4ii5",ADP,ADENOSINE-5'-DIPHOSPHATE,27,P24941,0.5
4,18,18,VAL,VAL,UNIPROT,"{'pdbId': '3qhr', 'entityId': 1, 'chainIds': '...","3qhr,3qhw,4i3z,4ii5",ADP,ADENOSINE-5'-DIPHOSPHATE,27,P24941,1.0
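One way to use this table is to pick out the residues that contact a ligand in most of the entries where it is bound. The sketch below rebuilds the five rows shown above (a subset of the columns only) and assumes that `interation_ratio` is the fraction of entries in `allPDBEntries` in which the residue interacts with the ligand; that reading is inferred from the values, not confirmed here. The column name is spelled as it appears in the data.

```python
import pandas as pd

# The five rows displayed above, restricted to the columns used here.
sites = pd.DataFrame({
    "startIndex": [11, 13, 15, 16, 18],
    "startCode": ["GLY", "GLY", "TYR", "GLY", "VAL"],
    "allPDBEntries": ["3qhr,3qhw,4i3z,4ii5"] * 5,
    "ligand_accession": ["ADP"] * 5,
    "interation_ratio": [0.75, 0.5, 0.25, 0.5, 1.0],
})

# Residues interacting in at least half of the entries containing the ligand.
frequent = sites.query("interation_ratio >= 0.5")
residues = frequent.groupby("ligand_accession")["startIndex"].apply(list)

# Number of PDB entries listed for each row.
n_entries = sites["allPDBEntries"].str.split(",").str.len()

print(residues)
print(n_entries)
```

The 0.5 threshold is arbitrary; adjust it to make the binding-site definition stricter or looser.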
