# PDBe API Training

### PDBe Interactions

This tutorial shows how to search PDBe programmatically and retrieve the ligands that interact with a protein.


First, import the helper functions that do the work. Run the cell below by pressing the green play button.

In [3]:
import sys
sys.path.insert(0,'..') # add the parent directory to the import path so python_modules can be found
from python_modules.api_modules import run_sequence_search, explode_dataset, get_ligand_site_data


Now we are ready to run the sequence search from the previous module.

We will search with an example sequence from UniProt P08659, firefly luciferase.

In [5]:
sequence_to_search = """
MEDAKNIKKGPAPFYPLEDGTAGEQLHKAMKRYALVPGTIAFTDAHIEVNITYAEYFEMS
VRLAEAMKRYGLNTNHRIVVCSENSLQFFMPVLGALFIGVAVAPANDIYNERELLNSMNI
SQPTVVFVSKKGLQKILNVQKKLPIIQKIIIMDSKTDYQGFQSMYTFVTSHLPPGFNEYD
FVPESFDRDKTIALIMNSSGSTGLPKGVALPHRTACVRFSHARDPIFGNQIIPDTAILSV
VPFHHGFGMFTTLGYLICGFRVVLMYRFEEELFLRSLQDYKIQSALLVPTLFSFFAKSTL
IDKYDLSNLHEIASGGAPLSKEVGEAVAKRFHLPGIRQGYGLTETTSAILITPEGDDKPG
AVGKVVPFFEAKVVDLDTGKTLGVNQRGELCVRGPMIMSGYVNNPEATNALIDKDGWLHS
GDIAYWDEDEHFFIVDRLKSLIKYKGYQVAPAELESILLQHPNIFDAGVAGLPDDDAGEL
PAAVVVLEHGKTMTEKEIVDYVASQVTTAKKLRGGVVFVDEVPKGLTGKLDARKIREILI
KAKKGGKSKL
"""
filter_list = ['pfam_accession', 'pdb_id', 'molecule_name', 'ec_number',
               'uniprot_accession_best', 'tax_id']

search_results = run_sequence_search(sequence_to_search,
                                     filter_terms=filter_list,
                                     number_of_rows=1000
                                     )

Number of results 222
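
The sequence above is a triple-quoted string, so it carries leading, trailing and internal line breaks. Whether `run_sequence_search` strips these itself is not shown in this module. The hypothetical helper `clean_sequence` below is a minimal sketch of normalizing such a string into a plain one-letter sequence and checking its characters, assuming the 20 standard residues plus the codes B, Z, X, U and O are the acceptable alphabet.

```python
# Hypothetical helper: not part of python_modules.api_modules.
# Assumption: the search expects one-letter amino-acid codes; the accepted
# alphabet below (20 standard residues + B, Z, X, U, O) is our own choice.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY") | set("BZXUO")


def clean_sequence(raw: str) -> str:
    """Remove all whitespace, upper-case the result and validate the letters."""
    seq = "".join(raw.split()).upper()
    invalid = sorted(set(seq) - VALID_RESIDUES)
    if invalid:
        raise ValueError(f"Unexpected characters in sequence: {invalid}")
    return seq


example = """
MEDAKNIKKG
PAPFYPLEDG
"""
print(clean_sequence(example))  # → MEDAKNIKKGPAPFYPLEDG
```

In the notebook, `clean_sequence(sequence_to_search)` would return the whole sequence as a single string with no line breaks.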


In [8]:
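df = explode_dataset(search_results)        # convert the search results into a pandas DataFrame
df = df.query('percentage_identity > 80')   # keep close matches only
# count rows per UniProt accession, most frequent first
group_by_uniprot = df.groupby('uniprot_accession_best').count().sort_values('pdb_id', ascending=False)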
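
The cell above keeps hits with more than 80% sequence identity, counts rows per UniProt accession and sorts so the accession with the most rows comes first. A minimal sketch of the same pandas steps on a small DataFrame; the values are invented for illustration and only the column names come from the search results.

```python
import pandas as pd

# Invented hits; column names mirror the search results, values do not.
hits = pd.DataFrame({
    "uniprot_accession_best": ["P1", "P1", "P2", "P3", "P3", "P3"],
    "pdb_id": ["1abc", "1abc", "2def", "3ghi", "4jkl", "5mno"],
    "percentage_identity": [100.0, 100.0, 45.0, 92.5, 92.5, 81.0],
})

# Keep only close homologues (strictly more than 80% identity); P2 is dropped.
close = hits.query("percentage_identity > 80")

# Count rows per accession and put the accession with the most rows first.
per_accession = (
    close.groupby("uniprot_accession_best")
    .count()
    .sort_values("pdb_id", ascending=False)
)
print(per_accession)
```

Here P3 (three rows) is listed before P1 (two rows), and P2 disappears because its only hit is below the identity cut-off.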

How many UniProt accessions remain after filtering?

In [9]:
len(group_by_uniprot)

2

In [10]:
group_by_uniprot

| uniprot_accession_best | chain_id | ec_number | entity_id | entry_entity | molecule_name | pdb_id | pfam_accession | tax_id | e_value | percentage_identity | result_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P08659 | 48 | 48 | 48 | 48 | 48 | 48 | 46 | 48 | 48 | 48 | 0 |
| Q5UFR2 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 0 |
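
`count()` reports the number of non-null values in each column, not the number of distinct PDB entries. That is why `result_sequence` shows 0 (the column is empty) and `pfam_accession` shows 46 rather than 48 for P08659. If the exploded table holds more than one row per PDB entry (for example one per chain, which is an assumption here), `nunique()` gives the number of distinct entries instead. A minimal sketch on invented data:

```python
import pandas as pd

# Invented rows: entry 1abc appears twice (e.g. two chains), one pfam value is missing.
hits = pd.DataFrame({
    "uniprot_accession_best": ["P1", "P1", "P1", "P2"],
    "pdb_id": ["1abc", "1abc", "2xyz", "3def"],
    "pfam_accession": ["pf_a", None, "pf_a", "pf_a"],
})

grouped = hits.groupby("uniprot_accession_best")

summary = pd.DataFrame({
    "rows": grouped["pdb_id"].count(),                # non-null rows per accession
    "distinct_pdb_ids": grouped["pdb_id"].nunique(),  # distinct PDB IDs per accession
    "pfam_non_null": grouped["pfam_accession"].count(),  # None is not counted
}).sort_values("distinct_pdb_ids", ascending=False)
print(summary)
```

For P1 this gives 3 rows but only 2 distinct PDB IDs, and 2 non-null Pfam values.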


Get the UniProt accession of the first hit.

In [11]:
uniprot_accession = df['uniprot_accession_best'].iloc[0]

uniprot_accession

'P08659'
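
`df['uniprot_accession_best'].iloc[0]` takes the accession of whichever row is first in the filtered table. Here that is also the accession with the most rows in `group_by_uniprot`, but the two need not agree. A minimal sketch, on invented data, of picking the most frequent accession explicitly:

```python
import pandas as pd

# Invented hits: the first row is P2, but P1 has more rows.
hits = pd.DataFrame({
    "uniprot_accession_best": ["P2", "P1", "P1", "P1"],
    "pdb_id": ["9zzz", "1abc", "2abc", "3abc"],
})

first_row_accession = hits["uniprot_accession_best"].iloc[0]
# value_counts() counts each accession; idxmax() returns the one with the highest count.
most_common_accession = hits["uniprot_accession_best"].value_counts().idxmax()

print(first_row_accession, most_common_accession)  # → P2 P1
```

In the notebook, `group_by_uniprot.index[0]` would also give the most frequent accession, since that table is already sorted by count.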

Get the compounds (ligands) that interact with this UniProt accession.

In [12]:
ligand_data = get_ligand_site_data(uniprot_accession=uniprot_accession)
df2 = explode_dataset(ligand_data)

https://www.ebi.ac.uk/pdbe/graph-api/uniprot/ligand_sites/P08659
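
The printed URL shows that `get_ligand_site_data` queries the PDBe graph API endpoint `uniprot/ligand_sites/{accession}`. A minimal sketch of calling that endpoint directly with the Python standard library; it is an assumption that the helper does little more than this, and the sketch only decodes the JSON without relying on its layout. The functions `ligand_sites_url` and `fetch_ligand_sites` are hypothetical names.

```python
import json
import urllib.request

# Base URL taken from the address printed by get_ligand_site_data above.
LIGAND_SITES_BASE = "https://www.ebi.ac.uk/pdbe/graph-api/uniprot/ligand_sites/"


def ligand_sites_url(uniprot_accession: str) -> str:
    """Build the ligand-sites URL for one UniProt accession."""
    return LIGAND_SITES_BASE + uniprot_accession.strip()


def fetch_ligand_sites(uniprot_accession: str, timeout: float = 30.0) -> dict:
    """Download and decode the JSON response (needs network access).

    urlopen raises urllib.error.HTTPError for error status codes.
    The response is assumed to be a JSON object.
    """
    with urllib.request.urlopen(ligand_sites_url(uniprot_accession), timeout=timeout) as response:
        return json.load(response)


print(ligand_sites_url("P08659"))
# data = fetch_ligand_sites("P08659")
# print(list(data))  # inspect the top-level keys before relying on any structure
```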


In [13]:
df2.head()


| | startIndex | endIndex | startCode | endCode | indexType | interactingPDBEntries | allPDBEntries | ligand_accession | ligand_name | ligand_num_atoms | uniprot_accession | interation_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 316 | 316 | GLY | GLY | UNIPROT | {'pdbId': '5gyz', 'entityId': 1, 'chainIds': 'A'} | 5gyz | AMP | ADENOSINE MONOPHOSPHATE | 23 | P08659 | 1.0 |
| 1 | 317 | 317 | ALA | ALA | UNIPROT | {'pdbId': '5gyz', 'entityId': 1, 'chainIds': 'A'} | 5gyz | AMP | ADENOSINE MONOPHOSPHATE | 23 | P08659 | 1.0 |
| 2 | 318 | 318 | PRO | PRO | UNIPROT | {'pdbId': '5gyz', 'entityId': 1, 'chainIds': 'A'} | 5gyz | AMP | ADENOSINE MONOPHOSPHATE | 23 | P08659 | 1.0 |
| 3 | 339 | 339 | GLY | GLY | UNIPROT | {'pdbId': '5gyz', 'entityId': 1, 'chainIds': 'A'} | 5gyz | AMP | ADENOSINE MONOPHOSPHATE | 23 | P08659 | 1.0 |
| 4 | 340 | 340 | TYR | TYR | UNIPROT | {'pdbId': '5gyz', 'entityId': 1, 'chainIds': 'A'} | 5gyz | AMP | ADENOSINE MONOPHOSPHATE | 23 | P08659 | 1.0 |
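
Each row of `df2` appears to describe one UniProt residue range (`startIndex` to `endIndex`) in contact with one ligand. A natural next step is to summarize which ligands are seen and how many residues each contacts. A minimal sketch using the five rows shown above plus two invented rows for a hypothetical ligand code `LIG`, assuming `endIndex` is inclusive (as the rows with `startIndex == endIndex` suggest):

```python
import pandas as pd

# Five rows copied from df2.head() above, plus two invented rows for a
# hypothetical ligand "LIG" so the grouping has something to compare.
sites = pd.DataFrame({
    "startIndex": [316, 317, 318, 339, 340, 200, 201],
    "endIndex":   [316, 317, 318, 339, 340, 200, 202],
    "ligand_accession": ["AMP"] * 5 + ["LIG"] * 2,
    "ligand_name": ["ADENOSINE MONOPHOSPHATE"] * 5 + ["HYPOTHETICAL LIGAND"] * 2,
})

# Residues covered by each row, assuming the range is inclusive at both ends.
sites["n_residues"] = sites["endIndex"] - sites["startIndex"] + 1

summary = (
    sites.groupby(["ligand_accession", "ligand_name"])
    .agg(site_rows=("startIndex", "count"), residues=("n_residues", "sum"))
    .sort_values("residues", ascending=False)
    .reset_index()
)
print(summary)
```

The same `groupby` should run unchanged on the full `df2`, since these column names appear in its output.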
