# PDBe API Training

### PDBe Macromolecular Interactions for a given protein

This tutorial will guide you through searching PDBe for macromolecular interactions programmatically.

### Setup

First, we import the code required to query the API and process the results.

Run the cell below by pressing the play button.

In [None]:
import pandas as pd
from pprint import pprint
import sys
sys.path.insert(0,'..')
from tutorial_utilities.api_modules import explode_dataset, get_macromolecule_interaction_data

### Obtaining data

Now we are ready to find all the macromolecular interactions that a protein in the PDB archive forms.

We will retrieve the macromolecular interactions of human acetylcholinesterase, which has the UniProt accession P22303.

In [None]:
uniprot_accession = "P22303"
interaction_data = get_macromolecule_interaction_data(uniprot_accession=uniprot_accession)

In [None]:
pprint(interaction_data[0])

### Reformatting the data

The query results contain all the information about the macromolecular interactions. However, they are returned as a complex nested list, which is difficult to parse without reformatting.

The following code simplifies the data, flattening the nested format:

In [None]:
# Reformat data to make it a list of the macromolecular interactions found in the PDB archive for the protein
df_exploded = explode_dataset(result=interaction_data, column_to_explode='interactingPDBEntries')

In [None]:
print(df_exploded)
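
To illustrate what "flattening" means here, the sketch below applies pandas `DataFrame.explode` to a small made-up record set. The field layout and values in `toy_result` are invented for illustration, and the real `explode_dataset` helper may be implemented differently.

```python
import pandas as pd

# Hypothetical records, loosely shaped like nested API output (not real data).
toy_result = [
    {
        "interaction_accession": "PARTNER_A",
        "startIndex": 10,
        "interactingPDBEntries": [
            {"pdbId": "1abc", "entityId": 1},
            {"pdbId": "2xyz", "entityId": 2},
        ],
    },
    {
        "interaction_accession": "PARTNER_B",
        "startIndex": 42,
        "interactingPDBEntries": [
            {"pdbId": "1abc", "entityId": 1},
        ],
    },
]

toy_df = pd.DataFrame(toy_result)

# explode() gives every element of the list column its own row and
# repeats the values of the other columns alongside it.
toy_exploded = toy_df.explode("interactingPDBEntries").reset_index(drop=True)
print(toy_exploded)
```

Resetting the index after exploding gives every row a unique label, which matters for any later index-based join.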

### Exploring the data

The following code lists all the unique macromolecules (as UniProt accessions) that interact with human acetylcholinesterase in the PDB archive:

**--The following filtering fulfils Project Aim 1B--**

In [None]:
df_exploded['interaction_accession'].unique()

Some post-processing is required to split interactingPDBEntries into separate columns. Here we convert the interactingPDBEntries column from semi-structured JSON into a flat table:

In [None]:
data = pd.json_normalize(df_exploded['interactingPDBEntries'])
df_interactions = df_exploded.join(data).drop(columns='interactingPDBEntries')

In [None]:
df_interactions
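
The same normalise-and-join step can be tried on a toy frame to see what it does. This is a minimal sketch with made-up values. It assumes the frame has a default 0..n-1 index, because `join` aligns rows by index label and the frame returned by `json_normalize` here is numbered from 0.

```python
import pandas as pd

# Hypothetical exploded frame: one dict per row in the nested column.
toy_exploded = pd.DataFrame(
    {
        "startIndex": [10, 10, 42],
        "interactingPDBEntries": [
            {"pdbId": "1abc", "entityId": 1},
            {"pdbId": "2xyz", "entityId": 2},
            {"pdbId": "1abc", "entityId": 1},
        ],
    }
)

# Turn the list of dicts into a table with one column per key.
expanded = pd.json_normalize(toy_exploded["interactingPDBEntries"].tolist())

# Attach the new columns row by row, then drop the original nested column.
toy_flat = toy_exploded.join(expanded).drop(columns="interactingPDBEntries")
print(toy_flat)
```

If the index of the exploded frame contained repeated labels, the join could pair rows incorrectly, so it is worth checking the index before this step.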

startIndex and endIndex are UniProt residue numbers, so we create a new column called residue_number and copy startIndex into it. We will also want to count the number of results, so we create a dummy count column that holds the PDB ID of each result.

In [None]:
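# Use the start of each residue range as the residue number
df_interactions['residue_number'] = df_interactions['startIndex']
# Dummy column holding the PDB ID, to be counted later
df_interactions['count'] = df_interactions['pdbId']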

Now we are ready to use the data.

In [None]:
df_interactions.head()
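
One way the dummy count column might be used is to count how many distinct PDB entries report an interaction at each residue. The sketch below does this with `groupby` on made-up values. The frame `toy_interactions` and its contents are hypothetical, and this is only one possible summary.

```python
import pandas as pd

# Hypothetical flattened table: residue number plus the PDB ID to be counted.
toy_interactions = pd.DataFrame(
    {
        "residue_number": [10, 10, 10, 42],
        "count": ["1abc", "2xyz", "1abc", "1abc"],
    }
)

# Count distinct PDB IDs per residue; nunique() ignores repeated IDs.
entries_per_residue = (
    toy_interactions.groupby("residue_number")["count"].nunique().reset_index()
)
print(entries_per_residue)
```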

Now you can carry out an analysis similar to the one in 2_ligand_interactions.ipynb. Investigate whether the drug-binding and macromolecular binding sites overlap.
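
A simple starting point for the overlap check is a set intersection of residue numbers. The sketch below uses made-up residue numbers, not real acetylcholinesterase data. The names `ligand_residues` and `macromolecule_residues` are hypothetical and stand in for the residue_number values from the two notebooks.

```python
# Hypothetical residue numbers from each analysis (illustrative values only).
ligand_residues = {72, 74, 86, 121}
macromolecule_residues = {86, 121, 300}

# Residues that appear in both binding sites.
shared_residues = sorted(ligand_residues & macromolecule_residues)

# Fraction of macromolecule-contacting residues that also contact a ligand.
overlap_fraction = len(shared_residues) / len(macromolecule_residues)

print(shared_residues)
print(round(overlap_fraction, 2))
```

With real data, the two sets could be built with `set(df['residue_number'])` from each table, assuming both use UniProt residue numbering.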