<h1> Exploring macromolecular assemblies and interaction interfaces with the PDBe PISA API </h1>

<img src="https://www.ebi.ac.uk/pdbe/docs_dev/logos/images/RGB/PDBe-logo-RGB_2013.png" height="150" align="left" padding="20px 20px">

The PDBe team introduced three new API endpoints, providing JSON-formatted responses. These endpoints retrieve data from PISA, focusing on protein interfaces and assemblies. You can find detailed JSON schemas for these endpoints on [Apiary](https://pisalite.docs.apiary.io/#reference/0/pisaqualifierjson/interaction-interface-data-per-pdb-assembly-entry), including comprehensive parameter descriptions for both assemblies and interfaces.

This notebook is a practical guide on how to retrieve data for assemblies and interaction interfaces using the PDBe PISA API.

<br>

**Further readings:**

- Krissinel, E. (2010). Crystal contacts as nature’s Docking Solutions. Journal of Computational Chemistry, 31(1), 133–143. [https://doi.org/10.1002/jcc.21303](https://doi.org/10.1002/jcc.21303)

- Krissinel, E., &amp; Henrick, K. (2007). Inference of Macromolecular Assemblies from Crystalline State. Journal of Molecular Biology, 372(3), 774–797. [https://doi.org/10.1016/j.jmb.2007.05.022](https://doi.org/10.1016/j.jmb.2007.05.022)


  ## Instructions <a name="INSTRUCTIONS"></a>

* Using Google Colab <a name="Google Colab"></a>

1. To execute a code cell, simply click on the cell to select it. You'll notice a play button (▶️) on the left side of the cell. Click the play button or press Shift+Enter to run the code within the selected cell.
2. The code will begin execution, and any resulting output will be displayed beneath the code cell.
3. Proceed to the next code cell and repeat steps 1 and 2 until you have executed all the desired code cells in sequence.
4. The current running step is indicated by a circle with a stop sign beside it. If you wish to halt or interrupt a code cell's execution, you can click the stop button (■) positioned next to the play button.
5. Remember to execute the code cells in the proper order, as their execution might rely on variables or functions established in preceding cells. You can modify the code in a cell, rerun it, and observe updated outcomes.

*Note: Should the notebook runtime be restarted, you'll need to rerun the initial 3 code segments located in the Setup section.*

## Contact us

For inquiries or bug reports, feel free to contact us at pdbekb_help@ebi.ac.uk.

## Protocol 1: Programmatic access to the PDBe PISA API

This protocol describes how to retrieve and analyse PISA data programmatically through its three API endpoints. It is divided into three sections:

1. **Assembly endpoint:** global structural and thermodynamic properties of a complete macromolecular assembly.
2. **Single interface endpoint:** detailed properties of an individual pairwise interface.
3. **All interfaces endpoint:** data for every interface within an assembly, for comparative and aggregate analysis.

The working example is PDB entry 1iru (the bovine 20S proteasome), a large 28-subunit complex.

## &nbsp; Set Up
Import Libraries

**This cell imports all necessary Python libraries for the protocol:**

In [None]:
#@markdown Import Libraries
import requests, sys, json
import pandas as pd
from collections import Counter

pd.set_option('display.max_rows', None)

print("Libraries imported successfully.")

## 1.&nbsp; RETRIEVE DATA FROM AN ASSEMBLY

The specified API endpoint delivers JSON-formatted responses for a requested assembly based on the provided PDB entry (pdbid) and assembly ID (assemblyid). The JSON file includes parameters detailing the structural and thermodynamic properties of assemblies, such as the number of interfaces, macromolecular size, dissociation energy, and accessible surface area. You can access comprehensive parameter descriptions at [Apiary](https://pisalite.docs.apiary.io/#reference/0/pisaqualifierjson/interaction-interface-data-per-pdb-assembly-entry).

The API endpoint pattern is: *https://www.ebi.ac.uk/pdbe/api/pisa/assembly/:pdbid/:assemblyid*
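
The endpoint pattern above can be assembled with a small helper. The following is a minimal sketch: the function name `build_pisa_url` and its `kind` argument are hypothetical conveniences, not part of the PDBe API, and the sketch assumes that the `interface` and `interfaces` endpoints described later in this notebook share the same base path.

```python
# Base path shared by the PISA endpoints described in this notebook
PISA_BASE = "https://www.ebi.ac.uk/pdbe/api/pisa"


def build_pisa_url(kind, pdbid, assemblyid, interfaceid=None):
    """Build a PISA API URL (hypothetical helper, not part of the API).

    kind is one of 'assembly', 'interface' or 'interfaces'.
    """
    if kind not in ("assembly", "interface", "interfaces"):
        raise ValueError(f"Unknown endpoint kind: {kind}")

    # PDB IDs are lower-cased to match the examples used in this notebook
    parts = [PISA_BASE, kind, str(pdbid).lower(), str(assemblyid)]

    # Only the single-interface endpoint takes a third path parameter
    if kind == "interface":
        if interfaceid is None:
            raise ValueError("The 'interface' endpoint needs an interface ID")
        parts.append(str(interfaceid))

    return "/".join(parts)


print(build_pisa_url("assembly", "1iru", 1))
```

Keeping the URL construction in one place avoids repeating the base path in every cell.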


### 1.1.&nbsp; Retrieve all data


For example, if you would like to send a request to get assembly details for entry **pdbid = 1iru** and the preferred assembly is **assemblyid = 1**, the following Python script could help:

In [None]:
#Define the API endpoint for accessing PISA assembly data
API_point = 'https://www.ebi.ac.uk/pdbe/api/pisa/assembly/'

# Define the PDB ID and assembly ID for the query
query_pdbid = '1iru'
query_assembly_id = '1'

# Make a GET request
response = requests.get(f'{API_point}{query_pdbid}/{query_assembly_id}')

# Parse and print the response returned by the API
if response.status_code == 200:
  print(json.dumps(response.json(), indent=3))
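
The cell above prints nothing when the request fails. As a hedged sketch, the request can be wrapped in a helper that reports a non-200 status explicitly. The name `fetch_pisa_json` and its `get` parameter are assumptions introduced here for illustration. `get` defaults to `requests.get` and exists only so the helper can be exercised with a stub instead of a live request.

```python
def fetch_pisa_json(url, get=None, timeout=30):
    """Return the parsed JSON for url, or None if the request did not succeed.

    get is an optional callable with the same call shape as requests.get
    (hypothetical parameter, used to swap in a stub when testing).
    """
    if get is None:
        import requests  # imported here so a stub can replace it in tests
        get = requests.get

    response = get(url, timeout=timeout)

    # Report failures instead of silently skipping the output
    if response.status_code != 200:
        print(f"Request failed with status {response.status_code}: {url}")
        return None

    return response.json()
```

A call such as `fetch_pisa_json('https://www.ebi.ac.uk/pdbe/api/pisa/assembly/1iru/1')` would then return either a dictionary or `None`, which later cells can check before parsing.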


### 1.2.&nbsp;  Parse and Display Key Assembly Metrics

You can extract specific information from this JSON response. For example, if you want to know the number of interfaces (`interface_count`) or the assembly dissociation energy  (`dissociation_energy`), the following script could help:


In [None]:
# Parse the previous JSON response only if the request was successful
if response.status_code == 200:
    response_data = response.json()

    # Get the PDB ID from the keys of the parsed data
    pdb_id = list(response_data.keys())[0]

    # Access the specific assembly data for the given PDB ID
    assembly_data = response_data[pdb_id]['assembly']

    # Extract information from the assembly data
    total_size = assembly_data.get('size', 'N/A')
    macromolecular_size = assembly_data.get('macromolecular_size', 'N/A')
    buried_surface_area = assembly_data.get('buried_surface_area', 'N/A')
    accessible_surface_area = assembly_data.get('accessible_surface_area', 'N/A')
    no_interfaces = assembly_data.get('interface_count', 'N/A')
    dissociation_energy = assembly_data.get('dissociation_energy', 'N/A')
    entropy = assembly_data.get('entropy', 'N/A')

    # Print the formatted assembly information
    print(f"--- Assembly Summary for {pdb_id} (Assembly {query_assembly_id}) ---")
    print(f"  Components: {total_size} total ({macromolecular_size}-mer protein chains)")
    print(f"  Interfaces: {no_interfaces} total")
    print(f"  Accessible Surface Area: {accessible_surface_area} Å²")
    print(f"  Buried Surface Area: {buried_surface_area} Å²")
    print(f"  Dissociation Energy (ΔGdiss): {dissociation_energy} kcal/mol")
    print(f"  Entropy: {entropy} kcal/mol")
else:
    print("Cannot parse data, the request was not successful.")


A positive dissociation energy suggests that the assembly is thermodynamically stable; higher positive values imply greater stability. The reported quantities are:

- **Accessible surface area:** the total solvent-accessible surface area of the complex, in Å².
- **Buried surface area:** the total solvent-accessible surface area buried upon formation of all interfaces of the complex, in Å².
- **Dissociation energy (ΔGdiss):** the free energy of complex dissociation, in kcal/mol, that is, the free energy difference between the dissociated and associated states. Positive values indicate that an external driving force must be applied to dissociate the complex, so complexes with ΔGdiss > 0 are thermodynamically stable.
- **Entropy:** the rigid-body entropy change at dissociation, in kcal/mol. The entropy change corresponds to the lowest free energy way of dissociating the complex into a set of stable complexes or monomeric units.
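
The sign rule for ΔGdiss described above can be expressed as a short function. This is an illustrative sketch only: `interpret_dissociation_energy` is a hypothetical name, the example values are made up, and the labels simply restate the ΔGdiss > 0 criterion rather than any additional PISA logic.

```python
def interpret_dissociation_energy(dg_diss):
    """Label an assembly by the sign of its dissociation free energy (kcal/mol)."""
    # Missing or non-numeric values (e.g. the 'N/A' default) cannot be judged
    if not isinstance(dg_diss, (int, float)):
        return "unknown (no numeric value reported)"
    # Positive values mean an external driving force is needed to dissociate
    if dg_diss > 0:
        return "thermodynamically stable"
    return "not stable by the ΔGdiss > 0 criterion"


# Made-up values, for illustration only
for value in (12.5, -3.1, "N/A"):
    print(value, "->", interpret_dissociation_energy(value))
```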


## 2.&nbsp; SINGLE INTERFACE IN AN ASSEMBLY

The second API endpoint provides a JSON response containing details of a single interface within an assembly, based on a given PDB entry, an assembly ID and an interface ID. Descriptions of the interface parameters can be found at [Apiary](https://pisalite.docs.apiary.io/#reference/0/pisaqualifierjson/interaction-interface-data-per-pdb-assembly-entry).

For a single interface, the data contains a list of contacts or pair interactions and detailed properties such as atom site IDs, UniProt residue indices, residue IDs, chains, sequence IDs, distances, and more.

The API endpoint pattern is: *https://www.ebi.ac.uk/pdbe/api/pisa/interface/:pdbid/:assemblyid/:interfaceid*



### 2.1.&nbsp;  Retrieve single interface data in an assembly

For instance, the assembly summary above reports the number of interfaces (`interface_count`) in entry **pdbid = 1iru** with **assemblyid = 1**. The details of each interface can be accessed with its interface ID, e.g. **interfaceid = 25**:

In [None]:
#Define API point
API_point_single_interface = "https://www.ebi.ac.uk/pdbe/api/pisa/interface/"

# Define the PDB ID, assembly ID and interface ID
query_pdbid = '1iru'
query_assemblyid = '1'
query_interfaceid = '25'

# Make a GET request
response_single_interface = requests.get(f'{API_point_single_interface}{query_pdbid}/{query_assemblyid}/{query_interfaceid}')

#Parse and print the JSON response
# if response_single_interface.status_code == 200:
#   print(json.dumps(response_single_interface.json(), indent=3))

if response_single_interface.status_code == 200:
    interface_data = response_single_interface.json()
    print(f"Interface {query_interfaceid} Summary:")
    print(f"  Interface Area: {interface_data.get('interface_area')} Å²")
    print(f"  Solvation Energy: {interface_data.get('solvation_energy')} kcal/mol")
    print(f"  Stabilization Energy: {interface_data.get('stabilization_energy')} kcal/mol")
    print(f"  Hydrogen Bonds: {interface_data.get('number_hydrogen_bonds')}")
    print(f"  Salt Bridges: {interface_data.get('number_salt_bridges')}")

    # Identify interacting chains
    molecules = interface_data.get('molecules', [])
    chains = [molecule.get('chain_id') for molecule in molecules]
    print(f"  Interacting chains: {chains}")




### 2.2.&nbsp;  Number of interface residues and number of contacts, for a type of contact

With all the interface details available in this JSON response, we can e.g. extract information about the number of interface residues or number of contacts, for a selected type of contact (hydrogen bond, salt bridge, covalent bond, etc.)

In [None]:
# Extract the JSON data from the response for the single interface
interface_data = response_single_interface.json()

# Extract relevant information from the interface data
interface_id = interface_data.get('interface_id')
no_interface_residues = interface_data.get('number_interface_residues')
no_hydrogen_bonds = interface_data.get('number_hydrogen_bonds')
no_salt_bridges = interface_data.get('number_salt_bridges')

# Print a formatted message with the extracted information using f-string
print(f"The interface {interface_id} of preferred assembly {pdb_id} has {no_interface_residues} interface residues with {no_hydrogen_bonds} hydrogen bonds and {no_salt_bridges} salt bridges.")
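
The same counts can also be derived from the contact lists themselves. The sketch below assumes that each bond block (for example `hydrogen_bonds` or `salt_bridges`) stores one entry per contact in `atom_site_1_chains`, a field used later in this notebook. The helper name `count_contacts` and the toy interface are hypothetical.

```python
def count_contacts(interface, bond_types=("hydrogen_bonds", "salt_bridges")):
    """Count contacts per bond type from the parallel lists of an interface dict."""
    counts = {}
    for bond_type in bond_types:
        # A missing or empty block counts as zero contacts
        block = interface.get(bond_type) or {}
        counts[bond_type] = len(block.get("atom_site_1_chains", []))
    return counts


# Toy interface with made-up values, for illustration only
toy_interface = {
    "hydrogen_bonds": {
        "atom_site_1_chains": ["A", "A"],
        "atom_site_2_chains": ["B", "B"],
    },
}
print(count_contacts(toy_interface))
```

On real data, `count_contacts(interface_data)` can be compared with the reported `number_hydrogen_bonds` and `number_salt_bridges` as a consistency check.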


In [None]:
def analyze_interface(interface_data):
    """
    Extracts key quantitative metrics from interface data and returns them
    in a structured dictionary.

    Args:
        interface_data (dict): The raw data dictionary for a single interface.

    Returns:
        dict: A dictionary containing all key metrics.
    """
    analysis = {}

    # --- Quantitative Metrics ---
    # Use .get() with a default value (0.0 or 0) in case the key is missing
    analysis['interface_area'] = interface_data.get('interface_area', 0.0)
    analysis['stabilization_energy'] = interface_data.get('stabilization_energy', 0.0)
    analysis['solvation_energy'] = interface_data.get('solvation_energy', 0.0)
    analysis['number_hydrogen_bonds'] = interface_data.get('number_hydrogen_bonds', 0)
    analysis['number_salt_bridges'] = interface_data.get('number_salt_bridges', 0)
    analysis['number_disulfide_bonds'] = interface_data.get('number_disulfide_bonds', 0)
    analysis['p_value'] = interface_data.get('p_value', None)

    # --- Molecular Information ---
    molecules = interface_data.get('molecules', [])  # Default to empty list
    # Use a list comprehension to get all chain_ids that are not None
    chains = [
        molecule.get('chain_id')
        for molecule in molecules
        if molecule.get('chain_id') is not None
    ]
    analysis['interacting_chains'] = chains

    return analysis



analyze_interface(interface_data)




### 2.3.&nbsp;  Read label sequence and atom Indexes

It is possible e.g. to read the label sequence and atom indexes for each hydrogen bond in the interface.

In [None]:
import pandas as pd

# Get the hydrogen-bond block; default to an empty dict if it is absent
hbonds = interface_data.get("hydrogen_bonds", {})

# Pair up the label sequence IDs and label atom IDs of both atoms in each bond
interface_pair_prop = list(zip(
    hbonds.get("atom_site_1_label_seq_ids", []),
    hbonds.get("atom_site_1_label_atom_ids", []),
    hbonds.get("atom_site_2_label_seq_ids", []),
    hbonds.get("atom_site_2_label_atom_ids", []),
))

# Create a DataFrame from the interface_pair_prop list with specified column names
df = pd.DataFrame(
    interface_pair_prop, columns=[
        "label_sequence_index_a", "label_atom_index_a",
        "label_sequence_index_b", "label_atom_index_b"])

# Display the DataFrame
df
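
The contact blocks are stored as parallel lists, one list per property. A generic way to turn such a block into one record per contact is sketched below. `contacts_to_records` is a hypothetical helper and the toy block holds made-up values; only the field names come from this notebook.

```python
def contacts_to_records(block, fields):
    """Turn a block of parallel lists into a list of per-contact dicts."""
    # A missing field yields an empty column, so zip() then returns no rows
    columns = [block.get(field, []) for field in fields]
    return [dict(zip(fields, row)) for row in zip(*columns)]


# Toy hydrogen-bond block with made-up values, for illustration only
toy_block = {
    "atom_site_1_label_seq_ids": [10, 42],
    "atom_site_1_label_atom_ids": ["N", "OG"],
    "atom_site_2_label_seq_ids": [7, 88],
    "atom_site_2_label_atom_ids": ["O", "OD1"],
}
fields = [
    "atom_site_1_label_seq_ids", "atom_site_1_label_atom_ids",
    "atom_site_2_label_seq_ids", "atom_site_2_label_atom_ids",
]
for record in contacts_to_records(toy_block, fields):
    print(record)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(...)`, with the field names becoming the column names.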


## 3.&nbsp; EVERY INTERFACE IN AN ASSEMBLY

This API endpoint provides a JSON response containing details about all the interfaces within an assembly, based on the given PDB entry and assembly ID. You can find comprehensive descriptions of the interface parameters in the returned data at [Apiary](https://pisalite.docs.apiary.io/#reference/0/pisaqualifierjson/interaction-interface-data-per-pdb-assembly-entry).

The data encompasses both structural and thermodynamic information for the assembly (e.g., number of interfaces, size, dissociation energy, etc.) and interface data. The latter includes specifics for each enumerated interface in the assembly (e.g., interface area, stabilization energy, number of interface residues, etc.) as well as interface interactions (e.g., hydrogen bonds, salt bridges, covalent bonds, etc.).

For each interface, the data features a list of contacts or pair interactions, detailing properties such as atom site IDs, UniProt residue indices, residue IDs, chains, sequence IDs, distances, and more.

The API endpoint pattern is: *https://www.ebi.ac.uk/pdbe/api/pisa/interfaces/:pdbid/:assemblyid*


### 3.1.&nbsp;  Retrieve interfaces in Assemblies

For example, for entry **pdbid = 1iru** and preferred assembly **assemblyid = 1**, the number of interfaces identified can be read from `interface_count`:

In [None]:
# Define variables
API_point_all_interfaces ="https://www.ebi.ac.uk/pdbe/api/pisa/interfaces/"

query_pdbid = '1iru'
query_assemblyid = '1'

# Make a GET request to fetch data about all interfaces for a specific assembly
response_all_interfaces = requests.get(f'{API_point_all_interfaces}{query_pdbid}/{query_assemblyid}')
data = response_all_interfaces.json()

# Extract the number of interfaces from the response data
num_interfaces = data[query_pdbid]['assembly']['interface_count']

# Print the number of interfaces using an f-string
print(f"There are {num_interfaces} interfaces")

The complete JSON can be printed below:

In [None]:
# Print the JSON data
print(json.dumps(data, indent=5))

In [None]:
def compare_interfaces(assembly_data):
    """Compare all interfaces in an assembly"""
    interfaces = assembly_data.get('interfaces', [])
    comparison = []

    for interface in interfaces:
        analysis = analyze_interface(interface)
        analysis['interface_id'] = interface['interface_id']
        comparison.append(analysis)

    return pd.DataFrame(comparison)

# Use the correct part of the JSON
df_comparison = compare_interfaces(data[query_pdbid]['assembly'])
df_comparison
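
For a large complex, the comparison table can be long, so ranking the interfaces by one metric is a natural next step. The following sketch works on plain dictionaries. `rank_interfaces` is a hypothetical helper and the toy values are made up.

```python
def rank_interfaces(interfaces, key="interface_area", top=5):
    """Return (interface_id, value) pairs sorted by key, largest first."""
    scored = [
        (interface.get("interface_id"), interface.get(key))
        for interface in interfaces
        if interface.get(key) is not None  # skip interfaces lacking the metric
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Toy interfaces with made-up values, for illustration only
toy_interfaces = [
    {"interface_id": "1", "interface_area": 850.2},
    {"interface_id": "2", "interface_area": 1920.7},
    {"interface_id": "3"},
    {"interface_id": "4", "interface_area": 1103.0},
]
print(rank_interfaces(toy_interfaces, top=2))
```

With pandas, the equivalent on the table above would be `df_comparison.sort_values("interface_area", ascending=False)`.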

### 3.2.&nbsp;  Number of hydrogen bonds for each interface

With comprehensive interface details available in the JSON, it is possible to extract information such as the number of hydrogen bonds for each interface:

In [None]:
# Extract the JSON data from the response and assign it to 'assembly_data_all'
assembly_data_all = response_all_interfaces.json()

# Extracting data from JSON
pdbid = list(assembly_data_all.keys())[0] # Get the PDB ID (keys are PDB IDs) from the JSON data
assembly_data_all = assembly_data_all[pdbid] # Extract the assembly data for the specified PDB ID
assembly_data_all = assembly_data_all['assembly'] # Extract the 'assembly' section from the assembly data
interfaces = assembly_data_all['interfaces'] # Extract the 'interfaces' section from the assembly data

# Loop through each interface in the 'interfaces' list
for interface in interfaces:
    # Get the 'interface_id' and 'number_hydrogen_bonds' from the interface data
    interface_id = interface.get('interface_id')
    no_hydrogen_bonds = interface.get('number_hydrogen_bonds')

    # Print the information about the interface and its number of hydrogen bonds
    print(f"The number of hydrogen bonds in interface {interface_id} is {no_hydrogen_bonds}")


It is also possible to read the UniProt sequence indexes for each hydrogen bond in the interface.

In [None]:
import pandas as pd

# Initialize an empty list to store interface pairs information
interface_pairs = []

# Loop through each interface in the 'interfaces' list
for interface in interfaces:
    # Get the 'interface_id' for the current interface
    interface_id = interface.get('interface_id')

    # Get the hydrogen-bond block; skip interfaces that have none
    prop = interface.get("hydrogen_bonds")
    if not prop:
        continue

    # Extract UniProt residue numbers for both atoms in each hydrogen bond
    unp_nums_atom_1 = prop["atom_site_1_unp_nums"]
    unp_nums_atom_2 = prop["atom_site_2_unp_nums"]

    # Append one row per hydrogen bond to the interface_pairs list
    for (item1, item2) in zip(unp_nums_atom_1, unp_nums_atom_2):
        interface_pairs.append([interface_id, item1, item2])

# Create a DataFrame from the interface_pairs list with specified column names
df = pd.DataFrame(
    interface_pairs, columns=[
        "interface_id", "uniprot_residue_index_a", "uniprot_residue_index_b"])

# Display the DataFrame
df

In [None]:
from collections import Counter

# 1. Initialize lists to store all interacting residues
all_hbond_residues = []
all_salt_bridge_residues = []

# 2. Loop through each interface
for interface in interfaces:
    interface_id = interface.get('interface_id')

    # 3. Get residues from HYDROGEN BONDS
    if "hydrogen_bonds" in interface:
        prop = interface["hydrogen_bonds"]
        # Get chain and UNP number for a more unique ID
        chains_1 = prop["atom_site_1_chains"]
        unp_nums_1 = prop["atom_site_1_unp_nums"]
        chains_2 = prop["atom_site_2_chains"]
        unp_nums_2 = prop["atom_site_2_unp_nums"]

        # Add as tuples (Chain, ResNum)
        all_hbond_residues.extend([(f"Chain {c}", n) for c, n in zip(chains_1, unp_nums_1)])
        all_hbond_residues.extend([(f"Chain {c}", n) for c, n in zip(chains_2, unp_nums_2)])

    # 4. Get residues from SALT BRIDGES
    if "salt_bridges" in interface:
        prop = interface["salt_bridges"]
        chains_1 = prop["atom_site_1_chains"]
        unp_nums_1 = prop["atom_site_1_unp_nums"]
        chains_2 = prop["atom_site_2_chains"]
        unp_nums_2 = prop["atom_site_2_unp_nums"]

        all_salt_bridge_residues.extend([(f"Chain {c}", n) for c, n in zip(chains_1, unp_nums_1)])
        all_salt_bridge_residues.extend([(f"Chain {c}", n) for c, n in zip(chains_2, unp_nums_2)])

# 5. Count the frequency of each residue
hbond_counts = Counter(all_hbond_residues)
salt_bridge_counts = Counter(all_salt_bridge_residues)

# 6. Convert to DataFrames for nice printing
df_hbond_hotspots = pd.DataFrame(hbond_counts.most_common(), columns=["Residue (Chain, UNP Num)", "H-Bond Count"])
df_salt_bridge_hotspots = pd.DataFrame(salt_bridge_counts.most_common(), columns=["Residue (Chain, UNP Num)", "Salt Bridge Count"])

In [None]:
print("--- Hydrogen Bond Hotspots ---")
df_hbond_hotspots.head(10)


In [None]:
print("\n--- Salt Bridge Hotspots ---")
df_salt_bridge_hotspots.head(10)
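
The hotspot counting above repeats the same steps for hydrogen bonds and salt bridges. One possible refactoring, sketched here under the assumption that every bond block uses the `atom_site_{1,2}_chains` and `atom_site_{1,2}_unp_nums` fields shown above, is a single function parameterised by bond type. `count_contact_residues` is a hypothetical name and the toy data are made up.

```python
from collections import Counter


def count_contact_residues(interfaces, bond_type):
    """Count how often each (chain, UniProt number) pair occurs in bond_type contacts."""
    counts = Counter()
    for interface in interfaces:
        # A missing or empty block contributes nothing
        block = interface.get(bond_type) or {}
        # Both partners of every contact are counted
        for side in ("1", "2"):
            chains = block.get(f"atom_site_{side}_chains", [])
            unp_nums = block.get(f"atom_site_{side}_unp_nums", [])
            counts.update(zip(chains, unp_nums))
    return counts


# Toy interfaces with made-up values, for illustration only
toy_interfaces = [
    {"hydrogen_bonds": {
        "atom_site_1_chains": ["A", "A"], "atom_site_1_unp_nums": [10, 12],
        "atom_site_2_chains": ["B", "B"], "atom_site_2_unp_nums": [55, 55]}},
    {"hydrogen_bonds": {
        "atom_site_1_chains": ["A"], "atom_site_1_unp_nums": [10],
        "atom_site_2_chains": ["C"], "atom_site_2_unp_nums": [7]}},
]
hbond_counts = count_contact_residues(toy_interfaces, "hydrogen_bonds")
print(hbond_counts.most_common(2))
```

Calling the function once with `"hydrogen_bonds"` and once with `"salt_bridges"` on the real `interfaces` list would produce counters analogous to the two built above.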

# Bugs

If you encounter any bugs, please report the issue to pdbekb_help@ebi.ac.uk.
