<a href="https://colab.research.google.com/github/paulynamagana/3D-Beacons/blob/main/Harnessing_3DBeaconsAPI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Accessing data through the 3D-Beacons API

<img src="https://raw.githubusercontent.com/3D-Beacons/3D-Beacons/main/assets/3D-Beacons-logo.png" height="100" align="right">

## Introduction

Welcome to this Google Colab notebook, an essential companion to our paper on accessing data through the 3D-Beacons API.

This notebook is a practical resource for researchers, developers, and data enthusiasts who want to explore the 3D-Beacons API. It walks through querying the API for the structures available for a UniProt accession, filtering the results, and downloading model files.
Links to the full paper and to the API documentation are provided below.

Paper link: Accessing Data through 3D-Beacons API - Full Paper (link to be added)

Documentation Link: [3D-Beacons API Documentation](https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/#/default/get_uniprot_summary_uniprot_summary__qualifier__json_get)

<br>

**Further readings:**

**3D-Beacons: Decreasing the gap between protein sequences and structures through a federated network of protein structure data resources**

Mihaly Varadi, Sreenath Nair, Ian Sillitoe, Gerardo Tauriello, Stephen Anyango, Stefan Bienert, Clemente Borges, Mandar Deshpande, Tim Green, Demis Hassabis, Andras Hatos, Tamas Hegedus, Maarten L Hekkelman, Robbie Joosten, John Jumper, Agata Laydon, Dmitry Molodenskiy, Damiano Piovesan, Edoardo Salladini, Steven L. Salzberg, Markus J Sommer, Martin Steinegger, Erzsebet Suhajda, Dmitri Svergun, Luiggi Tenorio-Ku, Silvio Tosatto, Kathryn Tunyasuvunakool, Andrew Mark Waterhouse, Augustin Žídek, Torsten Schwede, Christine Orengo, Sameer Velankar
3 August 2022; BioRxiv https://doi.org/10.1101/2022.08.01.501973


## Instructions <a name="Instructions"></a>

* Quick Start <a name="Quick Start"></a>

In order to make the learning experience more accessible and interactive, we have incorporated widgets that allow you to provide inputs and customize certain aspects of the code. These widgets serve as interactive elements that enhance your ability to interact with the code and observe the impact of different inputs on the results. Throughout this tutorial, you will come across code chunks that include these interactive widgets.

  * How to use Google Colab <a name="Google Colab"></a>
1. To run a code cell, click on the cell to select it. You will notice a play button (▶️) on the left side of the cell. Click on the play button or press Shift+Enter to run the code in the selected cell.
2. The code will start executing, and you will see the output, if any, displayed below the code cell.
3. Move to the next code cell and repeat steps 1 and 2 until you have executed all the desired code cells in sequence.
4. The currently running step is indicated by a circle with a stop sign next to it.
If you need to stop or interrupt the execution of a code cell, you can click on the stop button (■) located next to the play button.

*Remember to run the code cells in the correct order, as their execution might depend on variables or functions defined in previous cells. You can modify the code in a code cell and re-run it to see updated results.*

## Contact us

Requests for clarifications or reporting bugs can be made via pdbekb_help@ebi.ac.uk.




## &nbsp; Set up

* The code below installs and imports all necessary packages.
* It also defines helper functions for searching the 3D-Beacons Network and downloading files.



In [None]:
#@title Install dependencies
!pip install ijson gwpy &> /dev/null

* When you run the next cell, you may be prompted to grant access to your Google Drive. Access is needed so that Colab can save the downloaded files to your Drive.
<br>
<br>
Please follow the on-screen instructions to provide the necessary permissions, as it enables seamless integration between Colab and your Drive. Rest assured that your data and files are secure and will not be accessed without your explicit permission.
<br>

In [None]:
#@title Initialise
# Imports used by the search and download helpers
import os
from urllib.request import urlopen

import ijson
import ipywidgets as wgt
from google.colab import drive  # only available inside Google Colab

# Mounting the Google Drive to access files and directories
drive.mount('/content/drive')
destination_path = '/content/drive/MyDrive/3D_Beacons_files'

# Check whether the specified path exists or not
isExist = os.path.exists(destination_path)
if not isExist:

   # Create a new directory because it does not exist
   os.makedirs(destination_path)
   print("The new directory is created!")

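# Helper functions for search and download
def Search3DBeacons(ID):
  """Return the list of structure records for a UniProt accession."""
  WEBSITE_API = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/"

  # Stream the JSON response and collect every item of the "structures" array
  r = ijson.parse(urlopen(f"{WEBSITE_API}{ID}.json"))
  structures = list(ijson.items(r, "structures.item", use_float=True))
  return structures

def download_file(url):
  """Download the file at `url` into the Google Drive folder."""
  # Note: this changes the working directory to destination_path
  os.chdir(destination_path)
  # IPython shell escape; works only inside a notebook
  !wget "$url"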

#Defining UniProt ID
Uniprot_ID = wgt.Textarea(
    value= "",
    placeholder='Uniprot ID',
    description='Enter an Uniprot ID:',
    style={'description_width': 'initial'},
    disabled=False
)

#Defining all model type
model_type = wgt.Textarea(
    value="",
    placeholder='Enter model category',
    description='Enter a model type:',
    style={'description_width': 'initial'},
    disabled=False
)

#create a widget for provider
provider_value = wgt.Textarea(
    value="",
    placeholder='Enter  provider',
    description='Enter a provider:',
    style={'description_width': 'initial'},
    disabled=False
)

#create a widget for resolution
resolution_filter = wgt.FloatSlider(
    value=2.0,
    min=0,
    max=5.0,
    step=0.1,
    description='Resolution:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='.1f',
)

#create a widget to download specific model
model_download = wgt.Textarea(
    value="",
    placeholder='Enter model identifier',
    description='Model identifier:',
    style={'description_width': 'initial'},
    disabled=False
)


#endpoint
WEBSITE_API = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
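
The existence check in the cell above can be collapsed into a single call, since `os.makedirs` accepts `exist_ok=True`. The sketch below is only an illustration: `ensure_output_dir` is a hypothetical helper, and it runs against a temporary directory so that it works outside Colab. In the notebook the argument would be `destination_path`.

```python
import os
import tempfile

def ensure_output_dir(path):
    """Create `path` if it does not exist yet and return it (hypothetical helper)."""
    os.makedirs(path, exist_ok=True)  # no error if the directory is already there
    return path

# Demonstrated on a temporary location so the example runs anywhere.
demo_dir = ensure_output_dir(os.path.join(tempfile.gettempdir(), "3D_Beacons_files_demo"))
print(os.path.isdir(demo_dir))
```

Calling the helper a second time on the same path is harmless, which makes the cell safe to re-run.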


## 1.&nbsp; Basic search

### 1.1.&nbsp;  Get all macromolecular structures for a single entry

The following block retrieves all structures available in 3D-Beacons for a single UniProt accession.

In [None]:
#widget Uniprot ID
display(Uniprot_ID) #display the widget

Textarea(value='', description='Enter an Uniprot ID:', placeholder='Uniprot ID', style=DescriptionStyle(descri…

In [None]:
# Query the API for the accession entered in the widget
structures = Search3DBeacons(Uniprot_ID.value)

# Print every available structure, across all providers
for structure in structures:
    print(structure)

{'summary': {'model_identifier': '3d06', 'model_category': 'EXPERIMENTALLY DETERMINED', 'model_url': 'https://www.ebi.ac.uk/pdbe/static/entry/3d06_updated.cif', 'model_format': 'MMCIF', 'model_type': None, 'model_page_url': 'https://www.ebi.ac.uk/pdbe/entry/pdb/3d06', 'provider': 'PDBe', 'number_of_conformers': None, 'ensemble_sample_url': None, 'ensemble_sample_format': None, 'created': '2008-05-01', 'sequence_identity': Decimal('99.0'), 'uniprot_start': 94, 'uniprot_end': 293, 'coverage': Decimal('0.509'), 'experimental_method': 'X-RAY DIFFRACTION', 'resolution': Decimal('1.2'), 'confidence_type': None, 'confidence_version': None, 'confidence_avg_local_score': None, 'oligomeric_state': None, 'preferred_assembly_id': None, 'entities': [{'entity_type': 'POLYMER', 'entity_poly_type': 'POLYPEPTIDE(L)', 'identifier': 'P04637', 'identifier_category': 'UNIPROT', 'description': 'Cellular tumor antigen p53', 'chain_ids': ['A']}, {'entity_type': 'NON-POLYMER', 'entity_poly_type': None, 'identi
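
The raw dictionaries printed above are long, so a compact view of a few summary fields can be easier to scan. The following is a minimal sketch, assuming each record has the `summary` layout seen in the output. `summary_row` is a hypothetical helper, and the two sample records are abridged from the P04637 results shown in this notebook.

```python
def summary_row(structure):
    """Return the key summary fields of one 3D-Beacons structure record."""
    summary = structure.get("summary", {})
    fields = ("model_identifier", "provider", "model_category", "coverage", "resolution")
    return tuple(summary.get(field) for field in fields)

# Abridged records; only the fields used here are kept.
sample_structures = [
    {"summary": {"model_identifier": "3d06", "provider": "PDBe",
                 "model_category": "EXPERIMENTALLY DETERMINED",
                 "coverage": 0.509, "resolution": 1.2}},
    {"summary": {"model_identifier": "7xzz", "provider": "PDBe",
                 "model_category": "EXPERIMENTALLY DETERMINED",
                 "coverage": 1.0, "resolution": 4.07}},
]

rows = [summary_row(s) for s in sample_structures]
for row in rows:
    print(*row, sep="\t")  # one tab-separated line per model
```

In the notebook, `sample_structures` would be replaced by the `structures` list returned by `Search3DBeacons`.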

### 1.2.&nbsp; Applying a single filter


#### 1.2.1.&nbsp; Filter by Model type
Results can be filtered according to the model category in the 3D-Beacons Network.
Models are classified as:

* EXPERIMENTALLY DETERMINED
* CONFORMATIONAL ENSEMBLE
* TEMPLATE-BASED
* AB-INITIO

The code below filters the previous search results by model category. Enter the category exactly as listed above, because the comparison is case-sensitive.

In [None]:
#Run this code to create a widget for input
display(model_type) #display widget

Textarea(value='', description='Enter a model type:', placeholder='Enter model category', style=DescriptionSty…

In [None]:
# Keep only the structures whose model category matches the widget input
for structure in structures:
    model_category = structure.get('summary', {}).get('model_category')
    if model_category == model_type.value:
        print(structure)

{'summary': {'model_identifier': '3d06', 'model_category': 'EXPERIMENTALLY DETERMINED', 'model_url': 'https://www.ebi.ac.uk/pdbe/static/entry/3d06_updated.cif', 'model_format': 'MMCIF', 'model_type': None, 'model_page_url': 'https://www.ebi.ac.uk/pdbe/entry/pdb/3d06', 'provider': 'PDBe', 'number_of_conformers': None, 'ensemble_sample_url': None, 'ensemble_sample_format': None, 'created': '2008-05-01', 'sequence_identity': Decimal('99.0'), 'uniprot_start': 94, 'uniprot_end': 293, 'coverage': Decimal('0.509'), 'experimental_method': 'X-RAY DIFFRACTION', 'resolution': Decimal('1.2'), 'confidence_type': None, 'confidence_version': None, 'confidence_avg_local_score': None, 'oligomeric_state': None, 'preferred_assembly_id': None, 'entities': [{'entity_type': 'POLYMER', 'entity_poly_type': 'POLYPEPTIDE(L)', 'identifier': 'P04637', 'identifier_category': 'UNIPROT', 'description': 'Cellular tumor antigen p53', 'chain_ids': ['A']}, {'entity_type': 'NON-POLYMER', 'entity_poly_type': None, 'identi
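
Since the category is typed by hand, an exact string comparison silently returns nothing when the input has stray whitespace or different casing. One possible way to guard against that is sketched below. `filter_by_category` and `MODEL_CATEGORIES` are hypothetical names, the category list is the one given in this section, and the second sample record is a made-up placeholder.

```python
MODEL_CATEGORIES = (
    "EXPERIMENTALLY DETERMINED",
    "CONFORMATIONAL ENSEMBLE",
    "TEMPLATE-BASED",
    "AB-INITIO",
)

def filter_by_category(structures, category):
    """Return the structures whose model_category matches `category`.

    The input is stripped and upper-cased first; unknown categories raise
    ValueError instead of silently returning an empty list.
    """
    wanted = category.strip().upper()
    if wanted not in MODEL_CATEGORIES:
        raise ValueError(f"Unknown model category: {category!r}")
    return [s for s in structures
            if s.get("summary", {}).get("model_category") == wanted]

sample_structures = [
    {"summary": {"model_identifier": "3d06",
                 "model_category": "EXPERIMENTALLY DETERMINED"}},
    # Placeholder record, not a real model identifier.
    {"summary": {"model_identifier": "EXAMPLE-1",
                 "model_category": "TEMPLATE-BASED"}},
]

matches = filter_by_category(sample_structures, " experimentally determined\n")
print([s["summary"]["model_identifier"] for s in matches])
```

In the notebook the second argument would be `model_type.value`.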


#### 1.2.2.&nbsp; Filter by Provider

The 3D-Beacons Network aggregates coordinate files and metadata for both experimental and theoretical protein models. It covers a wide range of state-of-the-art and specialized model providers, as well as data from the Protein Data Bank in Europe (PDBe).

Model providers:
* PDBe
* SWISS-MODEL
* AlphaFold DB
* Genome3D
* SASBDB
* AlphaFill
* ModelArchive
* Protein Ensemble Database

In [None]:
#widget
display(provider_value)

Textarea(value='', description='Enter a provider:', placeholder='Enter  provider', style=DescriptionStyle(desc…

In [None]:
# Keep only the structures from the provider entered in the widget
for structure in structures:
    provider = structure.get('summary', {}).get('provider')

    if provider == provider_value.value:
        print(structure)

{'summary': {'model_identifier': '3d06', 'model_category': 'EXPERIMENTALLY DETERMINED', 'model_url': 'https://www.ebi.ac.uk/pdbe/static/entry/3d06_updated.cif', 'model_format': 'MMCIF', 'model_type': None, 'model_page_url': 'https://www.ebi.ac.uk/pdbe/entry/pdb/3d06', 'provider': 'PDBe', 'number_of_conformers': None, 'ensemble_sample_url': None, 'ensemble_sample_format': None, 'created': '2008-05-01', 'sequence_identity': Decimal('99.0'), 'uniprot_start': 94, 'uniprot_end': 293, 'coverage': Decimal('0.509'), 'experimental_method': 'X-RAY DIFFRACTION', 'resolution': Decimal('1.2'), 'confidence_type': None, 'confidence_version': None, 'confidence_avg_local_score': None, 'oligomeric_state': None, 'preferred_assembly_id': None, 'entities': [{'entity_type': 'POLYMER', 'entity_poly_type': 'POLYPEPTIDE(L)', 'identifier': 'P04637', 'identifier_category': 'UNIPROT', 'description': 'Cellular tumor antigen p53', 'chain_ids': ['A']}, {'entity_type': 'NON-POLYMER', 'entity_poly_type': None, 'identi
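
Because the provider filter compares strings exactly, it can help to see which provider names actually occur in the results before typing one into the widget. The following is a minimal sketch, assuming records shaped like the output above. `count_by_provider` is a hypothetical helper and the third sample record is a made-up placeholder.

```python
from collections import Counter

def count_by_provider(structures):
    """Count how many structure records each provider contributes."""
    return Counter(s.get("summary", {}).get("provider") for s in structures)

sample_structures = [
    {"summary": {"model_identifier": "3d06", "provider": "PDBe"}},
    {"summary": {"model_identifier": "7xzz", "provider": "PDBe"}},
    # Placeholder record, not a real model identifier.
    {"summary": {"model_identifier": "EXAMPLE-1", "provider": "SWISS-MODEL"}},
]

provider_counts = count_by_provider(sample_structures)
for provider, count in provider_counts.most_common():
    print(provider, count)
```

Any key printed here can be pasted into the provider widget as is.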

### 1.3.&nbsp;  Combining two filters

#### 1.3.1.&nbsp;  Filter by model type and coverage

The code below filters the structures for a UniProt ID by model category and saves the matching model with the highest coverage to Google Drive.

In [None]:
display(Uniprot_ID) #display widget for UniProt ID
display(model_type)

Textarea(value='P04637', description='Enter an Uniprot ID:', placeholder='Uniprot ID', style=DescriptionStyle(…

Textarea(value='EXPERIMENTALLY DETERMINED', description='Enter a model type:', placeholder='Enter model catego…

In [None]:
# Fetch all structures for the accession entered in the widget
response = urlopen(f"{WEBSITE_API}{Uniprot_ID.value}.json")
r = ijson.parse(response)
structures = list(ijson.items(r, "structures.item", use_float=True))

# Sort by coverage, highest first; a missing or null coverage counts as 0
structures.sort(key=lambda x: x.get('summary', {}).get('coverage') or 0, reverse=True)

# The first structure of the requested category has the highest coverage
highest_coverage_structure = None
for structure in structures:
    model_category = structure.get('summary', {}).get('model_category')
    if model_category == model_type.value:
        highest_coverage_structure = structure
        break

if highest_coverage_structure is not None:
    print(highest_coverage_structure)
    # Download directly from the selected record; a separate variable name
    # keeps the model_download widget from being overwritten
    best_model_url = highest_coverage_structure.get('summary', {}).get('model_url')
    download_file(best_model_url)
else:
    print("No structure found for model category:", model_type.value)


{'summary': {'model_identifier': '7xzz', 'model_category': 'EXPERIMENTALLY DETERMINED', 'model_url': 'https://www.ebi.ac.uk/pdbe/static/entry/7xzz_updated.cif', 'model_format': 'MMCIF', 'model_type': None, 'model_page_url': 'https://www.ebi.ac.uk/pdbe/entry/pdb/7xzz', 'provider': 'PDBe', 'number_of_conformers': None, 'ensemble_sample_url': None, 'ensemble_sample_format': None, 'created': '2022-06-03', 'sequence_identity': Decimal('100.0'), 'uniprot_start': 1, 'uniprot_end': 393, 'coverage': Decimal('1.0'), 'experimental_method': 'ELECTRON MICROSCOPY', 'resolution': Decimal('4.07'), 'confidence_type': None, 'confidence_version': None, 'confidence_avg_local_score': None, 'oligomeric_state': None, 'preferred_assembly_id': None, 'entities': [{'entity_type': 'POLYMER', 'entity_poly_type': 'POLYPEPTIDE(L)', 'identifier': 'P04637', 'identifier_category': 'UNIPROT', 'description': 'Cellular tumor antigen p53', 'chain_ids': ['N', 'L', 'K', 'M']}, {'entity_type': 'POLYMER', 'entity_poly_type': '
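
The same selection can be written without sorting the whole list, by first filtering on category and then taking the maximum by coverage. This is a sketch under the assumption that records follow the `summary` layout shown above. `best_coverage` is a hypothetical helper, and the sample values are abridged from the P04637 output in this notebook.

```python
def best_coverage(structures, category):
    """Return the structure of `category` with the highest coverage, or None."""
    candidates = [s for s in structures
                  if s.get("summary", {}).get("model_category") == category]
    # A missing or null coverage is treated as 0; default=None covers the
    # case where no structure has the requested category.
    return max(candidates,
               key=lambda s: s.get("summary", {}).get("coverage") or 0,
               default=None)

sample_structures = [
    {"summary": {"model_identifier": "3d06",
                 "model_category": "EXPERIMENTALLY DETERMINED", "coverage": 0.509}},
    {"summary": {"model_identifier": "7xzz",
                 "model_category": "EXPERIMENTALLY DETERMINED", "coverage": 1.0}},
]

best = best_coverage(sample_structures, "EXPERIMENTALLY DETERMINED")
print(best["summary"]["model_identifier"])
```

Returning `None` when nothing matches lets the caller decide whether to skip the download or report the problem.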

#### 1.3.2.&nbsp;  Filter by provider and resolution

The code below filters the structures for a UniProt ID by provider and saves to Google Drive every model whose resolution value is below the threshold set on the slider. Resolution is reported in Å, so a lower value means a higher resolution.

In [None]:
#display widgets
display(Uniprot_ID)
display(provider_value)
display(resolution_filter)

Textarea(value='', description='Enter an Uniprot ID:', placeholder='Uniprot ID', style=DescriptionStyle(descri…

Textarea(value='', description='Enter a provider:', placeholder='Enter  provider', style=DescriptionStyle(desc…

FloatSlider(value=2.0, continuous_update=False, description='Test:', max=5.0, readout_format='.1f')

In [None]:
# Fetch all structures for the accession entered in the widget
r = ijson.parse(urlopen(f"{WEBSITE_API}{Uniprot_ID.value}.json"))
structures = list(ijson.items(r, "structures.item", use_float=True))

# Keep structures from the chosen provider whose resolution value is
# below the slider threshold (records without a resolution are skipped)
high_resolution_structures = []
for structure in structures:
    provider = structure.get('summary', {}).get('provider')
    resolution = structure.get('summary', {}).get('resolution')

    if provider == provider_value.value and resolution is not None and resolution < resolution_filter.value:
        high_resolution_structures.append(structure)

# Print and download each structure that passed the filter
for structure in high_resolution_structures:
    print(structure)
    model_url = structure.get("summary", {}).get("model_url")
    print("Downloading:", model_url)
    download_file(model_url)

{'summary': {'model_identifier': '3d06', 'model_category': 'EXPERIMENTALLY DETERMINED', 'model_url': 'https://www.ebi.ac.uk/pdbe/static/entry/3d06_updated.cif', 'model_format': 'MMCIF', 'model_type': None, 'model_page_url': 'https://www.ebi.ac.uk/pdbe/entry/pdb/3d06', 'provider': 'PDBe', 'number_of_conformers': None, 'ensemble_sample_url': None, 'ensemble_sample_format': None, 'created': '2008-05-01', 'sequence_identity': Decimal('99.0'), 'uniprot_start': 94, 'uniprot_end': 293, 'coverage': Decimal('0.509'), 'experimental_method': 'X-RAY DIFFRACTION', 'resolution': Decimal('1.2'), 'confidence_type': None, 'confidence_version': None, 'confidence_avg_local_score': None, 'oligomeric_state': None, 'preferred_assembly_id': None, 'entities': [{'entity_type': 'POLYMER', 'entity_poly_type': 'POLYPEPTIDE(L)', 'identifier': 'P04637', 'identifier_category': 'UNIPROT', 'description': 'Cellular tumor antigen p53', 'chain_ids': ['A']}, {'entity_type': 'NON-POLYMER', 'entity_poly_type': None, 'identi
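
To preview which models would pass the provider and resolution filter before any files are downloaded, the filtering step can be separated from the download step. The following is a rough sketch: `filter_by_resolution` is a hypothetical helper, the first two sample records are abridged from the output in this notebook, and the third is a made-up placeholder without a resolution.

```python
def filter_by_resolution(structures, provider, max_resolution):
    """Return structures from `provider` with resolution below `max_resolution`.

    Records without a resolution value are skipped. The result is ordered
    from the lowest resolution value (best resolved) to the highest.
    """
    selected = []
    for structure in structures:
        summary = structure.get("summary", {})
        resolution = summary.get("resolution")
        if (summary.get("provider") == provider
                and resolution is not None
                and resolution < max_resolution):
            selected.append(structure)
    return sorted(selected, key=lambda s: s["summary"]["resolution"])

sample_structures = [
    {"summary": {"model_identifier": "7xzz", "provider": "PDBe", "resolution": 4.07}},
    {"summary": {"model_identifier": "3d06", "provider": "PDBe", "resolution": 1.2}},
    # Placeholder record with no resolution, not a real model identifier.
    {"summary": {"model_identifier": "EXAMPLE-1", "provider": "PDBe", "resolution": None}},
]

selected = filter_by_resolution(sample_structures, "PDBe", 2.0)
print([s["summary"]["model_identifier"] for s in selected])
```

In the notebook the arguments would be `structures`, `provider_value.value` and `resolution_filter.value`.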

## 2.&nbsp; Sequence-Based Search

## 4.&nbsp; Download specific model

Once you have found the model or models you need, you can customise this code to download the mmCIF file to Google Drive.

In [None]:
#display widget
display(model_download)

Textarea(value='', description='Model identifier:', placeholder='Enter model identifier', style=DescriptionSty…

In [None]:
# Look up the download URL of the model entered in the widget
model_url = None
for structure in structures:
  model = structure.get('summary', {}).get('model_identifier')
  if model == model_download.value:
    model_url = structure.get("summary", {}).get("model_url")
    break

# Download the file only if the identifier was found in the current results
if model_url is not None:
  download_file(model_url)
else:
  print("Model identifier not found:", model_download.value)

--2023-06-13 16:39:34--  https://www.ebi.ac.uk/pdbe/static/entry/6vqo_updated.cif
Resolving www.ebi.ac.uk (www.ebi.ac.uk)... 193.62.193.80
Connecting to www.ebi.ac.uk (www.ebi.ac.uk)|193.62.193.80|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2459022 (2.3M) [text/plain]
Saving to: ‘6vqo_updated.cif’


2023-06-13 16:39:41 (434 KB/s) - ‘6vqo_updated.cif’ saved [2459022/2459022]
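
Outside a notebook the `!wget` shell escape is not available. A plain-Python alternative using only the standard library might look like the sketch below. `download_model` is a hypothetical helper, not part of this notebook's setup, and it names the local file after the last path segment of the URL, as `wget` does in the log above.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen

def download_model(url, dest_dir):
    """Download `url` into `dest_dir` and return the path of the saved file."""
    filename = os.path.basename(urlparse(url).path)  # e.g. 6vqo_updated.cif
    target = os.path.join(dest_dir, filename)
    # Copy the response to disk in chunks instead of loading it into memory.
    with urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target

# Example call (requires network access, so it is left commented out):
# download_model("https://www.ebi.ac.uk/pdbe/static/entry/6vqo_updated.cif", ".")
```

Unlike `download_file`, this version does not change the working directory, so the destination is always explicit.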



## Bugs

If you encounter any bugs, please report the issue to pdbekb_help@ebi.ac.uk.
