## BioGRID REST Services: ORCS

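- for context, BioGRID has two primary REST APIs that differ slightly in focus: interactions (protein, genetic and chemical) and CRISPR screens (ORCS)
- examples in this notebook use the ORCS API; the cells in this section start with the interactions API
- request an API key at each API base URL and store the keys and base URLs in a `.env` file (`BG_INT_ACCESS_KEY`, `BG_INT_BASE_URL`, `BG_ORCS_ACCESS_KEY`, `BG_ORCS_BASE_URL`)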

### Protein, Genetic and Chemical Interactions
- Site URL:  https://thebiogrid.org/
- API docs:  https://wiki.thebiogrid.org/doku.php/biogridrest
- API base URL:  https://webservice.thebiogrid.org
- API key:  https://webservice.thebiogrid.org

### Open Repository of CRISPR Screens (ORCS)
- Site URL:  https://orcs.thebiogrid.org/
- API docs:  https://wiki.thebiogrid.org/doku.php/orcs:webservice
- API base URL:  https://orcsws.thebiogrid.org
- API key:  https://orcsws.thebiogrid.org

CITATION:
- the original examples were adapted from the
  - [BIOGRID-REST-EXAMPLES](https://github.com/BioGRID/BIOGRID-REST-EXAMPLES) GitHub repository
  - [ORCS-REST-EXAMPLES](https://github.com/BioGRID/ORCS-REST-EXAMPLES) GitHub repository

In [1]:
# Import necessary libraries
import os
from dotenv import load_dotenv
import requests
from pprint import pprint

# Load environment variables from .env file
load_dotenv()

# Fetch the API keys and base URLs from the .env file
BG_INT_ACCESS_KEY = os.getenv("BG_INT_ACCESS_KEY")
BG_INT_BASE_URL = os.getenv("BG_INT_BASE_URL")
BG_ORCS_ACCESS_KEY = os.getenv("BG_ORCS_ACCESS_KEY")
BG_ORCS_BASE_URL = os.getenv("BG_ORCS_BASE_URL")

# Validate the environment variables
if not BG_INT_ACCESS_KEY or not BG_INT_BASE_URL:
    raise ValueError("BG_INT_ACCESS_KEY or BG_INT_BASE_URL is missing from the .env file.")

if not BG_ORCS_ACCESS_KEY or not BG_ORCS_BASE_URL:
    raise ValueError("BG_ORCS_ACCESS_KEY or BG_ORCS_BASE_URL is missing from the .env file.")
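The request cells below build endpoint URLs by concatenation, e.g. `BG_INT_BASE_URL + "/evidence/"`. If a base URL in `.env` is saved with a trailing slash, that produces `//evidence/`. A minimal sketch of a hypothetical `endpoint_url` helper (not part of any BioGRID client) that normalizes the join:

```python
def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash between them.

    Hypothetical helper: tolerates a base URL stored with or without a trailing
    slash, and ends the path with "/" like the notebook's "/evidence/" and
    "/interactions/" endpoints.
    """
    return base_url.rstrip("/") + "/" + path.strip("/") + "/"


# Both spellings of the base URL produce the same endpoint
print(endpoint_url("https://webservice.thebiogrid.org", "/evidence/"))
print(endpoint_url("https://webservice.thebiogrid.org/", "interactions"))
```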

In [2]:
from pathlib import Path

# Define configuration constants
INTERIM_DIR = Path("../data/interim")  # Standardize path using pathlib
INTERIM_DIR.mkdir(parents=True, exist_ok=True)  # Create the output directory if it does not exist

## BioGRID interactions

In [3]:
import os
import requests

# API endpoint and parameters
request_url = BG_INT_BASE_URL + "/evidence/"
params = {
    "format": "json",
    "accesskey": BG_INT_ACCESS_KEY  # Loaded from the .env file
}

# Send the API request and fail fast on HTTP errors
response = requests.get(request_url, params=params, timeout=30)
response.raise_for_status()

In [4]:
from pprint import pprint

# Pretty-print the supported evidence types (the keys; the values are empty strings)
pprint(response.json())

{'affinity capture-luminescence': '',
 'affinity capture-ms': '',
 'affinity capture-rna': '',
 'affinity capture-western': '',
 'biochemical activity': '',
 'co-crystal structure': '',
 'co-fractionation': '',
 'co-localization': '',
 'co-purification': '',
 'cross-linking-ms (xl-ms)': '',
 'dosage growth defect': '',
 'dosage lethality': '',
 'dosage rescue': '',
 'far western': '',
 'fret': '',
 'negative genetic': '',
 'pca': '',
 'phenotypic enhancement': '',
 'phenotypic suppression': '',
 'positive genetic': '',
 'protein-peptide': '',
 'protein-rna': '',
 'proximity label-ms': '',
 'reconstituted complex': '',
 'synthetic growth defect': '',
 'synthetic haploinsufficiency': '',
 'synthetic lethality': '',
 'synthetic rescue': '',
 'two-hybrid': ''}
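Because `evidenceList` takes keys like those printed above joined with `|`, a misspelled type might silently match nothing, depending on how the service treats unknown values. A sketch of a hypothetical `build_evidence_list` helper that checks the requested types against the supported ones before joining them; in the notebook the supported set could come from `response.json().keys()` of the `/evidence/` request.

```python
def build_evidence_list(wanted, supported):
    """Join evidence types into a pipe-separated evidenceList string.

    Hypothetical helper: raises ValueError for any type not in `supported`
    (e.g. the keys returned by /evidence/), so a typo fails loudly.
    """
    supported = set(supported)
    unknown = [w for w in wanted if w not in supported]
    if unknown:
        raise ValueError(f"Unsupported evidence type(s): {unknown}")
    return "|".join(wanted)


# A few keys copied from the /evidence/ output above
supported = ["negative genetic", "positive genetic", "synthetic lethality", "two-hybrid"]
print(build_evidence_list(["synthetic lethality", "negative genetic"], supported))
```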


In [5]:
import os
import requests

# PMID(s) to search for (pipe-separated for multiple articles)
pubmed_article = "35559673"

# Evidence types to include in the response (pipe-separated for multiple types)
evidence_list = "synthetic lethality|negative genetic"

# API endpoint and parameters
request_url = BG_INT_BASE_URL + "/interactions/"
params = {
    "taxId": "9606",  # Human tax ID
    "pubmedList": pubmed_article, # pubmed article ID(s)
    # "geneList": gene_name,
    # "searchNames": "true",
    "includeInteractors": "true",
    "evidenceList": evidence_list,
    "includeEvidence": "true",
    "format": "json",
    "accesskey": BG_INT_ACCESS_KEY  # Loaded from the .env file
}

# Send the API request and fail fast on HTTP errors
response = requests.get(request_url, params=params, timeout=30)
response.raise_for_status()
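The `/interactions/` requests in this section differ only in a few parameters. Below is a sketch of a hypothetical `build_interaction_params` helper that assembles the same parameter names used in these cells, plus a preview of the encoded query string. As we understand the BioGRID REST documentation, `evidenceList` acts as an exclusion list unless `includeEvidence` is `"true"`, which is why the two are set together here; check the API docs linked above before relying on this.

```python
from urllib.parse import urlencode


def build_interaction_params(access_key, gene_list=None, pubmed_list=None,
                             evidence_list=None, search_names=False,
                             tax_id="9606"):
    """Assemble query parameters for the /interactions/ endpoint.

    Hypothetical helper mirroring the parameter names used in this notebook;
    optional filters are only included when they are set.
    """
    params = {
        "taxId": tax_id,
        "includeInteractors": "true",
        "includeEvidence": "true",  # treat evidenceList as an inclusion list
        "format": "json",
        "accesskey": access_key,
    }
    if gene_list:
        params["geneList"] = "|".join(gene_list)
    if pubmed_list:
        params["pubmedList"] = "|".join(pubmed_list)
    if evidence_list:
        params["evidenceList"] = "|".join(evidence_list)
    if search_names:
        params["searchNames"] = "true"
    return params


params = build_interaction_params(
    "YOUR_KEY",  # placeholder, not a real key
    pubmed_list=["35559673"],
    evidence_list=["synthetic lethality", "negative genetic"],
)
# Spaces are encoded as "+" and the "|" separator as "%7C"
print(urlencode(params))
```

`requests.get(..., params=params)` encodes a params dict with `urlencode` as well, so the query string sent to BioGRID should look the same.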

In [6]:
import json

# Print the number of interactions (the response is a dict keyed by interaction ID)
print(f"Number of interactions found: {len(response.json())}")

# Construct the output file path
file_name = f"pmid_{pubmed_article}_interactions.json"
file_path = INTERIM_DIR / file_name

# Write data to the file
with open(file_path, "w") as f:
    json.dump(response.json(), f, indent=4)

Number of interactions found: 475
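Before loading a saved file into pandas, a quick tally of the records by one field can be done with the standard library. This sketch assumes the layout the conversion function later relies on (a dict keyed by interaction ID, each value a record with fields such as `EXPERIMENTAL_SYSTEM`); the helper name and the toy records are made up for illustration.

```python
from collections import Counter


def count_by_field(interactions, field):
    """Count interaction records by the value of one field.

    Assumes a dict keyed by interaction ID whose values are record dicts;
    records missing the field are counted under "unknown".
    """
    return Counter(record.get(field, "unknown") for record in interactions.values())


# Toy records with made-up IDs and values, shaped like the saved JSON
toy = {
    "1": {"EXPERIMENTAL_SYSTEM": "Negative Genetic", "THROUGHPUT": "High Throughput"},
    "2": {"EXPERIMENTAL_SYSTEM": "Negative Genetic", "THROUGHPUT": "High Throughput"},
    "3": {"EXPERIMENTAL_SYSTEM": "Synthetic Lethality", "THROUGHPUT": "Low Throughput"},
}
print(count_by_field(toy, "EXPERIMENTAL_SYSTEM").most_common())
# [('Negative Genetic', 2), ('Synthetic Lethality', 1)]
```

In the notebook, `count_by_field(response.json(), "EXPERIMENTAL_SYSTEM")` would tally the interactions returned by the PubMed query.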


In [7]:
import os
import requests

# Entrez Gene ID(s) to search for (pipe-separated for multiple genes); 7157 is TP53
gene_id = "7157"

# Evidence types to include in the response (pipe-separated for multiple types)
evidence_list = "synthetic lethality|negative genetic"

# API endpoint and parameters
request_url = BG_INT_BASE_URL + "/interactions/"
params = {
    "taxId": "9606",  # Human taxonomy ID
    "geneList": gene_id,
    "includeInteractors": "true",
    "evidenceList": evidence_list,
    "includeEvidence": "true",
    "format": "json",
    "accesskey": BG_INT_ACCESS_KEY  # Loaded from the .env file
}

# Send the API request and fail fast on HTTP errors
response = requests.get(request_url, params=params, timeout=30)
response.raise_for_status()

In [8]:
import json

# Print the number of interactions (the response is a dict keyed by interaction ID)
print(f"Number of interactions found: {len(response.json())}")

# Construct the output file path
file_name = f"{gene_id}_interactions.json"
file_path = INTERIM_DIR / file_name

# Write data to the file
with open(file_path, "w") as f:
    json.dump(response.json(), f, indent=4)

Number of interactions found: 214
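With `includeInteractors` set, each returned record involves the queried gene on side A or side B. A minimal sketch of a hypothetical `partner_symbols` helper that collects the symbol on the other side; the field names follow the columns shown later by `df.info()`, and the toy records use placeholder IDs and symbols.

```python
def partner_symbols(interactions, query_entrez_id):
    """Return the sorted partner symbols for a queried Entrez Gene ID.

    For each record, the partner is whichever side (A or B) is not the
    queried gene. Records that do not involve the query are skipped.
    """
    query = str(query_entrez_id)  # IDs may be stored as int or str in the JSON
    partners = set()
    for record in interactions.values():
        if str(record["ENTREZ_GENE_A"]) == query:
            partners.add(record["OFFICIAL_SYMBOL_B"])
        elif str(record["ENTREZ_GENE_B"]) == query:
            partners.add(record["OFFICIAL_SYMBOL_A"])
    return sorted(partners)


# Toy records with placeholder partner IDs and symbols, shaped like the saved JSON
toy = {
    "1": {"ENTREZ_GENE_A": 7157, "ENTREZ_GENE_B": 111111,
          "OFFICIAL_SYMBOL_A": "TP53", "OFFICIAL_SYMBOL_B": "GENE_X"},
    "2": {"ENTREZ_GENE_A": 222222, "ENTREZ_GENE_B": 7157,
          "OFFICIAL_SYMBOL_A": "GENE_Y", "OFFICIAL_SYMBOL_B": "TP53"},
    "3": {"ENTREZ_GENE_A": 111111, "ENTREZ_GENE_B": 7157,
          "OFFICIAL_SYMBOL_A": "GENE_X", "OFFICIAL_SYMBOL_B": "TP53"},
}
print(partner_symbols(toy, "7157"))  # ['GENE_X', 'GENE_Y']
```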


In [9]:
import os
import requests

# Gene name(s) to search for (pipe-separated for multiple genes)
gene_name = "TP53"

# Evidence types to include in the response (pipe-separated for multiple types)
evidence_list = "synthetic lethality|negative genetic"

# API endpoint and parameters
request_url = BG_INT_BASE_URL + "/interactions/"
params = {
    "taxId": "9606",  # Human taxonomy ID
    "geneList": gene_name,
    "searchNames": "true", # Search by gene name instead of gene ID    
    "includeInteractors": "true",
    "evidenceList": evidence_list,
    "includeEvidence": "true",
    "format": "json",
    "accesskey": BG_INT_ACCESS_KEY  # Loaded from the .env file
}

# Send the API request and fail fast on HTTP errors
response = requests.get(request_url, params=params, timeout=30)
response.raise_for_status()

In [10]:
# Print the number of interactions (the response is a dict keyed by interaction ID)
print(f"Number of interactions found: {len(response.json())}")

# Construct the output file path
file_name = f"{gene_name}_interactions.json"
file_path = INTERIM_DIR / file_name

# Write data to the file
with open(file_path, "w") as f:
    json.dump(response.json(), f, indent=4)

Number of interactions found: 207
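The ID-based TP53 query returned 214 interactions and the name-based query 207. The sketch below does not explain the difference; it only lists which interaction IDs appear in one saved file but not the other, so those records can be inspected. The helper name is hypothetical, and the demo uses toy files with made-up IDs.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def compare_saved_results(path_a, path_b):
    """Return (IDs only in A, IDs only in B) for two saved interaction files.

    Each file is assumed to hold a JSON dict keyed by BioGRID interaction ID,
    as written by the cells above.
    """
    ids_a = set(json.loads(Path(path_a).read_text()))
    ids_b = set(json.loads(Path(path_b).read_text()))
    return sorted(ids_a - ids_b), sorted(ids_b - ids_a)


# Offline demo with toy files (made-up IDs)
with TemporaryDirectory() as tmp:
    a = Path(tmp) / "by_id.json"
    b = Path(tmp) / "by_name.json"
    a.write_text(json.dumps({"101": {}, "102": {}, "103": {}}))
    b.write_text(json.dumps({"101": {}, "102": {}, "104": {}}))
    only_a, only_b = compare_saved_results(a, b)

print(only_a, only_b)  # ['103'] ['104']
```

In the notebook the call would be `compare_saved_results(INTERIM_DIR / "7157_interactions.json", INTERIM_DIR / "TP53_interactions.json")`.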


In [11]:
import os
import requests

# Gene name(s) to search for (pipe-separated for multiple genes)
gene_name = "RB1"

# Evidence types to include in the response (pipe-separated for multiple types)
evidence_list = "synthetic lethality|negative genetic"

# API endpoint and parameters
request_url = BG_INT_BASE_URL + "/interactions/"
params = {
    "taxId": "9606",  # Human tax ID
    "geneList": gene_name,  # gene name(s)
    "searchNames": "true", # Search by gene name instead of gene ID
    "includeInteractors": "true",
    "evidenceList": evidence_list,
    "includeEvidence": "true",
    "format": "json",
    "accesskey": BG_INT_ACCESS_KEY  # Loaded from the .env file
}

# Send the API request and fail fast on HTTP errors
response = requests.get(request_url, params=params, timeout=30)
response.raise_for_status()

In [12]:
# Print the number of interactions (the response is a dict keyed by interaction ID)
print(f"Number of interactions found: {len(response.json())}")

# Construct the output file path
file_name = f"{gene_name}_interactions.json"
file_path = INTERIM_DIR / file_name

# Write data to the file
with open(file_path, "w") as f:
    json.dump(response.json(), f, indent=4)

Number of interactions found: 38
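The request and save cells repeat the same steps for each gene. A sketch of a hypothetical `fetch_and_save` function that combines them, checks the HTTP status, and creates the output directory. It accepts any requests-style session, so the demo at the end runs offline with a stand-in session instead of calling BioGRID.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def fetch_and_save(session, url, params, out_path, timeout=30):
    """GET `url` with `params`, check the status, and save the JSON body.

    `session` is any object with a requests-style .get(url, params=..., timeout=...)
    method, such as requests.Session(). Returns the number of top-level
    records written (interactions, for the /interactions/ endpoint).
    """
    response = session.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error body
    data = response.json()
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(data, indent=4))
    return len(data)


# In the notebook this could replace the repeated cells, e.g.:
#   with requests.Session() as session:
#       for gene in ["TP53", "RB1"]:
#           params = {...}  # as in the cells above, with "geneList": gene
#           fetch_and_save(session, BG_INT_BASE_URL + "/interactions/", params,
#                          INTERIM_DIR / f"{gene}_interactions.json")


# Offline demo with a stand-in session (hypothetical, no network access)
class _FakeResponse:
    def raise_for_status(self):
        pass

    def json(self):
        return {"1": {"EXPERIMENTAL_SYSTEM": "Negative Genetic"},
                "2": {"EXPERIMENTAL_SYSTEM": "Synthetic Lethality"}}


class _FakeSession:
    def get(self, url, params=None, timeout=None):
        return _FakeResponse()


with TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "nested" / "demo_interactions.json"
    n_saved = fetch_and_save(_FakeSession(), "https://example.invalid/", {}, out_file)
    saved = json.loads(out_file.read_text())

print(n_saved)  # 2
```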


In [13]:
import pandas as pd
import numpy as np

def convert_rb1_interactions_to_df(json_data):
    """
    Convert RB1 interactions JSON data to an optimized pandas DataFrame.
    
    Args:
        json_data (dict): The JSON data containing RB1 interactions
        
    Returns:
        pd.DataFrame: Optimized DataFrame with appropriate data types
    """
    # Convert JSON to DataFrame
    df = pd.DataFrame.from_dict(json_data, orient='index')
    
    # Drop ONTOLOGY_TERMS column if it exists
    if 'ONTOLOGY_TERMS' in df.columns:
        df = df.drop('ONTOLOGY_TERMS', axis=1)
    
    # Convert numeric columns to appropriate types
    int_columns = [
        'BIOGRID_INTERACTION_ID', 
        'ENTREZ_GENE_A', 
        'ENTREZ_GENE_B',
        'BIOGRID_ID_A', 
        'BIOGRID_ID_B',
        'ORGANISM_A', 
        'ORGANISM_B',
        'PUBMED_ID'
    ]
    
    for col in int_columns:
        df[col] = pd.to_numeric(df[col], errors='coerce').astype('Int64')
    
    # Convert categorical columns to save memory
    categorical_columns = [
        'EXPERIMENTAL_SYSTEM',
        'EXPERIMENTAL_SYSTEM_TYPE',
        'THROUGHPUT',
        'SOURCEDB',
        'OFFICIAL_SYMBOL_A',
        'OFFICIAL_SYMBOL_B'
    ]
    
    for col in categorical_columns:
        df[col] = df[col].astype('category')
    
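    # Replace '-' placeholders with missing values for cleaner data
    replace_dash_columns = [
        'SYSTEMATIC_NAME_A',
        'SYSTEMATIC_NAME_B',
        'MODIFICATION',
        'QUANTITATION',
        'TAGS'
    ]
    
    for col in replace_dash_columns:
        # mask() avoids older pandas versions where replace('-', None) forward-filled instead
        df[col] = df[col].mask(df[col] == '-')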
    
    # Split pipe-delimited columns into lists
    list_columns = ['SYNONYMS_A', 'SYNONYMS_B', 'QUALIFICATIONS']
    for col in list_columns:
        df[f'{col}_LIST'] = df[col].str.split('|')
    
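    # Add convenience columns: the four-digit year in parentheses in PUBMED_AUTHOR, e.g. '(2017)'
    year = df['PUBMED_AUTHOR'].str.extract(r'\((\d{4})\)', expand=False)
    df['PUBLICATION_YEAR'] = pd.to_numeric(year, errors='coerce').astype('Int64')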
    
    # Sort by BIOGRID_INTERACTION_ID
    df = df.sort_values('BIOGRID_INTERACTION_ID')
    
    # Set BIOGRID_INTERACTION_ID as index but keep it as a column too
    df.set_index('BIOGRID_INTERACTION_ID', inplace=True, drop=False)
    
    return df

In [14]:
import json

# Read the saved RB1 results explicitly rather than relying on the last file_path
with open(INTERIM_DIR / "RB1_interactions.json", 'r') as f:
    data = json.load(f)

# Convert to DataFrame
df = convert_rb1_interactions_to_df(data)

In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 38 entries, 2208329 to 3411689
Data columns (total 27 columns):
 #   Column                    Non-Null Count  Dtype   
---  ------                    --------------  -----   
 0   BIOGRID_INTERACTION_ID    38 non-null     Int64   
 1   ENTREZ_GENE_A             38 non-null     Int64   
 2   ENTREZ_GENE_B             38 non-null     Int64   
 3   BIOGRID_ID_A              38 non-null     Int64   
 4   BIOGRID_ID_B              38 non-null     Int64   
 5   SYSTEMATIC_NAME_A         33 non-null     object  
 6   SYSTEMATIC_NAME_B         16 non-null     object  
 7   OFFICIAL_SYMBOL_A         38 non-null     category
 8   OFFICIAL_SYMBOL_B         38 non-null     category
 9   SYNONYMS_A                38 non-null     object  
 10  SYNONYMS_B                38 non-null     object  
 11  EXPERIMENTAL_SYSTEM       38 non-null     category
 12  EXPERIMENTAL_SYSTEM_TYPE  38 non-null     category
 13  PUBMED_AUTHOR             38 non-null     obje

In [16]:
df.head()

BIOGRID_INTERACTION_ID,BIOGRID_INTERACTION_ID,ENTREZ_GENE_A,ENTREZ_GENE_B,BIOGRID_ID_A,BIOGRID_ID_B,SYSTEMATIC_NAME_A,SYSTEMATIC_NAME_B,OFFICIAL_SYMBOL_A,OFFICIAL_SYMBOL_B,SYNONYMS_A,...,THROUGHPUT,QUANTITATION,MODIFICATION,QUALIFICATIONS,TAGS,SOURCEDB,SYNONYMS_A_LIST,SYNONYMS_B_LIST,QUALIFICATIONS_LIST,PUBLICATION_YEAR
2208329,2208329,23476,5925,117036,111860,,RP11-174I10.1,BRD4,RB1,CAP|HUNK1|HUNKI|MCAP,...,High Throughput,,,CRISPR GI screen|Cell Line:HEK293T EFO:0001184...,,BIOGRID,"[CAP, HUNK1, HUNKI, MCAP]","[OSRC, PPP1R130, RB, p105-Rb, pRb, pp110]","[CRISPR GI screen, Cell Line:HEK293T EFO:00011...",2017
2208411,2208411,672,5925,107140,111860,,RP11-174I10.1,BRCA1,RB1,BRCAI|BRCC1|BROVCA1|FANCS|IRIS|PNCA4|PPP1R53|P...,...,High Throughput,,,CRISPR GI screen|Cell Line: A-549 EFO:0001086|...,,BIOGRID,"[BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4, PP...","[OSRC, PPP1R130, RB, p105-Rb, pRb, pp110]","[CRISPR GI screen, Cell Line: A-549 EFO:000108...",2017
2341984,2341984,3320,5925,109552,111860,,RP11-174I10.1,HSP90AA1,RB1,EL52|HSP86|HSP89A|HSP90A|HSP90N|HSPC1|HSPCA|HS...,...,High Throughput,,,Chemo-genetic screen with siRNAs|Drug AUY922|H...,,BIOGRID,"[EL52, HSP86, HSP89A, HSP90A, HSP90N, HSPC1, H...","[OSRC, PPP1R130, RB, p105-Rb, pRb, pp110]","[Chemo-genetic screen with siRNAs, Drug AUY922...",2016
2342156,2342156,7465,5925,113303,111860,,RP11-174I10.1,WEE1,RB1,WEE1A|WEE1hu,...,High Throughput,,,Chemo-genetic screen with siRNAs|Drug: MK-1775...,,BIOGRID,"[WEE1A, WEE1hu]","[OSRC, PPP1R130, RB, p105-Rb, pRb, pp110]","[Chemo-genetic screen with siRNAs, Drug: MK-17...",2016
2342225,2342225,7846,5925,113603,111860,,RP11-174I10.1,TUBA1A,RB1,B-ALPHA-1|LIS3|TUBA3,...,High Throughput,,,Chemo-genetic screen with siRNAs|Drug: vinorel...,,BIOGRID,"[B-ALPHA-1, LIS3, TUBA3]","[OSRC, PPP1R130, RB, p105-Rb, pRb, pp110]","[Chemo-genetic screen with siRNAs, Drug: vinor...",2016
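`QUALIFICATIONS_LIST` holds screen annotations such as `Cell Line:HEK293T EFO:0001184`, visible in `df.head()` above. A sketch that uses `DataFrame.explode` to put one qualification per row and pulls out the cell line name. It assumes qualifications start with `Cell Line:` and that the name contains no spaces; the toy rows reuse fragments of the output above rather than the full data.

```python
import pandas as pd

# Toy rows built from qualification fragments visible in df.head() above
toy = pd.DataFrame({
    "OFFICIAL_SYMBOL_A": ["BRD4", "BRCA1"],
    "QUALIFICATIONS_LIST": [
        ["CRISPR GI screen", "Cell Line:HEK293T EFO:0001184"],
        ["CRISPR GI screen", "Cell Line: A-549 EFO:0001086"],
    ],
})

# One row per qualification, with a fresh 0..n-1 index
quals = toy.explode("QUALIFICATIONS_LIST", ignore_index=True)

# Capture the first whitespace-free token after "Cell Line:"; other entries become NaN
cell_line = quals["QUALIFICATIONS_LIST"].str.extract(r"^Cell Line:\s*(\S+)", expand=False)

cell_lines = (
    quals.assign(CELL_LINE=cell_line)
    .dropna(subset=["CELL_LINE"])[["OFFICIAL_SYMBOL_A", "CELL_LINE"]]
    .reset_index(drop=True)
)
print(cell_lines)
```

On the real frame, the same steps would apply to `df` directly, and a `value_counts()` on `CELL_LINE` would show which cell lines the RB1 screens used.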
