# Bioactivity Analysis for Proteins

This notebook demonstrates how to:
1. Fetch bioactivity data from PubChem and ChEMBL for a given protein (by UniProt ID).
2. Summarize key bioactivity metrics (e.g., IC50, Ki, Kd).
3. Save results to files in JSON or YAML format.

**Tools:** 
- Python
- Requests (for PubChem API)
- ChEMBL Web Resource Client

1. Import Required Modules

In [1]:
from bioactivity_analysis.fetch_pubchem import fetch_pubchem_bioactivity
from bioactivity_analysis.fetch_chembl import fetch_chembl_bioactivity
from bioactivity_analysis.utils import generate_summary, save_to_file
import pandas as pd
import json

2. Set Input Parameters
Set the UniProt ID of the protein of interest, then choose the output directory and file format.

In [11]:
# Input variables
uniprot_id = "Q9NZQ7"  # Example UniProt ID
output_base = "../../results"
output_format = "json"  # Choose "json" or "yaml"

# Construct output file name
output_file = f"{output_base}/{uniprot_id}.{output_format}"
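
Because `output_format` is typed in by hand, it can be useful to validate it before any data is fetched. The following is a minimal sketch of one way to do that. `build_output_path` and `ALLOWED_FORMATS` are hypothetical names introduced here for illustration and are not part of the `bioactivity_analysis` package.

```python
from pathlib import Path

# The two formats this notebook mentions (assumed to be the only supported ones).
ALLOWED_FORMATS = {"json", "yaml"}


def build_output_path(base, uniprot_id, output_format):
    """Hypothetical helper: validate the format and build the output path."""
    fmt = output_format.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(
            f"Unsupported format {output_format!r}; choose 'json' or 'yaml'"
        )
    return Path(base) / f"{uniprot_id}.{fmt}"


example_path = build_output_path("../../results", "Q9NZQ7", "json")
print(example_path.as_posix())
```

Failing early on a mistyped format avoids running the slower fetch steps only to hit an error when saving.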

3. Fetch Bioactivity Data
Fetch the data from PubChem and ChEMBL.

In [12]:
# Fetch data from PubChem
pubchem_data = fetch_pubchem_bioactivity(uniprot_id)
print(f"PubChem data fetched: {len(pubchem_data)} records")

# Fetch data from ChEMBL
chembl_data = fetch_chembl_bioactivity(uniprot_id)
print(f"ChEMBL data fetched: {len(chembl_data)} records")


PubChem data fetched: 0 records
ChEMBL data fetched: 134 records


4. Summarize Bioactivity Data
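Generate and display the minimum and maximum reported values for IC50, Ki, and Kd in each dataset. An activity type with no reported values is omitted from the summary.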

In [13]:
# Generate summaries
pubchem_summary = generate_summary(pubchem_data)
chembl_summary = generate_summary(chembl_data)

# Display summaries
print("PubChem Summary:")
print(json.dumps(pubchem_summary, indent=4))

print("ChEMBL Summary:")
print(json.dumps(chembl_summary, indent=4))

PubChem Summary:
{}
ChEMBL Summary:
{
    "IC50": {
        "min": 1.0,
        "max": 93500.0
    },
    "Kd": {
        "min": 0.019,
        "max": 3000000.0
    }
}
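
To make the shape of these summaries concrete, here is a minimal sketch of a per-type min/max computation. It is not the package's `generate_summary`. The function name `summarize_ranges` is hypothetical, and the record keys `standard_type` and `standard_value` are an assumption modeled on ChEMBL activity records, so the records returned by the fetch functions above may be structured differently. The toy values are made up.

```python
def summarize_ranges(records, type_key="standard_type",
                     value_key="standard_value",
                     wanted=("IC50", "Ki", "Kd")):
    """Sketch: min/max per activity type (field names are assumptions)."""
    values = {}
    for rec in records:
        kind = rec.get(type_key)
        if kind not in wanted:
            continue  # ignore activity types outside the requested set
        try:
            val = float(rec.get(value_key))
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric values
        values.setdefault(kind, []).append(val)
    return {k: {"min": min(v), "max": max(v)} for k, v in values.items()}


# Made-up records for illustration only.
toy_records = [
    {"standard_type": "IC50", "standard_value": "12.5"},
    {"standard_type": "IC50", "standard_value": "800"},
    {"standard_type": "Kd", "standard_value": None},
    {"standard_type": "EC50", "standard_value": "3"},
]
toy_summary = summarize_ranges(toy_records)
print(toy_summary)
```

In this sketch an empty input yields an empty dictionary, and an activity type with no usable values is omitted entirely. That would be consistent with the empty PubChem summary and the missing Ki entry above, although the package may handle these cases differently.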


5. Save Results
Save the fetched data and summaries to files.

In [14]:
# Prepare output data
output_data = {
    "UniProt_ID": uniprot_id,
    "PubChem": {
        "Bioactivity": pubchem_data,
        "Summary": pubchem_summary,
    },
    "ChEMBL": {
        "Bioactivity": chembl_data,
        "Summary": chembl_summary,
    },
}

# Save the file
save_to_file(output_data, output_file, format=output_format)
print(f"Results saved to {output_file}")

Results saved to ../../results/Q9NZQ7.json
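
For readers curious how a JSON/YAML writer of this kind could work, the following is a minimal sketch. It is not the implementation of `save_to_file`. `write_results` is a hypothetical name, and the YAML branch assumes the third-party PyYAML package is installed.

```python
import json
import tempfile
from pathlib import Path


def write_results(data, path, fmt="json"):
    """Sketch of a JSON/YAML writer; not the package's save_to_file."""
    path = Path(path)
    # Create the results folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    if fmt == "json":
        text = json.dumps(data, indent=4)
    elif fmt == "yaml":
        import yaml  # PyYAML (assumed installed); imported only when needed
        text = yaml.safe_dump(data, sort_keys=False)
    else:
        raise ValueError(f"Unsupported format: {fmt!r}")
    path.write_text(text, encoding="utf-8")
    return path


# Write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = write_results({"UniProt_ID": "Q9NZQ7"},
                              Path(tmp) / "results" / "demo.json")
    demo_roundtrip = json.loads(demo_path.read_text(encoding="utf-8"))
print(demo_roundtrip)
```

Creating the parent directory up front means a relative path such as `../../results` works on a fresh checkout where the folder has not been created yet.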


6. Visualize Overlaps
Placeholder for a visualization of the overlap between the two datasets (e.g., a Venn diagram). For now, the cell only counts the compound identifiers that appear in both.

In [15]:
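# Visualization placeholder
# Example: compare PubChem and ChEMBL molecule IDs.
# Note: PubChem CIDs and ChEMBL molecule IDs come from different identifier
# namespaces, so this direct intersection stays empty unless both sets are
# first mapped to a shared identifier.
pubchem_ids = {entry.get("CID") for entry in pubchem_data if "CID" in entry}
chembl_ids = {entry.get("molecule_chembl_id") for entry in chembl_data if "molecule_chembl_id" in entry}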

# Intersection
overlap = pubchem_ids.intersection(chembl_ids)
print(f"Number of overlapping compounds: {len(overlap)}")

Number of overlapping compounds: 0
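
Once both datasets are keyed on a shared compound identifier, the three regions of a two-set Venn diagram can be counted with plain set operations. The sketch below assumes that mapping has already been done. `venn_counts` is a hypothetical helper, and the `KEY-*` strings are placeholders rather than real identifiers.

```python
def venn_counts(left, right):
    """Sketch: sizes of the three regions of a two-set Venn diagram."""
    left, right = set(left), set(right)
    return {
        "left_only": len(left - right),   # only in the first dataset
        "both": len(left & right),        # shared by both datasets
        "right_only": len(right - left),  # only in the second dataset
    }


# Placeholder keys standing in for a shared identifier (illustration only).
toy_pubchem_keys = {"KEY-A", "KEY-B", "KEY-C"}
toy_chembl_keys = {"KEY-B", "KEY-C", "KEY-D", "KEY-E"}

toy_counts = venn_counts(toy_pubchem_keys, toy_chembl_keys)
print(toy_counts)
```

These three counts are what a plotting library would need to draw the diagram, so the counting can be checked separately from the plotting.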
