# End-to-End Drug Repurposing with 24Agents

This notebook walks through an end-to-end drug repurposing workflow using datasets hosted on the **24Agents** platform and accessed through the **Hypha RPC API**.

We combine measured binding affinities from BindingDB with curated clinical-phase and mechanism-of-action annotations from the Broad Institute's Drug Repurposing Hub to rank candidate compounds for a target of interest.

The notebook is designed to run in **Python 3.11** and is **Pyodide-compatible** (browser execution).

## 1. Environment setup (Pyodide-safe)

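We install the dependencies with `micropip` so that this notebook can run in a browser-backed Pyodide kernel. `pandas` and `matplotlib` are compiled packages that `micropip` installs from Pyodide's prebuilt distribution; the remaining packages are pure Python and come from PyPI. Outside Pyodide this cell installs nothing, so install the same packages with `pip` beforehand.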

In [1]:
import sys

# Detect Pyodide
IS_PYODIDE = 'pyodide' in sys.modules

if IS_PYODIDE:
    import micropip
    await micropip.install([
        'pandas',
        'matplotlib',
        'python-dotenv',
        'httpx',
        'hypha-rpc'
    ])

## 2. Authentication

The notebook reads a Hypha API token from the `HYPHA_TOKEN` environment variable, which can also be set in a `.env` file:

```bash
HYPHA_TOKEN=your-token-here
```

In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

API_TOKEN = os.getenv('HYPHA_TOKEN')
if not API_TOKEN:
    raise RuntimeError('HYPHA_TOKEN not found in environment')

## 3. Connect to the 24Agents backend

We use the Hypha RPC client to connect and retrieve the `artifact-manager` service, which provides access to datasets.

In [None]:
from hypha_rpc import connect_to_server

server = await connect_to_server(
    server_url='https://hypha.aicell.io',
    token=API_TOKEN
)

artifact_manager = await server.get_service('public/services/artifact-manager')

## 4. Locate datasets of interest

We reference the three datasets by their artifact IDs in the `24agents-science` workspace.

In [None]:
DATASETS = {
    'bindingdb': '24agents-science/dataset-bindingdb-all-202409',
    'repurposing_smiles': '24agents-science/dataset-broad-repurposing-hub-molecule-with-smiles',
    'repurposing_meta': '24agents-science/dataset-broad-repurposing-hub-phase-moa-target-info'
}

DATASETS

## 5. Download and load data

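For each artifact, we list its files, request a pre-signed download URL for the first file, fetch it with `httpx`, and parse it with pandas: CSV and TSV files as text, anything else as Parquet.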

In [None]:
import io

import httpx
import pandas as pd

async def load_dataset(artifact_id):
    """Download the first file listed in an artifact and load it into a DataFrame."""
    files = await artifact_manager.list_files(artifact_id=artifact_id)
    data_file = [f for f in files if f['type'] == 'file'][0]['name']
    # get_file returns a pre-signed download URL, not the file content itself
    url = await artifact_manager.get_file(artifact_id=artifact_id, file_path=data_file)
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        resp.raise_for_status()
    if data_file.endswith(('.csv', '.tsv')):
        sep = '\t' if data_file.endswith('.tsv') else ','
        return pd.read_csv(io.StringIO(resp.text), sep=sep)
    # Parquet needs pyarrow or fastparquet, neither of which is in the install list above
    return pd.read_parquet(io.BytesIO(resp.content))

bindingdb = await load_dataset(DATASETS['bindingdb'])
repurposing_smiles = await load_dataset(DATASETS['repurposing_smiles'])
repurposing_meta = await load_dataset(DATASETS['repurposing_meta'])
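`load_dataset` takes the first file in the artifact listing, which could be a README or manifest rather than the table itself. A slightly more defensive selection could prefer known tabular extensions. The helper below is a minimal sketch: `pick_data_file` and `TABULAR_EXTENSIONS` are hypothetical names, and it assumes each listing entry is a dict with `'type'` and `'name'` keys, as in the cell above. The listing used in the example is hand-written, not real artifact contents.

```python
# Hypothetical preference order for choosing the data file in an artifact
TABULAR_EXTENSIONS = ('.csv', '.tsv', '.parquet')


def pick_data_file(files):
    """Return the name of the first file entry with a tabular extension.

    Assumes each entry is a dict with 'type' and 'name' keys, matching
    what load_dataset reads from artifact_manager.list_files.
    """
    names = [f['name'] for f in files if f['type'] == 'file']
    for ext in TABULAR_EXTENSIONS:
        for name in names:
            if name.lower().endswith(ext):
                return name
    raise ValueError(f'no tabular file among {names}')


# Hand-written listing for illustration only
listing = [
    {'type': 'directory', 'name': 'docs'},
    {'type': 'file', 'name': 'README.md'},
    {'type': 'file', 'name': 'data.tsv'},
]
print(pick_data_file(listing))  # data.tsv
```

Inside `load_dataset`, `data_file = pick_data_file(files)` would replace the list-comprehension line.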

## 6. Define a target-centric question

As an example, we focus on a single protein target and ask:

> *Which clinically advanced compounds show strong binding evidence for this target?*

In [None]:
TARGET_NAME = 'EGFR'

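# Case-insensitive literal match; Kd strings such as '>10000' lose their qualifier before conversion
mask = bindingdb['Target Name'].str.contains(TARGET_NAME, case=False, regex=False, na=False)
target_hits = bindingdb[mask].copy()
kd = target_hits['Kd (nM)'].astype(str).str.lstrip('<> ')
target_hits['Kd (nM)'] = pd.to_numeric(kd, errors='coerce')
target_hits = target_hits.dropna(subset=['Kd (nM)'])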
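A substring match on a gene symbol such as `'EGFR'` only finds rows whose `Target Name` literally contains that symbol, and BindingDB may record the same protein under its full name (for example "Epidermal growth factor receptor"). A minimal sketch of matching against several synonyms at once, assuming the synonym list is supplied by hand; `TARGET_SYNONYMS` and `match_target` are hypothetical names, and the toy frame is illustrative rather than BindingDB data.

```python
import re

import pandas as pd

# Hypothetical synonym list; in practice it could come from UniProt or HGNC
TARGET_SYNONYMS = ['EGFR', 'Epidermal growth factor receptor']


def match_target(df, synonyms, column='Target Name'):
    """Keep rows whose target column contains any synonym (case-insensitive, literal)."""
    # Escape each synonym so characters like '(' or '+' are matched literally
    pattern = '|'.join(re.escape(s) for s in synonyms)
    mask = df[column].str.contains(pattern, case=False, regex=True, na=False)
    return df[mask]


toy = pd.DataFrame({
    'Target Name': [
        'Epidermal growth factor receptor',
        'EGFR [L858R]',
        'Carbonic anhydrase 2',
        None,
    ],
    'Kd (nM)': ['1.2', '>10000', '35', '4'],
})
hits = match_target(toy, TARGET_SYNONYMS)
print(hits['Target Name'].tolist())
# ['Epidermal growth factor receptor', 'EGFR [L858R]']
```

Calling `match_target(bindingdb, TARGET_SYNONYMS)` in place of the `str.contains` filter in the cell above leaves the rest of the pipeline unchanged.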

## 7. Integrate repurposing metadata

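We annotate binding hits with clinical phase and mechanism-of-action information by joining BindingDB ligand names to Repurposing Hub compound names, then joining the result to the metadata table on compound ID.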

In [None]:
merged = target_hits.merge(
    repurposing_smiles,
    left_on='Ligand Name',
    right_on='compound_name',
    how='left'
).merge(
    repurposing_meta,
    on='compound_id',
    how='left'
)
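Joining on raw names is sensitive to case and stray whitespace, and a `how='left'` join silently leaves unmatched hits with empty metadata. A minimal sketch that normalizes names into a shared join key and uses pandas' `indicator=True` option to report how many hits found a match; the `name_key` helper and the toy frames are hypothetical, while the column names mirror the cell above.

```python
import pandas as pd


def name_key(names):
    """Lower-case and trim compound names so 'Drug A ' and 'drug a' join."""
    return names.astype(str).str.strip().str.lower()


# Toy tables for illustration only
hits = pd.DataFrame({
    'Ligand Name': ['Drug A ', 'DRUG B', 'unlisted-1'],
    'Kd (nM)': [1.0, 0.5, 50.0],
})
smiles = pd.DataFrame({
    'compound_name': ['drug a', 'drug b'],
    'compound_id': ['C1', 'C2'],
})

joined = hits.assign(_key=name_key(hits['Ligand Name'])).merge(
    smiles.assign(_key=name_key(smiles['compound_name'])),
    on='_key',
    how='left',
    indicator=True,  # adds a '_merge' column with 'both' or 'left_only'
)
match_rate = (joined['_merge'] == 'both').mean()
print(f'{match_rate:.0%} of hits matched')  # 67% of hits matched
```

Rows with `_merge == 'left_only'` are hits whose names never matched. BindingDB ligand names can include patent or ChEMBL identifiers instead of drug names, so a structure-based key (for example an InChIKey column, if both tables provide one) may match more reliably.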

## 8. Rank candidate compounds

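We score each hit as the product of an affinity term and a clinical-maturity weight. Affinity is expressed as pKd (-log10 of Kd in molar units), which is positive and larger for tighter binders, so a more advanced clinical phase raises the score. Using raw `-Kd` instead would make the product more negative for advanced compounds and invert the intended ranking.

In [None]:
import numpy as np

def phase_score(phase):
    # Clinical-maturity weight; the Broad Repurposing Hub labels marketed drugs 'Launched'
    mapping = {
        'Launched': 3,
        'Approved': 3,
        'Phase 3': 2,
        'Phase 2': 1.5,
        'Phase 1': 1,
    }
    # Other phases (e.g. 'Preclinical') and unmatched compounds (NaN) get 0.5
    return mapping.get(str(phase), 0.5)

# pKd = -log10(Kd [M]) = 9 - log10(Kd [nM]); Kd strings such as '>10000' lose their qualifier
kd_nm = pd.to_numeric(merged['Kd (nM)'].astype(str).str.lstrip('<> '), errors='coerce')
merged['affinity_score'] = 9 - np.log10(kd_nm.where(kd_nm > 0))
merged['phase_score'] = merged['clinical_phase'].apply(phase_score)
merged['total_score'] = merged['affinity_score'] * merged['phase_score']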

ranked = merged.sort_values('total_score', ascending=False)
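A small worked example on hand-made values shows why the affinity term should be positive before it is multiplied by the phase weight. The two compounds below have identical Kd and differ only in phase; the values are illustrative, not BindingDB data.

```python
import numpy as np
import pandas as pd

# Two illustrative compounds with identical affinity but different clinical phases
toy = pd.DataFrame({
    'compound': ['launched', 'preclinical'],
    'kd_nm': [10.0, 10.0],
    'phase_score': [3.0, 0.5],
})

# Raw negative Kd: the larger phase weight makes the product more negative
toy['neg_kd_total'] = -toy['kd_nm'] * toy['phase_score']

# pKd = -log10(Kd in M) = 9 - log10(Kd in nM); 10 nM gives pKd 8
toy['pkd'] = 9 - np.log10(toy['kd_nm'])
toy['pkd_total'] = toy['pkd'] * toy['phase_score']

print(toy[['compound', 'neg_kd_total', 'pkd_total']])
print(toy.sort_values('neg_kd_total', ascending=False)['compound'].iloc[0])  # preclinical
print(toy.sort_values('pkd_total', ascending=False)['compound'].iloc[0])  # launched
```

With `-Kd` the totals are -30 and -5, so the preclinical compound ranks first; with pKd they are 24 and 4, and the launched compound ranks first as intended.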

## 9. Inspect top candidates

In [None]:
ranked[[
    'Ligand Name',
    'Kd (nM)',
    'clinical_phase',
    'mechanism_of_action',
    'total_score'
]].head(10)
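BindingDB often reports several Kd measurements for the same ligand and target, so the top ten rows can repeat one compound. A minimal sketch of two ways to collapse repeats, assuming `ranked` has the `Ligand Name` and `total_score` columns built above; the toy frame and its scores are illustrative only.

```python
import pandas as pd

# Toy ranking with a repeated ligand (illustrative values only)
ranked_toy = pd.DataFrame({
    'Ligand Name': ['cmpd-1', 'cmpd-1', 'cmpd-2', 'cmpd-3'],
    'total_score': [24.0, 21.0, 18.0, 6.0],
}).sort_values('total_score', ascending=False)

# The frame is sorted best-first, so keep='first' retains each ligand's top row
best_per_ligand = ranked_toy.drop_duplicates(subset='Ligand Name', keep='first')
print(best_per_ligand['Ligand Name'].tolist())  # ['cmpd-1', 'cmpd-2', 'cmpd-3']

# Alternative: summarise all measurements per ligand, here with the median score
median_scores = (
    ranked_toy.groupby('Ligand Name')['total_score']
    .median()
    .sort_values(ascending=False)
)
print(median_scores.to_dict())
```

On the real data, `ranked.drop_duplicates(subset='Ligand Name').head(10)` would list ten distinct ligands. Keeping the best row favours a single strong measurement, while the median is less sensitive to one outlying assay.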

## 10. Visualize the trade-off between affinity and maturity

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 4))
plt.scatter(ranked['affinity_score'], ranked['phase_score'], alpha=0.5)
plt.xlabel('Affinity score (higher is better)')
plt.ylabel('Clinical phase score')
plt.title(f'Target: {TARGET_NAME}')
plt.show()

## 11. Next steps

This workflow can be extended by:
- Adding DDInter data to flag risky combinations
- Incorporating target expression (e.g. DepMap)
- Persisting ranked candidates back to 24Agents as a new dataset artifact

Loading additional datasets hosted on 24Agents and persisting results back as new artifacts both go through the same `artifact-manager` service used above.