# ClinicalTrials.gov API → Portfolio Context & ROI Scenarios

This notebook fetches **trial metadata** from ClinicalTrials.gov and demonstrates how to:
1. Build a **portfolio view** (counts, statuses, durations)  
2. Feed **ROI scenarios** (e.g., external control arms) using the `scripts/roi.py` model.

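> Uses the legacy *StudyFields* endpoint for broad coverage. ClinicalTrials.gov has since retired this classic API; if the request below fails, switch to the v2 API (`https://clinicaltrials.gov/api/v2/studies`).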

In [None]:
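import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# Local ROI model (scripts/roi.py); run the notebook from the repository root so it is importable
from scripts.roi import TrialScenario, roi_summary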

pd.set_option('display.max_columns', 50)
print("Setup complete.")

## 1) Fetch trials by condition/keyword

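Adjust `expr` to your therapeutic area (TA) of interest (e.g., oncology). The request returns only the top 300 ranked matches (`min_rnk`/`max_rnk`); widen that window for fuller coverage.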

In [None]:
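# Legacy StudyFields endpoint (classic API, since retired); see the v2 note above
BASE = "https://clinicaltrials.gov/api/query/study_fields"
params = {
    "expr": "oncology",  # search expression; change as needed
    "fields": "NCTId,Condition,OverallStatus,StartDate,CompletionDate,StudyType,Phase,EnrollmentCount",
    "min_rnk": 1,    # rank window: studies 1..300 of the result set
    "max_rnk": 300,
    "fmt": "json",
}
r = requests.get(BASE, params=params, timeout=30)
r.raise_for_status()
# Each record maps a field name to a list of string values
data = r.json()["StudyFieldsResponse"]["StudyFields"]
len(data)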
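
Because the classic endpoint has been retired, here is a minimal sketch of fetching the same fields from the v2 API (`https://clinicaltrials.gov/api/v2/studies`). It assumes the v2 JSON layout (`studies[].protocolSection...`), `nextPageToken` pagination, and the `query.term` / `pageSize` / `pageToken` parameters as documented at the time of writing; verify them against the current API reference. The helper names `flatten_v2_study` and `iter_v2_studies` are hypothetical. Each study is reshaped into the StudyFields-style dict of string lists, so the normalization cell below can consume it unchanged.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Assumed v2 endpoint; check against the current ClinicalTrials.gov API docs.
V2_BASE = "https://clinicaltrials.gov/api/v2/studies"

def _get(d: Any, *path: str) -> Any:
    """Walk nested dicts along `path`; return None if any key is missing."""
    for key in path:
        if not isinstance(d, dict):
            return None
        d = d.get(key)
    return d

def _as_list(x: Any) -> List[str]:
    """Wrap a value as a list of strings (StudyFields style); None -> []."""
    if x is None:
        return []
    if isinstance(x, list):
        return [str(v) for v in x]
    return [str(x)]

def flatten_v2_study(study: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map one v2 study onto the StudyFields-style record used in this notebook.
    Field paths are assumptions about the v2 schema."""
    ps = study.get("protocolSection", {})
    return {
        "NCTId": _as_list(_get(ps, "identificationModule", "nctId")),
        "Condition": _as_list(_get(ps, "conditionsModule", "conditions")),
        "OverallStatus": _as_list(_get(ps, "statusModule", "overallStatus")),
        "StartDate": _as_list(_get(ps, "statusModule", "startDateStruct", "date")),
        "CompletionDate": _as_list(_get(ps, "statusModule", "completionDateStruct", "date")),
        "StudyType": _as_list(_get(ps, "designModule", "studyType")),
        "Phase": _as_list(_get(ps, "designModule", "phases")),
        "EnrollmentCount": _as_list(_get(ps, "designModule", "enrollmentInfo", "count")),
    }

def iter_v2_studies(
    get_json: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    term: str,
    page_size: int = 100,
    max_studies: Optional[int] = None,
) -> Iterator[Dict[str, List[str]]]:
    """Follow nextPageToken pagination, yielding flattened records."""
    params: Dict[str, Any] = {"query.term": term, "pageSize": page_size, "format": "json"}
    seen = 0
    while True:
        page = get_json(V2_BASE, params)
        for study in page.get("studies", []):
            yield flatten_v2_study(study)
            seen += 1
            if max_studies is not None and seen >= max_studies:
                return
        token = page.get("nextPageToken")
        if not token:
            return
        params = {**params, "pageToken": token}  # request the next page

# Usage (network call, not run here):
# def get_json(url, params):
#     r = requests.get(url, params=params, timeout=30)
#     r.raise_for_status()
#     return r.json()
# data = list(iter_v2_studies(get_json, "oncology", max_studies=300))
```

Passing the HTTP call in as `get_json` keeps the pagination logic testable offline. Note that v2 reports enum-style values (e.g., `PHASE2`, `RECRUITING`), so labels will differ from the legacy API's `Phase 2` / `Recruiting`.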

## 2) Normalize + derive durations

In [None]:
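def first(x):
    """Return the first element of a StudyFields list value, or None if empty."""
    return x[0] if isinstance(x, list) and x else None

def to_dt(s):
    """Parse ClinicalTrials.gov date strings; month-only dates map to the 1st."""
    if not s:
        return None
    # Legacy API: "January 2020" / "January 15, 2020"; v2 API: "2020-01" / "2020-01-15"
    for fmt in ("%B %Y", "%B %d, %Y", "%Y-%m", "%Y-%m-%d", "%Y"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    return None

rows = []
for rec in data:
    start = to_dt(first(rec.get("StartDate")))
    # CompletionDate can be an anticipated (future) date for ongoing trials
    comp = to_dt(first(rec.get("CompletionDate")))
    dur_m = (comp - start).days / 30.4 if (start and comp) else None  # ~30.4 days per month
    enroll = first(rec.get("EnrollmentCount"))
    rows.append({
        "nctid": first(rec.get("NCTId")),
        "condition": first(rec.get("Condition")),  # first listed condition only
        "status": first(rec.get("OverallStatus")),
        "study_type": first(rec.get("StudyType")),
        "phase": first(rec.get("Phase")),  # multi-phase trials keep only the first phase
        "enrollment": int(enroll) if enroll else None,
        "start": start,
        "completion": comp,
        "duration_months": dur_m,
    })
df = pd.DataFrame(rows)
df.head()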
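
The `/ 30.4` conversion treats every month as roughly 30.4 days. Since month-only dates such as `January 2020` parse to the 1st of the month, durations are only accurate to about a month either way. A whole-calendar-month difference is an alternative convention for such dates; the sketch below compares the two. The helper names are illustrative, and it uses `365.25 / 12` as the average month length.

```python
from datetime import datetime

AVG_DAYS_PER_MONTH = 365.25 / 12  # about 30.44 days

def months_by_days(start: datetime, end: datetime) -> float:
    """Duration as elapsed days divided by an average month length."""
    return (end - start).days / AVG_DAYS_PER_MONTH

def months_by_calendar(start: datetime, end: datetime) -> int:
    """Whole calendar months between two dates, ignoring the day of month.
    Suits month-precision dates such as 'January 2020' -> datetime(2020, 1, 1)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

s, e = datetime(2020, 1, 1), datetime(2023, 1, 1)
print(months_by_calendar(s, e))        # 36
print(round(months_by_days(s, e), 2))  # 36.01 (1096 days, including a leap year)
```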

## 3) Portfolio view

In [None]:
status_counts = df['status'].value_counts(dropna=False)  # already sorted by count, descending
status_counts

In [None]:
fig, ax = plt.subplots()
ax.bar(status_counts.index.astype(str), status_counts.values)
ax.set_title("Trial Status Distribution")
ax.set_ylabel("Number of trials")
ax.tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
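
Status counts alone leave out the duration dimension of the portfolio view. One way to add it, sketched below with a hypothetical `duration_summary` helper, is to tabulate duration quartiles per status. Splitting by status matters because `CompletionDate` is an anticipated date for many ongoing trials, so durations for recruiting trials are planned rather than realized.

```python
import pandas as pd

def duration_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-status trial count, share of trials with a derivable duration,
    and duration quartiles in months."""
    d = df.assign(
        status=df["status"].fillna("UNKNOWN"),
        duration_months=pd.to_numeric(df["duration_months"], errors="coerce"),
    )
    g = d.groupby("status")["duration_months"]
    out = pd.DataFrame({
        "n_trials": g.size(),
        "share_with_duration": g.count() / g.size(),  # count() skips NaN
        "p25_months": g.quantile(0.25),
        "median_months": g.median(),
        "p75_months": g.quantile(0.75),
    })
    return out.sort_values("n_trials", ascending=False)

# Usage in this notebook: duration_summary(df)
```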

## 4) ROI scenario (illustrative)

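We seed an illustrative ROI scenario with the portfolio's median enrollment (split evenly between treatment and control arms) and median duration, falling back to 300 patients and 36 months when the fetched data lack those fields. The cost, probability, discount-rate, and benefit inputs are placeholder assumptions; replace them with program-specific values.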

In [None]:
med_enroll = int(df['enrollment'].dropna().median()) if df['enrollment'].notna().any() else 300
med_duration = float(df['duration_months'].dropna().median()) if df['duration_months'].notna().any() else 36.0

trial = TrialScenario(
    baseline_duration_months=med_duration,
    patients_treatment=med_enroll // 2,  # median enrollment split evenly across arms
    patients_control=med_enroll // 2,
    cost_per_patient_usd=50_000,
    prob_reg_accept_rwe=0.65,
    prob_reg_accept_trad=0.55,
    discount_rate_annual=0.10,
    monthly_benefit_usd=5_000_000,
)
res = roi_summary(trial, months_saved_with_rwe=6)
res
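
`scripts/roi.py` is not shown here, so its exact formulas are unknown. As a hypothetical illustration only, the sketch below shows one common way inputs like these could combine: an annual discount rate converted to a compounded monthly rate, a monthly benefit received for each of `months_saved` months and discounted month by month, and a direct saving from not enrolling a randomized control arm. None of these function names come from `roi.py`, and its outputs may differ.

```python
def monthly_rate(annual_rate: float) -> float:
    """Equivalent compounded monthly rate for an annual discount rate."""
    return (1.0 + annual_rate) ** (1.0 / 12.0) - 1.0

def pv_time_benefit(monthly_benefit: float, months_saved: int, annual_rate: float) -> float:
    """Present value of `monthly_benefit` received at the end of each of the
    first `months_saved` months. Hypothetical; not necessarily roi.py's formula."""
    r = monthly_rate(annual_rate)
    return sum(monthly_benefit / (1.0 + r) ** t for t in range(1, months_saved + 1))

def control_arm_savings(patients_control: int, cost_per_patient: float) -> float:
    """Hypothetical direct saving if an external control arm fully replaced
    the randomized control arm."""
    return patients_control * cost_per_patient

print(f"{pv_time_benefit(5_000_000, 6, 0.10):,.0f}")
print(f"{control_arm_savings(150, 50_000):,.0f}")  # 7,500,000
```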

In [None]:
fig, ax = plt.subplots()
labels = ["Direct Savings", "Time Benefit", "EV Uplift"]
values = [res["savings"], res["discounted_benefit"], res["ev_uplift"]]
ax.bar(labels, values)
ax.set_title("Benefit Components (USD)")
for i, v in enumerate(values):
    ax.text(i, v, f"{int(v):,}", ha="center", va="bottom", rotation=90)
plt.tight_layout()
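
A single point estimate hides how sensitive the result is to the assumed time saving. Below is a minimal one-way sensitivity sketch, assuming `roi_summary` returns a dict with the `savings`, `discounted_benefit`, and `ev_uplift` keys used in the plot cell. `sweep_months_saved` is a hypothetical helper that accepts any callable, so it can be exercised without `scripts.roi`.

```python
from typing import Callable, Dict, Iterable, Tuple
import pandas as pd

def sweep_months_saved(
    summarize: Callable[[int], Dict[str, float]],
    months: Iterable[int],
    keys: Tuple[str, ...] = ("savings", "discounted_benefit", "ev_uplift"),
) -> pd.DataFrame:
    """Call `summarize(m)` for each candidate time saving and tabulate the chosen keys."""
    rows = []
    for m in months:
        res = summarize(m)
        rows.append({"months_saved": m, **{k: float(res[k]) for k in keys}})
    return pd.DataFrame(rows).set_index("months_saved")

# Usage in this notebook (assumes `trial` and `roi_summary` from the cells above):
# sens = sweep_months_saved(lambda m: roi_summary(trial, months_saved_with_rwe=m), [0, 3, 6, 9, 12])
# sens.plot(marker="o")
```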

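*Next:* switch `expr` to your priority therapeutic area, stratify by phase, and compare scenarios across geographies or sponsors (the latter requires requesting sponsor and location fields in addition to those fetched above).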
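
For the phase stratification, a hypothetical `phase_seeds` helper could compute per-phase medians to seed one `TrialScenario` per phase, reusing the 300-patient / 36-month fallbacks from the ROI cell. Because the normalization step keeps only the first listed phase, multi-phase trials (e.g., Phase 1/2) are counted under their first phase.

```python
import pandas as pd

def phase_seeds(df: pd.DataFrame, default_enroll: float = 300,
                default_duration: float = 36.0) -> pd.DataFrame:
    """Per-phase trial count plus median enrollment and duration,
    with fallbacks where a phase has no usable values."""
    d = df.assign(
        phase=df["phase"].fillna("NOT REPORTED"),
        enrollment=pd.to_numeric(df["enrollment"], errors="coerce"),
        duration_months=pd.to_numeric(df["duration_months"], errors="coerce"),
    )
    g = d.groupby("phase")
    return pd.DataFrame({
        "n_trials": g.size(),
        "median_enrollment": g["enrollment"].median().fillna(default_enroll),
        "median_duration_months": g["duration_months"].median().fillna(default_duration),
    })

# Usage in this notebook (assumes `df`, `TrialScenario`, `roi_summary` from cells above):
# for phase, row in phase_seeds(df).iterrows():
#     n = int(row["median_enrollment"])
#     t = TrialScenario(baseline_duration_months=float(row["median_duration_months"]),
#                       patients_treatment=n // 2, patients_control=n // 2,
#                       cost_per_patient_usd=50_000, prob_reg_accept_rwe=0.65,
#                       prob_reg_accept_trad=0.55, discount_rate_annual=0.10,
#                       monthly_benefit_usd=5_000_000)
#     print(phase, roi_summary(t, months_saved_with_rwe=6))
```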