# CDC API → Public Health Signals into RWE Governance

This notebook pulls **CDC open data** (example: state-level COVID-19 cases)
to show how public health signals can feed **real-world evidence (RWE) governance** checks such as timeliness and completeness.

In [None]:
import pandas as pd
import numpy as np
import requests
import matplotlib.pyplot as plt
from datetime import datetime

print("Setup complete.")

## 1) Fetch CDC data via Socrata API

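Example endpoint: `https://data.cdc.gov/resource/9mfq-cb36.json` (United States COVID-19 Cases and Deaths by State over Time). CDC appears to have archived this dataset, so a recent-window timeliness check may legitimately return 0.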

In [None]:
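URL = "https://data.cdc.gov/resource/9mfq-cb36.json"
# SoQL parameters: select the needed columns and request the newest rows first,
# because row order is not guaranteed when $limit is used without $order.
params = {
    "$select": "submission_date,state,tot_cases,conf_cases,prob_cases,new_case",
    "$order": "submission_date DESC",
    "$limit": 5000,
}
r = requests.get(URL, params=params, timeout=30)
r.raise_for_status()  # fail loudly on HTTP errors
raw = r.json()        # list of dicts, one per row
len(raw)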
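
A single request returns at most `$limit` rows, so a full pull needs paging. The sketch below pages with SoQL's `$limit` and `$offset` parameters. The `fetch_all` helper, its `page_size` and `max_pages` arguments, and the injectable `get_page` callable are hypothetical names introduced here for illustration. The `$order` clause is an assumption made to keep pages stable between requests.

```python
def fetch_all(url, select, page_size=5000, max_pages=100, get_page=None):
    """Collect rows page by page until a short page signals the end.

    get_page is a hypothetical hook: any callable that takes the SoQL
    params dict and returns a list of rows. It defaults to an HTTP GET.
    """
    if get_page is None:
        import requests  # imported lazily so the helper can be exercised offline

        def get_page(params):
            r = requests.get(url, params=params, timeout=30)
            r.raise_for_status()
            return r.json()

    rows = []
    for page in range(max_pages):
        params = {
            "$select": select,
            "$order": "submission_date, state",  # assumed stable sort key
            "$limit": page_size,
            "$offset": page * page_size,
        }
        batch = get_page(params)
        rows.extend(batch)
        if len(batch) < page_size:  # last (possibly empty) page reached
            break
    return rows
```

In this notebook it could be called as `raw = fetch_all(URL, params["$select"])`. The `max_pages` cap is there only to bound the loop if the endpoint keeps returning full pages.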

## 2) Basic quality/timeliness checks

In [None]:
df = pd.DataFrame(raw)
df['submission_date'] = pd.to_datetime(df['submission_date'], errors='coerce')
for col in ['tot_cases','conf_cases','prob_cases','new_case']:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df.head()

In [None]:
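# The parsed submission dates are tz-naive, so build a tz-naive UTC cutoff;
# comparing them with a tz-aware timestamp raises a TypeError.
cutoff = pd.Timestamp.now(tz="UTC").tz_localize(None) - pd.Timedelta(days=14)
timeliness = (df['submission_date'] >= cutoff).mean()  # share of rows from the last 14 days
completeness = df[['tot_cases','conf_cases','prob_cases','new_case']].notna().mean().mean()  # mean non-null share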
timeliness, completeness
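
The two numbers above summarize the whole extract. A governance scorecard usually needs the same checks per reporting unit. The following is a minimal sketch of a per-state breakdown, assuming the `df` columns used in this notebook. The `state_scorecard` function, its `as_of` and `window_days` parameters, and the output column names are hypothetical.

```python
import pandas as pd

def state_scorecard(df, as_of, window_days=14):
    """Per-state timeliness and completeness (illustrative column names)."""
    value_cols = ["tot_cases", "conf_cases", "prob_cases", "new_case"]
    cutoff = as_of - pd.Timedelta(days=window_days)

    # Fraction of non-null count fields in each row, averaged per state.
    row_completeness = df[value_cols].notna().mean(axis=1)

    out = pd.DataFrame({
        "latest_submission": df.groupby("state")["submission_date"].max(),
        "completeness": row_completeness.groupby(df["state"]).mean(),
    })
    # Days between the reference date and each state's newest record.
    out["lag_days"] = (as_of - out["latest_submission"]).dt.days
    out["is_timely"] = out["latest_submission"] >= cutoff
    return out.sort_values("lag_days")
```

Passing `as_of` explicitly keeps the check reproducible, for example `state_scorecard(df, as_of=pd.Timestamp.now(tz="UTC").tz_localize(None))`.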

In [None]:
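# Sum new cases across states for each reporting date
# (groupby skips unparseable dates, and sum() never returns NaN).
daily = df.groupby('submission_date')['new_case'].sum()
fig, ax = plt.subplots()
ax.plot(daily.index, daily.values)
ax.set_title("New Cases Over Time (CDC example)")
ax.set_xlabel("Submission date")
ax.set_ylabel("New cases (sum across states)")
fig.autofmt_xdate()  # tilt date labels so they do not overlap
plt.tight_layout()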

*Next:* Join CDC signals with trial geographies to inform enrollment feasibility or external control calibration, and feed the results into the governance scorecard.
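
One possible shape for that join is sketched below. It assumes a hypothetical `sites` table with `site_id` and `state` columns, which is not part of this notebook, and a CDC frame with the columns used above. The `attach_state_signal` function, the `recent_new_cases` column, and the 28-day window are illustrative choices, not an established method.

```python
import pandas as pd

def attach_state_signal(sites, cdc, as_of, window_days=28):
    """Left-join a recent state-level case count onto trial sites."""
    start = as_of - pd.Timedelta(days=window_days)
    in_window = (cdc["submission_date"] > start) & (cdc["submission_date"] <= as_of)

    # Total new cases per state within the window.
    signal = (
        cdc[in_window]
        .groupby("state", as_index=False)["new_case"].sum()
        .rename(columns={"new_case": "recent_new_cases"})
    )
    # Many sites may share a state; each state appears once in the signal.
    return sites.merge(signal, on="state", how="left", validate="many_to_one")
```

Sites in states with no rows in the window come back with a missing `recent_new_cases`, which a scorecard can flag as a coverage gap rather than treat as zero.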