# Notebook 3: ΔAUC Bias and One-Tailed Paired Test

This notebook computes **ΔAUC**, defined as AUC(MAR) − AUC(Non-MAR), and runs a **one-tailed paired test** on its bootstrap replicates, following the Annex GG recommendations.


In [None]:
import numpy as np
from mareval import compute_auc, bootstrap_auc_difference

rng = np.random.default_rng(123)
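N = 800  # number of cases per class (lesion-absent and lesion-present)
# Non-MAR dataset: lesion-absent scores ~ N(0, 1), lesion-present scores ~ N(0.35, 1)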
x0 = rng.normal(0, 1, size=N)
x1 = rng.normal(0.35, 1, size=N)
labels = np.r_[np.zeros(N), np.ones(N)]
scores_nonmar = np.r_[x0, x1]
auc_nonmar = compute_auc(labels, scores_nonmar)

# MAR dataset: larger mean shift for lesion-present scores (0.55 vs 0.35 without MAR)
y0 = rng.normal(0, 1, size=N)
y1 = rng.normal(0.55, 1, size=N)
scores_mar = np.r_[y0, y1]
auc_mar = compute_auc(labels, scores_mar)

print('AUC (Non-MAR):', round(auc_nonmar, 3))
print('AUC (MAR):    ', round(auc_mar, 3))

# Bootstrap ΔAUC with paired resampling
res = bootstrap_auc_difference(labels, scores_mar, scores_nonmar, n_boot=2000, rng=rng)
print('\nΔAUC (mean):', round(res["delta_auc_mean"], 4))
print('ΔAUC CI (95%): [{:.4f}, {:.4f}]'.format(res["ci_low"], res["ci_high"]))
print('One-tailed p-value (MAR > Non-MAR):', res["p_value_one_tailed"])

**Interpretation (per Annex GG):**

- **ΔAUC (bias)**: positive values indicate improved lesion detectability with MAR.
- **One-tailed paired test**: tests H₀: ΔAUC ≤ 0 vs H₁: ΔAUC > 0, so a small p-value supports the claim that MAR improves detectability.
- Report **ΔAUC**, **95% CI**, and **p-value** alongside scanner/configuration metadata.
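
To make the quantities above concrete, here is a minimal sketch of a paired bootstrap of ΔAUC with a one-tailed p-value. It is not the `mareval` implementation, whose internals are not shown in this notebook. The function name `paired_bootstrap_delta_auc`, the stratified resampling, and the p-value convention (the share of replicates with ΔAUC ≤ 0, with a +1 correction) are assumptions chosen for illustration. The result keys simply mirror those printed above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def paired_bootstrap_delta_auc(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Illustrative paired bootstrap of AUC(scores_a) - AUC(scores_b).

    Hypothetical helper, not part of mareval.
    """
    labels = np.asarray(labels)
    scores_a = np.asarray(scores_a)
    scores_b = np.asarray(scores_b)
    rng = np.random.default_rng(seed)

    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    deltas = np.empty(n_boot)

    for b in range(n_boot):
        # Resample within each class so every replicate contains both classes.
        # The same indices are used for both score sets, which makes it paired.
        idx = np.concatenate([
            rng.choice(pos, size=pos.size, replace=True),
            rng.choice(neg, size=neg.size, replace=True),
        ])
        auc_a = roc_auc_score(labels[idx], scores_a[idx])
        auc_b = roc_auc_score(labels[idx], scores_b[idx])
        deltas[b] = auc_a - auc_b

    # Percentile 95% confidence interval over the bootstrap replicates.
    ci_low, ci_high = np.percentile(deltas, [2.5, 97.5])

    # Assumed convention for H0: ΔAUC <= 0 vs H1: ΔAUC > 0:
    # share of replicates at or below zero, with a +1 correction
    # so the estimate is never exactly zero.
    p_one_tailed = (np.sum(deltas <= 0) + 1) / (n_boot + 1)

    return {
        "delta_auc_mean": float(deltas.mean()),
        "ci_low": float(ci_low),
        "ci_high": float(ci_high),
        "p_value_one_tailed": float(p_one_tailed),
    }


if __name__ == "__main__":
    # Synthetic scores generated the same way as in the cell above.
    rng = np.random.default_rng(123)
    n = 800
    labels = np.r_[np.zeros(n), np.ones(n)]
    scores_nonmar = np.r_[rng.normal(0, 1, n), rng.normal(0.35, 1, n)]
    scores_mar = np.r_[rng.normal(0, 1, n), rng.normal(0.55, 1, n)]
    print(paired_bootstrap_delta_auc(labels, scores_mar, scores_nonmar, n_boot=500))
```

The scores for the two conditions here are drawn independently, so reusing the same indices does not exploit any case-level correlation. With real data, where each case is read with and without MAR, that shared resampling is what lets the paired test account for the correlation between the two AUC estimates.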
