"The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification"
Tavor Baharav
AISTATS 2026 Spotlight (earlier version appeared in ICML 2025 EXAIT Workshop).
arXiv link: https://arxiv.org/abs/2510.01020
Patients arrive over time. For each patient $t$, the learner chooses one of two actions:
- Predict ($Z_t = 0$): predict the patient's disease status $Y_t$ without testing, incurring a potential misclassification error.
- Test ($Z_t = 1$): test the patient and observe the true label, then record it as the prediction.
The goal is to minimize the total number of tests used while ensuring the cumulative misclassification rate never exceeds a user-supplied tolerance $\alpha$.
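The test-or-predict protocol above can be sketched in a few lines of NumPy. This is only an illustration of the bookkeeping (tests used, cumulative error rate), not the SCOUT algorithm: the fixed-threshold policy, the function name `run_protocol`, and the assumption that the label probabilities `p` are known are all hypothetical placeholders.

```python
import numpy as np

def run_protocol(p, y, tau):
    """Simulate the test-or-predict protocol with a fixed-threshold placeholder policy.

    p   : (T,) probabilities P(Y_t = 1), assumed known here (hypothetical; unknown in practice)
    y   : (T,) true binary labels
    tau : predict without testing only when min(p_t, 1 - p_t) <= tau
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=int)
    uncertainty = np.minimum(p, 1.0 - p)
    z = (uncertainty > tau).astype(int)      # Z_t = 1 means "test"
    guess = (p >= 0.5).astype(int)           # prediction used when Z_t = 0
    y_hat = np.where(z == 1, y, guess)       # tested rounds record the true label
    errors = (y_hat != y).astype(int)        # errors can only occur on untested rounds
    t = np.arange(1, len(y) + 1)
    return {
        "z": z,
        "num_tests": int(z.sum()),
        "cum_test_rate": np.cumsum(z) / t,
        "cum_error_rate": np.cumsum(errors) / t,
    }

# Hypothetical stream: probabilities drawn uniformly, labels drawn from them
rng = np.random.default_rng(0)
T = 10_000
p = rng.uniform(0.0, 1.0, size=T)
y = rng.binomial(1, p)

out = run_protocol(p, y, tau=0.2)
print("tests used:", out["num_tests"])
print("final error rate:", out["cum_error_rate"][-1])
```

A larger `tau` uses fewer tests but admits more errors; the trade-off between the two quantities is exactly what the tolerance constrains.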
The disease label follows a logistic model:
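As a sketch under assumed notation, a standard logistic model has the form below. The feature vector $X_t \in \mathbb{R}^d$ and the parameter symbol $\theta^\star$ are assumptions inferred from the `theta_star` and `true_theta` identifiers used in the code examples in this README; the paper's exact statement may differ.

$$
\Pr(Y_t = 1 \mid X_t) = \sigma\big(\langle X_t, \theta^\star \rangle\big), \qquad \sigma(u) = \frac{1}{1 + e^{-u}}
$$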
Intuitively,
The opt baseline assumes full knowledge of
With probability at least
numpy
scipy
matplotlib
tqdm
Install with:
pip install numpy scipy matplotlib tqdm
python SCOUT.py
This calls generate_paper_figures(), which runs three experiments, (d=2, α=0.05), (d=2, α=0.10), and (d=8, α=0.10), and saves PDF figures to figures/.
import numpy as np
from SCOUT import eval_on_real_data
# X: (n_samples, n_features) float array
# Y: (n_samples,) binary int array {0, 1}
results = eval_on_real_data(X, Y, alpha=0.05, num_perms=10)
print("Oracle theta_star:", results['theta_star'])
print("Oracle threshold tau_star:", results['tau_star'])
print("Oracle test fraction:", results['opt_test_rate'])
eval_on_real_data fits θ* on the full dataset (the oracle), then runs SCOUT over num_perms random orderings of the data, comparing SCOUT's online decisions against the oracle baseline. Results are plotted and saved to figures/. See scout_tester.ipynb for an end-to-end example with synthetic data.
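To make the "fit on the full dataset, then replay random orderings" idea concrete, here is a minimal standalone sketch. It does not call SCOUT and may differ from what eval_on_real_data does internally: the helper name `fit_logistic_mle`, the unregularized maximum-likelihood fit, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_logistic_mle(X, Y):
    """Fit theta by minimizing the average logistic negative log-likelihood."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    def nll(theta):
        logits = X @ theta
        # log(1 + exp(z)) - y * z, computed stably with logaddexp
        return np.mean(np.logaddexp(0.0, logits) - Y * logits)

    def grad(theta):
        return X.T @ (expit(X @ theta) - Y) / len(Y)

    res = minimize(nll, x0=np.zeros(X.shape[1]), jac=grad, method="L-BFGS-B")
    return res.x

# Hypothetical synthetic data standing in for a real (X, Y) dataset
rng = np.random.default_rng(0)
n, d = 5000, 2
theta_true = np.array([1.0, -1.0])
X = rng.normal(size=(n, d))
Y = rng.binomial(1, expit(X @ theta_true))

# Offline fit on all n samples
theta_hat = fit_logistic_mle(X, Y)
print("fitted theta:", theta_hat)

# Random orderings of the same data, as an online algorithm would see them
num_perms = 3
orderings = [rng.permutation(n) for _ in range(num_perms)]
X_stream, Y_stream = X[orderings[0]], Y[orderings[0]]
```

Shuffling the row order changes only the sequence in which an online method sees the samples; the offline fit is unaffected, which is why it can serve as a fixed reference across permutations.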
from SCOUT import run_scout_experiment
import numpy as np
d, alpha = 4, 0.05
true_theta = np.ones(d) / np.sqrt(d)
results = run_scout_experiment(d=d, T=10_000, alpha=alpha, delta=0.1,
                               S=1, true_theta=true_theta, num_runs=20)
from SCOUT import generate_synthetic_data
import numpy as np
# Returns feature matrix X, binary labels Y, and the true_theta used.
X, Y, true_theta = generate_synthetic_data(d=4, n=5000, S=1, seed=42)
validate_on_synthetic_data generates a synthetic logistic dataset and immediately passes it through the full eval_on_real_data pipeline, so you can verify correctness against a known ground truth.
from SCOUT import validate_on_synthetic_data
results = validate_on_synthetic_data(d=2, n=5000, alpha=0.05, num_perms=5)
print("True theta:", results['true_theta'])
print("Fitted theta_star:", results['theta_star'])
A minimal smoke-test suite is included in test_scout.py. It covers all public API functions with small parameters (T=300, n=300) so it completes in under a minute.
# with pytest
pytest test_scout.py -v
# or standalone
python test_scout.py
Each experiment produces:
- A 3-panel PDF figure in figures/ showing cumulative test rate, excess tests (regret), and cumulative error rate, all with 10–90 percentile bands across runs.
- A .npz file in figures/ containing the raw per-run arrays for further analysis.
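For further analysis of per-run arrays, the 10–90 percentile bands can be recomputed with NumPy. The sketch below is self-contained and uses made-up data: the file name `example_runs.npz` and the key `cum_test_rate` are placeholders, so check the actual keys of a saved file (for example via `data.files`) before adapting it.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
num_runs, T = 20, 500

# Hypothetical per-run data: 0/1 test indicators for each run and round
tested = rng.binomial(1, 0.3, size=(num_runs, T))
cum_test_rate = np.cumsum(tested, axis=1) / np.arange(1, T + 1)

# Save and reload in .npz form; the file name and key are placeholders
path = os.path.join(tempfile.mkdtemp(), "example_runs.npz")
np.savez(path, cum_test_rate=cum_test_rate)

with np.load(path) as data:
    runs = data["cum_test_rate"]             # shape (num_runs, T)

# Percentiles across runs (axis 0) at every round
lo, med, hi = np.percentile(runs, [10, 50, 90], axis=0)
print(runs.shape, lo.shape)
```

The `lo` and `hi` arrays can then be passed to matplotlib's `fill_between` to draw a band around the median curve.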