# Method Comparison

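This notebook compares variants of conformal anomaly detection on a fixed benchmark dataset: empirical versus probabilistic p-value estimation, each in a standard and a weighted form.
If you are new to the topic, read the sections in order and watch how the p-value estimation and the selection rule change while the detector (HBOS) and the data stay fixed.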

## Imports

This section loads all dependencies used throughout the notebook.
The probabilistic configurations require the optional probabilistic extras.

```python
import logging

import pandas as pd
from oddball import Dataset, load
from pyod.models.hbos import HBOS
from scipy.stats import false_discovery_control

from nonconform import ConformalDetector, Empirical, Probabilistic, Split
from nonconform.enums import Pruning
from nonconform.fdr import weighted_false_discovery_control
from nonconform.metrics import false_discovery_rate, statistical_power
from nonconform.weighting import (
    BootstrapBaggedWeightEstimator,
    logistic_weight_estimator,
)

# Quiet the "nonconform" library logger: emit ERROR and above only, and
# attach a NullHandler so nothing prints when no logging is configured.
logger = logging.getLogger("nonconform")
if not logger.handlers:
    logger.addHandler(logging.NullHandler())
logger.setLevel(logging.ERROR)
```

## Setup

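We load the Shuttle benchmark with a fixed seed so that the outputs are reproducible.
Keeping a single dataset fixed isolates the differences between the estimation and selection methods.
The split strategy reserves `n_calib` points for calibration, and `alpha` is the target false discovery rate for every selection rule below.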

```python
x_train, x_test, y_test = load(Dataset.SHUTTLE, setup=True, seed=1)

n_calib = 1_000
strategy = Split(n_calib=n_calib)  # split conformal with n_calib calibration points
alpha = 0.2  # target FDR level for all selection rules

n_positives = int(y_test.sum())
print(f"x_train: {x_train.shape}, x_test: {x_test.shape}")
print(f"y_test positives: {n_positives}")
print(f"alpha={alpha}, calibration size={n_calib}")
```

```text
x_train: (22793, 9), x_test: (1000, 9)
y_test positives: 100
alpha=0.2, calibration size=1000
```
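Conceptually, a `Split` strategy with `Empirical()` estimation corresponds to the split-conformal recipe: fit the detector on part of the training data, score the held-out calibration points, and rank each test score among them. The minimal NumPy sketch below illustrates that recipe, assuming higher scores mean "more anomalous". The toy distance-to-mean detector, the synthetic data, and the helper names `empirical_p_values` and `split_conformal_demo` are hypothetical stand-ins for HBOS, Shuttle, and `nonconform`'s internals, whose exact implementation may differ (for example in tie handling).

```python
import numpy as np


def empirical_p_values(calib_scores, test_scores):
    """Conformal p-value p_j = (1 + #{i : s_i >= t_j}) / (n + 1).

    Assumes higher scores mean "more anomalous".
    """
    calib_scores = np.asarray(calib_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    n = calib_scores.size
    # For each test score, count the calibration scores at least as extreme.
    counts = (calib_scores[None, :] >= test_scores[:, None]).sum(axis=1)
    return (1 + counts) / (n + 1)


def split_conformal_demo(n_calib=1_000, seed=1):
    """Hypothetical end-to-end demo on synthetic data (not the Shuttle set)."""
    rng = np.random.default_rng(seed)
    train = rng.normal(size=(5_000, 9))  # nominal training data only
    test = np.vstack([
        rng.normal(size=(90, 9)),           # 90 nominal test points
        rng.normal(loc=4.0, size=(10, 9)),  # 10 shifted "anomalies"
    ])
    # Split: hold out n_calib rows for calibration, fit on the remainder.
    perm = rng.permutation(len(train))
    calib, fit = train[perm[:n_calib]], train[perm[n_calib:]]
    # Toy detector: Euclidean distance to the mean of the fitting portion.
    center = fit.mean(axis=0)
    calib_scores = np.linalg.norm(calib - center, axis=1)
    test_scores = np.linalg.norm(test - center, axis=1)
    return empirical_p_values(calib_scores, test_scores)


demo_p = split_conformal_demo()
print(demo_p[:90].mean(), demo_p[90:].max())
```

The smallest achievable p-value is 1 / (n_calib + 1), so the calibration size bounds how strong the evidence against a single test point can be.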


## Method Comparison

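We evaluate four configurations, each pairing a p-value estimation type with a selection rule.
Note that weighted p-values need their own multiple-testing adjustment: weighted conformal p-values are in general not PRDS (positively regression dependent on a subset), so the usual justification for applying BH to conformal p-values does not carry over, and the weighted rows use Weighted Conformal Selection (WCS) instead.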

1. Standard Empirical + Benjamini-Hochberg procedure (BH)
2. Standard Probabilistic + Benjamini-Hochberg procedure (BH)
3. Weighted Empirical + Weighted Conformal Selection (WCS)
4. Weighted Probabilistic + Weighted Conformal Selection (WCS)

```python
def summarize_row(name, decisions):
    """Compute summary metrics for one decision vector.

    "fdr" is the realized false discovery proportion on this test set.
    """
    return {
        "method": name,
        "discoveries": int(decisions.sum()),
        "fdr": float(false_discovery_rate(y=y_test, y_hat=decisions)),
        "power": float(statistical_power(y=y_test, y_hat=decisions)),
    }


rows = []

# 1) Standard Empirical + BH
ce = ConformalDetector(
    detector=HBOS(),
    strategy=strategy,
    estimation=Empirical(),
    seed=1,
)
ce.fit(x_train)
p_values = ce.compute_p_values(x_test)
decisions = false_discovery_control(p_values, method="bh") <= alpha
rows.append(summarize_row("Standard Empirical (BH)", decisions))

# 2) Standard Probabilistic + BH
pce = ConformalDetector(
    detector=HBOS(),
    strategy=strategy,
    estimation=Probabilistic(n_trials=10),
    seed=1,
)
pce.fit(x_train)
p_values = pce.compute_p_values(x_test)
decisions = false_discovery_control(p_values, method="bh") <= alpha
rows.append(summarize_row("Standard Probabilistic (BH)", decisions))

# 3) Weighted Empirical + WCS
wce = ConformalDetector(
    detector=HBOS(),
    strategy=strategy,
    weight_estimator=BootstrapBaggedWeightEstimator(
        base_estimator=logistic_weight_estimator(),
        n_bootstraps=100,
    ),
    estimation=Empirical(),
    seed=1,
)
wce.fit(x_train)
_ = wce.compute_p_values(x_test)
decisions = weighted_false_discovery_control(
    result=wce.last_result,
    alpha=alpha,
    pruning=Pruning.DETERMINISTIC,
    seed=1,
)
rows.append(summarize_row("Weighted Empirical (WCS)", decisions))

# 4) Weighted Probabilistic + WCS
wpce = ConformalDetector(
    detector=HBOS(),
    strategy=strategy,
    weight_estimator=BootstrapBaggedWeightEstimator(
        base_estimator=logistic_weight_estimator(),
        n_bootstraps=100,
    ),
    estimation=Probabilistic(n_trials=10),
    seed=1,
)
wpce.fit(x_train)
_ = wpce.compute_p_values(x_test)
decisions = weighted_false_discovery_control(
    result=wpce.last_result,
    alpha=alpha,
    pruning=Pruning.DETERMINISTIC,
    seed=1,
)
rows.append(summarize_row("Weighted Probabilistic (WCS)", decisions))

results = pd.DataFrame(rows)
print(results.to_string(index=False, float_format=lambda x: f"{x:.4f}"))
```

```text
                      method  discoveries    fdr  power
     Standard Empirical (BH)          118 0.2034 0.9400
 Standard Probabilistic (BH)          112 0.1607 0.9400
    Weighted Empirical (WCS)          105 0.1048 0.9400
Weighted Probabilistic (WCS)          105 0.1048 0.9400
```
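The `fdr` and `power` columns are per-run quantities: the share of discoveries that are false, and the share of true anomalies that are discovered. As a sanity check, the sketch below rebuilds a decision vector with the counts implied by the Standard Empirical row (100 anomalies in `y_test`; power 0.94, so 94 true positives; 118 discoveries, so 24 false ones) and recomputes both metrics. The helper `fdp_and_power` is hypothetical, and it assumes that `nonconform.metrics` uses these standard definitions, including FDP = 0 when nothing is discovered.

```python
import numpy as np


def fdp_and_power(y_true, y_pred):
    """Realized false discovery proportion and power for one decision vector.

    Convention (assumption): the FDP is 0 when there are no discoveries.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    n_disc = int(y_pred.sum())
    true_pos = int((y_true & y_pred).sum())
    false_pos = n_disc - true_pos
    fdp = false_pos / n_disc if n_disc else 0.0
    power = true_pos / int(y_true.sum())
    return fdp, power


# Counts implied by the Standard Empirical row (reconstructed, not real output).
n_test, n_anomalies, true_pos, false_pos = 1_000, 100, 94, 24
y_true = np.zeros(n_test, dtype=bool)
y_true[:n_anomalies] = True
y_pred = np.zeros(n_test, dtype=bool)
y_pred[:true_pos] = True                            # detected anomalies
y_pred[n_anomalies:n_anomalies + false_pos] = True  # flagged normal points

fdp, power = fdp_and_power(y_true, y_pred)
print(f"FDP={fdp:.4f}, power={power:.4f}")  # prints: FDP=0.2034, power=0.9400
```

The same arithmetic reproduces the other rows: 18 of 112 discoveries (0.1607) and 11 of 105 (0.1048) are false.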


## Interpretation

Use this section to connect each row in the table back to its statistical decision procedure.

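- Standard rows apply the Benjamini-Hochberg procedure to unweighted conformal p-values.
- Weighted rows use Weighted Conformal Selection (WCS) via `weighted_false_discovery_control`, here with deterministic pruning (`Pruning.DETERMINISTIC`).
- The `fdr` column is the realized false discovery proportion on this single test set. Both procedures target control of its expectation (the FDR) at `alpha`, so the Standard Empirical value of 0.2034, slightly above 0.2, does not by itself indicate a violation.
- All four rows detect 94 of the 100 anomalies; they differ only in their number of false discoveries (24, 18, 11, and 11, respectively).
- Probabilistic estimators can be more powerful with smaller calibration sets (fewer than 1,000 points),
  but they trade the strict finite-sample guarantee of empirical conformal p-values
  for an asymptotic one.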
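The weighted rows rely on weighted conformal p-values, which reweight calibration points by an estimate of the density ratio between test and calibration covariates. The sketch below shows one common textbook form of the p-value (deterministic, without tie-breaking randomization) and a plain logistic-regression density-ratio estimate using scikit-learn. It is an illustration under these assumptions, not `nonconform`'s `BootstrapBaggedWeightEstimator` or its exact p-value formula; the helper names and the shifted toy data are hypothetical. With constant weights, the formula reduces to the empirical p-value (1 + #{s_i >= t_j}) / (n + 1).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def logistic_density_ratio(x_calib, x_test):
    """Estimate w(x) ~ p_test(x) / p_calib(x) with a probabilistic classifier.

    Train a classifier to tell test (label 1) from calibration (label 0) rows;
    by Bayes' rule, w(x) = [P(1|x) / P(0|x)] * (n_calib / n_test).
    """
    x = np.vstack([x_calib, x_test])
    y = np.r_[np.zeros(len(x_calib)), np.ones(len(x_test))]
    clf = LogisticRegression(max_iter=1_000).fit(x, y)
    # classes_ is [0., 1.], so column 1 holds P(test | x).
    prob_test = np.clip(clf.predict_proba(x)[:, 1], 1e-6, 1 - 1e-6)
    w = prob_test / (1.0 - prob_test) * len(x_calib) / len(x_test)
    return w[: len(x_calib)], w[len(x_calib):]


def weighted_p_values(calib_scores, calib_w, test_scores, test_w):
    """p_j = (sum_i w_i 1{s_i >= t_j} + w_j) / (sum_i w_i + w_j)."""
    calib_scores = np.asarray(calib_scores, dtype=float)
    calib_w = np.asarray(calib_w, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    test_w = np.asarray(test_w, dtype=float)
    at_least = calib_scores[None, :] >= test_scores[:, None]
    num = (at_least * calib_w[None, :]).sum(axis=1) + test_w
    den = calib_w.sum() + test_w
    return num / den


# Hypothetical covariate-shift demo: test covariates are shifted by +0.5.
demo_rng = np.random.default_rng(2)
demo_x_calib = demo_rng.normal(size=(500, 3))
demo_x_test = demo_rng.normal(loc=0.5, size=(200, 3))
demo_w_calib, demo_w_test = logistic_density_ratio(demo_x_calib, demo_x_test)

# Toy anomaly score: distance to the origin (stands in for HBOS).
demo_wp = weighted_p_values(
    np.linalg.norm(demo_x_calib, axis=1), demo_w_calib,
    np.linalg.norm(demo_x_test, axis=1), demo_w_test,
)
print(demo_w_calib.mean(), demo_wp.min(), demo_wp.max())
```

Because each test point's own weight enters its p-value, the p-values no longer share a common calibration reference, which is one way to see why a dedicated selection rule such as WCS is used instead of plain BH.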
