# Conformal Prediction

## Overview

**Conformal Prediction (CP)** is a framework that wraps around any predictive model and converts its point predictions into **prediction sets** (or **intervals** for regression) that contain the true label or value with a user-chosen probability (e.g. 90% or 95%).

The key appeal of conformal prediction is that it provides **finite-sample, distribution-free coverage guarantees** under very weak assumptions—primarily that the data are **exchangeable**.

Conformal prediction is **not a predictive model itself**, but a **calibration layer** applied on top of an existing model such as linear regression, random forests, gradient boosting, or neural networks.

---

## Core Idea and Goal

The main goal of conformal prediction is to answer:

> *“How uncertain should I be about this prediction, given how wrong my model has been in the past?”*

Instead of outputting a single prediction, conformal prediction outputs a **set of plausible labels** such that, in repeated sampling, the true label lies inside the set with probability at least \(1 - \varepsilon\).

This guarantee:
- Holds in **finite samples**
- Does **not** depend on the correctness of the model
- Does **not** assume any parametric form of the noise

---

## Key Assumption: Exchangeability

Conformal prediction relies on **exchangeability** of the data:

> The joint distribution of the data does not change under permutations of the sample order.

Exchangeability is slightly weaker than i.i.d., but it still excludes:
- Strong time dependence
- Severe non-stationarity
- Uncorrected distribution shift

Violations of exchangeability are the most common reason conformal methods fail in practice.

---

## Full (Transductive) Conformal Prediction

### High-Level Intuition

For a new input \(x_{n+1}\), conformal prediction:
1. Tries each possible label \(y\)
2. Measures how “strange” the pair \((x_{n+1}, y)\) looks compared to past data
3. Keeps only labels that are **not unusually strange**

---

### Step-by-Step Algorithm

#### 1. Choose a nonconformity measure

A **nonconformity score** maps a labeled example \((x_i, y_i)\) to a real number measuring how unusual it is.

Examples:
- Regression:  
  \[
  \alpha_i = |y_i - \hat{f}(x_i)|
  \]
- Classification:  
  \[
  \alpha_i = 1 - \hat{P}(y_i \mid x_i)
  \]
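
The two example scores above can be written as small helper functions. This is a minimal sketch; the function names `regression_score` and `classification_score` are illustrative and do not come from any particular library, and class labels are assumed to be given as column indices into the probability matrix.

```python
import numpy as np

def regression_score(y_true, y_pred):
    """Absolute residual |y - f_hat(x)| for each example."""
    return np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))

def classification_score(probs, label_idx):
    """1 - P_hat(y | x); probs has one row per example and one column per class."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - probs[np.arange(len(probs)), label_idx]

# Larger values mean the example looks more unusual to the model.
print(regression_score([3.0, 5.0], [2.5, 6.0]))
print(classification_score([[0.7, 0.3], [0.2, 0.8]], [0, 0]))
```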

---

#### 2. Augment data with a candidate label

For a new input \(x_{n+1}\), consider a candidate label \(y\).

Temporarily add \((x_{n+1}, y)\) to the dataset.

---

#### 3. Refit (if using full conformal) and compute scores

Train or refit the underlying model on all \(n+1\) points and compute nonconformity scores:
\[
\alpha_1, \alpha_2, \dots, \alpha_{n+1}
\]

Let \(\alpha_{n+1}\) be the score of the candidate example.

---

#### 4. Compute the conformal p-value

\[
p(y) = \frac{|\{ i : \alpha_i \ge \alpha_{n+1} \}|}{n+1}
\]

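Under exchangeability, the p-value of the true label is uniformly distributed or conservative (super-uniform): \(\mathbb{P}\left(p(y_{n+1}) \le \varepsilon\right) \le \varepsilon\) for every \(\varepsilon \in [0, 1]\).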
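
As a sketch of the formula above, the p-value is the fraction of the \(n+1\) scores that are at least as large as the candidate's score, with the candidate counted against itself. The helper name `conformal_p_value` is an assumption made for illustration.

```python
import numpy as np

def conformal_p_value(scores, candidate_score):
    """Fraction of the n + 1 scores (candidate included) that are >= the candidate's."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    # The "+ 1" in the numerator accounts for the candidate itself.
    return (np.sum(scores >= candidate_score) + 1) / (n + 1)

print(conformal_p_value([1.0, 2.0, 3.0, 4.0], 2.5))   # 0.6: scores 3, 4 and the candidate
print(conformal_p_value([1.0, 2.0, 3.0, 4.0], 10.0))  # 0.2: only the candidate itself
```

A small p-value means the candidate pair looks stranger than almost all past examples, so the candidate label is dropped from the prediction set.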

---

#### 5. Form the prediction set

For a target error rate \(\varepsilon\):
\[
\mathcal{C}(x_{n+1}) = \{ y : p(y) > \varepsilon \}
\]

This set contains the true label with probability at least \(1 - \varepsilon\).
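
Steps 1 to 5 can be put together in a short sketch for classification. It assumes scikit-learn's `LogisticRegression` as the base model, synthetic two-cluster data from `make_blobs`, and the score \(1 - \hat{P}(y \mid x)\); the function name `full_conformal_set` is hypothetical. Note the refit inside the loop, once per candidate label.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

def full_conformal_set(X, y, x_new, epsilon=0.1):
    """Full conformal prediction set for x_new, refitting once per candidate label."""
    kept = []
    for cand in np.unique(y):
        # Step 2: augment the data with the candidate pair (x_new, cand).
        X_aug = np.vstack([X, x_new])
        y_aug = np.append(y, cand)
        # Step 3: refit on all n + 1 points and score every point.
        model = LogisticRegression().fit(X_aug, y_aug)
        probs = model.predict_proba(X_aug)
        cols = np.searchsorted(model.classes_, y_aug)
        scores = 1.0 - probs[np.arange(len(y_aug)), cols]
        # Step 4: conformal p-value of the candidate (last entry).
        p_value = np.mean(scores >= scores[-1])
        # Step 5: keep labels that are not unusually strange.
        if p_value > epsilon:
            kept.append(int(cand))
    return kept

# Illustrative data: two well-separated clusters, new point at the class-0 centroid.
X, y = make_blobs(n_samples=40, centers=[[-5, 0], [5, 0]], cluster_std=1.0, random_state=0)
x_new = X[y == 0].mean(axis=0)
pred_set = full_conformal_set(X, y, x_new, epsilon=0.1)
print(pred_set)
```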

---

### Practical Limitation

Full conformal prediction requires retraining or reevaluating the model **for each candidate label**, which is computationally expensive for modern models.

---

## Split Conformal Prediction (Practical Variant)

Split conformal prediction avoids repeated retraining and is the most widely used version in practice.

---

### Algorithm

1. **Split data** into:
   - Training set
   - Calibration set

2. **Train the base model** on the training set only.

3. **Compute nonconformity scores** on the calibration set:
   - Regression: residuals
   - Classification: misclassification or probability-based scores

4. **Compute a quantile** of calibration scores, with a finite-sample correction to the level:
   \[
   q = \text{Quantile}_{\lceil (1-\varepsilon)(n_{\text{cal}} + 1) \rceil / n_{\text{cal}}}
   \]

5. **Predict for new data**:
   - Regression interval:
     \[
     [\hat{y}(x) - q,\ \hat{y}(x) + q]
     \]
   - Classification set:
     include all labels whose nonconformity scores are at most \(q\)

---

### Why Split Conformal Works

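Because the model is fitted on the training set only, it is fixed with respect to the calibration data and the test point. The calibration scores and the test score are therefore exchangeable, so the rank of the test score among them is uniformly distributed, and this preserves the finite-sample coverage guarantee without any retraining.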

---

## What the Guarantees Mean

The main guarantee is **marginal coverage**:

> Over repeated draws of the entire dataset (training, calibration, and test point), the prediction set contains the true label with probability at least \(1 - \varepsilon\).

Important clarifications:
- Coverage is **not conditional on a specific \(x\)**
- Some subpopulations may be over- or under-covered
- Guarantees are frequentist, not Bayesian
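
The gap between marginal and conditional coverage can be seen in a small simulation. This is a sketch under stated assumptions: the data are synthetic, the noise is four times larger for \(x \ge 0.5\) than for \(x < 0.5\), and the base "model" is a trivial constant prediction of 0, chosen only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.1

def simulate(n):
    """Synthetic data: zero-mean noise whose spread depends on x."""
    x = rng.uniform(0, 1, n)
    noise_sd = np.where(x < 0.5, 0.5, 2.0)
    return x, rng.normal(0.0, noise_sd)

# Calibrate with absolute residuals of the constant-zero predictor.
x_cal, y_cal = simulate(5000)
scores = np.sort(np.abs(y_cal - 0.0))
k = int(np.ceil((len(scores) + 1) * (1 - epsilon)))
q = scores[k - 1]

# Evaluate coverage overall and within each noise regime.
x_te, y_te = simulate(20000)
covered = np.abs(y_te - 0.0) <= q
overall = covered.mean()
low = covered[x_te < 0.5].mean()
high = covered[x_te >= 0.5].mean()
print(f"overall={overall:.3f}  low-noise={low:.3f}  high-noise={high:.3f}")
```

Overall coverage lands near the 90% target, while the low-noise region is over-covered and the high-noise region is under-covered.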

---

## Online and Streaming Conformal Prediction

Conformal prediction can be adapted to streaming settings:
- Under exchangeability, errors at successive time steps behave like (approximately) independent events, each with probability about \(\varepsilon\)
- Enables real-time monitoring of coverage
- Useful for detecting concept drift and distribution shift
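
One way to keep coverage on target in a drifting stream is the adaptive conformal inference (ACI) update of Gibbs and Candès, which nudges the working error level after every observation. The sketch below is illustrative, not a reference implementation: the step size `gamma`, the rolling `window`, and the synthetic stream with growing noise are all assumptions.

```python
import numpy as np

def adaptive_conformal_error_rate(scores, target=0.1, gamma=0.01, window=200):
    """Run an ACI-style update over a stream of nonconformity scores.

    Returns the realized error rate after the warm-up window.
    """
    alpha_t = target
    errors = []
    for t in range(window, len(scores)):
        recent = scores[t - window:t]
        if alpha_t <= 0:        # no error allowed: the set is everything
            threshold = np.inf
        elif alpha_t >= 1:      # any error allowed: the set is empty
            threshold = -np.inf
        else:
            threshold = np.quantile(recent, 1 - alpha_t)
        err = float(scores[t] > threshold)
        errors.append(err)
        # Shrink alpha_t after a miss (wider sets), grow it after a hit.
        alpha_t += gamma * (target - err)
    return float(np.mean(errors))

# Synthetic stream whose noise scale triples over time (a simple drift).
rng = np.random.default_rng(0)
stream = np.abs(rng.normal(size=3000)) * np.linspace(1.0, 3.0, 3000)
rate = adaptive_conformal_error_rate(stream)
print(f"realized error rate: {rate:.3f}")
```

The running error rate doubles as a monitoring signal: a sustained departure from the target suggests drift.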

---

## Pros

### Distribution-free, finite-sample guarantees
Valid coverage without assumptions on:
- Noise distribution
- Linearity
- Model correctness

Only exchangeability is required.

---

### Model-agnostic
Works with any predictor:
- Linear models
- Tree-based models
- Neural networks
- Black-box systems

---

### Simple and scalable (split conformal)
Requires only:
- A train–calibration split
- Computing empirical quantiles

Easy to integrate into existing pipelines.

---

### Finite-sample validity
Guarantees hold at **any sample size**, not just asymptotically.

---

### Useful for monitoring and diagnostics
Nonconformity scores and p-values can flag:
- Outliers
- Distribution shifts
- Concept drift

---

## Cons

### High computational cost (full conformal)
Full conformal prediction is often infeasible for:
- Large datasets
- Deep or complex models

---

### Marginal, not conditional, coverage
Standard conformal methods:
- Guarantee coverage on average
- Do **not** ensure equal coverage across subgroups or feature regions

---

### Large prediction sets when the model is weak
If the base model performs poorly:
- Prediction intervals become wide
- Classification sets include many labels

This is statistically valid but may be operationally unhelpful.

---

### Exchangeability violations in real data
Time series, non-stationarity, and regime changes can break the exchangeability assumption unless a variant designed for such data is used, for example:
- Adaptive conformal
- Online conformal
- Covariate-conditional variants

---

### Design choices affect efficiency
Choices of:
- Nonconformity score
- Data splitting strategy
- Calibration window size

directly impact the tightness and usefulness of prediction sets.

---

## Summary

Conformal prediction provides **honest uncertainty** by calibrating predictions against historical errors rather than modeling uncertainty explicitly.

It excels when:
- Coverage guarantees matter
- Models are black-box
- Distributional assumptions are risky

But it requires care under:
- Distribution shift
- Heterogeneous subpopulations
- Real-time, non-stationary systems


# Split Conformal Prediction

## Overview

**Split Conformal Prediction** is a practical and scalable variant of conformal prediction.  
It transforms point predictions from any base model into **prediction sets or intervals** with **finite-sample coverage guarantees**.

Given a user-chosen miscoverage level \(\alpha\) (the quantity written \(\varepsilon\) in the previous section; e.g. \(\alpha = 0.1\) for 90% coverage), split conformal prediction guarantees marginal coverage of at least \(1 - \alpha\) **in finite samples**, under the assumption of **exchangeability**. The guarantee also holds conditional on the training set.

---

## Intuition

> *Use a held-out calibration set to measure how wrong the model tends to be, and then inflate future predictions just enough to guarantee coverage.*

Split conformal avoids repeated retraining by separating **model fitting** and **uncertainty calibration**.

---

## General Algorithm

Split conformal prediction applies to **any base model** and works for both **regression** and **classification**.

---

### Step 1: Split the data

Divide the dataset into two disjoint parts:

- **Proper training set** \(D_1\), size \(n_1\)
- **Calibration set** \(D_2\), size \(n_2\)

A common choice is a 50/50 split, though larger calibration sets improve the stability of the coverage.

---

### Step 2: Train the base model

Train the predictor \(\hat{f}_{n_1}\) using only the training set \(D_1\):

\[
\hat{f}_{n_1} \leftarrow \text{Train}(D_1)
\]

The calibration data must not be used for training.

---

### Step 3: Compute calibration nonconformity scores

For each calibration example \((X_i, Y_i) \in D_2\), compute a **nonconformity score**:

\[
R_i = V(X_i, Y_i)
\]

where \(V(\cdot)\) is **negatively oriented** (smaller values indicate better conformity).

**Common choices:**
- Regression:  
  \[
  V(x, y) = |y - \hat{f}_{n_1}(x)|
  \]
- Classification:  
  \[
  V(x, y) = 1 - \hat{P}(y \mid x)
  \]

---

### Step 4: Compute the conformal quantile

Let \(\{R_1, \dots, R_{n_2}\}\) be the calibration scores.

Define the conformal threshold:

\[
\hat{q}_{n_2} =
\text{the } \left\lceil (1 - \alpha)(n_2 + 1) \right\rceil
\text{-th smallest value of } \{R_i\}
\]

This quantile choice ensures finite-sample coverage.
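
The threshold above can be computed by sorting the calibration scores and picking the \(k\)-th smallest, with \(k = \lceil (1 - \alpha)(n_2 + 1) \rceil\). A minimal sketch follows; the helper name `conformal_quantile` is illustrative. When \(k > n_2\) there are too few calibration points for the requested level, and the sketch returns infinity so that the prediction set is the whole label space.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """k-th smallest calibration score, k = ceil((1 - alpha) * (n + 1))."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = math.ceil((1 - alpha) * (n + 1))
    if k > n:
        return math.inf  # not enough calibration points for this level
    return float(scores[k - 1])

print(conformal_quantile(np.arange(1, 10), alpha=0.5))   # 5.0 (k = 5 of n = 9)
print(conformal_quantile([0.3, 1.2, 0.7], alpha=0.1))    # inf (k = 4 > n = 3)
```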

---

### Step 5: Predict for a new input

For a new input \(x\), define the conformal prediction set:

\[
\hat{C}_n(x) = \{ y : V(x, y) \le \hat{q}_{n_2} \}
\]

This set contains the true label with probability at least \(1 - \alpha\).

---

## Regression Case

For regression, using absolute residuals as the nonconformity score:

\[
V(x, y) = |y - \hat{f}(x)|
\]

the conformal prediction set becomes an **interval**:

\[
\boxed{
\left[
\hat{f}(x) - \hat{q}_{n_2},
\;
\hat{f}(x) + \hat{q}_{n_2}
\right]
}
\]

---

## Coverage Guarantee

Under exchangeability, split conformal prediction satisfies:

\[
\mathbb{P}\left( Y_{\text{new}} \in \hat{C}_n(X_{\text{new}}) \right) \ge 1 - \alpha
\]

Key points:
- The guarantee is **finite-sample**, not asymptotic
- Coverage is **marginal**, not conditional on a specific \(x\)
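- Valid **conditional on the trained model**: the probability is taken over the calibration set and the test point, so the realized coverage for one particular calibration set can fall slightly above or below \(1 - \alpha\)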

---

## Practical Notes

- Larger calibration sets → more stable quantiles
- Weak base models → wider prediction intervals
- Exchangeability violations (e.g. time series, drift) require adaptive or online variants
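
The first note can be checked with a short simulation. The sketch assumes i.i.d. Uniform(0, 1) nonconformity scores, for which the coverage obtained from a given calibration set equals the threshold \(\hat{q}_{n_2}\) itself; the function name `coverage_spread` is hypothetical.

```python
import numpy as np

def coverage_spread(n_cal, alpha=0.1, n_rep=2000, seed=0):
    """Mean and standard deviation of realized coverage across calibration sets."""
    rng = np.random.default_rng(seed)
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))
    scores = rng.uniform(size=(n_rep, n_cal))
    q_hat = np.sort(scores, axis=1)[:, k - 1]
    # For Uniform(0, 1) scores, P(new score <= q_hat) = q_hat.
    return float(q_hat.mean()), float(q_hat.std())

mean_small, sd_small = coverage_spread(n_cal=20)
mean_large, sd_large = coverage_spread(n_cal=1000)
print(f"n_cal=20:   mean={mean_small:.3f}, sd={sd_small:.3f}")
print(f"n_cal=1000: mean={mean_large:.3f}, sd={sd_large:.3f}")
```

Both settings average at least 90% coverage, but the realized coverage of any single calibration set varies much more when the calibration set is small.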

---

## Summary

Split conformal prediction is a simple, model-agnostic method to obtain **honest uncertainty estimates**:

- Train once
- Calibrate once
- Predict with guaranteed coverage

It is widely used because it scales to complex models while preserving strong statistical guarantees.


## Regression Example

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor  # or any regressor
from sklearn.model_selection import train_test_split

# Stand-in data so the cell runs; replace with your own X_train (n x d), y_train (n,)
X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)
X_train, x_test, y_train, _ = train_test_split(X, y, test_size=0.1, random_state=0)

# Split into a proper training set and a calibration set
X_prop, X_cal, y_prop, y_cal = train_test_split(
    X_train, y_train, test_size=0.5, random_state=42
)

# Train on the proper training set only
model = RandomForestRegressor(random_state=0)  # replace with your model
model.fit(X_prop, y_prop)

# Calibration scores: absolute residuals
residuals = np.abs(y_cal - model.predict(X_cal))
n_cal = len(residuals)

# Conformal threshold for alpha = 0.1 (90% coverage):
# the k-th smallest residual, k = ceil((n_cal + 1) * (1 - alpha))
alpha = 0.1
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
q_hat = np.sort(residuals)[k - 1] if k <= n_cal else np.inf

# Prediction intervals for the new inputs x_test
point_pred = model.predict(x_test)
lower = point_pred - q_hat
upper = point_pred + q_hat
```

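This yields intervals with marginal coverage in \(\left[1 - \alpha,\ 1 - \alpha + \frac{1}{n_2 + 1}\right]\); the upper bound holds when the calibration scores are almost surely distinct (no ties).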

## Classification Example

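For multiclass classification, use \(V(x, y) = -\hat{p}_y(x)\), the negative predicted probability of class \(y\). Alternatives are the negative log-probability \(-\log \hat{p}_y(x)\) or \(1 - \hat{p}_y(x)\). Include every class whose score is at most \(\hat{q}_{n_2}\).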


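```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data so the cell runs; replace with your own X_train, y_train
X, y = make_classification(n_samples=1001, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, y_train, x_test = X[:-1], y[:-1], X[-1]
X_prop, X_cal, y_prop, y_cal = train_test_split(
    X_train, y_train, test_size=0.5, random_state=42
)

model = RandomForestClassifier(random_state=0).fit(X_prop, y_prop)

# Calibration scores: negative predicted probability of the true class
probs_cal = model.predict_proba(X_cal)
true_cols = np.searchsorted(model.classes_, y_cal)  # column of each true label
true_probs_cal = probs_cal[np.arange(len(y_cal)), true_cols]
scores = -true_probs_cal  # or np.log(1 - true_probs_cal + 1e-10)

# Conformal threshold: k-th smallest score, k = ceil((n_cal + 1) * (1 - alpha))
alpha = 0.1
n_cal = len(scores)
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
q_hat = np.sort(scores)[k - 1] if k <= n_cal else np.inf

# Prediction set for the new point x_test
probs_test = model.predict_proba(x_test.reshape(1, -1))[0]
pred_set = model.classes_[-probs_test <= q_hat]  # classes in the set
```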


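## Libraries and Tips

- MAPIE (Python): in releases before 1.0, `from mapie.regression import MapieRegressor; mapie = MapieRegressor(model, method="base", cv="split"); mapie.fit(X_train, y_train); y_pred, y_pis = mapie.predict(X_test, alpha=0.1)` gives split conformal intervals. Note that `method="plus"` selects the jackknife+/CV+ family rather than split conformal. The API was reorganized in MAPIE 1.0, so check the documentation for the installed version.
- ConformalPrediction.jl (Julia) or tidymodels (R) for other languages.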

- Use studentized residuals \(V(x, y) = |y - \hat{f}(x)| / \hat{\sigma}(x)\) for heteroscedastic data (train an extra spread model \(\hat{\sigma}\)).
- A larger calibration set (\(n_2 \ge 100\)) reduces coverage variance.
- For efficiency, better base models yield tighter sets.
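
The studentized-residual tip can be sketched as follows. The assumptions are loud ones: the data are synthetic with noise that grows with \(x\), both the mean model and the spread model are plain `LinearRegression` fits, and the spread model is trained on the absolute residuals of the proper training set. The helper names `make_data` and `sigma_hat` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic heteroscedastic data: the noise scale grows with x."""
    x = rng.uniform(0, 5, size=(n, 1))
    y = 2.0 * x[:, 0] + (0.2 + x[:, 0]) * rng.normal(size=n)
    return x, y

X_prop, y_prop = make_data(4000)
X_cal, y_cal = make_data(4000)
X_test, y_test = make_data(4000)

# Mean model, then a spread model fitted to absolute training residuals.
mean_model = LinearRegression().fit(X_prop, y_prop)
abs_res = np.abs(y_prop - mean_model.predict(X_prop))
spread_model = LinearRegression().fit(X_prop, abs_res)

def sigma_hat(X):
    # Floor the estimate so the score never divides by zero or a negative value.
    return np.maximum(spread_model.predict(X), 1e-3)

# Studentized calibration scores and their conformal threshold.
alpha = 0.1
scores = np.abs(y_cal - mean_model.predict(X_cal)) / sigma_hat(X_cal)
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q_hat = np.sort(scores)[k - 1]

# Interval half-width now scales with the estimated spread at each x.
half_width = q_hat * sigma_hat(X_test)
pred = mean_model.predict(X_test)
coverage = np.mean(np.abs(y_test - pred) <= half_width)
print(f"coverage={coverage:.3f}")
```

Intervals come out narrow where the noise is small and wide where it is large, while the marginal coverage target is unchanged.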