# Conformal Prediction

**Sources**: <br>
[Angelopoulos, A. N., & Bates, S. (2021). A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511](https://arxiv.org/abs/2107.07511)



## Conformal Prediction Part 1: Basic Understanding

## General Understanding

**Conformal Prediction** 

Conformal prediction can be wrapped around any pre-trained model, such as a neural network. Instead of a single point prediction, it outputs a set (or interval) of target values, which makes it a natural tool for quantifying **uncertainty**.

*What we need*:
- A *Dataset* : $\{(x_i,y_i)\}^n_{i=1}$, whose samples are independent and identically distributed (i.i.d.).
- A *Model* : $\hat{\pi}_y(x)$, an estimate of $\pi_y(x) = \mathbb{P}[Y=y \mid X=x]$
- A *New Sample* : $x_{n+1}$

*What we generate*:
- A prediction set $T(x_{n+1}) \subseteq \mathcal{Y}$ (a subset of the label space $\mathcal{Y}$) that contains the true class $y_{n+1}$ with high probability

**Coverage**

Coverage means that $\mathbb{P}[y_{n+1} \in T(x_{n+1})] \geq 1-\alpha$: the probability that the true label lies in the prediction set generated for the new sample $x_{n+1}$ is at least $1-\alpha$, where $\alpha$ is a user-chosen error rate (a hyper-parameter).

**Goal**
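- Exact Coverage: $\mathbb{P}[y_{n+1} \in T(x_{n+1})] \approx 1-\alpha$, i.e. the sets are no more conservative than necessary
- Small Set: $|T(x_{n+1})|$ is as small as possible, well below the total number of classes
- "Adaptive": $|T(x_{n+1})|$ is smaller for easy inputs and larger for hard inputs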

**Property**
- Any Model
- Any Dataset
- Score Function Matters (Important Decision)

**Conformal Prediction (General Case)**
1) Identify a heuristic notion of uncertainty
2) Define a scalar score function $s(x,y) \in \mathbb{R}$ (a larger score means worse agreement between $x$ and $y$)
3) Compute $\hat{q}$ as the $\frac{\lceil (n+1)(1-\alpha)\rceil}{n}$ empirical quantile of the calibration scores $s(x_1,y_1),\cdots, s(x_n,y_n)$
4) Deploy: $T(x) = \{y: s(x,y) \leq \hat{q}\}$
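
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names `conformal_quantile` and `prediction_set`, the toy model $f(x) = 2x$, the absolute-error score and the calibration scores are all assumptions made for the example. The quantile is taken as the $\lceil (n+1)(1-\alpha)\rceil$-th smallest calibration score, which is what the quantile level in step 3 selects.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Step 3: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial set is valid.
        return float("inf")
    return float(scores[k - 1])

def prediction_set(score_fn, x, candidates, q_hat):
    """Step 4: keep every candidate label whose score is at most q_hat."""
    return [y for y in candidates if score_fn(x, y) <= q_hat]

# Hypothetical toy setup: the "model" predicts f(x) = 2x and the score is the
# absolute error, so a large score means y disagrees with the model.
def abs_error_score(x, y):
    return abs(y - 2 * x)

cal_scores = np.arange(1, 20) / 10   # 19 made-up calibration scores: 0.1 ... 1.9
q_hat = conformal_quantile(cal_scores, alpha=0.1)
candidates = [4, 5, 6, 7, 8, 9]
print(q_hat, prediction_set(abs_error_score, 3, candidates, q_hat))
```

With 19 calibration scores and $\alpha = 0.1$, the rank is $\lceil 20 \cdot 0.9 \rceil = 18$, so $\hat{q}$ is the 18th smallest score.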

### Theorem 1 [Coverage]

$1-\alpha \leq \mathbb{P}[y_{n+1} \in T(x_{n+1})] \leq 1-\alpha +\frac{1}{n+1}$

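This holds for any model and any data distribution, as long as the calibration samples and the new sample are i.i.d. (exchangeability is enough). The upper bound additionally assumes that the scores have no ties.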


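*Why does it work?*

Symmetry! Because the samples are exchangeable, the score of a new sample is equally likely to land at any rank among the calibration scores, so the probability that it falls to the left of $\hat{q}$ is at least $1-\alpha$. In the sketch below, x1 and x2 are calibration scores, and the score x3 of a new sample falls to the left of q with probability at least $1-\alpha$: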

---x2-------------x1----------q--------
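
The symmetry argument can be checked empirically. The sketch below is a simulation under assumed conditions (i.i.d. standard normal scores, $n = 99$, $\alpha = 0.1$; the function name `simulate_coverage` is made up for this example): it repeatedly draws $n$ calibration scores plus one new score and counts how often the new score lands at or below $\hat{q}$.

```python
import math
import numpy as np

def simulate_coverage(n=99, alpha=0.1, trials=20000, seed=0):
    """Fraction of trials in which the (n+1)-th score is <= q_hat."""
    rng = np.random.default_rng(seed)
    k = math.ceil((n + 1) * (1 - alpha))          # rank of the conformal quantile
    scores = rng.normal(size=(trials, n + 1))     # each row: n calibration scores + 1 new score
    q_hat = np.sort(scores[:, :n], axis=1)[:, k - 1]
    return float(np.mean(scores[:, n] <= q_hat))

coverage = simulate_coverage()
print(coverage)
```

Up to simulation noise, the printed frequency should sit inside the band $[1-\alpha,\ 1-\alpha+\frac{1}{n+1}] = [0.90, 0.91]$ from Theorem 1.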

## Method 1 (Vovk et al.)

What if we learn a rule to predict sets, using a calibration dataset $\{(x_i,y_i)\}^{n}_{i=1}$?

1. Get the estimated score (e.g. the softmax output) of the correct class $y_i$ for each $(x_i,y_i)$: $\{E_{i}\}^{n}_{i=1}$
2. Take the 10% quantile $\hat{q}$, so that at least 90% of the calibration examples have a true-class score of at least $\hat{q}$
```
import numpy as np
q_hat = np.quantile(E, 0.1, method="lower")  # E = [E_1, ..., E_n]; `method` needs NumPy >= 1.22
```

3. Form the prediction set $T(x_{n+1})$ = {all classes whose score is at least $\hat{q}$ for the input $x_{n+1}$}. This is a valid prediction set!
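
The three steps of Method 1 might look as follows in NumPy. This is a sketch under assumptions: the function names and the calibration data are invented for illustration, and the threshold uses the finite-sample rank $\lfloor (n+1)\alpha \rfloor$ (the counterpart of the general recipe) rather than the plain 10% quantile, which is nearly the same for large $n$.

```python
import math
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Steps 1-2: lower quantile of the true-class scores E_i."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = cal_probs.shape[0]
    true_scores = cal_probs[np.arange(n), cal_labels]   # E_i: score of the correct class
    # Rank of the threshold among the sorted E_i, equal to floor((n+1)*alpha).
    k = n + 1 - math.ceil((n + 1) * (1 - alpha))
    if k < 1:
        return 0.0   # too few calibration points: keep every class
    return float(np.sort(true_scores)[k - 1])

def predict_set(probs, q_hat):
    """Step 3: indices of all classes whose score is at least q_hat."""
    return np.flatnonzero(np.asarray(probs, dtype=float) >= q_hat)

# Made-up calibration data: 19 samples, 3 classes, true class is always class 0.
E = np.arange(1, 20) / 20
cal_probs = np.column_stack([E, (1 - E) / 2, (1 - E) / 2])
cal_labels = np.zeros(19, dtype=int)

q_hat = calibrate_threshold(cal_probs, cal_labels, alpha=0.1)
new_probs = [0.70, 0.25, 0.05]   # hypothetical softmax output for x_{n+1}
print(q_hat, predict_set(new_probs, q_hat))
```

Only the true-class column of `cal_probs` influences `q_hat`, which is why this method tends to give small sets but adapts little to how hard an individual input is.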

## Method 2 (Romano et al.)

|       Method 1       |   Method 2|
|----------------------|-----------|
|smallest average size | usually larger size|
| not very adaptive | designed to be adaptive|
|only use output of true class | use output of all classes|

### Conformalized Quantile **(Classification)**

1) Get a score of the correct class
> Sort the softmax estimates from high to low and sum them until the score of the true label is included: $E_i = \sum_{j=1}^k \hat{\pi}(x_i)_{(j)}$, where $\hat{\pi}(x_i)_{(j)}$ is the $j$-th largest softmax output and $k$ is the rank of the true class.

2) Take the 90% quantile
>```q_hat = np.quantile(E, 0.9, method="higher")  # E = [E_1, ..., E_n]```

3) Form prediction sets:
> The $K$ most likely classes, where $K$ is the smallest number such that $\sum_{j=1}^K \hat{\pi}(x_{n+1})_{(j)} \geq \hat{q}$, form the valid prediction set $T(x_{n+1})$.
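
A possible NumPy sketch of these three steps is shown below. It is a deterministic variant (no randomization), the function names are invented for this example, the softmax outputs are made up, and the quantile uses the finite-sample rank $\lceil (n+1)(1-\alpha)\rceil$ as an assumption in place of the plain 90% quantile.

```python
import math
import numpy as np

def aps_scores(probs, labels):
    """Step 1: cumulative softmax mass, from the top class down to the true class."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(-probs, axis=1)                     # classes from most to least likely
    cumsum = np.cumsum(np.take_along_axis(probs, order, axis=1), axis=1)
    rank = np.argmax(order == labels[:, None], axis=1)     # 0-based rank of the true class
    return cumsum[np.arange(len(labels)), rank]

def aps_quantile(scores, alpha=0.1):
    """Step 2: the ceil((n+1)(1-alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    k = math.ceil((scores.size + 1) * (1 - alpha))
    return float(scores[k - 1]) if k <= scores.size else float("inf")

def aps_set(probs, q_hat):
    """Step 3: the K most likely classes whose cumulative mass first reaches q_hat."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs)
    cumsum = np.cumsum(probs[order])
    # First position where the running sum is >= q_hat; all classes if never reached.
    K = min(int(np.searchsorted(cumsum, q_hat, side="left")) + 1, probs.size)
    return order[:K]

# Made-up softmax outputs for three calibration samples.
cal_probs = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
cal_labels = [1, 0, 1]
E = aps_scores(cal_probs, cal_labels)
print(E, aps_set([0.5, 0.3, 0.2], q_hat=0.6))
```

Because the score accumulates the outputs of every class ranked above the true one, an uncertain (flat) softmax vector needs more classes to reach $\hat{q}$, which is where the adaptivity comes from.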



### Conformalized Quantile **(Regression)**

We have two models, $\hat{t}_{\alpha/2}(x)$ and $\hat{t}_{1-\alpha/2}(x)$, which estimate the $\alpha/2$ and $1-\alpha/2$ conditional quantiles (the 5% and 95% quantiles for $\alpha = 0.1$). Such models can be obtained by training a neural network with the [pinball loss](https://www.lokad.com/pinball-loss-function-definition/).

> The *pinball* loss, also referred to as the quantile loss, can be used to assess the accuracy of a quantile forecast.
> 
> $\begin{aligned}L_{\tau}(y,z) & = (y-z)\tau & \text{ if } y\geq z\\
> &= (z-y)(1-\tau) & \text{ if } z > y
> \end{aligned}$
> 
> where $\tau$ is the target quantile, $y$ is the real value and $z$ is the quantile forecast.
>
> The pinball loss is always non-negative. The larger the pinball loss, the further the forecast $z$ is from the target $y$; the lower the pinball loss, the more accurate the quantile forecast.
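
The pinball loss defined above translates directly into NumPy. The function name `pinball_loss` and the example numbers are illustrative assumptions, not part of the source.

```python
import numpy as np

def pinball_loss(y, z, tau):
    """Pinball (quantile) loss of forecast z for target y at quantile level tau."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    diff = y - z
    # (y - z) * tau when y >= z, otherwise (z - y) * (1 - tau)
    return np.where(diff >= 0, tau * diff, (tau - 1) * diff)

# For the 95% quantile, forecasting too low costs far more than forecasting too high.
under = pinball_loss(10.0, 8.0, tau=0.95)   # forecast below the target
over = pinball_loss(8.0, 10.0, tau=0.95)    # forecast above the target
print(float(under), float(over))
```

The asymmetry is what pushes a model trained with $\tau = 0.95$ toward a value that the target exceeds only about 5% of the time.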

<div style="text-align:center;">
  <img height="100%" width="50%" src="sources/conformalized_quantile_regression.png" />
</div>

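1) Get a score for each calibration point
> $E_i = \max\{\hat{t}_{\alpha/2}(x_i) - y_i,\; y_i - \hat{t}_{1-\alpha/2}(x_i)\}$, the signed distance from $y_i$ to the nearest edge of the band $[\hat{t}_{\alpha/2}(x_i),\hat{t}_{1-\alpha/2}(x_i)]$: positive when $y_i$ falls outside the band, negative when it falls inside.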

2) Take the 90% quantile
>```q_hat = np.quantile(E, 0.9, method="higher")  # E = [E_1, ..., E_n]```

3) Form prediction sets
>$T(x_{n+1}) = [\hat{t}_{\alpha/2}(x_{n+1})-\hat{q},\ \hat{t}_{1-\alpha/2}(x_{n+1})+\hat{q}]$ is a valid prediction set!
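
The regression recipe can be tried end to end on synthetic data. Everything specific in this sketch is an assumption made for illustration: the data-generating process, the fixed functions `t_lo` and `t_hi` that stand in for trained quantile models (deliberately too narrow, covering roughly 68% before calibration), and the helper names. The score is taken as $\max\{\hat{t}_{\alpha/2}(x_i) - y_i,\ y_i - \hat{t}_{1-\alpha/2}(x_i)\}$ and the quantile uses the finite-sample rank $\lceil (n+1)(1-\alpha)\rceil$.

```python
import math
import numpy as np

def cqr_calibrate(lo_cal, hi_cal, y_cal, alpha=0.1):
    """Steps 1-2: conformal quantile of the signed distances to the band."""
    lo_cal, hi_cal, y_cal = (np.asarray(a, dtype=float) for a in (lo_cal, hi_cal, y_cal))
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)   # > 0 outside the band, < 0 inside
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    return float(np.sort(scores)[k - 1]) if k <= n else float("inf")

def cqr_interval(lo, hi, q_hat):
    """Step 3: widen (or shrink, if q_hat < 0) the band by q_hat on both sides."""
    return lo - q_hat, hi + q_hat

# Hypothetical stand-ins for trained quantile models (roughly +/- 1 noise std).
def t_lo(x):
    return 2 * x - (0.5 + x)

def t_hi(x):
    return 2 * x + (0.5 + x)

def sample(rng, size):
    """Synthetic heteroscedastic data: the noise scale grows with x."""
    x = rng.uniform(0, 2, size)
    y = 2 * x + (0.5 + x) * rng.normal(size=size)
    return x, y

rng = np.random.default_rng(0)
x_cal, y_cal = sample(rng, 1000)
x_test, y_test = sample(rng, 5000)

q_hat = cqr_calibrate(t_lo(x_cal), t_hi(x_cal), y_cal, alpha=0.1)
lower, upper = cqr_interval(t_lo(x_test), t_hi(x_test), q_hat)
coverage = float(np.mean((y_test >= lower) & (y_test <= upper)))
print(q_hat, coverage)
```

Since the stand-in band is too narrow, `q_hat` comes out positive and the calibrated intervals are widened until the test coverage is close to the 90% target.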


## Conformal Prediction Part 2: Conditional Coverage and Diagnostics