# Conformal prediction

Conformal predictors are predictive models that associate each of their predictions with a statistically valid measure of confidence. Given a test object $x_i$ and a user-specified significance level $\epsilon \in (0, 1)$, a conformal predictor outputs a prediction region $\Gamma_i^{\epsilon} \subseteq Y$ that contains the true output value $y_i \in Y$ with probability at least $1-\epsilon$, provided the training and test examples are exchangeable.
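
To make the definition concrete, the following is a minimal sketch of how a prediction region can be formed in the inductive setting, assuming nonconformity scores have already been computed. The function names `p_value` and `prediction_region` and all numbers are hypothetical illustrations, not part of nonconformist.

```python
def p_value(cal_scores, test_score):
    """Fraction of scores at least as large as the test score,
    counting the test example itself (the +1 terms)."""
    n_greater_equal = sum(1 for s in cal_scores if s >= test_score)
    return (n_greater_equal + 1) / (len(cal_scores) + 1)


def prediction_region(cal_scores, candidate_scores, epsilon):
    """Keep every candidate label whose p-value exceeds epsilon."""
    return {label for label, score in candidate_scores.items()
            if p_value(cal_scores, score) > epsilon}


# Hypothetical nonconformity scores of a calibration set (toy numbers)
cal_scores = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.40, 0.55, 0.70]

# Hypothetical scores of one test object under each candidate label
candidate_scores = {"a": 0.08, "b": 0.50, "c": 0.95}

region = prediction_region(cal_scores, candidate_scores, epsilon=0.2)
print(sorted(region))
```

A label is excluded only when its nonconformity score is unusually large compared with the calibration scores, which is why a smaller $\epsilon$ tends to give a larger region.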

# Suggested reading

TODO

# Nonconformist Usage:
## 1. Basics

### Example 1a: Simple ICP (classification)
In this example, we construct a simple inductive conformal predictor for classification, using a support vector classifier as the underlying model.

In [3]:
from sklearn.datasets import load_iris
import numpy as np
from sklearn.svm import SVC
from nonconformist.cp import IcpClassifier
from nonconformist.nc import NcFactory
    
iris = load_iris()
idx = np.random.permutation(iris.target.size)

# Divide the data into proper training set, calibration set and test set
idx_train, idx_cal, idx_test = idx[:50], idx[50:100], idx[100:]

model = SVC(probability=True)	# Create the underlying model
nc = NcFactory.create_nc(model)	# Create a default nonconformity function
icp = IcpClassifier(nc)			# Create an inductive conformal classifier

# Fit the ICP using the proper training set
icp.fit(iris.data[idx_train, :], iris.target[idx_train])

# Calibrate the ICP using the calibration set
icp.calibrate(iris.data[idx_cal, :], iris.target[idx_cal])

# Produce predictions for the test set, with confidence 95%
prediction = icp.predict(iris.data[idx_test, :], significance=0.05)

# Print the first 5 predictions
print(prediction[:5, :])

[[ True False False]
 [ True False False]
 [ True False False]
 [False  True False]
 [False  True False]]


The result is a boolean numpy.array with shape (n_test, n_classes), where each row is a boolean vector denoting the class labels included in the prediction region at the specified significance level.

For this particular example, we might obtain, for a given test object, a boolean vector [ True True False ], meaning that the $1-\epsilon$ confidence prediction region contains class labels 0 and 1 (i.e., with 95% probability, one of these two classes will be correct).
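
One way to check such output empirically is to count how often the true label falls inside the region and how many labels a region contains on average. The sketch below uses toy lists standing in for the boolean array and the test labels; `region_stats` is a hypothetical helper, not a nonconformist function.

```python
def region_stats(prediction, y_true):
    """Return (coverage, average region size) for boolean prediction rows."""
    # A row covers its example if the entry at the true label is True
    covered = sum(1 for row, y in zip(prediction, y_true) if row[y])
    # True counts as 1, so summing a row gives the number of included labels
    avg_size = sum(sum(row) for row in prediction) / len(prediction)
    return covered / len(prediction), avg_size


# Toy stand-in for a prediction array (rows: test objects, columns: classes)
prediction = [
    [True, False, False],
    [True, True, False],
    [False, True, False],
    [False, False, True],
]
y_true = [0, 1, 2, 2]  # hypothetical true class labels

coverage, avg_size = region_stats(prediction, y_true)
print(coverage, avg_size)
```

On a real test set, the coverage should be close to or above $1-\epsilon$, while a smaller average region size indicates a more informative predictor.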

### Example 1b: Simple TCP (classification)
In this example, we construct a simple transductive conformal predictor for classification, using a support vector classifier as the underlying model.

In [10]:
from sklearn.datasets import load_iris
import numpy as np
from sklearn.svm import SVC
from nonconformist.cp import TcpClassifier
from nonconformist.nc import NcFactory
    
iris = load_iris()
idx = np.random.permutation(iris.target.size)

# Divide the data into training set and test set
idx_train, idx_test = idx[:100], idx[100:]

model = SVC(probability=True)	# Create the underlying model
nc = NcFactory.create_nc(model)	# Create a default nonconformity function
tcp = TcpClassifier(nc)			# Create a transductive conformal classifier

# Fit the TCP using the training set
tcp.fit(iris.data[idx_train, :], iris.target[idx_train])

# Produce predictions for the test set, with confidence 95%
prediction = tcp.predict(iris.data[idx_test, :], significance=0.05)

# Print the first 5 predictions
print(prediction[:5, :])

[[ True False False]
 [ True False False]
 [ True False False]
 [False False  True]
 [False False  True]]


The result is conceptually identical to that of the previous example (although the particular output values might differ).
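
The difference from the inductive variant is that no calibration set is held out: for each candidate label, the test object is tentatively added to the training set and the nonconformity scores are recomputed. The following is a rough sketch of that procedure, assuming a deliberately simple nonconformity measure (distance to the class mean on one feature). All names and data are hypothetical, and this is not how nonconformist implements `TcpClassifier`.

```python
def class_mean_scores(xs, ys):
    """Nonconformity of each example: distance to the mean of its own class."""
    means = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(members) / len(members)
    return [abs(x - means[y]) for x, y in zip(xs, ys)]


def tcp_p_values(train_x, train_y, test_x, labels):
    """One p-value per candidate label for a single test object."""
    p = {}
    for label in labels:
        # Tentatively assign the candidate label and "refit" on the augmented set
        xs = train_x + [test_x]
        ys = train_y + [label]
        scores = class_mean_scores(xs, ys)
        test_score = scores[-1]
        # The test example is included in the count, so p is never zero
        p[label] = sum(1 for s in scores if s >= test_score) / len(scores)
    return p


# Toy one-dimensional data with two well-separated classes
train_x = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.2]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

p = tcp_p_values(train_x, train_y, test_x=1.05, labels=[0, 1])
region = [label for label in [0, 1] if p[label] > 0.2]
print(p, region)
```

Because the scores are recomputed for every test object and every candidate label, the transductive approach is typically much more expensive than the inductive one.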

### Example 1c: Simple ICP (regression)

In this example, we construct a simple inductive conformal predictor for regression, this time using a random forest regression model as the underlying model.

In [6]:
# Note: load_boston was removed in scikit-learn 1.2; this example requires an older release
from sklearn.datasets import load_boston
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from nonconformist.cp import IcpRegressor
from nonconformist.nc import NcFactory
    
boston = load_boston()
idx = np.random.permutation(boston.target.size)

# Divide the data into proper training set, calibration set and test set
idx_train, idx_cal, idx_test = idx[:300], idx[300:399], idx[399:]

model = RandomForestRegressor()	# Create the underlying model
nc = NcFactory.create_nc(model)	# Create a default nonconformity function
icp = IcpRegressor(nc)			# Create an inductive conformal regressor

# Fit the ICP using the proper training set
icp.fit(boston.data[idx_train, :], boston.target[idx_train])

# Calibrate the ICP using the calibration set
icp.calibrate(boston.data[idx_cal, :], boston.target[idx_cal])

# Produce predictions for the test set, with confidence 95%
prediction = icp.predict(boston.data[idx_test, :], significance=0.05)

# Print the first 5 predictions
print(prediction[:5, :])

[[  8.8   21.6 ]
 [  8.5   21.3 ]
 [ 11.87  24.67]
 [  7.98  20.78]
 [  2.88  15.68]]


This time the result is a numerical numpy.array with shape (n_test, 2), where each row is a vector signifying the lower and upper bounds of an interval, denoting the prediction region at the specified significance level.

For this particular example, we might obtain, for a given test object, a numerical vector [ 8.8  21.6 ], meaning that the $1-\epsilon$ confidence prediction region is the interval $[8.8, 21.6]$ (i.e., with 95% probability, the correct output value lies somewhere on this interval).
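
A common way to arrive at such an interval is to take a point prediction and widen it by a suitable quantile of the absolute residuals observed on the calibration set. The sketch below illustrates that idea under the assumption of an absolute-error nonconformity measure. `icp_interval` and the toy residuals are hypothetical, and the exact rank convention used inside nonconformist may differ.

```python
import math


def icp_interval(cal_residuals, y_hat, epsilon):
    """Symmetric interval y_hat +/- q, where q is the calibration residual
    of rank ceil((1 - epsilon) * (n + 1)) in ascending order."""
    n = len(cal_residuals)
    k = math.ceil((1 - epsilon) * (n + 1))
    if k > n:
        # Too few calibration examples for this significance level
        return (-math.inf, math.inf)
    q = sorted(cal_residuals)[k - 1]
    return (y_hat - q, y_hat + q)


# Toy absolute residuals |y - y_hat| from 19 calibration examples: 0.5, 1.0, ..., 9.5
cal_residuals = [0.5 * i for i in range(1, 20)]

interval = icp_interval(cal_residuals, y_hat=20.0, epsilon=0.1)
print(interval)
```

With this measure every test object receives an interval of the same width. Only the centre, given by the underlying model's prediction, changes from one object to the next.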

## 2. Nonconformity functions

Nonconformist has built-in support for the most common nonconformity functions.

### Example 2a: Choosing your underlying model

The simplest way to define a nonconformity function based on a classification or regression algorithm is to import the algorithm you want to use from sklearn and create a nonconformity function using nonconformist's NcFactory.

In [23]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from nonconformist.nc import NcFactory

nc_dt = NcFactory.create_nc(DecisionTreeClassifier(min_samples_leaf=5))
nc_rf = NcFactory.create_nc(RandomForestClassifier(n_estimators=500))
nc_knn = NcFactory.create_nc(KNeighborsClassifier(n_neighbors=11))

In [24]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

nc_dt = NcFactory.create_nc(DecisionTreeRegressor(min_samples_leaf=5))
nc_rf = NcFactory.create_nc(RandomForestRegressor(n_estimators=500))
nc_knn = NcFactory.create_nc(KNeighborsRegressor(n_neighbors=11))

Alternatively, you can construct your nonconformity functions manually in the following manner:

In [26]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from nonconformist.nc import ClassifierNc
from nonconformist.base import ClassifierAdapter

nc_dt = ClassifierNc(ClassifierAdapter(DecisionTreeClassifier(min_samples_leaf=5)))
nc_rf = ClassifierNc(ClassifierAdapter(RandomForestClassifier(n_estimators=500)))
nc_knn = ClassifierNc(ClassifierAdapter(KNeighborsClassifier(n_neighbors=11)))

In [None]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from nonconformist.nc import RegressorNc
from nonconformist.base import RegressorAdapter

nc_dt = RegressorNc(RegressorAdapter(DecisionTreeRegressor(min_samples_leaf=5)))
nc_rf = RegressorNc(RegressorAdapter(RandomForestRegressor(n_estimators=500)))
nc_knn = RegressorNc(RegressorAdapter(KNeighborsRegressor(n_neighbors=11)))

### Example 2b: Choosing your error function

In [41]:
from sklearn.neighbors import KNeighborsClassifier
from nonconformist.nc import NcFactory, InverseProbabilityErrFunc, MarginErrFunc

nc_proba = NcFactory.create_nc(KNeighborsClassifier(n_neighbors=11), InverseProbabilityErrFunc())
nc_margin = NcFactory.create_nc(KNeighborsClassifier(n_neighbors=11), MarginErrFunc())

In [42]:
from sklearn.neighbors import KNeighborsRegressor
from nonconformist.nc import NcFactory, AbsErrorErrFunc, SignErrorErrFunc

nc_abs = NcFactory.create_nc(KNeighborsRegressor(n_neighbors=11), AbsErrorErrFunc())
nc_sign = NcFactory.create_nc(KNeighborsRegressor(n_neighbors=11), SignErrorErrFunc())

Again, you can construct these manually, without NcFactory, as follows:

In [43]:
from sklearn.neighbors import KNeighborsClassifier
from nonconformist.nc import ClassifierNc, InverseProbabilityErrFunc, MarginErrFunc
from nonconformist.base import ClassifierAdapter

model = ClassifierAdapter(KNeighborsClassifier(n_neighbors=11))

nc_proba = ClassifierNc(model, InverseProbabilityErrFunc())
nc_margin = ClassifierNc(model, MarginErrFunc())

In [45]:
from sklearn.neighbors import KNeighborsRegressor
from nonconformist.nc import RegressorNc, AbsErrorErrFunc, SignErrorErrFunc
from nonconformist.base import RegressorAdapter

model = RegressorAdapter(KNeighborsRegressor(n_neighbors=11))

nc_abs = RegressorNc(model, AbsErrorErrFunc())
nc_sign = RegressorNc(model, SignErrorErrFunc())
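
The error functions named above turn a model's output and a (candidate) true value into a nonconformity score. The following plain-Python sketch shows how these four measures are commonly defined. The function names are hypothetical, and the formulas, including the sign convention of the signed error, are assumptions about typical definitions rather than a copy of nonconformist's code.

```python
def inverse_probability(probs, y):
    """1 minus the probability estimate assigned to label y."""
    return 1.0 - probs[y]


def margin(probs, y):
    """0.5 - (p_y - max over other labels) / 2, so the score lies in [0, 1]."""
    best_other = max(p for label, p in enumerate(probs) if label != y)
    return 0.5 - (probs[y] - best_other) / 2


def abs_error(y_hat, y):
    """Absolute difference between prediction and true value."""
    return abs(y_hat - y)


def sign_error(y_hat, y):
    """Signed difference (prediction minus true value, by assumption)."""
    return y_hat - y


# Hypothetical class probability estimates for one object with three classes
probs = [0.5, 0.25, 0.25]
print(inverse_probability(probs, 0), margin(probs, 0))  # label 0 conforms well
print(inverse_probability(probs, 1), margin(probs, 1))  # label 1 conforms less

# Hypothetical regression prediction and true value
print(abs_error(3.0, 5.0), sign_error(3.0, 5.0))
```

In all four cases a larger score means the example looks stranger to the model. A signed error keeps the direction of the mistake, which allows the lower and upper ends of a regression interval to be calibrated separately.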