# Conformal prediction for binary classification

Here we show how a binary classifier can be applied to produce one of four classifications:

- *Positive*: The instance reaches the threshold for the positive class, but not the negative class.

- *Negative*: The instance reaches the threshold for the negative class, but not the positive class.

- *Both*: The instance reaches the threshold for both positive and negative classes.

- *None*: The instance does not reach the threshold for either class.

Here $\alpha$ is the tolerated error rate, so the target *coverage* is $1 - \alpha$. Adjusting $\alpha$ changes the balance of classifications: a low $\alpha$ (high coverage) is more likely to place an instance in both classes, whereas a high $\alpha$ (low coverage) is more likely to leave an instance in neither class.
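
To make the effect of $\alpha$ concrete, here is a minimal sketch using NumPy only. It treats $\alpha$ as the tolerated error rate, so the score threshold is the $1 - \alpha$ quantile of the calibration scores. The calibration scores and test probabilities are synthetic and purely illustrative, and `category_shares` is a hypothetical helper, not part of MAPIE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration scores (1 - probability given to the true class)
cal_scores = rng.beta(2, 5, size=1000)
# Hypothetical positive-class probabilities for new instances
p_pos = rng.uniform(0, 1, size=1000)
# Column 0: score of the negative class (1 - p_neg = p_pos)
# Column 1: score of the positive class (1 - p_pos)
test_scores = np.column_stack([p_pos, 1 - p_pos])


def category_shares(alpha):
    """Return the score threshold and the share of 'Both' and 'None' sets."""
    threshold = np.quantile(cal_scores, 1 - alpha)
    in_set = test_scores <= threshold
    set_size = in_set.sum(axis=1)
    return {
        "threshold": threshold,
        "both": np.mean(set_size == 2),
        "none": np.mean(set_size == 0),
    }


low = category_shares(alpha=0.01)   # high target coverage
high = category_shares(alpha=0.30)  # low target coverage
print("alpha=0.01:", low)
print("alpha=0.30:", high)
```

A smaller $\alpha$ raises the threshold, so more classes pass it and the share of *Both* cannot go down; a larger $\alpha$ lowers the threshold, so the share of *None* cannot go down.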


In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from mapie.classification import MapieClassifier
from mapie.metrics import classification_coverage_score
from mapie.metrics import classification_mean_width_score

## Create data

Example data will be produced using scikit-learn's `make_classification` function.

In [2]:
n_classes = 2
# Make train and test data using sklearn's make_classification
X, y = make_classification(n_samples=10000, n_classes=2, n_features=5,
                           n_informative=5, n_redundant=0, n_clusters_per_class=1,
                           class_sep=0.7, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

## Train standard model

In [3]:
model = LogisticRegression()
model.fit(X_train, y_train)
predicted_proba = model.predict_proba(X_test)

## Train model and MAPIE classifier

MAPIE wraps around any model. It fits the model and gets the prediction sets.

The `score` method uses one minus the predicted probability of the true class as the conformity score. With `cv=5`, MAPIE fits the model with k-fold cross-validation and calibrates on the held-out folds, so no separate calibration set is needed, and prediction sets can then be produced for the whole test set.
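
As a rough illustration of the idea behind a score-based conformal classifier, the sketch below calibrates a single threshold on a held-out calibration set (split conformal, not the k-fold procedure MAPIE runs) and applies it to test probabilities. It is not MAPIE's implementation: `score_method_sets` and `simulate` are hypothetical helpers, and the data are synthetic, with labels drawn from the stated probabilities.

```python
import numpy as np


def score_method_sets(cal_proba, cal_labels, test_proba, alpha):
    """Split-conformal prediction sets using the score 1 - p(true class)."""
    n = len(cal_labels)
    # Conformity score: 1 minus the probability given to the true class
    cal_scores = 1 - cal_proba[np.arange(n), cal_labels]
    # Finite-sample corrected rank of the threshold among sorted scores
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the rank exceeds n, fall back to 1.0 so every class is included
    qhat = np.sort(cal_scores)[k - 1] if k <= n else 1.0
    # A class joins the prediction set when its score is within the threshold
    return (1 - test_proba) <= qhat, qhat


rng = np.random.default_rng(0)


def simulate(n):
    """Synthetic, well-calibrated probabilities and matching labels."""
    p_pos = rng.uniform(0, 1, size=n)
    labels = (rng.uniform(0, 1, size=n) < p_pos).astype(int)
    return np.column_stack([1 - p_pos, p_pos]), labels


cal_proba, cal_labels = simulate(2000)
test_proba, test_labels = simulate(2000)
sets, qhat = score_method_sets(cal_proba, cal_labels, test_proba, alpha=0.1)
coverage = sets[np.arange(len(test_labels)), test_labels].mean()
print(f"threshold: {qhat:.3f}, empirical coverage: {coverage:.3f}")
```

With exchangeable calibration and test data, the empirical coverage should land close to $1 - \alpha$.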

In [4]:
# Define classifier
classifier = LogisticRegression(random_state=42)
mapie_score = MapieClassifier(estimator=classifier, method='score', cv=5)
# Fit classifier and get predictions
mapie_score.fit(X_train, y_train)
y_pred, y_set = mapie_score.predict(X_test, alpha=0.05)
# Remove redundant dimension (used if more than one alpha is used)
y_set = np.squeeze(y_set)
# Show first 5 instances
print(y_set[0:5])

[[ True False]
 [ True False]
 [ True  True]
 [ True False]
 [ True False]]


## Analyse classifications

Overall coverage (all classes combined)

In [5]:
# Full coverage
cov = classification_coverage_score(y_test, y_set)
setsize = classification_mean_width_score(y_set)
print(f'Coverage: {cov:0.2f}')
print(f'Avg. set size: {setsize:.2f}')

Coverage: 0.95
Avg. set size: 1.10
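
For intuition about what these two numbers measure, the following sketch computes them with plain NumPy on a tiny hand-made example. It is an illustrative reimplementation under the assumption that prediction sets are boolean arrays of shape `(n_samples, n_classes)`; `coverage_and_set_size` is a hypothetical helper, not a MAPIE function.

```python
import numpy as np


def coverage_and_set_size(y_true, y_set):
    """Share of sets containing the true label, and mean number of labels per set."""
    y_true = np.asarray(y_true)
    y_set = np.asarray(y_set, dtype=bool)
    # For each row, look up whether the true class is flagged in its set
    covered = y_set[np.arange(len(y_true)), y_true]
    return covered.mean(), y_set.sum(axis=1).mean()


y_true_demo = np.array([0, 1, 0, 1])
y_set_demo = np.array([
    [True, False],   # contains the true class 0
    [True, True],    # contains the true class 1 (set of size 2)
    [False, True],   # misses the true class 0
    [False, False],  # empty set, misses the true class 1
])
cov_demo, size_demo = coverage_and_set_size(y_true_demo, y_set_demo)
print(f"Coverage: {cov_demo:.2f}")       # Coverage: 0.50
print(f"Avg. set size: {size_demo:.2f}")  # Avg. set size: 1.00
```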


Class-wise performance

In [6]:
def class_wise_performance(y_new, y_set, n_classes):
    df = pd.DataFrame()
    # Loop through the classes
    for i in range(n_classes):
        # Calculate the coverage and set size for the current class
        ynew = y_new[y_new == i]
        yscore = y_set[y_new == i]
        cov = classification_coverage_score(ynew, yscore)
        size = classification_mean_width_score(yscore)
        # Create a new dataframe with the calculated values
        temp_df = pd.DataFrame({
            "class": [i],
            "coverage": [cov],
            "avg. set size": [size]
        }, index=[i])
        # Concatenate the new dataframe with the existing one
        df = pd.concat([df, temp_df])
    df.set_index('class', inplace=True)
    return df

In [7]:
# Get class-wise performance
print('Class wise performance')
print(class_wise_performance(y_test, y_set, n_classes))

Class wise performance
       coverage  avg. set size
class                         
0      0.980648       1.113744
1      0.918152       1.089951


Count how many instances are in the following classes:

- *Positive*: The instance reaches the threshold for the positive class, but not the negative class.

- *Negative*: The instance reaches the threshold for the negative class, but not the positive class.

- *Both*: The instance reaches the threshold for both positive and negative classes.

- *None*: The instance does not reach the threshold for either class.

In [8]:
def count_classification(row):
    # row[0]: class 0 (negative) is in the prediction set
    # row[1]: class 1 (positive) is in the prediction set
    if not row[0] and row[1]:
        return 'Positive'
    elif row[0] and not row[1]:
        return 'Negative'
    elif row[0] and row[1]:
        return 'Both'
    else:
        return 'None'

# Map count_classification function to each row in y_set
results = pd.DataFrame(y_set)
results['observed'] = y_test
results['predicted probability'] = predicted_proba[:, 1]
results['class'] = results.apply(count_classification, axis=1)
# Show first 5 instances
print(results[0:5])

      0      1  observed  predicted probability     class
0  True  False         0               0.078708  Negative
1  True  False         0               0.030581  Negative
2  True   True         0               0.444927      Both
3  True  False         0               0.114512  Negative
4  True  False         0               0.097502  Negative


In [9]:
# Show the proportion of instances in each conformal prediction class
results['class'].value_counts()/len(results)


Negative    0.4794
Positive    0.4186
Both        0.1020
Name: class, dtype: float64

Show breakdown by conformal prediction class

In [10]:
results.groupby('class').mean()

            0    1  observed  predicted probability
class
Both      1.0  1.0  0.435294               0.481244
Negative  1.0  0.0  0.084272               0.089611
Positive  0.0  1.0  0.976589               0.956953


## Predictions conditioned on class
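
The cells in this section calibrate a separate threshold for each class, an approach often called class-conditional (or Mondrian) conformal prediction. The notation below is introduced here for illustration and is one way to summarise what the code computes. For each class $k$, with $n_k$ calibration instances whose true label is $k$:

$$s_i = 1 - \hat{p}_k(x_i) \quad \text{for each calibration instance with } y_i = k$$

$$\hat{q}_k = \text{the } \frac{(n_k + 1)(1 - \alpha)}{n_k} \text{ empirical quantile of } \{s_i\}$$

$$C(x) = \{\, k : 1 - \hat{p}_k(x) \le \hat{q}_k \,\}$$

Because each $\hat{q}_k$ is calibrated only on instances of class $k$, the coverage target of $1 - \alpha$ applies to each class separately rather than only on average across classes.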

In [11]:
# Split off model training set
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=42)
# Split rest into calibration and test
X_Cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

In [12]:
# Train model
classifier = LogisticRegression(random_state=42)
classifier.fit(X_train, y_train)

In [13]:
alpha = 0.05
thresholds = []
# Get predicted probabilities for calibration set
y_cal_prob = classifier.predict_proba(X_Cal)
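# For each class, get the (1 - alpha) percentile of the calibration scores,
# scaled by (n + 1) / n as a finite-sample correction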
for class_label in [0, 1]:
    mask = y_cal == class_label
    y_cal_prob_class = y_cal_prob[mask][:, class_label]
    s_scores = 1 - y_cal_prob_class
    q = (1 - alpha) * 100
    class_size = mask.sum()
    correction = (class_size + 1) / class_size
    q *= correction
    threshold = np.percentile(s_scores, q)
    thresholds.append(threshold)

print(thresholds)

[0.5265440547581445, 0.7379156747101008]


In [14]:
predicted_proba = classifier.predict_proba(X_test)
si_scores = 1 - predicted_proba

class_0 = si_scores[:, 0] <= thresholds[0]
class_1 = si_scores[:, 1] <= thresholds[1]

In [15]:
# Stack the per-class inclusion flags into an (n_samples, 2) boolean array
prediction_sets = np.column_stack([class_0, class_1])


In [16]:
# Get class-wise performance
print('Class wise performance')
print(class_wise_performance(y_test, prediction_sets, 2))

Class wise performance
       coverage  avg. set size
class                         
0      0.945643       1.135891
1      0.937550       1.059247


In [19]:
# Map count_classification function to each row in prediction_sets
results = pd.DataFrame(prediction_sets)
results['observed'] = y_test
results['predicted probability'] = predicted_proba[:, 1]
results['class'] = results.apply(count_classification, axis=1)
# Show the proportion of instances in each conformal prediction class
results['class'].value_counts()/len(results)

Positive    0.4660
Negative    0.4364
Both        0.0976
Name: class, dtype: float64

In [20]:
results.groupby('class').mean()

            0    1  observed  predicted probability
class
Both      1.0  1.0  0.303279               0.375845
Negative  1.0  0.0  0.071494               0.083788
Positive  0.0  1.0  0.941631               0.920002
