This notebook has been modified from imagenet-smallest-sets.ipynb from conformal-prediction (https://github.com/aangelopoulos/conformal-prediction)

In [None]:
import os
import json
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.image import imread
!pip install -U --no-cache-dir gdown --pre

# Image Classification Example Using ImageNet

In this tutorial, we will apply conformal prediction to an image classification problem to obtain prediction sets of plausible labels using a pre-trained model. This tutorial is based on the following paper: https://arxiv.org/abs/1609.00451.

In [None]:
# Requires about 1.31 GB of disk space
# Download the data to the local hard drive and load
if not os.path.exists('../data'):
    os.system('gdown 1h7S6N_Rx7gdfO3ZunzErZy6H7620EbZK -O ../data.tar.gz')
    os.system('tar -xf ../data.tar.gz -C ../')
    os.system('rm ../data.tar.gz')
if not os.path.exists('../data/imagenet/human_readable_labels.json'):
    !wget -nv -O ../data/imagenet/human_readable_labels.json -L https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json

data = np.load('../data/imagenet/imagenet-resnet152.npz') # softmax scores-label pairs
example_paths = os.listdir('../data/imagenet/examples') # path to actual image files


In [None]:
smx = data['smx'] # softmax scores of images from a pre-trained model
labels = data['labels'].astype(int) # true labels

In [None]:
smx.shape, labels.shape

In [None]:
smx[0,]

In [None]:
smx[0,].min(), smx[0,].max(), np.sum(smx[0,])

Each row of $\texttt{smx}$ contains the *softmax scores* of one image, which we can think of as the estimated probabilities that the image belongs to each of the $K = 1000$ possible classes. Symbolically, $$\texttt{smx}[i,] = \hat{f}(\text{image}_i) \in [0,1]^{K}, \quad \texttt{smx}[i, j] \approx \mathbb{P}\{\text{image}_i \text{ has label } j \text{ (according to $\hat{f}$)}\}.$$

For this example, we are going to use the nonconformity score $$s(x, y) = 1-\hat{f}(x)_y,$$ i.e., one minus the softmax score of the *true* class. It can be interpreted as the probability that the image $x$ does *not* belong to the true class, *according to* the model $\hat{f}$.
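As a small illustration of this score, the sketch below evaluates $s(x, y)$ on a made-up softmax matrix. The arrays `toy_smx` and `toy_labels` are hypothetical and are not part of the ImageNet data; only the indexing pattern matters.

```python
import numpy as np

# Made-up softmax output for 3 images over K = 4 classes (each row sums to 1).
toy_smx = np.array([
    [0.70, 0.10, 0.15, 0.05],
    [0.25, 0.25, 0.40, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])
toy_labels = np.array([0, 1, 3])  # hypothetical true labels

# s(x, y) = 1 - f_hat(x)_y: pick, in each row, the entry at the true label.
toy_scores = 1 - toy_smx[np.arange(len(toy_labels)), toy_labels]
print(toy_scores)
```

The second image gets the largest score because the model puts only 0.25 of its mass on the true class there.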

Note that in contrast to the demo from the previous class, we are *not* actually training $\hat{f}$ from scratch. We are using an $\hat{f}$ that someone else has already trained on some other data. Also, to compute this nonconformity score, we don't need to run $\hat{f}$ ourselves (which we cannot do here, because we do not have access to the model). It is enough to know the output of $\hat{f}$ on the calibration data and the test data.

In [None]:
alpha = 0.1 # 1-alpha is the desired coverage
m = 1000 # number of calibration points

In [None]:
# Split the data into a calibration set and a test set
idx = np.array([1] * m + [0] * (smx.shape[0]-m)) > 0 # creates a 50000-vector of 1000 T's followed by F's
np.random.shuffle(idx) # shuffle to randomly select calibration points
smx_cal, smx_te = smx[idx,:], smx[~idx,:]
labels_cal, labels_te = labels[idx], labels[~idx]

In [None]:
S_cal = 1-smx_cal[np.arange(m), labels_cal]
S_cal = np.sort(S_cal)
qhat = S_cal[int(np.ceil((1-alpha)*(m+1)))-1] # subtract 1, because Python indices start from 0 not 1

Now, applying the same idea as in the previous class, we have that for a test image $X'$ with label $Y'$, which is yet unknown, $$\mathbb{P}\left\{S' = 1-\hat{f}(X')_{Y'} \leq \hat{q}_{1-\alpha} \right\} \geq 1-\alpha.$$ Since $1-\hat{f}(x)_y \leq \hat{q}_{1-\alpha}$ is equivalent to $\hat{f}(x)_y \geq 1-\hat{q}_{1-\alpha}$, the set $$\hat{C}(x) = \left\{y: \hat{f}(x)_y \geq 1-\hat{q}_{1-\alpha}\right\}$$ contains $Y'$ exactly when $S' \leq \hat{q}_{1-\alpha}$, and therefore satisfies $$\mathbb{P}\left\{Y' \in \hat{C}(X')\right\} \geq 1-\alpha.$$ This is how we construct prediction sets in this case.
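The construction above can be sketched end to end on synthetic data. This is a minimal sketch under stated assumptions: the helper names `conformal_quantile` and `prediction_sets` are hypothetical, and the softmax rows are drawn from a Dirichlet distribution with labels sampled from those rows, which is not how the ImageNet scores were produced.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Return the ceil((1 - alpha) * (m + 1))-th smallest calibration score."""
    m = len(cal_scores)
    rank = int(np.ceil((1 - alpha) * (m + 1)))  # 1-based rank
    if rank > m:
        return np.inf  # too few calibration points: every label must be kept
    return np.sort(cal_scores)[rank - 1]

def prediction_sets(softmax, qhat):
    """Boolean matrix: True where softmax score >= 1 - qhat."""
    return softmax >= 1 - qhat

# Synthetic stand-in for the softmax matrix (an assumption, not real model output).
rng = np.random.default_rng(0)
n, K, m_cal = 2000, 5, 500
toy_smx = rng.dirichlet(np.ones(K), size=n)
# Sample each label from its own softmax row via the inverse-CDF trick.
u = rng.random((n, 1))
toy_labels = (toy_smx.cumsum(axis=1) > u).argmax(axis=1)

# Calibrate on the first m_cal rows, then build sets for the remaining rows.
cal_scores = 1 - toy_smx[np.arange(m_cal), toy_labels[:m_cal]]
toy_qhat = conformal_quantile(cal_scores, alpha=0.1)
toy_sets = prediction_sets(toy_smx[m_cal:], toy_qhat)
toy_coverage = toy_sets[np.arange(n - m_cal), toy_labels[m_cal:]].mean()
print(toy_qhat, toy_coverage)
```

Returning `np.inf` when the rank exceeds the number of calibration points is a design choice of this sketch: the threshold $1-\hat{q}$ becomes $-\infty$, so every label is included and the guarantee still holds trivially.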

In [None]:
Chat = smx_te >= 1-qhat # n_te-by-K Boolean matrix: True where the label is in the prediction set, False otherwise

In [None]:
Chat[0,]
#np.sum(Chat[0,]), np.sum(~Chat[0,])

Now, let's check that this is doing what we expected, i.e., on average, the prediction sets contain the true label about 90% of the time.

In [None]:
# Check the empirical coverage
empirical_coverage = Chat[np.arange(Chat.shape[0]), labels_te].mean()
print(f"The empirical coverage is: {empirical_coverage}")
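The coverage guarantee holds on average over the random draw of both the calibration and the test points, so the empirical coverage of a single split fluctuates around $1-\alpha$. The sketch below illustrates this by repeating the calibration step many times. It is a simulation under assumptions: the scores are drawn i.i.d. uniform as a stand-in for real nonconformity scores, and all `sim_*` names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
sim_alpha, sim_m, sim_n_test, sim_trials = 0.1, 1000, 2000, 200
rank = int(np.ceil((1 - sim_alpha) * (sim_m + 1)))  # 1-based rank of qhat

sim_coverages = np.empty(sim_trials)
for t in range(sim_trials):
    cal = rng.random(sim_m)        # stand-in calibration scores (assumed uniform)
    test = rng.random(sim_n_test)  # stand-in test scores from the same distribution
    q = np.sort(cal)[rank - 1]     # same quantile rule as qhat above
    # The true label is covered exactly when its score is at most q.
    sim_coverages[t] = np.mean(test <= q)

print(sim_coverages.mean(), sim_coverages.std())
```

The mean across repetitions should sit close to 0.9, while individual repetitions land somewhat above or below it.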

Now, let's take a look at some of the test images and see what conformal prediction gave us.

In [None]:
with open('../data/imagenet/human_readable_labels.json') as f:
    label_strings = np.array(json.load(f))

example_paths = os.listdir('../data/imagenet/examples')
for i in range(10):
    rand_path = np.random.choice(example_paths) # pick an image at random
    img = imread('../data/imagenet/examples/' + rand_path )
    img_index = int(rand_path.split('.')[0])
    prediction_set = smx[img_index] >= 1-qhat # same rule as Chat above
    plt.figure()
    plt.imshow(img)
    plt.axis('off')
    plt.show()
    print("The prediction set is: %s" % ", ".join(map(str, list(label_strings[prediction_set]))))

Although this method is quite simple and leads to small prediction sets, there is room for improvement. The nonconformity score $$s(x, y) = 1-\hat{f}(x)_y$$ is just one score among many. In particular, this score does *not* take into account the difficulty of each example: the same threshold $1-\hat{q}_{1-\alpha}$ is applied to every image. Later, we will see methods that are better able to account for heterogeneity across the feature space.
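To see why a single threshold ignores difficulty, consider the toy sketch below. The softmax vectors and the threshold value are made up for illustration and are not taken from the ImageNet data: a confident prediction yields a set with one label, while a near-uniform (hard) prediction yields an empty set.

```python
import numpy as np

toy_threshold = 0.3  # stands in for 1 - qhat (made-up value)

easy_smx = np.array([0.90, 0.05, 0.03, 0.02])  # model is confident
hard_smx = np.array([0.28, 0.26, 0.24, 0.22])  # model is unsure

# Same thresholding rule for both images.
easy_set = easy_smx >= toy_threshold
hard_set = hard_smx >= toy_threshold
print(easy_set.sum(), hard_set.sum())
```

For the hard image no class reaches the threshold, so the set carries no information about which labels are plausible, even though that is exactly the case where a larger set would be most useful.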