## Image classification


All previous examples used tabular data.
Now it's time to try a different type of data: images.

The general setup here is:

- Input is image data
- Output is a score, which can also be multi-dimensional as in multi-class classification
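
As a minimal sketch of what a multi-dimensional score looks like, the snippet below turns three made-up raw scores (logits) into class probabilities with a softmax. The class names and numbers are invented for illustration; an ImageNet classifier returns 1000 such probabilities per image.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical classes and raw scores, for illustration only
toy_classes = ["corgi", "whippet", "tabby"]
toy_logits = np.array([2.0, 1.0, 0.1])

toy_probs = softmax(toy_logits)
predicted = toy_classes[int(np.argmax(toy_probs))]
print(dict(zip(toy_classes, toy_probs.round(3))), predicted)
```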

Image classification is a common task that is usually solved with deep learning.
You give an image to the model and get a class back, usually based on what's visible in the image.

We aren't going to train our own image classifier. Instead, we will load a ResNet50 model (TODO:CITE) that was trained on ImageNet data.

The ImageNet task is a large-scale image classification challenge that involves recognizing and categorizing objects within digital images. The challenge uses a dataset of over 1 million images, each of which belongs to one of 1000 different object categories. The task is to develop a machine learning model that can accurately classify each image into its correct category.

The ImageNet challenge has been an important driver of progress in the field of computer vision and deep learning, and has led to the development of new and more accurate machine learning models. The challenge has also spurred research into related computer vision tasks such as object detection and image segmentation.


In [1]:
import json
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
import shap

In [2]:
# load pre-trained model and data
model = ResNet50(weights='imagenet')
X, y = shap.datasets.imagenet50()

So `X` holds 50 images, each 224x224 pixels with 3 color channels.

We also need the class names for them:


In [3]:
import json
import os
import urllib.request

# Path to the JSON file on disk (change this to the desired location)
json_file_path = 'imagenet_class_index.json'

# Check if the JSON file exists on disk
if os.path.exists(json_file_path):
    with open(json_file_path) as file:
        class_names = [v[1] for v in json.load(file).values()]
else:
    url = 'https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json'
    with urllib.request.urlopen(url) as response:
        json_data = response.read().decode()
    with open(json_file_path, 'w') as file:
        file.write(json_data)
    class_names = [v[1] for v in json.loads(json_data).values()]
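
For orientation, here is a small sketch of the structure I assume the index file has: each key is a class index stored as a string, and each value is a pair of WordNet ID and readable name. The three entries are an excerpt used for illustration, and `sample_class_names` is a name made up for this sketch. Sorting by the integer key is a defensive choice so that list position and network output index line up even if the keys are stored out of order.

```python
import json

# Three-entry excerpt in the assumed structure of imagenet_class_index.json:
# "<class index>": ["<WordNet ID>", "<human-readable name>"]
sample_json = """
{"1": ["n01443537", "goldfish"],
 "0": ["n01440764", "tench"],
 "2": ["n01484850", "great_white_shark"]}
"""

class_index = json.loads(sample_json)

# Sort by the integer index so that position i in the list matches output i
# of the network, even though the keys above are deliberately out of order.
sample_class_names = [class_index[k][1] for k in sorted(class_index, key=int)]
print(sample_class_names)
```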

Let's see if the model works by trying to classify this corgi:

![This is a cute Corgi](images/corgi.jpg)

Look how happy this one is.
Pure joy.
Not aware of the danger on the tracks behind this true friend.


In [30]:
image = tf.keras.preprocessing.image.load_img('corgi.jpg', target_size=(224, 224))
image = tf.keras.preprocessing.image.img_to_array(image)

In [31]:
image.shape

(224, 224, 3)

In [32]:
first_element = np.expand_dims(image, axis=0)

In [33]:
first_element.shape

(1, 224, 224, 3)

In [18]:
img = tf.expand_dims(image, axis=0)


In [19]:
img

<tf.Tensor: shape=(1, 224, 224, 3), dtype=float32, numpy=
array([[[[ 63.,  87.,  29.],
         [ 61.,  86.,  44.],
         [ 71.,  99.,  59.],
         ...,
         [ 84., 113.,  59.],
         [ 75., 104.,  50.],
         [ 78., 106.,  57.]],

        ...

(output truncated)

In [4]:
# Load and preprocess an example image
image = tf.keras.preprocessing.image.load_img('corgi.jpg', target_size=(224, 224))
image = tf.keras.preprocessing.image.img_to_array(image)
image = preprocess_input(image)
# Add an extra dimension to the image to match the expected input shape of the model
img = tf.expand_dims(image, axis=0)



print(model(img))

tf.Tensor(
[[5.74184789e-09 8.36561867e-06 5.56487443e-08 1.93993941e-08
  3.61792796e-09 1.44186325e-08 3.44323361e-08 1.75497098e-06
  3.74992305e-06 1.88506544e-08 4.73724911e-04 2.90214371e-06
  7.12317558e-07 1.69044517e-06 2.16493063e-08 1.90132050e-04
  1.29755577e-07 2.03338232e-06 1.22069736e-07 5.70589805e-07
  3.47577065e-06 1.29292104e-07 2.81221553e-08 7.90762556e-07
  1.35672437e-07 2.40378517e-06 1.80879681e-06 1.38869621e-06
  1.19909458e-07 2.88645879e-07 1.91073866e-07 1.91662548e-06
  4.30167233e-07 8.45033128e-08 4.41368790e-08 2.50221831e-07
  3.53940572e-06 3.26003942e-07 1.04034868e-07 7.90741126e-08
  2.41061730e-06 4.13807513e-08 8.35286642e-07 1.39342760e-07
  2.56132921e-07 3.20279781e-07 1.76816101e-07 8.36645953e-09
  1.14954979e-07 2.37871859e-06 2.20709957e-08 3.28356542e-09
  1.53820309e-07 1.70635971e-07 1.01006911e-07 4.36824088e-08
  8.52373091e-07 4.67787800e-08 1.05527349e-07 5.48735244e-08
  6.33528430e-07 5.50786581e-07 1.87940117e-07 ...

(output truncated)

In [7]:
img.shape

TensorShape([1, 224, 224, 3])

In [6]:
X[0].shape

(224, 224, 3)
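
The `preprocess_input` call above matters because the network expects a different pixel encoding than the raw image. As far as I understand the Keras documentation, the ResNet50 variant flips the channels from RGB to BGR and subtracts a per-channel ImageNet mean, without rescaling. The NumPy sketch below mimics that; `caffe_style_preprocess` is a made-up name and the mean values are the ones commonly quoted for this mode, so treat it as an illustration and not as a replacement for the library call.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed values for this sketch)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(rgb_batch):
    """Hypothetical re-implementation: RGB -> BGR, then zero-center per channel."""
    x = np.asarray(rgb_batch, dtype=np.float32)
    x = x[..., ::-1]               # reverse the channel axis: RGB -> BGR
    return x - IMAGENET_BGR_MEAN   # broadcasting subtracts one mean per channel

# One fake 2x2 "image" whose pixels are all (R, G, B) = (255, 128, 0)
fake = np.tile(np.array([255.0, 128.0, 0.0], dtype=np.float32), (1, 2, 2, 1))
out = caffe_style_preprocess(fake)
print(out.shape, out[0, 0, 0])
```

This is also why the prediction function handed to SHAP has to do the preprocessing itself: the masker works on raw pixel values, not on the preprocessed ones.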

Now we only have to combine the labels with the network outputs and return the most likely classes.


In [None]:
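# Wrap the model: the function takes raw pixel values (0-255) and
# applies the ResNet50 preprocessing before calling the network
def predict(x):
    tmp = np.array(x, dtype=np.float32)  # copy, so the caller's array stays untouched
    tmp = preprocess_input(tmp)
    return model(tmp).numpy()


def get_top_classes(probs, class_names, num_classes=1):
    # Indices that sort the probabilities of the first image in descending order
    sorted_indices = np.argsort(np.asarray(probs)[0])[::-1]
    # Names of the top num_classes classes
    top_classes = [class_names[i] for i in sorted_indices[:num_classes]]
    return top_classes


# first_element is the raw (not yet preprocessed) corgi image with a batch dimension
get_top_classes(predict(first_element), class_names, 5)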

Seems like the network "thinks" this image shows a whippet.
But what's the explanation for this classification?

To answer this, we will use Shapley values and explain the image classification.

## SHAP for image classification

To create a SHAP explanation, we need three things:

- the prediction function
- the masker, which is also a function
- the class names

Then we are finally ready to estimate the Shapley values.
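
To make the roles of the prediction function and the masker concrete, here is a toy sketch that computes exact Shapley values for a 2x2 grid of image patches. Everything in it is invented for illustration: `toy_predict` is a stand-in for a network, `grey_masker` replaces absent patches with a constant, and the brute-force loop over all coalitions only works because there are just four features. `shap` relies on approximations instead of this exhaustive enumeration.

```python
import itertools
import math
import numpy as np

def toy_predict(image):
    """Stand-in for a classifier: score = mean brightness of the top half."""
    return float(image[: image.shape[0] // 2].mean())

def grey_masker(image, present, fill=0.5):
    """Keep the patches listed in `present`, replace the others with grey."""
    h, w = image.shape[0] // 2, image.shape[1] // 2
    masked = np.full_like(image, fill)
    for p in present:
        r, c = divmod(p, 2)  # patch id -> (row, col) in the 2x2 grid
        masked[r*h:(r+1)*h, c*w:(c+1)*w] = image[r*h:(r+1)*h, c*w:(c+1)*w]
    return masked

def exact_shapley(image, n_patches=4):
    """Weighted average of marginal contributions over all coalitions."""
    values = np.zeros(n_patches)
    for i in range(n_patches):
        others = [p for p in range(n_patches) if p != i]
        for size in range(len(others) + 1):
            weight = (math.factorial(size) * math.factorial(n_patches - size - 1)
                      / math.factorial(n_patches))
            for coalition in itertools.combinations(others, size):
                with_i = toy_predict(grey_masker(image, coalition + (i,)))
                without_i = toy_predict(grey_masker(image, coalition))
                values[i] += weight * (with_i - without_i)
    return values

# 4x4 greyscale toy image: bright top-left patch, dark elsewhere
toy_image = np.zeros((4, 4))
toy_image[:2, :2] = 1.0

phi = exact_shapley(toy_image)
print(phi)
```

In this toy setup the bright top-left patch gets a positive value and the dark top-right patch a negative one, because the grey fill is brighter than the patch it replaces. The two bottom patches get zero since the toy score ignores them. The attributions depend on the masker as much as on the model.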


In [None]:
# Number of top classes for which to compute the SHAP explanations
topk = 5

# The masker blurs out parts of the image
masker = shap.maskers.Image(
  "blur(128,128)", shape = X[0].shape
)

explainer = shap.Explainer(
  predict, masker, output_names=class_names
)

# Explain the raw corgi image (NumPy array with a batch dimension);
# the preprocessing happens inside the prediction function
shap_values = explainer(
  first_element, max_evals=100,
  outputs=shap.Explanation.argsort.flip[:topk]
)

In [None]:
# The Explanation object already carries pixel values and class labels
shap.image_plot(shap_values)

## Effect of different inpainting methods

- TODO: make a list of the masks
- iterate through the lists
- show top 5 classes
- TODO: look for 5 papers


In [None]:
# define maskers that are used to mask out partitions of the input image
mask_names = ["inpaint_telea", "inpaint_ns", "blur(128, 128)", "blur(16, 16)"]

masks = [shap.maskers.Image(m, shape = X[0].shape) for m in mask_names]

In [None]:
topk = 2

for mask in masks:
    # create an explainer with the model and the image masker
    explainer = shap.Explainer(predict, mask, output_names=class_names)
    # explain the corgi image using 100 evaluations of the underlying model to estimate the SHAP values
    shap_values = explainer(first_element, max_evals=100, batch_size=50, outputs=shap.Explanation.argsort.flip[:topk])
    shap.image_plot(shap_values)

We have two options for what counts as an input feature:

- individual pixels
- larger collections of pixels

The network gets as input the individual pixels, but that doesn't mean we have to use the same granularity for the explanations.
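
A quick sketch of what the choice of granularity means for the number of features: grouping the pixels of a 224x224 image into square patches. The patch size of 16 is an arbitrary choice for illustration, and `patch_ids` is a made-up helper. It does not reproduce how `shap.maskers.Image` partitions the image.

```python
import numpy as np

def patch_ids(height, width, patch):
    """Assign every pixel position the id of the square patch it falls into."""
    rows = np.arange(height) // patch   # patch row for each pixel row
    cols = np.arange(width) // patch    # patch column for each pixel column
    n_cols = -(-width // patch)         # number of patch columns (ceil division)
    return rows[:, None] * n_cols + cols[None, :]

ids = patch_ids(224, 224, patch=16)
n_pixel_features = 224 * 224            # one feature per pixel position
n_patch_features = int(ids.max()) + 1   # one feature per patch
print(n_pixel_features, n_patch_features)
```

With these numbers there are 50176 pixel positions but only 196 patches, which is a far smaller set of features to attribute to.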

The second choice is about how we mask absent (sets of) pixels:

- We could replace them from background data
- Or we could replace them with some reference, which could be blurring them or replacing them with grey pixels (or some other "neutral" color)
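
The reference-based options can be sketched in a few lines of NumPy. The code below keeps the left half of a random toy image and fills the right half either with grey or with a blurred copy. `box_blur` is a crude stand-in written for this sketch, not the blur that `shap.maskers.Image` applies. Replacing from background data would work the same way, with another image passed as `reference`.

```python
import numpy as np

def box_blur(image, k=3):
    """Crude blur: average each pixel with its k x k neighbourhood (edge-padded)."""
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(image, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)

def mask_image(image, keep, reference):
    """Keep pixels where `keep` is True, take the rest from `reference`."""
    return np.where(keep[..., None], image, reference)

rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(8, 8, 3))  # random toy image

keep = np.zeros((8, 8), dtype=bool)
keep[:, :4] = True                           # left half present, right half absent

grey_version = mask_image(image, keep, np.full_like(image, 127.5))
blur_version = mask_image(image, keep, box_blur(image))
print(grey_version.shape, blur_version.shape)
```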



## DeepExplainer

We can do the same on the pixel level. Depending on your application, this can make sense, for example if you need a really fine-grained explanation. But it's expensive.

Instead of an `Explainer` object, we create a `DeepExplainer`. This means we need no masker, but we again work with background data.

But here's the thing: which data should we use as background?
Since I haven't trained the model myself, I have to think hard about what the background data should be.
Usually it would come from the same distribution as my usual data.

But that's super slow, so I see no point in using it here.


## The correlation problem for images


We talked about the correlation problem in the [correlation chapter](#correlation), where it arose from the background data.
But does the same problem occur with image data?

There is a similar problem, but for images it makes more sense to speak of extrapolation: leaving the distribution of the training data by creating new images.
How far we leave it depends on the masker.
The masker creates new images, which might be far removed from the input data.

TODO: check out research on maskers and their effect on explanation

### Problems and TODOs:
    
- When changing the masker, the top-k classes change, but they shouldn't?
