<img width="150" alt="Logo_ER10" src="https://user-images.githubusercontent.com/3244249/151994514-b584b984-a148-4ade-80ee-0f88b0aefa45.png">

### Model Interpretation for a Pretrained Facial Action Unit Model using RISE

This notebook demonstrates how to apply the RISE explainability method to a pretrained facial action unit (AU) detection model, MEFARG from `mexca`, using a cropped face image. It visualizes the relevance scores for all pixels/super-pixels by displaying them on the image.<br>

[RISE](http://bmvc2018.org/contents/papers/1064.pdf) is short for Randomized Input Sampling for Explanation of Black-box Models. It estimates importance empirically by probing the model with randomly masked versions of the input image and obtaining the corresponding outputs.<br>
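The estimation step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not DIANNA's implementation: `rise_saliency`, the toy model, and the 8x8 image are hypothetical names and values chosen for the example, and the masks are plain coarse binary grids. The normalisation by the number of masks times the mean mask value follows the Monte Carlo estimate in the RISE paper.

```python
import numpy as np


def rise_saliency(model_fn, image, masks):
    """Score-weighted average of the masks (hypothetical helper, not DIANNA's API)."""
    n_masks = masks.shape[0]
    # Probe the model once per masked copy of the image
    scores = np.array([model_fn(image * mask[:, :, None]) for mask in masks])
    # Sum the masks, each weighted by the score its masked image obtained
    weighted_sum = np.tensordot(scores, masks, axes=1)  # shape (H, W)
    # Normalise by the expected number of times a pixel is kept
    return weighted_sum / (n_masks * masks.mean())


# Toy black-box model: it only "looks" at the top-left quadrant
def toy_model(img):
    return img[:4, :4].mean()


rng = np.random.default_rng(0)
image = np.ones((8, 8, 3))
# 2000 random 4x4 binary grids, upsampled to 8x8 by repeating each cell
coarse = (rng.random((2000, 4, 4)) < 0.5).astype(float)
masks = coarse.repeat(2, axis=1).repeat(2, axis=2)

saliency = rise_saliency(toy_model, image, masks)
print(saliency.round(2))
```

In this toy setup the top-left quadrant should receive higher relevance than the rest of the image, because only masks that keep those pixels lead to high scores.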


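#### Requirements:

With Python < 3.11, install the required packages as:

`pip install dianna "mexca[all]" opencv-python mediapipe`

Download the `test_mediapipe.py` script from https://github.com/mexca/mexca/tree/dianna-demo-experiments/dianna-demo and place it next to this notebook so that it can be imported. The notebook also expects an `AUs_codes.yaml` file in the working directory.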

#### Colab Setup

In [None]:
running_in_colab = 'google.colab' in str(get_ipython())
if running_in_colab:
    # install dianna
    !python3 -m pip install dianna[notebooks]

#### 0 -  Libraries

In [1]:
import warnings
warnings.filterwarnings('ignore') # disable warnings related to versions of tf
import numpy as np

# keras model and preprocessing tools
# from keras import backend as K

# dianna library for explanation
import dianna
from dianna import visualization

# for plotting
%matplotlib inline
from matplotlib import pyplot as plt

import torch

from mexca.video.extraction import MEFARG
import torchvision.transforms as transforms

# for face detection and cropping
import cv2
from test_mediapipe import FaceDetector

# for loading the AUs codes
import yaml


#### 1 - Loading the model and the dataset
Load the pretrained MEFARG action unit detection model and the image to be explained.

##### 1.1 - Initialize the pretrained model

In [2]:
class Model():
    """Wrapper around the pretrained MEFARG model that exposes a batch prediction function."""

    def __init__(self, device = torch.device("cpu")):
        # NOTE: the model is not moved to `device`; it stays on the CPU
        self.model = MEFARG.from_pretrained(
            "mexca/mefarg-open-graph-au-resnet50-stage-2"
        )
        self.model.eval()  # inference mode
        self.input_size = (224, 224)
        # resize, center crop and normalize with the ImageNet channel statistics
        self.transform = transforms.Compose(
            [
                transforms.ToPILImage(),
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize(
                    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                ),
            ]
        )
        self.device = device

    def run_on_batch(self, x):
        """Return model outputs for one image (H, W, C) or a batch of images (N, H, W, C)."""
        if len(x.shape) == 4:
            x_trans = torch.stack([self.transform(img) for img in x])
        elif len(x.shape) == 3:
            x_trans = self.transform(x)[None, :, :, :]
        else:
            raise ValueError(f"Expected a 3D or 4D array, got shape {x.shape}")
        with torch.no_grad():
            predictions = self.model(x_trans)
        return predictions.detach().squeeze()

In [3]:
model = Model()

##### 1.2 - Read an image and crop it using FaceDetector

In [4]:
path_to_photo = "/data/mexca_dianna_storage/demo_mexca.png"
frame = cv2.imread(path_to_photo)
detector = FaceDetector(confidence_threshold = 0.8, device = "cuda")
faces, detection_time, inference_time, was_processed = detector.process_frame(frame, 1)

x = faces[0]["crop"]
print(f"The photo of shape {frame.shape} is cropped to a photo of shape {x.shape}")

Using device: cuda




The photo of shape (1155, 1239, 3) is cropped to a photo of shape (397, 397, 3)


Run the model on the cropped photo to obtain the action unit predictions.

In [5]:
y = model.run_on_batch(x)

#### 2 - Compute and visualize the relevance scores
Compute the pixel relevance scores using RISE and visualize them on the input image.

RISE masks random portions of the input image and passes each masked image through the model. Regions that, when left unmasked, consistently go with higher model scores are the most “important” ones.<br>
To call the explainer and generate the relevance score maps, the user needs to specify the number of randomly generated masks (`n_masks`), the resolution of the features in the masks (`feature_res`), and the probability that each feature in each mask is kept unmasked (`p_keep`).
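To make the roles of `n_masks`, `feature_res` and `p_keep` concrete, here is a simplified sketch of mask generation. It is an assumption-laden illustration: `generate_masks` is a hypothetical helper, and it upsamples the coarse grid by plain cell repetition, whereas the original RISE paper uses bilinear upsampling with random shifts and DIANNA's own implementation may differ as well. Only 50 masks are generated here to keep memory use low.

```python
import numpy as np


def generate_masks(n_masks, feature_res, p_keep, image_shape, seed=0):
    """Random binary masks on a feature_res x feature_res grid (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    height, width = image_shape
    # Each coarse cell is kept (1) with probability p_keep, otherwise masked (0)
    coarse = (rng.random((n_masks, feature_res, feature_res)) < p_keep).astype(np.float32)
    # Cell size in pixels, rounded up so the grid covers the whole image
    cell_h = -(-height // feature_res)
    cell_w = -(-width // feature_res)
    # Nearest-neighbour upsampling by repeating cells, then crop to the image size
    masks = coarse.repeat(cell_h, axis=1).repeat(cell_w, axis=2)
    return masks[:, :height, :width]


# Same feature_res and p_keep as in this notebook, applied to the 397x397 crop
masks = generate_masks(n_masks=50, feature_res=6, p_keep=0.1, image_shape=(397, 397))
print(masks.shape)
print(f"fraction of pixels kept: {masks.mean():.3f}")
```

With `p_keep=0.1`, roughly one in ten of the 36 coarse cells stays visible in each mask, so most masked images show only a few patches of the face.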

In [6]:
torch.cuda.is_available()

True

In [7]:
%%time
relevances = dianna.explain_image(model.run_on_batch, x, method="RISE",
                                labels=list(range(41)),
                                n_masks=1000, feature_res=6, p_keep=.1,
                                axis_labels={2: 'channels'}, batch_size=10)

Explaining: 100%|██████████| 100/100 [01:45<00:00,  1.05s/it]

CPU times: user 12min 3s, sys: 5min 21s, total: 17min 25s
Wall time: 1min 52s





Define a helper that maps a prediction index to the FACS name of the corresponding action unit.


In [8]:
def class_name(idx):
    au_list = np.array(
        [1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 32, 38, 39]
    )
    with open('./AUs_codes.yaml') as f:
        au_codes = yaml.load(f, Loader=yaml.FullLoader)["facial_action_units"]
    return au_codes.get(au_list[idx])['facs_name']
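The lookup performed by `class_name` can be illustrated without the YAML file. In the sketch below, `AU_CODES` is a hypothetical excerpt of the parsed file whose layout is inferred from the code above; it contains only the four names that this notebook prints. `lookup_name` is likewise a hypothetical variant that adds a guard, since `au_list` has 27 entries while the explainer is asked for 41 labels.

```python
# AU numbers in model output order, copied from `class_name`
AU_LIST = [1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17,
           18, 19, 20, 22, 23, 24, 25, 26, 27, 32, 38, 39]

# Hypothetical excerpt of the parsed "facial_action_units" mapping
AU_CODES = {
    4: {"facs_name": "Brow lowerer"},
    6: {"facs_name": "Cheek raiser"},
    7: {"facs_name": "Lid tightener"},
    12: {"facs_name": "Lip corner puller"},
}


def lookup_name(idx, au_codes=AU_CODES):
    """Map a model output index to a FACS name, with fallbacks for missing entries."""
    if idx >= len(AU_LIST):
        # Output indices 27 to 40 have no entry in AU_LIST
        return f"output {idx}"
    au_number = AU_LIST[idx]
    entry = au_codes.get(au_number)
    return entry["facs_name"] if entry else f"AU{au_number}"


for idx in [4, 9, 2, 5]:
    print(idx, lookup_name(idx))
```

Loading the YAML file once and passing the parsed mapping around, as done here with `au_codes`, also avoids reopening the file on every call.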

Run the model again, choose the action units to explain, and print their names. The relevance maps for these action units are then overlaid on the input image.

In [9]:
predictions = model.run_on_batch(x).numpy()
prediction_ids = np.array([4, 9, 2, 5])
for idx in prediction_ids:
    print(f"prediction id {idx}: facs_name {class_name(idx)}")

prediction id 4: facs_name Cheek raiser
prediction id 9: facs_name Lip corner puller
prediction id 2: facs_name Brow lowerer
prediction id 5: facs_name Lid tightener


In [10]:
model.transform(x).numpy().shape

(3, 224, 224)

In [None]:
for class_idx in prediction_ids:
    print(f'Explanation for `{class_name(class_idx)}` ({predictions[class_idx]})')
    visualization.plot_image(relevances[class_idx], x/255.)
    plt.show()

#### 3 - Conclusions
The relevance scores are generated by passing multiple randomly masked inputs to the black-box model and averaging the masks, weighted by the resulting scores. The idea is that a mask that preserves important parts of the image yields a higher score. <br>

The example here shows that the RISE method estimates the relevance of each pixel/super-pixel to each action unit prediction. The highlighted regions give an intuition for how the model arrives at its predictions and should be judged against human visual perception of the face.

#### 4 - Repeat the experiment

In [12]:
import time

In [13]:
first_relevances = relevances

for i in range(5):
    start_time = time.time()
    relevances = dianna.explain_image(model.run_on_batch, x, method="RISE",
                                      labels=list(range(41)),
                                      n_masks=1000, feature_res=6, p_keep=.1,
                                      axis_labels={2: 'channels'}, batch_size=10)
    
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Iteration {i} took {elapsed_time:.3f} seconds")
    # compare the MAE of each iteration compared to the first one above
    for class_idx in prediction_ids:
        diff_relevances = np.mean(np.abs(relevances[class_idx] - first_relevances[class_idx]))
        print(f'Differences for {class_name(class_idx)} is {diff_relevances:.3f}')
    print("==========================================")

Explaining: 100%|██████████| 100/100 [01:45<00:00,  1.06s/it]


Iteration 0 took 112.438 seconds
Differences for Cheek raiser is 0.039
Differences for Lip corner puller is 0.035
Differences for Brow lowerer is 0.034
Differences for Lid tightener is 0.033


Explaining: 100%|██████████| 100/100 [01:45<00:00,  1.06s/it]


Iteration 1 took 112.555 seconds
Differences for Cheek raiser is 0.047
Differences for Lip corner puller is 0.042
Differences for Brow lowerer is 0.040
Differences for Lid tightener is 0.040


Explaining: 100%|██████████| 100/100 [01:44<00:00,  1.04s/it]


Iteration 2 took 111.210 seconds
Differences for Cheek raiser is 0.036
Differences for Lip corner puller is 0.033
Differences for Brow lowerer is 0.026
Differences for Lid tightener is 0.028


Explaining: 100%|██████████| 100/100 [01:45<00:00,  1.05s/it]


Iteration 3 took 112.270 seconds
Differences for Cheek raiser is 0.038
Differences for Lip corner puller is 0.034
Differences for Brow lowerer is 0.030
Differences for Lid tightener is 0.030


Explaining: 100%|██████████| 100/100 [01:45<00:00,  1.06s/it]

Iteration 4 took 113.037 seconds
Differences for Cheek raiser is 0.037
Differences for Lip corner puller is 0.034
Differences for Brow lowerer is 0.029
Differences for Lid tightener is 0.031
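
The per-iteration differences printed above can be condensed into a mean and a standard deviation per action unit. The sketch below simply copies the printed values into a dictionary (`mae_per_run` is a name introduced here for illustration) and uses only the standard library.

```python
from statistics import mean, stdev

# MAE values copied from the five iterations printed above
mae_per_run = {
    "Cheek raiser": [0.039, 0.047, 0.036, 0.038, 0.037],
    "Lip corner puller": [0.035, 0.042, 0.033, 0.034, 0.034],
    "Brow lowerer": [0.034, 0.040, 0.026, 0.030, 0.029],
    "Lid tightener": [0.033, 0.040, 0.028, 0.030, 0.031],
}

# Mean and sample standard deviation of the MAE across the repeated runs
summary = {name: (mean(values), stdev(values)) for name, values in mae_per_run.items()}
for name, (avg, spread) in summary.items():
    print(f"{name}: mean MAE {avg:.4f}, std {spread:.4f}")
```

Whether differences of this size matter depends on the value range of the relevance maps, which is not printed in this notebook; comparing the MAE to that range would be a natural next check.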



