## DT8060
## Raffaello Baluyot

# Lab 5 - Neural Networks

In this lab we will use a pre-trained neural network designed for image classification. We will show you a number of different visualization options and let you experiment with them yourself to gain a better understanding. We will also show how LIME can be used on images.

You are then tasked with using these techniques on another model to explain how it works and to investigate whether there is anything wrong with its predictions. Remember to also look into the dataset!

### Package import

In [None]:
import os

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as c_map
import PIL
from IPython.display import Image, display

# On TensorFlow >= 2.16 (Keras 3), uncomment the line below to fall back to the legacy
# Keras 2 API (requires the tf-keras package). It must run before TensorFlow is imported.
#os.environ["TF_USE_LEGACY_KERAS"] = "1"
import tensorflow as tf
from tensorflow import keras

from tensorflow.keras.applications.vgg16 import VGG16 as Model
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing.image import load_img

from tensorflow.keras import backend as K
from tf_keras_vis.saliency import Saliency

### Initializations and loading data

The easiest approach is to download the data folder (lab5_data) from the lab assignment page on Blackboard. The cell below points `basepath` to that folder. If you work in Google Colab, you can instead re-upload the folder to your Google Drive, mount the drive and point `basepath` at the uploaded folder. If you store the data anywhere else, remember to update the file paths.

In [None]:
basepath = "../data/lab5_data"

## Neural Network Image Activation visualizations

In [None]:
model = Model(weights='imagenet', include_top=True)
model.summary()

# Image titles
image_titles = ['Goldfish', 'Bear', 'Triceratops']

# Load the images at VGG16's input size (224x224) and convert them to a NumPy array
img1 = load_img(os.path.join(basepath, 'goldfish.jpg'), target_size=(224, 224))
img2 = load_img(os.path.join(basepath, 'bear.jpg'), target_size=(224, 224))
img3 = load_img(os.path.join(basepath, 'triceratops.jpg'), target_size=(224, 224))
images = np.asarray([np.array(img1), np.array(img2), np.array(img3)])

# Preparing input data for VGG16
X = preprocess_input(images)

# Rendering
f, ax = plt.subplots(nrows=1, ncols=3, figsize=(12, 4))
for i, title in enumerate(image_titles):
    ax[i].set_title(title, fontsize=16)
    ax[i].imshow(images[i])
    ax[i].axis('off')
plt.tight_layout()
plt.show()
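`preprocess_input` explains why the explainers below receive `X` and not `images`. For VGG16, Keras uses what it calls "caffe"-style preprocessing:

* the channels are flipped from RGB to BGR,
* the ImageNet per-channel mean is subtracted,
* no scaling is applied.

The helper below is a hypothetical NumPy re-implementation for illustration only. In the notebook itself, keep calling `preprocess_input`.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras' "caffe"-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg16_preprocess_sketch(images_rgb):
    """Hypothetical re-implementation of VGG16 preprocessing for NHWC RGB input."""
    x = np.asarray(images_rgb, dtype=np.float32)  # float copy; the caller's array is untouched
    x = x[..., ::-1]                              # RGB -> BGR
    return x - IMAGENET_BGR_MEAN                  # zero-centre each channel, no scaling


# A 2x2 pure-red image: after preprocessing, blue and green sit below their means
# and red becomes 255 minus the red mean.
red = np.zeros((1, 2, 2, 3), dtype=np.uint8)
red[..., 0] = 255
out = vgg16_preprocess_sketch(red)
print(out[0, 0, 0])
```

A quick way to check the sketch in the notebook is `np.allclose(vgg16_preprocess_sketch(images), X)`. We expect this to hold, but we have not verified it for every Keras version.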

Boilerplate code for the tf-keras-vis package.

In [None]:
from tf_keras_vis.utils.model_modifiers import ReplaceToLinear
from tf_keras_vis.utils.scores import CategoricalScore

def score_function(output):
    # This version returns the result for the corresponding class from the
    # prediction of each image we are to show.
    # output[x][y] -> x = which image, y = which class in the imagenet labels
    return (output[0][1], output[1][294], output[2][51])

# Alters the softmax output to a linear output
replace2linear = ReplaceToLinear()

# The actual labels to each category that we investigate, i.e. the goldfish,
# bear, and triceratops label indices.
score = CategoricalScore([1, 294, 51])
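To make the role of `CategoricalScore` concrete, here is a minimal NumPy sketch of the selection it performs. It assumes the model output is a (batch, classes) array and picks one entry per image, just like `score_function` above. `categorical_score_sketch` is a hypothetical helper, not tf-keras-vis code.

The sketch also shows why the softmax is replaced by a linear output:

* a softmax probability depends on every logit in its row, so its gradient mixes in the other classes;
* that gradient can also vanish when the probability saturates;
* the raw logit has neither problem.

```python
import numpy as np


def categorical_score_sketch(output, class_indices):
    """Pick output[i, class_indices[i]] for every image i in the batch (hypothetical helper)."""
    output = np.asarray(output)
    return output[np.arange(len(class_indices)), class_indices]


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)


logits = np.array([[2.0, 0.5, 0.1],
                   [0.3, 4.0, 1.0]])
raw = categorical_score_sketch(logits, [0, 1])             # the logits themselves
prob = categorical_score_sketch(softmax(logits), [0, 1])   # depend on every logit in the row

# Raising an *other* class's logit leaves the raw score unchanged but lowers the probability.
bumped = logits.copy()
bumped[0, 2] += 1.0
print(raw, categorical_score_sketch(bumped, [0, 1]))
print(prob, categorical_score_sketch(softmax(bumped), [0, 1]))
```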

Saliency map

In [None]:
from tensorflow.keras import backend as K
from tf_keras_vis.saliency import Saliency

# Create Saliency object.
saliency = Saliency(model,
                    model_modifier=replace2linear,
                    clone=True)

# Generate saliency map
saliency_map = saliency(score, X)

# Render
f, ax = plt.subplots(nrows=1, ncols=3, figsize=(12, 4))
for i, title in enumerate(image_titles):
    ax[i].set_title(title, fontsize=16)
    ax[i].imshow(saliency_map[i], cmap='jet')
    ax[i].axis('off')
plt.tight_layout()
plt.show()
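A vanilla saliency map is the magnitude of the gradient of the class score with respect to each input pixel. The sketch below differs from tf-keras-vis in one way: it approximates that gradient with central finite differences, so it runs on a toy NumPy "model" without TensorFlow. The library uses automatic differentiation instead.

The post-processing steps are our assumption of roughly what the library does by default:

* take the absolute value,
* collapse the colour channels with a maximum,
* scale the result to [0, 1].

```python
import numpy as np


def numerical_gradient(f, x, eps=1e-4):
    """Central finite-difference gradient of a scalar function f at x (any shape)."""
    x = np.asarray(x, dtype=np.float64)
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


def saliency_sketch(f, image):
    """|d score / d pixel|, maximised over colour channels and scaled to [0, 1]."""
    sal = np.abs(numerical_gradient(f, image)).max(axis=-1)  # (H, W, C) -> (H, W)
    span = sal.max() - sal.min()
    return (sal - sal.min()) / span if span > 0 else np.zeros_like(sal)


# Toy "model": the score depends only on the red channel of the top-left pixel.
weights = np.zeros((4, 4, 3))
weights[0, 0, 0] = 5.0


def toy_score(x):
    return float(np.sum(weights * x))


image = np.random.default_rng(0).random((4, 4, 3))
sal = saliency_sketch(toy_score, image)
print(np.round(sal, 3))  # 1.0 at the top-left pixel, 0.0 everywhere else
```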

Smooth saliency map (SmoothGrad).

In [None]:
# Generate a saliency map with SmoothGrad, which reduces noise by averaging the gradients over noisy copies of the input
saliency = Saliency(model,
                    model_modifier=replace2linear,
                    clone=True)

saliency_map = saliency(score,
                        X,
                        smooth_samples=20, # Number of noisy samples whose gradients are averaged.
                        smooth_noise=0.20) # noise spread level.

# Render
f, ax = plt.subplots(nrows=1, ncols=3, figsize=(12, 4))
for i, title in enumerate(image_titles):
    ax[i].set_title(title, fontsize=14)
    ax[i].imshow(saliency_map[i], cmap='jet')
    ax[i].axis('off')
plt.tight_layout()
plt.show()
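SmoothGrad averages the gradients of several noisy copies of the input. High-frequency, pixel-level fluctuations cancel out, while the consistent part of the gradient survives.

The sketch below rests on a few assumptions:

* An analytic gradient function (`grad_fn`, hypothetical) stands in for backpropagation.
* Following the SmoothGrad paper, the noise standard deviation is `smooth_noise` times the input's value range.
* It averages raw gradients and takes absolute values afterwards. Other implementations may do it the other way round.

```python
import numpy as np


def smoothgrad_sketch(grad_fn, image, smooth_samples=20, smooth_noise=0.20, seed=0):
    """Average the gradient over noisy copies of the image, then take |.| and max over channels.

    The noise standard deviation is smooth_noise * (max - min) of the image.
    """
    rng = np.random.default_rng(seed)
    sigma = smooth_noise * (image.max() - image.min())
    total = np.zeros_like(image, dtype=np.float64)
    for _ in range(smooth_samples):
        total += grad_fn(image + rng.normal(0.0, sigma, size=image.shape))
    return np.abs(total / smooth_samples).max(axis=-1)


# Toy score f(x) = sum(x + 0.1 * sin(50 x)): its gradient is a constant 1 plus a
# rapidly oscillating term, i.e. a clean signal buried in pixel-level "noise".
def grad_fn(x):
    return 1.0 + 5.0 * np.cos(50.0 * x)


image = np.random.default_rng(1).random((16, 16, 3))
plain_map = np.abs(grad_fn(image)).max(axis=-1)              # one gradient, no smoothing
smooth_map = smoothgrad_sketch(grad_fn, image, smooth_samples=200)
print(plain_map.std(), smooth_map.std())                    # the smoothed map is much flatter
```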

Grad-CAM

In [None]:
from matplotlib import cm
from tf_keras_vis.gradcam import Gradcam

# Create Gradcam object
gradcam = Gradcam(model,
                  model_modifier=replace2linear,
                  clone=True)

# Generate heatmap with GradCAM
cam = gradcam(score,
              X,
              penultimate_layer=-1)

# Render
f, ax = plt.subplots(nrows=1, ncols=3, figsize=(12, 4))
for i, title in enumerate(image_titles):
    heatmap = np.uint8(cm.jet(cam[i])[..., :3] * 255)
    ax[i].set_title(title, fontsize=16)
    ax[i].imshow(images[i])
    ax[i].imshow(heatmap, cmap='jet', alpha=0.5) # overlay
    ax[i].axis('off')
plt.tight_layout()
plt.show()
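Grad-CAM builds its heatmap from the feature maps A_k of the chosen convolutional layer in three steps:

1. Weight each map by the spatial average of the score's gradient with respect to that map.
2. Sum the weighted maps.
3. Keep only the positive part.

The NumPy sketch below assumes the feature maps and their gradients are already available as arrays. It omits the upsampling to input resolution that tf-keras-vis performs before the heatmap is overlaid.

```python
import numpy as np


def gradcam_sketch(activations, grads):
    """Grad-CAM for one image.

    activations, grads: (H, W, K) arrays holding the feature maps A_k of the chosen
    convolutional layer and the gradients d score / d A_k.
    """
    weights = grads.mean(axis=(0, 1))                            # alpha_k: spatially averaged gradient
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)  # ReLU(sum_k alpha_k * A_k)
    return cam / cam.max() if cam.max() > 0 else cam            # scale to [0, 1] for display


# Toy example: channel 0 fires in the top-left corner and raises the score,
# channel 1 fires in the bottom-right corner and lowers it.
A = np.zeros((4, 4, 2))
A[:2, :2, 0] = 1.0
A[2:, 2:, 1] = 1.0
G = np.zeros((4, 4, 2))
G[..., 0] = 0.5
G[..., 1] = -0.5
toy_cam = gradcam_sketch(A, G)
print(toy_cam)  # 1.0 in the top-left 2x2 block, 0.0 elsewhere
```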

Grad-CAM++

In [None]:
from tf_keras_vis.gradcam_plus_plus import GradcamPlusPlus

# Create GradCAM++ object
gradcam = GradcamPlusPlus(model,
                          model_modifier=replace2linear,
                          clone=True)

# Generate heatmap with GradCAM++
cam = gradcam(score,
              X,
              penultimate_layer=-1)

# Render
f, ax = plt.subplots(nrows=1, ncols=3, figsize=(12, 4))
for i, title in enumerate(image_titles):
    heatmap = np.uint8(cm.jet(cam[i])[..., :3] * 255)
    ax[i].set_title(title, fontsize=16)
    ax[i].imshow(images[i])
    ax[i].imshow(heatmap, cmap='jet', alpha=0.5)
    ax[i].axis('off')
plt.tight_layout()
plt.show()
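Grad-CAM++ differs from Grad-CAM only in how the channel weights are computed. Instead of a plain average of the gradients, each spatial location gets its own coefficient built from higher powers of the gradient.

The sketch below uses the closed form from the Grad-CAM++ paper, which assumes the score is passed through an exponential. Precomputed NumPy arrays stand in for the network, so this is an illustration rather than the library's implementation.

```python
import numpy as np


def gradcam_plus_plus_sketch(activations, grads):
    """Grad-CAM++ for one image, using the closed-form coefficients from the paper.

    alpha_k(i, j) = g^2 / (2 g^2 + sum_ab A_k(a, b) * g^3),   g = d score / d A_k(i, j)
    w_k = sum_ij alpha_k(i, j) * ReLU(g)
    """
    g2, g3 = grads ** 2, grads ** 3
    sum_a = activations.sum(axis=(0, 1), keepdims=True)      # sum_ab A_k(a, b), one per channel
    denom = 2.0 * g2 + sum_a * g3
    alpha = np.divide(g2, denom, out=np.zeros_like(g2), where=denom != 0)  # 0 where undefined
    weights = (alpha * np.maximum(grads, 0.0)).sum(axis=(0, 1))
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    return cam / cam.max() if cam.max() > 0 else cam


# Toy example: channel 0 fires in the top-left corner and raises the score,
# channel 1 fires in the bottom-right corner and lowers it.
A = np.zeros((4, 4, 2))
A[:2, :2, 0] = 1.0
A[2:, 2:, 1] = 1.0
G = np.zeros((4, 4, 2))
G[..., 0] = 0.5
G[..., 1] = -0.5
toy_cam_pp = gradcam_plus_plus_sketch(A, G)
print(toy_cam_pp)  # 1.0 in the top-left 2x2 block, 0.0 elsewhere
```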

(Faster) ScoreCAM

In [None]:
# Fast ScoreCAM
from tf_keras_vis.scorecam import Scorecam

# Create ScoreCAM object
scorecam = Scorecam(model, model_modifier=replace2linear)

# Generate heatmap with Faster-ScoreCAM
cam = scorecam(score,
               X,
               penultimate_layer=-1,
               max_N=10)

# Render
f, ax = plt.subplots(nrows=1, ncols=3, figsize=(12, 4))
for i, title in enumerate(image_titles):
    heatmap = np.uint8(cm.jet(cam[i])[..., :3] * 255)
    ax[i].set_title(title, fontsize=16)
    ax[i].imshow(images[i])
    ax[i].imshow(heatmap, cmap='jet', alpha=0.5)
    ax[i].axis('off')
plt.tight_layout()
plt.show()
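Score-CAM avoids gradients altogether. Each feature map is:

1. upsampled to the input size,
2. normalised to [0, 1],
3. used as a mask on the input,
4. weighted by the model's score on the masked image.

The "faster" variant, controlled by `max_N`, only evaluates the channels with the largest variance.

The sketch below is a simplified illustration under stated assumptions:

* The feature maps are already at input resolution.
* `score_fn` is a hypothetical stand-in for the model.
* The baseline subtraction and softmax normalisation that the paper and library may apply are left out.

```python
import numpy as np


def scorecam_sketch(score_fn, image, activations, max_n=None):
    """Simplified Score-CAM for one image.

    score_fn: hypothetical stand-in for the model, mapping an (H, W, C) image to a scalar score.
    activations: (H, W, K) feature maps, assumed to be upsampled to the image size already.
    max_n: if set, only the max_n channels with the largest variance are used (Faster Score-CAM).
    """
    n_channels = activations.shape[-1]
    channels = np.arange(n_channels)
    if max_n is not None:
        variances = activations.reshape(-1, n_channels).var(axis=0)
        channels = np.argsort(variances)[::-1][:max_n]
    cam = np.zeros(activations.shape[:2])
    for ch in channels:
        a = activations[..., ch]
        span = a.max() - a.min()
        if span == 0:
            continue                                   # a flat map cannot act as a mask
        mask = (a - a.min()) / span                    # normalise the map to [0, 1]
        cam += score_fn(image * mask[..., None]) * a   # weight = score of the masked image
    cam = np.maximum(cam, 0.0)
    return cam / cam.max() if cam.max() > 0 else cam


# Toy setup: the "model" only looks at the top-left quarter of a 4x4 image.
def score_fn(img):
    return float(img[:2, :2].sum())


image = np.ones((4, 4, 1))
acts = np.zeros((4, 4, 3))
acts[:2, :2, 0] = 1.0      # channel 0 covers the region the model uses
acts[3, 3, 1] = 1.0        # channel 1 covers an irrelevant corner pixel
acts[..., 2] = 0.5         # channel 2 is flat and is skipped
full = scorecam_sketch(score_fn, image, acts)
fast = scorecam_sketch(score_fn, image, acts, max_n=1)
print(full)
```

In this toy example, `max_n=1` keeps only channel 0, which has the largest variance. It therefore gives the same map while evaluating the model once instead of twice. That saving is the motivation for the faster variant.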

These visualization techniques try to highlight which pixels contribute most to the model's prediction for an image.

What's your view of the techniques? Are they useful?

Each visualization gives an intuitive picture of which parts of the image the model relies on most. This helps to check whether the model takes the whole object into account. For example, most techniques highlight the whole goldfish. For the bear and the triceratops, only certain parts are highlighted, and which parts depends on the technique.

## Neural Network Layer Dissections/visualizations
We change the scoring from before to work in this context. The following score returns the activation of filter index 3 (counting from 0) in the **block5_conv3** layer.

In [None]:
from tf_keras_vis.utils.model_modifiers import ExtractIntermediateLayer, ReplaceToLinear

layer_name = 'block5_conv3' # The target layer: the last convolutional layer of VGG16.

# This instance constructs a new model whose output is replaced by the `block5_conv3` layer's output.
extract_intermediate_layer = ExtractIntermediateLayer(index_or_name=layer_name)
# This instance changes the model's last activation function to a linear one.
replace2linear = ReplaceToLinear()

filter_number = 3
score = CategoricalScore(filter_number)

Create an activation maximization instance

In [None]:
from tf_keras_vis.activation_maximization import ActivationMaximization

activation_maximization = ActivationMaximization(model,
                                                 # Please note that `extract_intermediate_layer` has to come before `replace2linear`.
                                                 model_modifier=[extract_intermediate_layer, replace2linear],
                                                 clone=False)

Visualize filter 3 of the block5_conv3 layer. This takes a while to run, so if you're in a hurry you can skip it for now.

In [None]:
from tf_keras_vis.activation_maximization.callbacks import Progress

# Generate maximized activation
activations = activation_maximization(score,
                                      callbacks=[Progress()])

In [None]:
# Render the filter from above
f, ax = plt.subplots(figsize=(4, 4))
ax.imshow(tf.cast(activations[0], np.uint8))
ax.set_title('filter[{:03d}]'.format(filter_number), fontsize=16)
ax.axis('off')
plt.tight_layout()
plt.show()

These techniques instead try to visualize the features each layer detects in an image. How would you describe the above? Is it useful for understanding this filter? Can you understand it?

The result is very hard to interpret. There is no single pattern recognizable to the human eye; it is closer to abstract art.
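For intuition about what `ActivationMaximization` does, here is a toy sketch. It runs gradient ascent on the input image to increase a filter's response, with an L2 penalty that keeps the pixel values bounded.

The "filter" is a hypothetical linear detector of a horizontal stripe pattern. Its optimum is therefore known in closed form (`pattern / (2 * l2)`), and the recovered image is readable. Filters deep in VGG16, such as those in block5_conv3, respond to combinations of many lower-level features, which is one plausible reason their maximising images look abstract. Any regularisers and input jitter the library applies are not reproduced here.

```python
import numpy as np


def activation_maximization_sketch(grad_fn, shape, steps=200, lr=0.1, l2=0.5, seed=0):
    """Gradient ascent on the input to increase an activation, with an L2 penalty.

    grad_fn(x) returns d activation / d x (a hypothetical stand-in for autodiff).
    The objective is activation(x) - l2 * ||x||^2, so its gradient is grad_fn(x) - 2 * l2 * x.
    """
    x = np.random.default_rng(seed).normal(0.0, 0.1, size=shape)  # start from small noise
    for _ in range(steps):
        x += lr * (grad_fn(x) - 2.0 * l2 * x)
    return x


# Toy "filter": a linear detector whose response is sum(pattern * x) for a stripe pattern,
# so its gradient with respect to x is simply the pattern itself.
pattern = np.zeros((6, 6))
pattern[::2, :] = 1.0
x_star = activation_maximization_sketch(lambda x: pattern, pattern.shape)
print(np.round(x_star, 2))  # the stripes reappear, at pattern / (2 * l2) = pattern
```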

## LIME
The LIME library also works on images. It shows which regions the model focuses on in the images it predicts on.


In [None]:
import lime
from lime import lime_image

# creating the explainer object
explainer = lime_image.LimeImageExplainer()

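# LIME passes raw RGB images (0-255) to the classifier function, so the VGG16
# preprocessing has to be applied inside it, as it was for X above.
def predict_fn(batch):
    return model.predict(preprocess_input(np.array(batch, dtype=np.float32)), verbose=0)

# You can view each individual image we have looked at before by
# changing the index in the images[] variable below.
explanation = explainer.explain_instance(images[2], predict_fn,
                                         top_labels=5,
                                         hide_color=0, num_samples=1000)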



It is important to use a large number of samples; otherwise the results can be misleading. Try lowering num_samples substantially above and see what happens.
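The core of LIME for images can be sketched in a few steps:

1. Switch random subsets of superpixels off.
2. Record the model's score for each perturbed image.
3. Fit a linear model, weighting each perturbation by how close it is to the original.

The sketch below is a simplified illustration, not the `lime` package's code. It assumes:

* a precomputed segmentation;
* a prediction function (`toy_predict`, hypothetical) that returns one score per image;
* LIME's default cosine distance and kernel width of 0.25.

It also uses plain weighted least squares instead of the ridge regression the package uses.

Because the weights come from a regression over `num_samples` random perturbations, too few samples leave the fit poorly determined. This is why small values of `num_samples` can give misleading masks.

```python
import numpy as np


def lime_sketch(predict_fn, image, segments, num_samples=1000, kernel_width=0.25,
                hide_color=0, seed=0):
    """Simplified LIME for images: one weight per superpixel for a scalar prediction.

    segments: (H, W) integer array assigning each pixel to a superpixel 0..n-1.
    predict_fn: hypothetical stand-in for the model, mapping (N, H, W, C) to N scores.
    """
    rng = np.random.default_rng(seed)
    n = segments.max() + 1
    z = rng.integers(0, 2, size=(num_samples, n))       # 1 = superpixel kept, 0 = hidden
    z[0] = 1                                             # first sample is the original image
    batch = np.repeat(image[None].astype(np.float64), num_samples, axis=0)
    for i in range(num_samples):
        hidden = np.isin(segments, np.flatnonzero(z[i] == 0))
        batch[i][hidden] = hide_color                    # hide the switched-off superpixels
    y = predict_fn(batch)

    # Cosine distance to the all-on vector, turned into a sample weight by LIME's kernel.
    dist = 1.0 - np.sqrt(z.sum(axis=1) / n)              # cos(z, 1) = sqrt(#on / n) for binary z
    sample_weight = np.sqrt(np.exp(-dist ** 2 / kernel_width ** 2))

    # Weighted least squares with an intercept: scale each row by sqrt(sample_weight).
    X = np.hstack([z, np.ones((num_samples, 1))])
    sw = np.sqrt(sample_weight)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[:-1]                                     # drop the intercept


# Toy setup: a 4x4 image split into four 2x2 superpixels (0 1 / 2 3); the "model"
# only looks at the bottom-left superpixel.
rows, cols = np.indices((4, 4))
segments = (rows // 2) * 2 + cols // 2
image = np.ones((4, 4, 3))


def toy_predict(batch):
    return batch[:, 2:, :2, 0].mean(axis=(1, 2))


w = lime_sketch(toy_predict, image, segments)
print(np.round(w, 3))  # superpixel 2 gets weight 1, the others 0
```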

You can visualize the explanation with the image as the background

In [None]:
from skimage.segmentation import mark_boundaries
temp, mask = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=False)
plt.imshow(mark_boundaries(temp, mask))

Or show only the highlighted superpixels, with the rest of the image hidden.

In [None]:
temp, mask = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=True)
plt.imshow(mark_boundaries(temp, mask))

The result does not seem to match the image. The mask is showing the background rather than the foreground.

# Assignment
In this assignment, you will use the given black-box model on a dataset and explore whether you can identify what the model has actually learned. The dataset is based on the Dogs vs. Wolves dataset from [Kaggle](https://www.kaggle.com/datasets/harishvutukuri/dogs-vs-wolves?resource=download). It has been significantly reduced to contain only images of huskies and wolves, and some additional images have been added manually.

Investigate the model using **some** of the techniques shown from **tf-keras-vis** and the **LIME** library, and try to explain what the model has actually learned. Using tf-keras-vis may require you to improve your TensorFlow/Keras skills; see the tf-keras-vis [website](https://keisen.github.io/tf-keras-vis-docs/index.html) for guidance and examples.

Look into which images in the validation set are correctly classified and which aren't. Visualize which parts of the images are important for the model.

What conclusions can you draw?

What does the model really look for in the images?

Show your steps, explain them, and reflect upon your results in cells below.

You will find an example of how you can use and visualize the Saliency Maps on this dataset and model. Adapt accordingly for the remainder of the techniques.


# Assignment

## Black box training
Training the model to be analyzed by the students.

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPool2D, Input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

import os
import sys
import matplotlib.pyplot as plt
import numpy as np

training_path = os.path.join(basepath, 'husky_wolves/training/')
validation_path = os.path.join(basepath, 'husky_wolves/validation/')

image_gen = ImageDataGenerator(rescale=1./255)
train_generator = image_gen.flow_from_directory(directory=training_path,
                                                target_size=(256,256),
                                                batch_size=20,
                                                shuffle=True,
                                                seed=101)
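`flow_from_directory` infers the classes from the sub-folder names and assigns label indices in alphanumeric order. With `husky/` and `wolf/` folders, husky therefore becomes 0 and wolf becomes 1. The `class_names` and `labels` defined further down rely on this order.

The stdlib-only sketch below mimics that mapping on a temporary directory; `infer_class_indices` is a hypothetical helper. In the notebook you can check the real mapping with `train_generator.class_indices`.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Hypothetical helper: map sub-folder names to label indices in sorted order."""
    classes = sorted(d for d in os.listdir(directory)
                     if os.path.isdir(os.path.join(directory, d)))
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    for folder in ['wolf', 'husky']:          # creation order does not matter
        os.makedirs(os.path.join(root, folder))
    indices = infer_class_indices(root)
print(indices)  # {'husky': 0, 'wolf': 1}
```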

In [None]:
def get_model(input_shape):
    """
    Builds and compiles a small CNN using the functional API
    and returns the model.
    """
    input_layer = Input(input_shape)
    h = Conv2D(filters=8, kernel_size=(8,8), padding='SAME', activation='relu')(input_layer)
    h = MaxPool2D((2,2))(h)
    h = Conv2D(4, (4,4), padding='SAME', activation='relu')(h)
    h = MaxPool2D((2,2))(h)
    Flatten_layer = Flatten()(h)
    h = Dense(16, activation='relu')(Flatten_layer)
    output_layer = Dense(2, activation='softmax')(h)
    model = Model(inputs= input_layer, outputs = output_layer)
    opt = tf.keras.optimizers.Adam(learning_rate=0.0005)
    model.compile(optimizer=opt, loss = 'categorical_crossentropy', metrics = [tf.keras.metrics.CategoricalAccuracy(name='accuracy')])
    return model

model = get_model((256, 256, 3))
model.summary()
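The parameter counts printed by `model.summary()` can be checked by hand. With 'SAME' padding the convolutions keep the spatial size, and each 2x2 pooling halves it. The second pooling therefore leaves a 64x64x4 map, i.e. 16,384 values feeding the first Dense layer.

The arithmetic below (plain Python, no Keras) shows that this single layer holds over 99% of the model's parameters. It connects every pixel position directly to the classifier, and with a small training set such a layer can easily overfit.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    return kernel_h * kernel_w * in_channels * filters + filters  # weights + one bias per filter


def dense_params(in_features, units):
    return in_features * units + units  # weights + one bias per unit


conv1 = conv2d_params(8, 8, 3, 8)   # 256x256x3 -> 256x256x8 ('SAME' padding), pooled to 128x128x8
conv2 = conv2d_params(4, 4, 8, 4)   # 128x128x8 -> 128x128x4, pooled to 64x64x4
flat = 64 * 64 * 4                  # 16,384 values after Flatten
dense1 = dense_params(flat, 16)
dense2 = dense_params(16, 2)
total = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, dense1, dense2, total)  # 1544 516 262160 34 264254
print('first Dense layer: {:.1%} of all parameters'.format(dense1 / total))
```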

In [None]:
model.fit(train_generator, epochs=10)

## Loading validation data, for you to use.

In [None]:
# Load 8 husky and 8 wolf validation images at the training resolution
husky_files = [os.path.join(validation_path, 'husky', 'husky_validation_{0}.jpg'.format(x)) for x in range(1, 9)]
wolf_files = [os.path.join(validation_path, 'wolf', 'wolf_validation_{0}.jpg'.format(x)) for x in range(1, 9)]
images = np.asarray([np.array(load_img(f, target_size=(256, 256))) for f in husky_files + wolf_files])
labels = np.array([0] * 8 + [1] * 8)  # 0 = husky, 1 = wolf (alphabetical folder order)
class_names = ['husky', 'wolf']
images = images / 255  # same rescaling as the training ImageDataGenerator

The model performs poorly: its validation accuracy (computed in the cell below) is no better than random chance. We should therefore expect the explanations that follow to look largely random.
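We can check how far from chance an accuracy on 16 images has to be with the binomial distribution. If the model guessed at random, the number of correct predictions X would follow Binomial(16, 0.5).

The plain-Python sketch below computes P(X >= k). A coin-flipping classifier gets 8 or more right about 60% of the time, and 12 or more only about 4% of the time.

```python
from math import comb


def p_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more correct guesses out of n."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_val = 16  # number of validation images
for k in (8, 10, 12):
    print('P(at least {0}/{1} correct by chance) = {2:.3f}'.format(k, n_val, p_at_least(k, n_val)))
```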

In [None]:
predictions = model.predict(images)
# Fraction of images whose most probable class matches the true label
accuracy = np.mean(np.argmax(predictions, axis=1) == labels)
print('validation accuracy %.3f' % accuracy)
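The assignment asks which validation images are misclassified, and a confusion matrix plus the list of wrong indices answers that directly. The sketch below uses made-up toy probabilities to show the output format; `confusion_and_errors` is a hypothetical helper. In the notebook, `confusion_and_errors(predictions, labels)` would return indices that match the `validation[i]` numbers shown in the plots below.

```python
import numpy as np


def confusion_and_errors(predictions, labels, n_classes=2):
    """Return the confusion matrix (rows = true class, cols = predicted) and misclassified indices."""
    predicted = np.argmax(predictions, axis=1)
    counts = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(counts, (labels, predicted), 1)   # count each (true, predicted) pair
    wrong = np.flatnonzero(predicted != labels)
    return counts, wrong


# Toy example with made-up probabilities (not the model's actual output).
toy_pred = np.array([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.8, 0.2]])
toy_labels = np.array([0, 0, 1, 1])
toy_counts, toy_wrong = confusion_and_errors(toy_pred, toy_labels)
print(toy_counts)  # [[1 1]
                   #  [1 1]]
print(toy_wrong)   # [1 3]
```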

## Visualizing validation data

In [None]:
j = 1
k = 10
plt.figure(figsize=(18,6))
for i in range(16):
  if labels[i] == 0:
    ax = plt.subplot(2,9,k)
    k += 1
  else:
    ax = plt.subplot(2,9,j)
    j+=1

  plt.imshow((images[i]*255).astype('uint8'))
  plt.title(class_names[labels[i]], color='green')
  plt.text(128,300, class_names[np.argmax(predictions[i])], color='red', fontsize='large', horizontalalignment='center')
  plt.text(128,350, 'validation[{0}]'.format(i), horizontalalignment='center')
  plt.axis('off')


## Visualizing saliency maps

In [None]:
# Generate a plain saliency map (no smoothing) for each image's true class

score = CategoricalScore([0]*8+[1]*8)

saliency = Saliency(model,
                    model_modifier=replace2linear,
                    clone=True)

saliency_map = saliency(score,
                        images)

# Render
j = 1
k = 10
plt.figure(figsize=(18,6))
for i in range(16):
  if labels[i] == 0:
    ax = plt.subplot(2,9,k)
    k += 1
  else:
    ax = plt.subplot(2,9,j)
    j+=1
  plt.imshow(saliency_map[i], cmap='jet')
  plt.axis('off')
plt.tight_layout()
plt.show()

In [None]:
# Generate a saliency map with SmoothGrad, which reduces noise by averaging the gradients over noisy copies of the input

score = CategoricalScore([0]*8+[1]*8)

saliency = Saliency(model,
                    model_modifier=replace2linear,
                    clone=True)

saliency_map = saliency(score,
                        images,
                        smooth_samples=20, # The number of calculating gradients iterations.
                        smooth_noise=0.20) # noise spread level.

# Render
j = 1
k = 10
plt.figure(figsize=(18,6))
for i in range(16):
  if labels[i] == 0:
    ax = plt.subplot(2,9,k)
    k += 1
  else:
    ax = plt.subplot(2,9,j)
    j+=1
  plt.imshow(saliency_map[i], cmap='jet')
  plt.axis('off')
plt.tight_layout()
plt.show()

As the saliency maps show, the network does not consistently focus on the animal, i.e. on what actually distinguishes the two classes. Its attention looks randomly distributed across each image, regardless of the label.

In [None]:
# Create Gradcam object
gradcam = Gradcam(model,
                  model_modifier=replace2linear,
                  clone=True)

# Generate heatmap with GradCAM
cam = gradcam(score,
              images,
              penultimate_layer=-1)

# Render
j = 1
k = 10
plt.figure(figsize=(18,6))
for i in range(16):
  if labels[i] == 0:
    ax = plt.subplot(2,9,k)
    k += 1
  else:
    ax = plt.subplot(2,9,j)
    j+=1
  heatmap = np.uint8(cm.jet(cam[i])[..., :3] * 255)
  plt.imshow(images[i])
  plt.imshow(heatmap, cmap='jet', alpha=0.5)
  plt.axis('off')
plt.tight_layout()
plt.show()

In [None]:
# Create GradCAM++ object
gradcam = GradcamPlusPlus(model,
                          model_modifier=replace2linear,
                          clone=True)

# Generate heatmap with GradCAM++
cam = gradcam(score,
              images,
              penultimate_layer=-1)

# Render
j = 1
k = 10
plt.figure(figsize=(18,6))
for i in range(16):
  if labels[i] == 0:
    ax = plt.subplot(2,9,k)
    k += 1
  else:
    ax = plt.subplot(2,9,j)
    j+=1
  heatmap = np.uint8(cm.jet(cam[i])[..., :3] * 255)
  plt.imshow(images[i])
  plt.imshow(heatmap, cmap='jet', alpha=0.5)
  plt.axis('off')
plt.tight_layout()
plt.show()

In [None]:
# Create ScoreCAM object
scorecam = Scorecam(model, model_modifier=replace2linear)

# Generate heatmap with Faster-ScoreCAM
cam = scorecam(score,
               images,
               penultimate_layer=-1,
               max_N=4)

# Render
j = 1
k = 10
plt.figure(figsize=(18,6))
for i in range(16):
  if labels[i] == 0:
    ax = plt.subplot(2,9,k)
    k += 1
  else:
    ax = plt.subplot(2,9,j)
    j+=1
  heatmap = np.uint8(cm.jet(cam[i])[..., :3] * 255)
  plt.imshow(images[i])
  plt.imshow(heatmap, cmap='jet', alpha=0.5)
  plt.axis('off')
plt.tight_layout()
plt.show()

The CAM results also vary a lot. Some images have no highlighted region at all, while in others the highlighted regions lie mostly in the background. The model does not appear to be good at differentiating the two classes.

In [None]:
# creating the explainer object
explainer = lime_image.LimeImageExplainer(verbose=False)

results = []
for i in range(len(images)):
    explanation = explainer.explain_instance(images[i], model.predict
                                            , top_labels=5,
                                            hide_color=0, num_samples=1000)

    temp, mask = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=False)
    results.append((temp, mask))

In [None]:
for temp, mask in results:
    plt.imshow(mark_boundaries(temp, mask))
    plt.show()

The LIME results are similar: some explanations focus on the background and some on the foreground. That said, given the LIME result on the pre-trained VGG16 demo, LIME seems to give unintuitive results regardless of the model, so it is hard to draw firm conclusions.