# Saliency Maps

In this notebook we will use the [Cats vs Dogs](https://www.tensorflow.org/datasets/catalog/cats_vs_dogs) dataset to generate saliency maps.

A saliency map highlights the pixels that significantly impact the classification of an image.
- This is done by computing the gradient of the loss with respect to each input pixel value and then plotting the magnitude of that gradient.
- By analyzing the saliency map, we can see if the model is focusing on the correct features when classifying an image.
  - For example, if we're building a dog breed classifier, the saliency map should show strong pixels on the dog itself rather than irrelevant features like the sky, grass, or dog house.
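To make the idea concrete, here is a minimal NumPy sketch of gradient-based saliency on a toy linear softmax classifier instead of a CNN. The model, the helper names (`softmax`, `loss`, `linear_saliency`) and the random inputs are hypothetical. For cross-entropy on softmax outputs, the gradient with respect to the input has the closed form `weights.T @ (probs - onehot)`, and the saliency map is its absolute value summed over the color channels, which mirrors what the `do_salience()` function later in this notebook computes with `tf.GradientTape`.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(image, weights, bias, label):
    # Cross-entropy of the toy linear classifier for the true label
    probs = softmax(weights @ image.reshape(-1) + bias)
    return -np.log(probs[label])

def linear_saliency(image, weights, bias, label):
    """Saliency map of a hypothetical linear softmax classifier.

    image: (H, W, C) pixel array; weights: (num_classes, H*W*C); bias: (num_classes,).
    Returns an (H, W) map of |dLoss/dPixel| summed over the channel axis.
    """
    probs = softmax(weights @ image.reshape(-1) + bias)
    onehot = np.eye(len(bias))[label]
    # For softmax + cross-entropy, dLoss/dLogits = probs - onehot; the chain rule
    # through the linear layer gives dLoss/dInput = weights^T (probs - onehot)
    grad = weights.T @ (probs - onehot)
    return np.abs(grad.reshape(image.shape)).sum(axis=-1)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))              # tiny hypothetical 4 x 4 RGB image
weights = rng.normal(size=(2, 4 * 4 * 3))  # 2 classes, as in the cats vs dogs model
bias = np.zeros(2)
saliency = linear_saliency(image, weights, bias, label=0)
print(saliency.shape)  # (4, 4)
```

A real CNN has no closed-form gradient like this, which is why the notebook lets `tf.GradientTape` differentiate the loss automatically.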


### Download test files and weights

Let's begin by downloading the files we will use in this lab.

In [None]:
# Download the same test files from the Cats vs Dogs ungraded lab
!wget -O cat1.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/cat1.jpeg
!wget -O cat2.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/cat2.jpeg
!wget -O catanddog.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/catanddog.jpeg
!wget -O dog1.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/dog1.jpeg
!wget -O dog2.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/dog2.jpeg

# Download prepared weights
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1kipXTxesGJKGY1B8uSPRvxROgOH90fih' -O 0_epochs.h5
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1oiV6tjy5k7h9OHGTQaf0Ohn3FmF-uOs1' -O 15_epochs.h5

--2024-06-12 00:52:23--  https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/cat1.jpeg
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.202.207, 74.125.69.207, 64.233.181.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.202.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 414826 (405K) [image/jpeg]
Saving to: ‘cat1.jpg’


2024-06-12 00:52:23 (92.4 MB/s) - ‘cat1.jpg’ saved [414826/414826]

--2024-06-12 00:52:23--  https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/cat2.jpeg
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.202.207, 74.125.69.207, 64.233.181.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.202.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 599639 (586K) [image/jpeg]
Saving to: ‘cat2.jpg’


2024-06-12 00:52:23 (88.3 MB/s) - ‘cat2.jpg’ saved [599639/599639]



### Import the required packages

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt
import cv2

import keras
from keras.models import Sequential, Model
from keras.layers import Dense, Conv2D, Flatten, MaxPooling2D, GlobalAveragePooling2D
from keras.utils import plot_model

### Download and prepare the dataset



#### Load Cats vs Dogs

* Load the `cats_vs_dogs` dataset using TensorFlow Datasets.
  * Use the first 80% of the *train* split of the dataset to create the training set.
  * Use the next 10% (from 80% to 90%) to create the validation set.
  * Use the last 10% to create the test set.
  * Set the `as_supervised` flag to create `(image, label)` pairs.
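The percentage slices can be checked with a little arithmetic. The sketch below uses hypothetical helpers (`percent_boundary`, `percent_slice`) that round each percentage boundary to the nearest example index; TFDS's own rounding rules may differ by an example. It uses the 23262 examples reported by the dataset download log and shows that the three slices are disjoint and together cover the whole *train* split.

```python
def percent_boundary(n, pct):
    # Hypothetical helper: map a percentage to an example index.
    # Negative percentages count from the end, as in 'train[-10%:]'.
    if pct < 0:
        pct += 100
    return round(n * pct / 100)

def percent_slice(n, start_pct=0, end_pct=100):
    """Indices selected by a slice like 'train[start%:end%]' on n examples (sketch)."""
    return range(percent_boundary(n, start_pct), percent_boundary(n, end_pct))

n = 23262  # examples in the cats_vs_dogs train split, per the download log
train = percent_slice(n, 0, 80)        # 'train[:80%]'
validation = percent_slice(n, 80, 90)  # 'train[80%:90%]'
test = percent_slice(n, -10, 100)      # 'train[-10%:]'

print(len(train), len(validation), len(test))  # 18610 2326 2326 under this sketch's rounding
```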
    


In [None]:
# Load the data and create the train, validation and test sets
train_data = tfds.load('cats_vs_dogs', split='train[:80%]', as_supervised=True)
validation_data = tfds.load('cats_vs_dogs', split='train[80%:90%]', as_supervised=True)
test_data = tfds.load('cats_vs_dogs', split='train[-10%:]', as_supervised=True)

Downloading and preparing dataset 786.67 MiB (download: 786.67 MiB, generated: 1.04 GiB, total: 1.81 GiB) to /root/tensorflow_datasets/cats_vs_dogs/4.0.1...


Dataset cats_vs_dogs downloaded and prepared to /root/tensorflow_datasets/cats_vs_dogs/4.0.1. Subsequent calls will reuse this data.


#### Create preprocessing function

Here's the function that preprocesses an image by casting it to float32, normalizing the pixel values to the range [0, 1], and resizing it to 300 x 300. Despite its name, `augmentimages()` does not perform any data augmentation; it only preprocesses.

In [None]:
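def augmentimages(image, label):
  # Cast to float32 and scale pixel values from [0, 255] to [0, 1]
  image = tf.cast(image, tf.float32)
  image = image/255
  # Resize to the 300 x 300 input size the model expects
  image = tf.image.resize(image, (300,300))
  return image, label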

#### Preprocess the training set

We preprocess the training set by applying the `augmentimages()` function to every example with the `map()` method.

In [None]:
augmented_training_data = train_data.map(augmentimages)

#### Create batches of the training set

We will create `train_batches` by shuffling the data and grouping it into batches of 32. Shuffling the training set is an essential step in training machine learning models, for the following reasons:

1. **Prevent Overfitting to Sequence**: If the training data is not shuffled, the order of the examples can leak into training. For example, if all the images of cats came before all the images of dogs, each mini-batch would contain a single class, and the model would be pushed toward predicting whichever class it saw most recently instead of learning the features that distinguish cats from dogs.

2. **Ensure Generalization**: Shuffling ensures that each mini-batch of training data is representative of the overall dataset. This helps the model generalize better to new, unseen data by learning from diverse samples in each training step.

3. **Reduce Bias**: If the data contains ordering patterns or anomalies (for example, images grouped by source), shuffling keeps them from systematically influencing consecutive updates.

4. **Stochastic Gradient Descent (SGD)**: When using SGD or its variants (e.g., mini-batch SGD), shuffling the data makes the updates to the model parameters more varied and less correlated, leading to better convergence properties.

5. **Escape Poor Local Minima**: The extra randomness in the gradient updates can help the optimizer move out of poor local minima and saddle points instead of settling into the first one it reaches.

Overall, shuffling the training data improves the robustness and performance of the machine learning model.

In [None]:
train_batches = augmented_training_data.shuffle(1024).batch(32)
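The `shuffle(1024)` call does not shuffle the whole dataset at once. The `tf.data` documentation describes `Dataset.shuffle` as filling a buffer of `buffer_size` elements and sampling randomly from it, refilling the chosen slot with the next element. The pure-Python sketch below (a hypothetical `buffered_shuffle` generator, not TensorFlow's implementation) imitates that behavior on a made-up dataset sorted by class, to show why a small buffer only mixes nearby examples.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Sketch of buffer-based shuffling in the style of tf.data's Dataset.shuffle."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything
            buffer.append(item)
            continue
        # Emit a uniformly random buffered element and put the new item in its slot
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    # Input exhausted: drain the remaining buffer in random order
    rng.shuffle(buffer)
    yield from buffer

labels = [0] * 500 + [1] * 500  # hypothetical dataset sorted by class: all cats, then all dogs
small = list(buffered_shuffle(labels, buffer_size=10, seed=0))
large = list(buffered_shuffle(labels, buffer_size=1000, seed=0))

# With a 10-element buffer, the first batch of 32 can only come from the first
# 41 examples, so it contains no dogs; a buffer as large as the dataset can mix both classes.
print(sum(small[:32]))  # 0
print(sum(large[:32]))  # number of dogs in the first batch
```

The buffer size trades memory for shuffle quality: a buffer at least as large as the dataset gives a full shuffle, while a smaller one such as 1024 relies on the data not being strongly ordered to begin with.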

### Build the Cats vs Dogs classifier

We will define a simple CNN model with three convolutional blocks (`Conv2D` -> `MaxPooling2D`), followed by a slightly different convolutional block (`Conv2D` -> `GlobalAveragePooling2D`) and then a dense layer. One key aspect to note is that the output dense layer has 2 neurons instead of 1, one for each class (cats and dogs). This is done by setting the `units` argument of the output `Dense` layer to 2. The training labels stay as integers (0 for cats, 1 for dogs) and are used with a sparse categorical loss; only the saliency function below one-hot encodes them.

Furthermore, we will use a softmax activation in the output dense layer so that it produces a probability for each of the 2 classes, with the probabilities summing to 1.

In [None]:
model = Sequential(name='SimpleCNN')
model.add(Conv2D(16,input_shape=(300,300,3),kernel_size=(3,3),activation='relu',padding='same', name='conv2d_1'))
model.add(MaxPooling2D(pool_size=(2,2), name='max_pooling2d_1'))

model.add(Conv2D(32,kernel_size=(3,3),activation='relu',padding='same', name='conv2d_2'))
model.add(MaxPooling2D(pool_size=(2,2), name='max_pooling2d_2'))

model.add(Conv2D(64,kernel_size=(3,3),activation='relu',padding='same', name='conv2d_3'))
model.add(MaxPooling2D(pool_size=(2,2), name='max_pooling2d_3'))

model.add(Conv2D(128,kernel_size=(3,3),activation='relu',padding='same', name='conv2d_4'))
model.add(GlobalAveragePooling2D(name='global_average_pooling2d'))

model.add(Dense(2,activation='softmax', name='predictions'))

model.summary()

Model: "SimpleCNN"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_1 (Conv2D)           (None, 300, 300, 16)      448       
                                                                 
 max_pooling2d_1 (MaxPoolin  (None, 150, 150, 16)      0         
 g2D)                                                            
                                                                 
 conv2d_2 (Conv2D)           (None, 150, 150, 32)      4640      
                                                                 
 max_pooling2d_2 (MaxPoolin  (None, 75, 75, 32)        0         
 g2D)                                                            
                                                                 
 conv2d_3 (Conv2D)           (None, 75, 75, 64)        18496     
                                                                 
 max_pooling2d_3 (MaxPoolin  (None, 37, 37, 64)        0 
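The summary output above is cut off after `max_pooling2d_3`. The parameter counts and output shapes it does show follow from simple arithmetic, sketched below with hypothetical helper functions: a `Conv2D` layer has `kernel_h * kernel_w * in_channels` weights plus one bias per filter, a `Dense` layer has one weight per input plus one bias per unit, and a 2 x 2 `MaxPooling2D` with the default `'valid'` padding floors the spatial size. The counts for `conv2d_4` and `predictions` are derived with the same formulas, not taken from captured output.

```python
def conv2d_params(kernel_size, in_channels, filters):
    # Each filter has kernel_size * kernel_size * in_channels weights plus one bias
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(in_units, units):
    # One weight per input per unit, plus one bias per unit
    return (in_units + 1) * units

def pooled(size, pool_size=2):
    # MaxPooling2D with stride == pool_size and 'valid' padding drops the leftover row/column
    return size // pool_size

layers = {
    'conv2d_1': conv2d_params(3, 3, 16),
    'conv2d_2': conv2d_params(3, 16, 32),
    'conv2d_3': conv2d_params(3, 32, 64),
    'conv2d_4': conv2d_params(3, 64, 128),
    'predictions': dense_params(128, 2),  # GlobalAveragePooling2D outputs 128 values
}
for name, count in layers.items():
    print(name, count)
print('total', sum(layers.values()))  # total 97698

# 'same' convolutions keep the spatial size; each 2 x 2 pooling halves it (rounding down)
size = 300
for _ in range(3):
    size = pooled(size)
print(size)  # 37
```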

### Saliency Map Function

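The `do_salience()` function computes the saliency map of a single image, displays it both in grayscale and superimposed on the original image, and saves the grayscale map (the **normalized_tensor** image) to a JPEG file.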

In [None]:
def do_salience(image, model, label, prefix):
    ''' Generates the saliency map of a given image.

    Args:
        image (str) -- path to the image file that the model will classify
        model (keras Model) -- the cats and dogs classifier
        label (int) -- ground truth label of the image
        prefix (string) -- prefix to add to the filename of the saliency map
    '''

    # Read the image file and decode it; TensorFlow decodes JPEGs directly in RGB channel order
    img = tf.io.read_file(image)
    img = tf.io.decode_jpeg(img, channels=3)

    # Resize the image to 300 x 300 and normalize pixel values to the range [0, 1]
    img, label = augmentimages(img, label)

    # Add an additional dimension (for the batch), and save this in a new variable
    inputs = tf.expand_dims(img, axis=0)

    # Declare the number of classes
    num_classes = 2

    # Define the expected output array by one-hot encoding the label. The length of the array is equal to the number of classes
    expected_output = tf.one_hot([label] * inputs.shape[0], num_classes)

    # Within the GradientTape block:
    # - watch the input image (a plain float32 tensor, not a Variable, so the tape must be told to track it)
    # - get the model's prediction by passing in the image
    # - compute the categorical cross-entropy between the expected output and the model's predictions
    with tf.GradientTape() as tape:
        tape.watch(inputs)
        predictions = model(inputs)
        loss = tf.keras.losses.categorical_crossentropy(expected_output,
                                                        predictions)

    # Get the gradients of the loss with respect to the model's input image
    gradients = tape.gradient(loss, inputs)

    # Generate the grayscale tensor by summing the absolute gradients over the color channels
    grayscale_tensor = tf.reduce_sum(tf.abs(gradients), axis=-1)

    # Normalize the pixel values to be in the range [0, 255]. The max value in the grayscale tensor will be pushed to 255 and the min value will be pushed to 0.
    # For this we will use the formula: 255 * (x - min) / (max - min) and use tf.reduce_max, tf.reduce_min
    # Finally, cast the tensor as a tf.uint8
    tensor_min = tf.reduce_min(grayscale_tensor)
    tensor_max = tf.reduce_max(grayscale_tensor)
    normalized_tensor = 255 * (grayscale_tensor - tensor_min) / (tensor_max - tensor_min)
    normalized_tensor = tf.cast(normalized_tensor, dtype=tf.uint8)

    # Remove dimensions that are size 1
    normalized_tensor = tf.squeeze(normalized_tensor)

    # Plot the normalized tensor with an 8 x 8 figure, hidden axes, and the 'gray' colormap
    plt.figure(figsize=(8, 8))
    plt.axis('off')
    plt.imshow(normalized_tensor, cmap='gray')
    plt.show()

    # Superimpose the saliency map on the original image to visualize the results better.
    # cv2.applyColorMap returns BGR, so convert it to RGB before blending with the RGB image.
    gradient_color = cv2.applyColorMap(normalized_tensor.numpy(), cv2.COLORMAP_HOT)
    gradient_color = cv2.cvtColor(gradient_color, cv2.COLOR_BGR2RGB)
    gradient_color = (gradient_color / 255.0).astype(np.float32)
    super_imposed = cv2.addWeighted(img.numpy(), 0.5, gradient_color, 0.5, 0.0)
    plt.figure(figsize=(8, 8))
    plt.axis('off')
    plt.imshow(super_imposed)
    plt.show()

    # Save the normalized tensor image to a file
    salient_image_name = prefix + image
    normalized_tensor = tf.expand_dims(normalized_tensor, -1)
    normalized_tensor = tf.io.encode_jpeg(normalized_tensor, quality=100, format='grayscale')
    writer = tf.io.write_file(salient_image_name, normalized_tensor)
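The post-processing steps inside `do_salience()` can be checked on a tiny array. The NumPy sketch below uses hypothetical helpers (`gradients_to_saliency`, `blend`): it collapses the channels with a sum of absolute values, min-max scales to [0, 255], and casts to `uint8`. It also adds an `eps` guard against a constant map, which the notebook's function does not have. `blend` spells out the weighted sum that `cv2.addWeighted` computes for float inputs (`alpha * src1 + beta * src2 + gamma`).

```python
import numpy as np

def gradients_to_saliency(gradients, eps=1e-12):
    """Post-process one (H, W, C) array of dLoss/dInput into a uint8 saliency map."""
    # Collapse the color channels: total absolute gradient per pixel
    grayscale = np.abs(gradients).sum(axis=-1)
    lo, hi = grayscale.min(), grayscale.max()
    # Min-max scale to [0, 255]; eps avoids dividing by zero when hi == lo
    scaled = 255 * (grayscale - lo) / max(hi - lo, eps)
    return scaled.astype(np.uint8)  # the cast truncates the fractional part

def blend(image, heatmap, alpha=0.5, beta=0.5, gamma=0.0):
    # Weighted sum of two float images, as cv2.addWeighted computes it
    return alpha * image + beta * heatmap + gamma

grads = np.zeros((2, 2, 3))
grads[0, 0] = [1.0, -2.0, 0.0]  # strongest pixel: |1| + |-2| = 3
grads[1, 1] = [0.5, 0.0, 0.0]   # 255 * 0.5 / 3 = 42.5, truncated to 42
print(gradients_to_saliency(grads).tolist())  # [[255, 0], [0, 42]]
```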

### Generate saliency maps with untrained model

As a sanity check, we will load initialized (i.e. untrained) weights and use the function we just implemented.
- This will check if we built the model correctly and are able to create a saliency map.

We will apply our `do_salience()` function on the following image files:

* `cat1.jpg`
* `cat2.jpg`
* `catanddog.jpg`
* `dog1.jpg`
* `dog2.jpg`

Cats will have the label `0` while dogs will have the label `1`.
- For `catanddog.jpg`, which contains both animals, we will use `0`.
- For the prefix of the saliency images that will be generated, we will use `epoch0_salient`.

In [None]:
# Load initial weights
model.load_weights('0_epochs.h5')

# Generate the saliency maps for the 5 test images
do_salience('cat1.jpg', model, 0, 'epoch0_salient')
do_salience('cat2.jpg', model, 0, 'epoch0_salient')
do_salience('catanddog.jpg', model, 0, 'epoch0_salient')
do_salience('dog1.jpg', model, 1, 'epoch0_salient')
do_salience('dog2.jpg', model, 1, 'epoch0_salient')


The untrained weights will generate output similar to the image below:
- We will see strong pixels outside the cat, which the untrained model relies on when classifying the image.
- After training, these strong pixels should gradually localize to features inside the pet.

<img src='https://drive.google.com/uc?export=view&id=1h5wP52lwbBUMVLlsgyb-tQl_I9eu42X7' alt='saliency'>


### Configure the model for training

We will compile the model using `SparseCategoricalCrossentropy` as the loss (the dataset labels are integers: 0 for cats, 1 for dogs), `accuracy` as the metric, and `RMSprop` as the optimizer with its default learning rate of `0.001`.

In [None]:
model.compile(loss='SparseCategoricalCrossentropy',metrics=['accuracy'],optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001))
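The sparse loss used here on integer labels gives the same value as the categorical cross-entropy that `do_salience()` applies to `tf.one_hot` labels. The NumPy sketch below, with hypothetical helper names and made-up softmax outputs, shows why: with a one-hot target, every term of the categorical sum except the true class is multiplied by zero. Keras additionally clips probabilities away from 0 and 1 for numerical safety; the sketch omits that.

```python
import numpy as np

def categorical_crossentropy(onehot_labels, probs):
    # -sum_k y_k * log(p_k); only the true-class term survives for one-hot labels
    return -np.sum(onehot_labels * np.log(probs), axis=-1)

def sparse_categorical_crossentropy(int_labels, probs):
    # Pick the predicted probability of the true class directly by index
    return -np.log(probs[np.arange(len(int_labels)), int_labels])

probs = np.array([[0.9, 0.1],   # hypothetical softmax outputs for 2 images
                  [0.3, 0.7]])
int_labels = np.array([0, 1])   # 0 = cat, 1 = dog
onehot = np.eye(2)[int_labels]  # [[1, 0], [0, 1]]

print(categorical_crossentropy(onehot, probs))
print(sparse_categorical_crossentropy(int_labels, probs))  # same values: -log(0.9), -log(0.7)
```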

### Train the model

Next, we will pass in the training batches and train our model for 5 epochs.

To get clearer saliency maps, we first load weights that have already been trained for 15 epochs, so the model ends up trained for 20 epochs in total.

In [None]:
# Load pre-trained weights
model.load_weights('15_epochs.h5')

# Train the model for just 5 epochs
model.fit(train_batches, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7873c0557bb0>

### Generate saliency maps at 20 epochs

We will now call `do_salience()` again on the same test images (label `0` for the cats and `catanddog.jpg`, `1` for the dogs), this time using the prefix `salient`.

In [None]:
do_salience('cat1.jpg', model, 0, "salient")
do_salience('cat2.jpg', model, 0, "salient")
do_salience('catanddog.jpg', model, 0, "salient")
do_salience('dog1.jpg', model, 1, "salient")
do_salience('dog2.jpg', model, 1, "salient")


The strong pixels should now be noticeably fewer than in the untrained model's maps, and most of them should lie on features of the pet itself.

### Generate saliency maps at 95 epochs

We also have pre-trained weights generated at 95 epochs, so we can compare their saliency maps with the ones generated at 20 epochs.

In [None]:
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=14vFpBJsL_TNQeugX8vUTv8dYZxn__fQY' -O 95_epochs.h5

model.load_weights('95_epochs.h5')

do_salience('cat1.jpg', model, 0, "epoch95_salient")
do_salience('cat2.jpg', model, 0, "epoch95_salient")
do_salience('catanddog.jpg', model, 0, "epoch95_salient")
do_salience('dog1.jpg', model, 1, "epoch95_salient")
do_salience('dog2.jpg', model, 1, "epoch95_salient")
