# Ungraded Lab: Cats vs. Dogs Class Activation Maps

<a target="_blank" href="https://colab.research.google.com/github/LuisAngelMendozaVelasco/TensorFlow-Advanced_Techniques_Specialization/blob/master/Advanced_Computer_Vision_with_TensorFlow/Week4/Labs/C3_W4_Lab_2_CatsDogs_CAM.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png">Run in Google Colab</a>

You will practice with class activation maps (CAMs) again in this lab, this time with only two classes: cats and dogs. You will revisit this exercise in this week's programming assignment, so it is best to become familiar with the steps discussed here, particularly preprocessing the image and building the model.

## Imports

In [1]:
import tensorflow_datasets as tfds
import tensorflow as tf
from keras import Sequential, layers, Input, optimizers, Model
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
import cv2

## Download and Prepare the Dataset

We will use the [Cats vs Dogs](https://www.tensorflow.org/datasets/catalog/cats_vs_dogs) dataset, which we can load via TensorFlow Datasets. The images are labeled 0 for cats and 1 for dogs.

In [2]:
train_data = tfds.load('cats_vs_dogs', split='train[:80%]', as_supervised=True)
validation_data = tfds.load('cats_vs_dogs', split='train[80%:90%]', as_supervised=True)
test_data = tfds.load('cats_vs_dogs', split='train[-10%:]', as_supervised=True)

Downloading and preparing dataset 786.67 MiB (download: 786.67 MiB, generated: 1.04 GiB, total: 1.81 GiB) to /root/tensorflow_datasets/cats_vs_dogs/4.0.1...
Dataset cats_vs_dogs downloaded and prepared to /root/tensorflow_datasets/cats_vs_dogs/4.0.1. Subsequent calls will reuse this data.


The cell below preprocesses the images and creates batches before feeding them to the model.

In [3]:
def augment_images(image, label):
    # Cast to float
    image = tf.cast(image, tf.float32)
    # Normalize the pixel values
    image = (image / 255)
    # Resize to 300 x 300
    image = tf.image.resize(image, (300, 300))

    return image, label

# Use the utility function above to preprocess the images
augmented_training_data = train_data.map(augment_images)

# Shuffle and create batches before training
train_batches = augmented_training_data.shuffle(1024).batch(32)
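
As a quick sanity check on the batching above, the number of training steps per epoch can be estimated with plain arithmetic. This is only a sketch: it assumes the full `train` split holds 23,262 examples and that the 80% slice is rounded to the nearest example (the exact count TFDS assigns may differ by one). The variable names are illustrative.

```python
import math

total_examples = 23262   # assumed size of the full TFDS `train` split
train_fraction = 0.80    # matches split='train[:80%]'
batch_size = 32          # matches .batch(32)

# Approximate number of examples in the training slice
train_examples = round(total_examples * train_fraction)

# batch() keeps the final partial batch by default, hence the ceiling
steps_per_epoch = math.ceil(train_examples / batch_size)

print(train_examples, steps_per_epoch)
```

The resulting step count agrees with the `582/582` shown in the training log later in this lab.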

## Build the classifier

This will look familiar because it is almost identical to the previous model we built. The key difference is that the output is a single sigmoid-activated unit, because we are only dealing with two classes.

In [4]:
model = Sequential()
model.add(Input(shape=(300, 300, 3)))
model.add(layers.Conv2D(16, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))

model.add(layers.Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))

model.add(layers.Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))

model.add(layers.Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(1, activation='sigmoid'))

model.summary()
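
To see why a single sigmoid unit is enough for two classes, the NumPy sketch below compares it with a two-way softmax in which the "cat" logit is fixed at 0. The logit value is made up for illustration, and the helper functions are not part of the lab.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps a logit to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

z = 1.3  # hypothetical logit of the single output unit

p_dog_sigmoid = sigmoid(z)
p_cat, p_dog_softmax = softmax(np.array([0.0, z]))

# Both formulations assign the same probability to "dog"
print(np.isclose(p_dog_sigmoid, p_dog_softmax))
```

The probability of "cat" is simply one minus the sigmoid output, so a second output unit would add no information.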

The loss can be adjusted from last time to deal with just two classes. For that, we pick `binary_crossentropy`.

In [5]:
# Training will take around 30 minutes to complete using a GPU. Time for a break!

model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=optimizers.RMSprop(learning_rate=0.001))
model.fit(train_batches, epochs=25)

Epoch 1/25
582/582 ━━━━━━━━━━━━━━━━━━━━ 69s 96ms/step - accuracy: 0.5464 - loss: 0.6810
Epoch 2/25
582/582 ━━━━━━━━━━━━━━━━━━━━ 67s 87ms/step - accuracy: 0.6197 - loss: 0.6461
Epoch 3/25
582/582 ━━━━━━━━━━━━━━━━━━━━ 53s 88ms/step - accuracy: 0.6437 - loss: 0.6258
Epoch 4/25
582/582 ━━━━━━━━━━━━━━━━━━━━ 48s 79ms/step - accuracy: 0.6683 - loss: 0.6051
Epoch 5/25
582/582 ━━━━━━━━━━━━━━━━━━━━ 89s 91ms/step - accuracy: 0.6784 - loss: 0.5989
Epoch 6/25
582/582 ━━━━━━━━━━━━━━━━━━━━ 47s 78ms/step - accuracy: 0.6919 - loss: 0.5920
Epoch 7/25
582/582 ━━━━━━━━━━━━━━━━━━━━ 54s 90ms/step - accuracy: 0.6976 - loss: 0.5783
Epoch 8/25
582/582 ━━━━━━━━━━━━━━━━━━━━ 48s 78ms/step - accuracy: 0.7042 - loss: 0.5712
Epoch 9/25
582/582 ...

<keras.src.callbacks.history.History at 0x7edd17a96b00>
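
For intuition about the loss values in the log, here is a minimal NumPy sketch of binary cross-entropy. It is not Keras' implementation. The clipping constant `eps` and the example values are assumptions chosen for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions to avoid log(0); eps is an assumed safety margin
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return losses.mean()

y_true = np.array([0.0, 1.0, 1.0, 0.0])   # hypothetical labels (0 = cat, 1 = dog)
y_pred = np.array([0.1, 0.9, 0.6, 0.4])   # hypothetical sigmoid outputs

loss = binary_crossentropy(y_true, y_pred)

# A model that always outputs 0.5 scores ln(2), the "coin flip" baseline
chance_loss = binary_crossentropy(y_true, np.full(4, 0.5))

print(loss, chance_loss)
```

The chance-level value of ln(2), roughly 0.693, is a useful reference point: the first-epoch loss of 0.6810 in the log is only slightly better than guessing.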

## Building the CAM model

You will follow the same steps as before in generating the class activation maps.

In [7]:
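# Weights connecting the global average pooling output to the sigmoid unit, shape (128, 1)
gap_weights = model.layers[-1].get_weights()[0]
print(gap_weights.shape)

# Output both the last convolutional feature maps and the sigmoid prediction
cam_model = Model(inputs=model.inputs, outputs=(model.layers[-3].output, model.layers[-1].output))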
cam_model.summary()
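
The feature maps returned by `model.layers[-3]` are 37 x 37 for a 300 x 300 input. The short sketch below shows where that number comes from, assuming each `MaxPooling2D` layer uses its default `valid` padding with a stride equal to the pool size, and that the `same`-padded convolutions leave the spatial size unchanged. The helper `pooled_size` is hypothetical.

```python
def pooled_size(size, num_pools, pool=2):
    # Each 2x2 max pool with stride 2 and 'valid' padding floors the size in half
    for _ in range(num_pools):
        size = size // pool
    return size

# Three pooling layers precede the last convolution: 300 -> 150 -> 75 -> 37
side = pooled_size(300, num_pools=3)

# Factor needed to scale the feature map back to the input resolution
zoom_factor = 300 / side

print(side, zoom_factor)
```

This is why the CAM code upsamples by `300 / 37` along each spatial axis.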

In [14]:
def show_cam(image_value, features, results):
    '''
    Displays the class activation map of an image

    Args:
        image_value (tensor) -- preprocessed input image with size 300 x 300
        features (array) -- features of the image, shape (1, 37, 37, 128)
        results (array) -- output of the sigmoid layer
    '''

    # There is only one image in the batch so we index at `0`
    features_for_img = features[0]
    prediction = results[0]

    # There is only one unit in the output so we get the weights connected to it
    class_activation_weights = gap_weights[:, 0]

    # Upsample to the image size
    class_activation_features = ndimage.zoom(features_for_img, (300 / 37, 300 / 37, 1), order=2)

    # Compute the intensity of each feature in the CAM
    cam_output = np.dot(class_activation_features, class_activation_weights)

    # Visualize the results
    print(f'Sigmoid output: {results}')
    print(f"Prediction: {'dog' if round(results[0][0]) else 'cat'}")
    plt.figure(figsize=(8,8))
    plt.imshow(cam_output, cmap='jet', alpha=0.5)
    plt.imshow(tf.squeeze(image_value), alpha=0.5)
    plt.axis("off")
    plt.show()
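
The dot product in `show_cam` is a weighted sum of the 128 feature maps. The NumPy sketch below uses random stand-in values rather than real model outputs, and a made-up bias. It illustrates why this sum is meaningful: averaging the low-resolution CAM over all spatial positions gives the same pre-sigmoid value as global average pooling followed by the dense layer.

```python
import numpy as np

rng = np.random.default_rng(0)

features = rng.random((37, 37, 128))   # stand-in for features[0]
weights = rng.normal(size=(128,))      # stand-in for gap_weights[:, 0]
bias = 0.1                             # hypothetical dense-layer bias

# CAM before upsampling: weighted sum over the channel axis -> (37, 37)
cam = np.dot(features, weights)

# Path taken by the classifier: global average pooling, then the dense layer
pooled = features.mean(axis=(0, 1))    # (128,)
logit_from_model_path = np.dot(pooled, weights) + bias

# Same value obtained by averaging the CAM over all spatial positions
logit_from_cam = cam.mean() + bias

print(cam.shape, np.isclose(logit_from_model_path, logit_from_cam))
```

In other words, regions with high CAM values are the ones that push the output toward "dog", and regions with low or negative values push it toward "cat".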

## Testing the Model

Let's download a few images and see what the class activation maps look like.

In [9]:
!wget -O cat1.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/cat1.jpeg
!wget -O cat2.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/cat2.jpeg
!wget -O catanddog.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/catanddog.jpeg
!wget -O dog1.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/dog1.jpeg
!wget -O dog2.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/dog2.jpeg

In [15]:
# Utility function to preprocess an image and show the CAM
def convert_and_classify(image):
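    # Load the image (OpenCV reads channels in BGR order)
    img = cv2.imread(image)

    # Convert to RGB to match the channel order of the training images
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Preprocess the image before feeding it to the model
    img = cv2.resize(img, (300, 300)) / 255.0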

    # Add a batch dimension because the model expects it
    tensor_image = np.expand_dims(img, axis=0)

    # Get the features and prediction
    features,results = cam_model.predict(tensor_image)

    # Generate the CAM
    show_cam(tensor_image, features, results)

convert_and_classify('cat1.jpg')
convert_and_classify('cat2.jpg')
convert_and_classify('catanddog.jpg')
convert_and_classify('dog1.jpg')
convert_and_classify('dog2.jpg')

Output hidden; open in https://colab.research.google.com to view.

Let's also try it with some of the test images before we make some observations.

In [16]:
# Preprocess the test images
augmented_test_data = test_data.map(augment_images)
test_batches = augmented_test_data.batch(1)

for img, lbl in test_batches.take(5):
    print(f"Ground truth: {'dog' if lbl else 'cat'}")
    features,results = cam_model.predict(img)
    show_cam(img, features, results)

Output hidden; open in https://colab.research.google.com to view.

If your training reached 80% accuracy, you may notice from the images above that the presence of eyes and a nose plays a big part in identifying a dog, while whiskers and a collar mostly point to a cat. Some images can be misclassified based on the presence or absence of these features. This tells us that the model is not yet performing optimally and that we need to tweak our process (e.g. add more data, train longer, or use a different model).
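
To put a number on observations like these, one option is to threshold the sigmoid outputs and compare them with the labels. The sketch below uses made-up probabilities and labels rather than real model outputs. The `> 0.5` threshold is chosen to mirror the rounding used in `show_cam`.

```python
import numpy as np

probs = np.array([0.91, 0.12, 0.55, 0.48, 0.73])   # hypothetical sigmoid outputs
labels = np.array([1, 0, 0, 0, 1])                 # hypothetical ground truth (0 = cat, 1 = dog)

# Outputs above 0.5 are treated as "dog", everything else as "cat"
predictions = (probs > 0.5).astype(int)

# Fraction of predictions that match the labels
accuracy = (predictions == labels).mean()

# Indices of the misclassified examples, useful for picking CAMs to inspect
misclassified = np.flatnonzero(predictions != labels)

print(accuracy, misclassified)
```

Inspecting the CAMs of the misclassified examples first is often the quickest way to spot which features mislead the model.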