In [1]:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras import layers

# Assignment 6: Cat and Dog recognition

In this assignment you will build a convolutional neural network (CNN) that is able to tell whether an image shows a cat or a dog. Furthermore, since we don’t have that many data to train our mode with, we will train another model that uses as a base the existing CNN model VGG16 that has frequently shown a very good performance in classifying images in datasets with hundreds or even thousands of possible classes. Lastly, you will evaluate the performance of the model and classify your own pet images. 

## Exercise 06.1: Training the CNN

What we want to do first is load all the images into arrays in RAM. The final goal should be a training set numpy array with a shape (23000,64, 64, 3) and a test set numpy array with a shape (2000, 64, 64, 3) containing all the images, where 23000 and 2000 are the total number of cat and dog images, 64x64 is the pixel size of each image and 3 is the number of color channels. You might not end up with a total of 25000 images because some images are corrupted. In any case, the test set should have 1000 cat images and 1000 dog images and the rest goes into the training set. Prepare these arrays, by resizing all of the images to 64x64 pixels and converting them into numpy arrays containing the RGB values of each pixel (convert the grey scale images accordingly). For this you might want to install and use the pillow library together with its Image functionalities such as resize and convert. Additionally, create two arrays with shapes (23000, 1) and (2000, 1) containing the corresponding labels (1 if the image shows a cat and 0 if it shows a dog).

Finally, normalize the RGB values that go from 0 to 255 to values between 0 and 1. Loading and preparing the arrays might take some time (if you want to monitor the progress, you can add progress bars with the tqdm library)!

Checking which images are corrupted:

In [2]:
for i in range(12500):
    try:
        Image.open(f"/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Cat/{i}.jpg")
    except:
        print(f"Cat image {i} is corrupted")

for i in range(12500):
    try:
        Image.open(f"/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Dog/{i}.jpg")
    except:
        print(f"Dog image {i} is corrupted")

Cat image 666 is corrupted




Dog image 11702 is corrupted


Loading the images, resizing them, converting their RGB values into numpy arrays and normalizing by 250 such that the RGB values lie between 0 and 1.

In [61]:
cats_training = np.array([np.array(Image.open(f"/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Cat/{n}.jpg").resize((64, 64)).convert("RGB")) for n in range(11500) if n != 666]) /250
cats_test = np.array([np.array(Image.open(f"/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Cat/{n}.jpg").resize((64, 64)).convert("RGB")) for n in range(11500,12500)]) / 250
dogs_training = np.array([np.array(Image.open(f"/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Dog/{n}.jpg").resize((64, 64)).convert("RGB")) for n in range(11499)]) / 250
dogs_test = np.array([np.array(Image.open(f"/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Dog/{n}.jpg").resize((64, 64)).convert("RGB")) for n in range(11499,12500) if n != 11702]) / 250
training_data = np.concatenate((cats_training, dogs_training), axis=0)
training_labels = np.concatenate((np.ones(11499), np.zeros(11499)))
test_data = np.concatenate((cats_test, dogs_test), axis=0)
test_labels = np.concatenate((np.ones(1000), np.zeros(1000)))
training_data = np.expand_dims(training_data, -1)
test_data = np.expand_dims(test_data, -1)



In [62]:
num_classes = 2
input_shape = (64, 64, 3)
training_labels = keras.utils.to_categorical(training_labels, num_classes)
test_labels = keras.utils.to_categorical(test_labels, num_classes)

In [63]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),        
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(num_classes, activation="sigmoid")
    ]
)

model.summary()


Training the model defined above for 100 epochs (1 epoch takes around 15ms)

In [64]:
batch_size = 64
epochs = 1

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(training_data, training_labels, batch_size=batch_size, epochs=epochs, validation_split=0.1)

[1m324/324[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 35ms/step - accuracy: 0.6092 - loss: 0.6501 - val_accuracy: 0.7617 - val_loss: 0.5079


<keras.src.callbacks.history.History at 0x31d4ee7e0>

In [48]:
score = model.evaluate(test_data, test_labels, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 1.633297085762024
Test accuracy: 0.8454999923706055


Test the model with pictures of Merlino the cat

In [47]:
merlino = np.expand_dims(np.array([Image.open("/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Merlino.jpeg").resize((64, 64)).convert("RGB")]), -1)
model.predict(merlino)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 30ms/step


array([[0., 1.]], dtype=float32)

## Exercise 06.2: Transfer Learning

We can easily improve our accuracy by using parts of a pretrained neural network and training only a few layers before the output layer. This is called transfer learning. For our example we are going to use VGG16, which is a deep CNN used for image classification tasks.

Loading the VGG16 model

In [49]:
vgg16_model = keras.applications.vgg16.VGG16(include_top=False, input_shape=(224, 224, 3))

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
[1m58889256/58889256[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 0us/step


In [54]:
cats_training_pre = np.array([np.array(Image.open(f"/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Cat/{n}.jpg").resize((224, 224)).convert("RGB")) for n in range(11500) if n != 666]) 
cats_test_pre = np.array([np.array(Image.open(f"/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Cat/{n}.jpg").resize((224, 224)).convert("RGB")) for n in range(11500,12500)]) 
dogs_training_pre = np.array([np.array(Image.open(f"/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Dog/{n}.jpg").resize((224, 224)).convert("RGB")) for n in range(11499)]) 
dogs_test_pre = np.array([np.array(Image.open(f"/Users/jercheal/Documents/Physics/CPIV_ML/PetImages/Dog/{n}.jpg").resize((224, 224)).convert("RGB")) for n in range(11499,12500) if n != 11702]) 
training_data_pre = np.concatenate((cats_training, dogs_training), axis=0)
training_labels_pre = np.concatenate((np.ones(11499), np.zeros(11499)))
test_data_pre = np.concatenate((cats_test, dogs_test), axis=0)
test_labels_pre = np.concatenate((np.ones(1000), np.zeros(1000)))
#training_data = np.expand_dims(training_data, -1)
#test_data = np.expand_dims(test_data, -1)



Initialize the data for the VGG16 model 

In [58]:
training_data_vgg16 = keras.applications.vgg16.preprocess_input(np.copy(training_data_pre))
test_data_vgg16 = keras.applications.vgg16.preprocess_input(np.copy(test_data_pre))

Train the model using VGG16 (takes around 16mins)

In [59]:
training_data_after_vgg16 = vgg16_model.predict(training_data_vgg16, verbose=1)

[1m 23/719[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m15:44[0m 1s/step

KeyboardInterrupt: 