# Overview

The project was part of Udacity’s Data Scientist nanodegree and is one of the most popular Udacity projects across machine learning and artificial intelligence nanodegree programs.

The aim of the project in the Data Scientist nanodegree is to create a web application that is able to identify a breed of dog if given a photo or image as input. If the photo or image contains a human face (or alien face), then the application will return the breed of dog that most resembles this person or alien (see Chewbacca in the photo above).

The strategy laid out for solving this problem, given in the notebook provided by Udacity, is as follows:

    Step 0: Import Datasets
    Step 1: Detect Humans
    Step 2: Detect Dogs
    Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
    Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)
    Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)
    Step 6: Write your Algorithm
    Step 7: Test Your Algorithm

In this project we have used Keras to build our Convolutional Neural Network (CNN) to make the dog predictions. It would also be possible to use PyTorch, which we used in a previous project in this Data Scientist nanodegree to classify flower species from a given image.

Ideally, we would like a CNN that achieves a test accuracy above 90%, that is, one that correctly identifies the dog breed 9 times out of 10, although the criterion set by Udacity was at least 60%. We will use accuracy on the testing dataset to measure the performance of our models.

To follow along with the steps, you can download or clone the notebook from my repository on GitHub.

# Step 0: Import Datasets

The datasets are provided by Udacity through the following links.

    dog images for training the models
    human faces for detector
    all other downloads to ensure smooth running of the notebook are available in the repository.

The first thing we do is load all the libraries and packages that have been used throughout the notebook.

In [2]:
# import libraries for notebook

In [None]:
from sklearn.datasets import load_files       
import numpy as np
from glob import glob
from tqdm import tqdm

!pip install opencv-python
import cv2
from PIL import ImageFile
import matplotlib.pyplot as plt                        
%matplotlib inline 

!pip install keras
!pip install --ignore-installed --upgrade tensorflow
import keras
from keras.utils import np_utils
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential
from keras.callbacks import ModelCheckpoint
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions 
from keras.preprocessing import image                  


print(keras.__version__)

In [3]:
!pip install tqdm



In [None]:
# https://github.com/paulstancliffe/Dog-Breed-Classifier/blob/master/dog_app_v2.ipynb

As you can see, we have drawn extensively on Keras for creating the CNN. We have also used sklearn for loading the datasets, OpenCV and PIL for image work, matplotlib for viewing the images and numpy for processing tensors.

tqdm provides a progress meter so you can see how your for loops are progressing, and glob finds all pathnames matching a specified pattern.

In [None]:
# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    # one-hot encode the 133 breed labels
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('../../../data/dog_images/train')
valid_files, valid_targets = load_dataset('../../../data/dog_images/valid')
test_files, test_targets = load_dataset('../../../data/dog_images/test')

# load list of dog names
dog_names = [item[35:-1] for item in sorted(glob("../../../data/dog_images/train/*/"))]

After loading our libraries, we use sklearn’s load_files function, wrapped in our load_dataset helper, to import the datasets for training our dog breed model.

The dog_names variable stores a list of the names for the classes which we will use in our final prediction model. Depending on the path for where you have the images you may need to change the 35 in the item[35:-1] to a bigger or smaller number.
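
As a minimal sketch of a way to avoid tuning the hard-coded 35, the breed name can be taken from the folder name itself. This assumes the breed folders are named like 001.Affenpinscher (a numeric prefix, a dot, then the breed name); the helper name breed_name_from_dir is hypothetical and is not part of the notebook.

```python
import os

def breed_name_from_dir(dir_path):
    """Return the breed name from a folder path such as '.../train/001.Affenpinscher/'."""
    # normpath drops the trailing separator so basename returns the folder name
    folder = os.path.basename(os.path.normpath(dir_path))
    # split off the numeric prefix ("001") if one is present
    prefix, sep, name = folder.partition('.')
    return name if sep and prefix.isdigit() else folder

# example path in the same layout as the glob pattern used above
example = "../../../data/dog_images/train/001.Affenpinscher/"
print(breed_name_from_dir(example))
print(example[35:-1])  # the slice used in the notebook, for comparison
```

With this helper, dog_names could be built by applying breed_name_from_dir to each path returned by the sorted glob call, whatever the depth of the data directory.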

If everything has worked you will see 133 different dog breeds and ~8.3k dog images.

# Step 1: Detect Humans

In [None]:
# load filenames of the human face dataset
human_files = np.array(glob("../../../data/lfw/*/*"))

We use OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors. The code below instantiates one of them, the frontal-face Haar cascade classifier, and uses it in the face_detector function to determine whether a supplied image contains a face.

In [None]:
# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# returns "True" if face is detected in image stored at img_path
def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

    Number of humans correctly identified: 100
    Number of dogs recognised as humans: 11

The results are not perfect but acceptable.
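
Counts like these can be produced with a small helper that runs a detector over a list of image paths. The sketch below is an illustration only: detection_rate is a hypothetical name, and the stand-in detector and file names exist just so the example runs without any images. In the notebook one would pass face_detector together with a sample of human_files and of train_files.

```python
def detection_rate(detector, paths):
    """Percentage of paths for which detector(path) is truthy."""
    if len(paths) == 0:
        return 0.0
    hits = sum(1 for p in paths if detector(p))
    return 100.0 * hits / len(paths)

# stand-in detector so the sketch runs without images or OpenCV
def fake_face_detector(path):
    return "human" in path

# made-up file names: 100 "humans", and 100 "dogs" of which 11 fool the detector
sample_humans = ["human_%d.jpg" % i for i in range(100)]
sample_dogs = (["dog_%d.jpg" % i for i in range(89)]
               + ["human_like_dog_%d.jpg" % i for i in range(11)])

print(detection_rate(fake_face_detector, sample_humans))
print(detection_rate(fake_face_detector, sample_dogs))
```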

# Step 2: Detect Dogs

For the dog detector we have used the pretrained ResNet50 network, with the standard weights trained on the ImageNet dataset.

In [None]:
# define ResNet50 model
ResNet50_model = ResNet50(weights='imagenet')

To use this model with our images, we need to process them into the correct tensor size for the model. This preprocessing is often one of the most challenging parts of the image classification process.

Keras expects the images as a four-dimensional tensor with shape (number of images, row size of image in pixels, column size of image in pixels, number of color channels).

In the path_to_tensor function we are processing a single image, so the output shape is (1,224,224,3): 1 image, 224 pixels high, 224 pixels wide, and 3 color channels (red, green and blue). The image is loaded using the PIL library and resized to 224x224. The img_to_array method converts it to an array of shape (224,224,3), and finally we add a dimension at the front using the numpy expand_dims function to obtain our (1,224,224,3).

The function paths_to_tensor then stacks the images returned from path_to_tensor into a 4D tensor with the number of images from training, validation or test datasets depending on which img_path is called.

In [None]:
# Functions that process images before sending to CNN
def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
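
The shape changes described above can be checked with numpy alone. This is a small sketch that uses an all-zero array as a stand-in for a loaded image, so it needs neither Keras nor any image files.

```python
import numpy as np

# stand-in for image.img_to_array(img): one RGB image of 224x224 pixels
fake_image = np.zeros((224, 224, 3), dtype=np.float32)

# path_to_tensor adds a leading "number of images" axis
single = np.expand_dims(fake_image, axis=0)
print(single.shape)

# paths_to_tensor stacks several (1, 224, 224, 3) tensors along that axis
batch = np.vstack([single, single, single])
print(batch.shape)
```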

One final preprocessing step is made by a built-in Keras function called preprocess_input. This function takes the output of the image processing steps above and reverses the color channels to blue, green, red, which is the order the pretrained ImageNet weights expect. It then normalizes the pixels by subtracting the per-channel means used with pretrained ImageNet models. The exact numbers for this can be seen in the notebook.
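
To make this step concrete, here is a numpy-only sketch of what this style of preprocessing does. It is an approximation written for illustration, not the Keras source: the function name caffe_style_preprocess is hypothetical, and the mean values are the commonly quoted ImageNet channel means in blue, green, red order.

```python
import numpy as np

# per-channel ImageNet means in BGR order (assumed values, see lead-in)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(rgb_batch):
    """Reverse the channel axis (RGB -> BGR), then subtract the channel means."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEAN

# a tiny 2x2 pure-red "image": R=255, G=0, B=0
red = np.zeros((1, 2, 2, 3), dtype=np.float32)
red[..., 0] = 255.0

out = caffe_style_preprocess(red)
print(out[0, 0, 0])  # channel order is now blue, green, red
```

After the call, the red value sits in the last channel and every channel has been shifted by its mean, so the values are centered around zero rather than lying between 0 and 255.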

Now we are ready to make predictions. The function shown below, after completing the preprocessing steps above, uses the predict function to obtain an array for imagenet’s 1000 classes. We then use numpy’s argmax function to isolate the class with the highest probability and use imagenet’s dictionary to identify the name of the class.

In [None]:
def ResNet50_predict_labels(img_path):
    # returns prediction vector for image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))

You will notice that the categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151–268, and this explains the return on the dog_detector function below. If the prediction is within this range (151 to 268), return True.

In [None]:
### returns "True" if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return ((prediction <= 268) & (prediction >= 151))

# Step 3: Create a CNN to Classify Dog Breeds (from Scratch)

This is a very interesting exercise and, although the resulting network is not used in the final web application, it is very useful for understanding how CNNs work. We will use transfer learning for the final image classifier.

The full code is in the notebook and you can follow along experimenting and creating your own code from scratch.

The network I chose was close to that presented in the class which consisted of 3 convolutional layers with 3 max pooling layers to reduce the dimensionality and increase the depth. The filters used were 16, 32, 64 respectively.

I added a couple of fully connected layers with the final layer having 133 nodes to match our classes of dog breeds and a softmax activation function to obtain probabilities for each of the classes.

Dropout layers were added to reduce the possibility of overfitting. Adam with its default settings was used as the optimizer.

The target was to achieve a CNN with >1% accuracy. The network described above achieved 8.6% without any fine-tuning of parameters and without any augmentation on the data.
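
To illustrate how three convolution and max pooling blocks reduce the spatial size while increasing the depth, the sketch below traces the output shapes and parameter counts in plain Python. The 224x224x3 input, 2x2 kernels, unpadded convolutions and 2x2 pooling are assumptions made for the example; the exact settings are in the notebook. The function names are hypothetical.

```python
def conv_output(size, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1

def trace_network(input_size=224, in_channels=3, filters=(16, 32, 64), kernel=2, pool=2):
    """Return [(layer, output shape, parameter count)] for conv + max-pool blocks."""
    size, channels, rows = input_size, in_channels, []
    for n_filters in filters:
        size = conv_output(size, kernel)
        # each filter has kernel*kernel*channels weights plus one bias
        params = (kernel * kernel * channels + 1) * n_filters
        channels = n_filters
        rows.append(("conv", (size, size, channels), params))
        # pooling has no weights; it only shrinks the spatial size
        size = conv_output(size, pool, stride=pool)
        rows.append(("pool", (size, size, channels), 0))
    return rows

for name, shape, params in trace_network():
    print(name, shape, params)
```

Under these assumptions the feature maps shrink from 224x224 to 27x27 while the depth grows from 3 to 64 channels, and the convolutional part holds only about ten thousand weights.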

# Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)

In this section of the Jupyter notebook, we are walked through using one of the pretrained networks available for use with keras.

The most important concept to understand, especially if you are used to PyTorch for transfer learning, is the use of bottleneck features.

Using bottleneck features means taking a pre-trained model, in our case VGG16, chopping off the top classifying layers, and then using this “chopped” VGG16 as the first stage of our model.

The bottleneck features are the last activation maps in VGG16 (the fully-connected layers for classifying have been cut off), which makes it an effective feature extractor.

I haven’t included any code here as we will follow the same process in step 5 with the Resnet50 pretrained model used for transfer learning. Check out the code in the Jupyter notebook.

# Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)

Ok, so let’s build our model in keras.

I decided to use the Resnet50 model, as I have used it in the past both with PyTorch and Fast.ai and it has proved to be both accurate and relatively quick with training.

Udacity has prepared in advance the extraction of bottleneck features for each of the pretrained networks on the dog images. So it is only necessary to download the file for our network.

In [None]:
bottleneck_features = np.load('bottleneck_features/DogResnet50Data.npz')
train_resnet50 = bottleneck_features['train']
valid_resnet50 = bottleneck_features['valid']
test_resnet50 = bottleneck_features['test']

The above variables contain the features of images that have already been put through the bottleneck extractor. This makes training our model very quick, as we are feeding in inputs where the main features for identification have already been isolated. It means we only have a small number of parameters or weights to backpropagate through.

Our fully-connected layers will now just correlate the patterns for our 133 classes coming from the bottleneck features in order to train, validate and test.

The shape of train_resnet50 is (6680,1,1,2048), and we use this as the input shape for the first layer of our model, as shown by GlobalAveragePooling2D(input_shape=train_resnet50.shape[1:]). The index is [1:] because the number of images (6680) is not part of the input shape of the neural network.
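
The effect of shape[1:] and of global average pooling on features of this shape can be reproduced with numpy. This sketch uses random numbers and only 5 rows as a stand-in for the real bottleneck features.

```python
import numpy as np

# stand-in for train_resnet50: 5 images instead of 6680
features = np.random.rand(5, 1, 1, 2048).astype(np.float32)

# shape[1:] drops the leading "number of images" axis
input_shape = features.shape[1:]
print(input_shape)

# global average pooling averages over the two spatial axes
pooled = features.mean(axis=(1, 2))
print(pooled.shape)
```

Because the spatial size is already 1x1 here, the pooling layer simply flattens each image’s features into a vector of 2048 values for the dense layers.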

The model architecture is shown below. I tried several different combinations with 1 and 2 fully connected layers with 512 and 1024 nodes, different rates of dropout(0.15,0.25 and 0.4) and different optimizers (SGD, Adam, rmsprop). All the different architectures gave me testing accuracies of between 81% and 85%.

The final layer is 133 nodes to match the number of our classes.

The architecture below has a testing accuracy of 84.8%.

In [None]:
# Define your architecture.
resnet50_model = Sequential()
resnet50_model.add(GlobalAveragePooling2D(input_shape=train_resnet50.shape[1:]))
resnet50_model.add(Dense(1024, activation='relu'))
resnet50_model.add(Dropout(0.4))
resnet50_model.add(Dense(133, activation='softmax'))

resnet50_model.summary()

The total number of trainable parameters or weights for this model is 2.2 million. We then compile the model, i.e. select the optimizer and the loss function (SGD and categorical_crossentropy respectively in our case) and indicate which performance metric we would like to score the model with. We are going to use accuracy.
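
The 2.2 million figure can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, while the pooling and dropout layers have no weights. A short sketch of the arithmetic, with dense_params as a hypothetical helper name:

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(2048, 1024)   # Dense(1024) on the 2048 pooled features
output = dense_params(1024, 133)    # Dense(133) on the 1024 hidden units
total = hidden + output

print(hidden, output, total)
```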

In [None]:
# Compile the model.
resnet50_model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

Next, we train our model with the fit function. The validation data is not used to update the parameters through backpropagation; it is evaluated after each epoch to monitor how well the model generalizes. We set the number of epochs, i.e. how many times to run through all the training images, and the batch size, i.e. how many images to train on at a time. Finally, we use the callbacks parameter to save the model weights with the lowest validation loss during training.

In [None]:
# Train the model.
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.resnet50.hdf5',
                               verbose=1, save_best_only=True)

resnet50_model.fit(train_resnet50, train_targets,
          validation_data=(valid_resnet50, valid_targets),
          epochs=20, batch_size=20, callbacks=[checkpointer], verbose=1)
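
The idea behind save_best_only can be sketched in a few lines of plain Python: the weights on disk are only overwritten when the validation loss improves. This is an illustration of the logic, not the Keras implementation; best_epoch is a hypothetical name and the loss values are made up.

```python
def best_epoch(val_losses):
    """Return (epoch index, loss) that a save-best-only checkpoint would keep."""
    best_index, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:  # strictly better, so the saved weights are replaced
            best_index, best_loss = epoch, loss
    return best_index, best_loss

# made-up validation losses for illustration only
history = [1.90, 1.10, 0.85, 0.80, 0.83, 0.91]
print(best_epoch(history))
```

In this made-up run the later epochs have a higher validation loss, so the checkpoint file would still hold the weights from the fourth epoch (index 3) when training ends.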

And voila, we load back our best parameters and we have a trained model for classifying breeds of dog.

In [None]:
# Load the model weights with the best validation loss.
resnet50_model.load_weights('saved_models/weights.best.resnet50.hdf5')

We test this to check its accuracy.

In [None]:
# Calculate classification accuracy on the test dataset.
# get index of predicted dog breed for each image in test set
resnet50_predictions = [np.argmax(resnet50_model.predict(np.expand_dims(feature, axis=0))) for feature in test_resnet50]

In [None]:
# report test accuracy
test_accuracy = 100*np.sum(np.array(resnet50_predictions)==np.argmax(test_targets, axis=1))/len(resnet50_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

Test accuracy: 84.8086%
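
The same accuracy can be computed in a vectorised way once the predicted probabilities for all test images are in one array. The sketch below uses a tiny made-up example with 3 classes; accuracy_percent is a hypothetical helper name. In the notebook the probabilities could come from a single call such as resnet50_model.predict(test_resnet50), since predict accepts a whole batch.

```python
import numpy as np

def accuracy_percent(probabilities, one_hot_targets):
    """Share of rows whose most probable class matches the one-hot target, in percent."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_targets, axis=1)
    return 100.0 * np.mean(predicted == actual)

# made-up probabilities for 4 images and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
targets = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1],
                    [0, 1, 0]])

print(accuracy_percent(probs, targets))
```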

The next step is implementing the model into a function that can be used in our web application.

In [None]:
### Write a function that takes a path to an image as input
### and returns the dog breed that is predicted by the model.
from extract_bottleneck_features import *

def resnet50_predict_breed(img_path):
    """
    INPUT: path to an image
    OUTPUT: returns a prediction of dog breed
    """
    # extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    
    # obtain predicted vector
    predicted_vector = resnet50_model.predict(bottleneck_feature)
    
    # get class with highest probability and match to label for class
    predicted_index = np.argmax(predicted_vector)
    label = dog_names[predicted_index]
    
    return label #predicted_index

We input an image path and the pretrained ResNet50 extracts the bottleneck features for the image. These are then passed through our trained fully-connected model to give a prediction vector.

To this we apply the numpy argmax function to extract the highest probability class/index and use our labels right from the beginning of the notebook to get the name of the dog breed.

# Step 6: Write your Algorithm

Here we are creating our algorithm to analyze any image. The algorithm accepts a file path and:

    if a dog is detected in the image, return the predicted breed.
    if a human is detected in the image, return the resembling dog breed.
    if neither is detected in the image, provide output that indicates an error.

The algorithm collects together functions we used previously to create a final output and shows the image.

In [None]:
def predictor(img_path):
    """
    This function takes in an image and returns a prediction on dog breed.
    INPUT: the path to the image to be classified
    OUTPUT: returns either dog breed, human dog breed or neither dog or human
    """
    # check if image is a dog
    dog = dog_detector(img_path)
    
    # check if image contains a human face
    human = face_detector(img_path)
    
    # make a prediction of dog_breed based on image
    dog_breed = resnet50_predict_breed(img_path)
    
    # plot image with comment
    img = cv2.imread(img_path)
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(cv_rgb)
    plt.show()
    
    if dog:
        return("This photo looks like a {}".format(dog_breed))
    elif human:
        return("This human resembles a {}".format(dog_breed))
    else:
        return("This is neither dog beast or human!")
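
One possible variation, sketched below, separates the decision logic from the plotting and passes the detectors in as arguments, which makes the three branches easy to test without any models. It also runs the breed model only when a dog or a human was detected, which differs from predictor above. The names classify_image and the fake_* stand-ins are hypothetical.

```python
def classify_image(img_path, is_dog, is_human, predict_breed):
    """Return (kind, breed) where kind is 'dog', 'human' or 'neither'."""
    if is_dog(img_path):
        return "dog", predict_breed(img_path)
    if is_human(img_path):
        return "human", predict_breed(img_path)
    # skip the breed model entirely when nothing was detected
    return "neither", None

# stand-in functions so the sketch runs without any models or images
breed_calls = []

def fake_is_dog(path):
    return path.startswith("dog")

def fake_is_human(path):
    return path.startswith("human")

def fake_predict_breed(path):
    breed_calls.append(path)
    return "Beagle"

results = [classify_image(p, fake_is_dog, fake_is_human, fake_predict_breed)
           for p in ["dog_1.jpg", "human_1.jpg", "car_1.jpg"]]
print(results)
```

In the notebook the real dog_detector, face_detector and resnet50_predict_breed functions would be passed in place of the stand-ins.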

# Step 7: Test Your Algorithm

The results were pretty good for the images the model was shown. Only Chewbacca from Star Wars was identified as a dog and given a dog breed. All the other images were correctly identified, but a more robust testing of dog breeds would be required.

# Reflection

At the start of the article our objective was to create a CNN with 90% testing accuracy. Our final model obtained only 85% testing accuracy.

However, given more time (I’m working against Udacity’s project deadlines!), I would have experimented with the ImageDataGenerator class in keras (link in the references below) to augment the images.

I would also have done some hyperparameter fine-tuning with the optimizer. For example, SGD has the following parameters:
keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
Lowering the learning rate with a learning rate scheduler, or using weight decay, momentum and Nesterov momentum, would all likely improve the accuracy of our model.
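
As an example of what a learning rate schedule might look like, the sketch below halves the learning rate every few epochs. The starting rate, drop factor and interval are illustrative assumptions, not values tested on this model, and step_decay is a hypothetical name. A function with this signature could be passed to the keras.callbacks.LearningRateScheduler callback, which calls it with the epoch index and the current learning rate.

```python
import math

def step_decay(epoch, lr=None, initial_lr=0.01, drop=0.5, epochs_per_drop=5):
    """Multiply the initial learning rate by `drop` once every `epochs_per_drop` epochs.

    The `lr` argument (the current rate) is accepted for compatibility with
    the scheduler callback but is not used by this particular schedule.
    """
    return initial_lr * (drop ** math.floor(epoch / epochs_per_drop))

for epoch in (0, 4, 5, 10, 19):
    print(epoch, step_decay(epoch))
```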

We could also have looked more deeply into the images used for training the dog breeds. A confusion matrix on the validation data would show which breeds were giving the biggest errors, helping to identify possible noise. Maybe some of these images were too blurred, and the model was generalizing well but being deceived by unclear images.
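
A confusion matrix of this kind can be built with numpy from the true and predicted class indices. The sketch below uses a made-up example with 3 classes; the function names are hypothetical. In the notebook the inputs could be np.argmax(valid_targets, axis=1) and the argmax of the model’s predictions on the validation features.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        matrix[a, p] += 1
    return matrix

def most_confused_pairs(matrix, top=3):
    """Return up to `top` (true class, predicted class, count) tuples, most frequent errors first."""
    off_diag = matrix.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    order = np.argsort(off_diag, axis=None)[::-1][:top]
    pairs = [np.unravel_index(i, off_diag.shape) for i in order]
    return [(int(a), int(p), int(off_diag[a, p])) for a, p in pairs if off_diag[a, p] > 0]

# made-up labels for 8 images and 3 classes
actual = [0, 0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(actual, predicted, 3)
print(cm)
print(most_confused_pairs(cm, top=2))
```

The images behind the most frequent off-diagonal pairs are the ones worth inspecting by eye first.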

We could check the training images with random sampling to see the quality of the images and delete images that were badly focused or contained more than one breed of dog, i.e. reduce noise in the training data.

We could check whether there were sufficient training images of each breed of dog and whether the image classes were balanced overall in terms of training numbers.
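
A quick balance check only needs the class label of each training image. The sketch below counts made-up labels with the standard library; the helper names are hypothetical. In the notebook the labels could be obtained with np.argmax(train_targets, axis=1).

```python
from collections import Counter

def class_counts(labels):
    """Number of images per class."""
    return Counter(labels)

def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest class."""
    return max(counts.values()) / min(counts.values())

# made-up labels for illustration only
labels = ["Beagle"] * 60 + ["Pug"] * 30 + ["Akita"] * 20
counts = class_counts(labels)

print(counts.most_common())
print(imbalance_ratio(counts))
```

A ratio close to 1 means the classes are roughly balanced, while a large ratio points to breeds that may need more images or augmentation.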

By working through the areas above, I’m sure we could increase the testing accuracy of the model to above 90%.

To summarise, possible improvements are:

    analysis of images used to train and validate the model
    data augmentation
    hyperparameter fine-tuning