
# Building a CNN to Distinguish Between Horses and Humans

## Imports

In [1]:
import numpy as np

import urllib.request
import zipfile

import tensorflow as tf

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import RMSprop

## The Dataset

- In this dataset, the subject (a horse or a person) is not always in the same place in the image
- Contains over 1,000 $300 \times 300$ images, approximately half of horses and half of humans, rendered in different poses
- The images have different lighting, poses, skin tones, zoom levels, and backgrounds
    - The classifier will have to determine which parts of the image are important features for the classification, without being affected by the background
- These images are all computer-generated

### The Keras ImageDataGenerator

- Many image-based datasets don't come with separate label files; instead, the images are sorted into one subdirectory per class
- In Keras, a tool called `ImageDataGenerator` can use this directory structure to automatically assign labels to images
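The labelling idea can be sketched in plain Python. This is not Keras's implementation; it only mimics the documented behaviour of `flow_from_directory`, which (when no `classes` list is given) treats each subdirectory as a class and assigns label indices in alphanumeric order of the subdirectory names. The helper `assign_labels` and the example file names are hypothetical.

```python
from pathlib import PurePosixPath


def assign_labels(relative_paths):
    """Map 'class_dir/file.png' paths to integer labels.

    Hypothetical helper: each top-level subdirectory is a class, and class
    indices follow the alphanumeric order of the directory names.
    """
    # The first path component is the class (subdirectory) name.
    class_names = sorted({PurePosixPath(p).parts[0] for p in relative_paths})
    class_indices = {name: index for index, name in enumerate(class_names)}
    # The label comes from the folder, not from the image contents.
    labels = [class_indices[PurePosixPath(p).parts[0]] for p in relative_paths]
    return class_indices, labels


# Illustrative paths in the 'subdirectory/file' form.
paths = ["humans/human01-00.png", "horses/horse01-0.png", "horses/horse01-1.png"]
class_indices, labels = assign_labels(paths)
print(class_indices)  # {'horses': 0, 'humans': 1}
print(labels)         # [1, 0, 0]
```

Because `'horses'` sorts before `'humans'`, horses get label 0 and humans label 1.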

#### Code to get the training data and extract it into appropriately named subdirectories

In [2]:
training_url = 'https://storage.googleapis.com/learning-datasets/horse-or-human.zip'
training_dir = 'horse-or-human/training'
training_file_name = 'horses-or-human.zip'

In [3]:
urllib.request.urlretrieve(training_url, training_file_name)

('horses-or-human.zip', <http.client.HTTPMessage at 0x7ccf2eb8c2e0>)

In [4]:
zip_ref = zipfile.ZipFile(training_file_name, 'r')
zip_ref.extractall(training_dir)
zip_ref.close()

In [5]:
validation_url = 'https://storage.googleapis.com/learning-datasets/validation-horse-or-human.zip'
validation_dir = 'horse-or-human/validation'
validation_file_name = 'validation-horse-or-human.zip'

In [6]:
urllib.request.urlretrieve(validation_url, validation_file_name)

('validation-horse-or-human.zip', <http.client.HTTPMessage at 0x7cd043b71fc0>)

In [7]:
zip_ref = zipfile.ZipFile(validation_file_name, 'r')
zip_ref.extractall(validation_dir)
zip_ref.close()
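The four cells above repeat the same download-then-extract pattern. A possible refactoring, sketched here as a hypothetical helper `fetch_and_extract`, uses a `with` block so the archive is closed even if extraction fails, and skips the download when the zip file is already on disk (handy when re-running the notebook). It uses only `urllib.request.urlretrieve` and `zipfile.ZipFile`, as the original cells do.

```python
import os
import urllib.request
import zipfile


def fetch_and_extract(url, zip_path, extract_dir):
    """Download url to zip_path (unless already present) and unzip it into extract_dir."""
    # Skip the download if the archive already exists, e.g. on a notebook re-run.
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(url, zip_path)
    # The context manager closes the archive even if extractall raises.
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(extract_dir)
    return extract_dir


# Usage with the variables defined in the cells above:
# fetch_and_extract(training_url, training_file_name, training_dir)
# fetch_and_extract(validation_url, validation_file_name, validation_dir)
```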

#### `ImageDataGenerator`

In [8]:
# all images will be rescaled by 1./255
image_data_gen = ImageDataGenerator(rescale=1/255)
print(type(image_data_gen))

<class 'keras.src.preprocessing.image.ImageDataGenerator'>
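The `rescale` argument is a multiplication factor: Keras multiplies every pixel value by it. With `1/255`, 8-bit channel values in the range 0 to 255 end up between 0 and 1. A plain-Python illustration on a few sample values (the values are arbitrary):

```python
# rescale=1/255 multiplies each channel value by 1/255.
RESCALE = 1 / 255

raw_pixels = [0, 51, 128, 255]
scaled_pixels = [value * RESCALE for value in raw_pixels]

# 0 maps to 0.0 and 255 maps to 1.0; everything else falls in between.
print([round(v, 3) for v in scaled_pixels])  # [0.0, 0.2, 0.502, 1.0]
```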


In [9]:
training_generator = image_data_gen.flow_from_directory(
    training_dir,
    target_size=(300, 300),
    class_mode='binary'
)

validation_generator = image_data_gen.flow_from_directory(
    validation_dir,
    target_size=(300, 300),
    class_mode='binary'
)

Found 1027 images belonging to 2 classes.
Found 256 images belonging to 2 classes.
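Neither call passes `batch_size`, so both generators use `flow_from_directory`'s default of 32, and the default `color_mode='rgb'` gives 3 channels. The sketch below works out how many batches make up one pass over each set and what shape each batch has. The helper names are illustrative.

```python
import math

BATCH_SIZE = 32          # flow_from_directory default, since none was passed above
TARGET_SIZE = (300, 300)
CHANNELS = 3             # default color_mode='rgb'


def steps_per_epoch(num_images, batch_size=BATCH_SIZE):
    # The last batch is smaller when num_images is not a multiple of batch_size.
    return math.ceil(num_images / batch_size)


def batch_shapes(batch_size=BATCH_SIZE):
    images_shape = (batch_size, *TARGET_SIZE, CHANNELS)
    labels_shape = (batch_size,)  # class_mode='binary': one 0/1 label per image
    return images_shape, labels_shape


print(steps_per_epoch(1027))  # 33 (32 full batches plus one batch of 3)
print(steps_per_epoch(256))   # 8
print(batch_shapes())         # ((32, 300, 300, 3), (32,))
```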


In [10]:
print(type(training_generator))
print(type(validation_generator))

<class 'keras.src.preprocessing.image.DirectoryIterator'>
<class 'keras.src.preprocessing.image.DirectoryIterator'>


In [11]:
training_generator.class_indices, validation_generator.class_indices

({'horses': 0, 'humans': 1}, {'horses': 0, 'humans': 1})

In [12]:
(np.unique(training_generator.classes, return_counts=True),
 np.unique(validation_generator.classes, return_counts=True))

((array([0, 1], dtype=int32), array([500, 527])),
 (array([0, 1], dtype=int32), array([128, 128])))
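Both sets are close to balanced. One way to make that concrete is the accuracy of a trivial model that always predicts the most common class, a useful reference point for the accuracies reported later. The counts below are copied from the output above; the helper name is illustrative.

```python
# Class counts from the np.unique output above (0 = horses, 1 = humans).
train_counts = {0: 500, 1: 527}
val_counts = {0: 128, 1: 128}


def majority_baseline(counts):
    """Accuracy of a model that always predicts the most frequent class."""
    return max(counts.values()) / sum(counts.values())


print(f"training baseline:   {majority_baseline(train_counts):.3f}")  # 0.513
print(f"validation baseline: {majority_baseline(val_counts):.3f}")    # 0.500
```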

In [13]:
(len(training_generator.filenames),
 training_generator.filenames[:5],
 training_generator.filepaths[:5])

(1027,
 ['horses/horse01-0.png',
  'horses/horse01-1.png',
  'horses/horse01-2.png',
  'horses/horse01-3.png',
  'horses/horse01-4.png'],
 ['horse-or-human/training/horses/horse01-0.png',
  'horse-or-human/training/horses/horse01-1.png',
  'horse-or-human/training/horses/horse01-2.png',
  'horse-or-human/training/horses/horse01-3.png',
  'horse-or-human/training/horses/horse01-4.png'])

In [14]:
(len(validation_generator.filenames),
 validation_generator.filenames[:5],
 validation_generator.filepaths[:5])

(256,
 ['horses/horse1-000.png',
  'horses/horse1-105.png',
  'horses/horse1-122.png',
  'horses/horse1-127.png',
  'horses/horse1-170.png'],
 ['horse-or-human/validation/horses/horse1-000.png',
  'horse-or-human/validation/horses/horse1-105.png',
  'horse-or-human/validation/horses/horse1-122.png',
  'horse-or-human/validation/horses/horse1-127.png',
  'horse-or-human/validation/horses/horse1-170.png'])

## CNN Architecture for Horses or Humans

There are several differences between this dataset and the Fashion MNIST one that you have to take into account when designing the architecture for classifying images:

- images are much larger, $300 \times 300$, so more layers may be needed
- images are full colour, so each image has 3 channels instead of one
- only two image types, so we have a binary classifier that can be implemented with just a single output neuron
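To put "much larger" in numbers, compare how many input values a single image contributes. Fashion MNIST images are $28 \times 28$ greyscale; these are $300 \times 300$ with 3 colour channels.

```python
def values_per_image(height, width, channels):
    """Number of input values (pixels x channels) the network sees per image."""
    return height * width * channels


fashion_mnist = values_per_image(28, 28, 1)    # 28x28 greyscale
horses_humans = values_per_image(300, 300, 3)  # 300x300 RGB

print(fashion_mnist)                            # 784
print(horses_humans)                            # 270000
print(round(horses_humans / fashion_mnist, 1))  # 344.4
```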

## The Model

In [15]:
model = tf.keras.models.Sequential([

    tf.keras.layers.Input(shape=(300, 300, 3)),

    # 3 * 3 * 3 * 16 + 16 = 448 params
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu'),

    # 0 param
    tf.keras.layers.MaxPooling2D(2, 2),

    # 3 * 3 * 32 * 16 + 32 = 4640 param
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),

    # 0 param
    tf.keras.layers.MaxPooling2D(2, 2),

    # 3 * 3 * 64 * 32 + 64 = 18496 params
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),

    # 0 param
    tf.keras.layers.MaxPooling2D(2, 2),

    # 3 * 3 * 64 * 64 + 64 = 36928 params
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),

    # 0 param
    tf.keras.layers.MaxPooling2D(2, 2),

    # 3 * 3 * 64 * 64 + 64 = 36928 params
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),

    # 0 param
    tf.keras.layers.MaxPooling2D(2, 2),

    # 0 param
    tf.keras.layers.Flatten(),

    # 512 * 3136 + 512 = 1606144 param
    tf.keras.layers.Dense(512, activation='relu'),

    # 1 * 512 + 1 = 513 param
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape              Param #
 conv2d (Conv2D)                 (None, 298, 298, 16)      448
 max_pooling2d (MaxPooling2D)    (None, 149, 149, 16)      0
 conv2d_1 (Conv2D)               (None, 147, 147, 32)      4640
 max_pooling2d_1 (MaxPooling2D)  (None, 73, 73, 32)        0
 conv2d_2 (Conv2D)               (None, 71, 71, 64)        18496
 max_pooling2d_2 (MaxPooling2D)  (None, 35, 35, 64)        0

- Things to note about this code
    - First layer:
        - defines 16 filters of size $3 \times 3$
        - the input shape is $(300, 300, 3)$: images are $300 \times 300$ pixels with 3 colour channels
    - Output layer:
        - a single neuron, activated by the `sigmoid` function, for a binary classifier
    - Several convolutional layers are stacked:
        - the source images are quite large
        - over successive layers, we want many smaller images, each with features highlighted
        - after all the convolutional and pooling layers, each feature map has shape $7 \times 7$
            - with 64 feature maps, these $7 \times 7 \times 64 = 3136$ values are flattened and passed to the dense layers, which learn to match them with the appropriate labels
    - The model ends up with more than 1.7 million parameters (1,704,097), most of them in the first `Dense` layer
        - so it is slower to train
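The $7 \times 7$ output size and the 1.7 million parameter total can be checked by hand. The plain-Python sketch below repeats the arithmetic from the comments in the model cell, assuming the Keras defaults used there: `Conv2D` with stride 1 and `'valid'` padding (no padding), and `MaxPooling2D` with stride equal to the pool size. The helper names are illustrative.

```python
def conv2d(shape, filters, kernel=3):
    """'valid' convolution, stride 1: returns (output_shape, params)."""
    h, w, c_in = shape
    params = kernel * kernel * c_in * filters + filters  # weights + one bias per filter
    return (h - kernel + 1, w - kernel + 1, filters), params


def max_pool(shape, pool=2):
    """Max pooling with stride equal to the pool size and no padding."""
    h, w, c = shape
    return ((h - pool) // pool + 1, (w - pool) // pool + 1, c), 0


shape = (300, 300, 3)
total = 0
for filters in (16, 32, 64, 64, 64):  # the five Conv2D/MaxPooling2D pairs
    shape, params = conv2d(shape, filters)
    total += params
    shape, _ = max_pool(shape)
    print(shape, params)

flat = shape[0] * shape[1] * shape[2]  # Flatten
dense_512 = flat * 512 + 512           # Dense(512)
dense_1 = 512 * 1 + 1                  # Dense(1)
total += dense_512 + dense_1

print(shape, flat, dense_512, total)   # (7, 7, 64) 3136 1606144 1704097
```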

## Compilation

- **loss function**: `binary_crossentropy`, since there are only two classes
- **optimizer**: *root mean square propagation*, `RMSprop`
    - it takes a learning rate parameter that lets us tune how large each update step is
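For intuition, here are the textbook formulas behind both choices in plain Python, for a single scalar weight. This is an illustration, not Keras's implementation: `rho=0.9` and `eps=1e-7` mirror Keras's documented `RMSprop` defaults, but exactly where epsilon enters the update differs between Keras versions. Function names are illustrative.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy for labels in {0, 1} and sigmoid outputs in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def rmsprop_step(w, grad, cache, learning_rate=0.001, rho=0.9, eps=1e-7):
    """One textbook RMSprop update for a scalar weight.

    cache is a running average of squared gradients; dividing by its square
    root scales each step by the recent gradient magnitude.
    """
    cache = rho * cache + (1 - rho) * grad ** 2
    w = w - learning_rate * grad / (math.sqrt(cache) + eps)
    return w, cache


print(binary_crossentropy([1, 0], [0.9, 0.1]))  # ~0.105: confident and correct
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # ~2.303: confident and wrong
```

The loss grows sharply as a confident prediction turns out wrong, which is what pushes the sigmoid output toward the correct label.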

In [16]:
model.compile(
    loss='binary_crossentropy',
    optimizer=RMSprop(learning_rate=0.001),
    metrics=['accuracy']
)

## Train the Model

In [17]:
history = model.fit(training_generator, epochs=15)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


- After just 15 epochs, accuracy on the training set is already very high
    - However, this says nothing about how the network performs on data it hasn't previously seen
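For a single sigmoid output like this one, the `'accuracy'` metric is binary accuracy: each output above 0.5 counts as class 1 (humans), and the metric is the fraction of images on the correct side of that threshold. A minimal sketch with made-up predictions:

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of sigmoid outputs that land on the correct side of the threshold."""
    correct = sum(int((p > threshold) == bool(y)) for y, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels and outputs: the third prediction (0.4 for a human) is wrong.
print(binary_accuracy([0, 1, 1, 0], [0.2, 0.7, 0.4, 0.1]))  # 0.75
```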

## Adding Validation to the Horses or Humans Dataset

In [18]:
history = model.fit(
    training_generator,
    epochs=15,
    validation_data=validation_generator
)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


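- Accuracy on the training data is still very high
- Accuracy on the validation set is just over 83%
    - This gap between training and validation accuracy indicates overfitting
- This second call to `model.fit` continues training the same model from the weights learned in the previous cell, so these figures reflect 30 epochs of training in total
- The performance isn't bad considering how few images (1,027) the network was trained on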
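To track overfitting epoch by epoch, you can compare the two accuracy curves that `model.fit` records in `history.history` (with `metrics=['accuracy']` and `validation_data`, the keys are `'accuracy'` and `'val_accuracy'`). A minimal sketch with a hypothetical helper; a gap that widens over the epochs is the typical sign of overfitting.

```python
def accuracy_gap(history_dict):
    """Per-epoch difference between training and validation accuracy."""
    return [
        train - val
        for train, val in zip(history_dict["accuracy"], history_dict["val_accuracy"])
    ]


# Usage after the training cell above:
# gaps = accuracy_gap(history.history)
# print(f"gap after the last epoch: {gaps[-1]:.3f}")
```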