<a href="https://colab.research.google.com/github/https-deeplearning-ai/tensorflow-1-public/blob/master/C2/W2/ungraded_labs/C2_W2_Lab_2_horses_v_humans_augmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!wget https://raw.githubusercontent.com/doantronghieu/DEEP-LEARNING/main/helper_DL.py
!pip install colorama
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size':15})
import seaborn as sns
sns.set()
import helper_DL as helper

# Ungraded Lab: Data Augmentation on the Horses or Humans Dataset

In the previous lab, you saw how data augmentation helped improve the model's performance on unseen data. By tweaking the cat and dog training images, the model was able to learn features that are also representative of the validation data. However, applying data augmentation requires a good understanding of your dataset. Simply transforming it at random will not always yield good results.

In the next cells, you will apply the same techniques to the `Horses or Humans` dataset and analyze the results.

In [None]:
# Download the training set
!wget https://storage.googleapis.com/tensorflow-1-public/course2/week3/horse-or-human.zip
# Download the validation set
!wget https://storage.googleapis.com/tensorflow-1-public/course2/week3/validation-horse-or-human.zip

In [None]:
import zipfile

# Unzip training set (the context manager closes the archive when done)
local_zip = './horse-or-human.zip'
with zipfile.ZipFile(local_zip, 'r') as zip_ref:
    zip_ref.extractall('./horse-or-human')

# Unzip validation set
local_zip = './validation-horse-or-human.zip'
with zipfile.ZipFile(local_zip, 'r') as zip_ref:
    zip_ref.extractall('./validation-horse-or-human')

import os

# Directory with the training horse pictures
train_horse_dir = os.path.join('./horse-or-human', 'horses')

# Directory with the training human pictures
train_human_dir = os.path.join('./horse-or-human', 'humans')

# Directory with the validation horse pictures
validation_horse_dir = os.path.join('./validation-horse-or-human', 'horses')

# Directory with the validation human pictures
validation_human_dir = os.path.join('./validation-horse-or-human', 'humans')

print(train_horse_dir)
print(train_human_dir)
print(validation_horse_dir)
print(validation_human_dir)
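
As a quick sanity check before building the model, it can help to confirm that the archives extracted into the expected folders. The sketch below counts the files in each directory. `count_files` is a hypothetical helper name, not part of the lab, and it returns 0 for a folder that does not exist.

```python
import os

def count_files(directory):
    """Return the number of regular files directly inside `directory` (0 if it is missing)."""
    if not os.path.isdir(directory):
        return 0
    return sum(
        1 for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )

# Paths assumed to match the extraction step in this notebook
for folder in ('./horse-or-human/horses',
               './horse-or-human/humans',
               './validation-horse-or-human/horses',
               './validation-horse-or-human/humans'):
    print(folder, count_files(folder))
```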

In [None]:
import tensorflow as tf
import tensorflow.keras as tfk
from tensorflow.keras import layers, optimizers, models, losses
from tensorflow import nn

model = models.Sequential([
    # Input shape: 300x300 pixel images with 3 color channels (RGB)

    # First convolution
    layers.Conv2D(16, (3, 3), activation=nn.relu, input_shape=(300, 300, 3)),
    layers.MaxPooling2D(2, 2),

    # Second convolution
    layers.Conv2D(32, (3, 3), activation=nn.relu),
    layers.MaxPooling2D(2, 2),

    # Third convolution
    layers.Conv2D(64, (3, 3), activation=nn.relu),
    layers.MaxPooling2D(2, 2),

    # Fourth convolution
    layers.Conv2D(64, (3, 3), activation=nn.relu),
    layers.MaxPooling2D(2, 2),

    # Fifth convolution
    layers.Conv2D(64, (3, 3), activation=nn.relu),
    layers.MaxPooling2D(2, 2),       

    # Flatten the results to feed into a DNN
    layers.Flatten(),
    layers.Dense(512, activation=nn.relu),  # 512 neuron hidden layer
    # Only 1 output neuron
    # It will contain a value from 0 to 1, where 0 is for one class ('horses')
    # and 1 is for the other ('humans')
    layers.Dense(1, activation=nn.sigmoid)
])
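
To see how the five convolution and pooling stages shrink the image before `Flatten`, it may help to trace the spatial dimensions by hand. The sketch below assumes the Keras defaults used in this model (3x3 kernels with `'valid'` padding and stride 1, followed by 2x2 max pooling). `trace_feature_map` is a hypothetical helper written only for this illustration.

```python
def trace_feature_map(size, filters, kernel=3, pool=2):
    """Trace a square input through conv ('valid', stride 1) + max-pool stages."""
    shapes = []
    for n_filters in filters:
        size = size - kernel + 1   # a 'valid' convolution trims kernel - 1 pixels
        size = size // pool        # max pooling floors the division
        shapes.append((size, size, n_filters))
    return shapes

shapes = trace_feature_map(300, filters=[16, 32, 64, 64, 64])
for stage, shape in enumerate(shapes, start=1):
    print(f"after stage {stage}: {shape}")

height, width, channels = shapes[-1]
print("flattened features:", height * width * channels)
```

Under these assumptions the last feature map is 7x7x64, so `Flatten` passes 3,136 values to the 512-unit dense layer.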

In [None]:
model.compile(loss=losses.binary_crossentropy,
              optimizer=optimizers.RMSprop(learning_rate=0.001),
              metrics=['accuracy'])

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Apply data augmentation
train_datagen = ImageDataGenerator(rescale = 1. / 255,
                                   rotation_range = 40,
                                   width_shift_range = 0.2,
                                   height_shift_range = 0.2,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True,
                                   fill_mode = 'nearest')

validation_datagen = ImageDataGenerator(rescale = 1. / 255)

# Flow training images in batches of 128 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
    './horse-or-human/', # The source directory for training images
    target_size=(300, 300), # All images will be resized to 300x300
    batch_size=128,
    class_mode='binary') # Since we use binary_crossentropy loss, we need binary labels

# Flow validation images in batches of 32 using validation_datagen generator
validation_generator = validation_datagen.flow_from_directory(
    './validation-horse-or-human/', # The source directory for validation images
    target_size=(300, 300), # All images will be resized to 300x300
    batch_size=32,
    class_mode='binary')
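
To build intuition for two of the augmentation options, the NumPy sketch below mimics `horizontal_flip` and a horizontal shift with `fill_mode='nearest'` on a tiny array. It only illustrates the idea and is not how `ImageDataGenerator` is implemented. `flip_horizontal` and `shift_right_nearest` are hypothetical helper names.

```python
import numpy as np

def flip_horizontal(image):
    """Mirror an (H, W) or (H, W, C) array left to right."""
    return image[:, ::-1]

def shift_right_nearest(image, pixels):
    """Shift right by `pixels` columns, filling the gap by repeating the nearest edge column."""
    if pixels <= 0:
        return image.copy()
    pad_width = [(0, 0)] * image.ndim
    pad_width[1] = (pixels, 0)             # pad only on the left of the width axis
    padded = np.pad(image, pad_width, mode="edge")
    return padded[:, :image.shape[1]]      # crop back to the original width

row = np.array([[1, 2, 3, 4]])
print(flip_horizontal(row))         # [[4 3 2 1]]
print(shift_right_nearest(row, 2))  # [[1 1 1 2]]
```

The repeated `1` values in the shifted row show the effect of nearest-pixel filling: the area uncovered by the shift is painted with the closest original pixel rather than left blank.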

In [None]:
# Constant for epochs
EPOCHS = 20

# Train the model
history = model.fit(train_generator,
                    steps_per_epoch = 8,
                    epochs = EPOCHS,
                    verbose = 1,
                    validation_data = validation_generator,
                    validation_steps = 8)
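
The step counts passed to `model.fit` relate to the dataset size and the batch size: one full pass over the data takes roughly the number of images divided by the batch size. A minimal sketch of that arithmetic follows. The image counts are assumptions for illustration (use the totals that `flow_from_directory` prints for your own copy of the data), and `steps_for` is a hypothetical helper.

```python
import math

def steps_for(num_images, batch_size, drop_partial=False):
    """Number of batches needed to see `num_images` once."""
    if drop_partial:
        return num_images // batch_size          # skip the final, smaller batch
    return math.ceil(num_images / batch_size)    # include the final, smaller batch

# Assumed counts, for illustration only
train_images, val_images = 1027, 256

print(steps_for(train_images, 128))                     # 9
print(steps_for(train_images, 128, drop_partial=True))  # 8
print(steps_for(val_images, 32))                        # 8
```

With these assumed counts, `steps_per_epoch = 8` covers 1,024 images per epoch, which is slightly less than one full pass over the training set.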

In [None]:
helper.plot_history_curves(history)

As the results show, the augmentation techniques applied here did not help much. The validation accuracy fluctuates instead of trending upward like the training accuracy. One possible reason is that the generated training data still does not represent the features found in the validation data. For example, some human or horse poses in the validation set cannot be mimicked by the image processing techniques that `ImageDataGenerator` provides. It might also be that the model has learned the backgrounds of the training images, so the white background of the validation set throws it off even with cropping.

Try looking at the validation images in the `./validation-horse-or-human` directory (note: if you are using Colab, you can use the file explorer on the left to explore the images) and see if you can augment the training images to match their characteristics. If this is not possible, you can consider other techniques, which you will see in next week's lessons.
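
One way to probe the background hypothesis is to measure how bright the border of an image is, since a white background should give border pixels close to 1.0 after rescaling. The sketch below works on any NumPy array with values in [0, 1]. `border_brightness` is a hypothetical helper, the arrays are synthetic stand-ins, and loading real images from the validation directory is left out.

```python
import numpy as np

def border_brightness(image, width=10):
    """Mean intensity of the `width`-pixel frame around an (H, W) or (H, W, C) image."""
    mask = np.ones(image.shape[:2], dtype=bool)
    mask[width:-width, width:-width] = False   # exclude the interior
    return float(image[mask].mean())

# Synthetic stand-ins: a white background with a dark subject, and a mid-gray scene
white_bg = np.ones((300, 300, 3))
white_bg[100:200, 100:200] = 0.2
gray_scene = np.full((300, 300, 3), 0.5)

print(border_brightness(white_bg))    # 1.0
print(border_brightness(gray_scene))  # 0.5
```

Comparing this value across a handful of training and validation images would give a rough indication of whether the two sets differ in background, though it says nothing about pose or other features.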