# Binary Classification Model for Cat and Dog Classification

This is a binary classification model for the well-known cat and dog classification problem on Kaggle. Several methods are used to improve the validation accuracy and reduce overfitting. Although the dataset is only around 3000 images, the steps shown here carry over to a much larger, more complex CNN. The TensorFlow library is used for preprocessing, training, and validating the CNN. The methods applied are:

- **Data Augmentation**
- **Dropout Layers**
- **Transfer Learning**

Import the libraries to be used. I have downloaded and unzipped the dataset on my local machine. You can download the dataset yourself,
unzip it (for example with Python's zipfile module), and then use it
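
If the dataset is still zipped, the extraction step mentioned above could look roughly like the sketch below. The helper name `extract_dataset` and the archive name in the usage comment are assumptions for illustration and are not part of the original notebook.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir="."):
    """Extract a dataset archive into dest_dir and return the top-level names found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(dest)
    return sorted(p.name for p in dest.iterdir())


# Hypothetical usage (the archive name is an assumption):
# extract_dataset("cats_and_dogs_filtered.zip", ".")
```

After extraction, the folder name returned by the helper is what `base_dir` should point to.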

In [None]:
import os
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import tensorflow as tf

Set base_dir to the dataset's directory path and list the subdirectories for the training and validation datasets

In [None]:
base_dir = 'cats_and_dogs_filtered'

print('Contents of base directory')
print(os.listdir(base_dir))

print('Contents of train directory')
print(os.listdir(f'{base_dir}/train'))

print('Contents of validation directory')
print(os.listdir(f'{base_dir}/validation'))

Separate the paths for the training and validation datasets for both classes

In [None]:
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

You can view the total size of the dataset and view samples of the images for both classes

In [None]:
train_cat_fnames = os.listdir(train_cats_dir)
train_dog_fnames = os.listdir(train_dogs_dir)

print('Total train cat images: ', len(os.listdir(train_cats_dir)))
print('Total train dog images: ', len(os.listdir(train_dogs_dir)))

print('Total validation cat images: ', len(os.listdir(validation_cats_dir)))
print('Total validation dog images: ', len(os.listdir(validation_dogs_dir)))

In [None]:
nrows = 4
ncols = 4

img_index = 0

fig = plt.gcf()
fig.set_size_inches(ncols*4, nrows*4)

img_index+=8

next_cat_img = [os.path.join(train_cats_dir,fname) for fname in train_cat_fnames[img_index-8:img_index]]

next_dog_img = [os.path.join(train_dogs_dir,fname) for fname in train_dog_fnames[img_index-8:img_index]]

for i, img_path in enumerate(next_cat_img+next_dog_img):
    
    sp = plt.subplot(nrows, ncols, i+1)
    sp.axis('Off')

    img = mpimg.imread(img_path)
    plt.imshow(img)

plt.show()

Create the function that creates the sequential model for the CNN

- **ReLU** is used for the hidden layers, while sigmoid suits the output layer of binary problems because it produces a value between 0 and 1
- The size of the input image is defined by the **input_shape** parameter
- The number of filters, the window sizes, and the number of layers are specified. They can be changed according to the project's specifications

In [None]:
def create_model():
    
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150,150,3)),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    
    return model

In [None]:
model = create_model()

model.summary()
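
The output shapes that `model.summary()` reports can be checked by hand. The sketch below is a plain-Python calculation, assuming the Keras defaults used in `create_model()` (no padding, stride 1 for the convolutions, and 2x2 non-overlapping pooling). The helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel=3):
    """Width/height after a 'valid' (no padding), stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Width/height after non-overlapping max pooling (floor division)."""
    return size // pool


size = 150
shapes = []
for filters in (16, 32, 64):
    size = conv_out(size)
    shapes.append((size, size, filters))   # after Conv2D
    size = pool_out(size)
    shapes.append((size, size, filters))   # after MaxPooling2D

flattened = size * size * 64
dense_params = flattened * 512 + 512       # weights + biases of Dense(512)

for shape in shapes:
    print(shape)
print("flattened units:", flattened)
print("Dense(512) parameters:", dense_params)
```

Most of the parameters sit in the first dense layer, which is one reason a small dataset can overfit this architecture quickly.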

Compile the model with the optimizer, loss function, and evaluation metric chosen for the project. **Binary
crossentropy** is the standard loss for binary problems with a sigmoid output. The evaluation metric is **accuracy**

In [None]:
from tensorflow.keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(learning_rate=0.005),
              loss='binary_crossentropy',
              metrics=['accuracy'])
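
For intuition about the loss passed to `compile`, here is a minimal plain-Python sketch of binary cross-entropy for a single prediction. It mirrors the textbook formula rather than the Keras internals, and the clipping constant `eps` is an assumption chosen for the illustration.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one example: -[y*log(p) + (1-y)*log(1-p)]."""
    p = min(max(y_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# The true label is 1 in both cases; only the predicted probability differs.
confident_right = binary_crossentropy(1, 0.9)
confident_wrong = binary_crossentropy(1, 0.1)
print(round(confident_right, 4), round(confident_wrong, 4))
```

A confident wrong prediction is penalized far more heavily than a confident correct one, which is what pushes the sigmoid output towards the right class.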

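The **ImageDataGenerator** is used to preprocess the dataset before it is fed to the CNN. Neural networks generally train better on normalized data because
small, similarly scaled inputs keep the calculations stable. Pixel values of 8-bit images lie in [0, 255], so multiplying by 1/255 rescales them to [0, 1]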

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1.0/255.0)
test_datagen = ImageDataGenerator(rescale=1.0/255.0)

train_generator = train_datagen.flow_from_directory(train_dir,
                                                    batch_size=20,
                                                    class_mode='binary',
                                                    target_size=(150,150))

validation_generator = test_datagen.flow_from_directory(validation_dir,
                                                        batch_size=20,
                                                        class_mode='binary',
                                                        target_size=(150,150))

## Training the model

Train the model on the datasets preprocessed by the ImageDataGenerator. The number of epochs is specified and can be tuned as needed. The results are stored for visualization

In [None]:
EPOCHS = 15

history = model.fit(train_generator,
                    epochs=EPOCHS,
                    validation_data=validation_generator,
                    verbose=2
                   )
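
One epoch is one full pass over the generator, so the number of batches per epoch follows from the dataset size and the batch size. The sketch below shows the arithmetic. The image counts are placeholders, so substitute the totals printed earlier.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once (the last batch may be smaller)."""
    return math.ceil(num_images / batch_size)


# Placeholder counts for illustration; use the totals printed for your dataset.
print(steps_per_epoch(2000, 20))
print(steps_per_epoch(1010, 20))
```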

Visualization of the model's results

In [None]:
def plot_acc_loss(history):
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']

    epochs = range(len(acc))

    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    ax[0].plot(epochs, acc, 'bo', label='Training_Accuracy')
    ax[0].plot(epochs, val_acc, 'b', label='Validation_Accuracy')
    ax[0].set_title('Training and Validation Accuracy')
    ax[0].set_xlabel('epochs')
    ax[0].set_ylabel('accuracy')
    ax[0].legend()

    ax[1].plot(epochs, loss, 'bo', label='Training_Loss')
    ax[1].plot(epochs, val_loss, 'b', label='Validation_Loss')
    ax[1].set_title('Training and Validation Loss')
    ax[1].set_xlabel('epochs')
    ax[1].set_ylabel('loss')
    ax[1].legend()

    plt.show()

In [None]:
plot_acc_loss(history)

## **Data Augmentation**

**Data Augmentation** introduces more variety into the training data, which helps the model reduce overfitting. Augmentation by itself is not what counters overfitting; the diversity of the inputs is. If the validation curve fluctuates, the validation dataset may be too small or poorly designed: augmentation adds randomness to training, and when the validation dataset lacks comparable variety, its curve can fluctuate.
**Remedy: use a broad set of images in both the training and validation datasets**

In [None]:
train_dataset = tf.keras.utils.image_dataset_from_directory(
    train_dir,
    image_size=(150,150),
    batch_size=20,
    label_mode='binary')

validation_dataset = tf.keras.utils.image_dataset_from_directory(
    validation_dir,
    image_size=(150,150),
    batch_size=20,
    label_mode='binary')

SHUFFLE_BUFFER_SIZE = 1000
PREFETCH_BUFFER_SIZE = tf.data.AUTOTUNE

train_dataset_final = (train_dataset
                       .cache()
                       .shuffle(SHUFFLE_BUFFER_SIZE)
                       .prefetch(buffer_size=PREFETCH_BUFFER_SIZE))

validation_dataset_final = (validation_dataset
                       .cache()
                       .shuffle(SHUFFLE_BUFFER_SIZE)
                       .prefetch(buffer_size=PREFETCH_BUFFER_SIZE))

In [None]:
data_augmentation = tf.keras.Sequential([
    tf.keras.Input(shape=(150,150,3)),
    tf.keras.layers.RandomFlip('horizontal'),
    # The factor is a fraction of a full turn (2*pi) that limits the rotation in either direction
    # fill_mode controls how pixels exposed by the operation are filled; it is left at the Keras default ('reflect') here, pass fill_mode='nearest' to repeat edge pixels
    tf.keras.layers.RandomRotation(0.4),
    # Center of the image is translated by a fraction horizontally and vertically
    tf.keras.layers.RandomTranslation(0.2, 0.2),
    # Modify the contrast of the images
    tf.keras.layers.RandomContrast(0.4),
    # Random zooming by up to the given factor
    tf.keras.layers.RandomZoom(0.2)
    ])
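
To make two of these layers concrete, the sketch below reproduces their effect in plain Python on a tiny nested-list "image". It illustrates the idea only and is not how Keras implements the layers. The helper names are hypothetical, and the 50% flip probability is an assumption of this sketch.

```python
import random


def rotation_range_degrees(factor):
    """A RandomRotation factor is a fraction of a full turn, applied in both directions."""
    return (-factor * 360.0, factor * 360.0)


def random_horizontal_flip(image, rng):
    """Mirror every row left-to-right with probability 0.5 (assumed for this sketch)."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image


image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)
variants = [random_horizontal_flip(image, rng) for _ in range(4)]

print(rotation_range_degrees(0.4))
for variant in variants:
    print(variant)
```

Each epoch therefore sees a slightly different version of every image, which is where the extra input diversity comes from.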

In [None]:
model_without_aug = create_model()

EPOCHS = 15

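# Chain the augmentation layers with a fresh, untrained copy of the base model.
# image_dataset_from_directory yields pixel values in [0, 255], so a Rescaling
# layer restores the [0, 1] range that the ImageDataGenerator provided earlier.
model_with_aug = tf.keras.models.Sequential([
    data_augmentation,
    tf.keras.layers.Rescaling(1.0/255.0),
    model_without_aug
])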

model_with_aug.compile(
    loss='binary_crossentropy',
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    metrics=['accuracy']
)

history = model_with_aug.fit(
    train_dataset_final,
    epochs=EPOCHS,
    validation_data=validation_dataset_final,
    verbose=2)

## **Transfer Learning**

Because the dataset is small (3000 images is still insufficient), features learned by another model are reused as a fast and
efficient way to train our model. This is called **transfer learning**. The method uses the weights of
another model that has been trained extensively on a large amount of data. Instead of retraining those weights on our data, the pre-trained model extracts the features from our images using the **convolutions** it has already learned. Only the dense layers are then retrained on your data

The model used is InceptionV3. Keras already provides the architecture, and the weights without the top layer can be downloaded from the following link:
https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
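
If you prefer to fetch the file from Python, a minimal sketch using only the standard library is shown below. The helper name `ensure_weights` is hypothetical. It downloads the file only when it is not already present locally.

```python
import os
import urllib.request

WEIGHTS_URL = ("https://storage.googleapis.com/mledu-datasets/"
               "inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5")


def ensure_weights(local_path, url=WEIGHTS_URL):
    """Download the weights file only if it is not already on disk, then return its path."""
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(url, local_path)
    return local_path


# Hypothetical usage:
# inception_weights = ensure_weights("inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5")
```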

In [None]:
inception_weights = 'inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'

pre_trained_model = tf.keras.applications.inception_v3.InceptionV3(
    input_shape = (150,150,3),
    # The model has a fully connected layer at the top. False is used to specify that you want to ignore this and get straight to
    # the convolutions
    include_top = False,
    weights = None)

pre_trained_model.load_weights(inception_weights)

In [None]:
# The instantiated model will have its layers locked to not be trained
for layer in pre_trained_model.layers:
    layer.trainable = False

In [None]:
pre_trained_model.summary()

Layers can be accessed using the method **get_layer()**. This lets you choose which layer's output to build on, so that the pre-trained model
can be cut at that point and reused for the desired project

In [None]:
last_layer = pre_trained_model.get_layer('mixed7')

last_output = last_layer.output

## Dropout Layers

Neighbouring neurons in a NN can sometimes end up with **similar** weights and influence each other, which can lead to overfitting. To keep neighbours
from affecting each other too much, **dropout layers** randomly disable a fraction of the neurons during training, which can reduce overfitting. When the validation curve diverges from the training curve over time, adding a dropout layer is one remedy to try
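
The mechanism can be sketched in a few lines of plain Python. This is the common "inverted dropout" formulation, written here as an illustration under that assumption and not taken from the Keras source. The function name and the sample activations are made up.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)          # inference: values pass through unchanged
    keep_scale = 1.0 / (1.0 - rate)       # keeps the expected sum of activations the same
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(42)
activations = [0.5, 1.0, 1.5, 2.0, 2.5]
print(dropout(activations, rate=0.2, training=True, rng=rng))
print(dropout(activations, rate=0.2, training=False, rng=rng))
```

Because a different random subset is dropped on every step, no neuron can depend on a particular neighbour always being present.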

In [None]:
# Define a new model that takes the output of the Inception model's 'mixed7' layer

x = tf.keras.layers.Flatten()(last_output)
x = tf.keras.layers.Dense(1024, activation='relu')(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.Model(pre_trained_model.input,x)

model.summary()
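
Since the Inception layers are frozen, the trainable parameters reported by `model.summary()` should come only from the new dense layers. The arithmetic below assumes that `mixed7` outputs a 7x7x768 feature map for 150x150 inputs. That shape is an assumption to confirm against the summary.

```python
# Assumed shape of the 'mixed7' output for 150x150 inputs; confirm with model.summary().
height, width, channels = 7, 7, 768

flattened = height * width * channels
dense_1024 = flattened * 1024 + 1024   # weights + biases of Dense(1024)
dense_1 = 1024 * 1 + 1                 # weights + bias of the sigmoid output unit
trainable = dense_1024 + dense_1       # Flatten and Dropout add no parameters

print(flattened, dense_1024, dense_1, trainable)
```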

Each pre-trained model has its own input requirements. The Inception model expects the input to be preprocessed in a specific way to
work correctly

In [None]:
# It is required to preprocess the data according to what is mentioned in the documentation of the used model. This model requires
# the input to be scaled in the range [-1, 1] with its function preprocess_input()

def preprocess(image, label):
    image = tf.keras.applications.inception_v3.preprocess_input(image)
    return image, label

trained_dataset_scaled = train_dataset.map(preprocess)
validation_dataset_scaled = validation_dataset.map(preprocess)
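
The two scalings used in this notebook differ only in their target range. The sketch below shows the arithmetic on single pixel values, assuming the [-1, 1] scaling is computed as dividing by 127.5 and subtracting 1. The function names are hypothetical.

```python
def scale_to_unit_range(pixel):
    """[0, 255] -> [0, 1], as with rescale=1/255."""
    return pixel / 255.0


def scale_to_signed_range(pixel):
    """[0, 255] -> [-1, 1]: divide by 127.5, then subtract 1."""
    return pixel / 127.5 - 1.0


for pixel in (0, 127.5, 255):
    print(pixel, scale_to_unit_range(pixel), scale_to_signed_range(pixel))
```

Feeding [0, 1] images to a network whose weights were learned on [-1, 1] inputs would shift every activation, which is why the matching preprocessing matters.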

In [None]:
train_dataset_final = (trained_dataset_scaled
                       .cache()
                       .shuffle(SHUFFLE_BUFFER_SIZE)
                       .prefetch(buffer_size=PREFETCH_BUFFER_SIZE))

validation_dataset_final = (validation_dataset_scaled
                       .cache()
                       .prefetch(buffer_size=PREFETCH_BUFFER_SIZE))

In [None]:
inputs = tf.keras.Input(shape=(150, 150, 3))
x = data_augmentation(inputs)
x = model(x)

# Chain the input, the augmentation layers, and the transfer-learning model
final_model = tf.keras.Model(inputs, x)

final_model.summary()

In [None]:
final_model.compile(
    loss='binary_crossentropy',
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
    metrics=['accuracy'])

In [None]:
EPOCHS = 15

history = final_model.fit(
    train_dataset_final,
    validation_data=validation_dataset_final,
    epochs=EPOCHS,
    verbose=2)

In [None]:
plot_acc_loss(history)

When the training and validation curves track each other closely, it is a sign that overfitting is being avoided, which leads to a more accurate model
for binary classification
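
Besides reading the plots, the gap between the two curves can be checked numerically. The sketch below works on a dictionary shaped like `history.history`. The helper name `accuracy_gap` is hypothetical, and the numbers are made up purely for illustration.

```python
def accuracy_gap(history_dict):
    """Per-epoch difference between training and validation accuracy."""
    return [train - val for train, val in
            zip(history_dict["accuracy"], history_dict["val_accuracy"])]


# Made-up numbers for illustration only; pass history.history in practice.
example = {"accuracy": [0.70, 0.85, 0.95],
           "val_accuracy": [0.68, 0.74, 0.75]}
gaps = accuracy_gap(example)
print([round(g, 2) for g in gaps])
```

A gap that keeps widening from epoch to epoch suggests overfitting, while a small, stable gap suggests the curves are in sync.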

# Summary

In the initial model, the training and validation accuracy diverged considerably, which is a sign of overfitting. Several methods were applied to reduce overfitting and produce a good classifier. Without transfer learning, however, the model would need a very large dataset and a large number of epochs to give accurate results for large-scale projects