# Dog or Cat (Image classification)

## Contents

### [1. Introduction](#intro)

### [2. Data Preparation](#data)
   * **Import the required libraries**
   * **Download and unzip the dataset**
   * **Load the data**
   * **Split the data**

### [3. Model Architecture](#cnn)
   * **Define the model**
   * **Set hyperparameters**
   * **Set optimizer** 
   * **Compile model**
   * **Data Augmentation**
   * **Train model**

### [4. Model Evaluation](#eval)
   * **Training Accuracy vs Validation Accuracy**
   * **Training Loss vs Validation Loss**
   * **Model Accuracy**

### [5. Test the model](#test)
   * **Load the test image**
   * **Predict**

<a id="intro"></a>
### 1. Introduction

#### About the dataset
The dataset consists of photos of dogs and cats, a subset of a much larger collection of 3 million manually annotated photos. It was developed through a partnership between Petfinder.com and Microsoft.
The dataset was originally used as a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart): a task that humans are believed to find trivial but that a machine cannot solve, used on websites to distinguish human users from bots. Specifically, the task was called “ASIRRA” (Animal Species Image Recognition for Restricting Access).<br>
The dataset used in this notebook comprises 12500 images of cats and 12500 images of dogs, 25000 images in total.

#### Problem statement
Given a set of images of dogs and cats, the challenge is to investigate the available data, develop an algorithm that classifies each image as containing either a dog or a cat, and build a robust test harness for estimating the model's performance and exploring improvements. Finally, the saved model is loaded and used to make a prediction on a single image. A final model should typically be fit on all available data, such as the combined train and test datasets.


<a id="data"></a>
### 2. Data Preparation

#### Import the required libraries

In [1]:
print("[INFO] Importing libraries ... ")
import os
import zipfile
import random
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from shutil import copyfile
print("[INFO] Import complete!")

[INFO] Importing libraries ... 
[INFO] Import complete!


#### Download and unzip the dataset

In [2]:
local_zip = os.getcwd()+'/cats-and-dogs.zip' # Path to the local zip file.
image_dir = os.getcwd()+'/images' # Make a directory to store the extracted images.

# Check whether the zip file exists; if not, download it.
if not os.path.exists(local_zip):
    print("[INFO] Downloading dataset ... ")
    !wget --no-check-certificate \
        "https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip" \
        -O "cats-and-dogs.zip"
    print("[INFO] Download complete!")

else: 
    print("[INFO] Dataset already downloaded!")
# Unzip. The context manager closes the archive even if extraction fails.
print("[INFO] Unzipping dataset ... ")
with zipfile.ZipFile(local_zip, 'r') as zip_ref:
    zip_ref.extractall(image_dir)
print("[INFO] Unzipping complete!")

[INFO] Dataset already downloaded!
[INFO] Unzipping dataset ... 
[INFO] Unzipping complete!
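
The output above shows the archive being extracted again even though the dataset was already downloaded. A minimal sketch of one way to skip repeated extraction follows. The helper name `extract_once` and its `marker` parameter are hypothetical and not part of the original notebook; it assumes that the presence of a `PetImages` directory means extraction already happened. The demonstration uses a throwaway archive rather than the real dataset.

```python
import os
import tempfile
import zipfile


def extract_once(zip_path, target_dir, marker="PetImages"):
    """Extract zip_path into target_dir unless `marker` already exists there.

    Returns True if extraction ran and False if it was skipped.
    `marker` is an assumed sentinel directory, not something the archive guarantees.
    """
    if os.path.isdir(os.path.join(target_dir, marker)):
        return False
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target_dir)
    return True


# Self-contained demonstration on a tiny temporary archive.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("PetImages/Cat/0.jpg", b"not a real image")

    out_dir = os.path.join(tmp, "images")
    first_run = extract_once(demo_zip, out_dir)   # extracts
    second_run = extract_once(demo_zip, out_dir)  # skipped, marker exists
    print(first_run, second_run)
```

In the notebook this could be called as `extract_once(local_zip, image_dir)`, though a partially extracted archive would also be treated as complete under this simple check.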


In [3]:
# Verify download and extraction.
print(os.listdir(image_dir))
print(len(os.listdir(image_dir+'/PetImages/Cat')))
print(len(os.listdir(image_dir+'/PetImages/Dog')))
# The image_dir should contain the following files/directories: PetImages, readme[1].txt, MSR-LA - 3467.docx
# Each of the two class directories should list 12501 entries.

['PetImages', 'testing', 'readme[1].txt', 'MSR-LA - 3467.docx', 'training']
12501
12501
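
The listing reports 12501 entries per class, while the introduction cites 12500 images per class. One way to investigate the difference is to break the directory contents down by file extension. The sketch below is illustrative only: `count_by_extension` is a hypothetical helper, and the demonstration runs on temporary files, so it makes no claim about what the real directories contain.

```python
import os
import tempfile
from collections import Counter


def count_by_extension(directory):
    """Return a Counter mapping lowercase file extension to number of entries."""
    counts = Counter()
    for name in os.listdir(directory):
        extension = os.path.splitext(name)[1].lower()
        counts[extension] += 1
    return counts


# Demonstration on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0.jpg", "1.JPG", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    demo_counts = count_by_extension(tmp)
    print(dict(demo_counts))
```

On the dataset it could be called as `count_by_extension(image_dir + '/PetImages/Cat')`; any extension other than `.jpg` in the result would point to non-image entries.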


#### Load the data

In [4]:
# Make training and testing directories both for cat images and dog images.
# exist_ok=True lets the cell be re-run safely without silencing unrelated errors.
for subdir in ('training/cats', 'training/dogs', 'testing/cats', 'testing/dogs'):
    os.makedirs(os.path.join(image_dir, subdir), exist_ok=True)

# Assign variables with path names to respective directories.
# Every path ends with '/' so that a file name can be appended to it directly.
cat_source = image_dir+'/PetImages/Cat/'
cat_train = image_dir+'/training/cats/'
cat_test = image_dir+'/testing/cats/'
dog_source = image_dir+'/PetImages/Dog/'
dog_train = image_dir+'/training/dogs/'
dog_test = image_dir+'/testing/dogs/'

#### Split the data

In [5]:
# Function to split the dataset.
def split_data(SOURCE, TRAINING, TESTING, SPLIT_SIZE):
    # Keep only non-empty files; zero-length images cannot be decoded later.
    files = []
    for filename in os.listdir(SOURCE):
        file = os.path.join(SOURCE, filename)
        if os.path.getsize(file) > 0:
            files.append(filename)
        else:
            print("[INFO] "+filename + " is of zero length, so ignoring.")

    training_length = int(len(files) * SPLIT_SIZE)
    shuffled_set = random.sample(files, len(files)) # Shuffle before splitting.
    training_set = shuffled_set[:training_length]
    testing_set = shuffled_set[training_length:] # Everything not used for training.

    # os.path.join supplies the path separator that plain concatenation would drop.
    for filename in training_set:
        copyfile(os.path.join(SOURCE, filename), os.path.join(TRAINING, filename))

    for filename in testing_set:
        copyfile(os.path.join(SOURCE, filename), os.path.join(TESTING, filename))

split_size = 0.9 # 90% train set, 10% test set.
print("[INFO] Splitting dataset ...")
split_data(cat_source, cat_train, cat_test, split_size)
split_data(dog_source, dog_train, dog_test, split_size)
print("[INFO] Splitting complete!")

[INFO] Splitting dataset ...
[INFO] 666.jpg is of zero length, so ignoring.
[INFO] 11702.jpg is of zero length, so ignoring.
[INFO] Splitting complete!
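
The split arithmetic can be checked without touching the file system. The sketch below separates the shuffle-and-slice step into a pure function and adds a `seed` argument so the split is reproducible. Both `split_names` and `seed` are hypothetical additions, not part of the notebook, and the figure of 12500 file names is an illustrative assumption.

```python
import random


def split_names(names, split_size, seed=None):
    """Shuffle `names` and return (training, testing) lists.

    `seed` is an assumed extra parameter: the same seed yields the same split.
    """
    rng = random.Random(seed)
    shuffled = rng.sample(names, len(names))
    training_length = int(len(shuffled) * split_size)
    return shuffled[:training_length], shuffled[training_length:]


# Illustrative input: 12500 made-up file names.
names = [f"{i}.jpg" for i in range(12500)]
train_a, test_a = split_names(names, 0.9, seed=42)
train_b, test_b = split_names(names, 0.9, seed=42)

print(len(train_a), len(test_a))          # 11250 1250
print(train_a == train_b)                 # True: same seed, same split
print(set(train_a).isdisjoint(test_a))    # True: no file lands in both sets
```

Slicing the test set as `shuffled[training_length:]` also behaves correctly when `split_size` is 1.0, where a negative-index slice would return the whole list.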


<a id="cnn"></a>
### 3. Model Architecture

#### Define the model

#### Set hyperparameters

#### Set optimizer 

#### Compile model

#### Data Augmentation

#### Train model

<a id="eval"></a>
### 4. Model Evaluation

#### Training Accuracy vs Validation Accuracy

#### Training Loss vs Validation Loss

#### Model Accuracy

<a id="test"></a>
### 5. Test the model

#### Load the test image

#### Predict