# Cats vs Dogs

### 1. Introduction & Objectives

In this project, we will build a Convolutional Neural Network (CNN) to classify images of cats and dogs. The goal is to develop a model that can accurately differentiate between these two types of animals. The dataset we will be using contains 25,000 images in total, with 12,500 images of cats and 12,500 images of dogs. We will divide a portion of the images into training, validation, and test sets, so that the model can be tuned on one set of images and then evaluated on images it has not seen.

We will use Keras, with TensorFlow as the backend, to implement and train the CNN. After training, we will evaluate the model's performance on the held-out test set to measure how well it generalizes to unseen images.

### 2. Data Understanding

The dataset consists of 25,000 images, with an equal split between cats and dogs (12,500 images of each). For model training and evaluation, we will use only 5,000 of these images, divided into the following subsets:

- Training set: 1,000 images of cats and 1,000 images of dogs.
- Validation set: 500 images of cats and 500 images of dogs.
- Test set: 1,000 images of cats and 1,000 images of dogs.

We will use a Python script to create these subsets from the original dataset, as shown below:

In [9]:
import os
import pathlib
import shutil

original_dir = pathlib.Path("Inputs/dogs-vs-cats/train")
new_base_dir = pathlib.Path("Inputs/cats_vs_dogs_small")


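def make_subset(subset_name, start_index, end_index):
    """Copy images start_index..end_index-1 of each category into a subset folder."""
    for category in ("cat", "dog"):
        # Named subset_dir so that the built-in dir() is not shadowed.
        subset_dir = new_base_dir / subset_name / category
        # exist_ok=True lets the cell be re-run without raising FileExistsError.
        os.makedirs(subset_dir, exist_ok=True)
        # Source files follow the pattern cat.<i>.jpg / dog.<i>.jpg.
        fnames = [f"{category}.{i}.jpg"
                  for i in range(start_index, end_index)]
        for fname in fnames:
            shutil.copyfile(src=original_dir / fname,
                            dst=subset_dir / fname)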


make_subset("train", start_index=0, end_index=1000)
make_subset("validation", start_index=1000, end_index=1500)
make_subset("test", start_index=1500, end_index=2500)

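Each subset now has its own directory with one subfolder per class (`cat` and `dog`). Because the same index range is copied for both classes, every subset contains an equal number of cat and dog images, and because the index ranges do not overlap, no image appears in more than one subset.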
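
Before moving on, it can be worth confirming that the copy produced the counts we expect. The snippet below is a minimal sketch of such a check, not part of the original notebook: `count_images` and `expected_counts` are hypothetical helper names, and the check assumes every image ends in `.jpg` and that the directory layout is `<subset>/<category>/`, as created by `make_subset`.

```python
import pathlib


def count_images(base_dir):
    """Return {(subset, category): number of .jpg files} for a subset tree."""
    base_dir = pathlib.Path(base_dir)
    counts = {}
    for subset_dir in sorted(p for p in base_dir.iterdir() if p.is_dir()):
        for category_dir in sorted(p for p in subset_dir.iterdir() if p.is_dir()):
            n_images = sum(1 for _ in category_dir.glob("*.jpg"))
            counts[(subset_dir.name, category_dir.name)] = n_images
    return counts


# Counts implied by the index ranges passed to make_subset.
expected_ranges = {"train": (0, 1000), "validation": (1000, 1500), "test": (1500, 2500)}
expected_counts = {
    (subset, category): end - start
    for subset, (start, end) in expected_ranges.items()
    for category in ("cat", "dog")
}

# Only run the comparison if the subset directory has actually been created.
base = pathlib.Path("Inputs/cats_vs_dogs_small")
if base.exists():
    print("Counts match:", count_images(base) == expected_counts)
```

If the comparison prints `False`, the most likely causes are a partially completed copy or a source folder whose file names do not follow the `cat.<i>.jpg` / `dog.<i>.jpg` pattern.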

#### 2.1 Importing Required Libraries and Loading the Data

We will start by importing the libraries we will be using and loading the data. We will also resize the images to 180x180 pixels and pack them into batches of 32.

In [10]:
import keras
from keras import layers
from keras.utils import image_dataset_from_directory
import tensorflow as tf

In [11]:
train_dataset = image_dataset_from_directory(new_base_dir / 'train', image_size=(180, 180), batch_size=32)
validation_dataset = image_dataset_from_directory(new_base_dir / 'validation', image_size=(180, 180), batch_size=32)
test_dataset = image_dataset_from_directory(new_base_dir / 'test', image_size=(180, 180), batch_size=32)

Found 2000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 2000 files belonging to 2 classes.


The data has been loaded, resized and packed into batches. We can now proceed to creating the model.
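
To make the batching concrete, the arithmetic below works out how many batches each dataset yields. It is a small sketch under one assumption: the final, partially filled batch is kept rather than dropped, which is our understanding of the default behavior. `batch_layout` is a hypothetical helper name introduced only for this illustration.

```python
import math

BATCH_SIZE = 32
# File counts reported by image_dataset_from_directory above.
subset_sizes = {"train": 2000, "validation": 1000, "test": 2000}


def batch_layout(n_images, batch_size):
    """Return (number of batches, size of the final batch)."""
    n_batches = math.ceil(n_images / batch_size)
    # Every batch except the last one is full.
    last_batch = n_images - (n_batches - 1) * batch_size
    return n_batches, last_batch


for name, n_images in subset_sizes.items():
    n_batches, last_batch = batch_layout(n_images, BATCH_SIZE)
    print(f"{name}: {n_batches} batches, final batch holds {last_batch} images")
```

Since neither 2,000 nor 1,000 is a multiple of 32, the last batch of each dataset is smaller than the others. With the default RGB color mode, a full batch of images should have shape `(32, 180, 180, 3)`, with a matching vector of 32 labels.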

### 3. Creating the Model

We will build a CNN model using Keras to classify the images of cats and dogs. The model will consist of three convolutional layers, each followed by a max-pooling layer. We will also add a dropout layer to reduce overfitting. The final layer will be a dense layer with a single neuron and a sigmoid activation function, which outputs the probability that the image is a dog.
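
The role of the final sigmoid neuron can be illustrated without any deep learning library. The sketch below is plain Python and is not the model itself: `decode_prediction` is a hypothetical helper, the 0.5 decision threshold is an assumption, and the class order relies on `image_dataset_from_directory` assigning label indices to the class folders in alphanumeric order (`cat` = 0, `dog` = 1).

```python
import math

# Assumed label order: folder names sorted alphanumerically.
CLASS_NAMES = ["cat", "dog"]


def sigmoid(x):
    """Numerically stable logistic function, mapping any real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_prediction(pre_activation, threshold=0.5):
    """Turn the final neuron's pre-activation value into (label, P(dog))."""
    p_dog = sigmoid(pre_activation)
    label = CLASS_NAMES[int(p_dog >= threshold)]
    return label, p_dog


for value in (-2.0, 0.3, 4.0):
    label, p_dog = decode_prediction(value)
    print(f"pre-activation {value:+.1f} -> P(dog) = {p_dog:.3f} -> {label}")
```

A single output neuron is enough for a two-class problem because the probability of the other class is simply `1 - P(dog)`.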