# Dog vs Cat Image Detection
Learning how to train a convolutional neural network (CNN) to differentiate between images of dogs and cats.

Built with the help of this article: https://msalamiitd.medium.com/how-to-pass-image-datasets-to-cnn-models-using-image-data-generations-b2d9497c7a35



Dataset from Kaggle

## Importing necessary dependencies

In [1]:
!pip install split-folders



In [2]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import splitfolders
import os



### Rule of thumb for splitting data:
- training and test => 80 to 20 split
- training and validation => 80 to 20 split over training data from the previous split
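Chaining the two 80/20 splits gives the three ratios passed to `splitfolders.ratio()` below. Here is a small sketch of that arithmetic. It uses exact fractions so floating-point rounding does not creep in. It assumes the 25,000 Kaggle images are split evenly, 12,500 per class, which the counts further down confirm. `nested_split` is a hypothetical helper.

```python
from fractions import Fraction

def nested_split(test_frac=Fraction(1, 5), val_frac_of_train=Fraction(1, 5)):
    """Combine the two 80/20 splits into (train, val, test) fractions of the whole dataset."""
    train_pool = 1 - test_frac                     # 0.8 of the data remains after holding out test
    train = train_pool * (1 - val_frac_of_train)   # 0.8 * 0.8 = 0.64
    val = train_pool * val_frac_of_train           # 0.8 * 0.2 = 0.16
    return train, val, test_frac

ratios = nested_split()
print([float(r) for r in ratios])  # [0.64, 0.16, 0.2]

# Assumption: 25,000 images split evenly between the two classes -> 12,500 per class
per_class = [int(r * 12_500) for r in ratios]
print(per_class)  # [8000, 2000, 2500]
```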

In [3]:
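# split each class folder under train/ (train/cat, train/dog) into data/train, data/val and data/test
splitfolders.ratio('train', output='data', seed=1337, ratio=(0.64, 0.16, 0.20))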

Copying files: 25000 files [00:28, 873.44 files/s] 


In [4]:
print('train, dog: ' + str(len(os.listdir('data/train/dog'))))
print('train, cat: ' + str(len(os.listdir('data/train/cat'))))

train, dog: 8000
train, cat: 8000


In [5]:
print('val, dog: ' + str(len(os.listdir('data/val/dog'))))
print('val, cat: ' + str(len(os.listdir('data/val/cat'))))

val, dog: 2000
val, cat: 2000


In [6]:
print('test, dog: ' + str(len(os.listdir('data/test/dog'))))
print('test, cat: ' + str(len(os.listdir('data/test/cat'))))

test, dog: 2500
test, cat: 2500


In [7]:
# fraction of each class's 12,500 images (25,000 / 2) that ended up in each split
print(8000/(25000/2))
print(2000/(25000/2))
print(2500/(25000/2))

0.64
0.16
0.2
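The three counting cells and the ratio check above follow the same pattern, so they could be folded into a loop. This sketch assumes the `data/<split>/<class>` layout that `splitfolders` produced. `split_counts` and `split_fractions` are hypothetical helpers, not part of any library.

```python
import os

def split_counts(root="data", splits=("train", "val", "test"), classes=("dog", "cat")):
    """Count the files in root/<split>/<class> for every split and class."""
    return {
        (split, cls): len(os.listdir(os.path.join(root, split, cls)))
        for split in splits
        for cls in classes
    }

def split_fractions(counts):
    """Fraction of each class's images that ended up in each split."""
    totals = {}
    for (split, cls), n in counts.items():
        totals[cls] = totals.get(cls, 0) + n
    return {(split, cls): n / totals[cls] for (split, cls), n in counts.items()}

# Usage in the notebook: split_fractions(split_counts())
# -> {('train', 'dog'): 0.64, ('train', 'cat'): 0.64, ('val', 'dog'): 0.16, ...}
```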


## Transforming the data and creating the generators

In [8]:
# rescale pixel values from [0, 255] to [0, 1]; inputs on this small, consistent scale typically help training converge faster
train_datagen = ImageDataGenerator(rescale=1./255)

val_datagen = ImageDataGenerator(rescale=1./255)

test_datagen = ImageDataGenerator(rescale=1./255)
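The `rescale` argument multiplies every pixel value by the given factor. This NumPy sketch shows the effect on a few 8-bit intensities. It illustrates only the arithmetic, not how Keras applies the factor internally.

```python
import numpy as np

# A few 8-bit pixel intensities, the range images are typically stored in on disk
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# rescale=1./255 multiplies each pixel by 1/255, mapping [0, 255] onto [0, 1]
scaled = pixels.astype(np.float32) * (1.0 / 255)
print(scaled)
```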

In [9]:
# load images from directories and apply transformations specified by data generators
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary'
)

val_generator = val_datagen.flow_from_directory(
    'data/val',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary'
)

test_generator = test_datagen.flow_from_directory(
    'data/test',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    shuffle=False  # keep file order fixed so predictions line up with test_generator.classes
)

Found 16000 images belonging to 2 classes.
Found 4000 images belonging to 2 classes.
Found 5000 images belonging to 2 classes.
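`flow_from_directory` infers the labels from the subdirectory names. When no `classes` argument is given, the names are sorted alphanumerically and numbered from 0, and the resulting mapping is available as `train_generator.class_indices`. The number of batches in one pass is the image count divided by `batch_size`, rounded up. This matches the 500 training steps and 157 test steps in the logs further down. Below is a plain-Python sketch of both rules. The helper names are hypothetical.

```python
import math

def class_indices(subdirs):
    """Mimic the label rule: class subdirectory names sorted alphanumerically, numbered from 0."""
    return {name: i for i, name in enumerate(sorted(subdirs))}

def steps_per_epoch(num_images, batch_size=32):
    """Batches per pass over the data; the last batch may be smaller than batch_size."""
    return math.ceil(num_images / batch_size)

print(class_indices(["dog", "cat"]))  # {'cat': 0, 'dog': 1}
print({name: steps_per_epoch(n) for name, n in [("train", 16000), ("val", 4000), ("test", 5000)]})
# {'train': 500, 'val': 125, 'test': 157}
```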


## Training the Convolutional Neural Network

In [23]:
from tensorflow.keras import layers, models

In [25]:
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

The layers above extract features from each image and then turn those features into a prediction:
- The convolutional layer (Conv2D) slides 32 learned 3×3 filters over the image to extract local features such as edges and textures
- The max pooling layer downsamples each feature map by keeping the largest value in every 2×2 window. This halves the height and width, reducing the dimensionality of the data while keeping the strongest responses
- The Flatten layer reshapes the 3D feature maps into a 1D vector, the format the fully connected (Dense) layers expect
- The Dense layers map the extracted features to a classification. The sigmoid activation on the single output unit gives a probability between 0 and 1. Because flow_from_directory numbers the classes alphabetically (cat = 0, dog = 1), this is the probability that the image shows a dog
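To make the pooling step concrete, here is a NumPy sketch of 2×2 max pooling with stride 2 on an (H, W, C) feature map. This is the operation `MaxPooling2D(pool_size=(2, 2))` applies with its default 'valid' padding. `max_pool_2x2` is a hypothetical helper, not Keras code.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W, C) array.
    An odd trailing row/column is dropped, as with 'valid' padding."""
    h, w, c = x.shape
    h2, w2 = h // 2, w // 2
    x = x[: h2 * 2, : w2 * 2, :]
    # group pixels into 2x2 windows, then take the max inside each window
    return x.reshape(h2, 2, w2, 2, c).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4, 1)
print(max_pool_2x2(fmap)[:, :, 0])  # [[ 5.  7.] [13. 15.]]

# The Conv2D output in this model is 222x222x32; pooling halves height and width
print(max_pool_2x2(np.zeros((222, 222, 32))).shape)  # (111, 111, 32)
```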

In [26]:
model.summary()
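The summary output is not shown in this export, but the parameter counts can be worked out by hand. This sketch assumes a 224x224x3 input and Keras' default 'valid' padding. The dictionary keys are descriptive labels, not the names Keras assigns.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # one weight per kernel position per input channel, plus one bias, for each filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    # full weight matrix plus one bias per unit
    return (in_features + 1) * units

conv_out = 224 - 3 + 1            # 222: a 3x3 kernel with 'valid' padding trims one pixel per side
pool_out = conv_out // 2          # 111 after 2x2 max pooling
flat = pool_out * pool_out * 32   # 394,272 features after Flatten

layer_params = {
    "Conv2D": conv2d_params(3, 3, 3, 32),   # 896
    "MaxPooling2D": 0,
    "Flatten": 0,
    "Dense(64)": dense_params(flat, 64),    # 25,233,472
    "Dense(1)": dense_params(64, 1),        # 65
}
total = sum(layer_params.values())
print(f"{total:,}")  # 25,234,433
```

Nearly all of the roughly 25 million parameters sit in the first Dense layer, because Flatten passes it almost 400,000 features. This is one reason deeper CNNs usually stack several conv/pool blocks before flattening.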

In [27]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
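The `binary_crossentropy` loss averages -[y·log(p) + (1-y)·log(1-p)] over a batch. Here y is the 0/1 label and p is the sigmoid output. Below is a NumPy sketch of that formula. The `eps` clip is a common safeguard against log(0), and `binary_crossentropy` here is a hypothetical stand-in, not the Keras function.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to [eps, 1 - eps] so log(0) never occurs."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1 - eps)
    per_example = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(per_example.mean())

labels = [1, 0, 1, 0]                   # 1 = dog, 0 = cat (alphabetical class order)
confident_right = [0.9, 0.1, 0.8, 0.2]
confident_wrong = [0.1, 0.9, 0.2, 0.8]
print(binary_crossentropy(labels, confident_right))  # small loss
print(binary_crossentropy(labels, confident_wrong))  # much larger loss
```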

In [30]:
model.fit(
    train_generator,
    epochs=10,
    validation_data=val_generator,
    verbose=1
)

Epoch 1/10
274/500 ━━━━━━━━━━━━━━━━━━━━ 1:58 525ms/step - accuracy: 0.9906 - loss: 0.0423

KeyboardInterrupt: 

## Testing the model

In [31]:
test_loss, test_accuracy = model.evaluate(test_generator)
print(f'Test accuracy: {test_accuracy}')

157/157 ━━━━━━━━━━━━━━━━━━━━ 22s 139ms/step - accuracy: 0.7090 - loss: 1.1450
Test accuracy: 0.7038000226020813
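With a binary cross-entropy loss, the `'accuracy'` metric counts a prediction as correct when the sigmoid output, thresholded at 0.5, matches the 0/1 label. If the test generator is built with `shuffle=False`, the same figure could in principle be recomputed from `model.predict(test_generator)` and `test_generator.classes`. The sketch below covers only that comparison step. Made-up probabilities stand in for model output, and `binary_accuracy` is a hypothetical helper.

```python
import numpy as np

def binary_accuracy(y_true, probs, threshold=0.5):
    """Share of examples where (probability > threshold) matches the 0/1 label."""
    preds = (np.asarray(probs) > threshold).astype(int)
    return float(np.mean(preds == np.asarray(y_true)))

# Illustrative values only, not output from the model above
y_true = np.array([0, 0, 1, 1, 1])            # 0 = cat, 1 = dog
probs = np.array([0.2, 0.7, 0.9, 0.4, 0.6])
print(binary_accuracy(y_true, probs))         # 0.6
```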


## Notes and Takeaways
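- The ImageDataGenerator class makes it efficient to feed image datasets to CNN models, because images are loaded and preprocessed in batches rather than all at once
    - It can also augment the data with random transformations, which helps the model generalize better to unseen data and reduces the risk of overfitting
- flow_from_directory() reads images directly from a directory and applies the generator's transformations (including any augmentation) to them
    - It requires a hierarchical directory structure with one subdirectory per class (e.g. data/train/dog and data/train/cat); the class labels are inferred from the subdirectory names
- Additional augmentation is possible for the training data, but it was skipped here because of the large number of images already available
- The validation set is evaluated during training but never used to update the model's weights. It therefore gives a less biased estimate of performance than the training metrics and guides choices such as the number of epochs and other hyperparameters, while the test set is held back for the final evaluation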
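ImageDataGenerator exposes augmentation through constructor arguments such as `horizontal_flip=True`, `rotation_range` and `zoom_range`. These would normally go only on `train_datagen`. As a rough illustration of what one of them does, here is a NumPy sketch of a random horizontal flip on an (N, H, W, C) batch. It is not Keras' implementation, and `random_horizontal_flip` is a hypothetical helper.

```python
import numpy as np

def random_horizontal_flip(batch, rng, p=0.5):
    """Mirror each image in an (N, H, W, C) batch left-to-right with probability p."""
    flip = rng.random(len(batch)) < p      # one coin toss per image
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1, :]   # reverse the width axis of the selected images
    return out, flip

rng = np.random.default_rng(seed=0)
batch = np.arange(2 * 1 * 3 * 1).reshape(2, 1, 3, 1)  # two tiny 1x3 single-channel "images"
flipped, mask = random_horizontal_flip(batch, rng)
print(mask)
print(flipped[..., 0])
```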

This project was a good introduction to techniques used in computer vision and training neural networks on more complex data. The ImageDataGenerator class significantly eased the entire process.