# Cats vs. Dogs - CNN Image Classifier

This notebook is my solution to the image classification coding challenge from the [freeCodeCamp "Machine Learning with Python"](https://www.freecodecamp.org/learn/machine-learning-with-python) course.

I solve the assignment using TensorFlow and Keras.

## The Assignment

_"For this challenge, you will complete the code to classify images of dogs and cats. You will use TensorFlow 2.0 and Keras to create a convolutional neural network that correctly classifies images of cats and dogs at least 63% of the time. (Extra credit if you get it to 70% accuracy!)_

_Some of the code is given to you but some code you must fill in to complete this challenge. Read the instruction in each text cell so you will know what you have to do in each code cell._

_The first code cell imports the required libraries. The second code cell downloads the data and sets key variables. The third cell is the first place you will write your own code._

_The structure of the dataset files that are downloaded looks like this (You will notice that the test directory has no subdirectories and the images are not labeled):_

```
cats_and_dogs
|__ train:
    |______ cats: [cat.0.jpg, cat.1.jpg ...]
    |______ dogs: [dog.0.jpg, dog.1.jpg ...]
|__ validation:
    |______ cats: [cat.2000.jpg, cat.2001.jpg ...]
    |______ dogs: [dog.2000.jpg, dog.2001.jpg ...]
|__ test: [1.jpg, 2.jpg ...]
```

_You can tweak epochs and batch size if you like, but it is not required."_

## Building the Datasets

In this section, we download the provided files and build the datasets for training and testing the model. The provided data is already split into train/validation/test folders. The train and validation folders each have subfolders for cats and dogs, the two categories we have to work with, while the test folder holds unlabeled images with no subfolders.
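As a quick sanity check on that folder layout, the sketch below counts the `.jpg` files in every folder under a root directory. The helper name `count_images` is hypothetical, and the tiny stand-in tree (with made-up file counts) is only there so the snippet runs without the real download. Pointing it at `cats_and_dogs` after extraction should work the same way.

```python
import tempfile
from pathlib import Path


def count_images(root):
    """Return {relative_folder: number_of_jpg_files} for every folder under root."""
    root = Path(root)
    counts = {}
    for jpg in root.rglob("*.jpg"):
        folder = jpg.parent.relative_to(root).as_posix()
        counts[folder] = counts.get(folder, 0) + 1
    return dict(sorted(counts.items()))


# Build a tiny stand-in tree that mimics the layout of the real dataset.
# The file counts here are arbitrary and NOT those of the real download.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "cats_and_dogs"
    layout = {
        "train/cats": 3,
        "train/dogs": 3,
        "validation/cats": 2,
        "validation/dogs": 2,
        "test": 1,
    }
    for folder, n_files in layout.items():
        (demo_root / folder).mkdir(parents=True)
        for i in range(n_files):
            (demo_root / folder / f"{i}.jpg").touch()

    demo_counts = count_images(demo_root)

print(demo_counts)
```

On the real data, `count_images("cats_and_dogs")` gives one entry per class subfolder plus a single `test` entry, which makes a missing or misplaced folder easy to spot.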

Since the course is a bit outdated, it suggests working with TensorFlow's __ImageDataGenerator__. That class is deprecated and no longer recommended, so we will use Keras' __image_dataset_from_directory__ instead, which loads the images as a _tf.data.Dataset_.

In [1]:
import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D

import os
import shutil
from zipfile import ZipFile

import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Download project files
!wget https://cdn.freecodecamp.org/project-data/cats-and-dogs/cats_and_dogs.zip

# Extract zip and clean unwanted files
with ZipFile("cats_and_dogs.zip", "r") as zObject:
    zObject.extractall()

# Remove the remaining files we don't need
os.remove("cats_and_dogs.zip")
shutil.rmtree("__MACOSX")
PATH = "cats_and_dogs"

# Save directories for training, validation and test folders.
# Train/validation split is already done in the folder structure.
train_dir = os.path.join(PATH, "train")
validation_dir = os.path.join(PATH, "validation")
test_dir = os.path.join(PATH, "test")

--2024-01-05 17:14:15--  https://cdn.freecodecamp.org/project-data/cats-and-dogs/cats_and_dogs.zip
Resolving cdn.freecodecamp.org (cdn.freecodecamp.org)... 172.67.70.149, 104.26.3.33, 104.26.2.33
Connecting to cdn.freecodecamp.org (cdn.freecodecamp.org)|172.67.70.149|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 70702765 (67M) [application/zip]
Saving to: 'cats_and_dogs.zip'

In [4]:
# Variables for pre-processing and training.
batch_size = 32
epochs = 30
IMG_HEIGHT = 250
IMG_WIDTH = 250

We build three different datasets: one for training the model, one for validating the results of each epoch, and a third with only a few samples, meant for manual testing. The images are packed into three _tf.data.Dataset_ objects, with batching and, for the training and validation sets, labels inferred automatically from the folder structure.
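To make "batching and inferred labels" concrete, here is a minimal sketch that runs the same loader on a throwaway folder of random-noise JPEGs. The folder contents, the `write_random_jpeg` helper, and the batch size of 4 are all assumptions made for illustration; only the `cats`/`dogs` subfolder names and the 250x250 target size mirror this notebook.

```python
import os
import tempfile

import tensorflow as tf


def write_random_jpeg(path, size=64):
    """Write a size x size RGB image of random noise to `path` (hypothetical helper)."""
    pixels = tf.random.uniform((size, size, 3), maxval=256, dtype=tf.int32)
    tf.io.write_file(path, tf.io.encode_jpeg(tf.cast(pixels, tf.uint8)))


with tempfile.TemporaryDirectory() as tmp:
    # One subfolder per class, as in the train/validation folders.
    for class_name in ("cats", "dogs"):
        os.makedirs(os.path.join(tmp, class_name))
        for i in range(3):
            write_random_jpeg(os.path.join(tmp, class_name, f"{class_name}.{i}.jpg"))

    demo_dataset = tf.keras.utils.image_dataset_from_directory(
        directory=tmp,
        labels="inferred",
        label_mode="binary",
        batch_size=4,
        image_size=(250, 250),
        shuffle=False,
    )

    # Class names come from the subfolder names.
    class_names = demo_dataset.class_names

    # Each element of the dataset is one (images, labels) batch.
    images, labels = next(iter(demo_dataset))

print(class_names)
print(images.shape, labels.shape)
```

Each batch is a pair of tensors: the images resized to the requested `image_size`, and one 0/1 label per image when `label_mode="binary"`.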

In [5]:
train_dataset = tf.keras.utils.image_dataset_from_directory(
    directory=train_dir,
    labels="inferred",
    label_mode="binary",
    batch_size=batch_size,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    color_mode="rgb",
    shuffle=True
)

validation_dataset = tf.keras.utils.image_dataset_from_directory(
    directory=validation_dir,
    labels="inferred",
    label_mode="binary",
    batch_size=batch_size,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    color_mode="rgb",
    shuffle=True
)

test_dataset = tf.keras.utils.image_dataset_from_directory(
    directory=test_dir,
    labels=None,
    #batch_size=None,
    color_mode="rgb",
    image_size=(IMG_HEIGHT, IMG_WIDTH)
)

Found 2000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 50 files belonging to 1 classes.
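The file counts above, together with `batch_size = 32`, determine how many batches each dataset yields per pass. The sketch below works through that arithmetic and mimics how class indices follow from the sorted folder names. Both helper names are hypothetical. It assumes the loader keeps the final, smaller batch instead of dropping it, and that the test dataset uses the loader's default batch size of 32.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of batches when the last, partially filled batch is kept."""
    return math.ceil(num_samples / batch_size)


def infer_class_indices(folder_names):
    """Map each class folder to an integer index, in sorted (alphanumeric) order."""
    return {name: index for index, name in enumerate(sorted(folder_names))}


batch_size = 32
file_counts = {"train": 2000, "validation": 1000, "test": 50}

steps = {
    split: batches_per_epoch(count, batch_size)
    for split, count in file_counts.items()
}
class_indices = infer_class_indices(["dogs", "cats"])

print(steps)
print(class_indices)
```

With these numbers, one training epoch takes 63 steps, and `cats` maps to label 0 while `dogs` maps to label 1.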
