# Practice Exercise on Convolutional Neural Networks (CNN)

Welcome to the Practice Exercise on Convolutional Neural Networks (CNN). In this exercise, we will focus on an image classification task where the goal is to predict whether an image contains a cat or a dog. We will work with a dataset of labeled images and build, train, and evaluate a CNN model. This practice will allow you to apply your understanding of CNNs to achieve high accuracy in image classification.

---

## Dataset Overview

### **Dataset Name:** Cats and Dogs Image Dataset

### **Description:**  
The dataset contains images of cats and dogs labeled for classification purposes. Each image belongs to one of the two classes: 'Cat' or 'Dog'. The goal is to classify the images correctly based on the content (i.e., whether the image is of a cat or a dog). The dataset is often used to test image classification models.

### **Features:**
The images are organized into two folders, one per class:
- `Cat`: Images labeled as containing a cat.
- `Dog`: Images labeled as containing a dog.

### **Target Variable:**
- The goal is to predict whether an image contains a cat or a dog.


## Data Loading and Preprocessing


We will start by loading the dataset and preprocessing the images. This includes:
- Resizing images to a fixed size.
- Normalizing pixel values.

Add more if needed!
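As a rough illustration of the normalization step, the sketch below scales 8-bit pixel intensities from [0, 255] into [0, 1]. The helper `normalize_pixels` and the toy pixel values are hypothetical and exist only for this example; in Keras the same scaling is usually done with a `Rescaling(1./255)` layer.

```python
def normalize_pixels(pixels, max_value=255.0):
    """Scale raw 8-bit intensities (0..255) into the range [0, 1]."""
    return [[value / max_value for value in row] for row in pixels]

# A tiny 2x3 grayscale "image" with made-up pixel values (assumption for the demo)
tiny_image = [[0, 128, 255], [64, 32, 16]]
normalized = normalize_pixels(tiny_image)

for row in normalized:
    print([round(value, 3) for value in row])
```

Keeping inputs in a small, consistent range tends to make gradient-based training better behaved, which is why this step normally comes before the first convolutional layer.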


In [2]:
# ImageDataGenerator is not a standalone PyPI package, so `pip install ImageDataGenerator` fails.
# It ships with TensorFlow/Keras and is imported from tensorflow.keras.preprocessing.image below.

In [14]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, MaxPooling2D, Dropout, Rescaling
import tensorflow as tf

from tensorflow.keras.utils import image_dataset_from_directory
from tensorflow.keras.preprocessing.image import ImageDataGenerator


import numpy as np

In [4]:
from google.colab import files
import zipfile
import os
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator


In [12]:
# uploaded = files.upload()

In [90]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [91]:
zip_path = '/content/drive/MyDrive/T5_notebooks/Week4/Tasks training for the exam/Dataset/CatsVsDogs.zip'

In [92]:
extract_to = '/content/dataset'

In [93]:
os.makedirs(extract_to, exist_ok=True)

In [94]:
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_to)
    print('Files extracted')

Files extracted


In [13]:
#base_dir = os.path.join(extract_to, 'CatsVsDogs')

In [95]:
base_dir = '/content/dataset/content/PetImages'

## Data Splitting
In this section, we will split our dataset into three parts:

* Training set (70%): This portion of the dataset is used to train the CNN model.
* Validation set (15%): This portion is used to validate the model during training, helping us tune hyperparameters and avoid overfitting.
* Test set (15%): This portion is used to evaluate the model after training, to check its generalization to unseen data.
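The 70/15/15 split can be worked out in batches before touching any data. The sketch below is plain arithmetic, assuming 24,998 usable images and a batch size of 32 as in the cells that follow; the helper name `split_batches` is hypothetical.

```python
import math

def split_batches(num_files, batch_size, train_frac=0.70, val_frac=0.15):
    """Return (train, val, test) batch counts for a fractional split."""
    total = math.ceil(num_files / batch_size)  # the last batch may be partial
    train = int(total * train_frac)
    val = int(total * val_frac)
    test = total - train - val                 # the remainder goes to the test set
    return train, val, test

train_batches, val_batches, test_batches = split_batches(24998, 32)
print(train_batches, val_batches, test_batches)  # 547 117 118
```

Because the fractions are applied to whole batches and rounded down, the test set absorbs the rounding remainder, so the realized proportions are only approximately 70/15/15.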

In [96]:
batch_size = 32
image_size = (64, 64)
seed = 123

In [97]:
train_ds = tf.keras.utils.image_dataset_from_directory(
    directory=base_dir,
    labels='inferred',
    label_mode='binary',
    class_names=['Cat', 'Dog'],
    color_mode='rgb',
    batch_size=batch_size,
    image_size=image_size,
    shuffle=True,
    seed=seed,  # the same seed in both calls keeps the two subsets disjoint
    validation_split=0.15,
    subset='training'
)

Found 25000 files belonging to 2 classes.
Using 21250 files for training.


In [98]:
val_ds = tf.keras.utils.image_dataset_from_directory(
    directory=base_dir,
    labels='inferred',
    label_mode='binary',
    class_names=['Cat', 'Dog'],
    color_mode='rgb',
    batch_size=batch_size,
    image_size=image_size,
    shuffle=True,
    seed=seed,  # must match the seed used for the training subset
    validation_split=0.15,
    subset='validation'
)

Found 25000 files belonging to 2 classes.
Using 3750 files for validation.


In [99]:
# cardinality() counts batches (of up to 32 images each), not individual images
print(f"Train Dataset Size: {tf.data.experimental.cardinality(train_ds).numpy()}")
print(f"Validation Dataset Size: {tf.data.experimental.cardinality(val_ds).numpy()}")

Train Dataset Size: 665
Validation Dataset Size: 118


In [100]:
train_ds = train_ds.prefetch(buffer_size=tf.data.AUTOTUNE)
val_ds = val_ds.prefetch(buffer_size=tf.data.AUTOTUNE)

In [106]:
full_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    base_dir,
    labels='inferred',
    label_mode='binary',  # since you have two classes, binary label mode is used
    class_names=['Cat', 'Dog'],
    color_mode='rgb',
    batch_size=batch_size,
    image_size=image_size,
    shuffle=True,
    seed=seed
)

Found 24998 files belonging to 2 classes.


In [107]:
total_batches = tf.data.experimental.cardinality(full_dataset).numpy()
train_batches = int(total_batches * 0.7)
val_batches = int(total_batches * 0.15)
test_batches = total_batches - train_batches - val_batches

In [108]:
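# Carve the batched dataset into consecutive ranges of batches: train, then test, then validation.
# Caveat: with shuffle=True the source dataset is reshuffled on each pass, so the
# three subsets may not stay strictly disjoint across epochs.
train_ds = full_dataset.take(train_batches)
test_ds = full_dataset.skip(train_batches).take(test_batches)
val_ds = full_dataset.skip(train_batches + test_batches)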

In [54]:
# Sizes are reported in batches, not images
print(f"Train Dataset Size: {tf.data.experimental.cardinality(train_ds).numpy()}")
print(f"Validation Dataset Size: {tf.data.experimental.cardinality(val_ds).numpy()}")
print(f"Test Dataset Size: {tf.data.experimental.cardinality(test_ds).numpy()}")

Train Dataset Size: 547
Validation Dataset Size: 117
Test Dataset Size: 118


In [19]:
# def create_dataset(folder_name, image_size, batch_size):
#     return tf.keras.preprocessing.image_dataset_from_directory(
#         base_dir,
#         labels='inferred',
#         label_mode='binary',
#         class_names=['Cat', 'Dog'],
#         color_mode='rgb',
#         batch_size=batch_size,
#         image_size=image_size,
#         shuffle=True,
#         seed=seed,
#         validation_split=0.15,
#         subset=folder_name
#     )

In [20]:
# train_ds = create_dataset('training', image_size, batch_size)
# val_ds = create_dataset('validation', image_size, batch_size)

Found 25000 files belonging to 2 classes.
Using 21250 files for training.
Found 25000 files belonging to 2 classes.
Using 3750 files for validation.


In [None]:
# train = image_dataset_from_directory(
#     'Datasets/archive/train',
#     validation_split=0.15,
#     # shuffle=True,
#     subset="training",
#     seed=123,
#     image_size=(64, 64)
# )

# val = image_dataset_from_directory(
#     'Datasets/archive/train',
#     validation_split=0.15,
#     # shuffle=True,
#     subset="validation",
#     seed=123,
#     image_size=(64, 64)
# )

# test = image_dataset_from_directory(
#     'Datasets/archive/test',
#     seed=123,
#     image_size=(64, 64)
# )

## Building the CNN Model


Now, we will define our CNN architecture using `tensorflow.keras`. The architecture will consist of:
- Convolutional layers followed by max-pooling layers
- Flatten layer
- Dense layers
- Output layer
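To see how the feature maps shrink through such a stack, the sketch below traces spatial sizes for a 64x64 input passing through three 3x3 convolutions (32, 64, and 128 filters, as in the model defined in the next cell), each followed by 2x2 max pooling. It assumes stride 1 and no padding for the convolutions, which are the Keras defaults; the helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (rounds down)."""
    return size // pool

def trace_shapes(input_size=64, filters=(32, 64, 128)):
    """Return per-layer (name, height, width, channels) and the flattened length."""
    size = input_size
    shapes = []
    for num_filters in filters:
        size = conv_out(size)
        shapes.append(("conv", size, size, num_filters))
        size = pool_out(size)
        shapes.append(("pool", size, size, num_filters))
    flattened = size * size * filters[-1]
    return shapes, flattened

shapes, flattened = trace_shapes()
for shape in shapes:
    print(shape)
print("flattened length:", flattened)  # 6 * 6 * 128 = 4608
```

Under these assumptions the Flatten layer hands 4,608 values to the first dense layer, which is where most of the model's parameters end up.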


In [25]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Model configuration
input_shape = (64, 64, 3)  # Input shape (height, width, channels)
num_classes = 1  # A single sigmoid output unit for binary classification
dropout_rate = 0.5

# Build the CNN model
model = Sequential([
    # Explicit input layer (avoids the Keras warning about passing input_shape to a layer)
    Input(shape=input_shape),

    # First convolutional layer
    Conv2D(32, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),

    # Second convolutional layer
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),

    # Third convolutional layer
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),

    # Flatten layer
    Flatten(),

    # Dense Layer
    Dense(512, activation='relu'),
    Dropout(dropout_rate),  # Dropout layer for regularization

    # Output layer
    Dense(num_classes, activation='sigmoid')  # Sigmoid for binary classification
])

# Model summary
model.summary()

# Compile the model
model.compile(loss='binary_crossentropy',  # Suitable loss for binary classification
              optimizer='adam',  # Optimizer
              metrics=['accuracy'])  # Metric to monitor




## Training the Model


Train the CNN model using the `fit` function. We will use the training and validation datasets we created earlier.

Fill in the code to train the model for a specified number of epochs.


In [109]:
import os

# Directory containing the images
image_dirs = ['/content/dataset/content/PetImages/Cat', '/content/dataset/content/PetImages/Dog']

# Check for empty files
empty_files = []
for dir_path in image_dirs:
    for filename in os.listdir(dir_path):
        file_path = os.path.join(dir_path, filename)
        if os.path.getsize(file_path) == 0:
            empty_files.append(file_path)

print("Empty Files:", empty_files)

# Optionally, remove the empty files
for file_path in empty_files:
    os.remove(file_path)
    print("Removed:", file_path)


Empty Files: []


In [110]:
from PIL import Image
import os

def verify_images(image_dir):
    broken_files = []
    for subdir, dirs, files in os.walk(image_dir):
        for file in files:
            filepath = os.path.join(subdir, file)
            try:
                with Image.open(filepath) as img:
                    img.verify()  # Verify the integrity of the image
            except (IOError, SyntaxError) as e:
                print('Bad file:', filepath)  # Print out the names of corrupt files
                broken_files.append(filepath)
    return broken_files

# Path to the directory containing images
image_dir = '/content/dataset/content/PetImages'

# Check images and remove if necessary
corrupt_files = verify_images(image_dir)
for file_path in corrupt_files:
    os.remove(file_path)
    print("Removed corrupt file:", file_path)

# Re-check total files after removal
remaining_files = verify_images(image_dir)
print("Remaining corrupt files:", remaining_files)


Remaining corrupt files: []


In [111]:
from tensorflow.keras.layers import Input, Rescaling

model = Sequential([
    Input(shape=(64, 64, 3)),
    Rescaling(1./255),  # Normalize pixel values from [0, 255] to [0, 1]
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    Conv2D(12, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(20, (3, 3), activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(6, activation='relu'),
    Dropout(0.25),  # Dropout for regularization
    Dense(12, activation='relu'),
    Dropout(0.25),  # Dropout for regularization
    Dense(1, activation='sigmoid')  # Sigmoid output for binary classification
])

model.summary()


In [112]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [113]:
history = model.fit(train_ds, epochs=10, validation_data=val_ds)

Epoch 1/10
278/547 ━━━━━━━━━━━━━━━━━━━━ 1:05 244ms/step - accuracy: 0.5054 - loss: 0.9313

InvalidArgumentError: Graph execution error:

Detected at node decode_image/DecodeImage defined at (most recent call last):
<stack traces unavailable>
Input size should match (header_size + row_size * abs_height) but they differ by 2
	 [[{{node decode_image/DecodeImage}}]]
	 [[IteratorGetNext]] [Op:__inference_one_step_on_iterator_60034]
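The message about `header_size` and `row_size` appears to come from TensorFlow's BMP decoding path, which suggests (as a hypothesis, not a confirmed diagnosis) that at least one file in the dataset is not a clean JPEG even though it passed the PIL check above. A minimal sketch for flagging such files by their leading "magic" bytes follows; the helper names `sniff_image_format` and `find_non_jpeg` are hypothetical, and the check does not detect truncated or otherwise damaged files.

```python
import os

# Leading "magic" bytes of common image formats
MAGIC_BYTES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}

def sniff_image_format(path):
    """Guess the image format from the first bytes of the file, or return None."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    for magic, name in MAGIC_BYTES.items():
        if header.startswith(magic):
            return name
    return None

def find_non_jpeg(image_dir):
    """List (path, detected_format) for every file that does not look like a JPEG."""
    suspects = []
    for subdir, _dirs, files in os.walk(image_dir):
        for filename in files:
            path = os.path.join(subdir, filename)
            detected = sniff_image_format(path)
            if detected != "jpeg":
                suspects.append((path, detected))
    return suspects

# Example (path taken from the cells above):
# suspects = find_non_jpeg('/content/dataset/content/PetImages')
```

Whether to delete, convert, or simply skip the flagged files is a judgment call; after any cleanup, the datasets need to be rebuilt so that the file lists are refreshed.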

In [None]:
# import tensorflow as tf

# # Training configuration
# epochs = 10  # Number of epochs to train for

# # Callbacks for training (optional)
# callbacks = [
#     tf.keras.callbacks.ModelCheckpoint('best_model.keras', save_best_only=True, monitor='val_loss', mode='min'),
#     tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, verbose=1)
# ]

# # Train the model
# history = model.fit(
#     train_ds,  # Training data
#     validation_data=val_ds,  # Validation data
#     epochs=epochs,  # Number of epochs
#     callbacks=callbacks,  # Callbacks to use during training
#     verbose=1  # Verbosity mode
# )



In [None]:
# def plot_history(history):
#     plt.figure(figsize=(12, 4))

#     plt.subplot(1, 2, 1)
#     plt.plot(history.history['accuracy'], label='Training Accuracy')
#     plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
#     plt.title('Accuracy over epochs')
#     plt.xlabel('Epochs')
#     plt.ylabel('Accuracy')
#     plt.legend()

#     plt.subplot(1, 2, 2)
#     plt.plot(history.history['loss'], label='Training Loss')
#     plt.plot(history.history['val_loss'], label='Validation Loss')
#     plt.title('Loss over epochs')
#     plt.xlabel('Epochs')
#     plt.ylabel('Loss')
#     plt.legend()

#     plt.show()

# # Call function to plot the training history
# plot_history(history)


In [None]:
model.save('final_model.keras')

## Evaluating the Model


After training, evaluate the model on the held-out test set to check how well it generalizes to unseen data.
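In Keras, `model.evaluate` reports the loss and the metrics chosen at compile time. To make clear what the accuracy figure means for a sigmoid output, the sketch below thresholds predicted probabilities at 0.5 and counts the confusion-matrix cells. The helper `binary_metrics` and the toy labels and probabilities are hypothetical, not results from this notebook.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Threshold sigmoid outputs and count the four confusion-matrix cells."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # dogs predicted as dogs
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # cats predicted as cats
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # cats predicted as dogs
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # dogs predicted as cats
    return {"accuracy": (tp + tn) / len(pairs), "tp": tp, "tn": tn, "fp": fp, "fn": fn}

# Made-up labels (0 = Cat, 1 = Dog) and predicted probabilities for illustration
y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.7, 0.8, 0.4, 0.9]
metrics = binary_metrics(y_true, y_prob)
print(metrics)
```

Looking at the false positives and false negatives separately shows whether the model confuses one class more often than the other, which a single accuracy number hides.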


## Testing with New Images

Finally, let's test the model with some new images. Preprocess the images and use the trained model to predict whether the image is of a cat or a dog.
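Since the datasets were built with `class_names=['Cat', 'Dog']` and `label_mode='binary'`, Cat maps to 0 and Dog to 1, so the single sigmoid output can be read as the probability of "Dog". The sketch below turns such a probability into a label; the helper `label_from_probability` is hypothetical, and the commented Keras calls are one possible way to obtain the probability, assuming a trained `model` and an image path of your own.

```python
CLASS_NAMES = ["Cat", "Dog"]  # index 0 = Cat, index 1 = Dog, as in the dataset cells

def label_from_probability(prob_dog, threshold=0.5):
    """Map a sigmoid output (probability of 'Dog') to a label and a confidence."""
    index = 1 if prob_dog > threshold else 0
    confidence = prob_dog if index == 1 else 1.0 - prob_dog
    return CLASS_NAMES[index], confidence

# One possible way to get prob_dog for a new image (assumes a trained `model`):
#   img = tf.keras.utils.load_img(path, target_size=(64, 64))
#   batch = tf.keras.utils.img_to_array(img)[None, ...]  # add a batch dimension
#   prob_dog = float(model.predict(batch)[0][0])
# Apply the same pixel scaling that was used during training before predicting.

print(label_from_probability(0.9))
print(label_from_probability(0.2))
```

New images need the same size and scaling as the training data; otherwise the predictions are not meaningful.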
