AI Deep Learning – Simon Stijnen – May 2025

---

# Dinosaur Species Classification using Convolutional Neural Networks

This notebook implements a CNN model to classify dinosaur species using image data from Kaggle.

## Project Overview

In this project, we aim to build a deep learning model capable of distinguishing between 15 different dinosaur species using the [Dinosaur Image Dataset from Kaggle](https://www.kaggle.com/datasets/larserikrisholm/dinosaur-image-dataset-15-species).

The main objectives include:

1. Splitting the dataset into appropriate training, validation, and test sets
2. Selecting an appropriate CNN architecture
3. Tuning hyperparameters for optimal performance
4. Preventing overfitting with proper regularization techniques
5. Using Keras' Functional API to build the model
6. Evaluating the model with accuracy metrics and confusion matrices
7. Achieving an accuracy greater than 70%

## Dataset Preparation and Splitting

We'll use the `split-folders` library to split the dataset into training, validation, and test sets in a 70%-15%-15% ratio. Keeping the three sets in separate folders means the model is evaluated only on images it never saw during training, which prevents data leakage.
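
As a sanity check on the separation described above, the sketch below looks for files that appear in more than one split. The helper name `find_split_overlap` is hypothetical, and it assumes the `train`/`val`/`test` folder layout with one sub-folder per class. It only compares file names within a class, so it would not detect two differently named copies of the same image.

```python
from pathlib import Path


def find_split_overlap(split_root, splits=("train", "val", "test")):
    """Return (class, file, first_split, second_split) for files found in two splits."""
    seen = {}      # (class_name, file_name) -> first split it was found in
    overlaps = []  # entries that show up again in a later split
    for split in splits:
        split_dir = Path(split_root) / split
        if not split_dir.is_dir():
            continue  # Skip splits that do not exist on disk
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            for file_path in sorted(class_dir.iterdir()):
                key = (class_dir.name, file_path.name)
                if key in seen:
                    overlaps.append((*key, seen[key], split))
                else:
                    seen[key] = split
    return overlaps
```

Once the split has been created, `find_split_overlap(output_split_dir)` should return an empty list if no file was copied into two splits.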

In [None]:
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras import layers, models, Input, applications
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import confusion_matrix, classification_report
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import random
from pathlib import Path
import shutil


In [None]:
!pip install kagglehub

import kagglehub

# Download latest version
path = kagglehub.dataset_download("larserikrisholm/dinosaur-image-dataset-15-species")

print("Path to dataset files:", path)

In [None]:
!pip install split-folders

import splitfolders

# Dataset path (adjust if needed)
input_folder = os.path.join(path, "dinosaur_dataset")  # Path to the downloaded dataset
# Create absolute path for output directory
output_split_dir = os.path.abspath(os.path.join("data", "dinosaur_dataset_split"))  # Path to the output directory
os.makedirs(output_split_dir, exist_ok=True)

print("output_split_dir:", output_split_dir)

# Split the dataset into train (70%), val (15%), and test (15%)
splitfolders.ratio(
    input_folder,
    output=output_split_dir,
    seed=42,
    ratio=(0.7, 0.15, 0.15),
    group_prefix=None,
    move=False,
)

print("Dataset successfully split into train, validation, and test sets!")

Set the image size and batch size used for training, validation, and testing.

In [None]:
# Dataset and model parameters
img_height, img_width = 192, 192
batch_size = 32

Define the training, validation and test sets.

In [None]:
# Data preprocessing and augmentation for training data
train_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    preprocessing_function=applications.mobilenet_v2.preprocess_input
)

# Only preprocessing for validation and test data (no augmentation)
val_test_datagen = ImageDataGenerator(
    preprocessing_function=applications.mobilenet_v2.preprocess_input
)

# Load data from the split directories
train_data = train_datagen.flow_from_directory(
    f'{output_split_dir}/train',
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical'
)

val_data = val_test_datagen.flow_from_directory(
    f'{output_split_dir}/val',
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical'
)

test_data = val_test_datagen.flow_from_directory(
    f'{output_split_dir}/test',
    target_size=(img_height, img_width),
    batch_size=batch_size,
    shuffle=False,  # Don't shuffle for consistent evaluation
    class_mode='categorical'
)

print(f"Number of training samples: {train_data.samples}")
print(f"Number of validation samples: {val_data.samples}")
print(f"Number of test samples: {test_data.samples}")
print(f"Number of classes: {len(train_data.class_indices)}")
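
The generators above pass every image through `mobilenet_v2.preprocess_input`, which rescales pixel values from [0, 255] to [-1, 1]. The NumPy sketch below reproduces that arithmetic for illustration; the function name `scale_to_minus_one_one` is hypothetical and is not part of Keras.

```python
import numpy as np


def scale_to_minus_one_one(pixels):
    """Map pixel values from [0, 255] to [-1, 1]."""
    pixels = np.asarray(pixels, dtype=np.float32)
    return pixels / 127.5 - 1.0


example = np.array([0, 127.5, 255])
print(scale_to_minus_one_one(example))  # [-1.  0.  1.]
```

Because the scaling already happens inside the preprocessing function, the generators should not also be given a `rescale=1./255` argument, or the inputs would be scaled twice.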

## Transfer Learning with MobileNetV2

Instead of building a CNN from scratch, we'll use transfer learning with MobileNetV2 as our base model. This model has been pre-trained on ImageNet, which means it has already learned to extract useful features from images.

In [None]:
# Create a base model from MobileNetV2 - a lightweight, powerful CNN architecture
base_model = applications.MobileNetV2(
    input_shape=(img_height, img_width, 3),
    include_top=False,  # Exclude the classification layer
    weights='imagenet'  # Use pre-trained weights from ImageNet
)

# Freeze the base model to prevent its weights from being updated during initial training
base_model.trainable = False

# Create our model by adding custom layers on top of the base model
inputs = Input(shape=(img_height, img_width, 3))
x = base_model(inputs, training=False)  # Pass the inputs through the base model
x = layers.GlobalAveragePooling2D()(x)  # Global average pooling reduces params & helps limit overfitting
x = layers.Dense(256, activation='relu')(x)  # Add a fully connected layer
x = layers.Dropout(0.5)(x)  # Add dropout for regularization
outputs = layers.Dense(15, activation='softmax')(x)  # 15 classes output layer

model = models.Model(inputs, outputs)

# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
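
To make the `model.summary()` output easier to read, here is a small sketch of how the trainable parameter count of the custom head can be derived by hand. It assumes the default MobileNetV2 width (`alpha=1.0`), whose final feature map has 1280 channels; with 192x192 inputs and a total downsampling factor of 32, that feature map is 6x6x1280 before pooling. The helper `dense_params` is a hypothetical name.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


feature_channels = 1280  # MobileNetV2 (alpha=1.0) channels after global average pooling
hidden_units = 256       # Size of the Dense layer in the head
num_classes = 15         # Number of dinosaur species

head_params = dense_params(feature_channels, hidden_units) + dense_params(hidden_units, num_classes)
print(head_params)  # 331791
```

While the base model is frozen, these head parameters should be the only trainable ones reported in the summary.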

In [None]:
# First phase: train just the top layers with the base model frozen
history = model.fit(
    train_data,
    epochs=10,
    validation_data=val_data
)

## Fine-tuning the model

After initial training with the base model frozen, we can unfreeze some of the deeper layers of the base model and train them along with our custom top layers. This allows the model to fine-tune the pre-trained features to our specific dataset.

In [None]:
# Unfreeze the last few layers of the base model
base_model.trainable = True

# Freeze all the layers except the last 4 layers
for layer in base_model.layers[:-4]:
    layer.trainable = False

# Recompile the model with a lower learning rate for fine-tuning
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # Use a lower learning rate
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Print which layers are trainable
for layer in model.layers:
    print(f"{layer.name}: {layer.trainable}")

    # For the nested base model, print the trainable status of its last sub-layers
    if hasattr(layer, 'layers'):  # Check if the layer has sub-layers
        for sublayer in layer.layers[-5:]:  # Show the last 5 layers
            print(f"  {sublayer.name}: trainable={sublayer.trainable}")
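
The slicing pattern used for freezing can be tried out without TensorFlow. The sketch below uses a hypothetical `FakeLayer` class as a stand-in for Keras layers, purely to show which entries `layers[:-n]` selects.

```python
class FakeLayer:
    """Hypothetical stand-in for a Keras layer; it only tracks `trainable`."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


def freeze_all_but_last(layers, n_trainable):
    """Freeze every layer except the last `n_trainable` ones."""
    if n_trainable < 1:
        # layers[:-0] would be an empty slice, so nothing would get frozen
        raise ValueError("n_trainable must be at least 1")
    for layer in layers[:-n_trainable]:
        layer.trainable = False
    for layer in layers[-n_trainable:]:
        layer.trainable = True
    return [layer.name for layer in layers if layer.trainable]


toy_layers = [FakeLayer(f"layer_{i}") for i in range(10)]
print(freeze_all_but_last(toy_layers, 4))
# ['layer_6', 'layer_7', 'layer_8', 'layer_9']
```

In a real network some of the last layers may be activation layers without weights, so it can be worth checking `len(model.trainable_weights)` after unfreezing to see how many weight tensors will actually be updated.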

In [None]:
# Second phase: fine-tuning with unfrozen layers
fine_tune_history = model.fit(
    train_data,
    epochs=5,
    validation_data=val_data
)

## Visualizing Training History

Let's plot the training and validation accuracy/loss to see how our model performed during training.

In [None]:
# Combine histories from both training phases
acc = history.history['accuracy'] + fine_tune_history.history['accuracy']
val_acc = history.history['val_accuracy'] + fine_tune_history.history['val_accuracy']
loss = history.history['loss'] + fine_tune_history.history['loss']
val_loss = history.history['val_loss'] + fine_tune_history.history['val_loss']

plt.figure(figsize=(14, 6))

# Plot training & validation accuracy
plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.axvline(x=len(history.history['accuracy']) - 1, color='r', linestyle='--', label='Start Fine Tuning')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0, 1])
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

# Plot training & validation loss
plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
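plt.axvline(x=len(history.history['loss']) - 1, color='r', linestyle='--', label='Start Fine Tuning')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.ylim(bottom=0)  # Cross-entropy loss can exceed 1, so leave the upper limit automatic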
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')

plt.show()
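
Besides reading the curves by eye, the combined history lists can be summarized numerically. The sketch below finds the epoch with the best validation accuracy and the gap between training and validation accuracy at that epoch; a large positive gap is a common sign of overfitting. The helper name `summarize_history` and the toy numbers are hypothetical and are not results from this notebook.

```python
def summarize_history(train_acc, val_acc):
    """Return the best validation epoch, its accuracy, and the train/val gap there."""
    best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i])
    gap = train_acc[best_epoch] - val_acc[best_epoch]
    return best_epoch, val_acc[best_epoch], gap


# Hypothetical values for illustration only
toy_acc = [0.55, 0.70, 0.80, 0.88]
toy_val_acc = [0.60, 0.72, 0.75, 0.74]

best_epoch, best_val, gap = summarize_history(toy_acc, toy_val_acc)
print(best_epoch, best_val, round(gap, 2))  # 2 0.75 0.05
```

With the lists built in the cell above, the call would be `summarize_history(acc, val_acc)`; epochs are counted from 0, as on the plot's x-axis.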

## Model Evaluation on Test Set

Now we'll evaluate our model on the held-out test set, which was not used for training or model selection, to get an unbiased estimate of its performance.

In [None]:
# Evaluate the model on the test data
test_loss, test_accuracy = model.evaluate(test_data)
print(f"Test accuracy: {test_accuracy:.4f}")
print(f"Test loss: {test_loss:.4f}")

In [None]:
# Make predictions on the test data
test_data.reset()  # Reset before predictions
y_pred_probs = model.predict(test_data)
y_pred = np.argmax(y_pred_probs, axis=1)

# Extract the true labels
# Since the test generator doesn't shuffle, test_data.classes is in the same order as the predictions
y_true = test_data.classes
class_labels = list(test_data.class_indices.keys())

# Compute confusion matrix
cm = confusion_matrix(y_true, y_pred)

# Plot raw counts confusion matrix
plt.figure(figsize=(14, 12))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=class_labels, yticklabels=class_labels, cmap='Blues')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.title('Confusion Matrix (Counts)')
plt.xticks(rotation=45, ha='right')

plt.tight_layout()
plt.show()

# Generate classification report
report = classification_report(y_true, y_pred, target_names=class_labels, digits=3)
print("Classification Report:")
print(report)

# Calculate overall metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print(f"Test Accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"Macro Precision: {precision_score(y_true, y_pred, average='macro'):.4f}")
print(f"Macro Recall: {recall_score(y_true, y_pred, average='macro'):.4f}")
print(f"Macro F1-Score: {f1_score(y_true, y_pred, average='macro'):.4f}")
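
To show where the macro-averaged numbers come from, the sketch below recomputes them directly from a confusion matrix whose rows are true labels and whose columns are predicted labels, the same layout `confusion_matrix` returns. The function name `macro_metrics_from_cm` is hypothetical, and classes with no predictions or no true samples are scored as 0 here, which is an assumption of this sketch.

```python
import numpy as np


def macro_metrics_from_cm(cm):
    """Return macro precision, recall, and F1 from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)             # Correct predictions per class
    predicted = cm.sum(axis=0)   # Column sums: how often each class was predicted
    actual = cm.sum(axis=1)      # Row sums: how many true samples each class has

    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    # Macro averaging: every class counts equally, regardless of its size
    return precision.mean(), recall.mean(), f1.mean()
```

Calling `macro_metrics_from_cm(cm)` on the matrix computed above should agree with the scikit-learn values printed in the cell, as long as every class appears in both the true and the predicted labels.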

## Sample Predictions

Let's visualize some sample predictions to see how our model performs on individual images.

In [None]:
import random
from tensorflow.keras.preprocessing import image

def predict_and_display_image(img_path, model, class_labels):
    # Load and preprocess the image
    img = image.load_img(img_path, target_size=(img_height, img_width))
    img_array = image.img_to_array(img)
    img_batch = np.expand_dims(img_array, axis=0)
    preprocessed_img = applications.mobilenet_v2.preprocess_input(img_batch)
    
    # Make prediction
    predictions = model.predict(preprocessed_img)
    predicted_class = np.argmax(predictions[0])
    confidence = predictions[0][predicted_class]
    
    # Display image and prediction
    plt.figure(figsize=(6, 6))
    plt.imshow(img)
    plt.title(f"Predicted: {class_labels[predicted_class]}\nConfidence: {confidence:.2f}")
    plt.axis('off')
    plt.show()
    
    # Show top 3 predictions
    top_3_idx = np.argsort(predictions[0])[-3:][::-1]
    top_3_classes = [class_labels[i] for i in top_3_idx]
    top_3_confidences = [predictions[0][i] for i in top_3_idx]
    
    for cls, conf in zip(top_3_classes, top_3_confidences):
        print(f"{cls}: {conf:.4f}")

# Get a list of dinosaur classes
dino_classes = list(train_data.class_indices.keys())

# Define the test directory path
test_dir = os.path.join(output_split_dir, 'test')

# Sample a few images from different classes for prediction
for dino_class in random.sample(dino_classes, 3):
    class_dir = os.path.join(test_dir, dino_class)
    image_files = os.listdir(class_dir)
    sample_image = os.path.join(class_dir, random.choice(image_files))
    print(f"\nSample from class: {dino_class}")
    predict_and_display_image(sample_image, model, class_labels)
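
The top-3 selection inside `predict_and_display_image` relies on `np.argsort`, which sorts in ascending order, so the last three indices are taken and then reversed. The sketch below isolates that step; the helper name `top_k`, the labels, and the probabilities are hypothetical.

```python
import numpy as np


def top_k(probabilities, class_labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probabilities = np.asarray(probabilities)
    order = np.argsort(probabilities)[-k:][::-1]  # Largest k indices, descending
    return [(class_labels[i], float(probabilities[i])) for i in order]


# Hypothetical labels and probabilities for illustration only
toy_labels = ["class_a", "class_b", "class_c", "class_d"]
toy_probs = [0.10, 0.50, 0.05, 0.35]
print(top_k(toy_probs, toy_labels))
# [('class_b', 0.5), ('class_d', 0.35), ('class_a', 0.1)]
```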

## Analyze Misclassifications

Let's examine some of the misclassified images to understand what might be confusing the model.

In [None]:
# Get file paths and true labels directly from the test generator.
# Because shuffle=False, these are in the same order as y_pred and y_true.
test_image_paths = np.array(test_data.filepaths)
test_labels = test_data.classes

# Find misclassified indices
misclassified_indices = np.where(y_pred != y_true)[0]

# Display random misclassified images
n_display = min(6, len(misclassified_indices))
sample_indices = np.random.choice(misclassified_indices, n_display, replace=False)

plt.figure(figsize=(15, 10))
for i, idx in enumerate(sample_indices):
    # Get the file path for this test image
    img_path = test_image_paths[idx]
    
    # Load and preprocess the image
    img = image.load_img(img_path, target_size=(img_height, img_width))
    img_array = image.img_to_array(img)
    img_batch = np.expand_dims(img_array, axis=0)
    preprocessed_img = applications.mobilenet_v2.preprocess_input(img_batch)
    
    # Get predictions
    predictions = model.predict(preprocessed_img)
    predicted_class = np.argmax(predictions[0])
    confidence = predictions[0][predicted_class]
    
    # Get true class
    true_class = y_true[idx]
    
    # Plot
    plt.subplot(2, 3, i+1)
    plt.imshow(img)
    plt.title(f"True: {class_labels[true_class]}\nPred: {class_labels[predicted_class]}\nConf: {confidence:.2f}")
    plt.axis('off')
    
plt.tight_layout()
plt.show()
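
Individual images show what a mistake looks like, but the confusion matrix can also tell us which pairs of classes are mixed up most often. The sketch below ranks the off-diagonal cells of a confusion matrix; the helper name `most_confused_pairs` and the toy matrix are hypothetical.

```python
import numpy as np


def most_confused_pairs(cm, class_labels, top_n=3):
    """Return up to top_n (true_label, predicted_label, count) tuples, largest first."""
    cm = np.asarray(cm).copy()
    np.fill_diagonal(cm, 0)  # Ignore correct predictions on the diagonal
    flat_order = np.argsort(cm, axis=None)[::-1][:top_n]
    pairs = []
    for flat_idx in flat_order:
        true_idx, pred_idx = np.unravel_index(flat_idx, cm.shape)
        count = int(cm[true_idx, pred_idx])
        if count > 0:  # Skip cells with no errors
            pairs.append((class_labels[true_idx], class_labels[pred_idx], count))
    return pairs


# Hypothetical 3-class confusion matrix (rows: true, columns: predicted)
toy_cm = [[5, 2, 0],
          [1, 6, 3],
          [0, 0, 4]]
print(most_confused_pairs(toy_cm, ["class_a", "class_b", "class_c"]))
# [('class_b', 'class_c', 3), ('class_a', 'class_b', 2), ('class_b', 'class_a', 1)]
```

With the variables from the evaluation cell, `most_confused_pairs(cm, class_labels)` would list the species pairs worth inspecting first.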

## Save the Model

Let's save our trained model for future use.

In [None]:
# Make sure the output directory exists, then save the model architecture and weights
os.makedirs('model', exist_ok=True)
model.save('model/dinosaur_classifier_transfer_learning.h5')

# Save the class labels mapping
import json
with open('model/dinosaur_class_mapping.json', 'w') as f:
    json.dump(test_data.class_indices, f)

# Save test performance metrics
test_metrics = {
    'accuracy': float(accuracy_score(y_true, y_pred)),
    'precision': float(precision_score(y_true, y_pred, average='macro')),
    'recall': float(recall_score(y_true, y_pred, average='macro')),
    'f1_score': float(f1_score(y_true, y_pred, average='macro')),
    'classes': test_data.class_indices
}

with open('model/dinosaur_model_performance.json', 'w') as f:
    json.dump(test_metrics, f, indent=2)

print("Model, class mapping, and performance metrics saved successfully!")
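
When the saved files are used later, the class mapping has to be inverted, because the model outputs a class index while the JSON file stores label-to-index pairs. The sketch below shows one way to do that; the helper name `load_index_to_label` is hypothetical.

```python
import json


def load_index_to_label(mapping_path):
    """Read a label -> index JSON file and return an index -> label dict."""
    with open(mapping_path) as f:
        label_to_index = json.load(f)
    return {index: label for label, index in label_to_index.items()}
```

For example, `load_index_to_label('model/dinosaur_class_mapping.json')[predicted_class]` would give the species name for a predicted index, assuming the file was written by the cell above.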