AI Deep Learning – Simon Stijnen – May 2025

---

# Dinosaur Species Classification using Convolutional Neural Networks

This notebook implements a CNN model to classify dinosaur species using image data from Kaggle.

## Project Overview

In this project, we aim to build a deep learning model capable of distinguishing between 15 different dinosaur species using the [Dinosaur Image Dataset from Kaggle](https://www.kaggle.com/datasets/larserikrisholm/dinosaur-image-dataset-15-species).

The main objectives include:

1. Splitting the dataset into appropriate training, validation, and test sets
2. Selecting an appropriate CNN architecture
3. Tuning hyperparameters for optimal performance
4. Preventing overfitting with proper regularization techniques
5. Using Keras' Functional API to build the model
6. Evaluating the model with accuracy metrics and confusion matrices
7. Achieving an accuracy greater than 70%
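
Objective 1 calls for a three-way split, while the `validation_split` option of `ImageDataGenerator` produces only two subsets. One possible way to obtain a held-out test set, assuming the dataset is stored as one folder per species, is to split the file names per class with a fixed seed. This is a minimal sketch: the helper name `split_files`, the 70/15/15 ratios, the seed, and the demo file names are illustrative assumptions, not part of the original notebook.

```python
import random

def split_files(files_by_class, percents=(70, 15, 15), seed=42):
    """Split each class's file list into train/val/test lists (stratified by class).

    `percents` are integer percentages for train and val; the remainder goes to test.
    """
    rng = random.Random(seed)
    splits = {"train": {}, "val": {}, "test": {}}
    for label, files in files_by_class.items():
        files = sorted(files)          # deterministic starting order
        rng.shuffle(files)             # reproducible shuffle thanks to the fixed seed
        n_train = len(files) * percents[0] // 100
        n_val = len(files) * percents[1] // 100
        splits["train"][label] = files[:n_train]
        splits["val"][label] = files[n_train:n_train + n_val]
        splits["test"][label] = files[n_train + n_val:]
    return splits

# Made-up file names, for illustration only
demo = {
    "Triceratops": [f"tri_{i}.jpg" for i in range(20)],
    "Stegosaurus": [f"steg_{i}.jpg" for i in range(10)],
}
splits = split_files(demo)
for name, per_class in splits.items():
    print(name, {label: len(files) for label, files in per_class.items()})
```

The resulting lists could then be copied into separate `train/`, `val/` and `test/` directories, so that the test images are never seen during training or tuning.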

Import TensorFlow and the other required libraries.

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models, Input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import confusion_matrix, classification_report
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os


Define the path to the dataset and set the image size and batch size for training and validation.

In [None]:
# Adjust this path to your extracted dataset location
base_dir = 'data/dinosaur_dataset'  # e.g., '/content/drive/MyDrive/dino_data'
img_height, img_width = 196, 196
batch_size = 32


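Define the training and validation generators. 30% of the images are held out for validation, and augmentation is applied to the training subset only.

In [None]:
# Augment only the training images
train_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.3,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

# Validation images are only rescaled; the same validation_split keeps the split consistent
val_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.3)

train_data = train_datagen.flow_from_directory(
    base_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    subset='training',
    class_mode='categorical'
)

val_data = val_datagen.flow_from_directory(
    base_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    subset='validation',
    class_mode='categorical',
    shuffle=False  # keep file order so predictions line up with val_data.classes
)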

Build the CNN with the Keras Functional API: three convolution/pooling blocks followed by dropout and a dense classifier.

In [None]:
inputs = Input(shape=(img_height, img_width, 3))

# Feature extractor: the filter count doubles in each block
x = layers.Conv2D(32, (3, 3), activation='relu')(inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(128, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

# Classifier head: dropout regularizes the flattened features
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(512, activation='relu')(x)
outputs = layers.Dense(15, activation='softmax')(x)  # one output per species

model = models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
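
To see where the parameter counts reported by `model.summary()` come from, the layer shapes can be traced by hand. The sketch below is plain Python arithmetic and assumes the Keras defaults used in the model above (`padding='valid'` and stride 1 for the convolutions, non-overlapping 2×2 pooling). The helper names `conv_out` and `pool_out` are illustrative.

```python
def conv_out(size, kernel=3):
    # 'valid' padding with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # non-overlapping pooling drops any leftover row/column
    return size // pool

size, channels = 196, 3          # input: 196 x 196 RGB
total_params = 0
for filters in (32, 64, 128):
    # each filter has 3*3*channels weights plus one bias
    layer_params = (3 * 3 * channels + 1) * filters
    total_params += layer_params
    size = pool_out(conv_out(size))
    channels = filters
    print(f"conv({filters}) + pool -> {size}x{size}x{channels}, params={layer_params}")

flat = size * size * channels            # length of the flattened feature vector
dense_params = (flat + 1) * 512          # Dense(512): weights plus biases
output_params = (512 + 1) * 15           # Dense(15): weights plus biases
total_params += dense_params + output_params
print(f"flatten -> {flat}, dense params={dense_params}, total={total_params}")
```

Under these assumptions the first dense layer holds almost all of the parameters, which makes it the natural place to look first if the model overfits.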

Train the model for 15 epochs, evaluating on the validation set after each epoch.

In [None]:
history = model.fit(
    train_data,
    epochs=15,
    validation_data=val_data
)
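
The `history.history` dictionary returned by `model.fit` stores one value per epoch for `loss`, `accuracy`, `val_loss` and `val_accuracy`. A minimal sketch for checking objectives 4 and 7 is to find the epoch with the best validation accuracy and the gap between training and validation accuracy at that epoch. The helper `summarize_history` and the numbers in `demo_history` are made up for illustration; they are not results from this notebook.

```python
def summarize_history(hist, target=0.70):
    """Report the best validation epoch and the train/validation accuracy gap."""
    val_acc = hist["val_accuracy"]
    best = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best + 1,                         # epochs are 1-based in Keras logs
        "best_val_accuracy": val_acc[best],
        "gap": hist["accuracy"][best] - val_acc[best],  # large gap hints at overfitting
        "meets_target": val_acc[best] > target,
    }

# Illustrative values only
demo_history = {
    "accuracy":     [0.30, 0.52, 0.68, 0.80, 0.90],
    "val_accuracy": [0.28, 0.47, 0.61, 0.66, 0.64],
}
summary = summarize_history(demo_history)
print(summary)
```

For the real run, pass `history.history` instead of `demo_history`. If validation accuracy peaks early while training accuracy keeps climbing, Keras also offers `tf.keras.callbacks.EarlyStopping` (for example with `monitor='val_loss'` and `restore_best_weights=True`) to stop training automatically.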

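Evaluate the model on the validation set with a confusion matrix and a per-class classification report.

In [None]:
# Predictions line up with val_data.classes only if the generator does not shuffle
# (i.e. it was created with shuffle=False)
val_data.reset()
preds = model.predict(val_data)
predicted_classes = np.argmax(preds, axis=1)  # index of the most probable class
true_classes = val_data.classes
class_labels = list(val_data.class_indices.keys())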

# Confusion matrix
cm = confusion_matrix(true_classes, predicted_classes)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=class_labels, yticklabels=class_labels)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Confusion Matrix')
plt.show()

# Classification report
print(classification_report(true_classes, predicted_classes, target_names=class_labels))
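
The figures in the classification report can be read directly off the confusion matrix. As a sketch: with rows as actual classes and columns as predicted classes (the scikit-learn convention used above), per-class recall is the diagonal divided by the row sums, precision is the diagonal divided by the column sums, and overall accuracy is the trace divided by the total count. The 3×3 matrix `cm_demo` below is made-up data, not output from this model.

```python
import numpy as np

# Rows = actual class, columns = predicted class (illustrative values)
cm_demo = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
])

correct = np.diag(cm_demo)                  # correctly classified samples per class
recall = correct / cm_demo.sum(axis=1)      # fraction of each actual class that was found
precision = correct / cm_demo.sum(axis=0)   # fraction of each predicted class that was right
accuracy = correct.sum() / cm_demo.sum()    # overall fraction of correct predictions

print("recall:   ", np.round(recall, 3))
print("precision:", np.round(precision, 3))
print("accuracy: ", round(float(accuracy), 3))
```

Applied to `cm` from the cell above, comparing the resulting accuracy against 0.70 checks objective 7 on the validation set.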