# CIFAR-10 Image Classification with CNNs and Transfer Learning (VGG16)

**Author:** Oghenevurie Lauretta  
**Focus:** Applied Deep Learning (Computer Vision) — portfolio project

## Executive summary
This notebook builds and evaluates image classification models on the CIFAR-10 dataset (10 object categories).  
It starts with a **baseline CNN** and then uses **transfer learning (VGG16)** to demonstrate how pre-trained feature extractors can improve performance and training efficiency.

## Why this matters (healthcare / applied AI angle)
The workflow here (data pipelines → model training → robust evaluation → error analysis) maps directly to real-world image classification problems, including **medical imaging** tasks (e.g., triage, quality control, and decision-support), where **class-level performance and error patterns** are as important as overall accuracy.


## 1. Setup and reproducibility
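We import the required libraries and fix the Python, NumPy, and TensorFlow seeds so that runs are repeatable. Some GPU kernels are non-deterministic, so small run-to-run differences can remain.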

In [None]:
# Core
import os
import random
import numpy as np
import matplotlib.pyplot as plt

# ML / DL
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Metrics
from sklearn.metrics import classification_report, confusion_matrix

# Reproducibility
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

print("TensorFlow:", tf.__version__)


## 2. Load CIFAR-10 and create train/validation split
CIFAR-10 consists of 60,000 32×32 colour images across 10 classes (50,000 train / 10,000 test). We'll create a validation split from the training set.

In [None]:
from tensorflow.keras.datasets import cifar10
from sklearn.model_selection import train_test_split

# Load dataset
(x_train_full, y_train_full), (x_test, y_test) = cifar10.load_data()

# Train/val split
x_train, x_val, y_train, y_val = train_test_split(
    x_train_full, y_train_full, test_size=0.10, random_state=SEED, stratify=y_train_full
)

# Normalise to [0, 1]
x_train = x_train.astype("float32") / 255.0
x_val   = x_val.astype("float32") / 255.0
x_test  = x_test.astype("float32") / 255.0

class_names = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']

print("Train:", x_train.shape, y_train.shape)
print("Val:  ", x_val.shape, y_val.shape)
print("Test: ", x_test.shape, y_test.shape)
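
To make the effect of `stratify=` concrete, here is a minimal NumPy-only sketch of a stratified hold-out. The helper `stratified_split` and the synthetic `demo_labels` array are illustrative assumptions (not part of the notebook); the labels mimic the CIFAR-10 training set, which has 5,000 images per class.

```python
import numpy as np

def stratified_split(labels, val_fraction=0.10, seed=42):
    """Return (train_idx, val_idx), holding out val_fraction of every class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).reshape(-1)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off the validation share
        idx = rng.permutation(np.where(labels == cls)[0])
        n_val = int(round(len(idx) * val_fraction))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.array(train_idx), np.array(val_idx)

# Synthetic stand-in for y_train_full: 10 classes x 5,000 labels, shape (N, 1)
demo_labels = np.repeat(np.arange(10), 5000).reshape(-1, 1)

train_idx, val_idx = stratified_split(demo_labels)
train_counts = np.bincount(demo_labels[train_idx].reshape(-1), minlength=10)
val_counts = np.bincount(demo_labels[val_idx].reshape(-1), minlength=10)

print("Train per class:", train_counts)
print("Val per class:  ", val_counts)
```

The same `np.bincount` check can be run on the real `y_train` and `y_val` arrays to confirm that every class keeps the same share in both splits.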


## 3. Quick data sanity check
We visualise a few training images and their labels.

In [None]:
plt.figure(figsize=(8, 8))
for i in range(9):
    plt.subplot(3, 3, i+1)
    plt.imshow(x_train[i])
    # y_train has shape (N, 1); take the scalar label explicitly
    plt.title(class_names[int(y_train[i][0])])
    plt.axis("off")
plt.tight_layout()
plt.show()


## 4. Helper functions (evaluation + error analysis)
These utilities standardise evaluation so that every model is scored, reported, and plotted in the same way, which keeps the comparison between models fair.

In [None]:
def plot_history(history, title="Training curves"):
    plt.figure(figsize=(8, 5))
    plt.plot(history.history.get("accuracy", []), label="Train accuracy")
    plt.plot(history.history.get("val_accuracy", []), label="Val accuracy")
    plt.title(title)
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend()
    plt.show()

    plt.figure(figsize=(8, 5))
    plt.plot(history.history.get("loss", []), label="Train loss")
    plt.plot(history.history.get("val_loss", []), label="Val loss")
    # The default titles do not contain "accuracy", so label the loss plot explicitly
    plt.title(f"{title} (loss)")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()

def plot_confusion_matrix(cm, labels, title="Confusion matrix"):
    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation="nearest")
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(labels))
    plt.xticks(tick_marks, labels, rotation=45, ha="right")
    plt.yticks(tick_marks, labels)
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
    plt.tight_layout()
    plt.show()

def evaluate_classifier(model, x, y, labels, batch_size=128, title_prefix="Model"):
    # Predict
    probs = model.predict(x, batch_size=batch_size, verbose=0)
    y_pred = np.argmax(probs, axis=1)
    y_true = y.reshape(-1)

    # Confusion matrix + report
    cm = confusion_matrix(y_true, y_pred)
    print(f"\n{title_prefix} — Classification report:")
    print(classification_report(y_true, y_pred, target_names=labels, digits=4))
    plot_confusion_matrix(cm, labels, title=f"{title_prefix} — Confusion matrix")

    return y_true, y_pred, probs

def show_misclassifications(x, y_true, y_pred, labels, n=12):
    wrong = np.where(y_true != y_pred)[0]
    if len(wrong) == 0:
        print("No misclassifications found.")
        return

    n = min(n, len(wrong))
    picks = np.random.choice(wrong, size=n, replace=False)

    plt.figure(figsize=(10, 7))
    cols = 4
    rows = int(np.ceil(n / cols))
    for i, idx in enumerate(picks, start=1):
        plt.subplot(rows, cols, i)
        plt.imshow(x[idx])
        plt.title(f"T: {labels[y_true[idx]]}\nP: {labels[y_pred[idx]]}", fontsize=9)
        plt.axis("off")
    plt.tight_layout()
    plt.show()
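
The per-class numbers printed by `classification_report` can be derived directly from the confusion matrix. The sketch below shows that relationship with NumPy only; `per_class_metrics` and `toy_cm` are hypothetical names and values used for illustration, with rows as true labels and columns as predicted labels (the layout `confusion_matrix` returns).

```python
import numpy as np

def per_class_metrics(cm):
    """Compute per-class precision, recall, and F1 from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)              # correctly classified samples per class
    predicted = cm.sum(axis=0)    # column sums: how often each class was predicted
    actual = cm.sum(axis=1)       # row sums: how many samples each class really has
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Toy 3-class matrix (made-up counts)
toy_cm = np.array([
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
])

precision, recall, f1 = per_class_metrics(toy_cm)
for name, p, r, f in zip(["class_0", "class_1", "class_2"], precision, recall, f1):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```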


## 5. Baseline model: Convolutional Neural Network (CNN)
We build a straightforward CNN with dropout + batch normalisation. This serves as the baseline for comparison.

In [None]:
def build_cnn(input_shape=(32, 32, 3), num_classes=10):
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=input_shape),
        layers.BatchNormalization(),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),

        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(0.30),

        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.40),
        layers.Dense(num_classes, activation="softmax")
    ])

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model

cnn = build_cnn()
cnn.summary()
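
As a cross-check on `cnn.summary()`, the parameter counts can be worked out by hand. The sketch below is plain Python arithmetic that assumes the architecture exactly as defined in `build_cnn` (3×3 convolutions with bias, batch normalisation with four parameters per channel, default 2×2 max pooling); the helper names are illustrative.

```python
def conv_params(in_ch, out_ch, k=3):
    # k*k weights per input/output channel pair, plus one bias per output channel
    return k * k * in_ch * out_ch + out_ch

def bn_params(ch):
    # gamma, beta, moving mean, moving variance
    return 4 * ch

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

size, channels, total = 32, 3, 0
for filters in (32, 64):
    for _ in range(2):
        total += conv_params(channels, filters) + bn_params(filters)
        channels = filters
    size //= 2  # MaxPooling2D() halves height and width

flat = size * size * channels          # length of the Flatten output
dense1 = dense_params(flat, 256)
total += dense1 + dense_params(256, 10)

print("Flatten size:", flat)
print("Dense(256) parameters:", dense1)
print("Total parameters:", total)
```

Most of the parameters sit in the first dense layer, which is one reason the transfer learning head later in the notebook uses global average pooling instead of `Flatten`.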


### 5B. Train the CNN
We include early stopping to reduce overfitting and keep training time reasonable.

In [None]:
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=5, restore_best_weights=True
)

history_cnn = cnn.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=30,
    batch_size=64,
    callbacks=[early_stop],
    verbose=1
)

plot_history(history_cnn, title="CNN training curves")
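
The patience rule used by early stopping can be illustrated without Keras. This is a simplified model of the behaviour (no `min_delta`, no weight restoring), not the library's implementation; `early_stopping_epoch` and the validation scores are made up for the example.

```python
def early_stopping_epoch(val_scores, patience=5):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation scores."""
    best_score = float("-inf")
    best_epoch = -1
    wait = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            # Strict improvement resets the patience counter
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Training ran to completion without triggering the rule
    return len(val_scores) - 1, best_epoch

# Hypothetical validation accuracies, one per epoch (0-indexed)
demo_scores = [0.50, 0.60, 0.65, 0.64, 0.66, 0.65, 0.66, 0.65, 0.64, 0.66, 0.63, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(demo_scores, patience=5)
print(f"Stopped after epoch {stop_epoch}; best epoch was {best_epoch}")
```

In this toy run the later jump to 0.70 is never reached, which shows the trade-off behind the `patience` value: a larger value costs more epochs but is less likely to stop on a temporary plateau.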


### 5C. CNN evaluation on the test set
We evaluate beyond accuracy: confusion matrix, class-level precision/recall/F1, and misclassifications.

In [None]:
cnn_test_loss, cnn_test_acc = cnn.evaluate(x_test, y_test, verbose=0)
print(f"CNN test accuracy: {cnn_test_acc:.4f} | test loss: {cnn_test_loss:.4f}")

y_true_cnn, y_pred_cnn, probs_cnn = evaluate_classifier(
    cnn, x_test, y_test, class_names, title_prefix="CNN"
)

show_misclassifications(x_test, y_true_cnn, y_pred_cnn, class_names, n=12)


## 6. Transfer learning: VGG16 feature extractor
We reuse ImageNet-learned features from VGG16. CIFAR-10 images are only 32×32, so we upsample them to 224×224 (the resolution VGG16 was trained at) and apply `preprocess_input`, which expects RGB pixel values in the 0-255 range.

> Note: Transfer learning is commonly used in applied settings (including medical imaging) where labelled data may be limited and strong pre-trained features improve performance.

In [None]:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

IMG_SIZE = 224
BATCH = 64

def make_ds(X, y, batch=BATCH, shuffle=False):
    ds = tf.data.Dataset.from_tensor_slices((X, y))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(X), seed=SEED)
    ds = ds.map(lambda img, lab: (tf.image.resize(img, (IMG_SIZE, IMG_SIZE)), lab),
                num_parallel_calls=tf.data.AUTOTUNE)
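    # preprocess_input expects 0-255 pixel values, but the arrays were scaled
    # to [0, 1] in Section 2, so rescale before applying it.
    ds = ds.map(lambda img, lab: (preprocess_input(img * 255.0), lab),
                num_parallel_calls=tf.data.AUTOTUNE)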
    ds = ds.batch(batch).prefetch(tf.data.AUTOTUNE)
    return ds

train_ds = make_ds(x_train, y_train, shuffle=True)
val_ds   = make_ds(x_val, y_val, shuffle=False)
test_ds  = make_ds(x_test, y_test, shuffle=False)

print("Prepared tf.data pipelines.")
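
For reference, VGG16's `preprocess_input` flips RGB to BGR and subtracts the ImageNet per-channel means, without any rescaling. The NumPy sketch below reproduces that arithmetic to show why the input range matters; `caffe_style_preprocess` is an illustrative re-implementation and the random image is a stand-in, not CIFAR-10 data.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by VGG16 preprocessing
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(rgb):
    """RGB -> BGR, then subtract the ImageNet means (expects 0-255 values)."""
    bgr = rgb[..., ::-1]
    return bgr - IMAGENET_BGR_MEAN

rng = np.random.default_rng(0)
img_unit = rng.random((32, 32, 3), dtype=np.float32)   # pixel values in [0, 1)

wrong = caffe_style_preprocess(img_unit)           # fed with [0, 1] values
right = caffe_style_preprocess(img_unit * 255.0)   # fed with 0-255 values

print("Input in [0, 1]: channel-0 spread =", wrong[..., 0].std())
print("Input in 0-255:  channel-0 spread =", right[..., 0].std())
```

With inputs in [0, 1], every pixel ends up in a narrow band just below the negative channel mean, so the image content is almost invisible to the pre-trained filters.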


### 6B. Build and train the transfer learning model (frozen base)
We freeze the VGG16 convolutional base and train a small classification head first.

In [None]:
base = VGG16(weights="imagenet", include_top=False, input_shape=(IMG_SIZE, IMG_SIZE, 3))
base.trainable = False

inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(10, activation="softmax")(x)

vgg_frozen = keras.Model(inputs, outputs)

vgg_frozen.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

vgg_frozen.summary()
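
The split between frozen and trainable parameters reported by `vgg_frozen.summary()` can be checked by hand. The sketch below assumes the standard VGG16 convolutional layout (five blocks of 3×3 convolutions with 64, 128, 256, 512, and 512 filters) and the head defined above; the variable names are illustrative.

```python
def conv_params(in_ch, out_ch, k=3):
    # k*k weights per input/output channel pair, plus one bias per output channel
    return k * k * in_ch * out_ch + out_ch

# (filters, number of conv layers) for VGG16 blocks 1 to 5
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

in_ch = 3
block_totals = []
for filters, n_convs in VGG16_BLOCKS:
    block_total = 0
    for _ in range(n_convs):
        block_total += conv_params(in_ch, filters)
        in_ch = filters
    block_totals.append(block_total)

base_total = sum(block_totals)
# GlobalAveragePooling2D yields one value per channel (512), then Dense(10)
head_total = in_ch * 10 + 10

print("Frozen base parameters:", base_total)
print("Trainable head parameters:", head_total)
print("block5 parameters:", block_totals[-1])
```

Only the small head is trained in this stage, which is why each epoch mostly costs forward passes through the base rather than weight updates.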


In [None]:
early_stop_vgg = keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=5, restore_best_weights=True
)

history_vgg_frozen = vgg_frozen.fit(
    train_ds,
    validation_data=val_ds,
    epochs=15,
    callbacks=[early_stop_vgg],
    verbose=1
)

plot_history(history_vgg_frozen, title="VGG16 (frozen) training curves")


### 6C. Optional fine-tuning
If validation performance plateaus, we can unfreeze the top layers of VGG16 and continue training with a smaller learning rate.

In [None]:
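# Unfreeze the top convolutional block (block5) for fine-tuning
base.trainable = True

# Freeze everything before block5 to keep low-level features stable.
# The last 4 layers of the base are block5_conv1..3 and block5_pool.
for layer in base.layers[:-4]:
    layer.trainable = False

# Note: this model reuses the layers of vgg_frozen, so fine-tuning updates
# vgg_frozen's weights too. Evaluate or save vgg_frozen before running this
# cell if frozen-base results are needed.
vgg_finetuned = keras.Model(inputs, outputs)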

vgg_finetuned.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

early_stop_ft = keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=4, restore_best_weights=True
)

history_vgg_ft = vgg_finetuned.fit(
    train_ds,
    validation_data=val_ds,
    epochs=10,
    callbacks=[early_stop_ft],
    verbose=1
)

plot_history(history_vgg_ft, title="VGG16 (fine-tuned) training curves")
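
Which layers a negative slice of `base.layers` freezes is easy to misjudge, so it helps to list them. The sketch below rebuilds the layer order of VGG16 without its top (19 layers: an input layer, 13 convolutions, and 5 pooling layers) as plain strings; the name `"input"` is a placeholder, since Keras assigns the actual input layer name.

```python
# Layer order of VGG16 with include_top=False, expressed as names only
layer_names = ["input"]
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    layer_names += [f"block{block}_conv{i}" for i in range(1, n_convs + 1)]
    layer_names.append(f"block{block}_pool")

def frozen_and_trainable(names, n_unfrozen):
    """Mimic `for layer in base.layers[:-n_unfrozen]: layer.trainable = False`."""
    return names[:-n_unfrozen], names[-n_unfrozen:]

for n in (4, 8, 16):
    frozen, trainable = frozen_and_trainable(layer_names, n)
    print(f"[:-{n}] freezes {len(frozen)} layers; trainable starts at {trainable[0]}")
```

Pooling layers have no weights, so the number of trainable convolutions is always smaller than the slice length suggests.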


## 7. Evaluate and compare models
We compute test accuracy for each model and review class-level performance for the strongest model.

In [None]:
# Evaluate on test set
cnn_test_loss, cnn_test_acc = cnn.evaluate(x_test, y_test, verbose=0)
# Caveat: if the fine-tuning cell (6C) has run, vgg_frozen shares its updated
# weights, so this figure no longer reflects the frozen-base model.
vgg_frozen_loss, vgg_frozen_acc = vgg_frozen.evaluate(test_ds, verbose=0)

# Fine-tuning is optional; vgg_finetuned is undefined if that cell was skipped
vgg_ft_acc = None
try:
    vgg_ft_loss, vgg_ft_acc = vgg_finetuned.evaluate(test_ds, verbose=0)
except NameError:
    print("Fine-tuned model not evaluated (fine-tuning cell not run).")

print(f"CNN test accuracy:         {cnn_test_acc:.4f}")
print(f"VGG16 frozen test accuracy:{vgg_frozen_acc:.4f}")
if vgg_ft_acc is not None:
    print(f"VGG16 fine-tuned accuracy: {vgg_ft_acc:.4f}")

# Use the fine-tuned model for deeper evaluation when it exists, else the frozen one
best_model = vgg_finetuned if vgg_ft_acc is not None else vgg_frozen
best_name  = "VGG16 fine-tuned" if vgg_ft_acc is not None else "VGG16 frozen"

# For evaluate_classifier we need arrays; create predictions from ds
probs_best = best_model.predict(test_ds, verbose=0)
y_pred_best = np.argmax(probs_best, axis=1)
y_true_best = y_test.reshape(-1)

cm_best = confusion_matrix(y_true_best, y_pred_best)
print(f"\n{best_name} — Classification report:")
print(classification_report(y_true_best, y_pred_best, target_names=class_names, digits=4))
plot_confusion_matrix(cm_best, class_names, title=f"{best_name} — Confusion matrix")
show_misclassifications(x_test, y_true_best, y_pred_best, class_names, n=12)
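
To turn the confusion matrix into a short list of error patterns, the largest off-diagonal cells can be ranked. This is a minimal NumPy sketch; `most_confused_pairs` is a hypothetical helper and the 3×3 matrix uses made-up counts rather than results from this notebook.

```python
import numpy as np

def most_confused_pairs(cm, labels, k=3):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    off_diag = np.array(cm, dtype=int)   # copy, so the caller's matrix is untouched
    np.fill_diagonal(off_diag, 0)        # ignore correct predictions
    flat_order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [(labels[r], labels[c], int(off_diag[r, c])) for r, c in zip(rows, cols)]

# Toy matrix: rows are true labels, columns are predicted labels
toy_labels = ["cat", "dog", "ship"]
toy_cm = np.array([
    [50, 12, 1],
    [9, 55, 0],
    [2, 0, 60],
])

pairs = most_confused_pairs(toy_cm, toy_labels, k=3)
for true_label, pred_label, count in pairs:
    print(f"{true_label} predicted as {pred_label}: {count}")
```

Calling the same helper with `cm_best` and `class_names` would list the class pairs the selected model mixes up most often.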


## 8. Conclusion (portfolio-ready)
**Key takeaways**
- A baseline CNN gives a reference point against which the transfer learning models can be judged.
- Transfer learning with VGG16 shows how pre-trained features can speed up learning and improve performance when only a small head is trained on the target data.
- Class-level metrics and misclassification analysis reveal where the model is most reliable and where it struggles, which is critical for real-world deployment.

**Next steps**
- Add data augmentation (random flips/crops) and systematic hyperparameter tuning.
- Try modern architectures (ResNet/EfficientNet) and compare.
- Package the best model for inference (saved model + simple prediction script) and document results in the repository README.

## 9. Save trained models
Saving the trained models lets others reload them and reproduce the reported results without retraining. Commit large model files only if your repo policy allows; otherwise, upload them to a release or cloud storage and link to them in the README.

In [None]:
# Save baseline CNN
cnn.save("cnn_cifar10_lauretta.keras")
print("Saved: cnn_cifar10_lauretta.keras")

# Save VGG16 models (skipped with a message if the corresponding cell was not run)
try:
    vgg_frozen.save("vgg16_frozen_cifar10_lauretta.keras")
    print("Saved: vgg16_frozen_cifar10_lauretta.keras")
except NameError:
    print("Skipped: vgg_frozen is not defined.")

try:
    vgg_finetuned.save("vgg16_finetuned_cifar10_lauretta.keras")
    print("Saved: vgg16_finetuned_cifar10_lauretta.keras")
except NameError:
    print("Skipped: vgg_finetuned is not defined.")
