# Dogs vs Cats Image Classification using MobileNetV2 Transfer Learning

**A complete beginner-friendly guide to binary image classification with deep learning**

In this notebook we will:
- Load the Kaggle Dogs vs Cats dataset
- Understand what Transfer Learning is and why we use MobileNetV2
- Build, train and fine-tune a classifier
- Visualize predictions, training progress and model behaviour
- Evaluate with real metrics

**No prior deep learning experience needed. Every step is explained.**

---

## 1. What is Binary Image Classification?

Binary classification means the model must choose between exactly **two classes**.

In our case:
- Input → a photo of an animal
- Output → **Dog** (1) or **Cat** (0)

The model learns by looking at thousands of labelled photos and adjusting its internal weights until it gets good at telling them apart.
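
To make the two-class decision concrete, here is a minimal sketch of how one raw model score becomes a label, and how the training loss treats right and wrong answers. The helper names (`sigmoid`, `predict_label`, `binary_cross_entropy`) and the 0.5 threshold are illustrative assumptions, not functions from any library.

```python
import math

def sigmoid(z):
    """Map a raw score (logit) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(prob_dog, threshold=0.5):
    """Return 1 (Dog) if the probability reaches the threshold, else 0 (Cat)."""
    return 1 if prob_dog >= threshold else 0

def binary_cross_entropy(y_true, prob_dog, eps=1e-7):
    """Loss for one example: small when the probability agrees with the label."""
    p = min(max(prob_dog, eps), 1.0 - eps)  # keep log() away from 0
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

for raw_score in (-2.0, 0.0, 3.0):
    p = sigmoid(raw_score)
    print(f'score={raw_score:+.1f}  P(dog)={p:.3f}  label={predict_label(p)}')

# For a real dog (label 1), a confident wrong answer costs far more
# than a confident right one.
loss_right = binary_cross_entropy(1, 0.9)
loss_wrong = binary_cross_entropy(1, 0.1)
print(f'loss when right: {loss_right:.3f}, loss when wrong: {loss_wrong:.3f}')
```

"Adjusting its internal weights" means nudging the weights so that this loss, averaged over many labelled photos, goes down.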

---

## 2. What is Transfer Learning?

Training a deep image classifier from scratch typically needs:
- Millions of images
- Days of GPU time
- A lot of expertise

**Transfer Learning lets us skip most of that.**

We take a model already trained on 1.2 million images (ImageNet) and reuse its knowledge. The model already knows how to detect edges, textures, shapes and object parts. We only need to teach it the final step: *is this a dog or a cat?*

```
Pretrained MobileNetV2          Our Addition
─────────────────────    +    ──────────────────
Knows edges, textures         New Dense layers
Knows shapes, patterns   →    Trained on our data
Knows animal features         Outputs: Dog or Cat
```
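
The same idea can be shown on a toy problem with NumPy only. In the sketch below a fixed random projection stands in for the pretrained base (an assumption made purely for illustration; it is not MobileNetV2), and only a small logistic "head" on top of it is trained. All names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained base: a fixed random projection + ReLU (never updated)
W_base = rng.normal(size=(4, 8))
W_base_before = W_base.copy()

def base_features(x):
    return np.maximum(x @ W_base, 0.0)

# Toy data: the label is 1 when the first input value is positive
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)

# New trainable head: a single logistic unit on top of the frozen features
w_head = np.zeros(8)
b_head = 0.0

def head_loss(feats):
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head)))
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

feats = base_features(X)      # computed once, because the base never changes
initial_loss = head_loss(feats)

for _ in range(300):          # plain gradient descent on the head only
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head)))
    grad = p - y
    w_head -= 0.05 * feats.T @ grad / len(y)
    b_head -= 0.05 * grad.mean()

final_loss = head_loss(feats)
print(f'head loss before: {initial_loss:.3f}  after: {final_loss:.3f}')
print('base unchanged:', np.array_equal(W_base, W_base_before))
```

Only `w_head` and `b_head` ever change. That is the pattern used later in the notebook, with MobileNetV2 in place of the random projection.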

---

## 3. Why MobileNetV2?

| Property | Value |
|---|---|
| Parameters | 3.4 Million (very light) |
| Input size | 224 x 224 pixels |
| Designed for | Mobile and edge devices |
| Speed | Very fast — ideal for Kaggle |
| Accuracy | Good: 71.8% top-1 on ImageNet |

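It uses **depthwise separable convolutions**, which split a standard convolution into two cheaper steps: a per-channel spatial filter (depthwise) followed by a 1×1 convolution that mixes channels (pointwise). This cuts parameters and compute substantially while giving up only a little accuracy compared with larger models.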
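
A quick parameter count shows where the savings come from. The layer sizes below (3×3 kernel, 32 input channels, 64 output channels) are hypothetical and chosen only for the arithmetic. Biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise (k x k per channel) + pointwise (1 x 1) pair."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64   # hypothetical layer sizes
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)

print(f'standard : {std:,} weights')
print(f'separable: {sep:,} weights')
print(f'reduction: {std / sep:.1f}x')
```

For this example the standard convolution has 18,432 weights and the separable version has 2,336, roughly an 8× reduction.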

---

## 4. Imports

In [None]:
import os
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

from sklearn.metrics import (
    classification_report, confusion_matrix,
    roc_auc_score, roc_curve
)

# Reproducibility
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

print(f'TensorFlow version : {tf.__version__}')
print(f'GPU available      : {len(tf.config.list_physical_devices("GPU")) > 0}')

## 5. Load the Dataset

We use `kagglehub` to download the Dogs vs Cats dataset directly. The dataset contains 25,000 images — 12,500 dogs and 12,500 cats — perfectly balanced.

In [None]:
import kagglehub

# Download dataset
path = kagglehub.dataset_download('salader/dogs-vs-cats')
print(f'Dataset downloaded to: {path}')

# Explore the folder structure
for root, dirs, files in os.walk(path):
    level = root.replace(path, '').count(os.sep)
    indent = '  ' * level
    print(f'{indent}{os.path.basename(root)}/')
    if level < 2:
        for f in files[:3]:
            print(f'{indent}  {f}')
        if len(files) > 3:
            print(f'{indent}  ... ({len(files)} total files)')

In [None]:
# Set up directory paths
# Adjust these paths based on the actual folder structure printed above
BASE_DIR  = path
TRAIN_DIR = os.path.join(BASE_DIR, 'train')
TEST_DIR  = os.path.join(BASE_DIR, 'test')

# If the download adds an extra nested folder (e.g. train/train/cats), use this instead:
# TRAIN_DIR = os.path.join(BASE_DIR, 'train', 'train')

# Count images per class
for split_name, split_dir in [('Train', TRAIN_DIR), ('Test', TEST_DIR)]:
    if os.path.exists(split_dir):
        for cls in ['cats', 'dogs', 'cat', 'dog']:
            cls_path = os.path.join(split_dir, cls)
            if os.path.exists(cls_path):
                print(f'{split_name} | {cls}: {len(os.listdir(cls_path))} images')

# Config
IMG_SIZE   = 224   # Default input size for MobileNetV2's ImageNet weights
BATCH_SIZE = 32
EPOCHS_1   = 10    # Phase 1: frozen base
EPOCHS_2   = 5     # Phase 2: fine-tuning

## 6. Visualize Raw Images

Always look at your data before training. A quick look reveals image quality, variety, and potential problems such as mislabelled or corrupt files.

In [None]:
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def show_sample_images(train_dir, n_each=4):
    fig, axes = plt.subplots(2, n_each, figsize=(14, 6))
    classes = ['cats', 'dogs']
    # Try alternate naming
    alt = ['cat', 'dog']

    for row, (cls, alt_cls) in enumerate(zip(classes, alt)):
        cls_dir = os.path.join(train_dir, cls)
        if not os.path.exists(cls_dir):
            cls_dir = os.path.join(train_dir, alt_cls)
        files = random.sample(os.listdir(cls_dir), min(n_each, len(os.listdir(cls_dir))))
        for col, fname in enumerate(files):
            img = load_img(os.path.join(cls_dir, fname), target_size=(224, 224))
            axes[row, col].imshow(img)
            axes[row, col].axis('off')
            axes[row, col].set_title(cls.capitalize(), fontsize=11, fontweight='bold')

    plt.suptitle('Sample Images from Dataset', fontsize=14)
    plt.tight_layout()
    plt.show()

show_sample_images(TRAIN_DIR)

## 7. Data Augmentation — Teaching the Model to Generalise

Augmentation means we artificially create variations of each image during training, such as flips, zooms and rotations. Because the model rarely sees exactly the same pixels twice, it is harder for it to memorise individual images and it is pushed towards features that survive these changes.

**We only augment the training set. Validation and test sets stay unchanged.**
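
As a minimal sketch of what one augmentation does, the snippet below flips a tiny array left to right with NumPy. The 2×3 "image" and the helper `random_horizontal_flip` are illustrative assumptions; the notebook itself relies on `ImageDataGenerator` for this.

```python
import numpy as np

# A tiny 2x3 single-channel "image"; real inputs here are 224x224x3
image = np.array([[1, 2, 3],
                  [4, 5, 6]])
label = 0  # Cat

flipped = image[:, ::-1]  # mirror left-right; the label is still Cat

def random_horizontal_flip(img, rng):
    """Flip with probability 0.5, as a training-time augmentation would."""
    return img[:, ::-1] if rng.random() < 0.5 else img

rng = np.random.default_rng(0)
augmented = random_horizontal_flip(image, rng)

print('original:\n', image)
print('flipped:\n', flipped)
```

The pixels change but the label does not, which is why the augmentation is safe for training and unnecessary for validation.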

In [None]:
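# Training augmentation (artificial variety)
# Note: MobileNetV2's ImageNet weights were trained on inputs scaled to [-1, 1]
# (see tf.keras.applications.mobilenet_v2.preprocess_input). The 0-1 rescale used
# here still trains, but it is not the same scaling.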
train_datagen = ImageDataGenerator(
    rescale=1./255,           # Pixel values: 0-255 → 0-1
    horizontal_flip=True,     # Mirror image left-right
    zoom_range=0.15,          # Zoom in/out slightly
    rotation_range=15,        # Rotate up to 15 degrees
    width_shift_range=0.1,    # Shift left/right
    height_shift_range=0.1,   # Shift up/down
    validation_split=0.2      # Reserve 20% for validation
)

# Validation: only rescale, no augmentation
val_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

train_gen = train_datagen.flow_from_directory(
    TRAIN_DIR,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training',
    seed=SEED
)

val_gen = val_datagen.flow_from_directory(
    TRAIN_DIR,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation',
    seed=SEED,
    shuffle=False
)

print(f'Class mapping: {train_gen.class_indices}')
print(f'Train batches : {len(train_gen)}')
print(f'Val batches   : {len(val_gen)}')

In [None]:
# Visualise what augmentation looks like on one image
aug_gen = ImageDataGenerator(
    horizontal_flip=True,
    zoom_range=0.15,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1
)

# Load one sample image
sample_cls = 'cats' if os.path.exists(os.path.join(TRAIN_DIR, 'cats')) else 'cat'
sample_dir = os.path.join(TRAIN_DIR, sample_cls)
sample_file = os.path.join(sample_dir, os.listdir(sample_dir)[0])
sample_img  = img_to_array(load_img(sample_file, target_size=(224, 224)))
sample_img  = sample_img.reshape((1,) + sample_img.shape)

fig, axes = plt.subplots(2, 5, figsize=(15, 6))
axes[0, 0].imshow(load_img(sample_file, target_size=(224, 224)))
axes[0, 0].set_title('Original', fontweight='bold')
axes[0, 0].axis('off')

aug_iter = aug_gen.flow(sample_img, batch_size=1)
for i, ax in enumerate(axes.flatten()[1:]):
    batch = next(aug_iter)
    ax.imshow(batch[0].astype('uint8'))
    ax.set_title(f'Augmented {i+1}')
    ax.axis('off')

plt.suptitle('Data Augmentation — Same Image, Different Views for Training', fontsize=13)
plt.tight_layout()
plt.show()

## 8. Build the Model — MobileNetV2 + Custom Head

We split the model into two parts:

**Part 1 — Base (Frozen):** The pretrained MobileNetV2. We freeze its weights so it doesn't forget what it already knows about images.

**Part 2 — Head (Trainable):** Our new layers that learn to classify dogs vs cats.

```
Input Image (224×224×3)
        ↓
MobileNetV2 base (FROZEN — pretrained on ImageNet)
        ↓
GlobalAveragePooling2D  ← squashes feature maps to a vector
        ↓
Dense(128, ReLU) + BatchNorm + Dropout(0.3)
        ↓
Dense(1, Sigmoid)  ← outputs probability: 0=Cat, 1=Dog
```
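
Two parts of this diagram can be checked with simple arithmetic. The sketch below assumes the base outputs a 7×7×1280 feature map for a 224×224 input (MobileNetV2's final feature size at this resolution) and uses random numbers in place of real features.

```python
import numpy as np

# Stand-in for the base output of one image: 7 x 7 spatial grid, 1280 channels
feature_maps = np.random.default_rng(0).random((7, 7, 1280))

# Global average pooling: average each channel over its 7 x 7 grid
pooled = feature_maps.mean(axis=(0, 1))
print('pooled shape:', pooled.shape)

# Parameters in the new head
dense_128          = 1280 * 128 + 128   # weights + biases
batch_norm_learned = 2 * 128            # gamma and beta (trainable)
batch_norm_stats   = 2 * 128            # moving mean and variance (not trainable)
dense_1            = 128 * 1 + 1        # weights + bias

head_trainable = dense_128 + batch_norm_learned + dense_1
print(f'trainable head parameters: {head_trainable:,}')
```

The head adds about 164 thousand trainable parameters, a small fraction of the roughly 3.4 million in the full MobileNetV2.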

In [None]:
def build_model():
    # Load MobileNetV2 pretrained on ImageNet, remove top classifier
    base = MobileNetV2(
        input_shape=(IMG_SIZE, IMG_SIZE, 3),
        include_top=False,       # Remove the ImageNet output layer
        weights='imagenet'       # Use pretrained weights
    )
    base.trainable = False       # Freeze: don't update pretrained weights yet

    # Build our classification head on top
    inputs  = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x       = base(inputs, training=False)           # Run through base
    x       = layers.GlobalAveragePooling2D()(x)     # Average each feature map to one value
    x       = layers.Dense(128, activation='relu')(x)
    x       = layers.BatchNormalization()(x)
    x       = layers.Dropout(0.3)(x)
    outputs = layers.Dense(1, activation='sigmoid')(x)  # Binary output

    model = keras.Model(inputs, outputs)

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss='binary_crossentropy',
        metrics=['accuracy', keras.metrics.AUC(name='auc')]
    )
    return model, base


model, base_model = build_model()

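# Summary: only our new layers are listed (the base has 154 layers inside)
# model.count_params() returns a single integer, so count each group explicitly
trainable_params     = sum(int(np.prod(tuple(w.shape))) for w in model.trainable_weights)
non_trainable_params = sum(int(np.prod(tuple(w.shape))) for w in model.non_trainable_weights)
print(f'Total layers        : {len(model.layers)}')
print(f'Trainable params    : {trainable_params:,}')
print(f'Non-trainable params: {non_trainable_params:,}')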
model.summary()

## 9. Phase 1 — Train the Head (Base Frozen)

First we only train our new dense layers. The MobileNetV2 base stays frozen.

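**Why?** The new head starts with random weights, so its early gradients are large and noisy. If those gradients flowed into an unfrozen base at a learning rate of 1e-3, they could overwrite the carefully learned ImageNet features before the head has learned anything useful.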
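
The training cell for this phase uses a `ReduceLROnPlateau` callback with `factor=0.5` and `min_lr=1e-6`, starting from a learning rate of 1e-3. Each time the monitored value stops improving for `patience` epochs, the learning rate is multiplied by `factor`, but never drops below `min_lr`. The sketch below reproduces only that arithmetic; `lr_after_plateaus` is a hypothetical helper, and the callback's other options such as cooldown are ignored.

```python
def lr_after_plateaus(initial_lr, factor, min_lr, n_plateaus):
    """Learning rate after each of n_plateaus reductions (arithmetic only)."""
    lr = initial_lr
    schedule = [lr]
    for _ in range(n_plateaus):
        lr = max(lr * factor, min_lr)   # shrink, but respect the floor
        schedule.append(lr)
    return schedule

phase1 = lr_after_plateaus(initial_lr=1e-3, factor=0.5, min_lr=1e-6, n_plateaus=3)
for step, lr in enumerate(phase1):
    print(f'after {step} plateau(s): lr = {lr:.2e}')

# With enough plateaus the rate settles at the floor
floor = lr_after_plateaus(1e-3, 0.5, 1e-6, 20)[-1]
print(f'floor: {floor:.0e}')
```

With 10 epochs and a patience of 2, only a few reductions can happen in practice, so the floor is a safety net rather than an expected value.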

In [None]:
# Callbacks — automatic helpers during training
callbacks_phase1 = [
    EarlyStopping(
        monitor='val_auc', patience=3,
        restore_best_weights=True, mode='max',
        verbose=1
    ),
    ReduceLROnPlateau(
        monitor='val_loss', factor=0.5,
        patience=2, min_lr=1e-6, verbose=1
    )
]

print('Phase 1: Training classification head (base frozen)...')
history1 = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=EPOCHS_1,
    callbacks=callbacks_phase1,
    verbose=1
)

## 10. Phase 2 — Fine-Tuning (Unfreeze Top Layers)

Now we unfreeze the **last 30 layers** of MobileNetV2 and continue training with a very small learning rate.

**Why a small LR?** We don't want to destroy learned features — we just want to slightly nudge the base model to better fit our specific dataset.
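
The slice `layers[:-30]` used in the next cell selects everything except the last 30 entries. A minimal sketch with placeholder layer names shows which part ends up frozen. The count of 154 is taken from the comment in the model-building cell, and the names are made up.

```python
# Placeholder names standing in for the base model's layers
layer_names = [f'layer_{i}' for i in range(154)]

# Start with everything trainable, then freeze all but the last 30
trainable = {name: True for name in layer_names}
for name in layer_names[:-30]:
    trainable[name] = False

n_trainable = sum(trainable.values())
first_unfrozen = layer_names[-30]

print(f'trainable: {n_trainable} / {len(layer_names)}')
print(f'first unfrozen layer: {first_unfrozen}')
```

So the first 124 layers stay frozen and only the 30 closest to the output are updated.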

In [None]:
# Unfreeze last 30 layers of the base model
base_model.trainable = True
for layer in base_model.layers[:-30]:
    layer.trainable = False

trainable_count = sum(1 for l in base_model.layers if l.trainable)
print(f'Unfrozen layers in base: {trainable_count} / {len(base_model.layers)}')

# Recompile so the new trainable settings take effect, using a much smaller learning rate
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss='binary_crossentropy',
    metrics=['accuracy', keras.metrics.AUC(name='auc')]
)

callbacks_phase2 = [
    EarlyStopping(
        monitor='val_auc', patience=3,
        restore_best_weights=True, mode='max', verbose=1
    ),
    ReduceLROnPlateau(
        monitor='val_loss', factor=0.3,
        patience=2, min_lr=1e-8, verbose=1
    )
]

print('\nPhase 2: Fine-tuning top layers of MobileNetV2...')
history2 = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=EPOCHS_2,
    callbacks=callbacks_phase2,
    verbose=1
)

## 11. Training Curves — What Happened During Training?

In [None]:
def merge_histories(h1, h2):
    merged = {}
    for key in h1.history:
        merged[key] = h1.history[key] + h2.history.get(key, [])
    return merged

hist = merge_histories(history1, history2)
phase1_end = len(history1.history['loss'])
total_epochs = len(hist['loss'])
ep = range(1, total_epochs + 1)

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

for ax, metric, ylabel in zip(
    axes,
    [('loss','val_loss'), ('accuracy','val_accuracy'), ('auc','val_auc')],
    ['Loss', 'Accuracy', 'ROC-AUC']
):
    train_key, val_key = metric
    ax.plot(ep, hist[train_key], 'o-', color='#3498db', lw=2, markersize=5, label='Train')
    ax.plot(ep, hist[val_key],   'o-', color='#e74c3c', lw=2, markersize=5, label='Validation')

    # Mark phase boundary
    ax.axvline(phase1_end + 0.5, color='black', linestyle='--', lw=1.2, alpha=0.6)
    ax.text(phase1_end * 0.5, ax.get_ylim()[0],
            'Phase 1\n(Frozen)', ha='center', fontsize=8, alpha=0.7)
    ax.text(phase1_end + (total_epochs - phase1_end) * 0.5, ax.get_ylim()[0],
            'Phase 2\n(Fine-tune)', ha='center', fontsize=8, alpha=0.7)

    ax.set_title(f'{ylabel} over Epochs')
    ax.set_xlabel('Epoch')
    ax.set_ylabel(ylabel)
    ax.legend()

plt.suptitle('Training History — Phase 1 (Frozen) + Phase 2 (Fine-tuning)', fontsize=13)
plt.tight_layout()
plt.show()

print(f'Best val accuracy : {max(hist["val_accuracy"]):.4f}')
print(f'Best val AUC      : {max(hist["val_auc"]):.4f}')

## 12. Evaluate on Validation Set

In [None]:
# Predict on entire validation set
val_gen.reset()
y_prob = model.predict(val_gen, verbose=1).flatten()
y_true = val_gen.classes
y_pred = (y_prob >= 0.5).astype(int)

class_names = list(val_gen.class_indices.keys())

print('\nClassification Report')
print('=' * 50)
print(classification_report(y_true, y_pred, target_names=class_names))
print(f'ROC-AUC Score: {roc_auc_score(y_true, y_prob):.4f}')

## 13. Confusion Matrix — Where Does the Model Make Mistakes?
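
Before plotting the real matrix, it may help to see how the four cells are counted and how the headline metrics follow from them. The ten labels below are made up for illustration, with 1 = Dog treated as the positive class.

```python
# Made-up labels: 0 = Cat, 1 = Dog
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

pairs = list(zip(y_true_demo, y_pred_demo))
tn_demo = sum(t == 0 and p == 0 for t, p in pairs)  # cat predicted as cat
fp_demo = sum(t == 0 and p == 1 for t, p in pairs)  # cat predicted as dog
fn_demo = sum(t == 1 and p == 0 for t, p in pairs)  # dog predicted as cat
tp_demo = sum(t == 1 and p == 1 for t, p in pairs)  # dog predicted as dog

accuracy_demo  = (tp_demo + tn_demo) / len(pairs)
precision_demo = tp_demo / (tp_demo + fp_demo)   # of predicted dogs, how many are dogs
recall_demo    = tp_demo / (tp_demo + fn_demo)   # of real dogs, how many were found
f1_demo = 2 * precision_demo * recall_demo / (precision_demo + recall_demo)

print(f'TN={tn_demo} FP={fp_demo} FN={fn_demo} TP={tp_demo}')
print(f'accuracy={accuracy_demo:.2f} precision={precision_demo:.2f} '
      f'recall={recall_demo:.2f} f1={f1_demo:.3f}')
```

The order TN, FP, FN, TP matches what `cm.ravel()` returns for a 2×2 scikit-learn confusion matrix with labels 0 and 1.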

In [None]:
cm = confusion_matrix(y_true, y_pred)

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

# Raw counts
im = axes[0].imshow(cm, cmap='Blues')
axes[0].set_xticks([0,1]); axes[0].set_xticklabels(class_names)
axes[0].set_yticks([0,1]); axes[0].set_yticklabels(class_names)
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('Actual')
axes[0].set_title('Confusion Matrix (counts)')
for i in range(2):
    for j in range(2):
        axes[0].text(j, i, cm[i,j], ha='center', va='center',
                     fontsize=20, fontweight='bold',
                     color='white' if cm[i,j] > cm.max()/2 else 'black')
plt.colorbar(im, ax=axes[0])

# Normalised percentages
cm_norm = cm.astype(float) / cm.sum(axis=1, keepdims=True)
im2 = axes[1].imshow(cm_norm, cmap='Greens', vmin=0, vmax=1)
axes[1].set_xticks([0,1]); axes[1].set_xticklabels(class_names)
axes[1].set_yticks([0,1]); axes[1].set_yticklabels(class_names)
axes[1].set_xlabel('Predicted')
axes[1].set_ylabel('Actual')
axes[1].set_title('Confusion Matrix (normalised)')
for i in range(2):
    for j in range(2):
        axes[1].text(j, i, f'{cm_norm[i,j]:.2%}', ha='center', va='center',
                     fontsize=16, fontweight='bold',
                     color='white' if cm_norm[i,j] > 0.5 else 'black')
plt.colorbar(im2, ax=axes[1])

plt.suptitle('How Well Does the Model Classify?', fontsize=13)
plt.tight_layout()
plt.show()

tn, fp, fn, tp = cm.ravel()
print(f'True Positives  (Dog predicted as Dog)  : {tp}')
print(f'True Negatives  (Cat predicted as Cat)  : {tn}')
print(f'False Positives (Cat predicted as Dog)  : {fp}')
print(f'False Negatives (Dog predicted as Cat)  : {fn}')

## 14. ROC Curve — Visualising Model Discrimination

The ROC curve shows the tradeoff between **catching positives (sensitivity)** and **avoiding false alarms (specificity)**. A perfect model reaches the top-left corner. The area under the curve (AUC) should be as close to 1.0 as possible.
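
Each point on the ROC curve corresponds to one threshold. The sketch below computes the true and false positive rates by hand for a few made-up scores and picks the threshold that maximises Youden's J (TPR minus FPR), the same criterion used in the next cell. The data and the helper `tpr_fpr` are illustrative assumptions.

```python
import numpy as np

# Made-up labels (1 = Dog) and predicted dog probabilities
y_true_demo = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1])
y_prob_demo = np.array([0.1, 0.2, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

def tpr_fpr(y_true, y_prob, threshold):
    """True and false positive rate when predicting Dog for prob >= threshold."""
    pred = y_prob >= threshold
    tpr = (pred & (y_true == 1)).sum() / (y_true == 1).sum()
    fpr = (pred & (y_true == 0)).sum() / (y_true == 0).sum()
    return tpr, fpr

best_t, best_j = None, -1.0
for t in np.unique(y_prob_demo):
    tpr, fpr = tpr_fpr(y_true_demo, y_prob_demo, t)
    j = tpr - fpr
    print(f'threshold={t:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}  J={j:.2f}')
    if j > best_j:
        best_t, best_j = t, j

print(f'best threshold by Youden J: {best_t:.2f} (J = {best_j:.2f})')
```

In this toy data a threshold of 0.40 separates the classes better than the default 0.50, which is the kind of shift the red marker on the ROC plot highlights.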

In [None]:
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
auc_score = roc_auc_score(y_true, y_prob)

# Find optimal threshold (Youden's J statistic)
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold = thresholds[optimal_idx]

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

# ROC Curve
axes[0].plot(fpr, tpr, color='#3498db', lw=2.5, label=f'MobileNetV2 (AUC = {auc_score:.4f})')
axes[0].plot([0,1], [0,1], 'k--', lw=1.2, label='Random Classifier (AUC = 0.5)')
axes[0].scatter(fpr[optimal_idx], tpr[optimal_idx],
                color='red', s=120, zorder=5,
                label=f'Best Threshold = {optimal_threshold:.2f}')
axes[0].set_xlabel('False Positive Rate (1 - Specificity)')
axes[0].set_ylabel('True Positive Rate (Sensitivity)')
axes[0].set_title('ROC Curve')
axes[0].legend()
axes[0].fill_between(fpr, tpr, alpha=0.1, color='#3498db')

# Prediction probability distribution
cats_mask = y_true == 0
dogs_mask = y_true == 1
axes[1].hist(y_prob[cats_mask], bins=40, alpha=0.6, color='#e74c3c',
             label='Cats (actual)', density=True)
axes[1].hist(y_prob[dogs_mask], bins=40, alpha=0.6, color='#3498db',
             label='Dogs (actual)', density=True)
axes[1].axvline(0.5, color='black', ls='--', lw=1.5, label='Default threshold (0.5)')
axes[1].axvline(optimal_threshold, color='green', ls='--', lw=1.5,
                label=f'Optimal threshold ({optimal_threshold:.2f})')
axes[1].set_xlabel('Predicted Probability (Dog)')
axes[1].set_ylabel('Density')
axes[1].set_title('Prediction Score Distribution')
axes[1].legend(fontsize=8)

plt.tight_layout()
plt.show()

## 15. Correct and Wrong Predictions — Visual Inspection

In [None]:
def show_predictions(val_gen, y_true, y_pred, y_prob, class_names,
                     correct=True, n=8):
    mask = (y_pred == y_true) if correct else (y_pred != y_true)
    indices = np.where(mask)[0]
    if len(indices) == 0:
        print('No samples found.')
        return
    indices = np.random.choice(indices, min(n, len(indices)), replace=False)

    cols = min(len(indices), 8)
    fig, axes = plt.subplots(1, cols, figsize=(cols * 2.2, 3))
    if cols == 1: axes = [axes]

    for ax, idx in zip(axes, indices):
        # Load only the selected files instead of holding the whole validation
        # set in memory. val_gen.filepaths follows the same order as val_gen.classes.
        img = load_img(val_gen.filepaths[idx], target_size=(IMG_SIZE, IMG_SIZE))
        ax.imshow(img)
        ax.axis('off')
        prob = y_prob[idx]
        pred_lbl = class_names[y_pred[idx]]
        true_lbl = class_names[y_true[idx]]
        color = '#2ecc71' if correct else '#e74c3c'
        ax.set_title(f'Pred: {pred_lbl}\nTrue: {true_lbl}\n{prob:.2f}',
                     fontsize=8, color=color)

    title = 'Correct Predictions' if correct else 'Wrong Predictions (Mistakes)'
    plt.suptitle(title, fontsize=12, fontweight='bold')
    plt.tight_layout()
    plt.show()

print('Correctly classified samples:')
show_predictions(val_gen, y_true, y_pred, y_prob, class_names, correct=True)

print('\nMisclassified samples (where model was wrong):')
show_predictions(val_gen, y_true, y_pred, y_prob, class_names, correct=False)

## 16. Confidence Analysis — Is the Model Sure or Guessing?
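
The confidence score used in the next cell rescales the distance from 0.5 so that 0 means a coin flip and 1 means a fully certain prediction, whichever class is predicted. A short sketch with made-up probabilities:

```python
import numpy as np

# Made-up predicted dog probabilities
probs_demo = np.array([0.02, 0.45, 0.50, 0.80, 0.99])

# Distance from the 0.5 decision boundary, rescaled to the range 0..1
conf_demo = np.abs(probs_demo - 0.5) * 2

for p, c in zip(probs_demo, conf_demo):
    side = 'Dog' if p >= 0.5 else 'Cat'
    print(f'P(dog)={p:.2f} -> predicts {side}, confidence {c:.2f}')
```

A probability of 0.02 (a confident Cat) and one of 0.99 (a confident Dog) both score close to 1, while 0.45 scores close to 0.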

In [None]:
confidence = np.abs(y_prob - 0.5) * 2  # 0=uncertain, 1=very confident
correct_mask = y_pred == y_true

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Confidence distribution — correct vs wrong
axes[0].hist(confidence[correct_mask],   bins=30, alpha=0.7,
             color='#2ecc71', label='Correct', density=True)
axes[0].hist(confidence[~correct_mask],  bins=30, alpha=0.7,
             color='#e74c3c', label='Wrong',   density=True)
axes[0].set_xlabel('Confidence Score')
axes[0].set_ylabel('Density')
axes[0].set_title('Confidence: Correct vs Wrong Predictions\n'
                  'Correct preds should be more confident')
axes[0].legend()

# Confidence bins: accuracy and count per bucket (empty buckets are skipped)
bins = np.linspace(0, 1, 11)
bin_labels, bin_accs, bucket_counts = [], [], []
for i in range(len(bins)-1):
    if i == len(bins) - 2:
        # Last bucket includes its right edge so confidence == 1.0 is counted
        mask = (confidence >= bins[i]) & (confidence <= bins[i+1])
    else:
        mask = (confidence >= bins[i]) & (confidence < bins[i+1])
    if mask.sum() > 0:
        bin_labels.append(f'{bins[i]:.1f}-{bins[i+1]:.1f}')
        bin_accs.append(correct_mask[mask].mean())
        bucket_counts.append(mask.sum())

bar_colors = ['#2ecc71' if a >= 0.8 else '#e67e22' if a >= 0.6 else '#e74c3c' for a in bin_accs]
axes[1].bar(bin_labels, bin_accs, color=bar_colors, edgecolor='black')
axes[1].axhline(0.5, color='black', ls='--', lw=1, label='Random chance')
axes[1].set_xlabel('Confidence Bucket')
axes[1].set_ylabel('Accuracy in Bucket')
axes[1].set_title('Accuracy vs Confidence Level\n'
                  'Higher confidence should mean higher accuracy')
axes[1].tick_params(axis='x', rotation=45)
axes[1].legend()

# Prediction counts per bucket (same non-empty buckets as the accuracy plot,
# so labels and counts always have matching lengths)
axes[2].bar(bin_labels, bucket_counts, color='#3498db', edgecolor='black')
axes[2].set_xlabel('Confidence Bucket')
axes[2].set_ylabel('Number of Predictions')
axes[2].set_title('How Many Predictions in Each Confidence Range?')
axes[2].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print(f'Avg confidence on correct predictions : {confidence[correct_mask].mean():.4f}')
print(f'Avg confidence on wrong predictions   : {confidence[~correct_mask].mean():.4f}')

## 17. Final Metrics Summary

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

metrics = {
    'Accuracy' : accuracy_score(y_true, y_pred),
    'Precision': precision_score(y_true, y_pred),
    'Recall'   : recall_score(y_true, y_pred),
    'F1 Score' : f1_score(y_true, y_pred),
    'ROC-AUC'  : roc_auc_score(y_true, y_prob)
}

fig, ax = plt.subplots(figsize=(9, 5))
names  = list(metrics.keys())
values = list(metrics.values())
colors = ['#2ecc71' if v >= 0.9 else '#3498db' if v >= 0.8 else '#e67e22' for v in values]

bars = ax.bar(names, values, color=colors, edgecolor='black', width=0.5)
ax.set_ylim(0, 1.1)
ax.axhline(0.9, color='green', ls='--', lw=1, alpha=0.5, label='90% line')
ax.set_ylabel('Score')
ax.set_title('Final Model Performance — MobileNetV2 Dogs vs Cats')
ax.legend()

for bar, val in zip(bars, values):
    ax.text(bar.get_x() + bar.get_width()/2,
            bar.get_height() + 0.01,
            f'{val:.4f}', ha='center', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()

print('\nFinal Metrics:')
for k, v in metrics.items():
    print(f'  {k:12s}: {v:.4f}')

## 18. Predict on a Single New Image

This cell shows how to use the trained model on any new photo.

In [None]:
def predict_single_image(model, img_path, class_names, img_size=224):
    img  = load_img(img_path, target_size=(img_size, img_size))
    arr  = img_to_array(img) / 255.0
    inp  = np.expand_dims(arr, axis=0)
    prob = model.predict(inp, verbose=0)[0][0]

    pred_class = class_names[int(prob >= 0.5)]
    confidence = prob if prob >= 0.5 else 1 - prob

    fig, ax = plt.subplots(figsize=(4, 4))
    ax.imshow(img)
    ax.axis('off')
    color = '#3498db' if pred_class == 'dogs' else '#e74c3c'
    ax.set_title(
        f'Prediction: {pred_class.upper()}\n'
        f'Confidence: {confidence:.2%}\n'
        f'Raw probability (Dog): {prob:.4f}',
        fontsize=11, color=color, fontweight='bold'
    )
    plt.tight_layout()
    plt.show()
    return pred_class, confidence

# Demo with one image from the first class folder
# (the file may belong to either the training or the validation split)
sample_cls_name = class_names[0]
sample_cls_dir  = os.path.join(TRAIN_DIR, sample_cls_name)

demo_file = os.path.join(sample_cls_dir, os.listdir(sample_cls_dir)[10])
pred, conf = predict_single_image(model, demo_file, class_names)
print(f'Predicted: {pred} | Confidence: {conf:.2%}')

## 19. Key Takeaways

### What We Built

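A binary dog-vs-cat image classifier built on a pretrained MobileNetV2. Because only a small head and the last 30 base layers are trained, it needs far less training time than a network trained from scratch. The accuracy actually reached is reported by the metrics cells above.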

### Transfer Learning — Two Phase Strategy

| Phase | What We Did | Why |
|---|---|---|
| Phase 1 | Froze base, trained head only | Protect pretrained features |
| Phase 2 | Unfroze last 30 layers, tiny LR | Adapt features to our data |

### Important Concepts Recap

| Concept | What it means |
|---|---|
| Transfer Learning | Reuse features from a model trained on millions of images |
| Data Augmentation | Artificially increase training variety — prevents overfitting |
| Global Average Pooling | Converts spatial feature maps to a flat vector |
| Sigmoid output | Converts raw score to probability between 0 and 1 |
| Fine-tuning | Slightly updating pretrained layers with a very small LR |
| ROC-AUC | Measures discrimination ability — 1.0 = perfect, 0.5 = random |

### Tips for Better Results

- Use more data: additional, varied images usually improve generalisation
- Try EfficientNetB0 or B2 for better accuracy at moderate size
- Increase fine-tuning epochs carefully — monitor for overfitting
- Use learning rate warmup for more stable fine-tuning
- Consider test-time augmentation (TTA) for a small extra accuracy boost
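
The last tip, test-time augmentation, can be sketched as follows: predict on the original image and on a flipped copy, then average the two probabilities. `toy_predict` is a hypothetical stand-in for `model.predict` so that the example runs without a trained model. With the real model, the inputs would need the same preprocessing as during training.

```python
import numpy as np

def toy_predict(batch):
    """Hypothetical stand-in for model.predict: one probability per image.

    The "probability" is simply the mean brightness of the left half,
    which makes the output depend on the image orientation.
    """
    left_half = batch[:, :, : batch.shape[2] // 2, :]
    return left_half.mean(axis=(1, 2, 3))

def predict_with_tta(predict_fn, image):
    """Average predictions over the original image and its horizontal flip."""
    views = np.stack([image, image[:, ::-1, :]])  # shape (2, H, W, C)
    return float(predict_fn(views).mean())

# 4x4 RGB demo image: bright left half, dark right half
demo_image = np.zeros((4, 4, 3))
demo_image[:, :2, :] = 0.8
demo_image[:, 2:, :] = 0.2

single = float(toy_predict(demo_image[np.newaxis])[0])
averaged = predict_with_tta(toy_predict, demo_image)
print(f'single view: {single:.2f}  TTA average: {averaged:.2f}')
```

Averaging smooths out predictions that depend on incidental details such as orientation, at the cost of one extra forward pass per added view.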