##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using `model.summary()` and observe:
- Network depth
- Number of parameters
- Trainable vs. frozen layers

3. Then compare its performance with ResNet and the custom CNN.
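
The three properties in item 2 can also be read programmatically rather than from the printed summary. The following is a minimal sketch, not part of the lab code: `describe_model` and `count_weights` are hypothetical helpers, and a tiny two-layer model stands in for a pretrained backbone so the example runs without downloading weights.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def count_weights(weights):
    """Sum the number of scalar values across a list of Keras weight variables."""
    return int(sum(np.prod(tuple(w.shape)) for w in weights))


def describe_model(model):
    """Return layer and parameter counts (hypothetical helper, not a Keras API)."""
    return {
        "layers": len(model.layers),
        "trainable_layers": sum(1 for layer in model.layers if layer.trainable),
        "trainable_params": count_weights(model.trainable_weights),
        "frozen_params": count_weights(model.non_trainable_weights),
    }


# Tiny stand-in model so the sketch runs without downloading ImageNet weights.
demo = keras.Sequential([
    layers.Input(shape=(8,)),
    layers.Dense(4, name="frozen_dense"),
    layers.Dense(2, name="head"),
])
demo.layers[0].trainable = False  # freeze the first Dense layer

info = describe_model(demo)
print(info)
```

The same helper could be applied to a backbone such as `mobilenet_base` once it is loaded.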

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

### Step 1: Import Libraries and Load CIFAR-10 Dataset

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mobilenet_preprocess

# Load CIFAR-10
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

class_names = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck"
]

# Keep labels as integers for SparseCategoricalCrossentropy
y_train = y_train.squeeze().astype("int64")
y_test = y_test.squeeze().astype("int64")

# Convert images to float32 but keep the 0-255 range;
# the MobileNetV2 preprocessing inside the model does the rescaling
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")

print(f"x_train shape: {x_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"x_test shape: {x_test.shape}")
print(f"y_test shape: {y_test.shape}")
print(f"Number of classes: {len(class_names)}")
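
To make the label handling concrete, here is a small sketch of why the labels are squeezed. It assumes only that CIFAR-10 labels arrive as a `uint8` array of shape `(N, 1)`; the four label values are made up for illustration.

```python
import numpy as np

class_names = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck"
]

# Synthetic stand-in for CIFAR-10 labels: shape (N, 1), dtype uint8.
raw_labels = np.array([[3], [8], [0], [6]], dtype="uint8")

# squeeze() drops the trailing axis so each label is a plain integer index.
flat_labels = raw_labels.squeeze().astype("int64")

print(raw_labels.shape, "->", flat_labels.shape)
print([class_names[i] for i in flat_labels])
```

With one integer per image, the labels can index `class_names` directly and match what `SparseCategoricalCrossentropy` expects.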

### Step 2: Data Augmentation

In [None]:
# Data augmentation layers: active only during training, pass-through at inference
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
], name="augmentation")
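
Because the augmentation sits inside the model, it is worth checking that it only alters images in training mode. The sketch below rebuilds the same three layers under a separate name (`augment`, chosen for illustration) and feeds them a random batch standing in for CIFAR-10 images.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

augment = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
], name="augmentation_demo")

# Random 32x32 RGB batch standing in for CIFAR-10 images.
rng = np.random.default_rng(0)
batch = rng.uniform(0, 255, size=(4, 32, 32, 3)).astype("float32")

out_inference = np.asarray(augment(batch, training=False))
out_training = np.asarray(augment(batch, training=True))

print("unchanged at inference:", np.allclose(out_inference, batch))
print("output shape in training mode:", out_training.shape)
```

This is why `evaluate()` and `predict()` see the original test images even though the augmentation layers are part of the model.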

### Step 3: Build MobileNetV2 Model (Frozen Backbone)

MobileNetV2 is a lightweight pretrained model built from depthwise separable convolutions. It has roughly a tenth of ResNet50V2's parameters and needs far less computation per image, usually at some cost in accuracy.

In [None]:
# Load MobileNetV2 pretrained on ImageNet (without the top classification layer)
mobilenet_base = MobileNetV2(
    include_top=False,
    weights="imagenet",
    input_shape=(224, 224, 3)
)
mobilenet_base.trainable = False  # Freeze all layers (feature extraction)

# Build the full model with preprocessing inside the pipeline
mobilenet_model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    data_augmentation,
    layers.Resizing(224, 224, interpolation="bilinear"),
    layers.Lambda(mobilenet_preprocess),  # scales pixels from [0, 255] to [-1, 1]
    mobilenet_base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10)  # logits for 10 CIFAR-10 classes
], name="cifar10_mobilenetv2")

mobilenet_model.summary()
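
The preprocessing step is the reason the images are left in the 0-255 range earlier. As a rough NumPy sketch of the scaling that `mobilenet_v2.preprocess_input` applies (the function name `mobilenet_v2_scale` is made up for this illustration):

```python
import numpy as np


def mobilenet_v2_scale(pixels):
    """NumPy equivalent of the MobileNetV2 scaling: [0, 255] -> [-1, 1]."""
    return pixels.astype("float32") / 127.5 - 1.0


pixels = np.array([0.0, 127.5, 255.0], dtype="float32")
scaled = mobilenet_v2_scale(pixels)
print("correct input range:", scaled)

# If the images had already been divided by 255, the same scaling would
# squash everything into a narrow band near -1.
already_normalized = pixels / 255.0
squashed = mobilenet_v2_scale(already_normalized)
print("pre-normalized input:", squashed)
```

Feeding images normalized to [0, 1] into this pipeline would therefore give the backbone inputs far from what it saw during ImageNet pretraining.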

### Step 4: Inspect the Architecture

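**Observations from `model.summary()`:**
- **Network depth:** `mobilenet_model.summary()` lists the backbone as a single nested layer; use `mobilenet_base.summary()` or `len(mobilenet_base.layers)` to see its depth. Expect roughly 154 Keras layers for MobileNetV2 versus roughly 190 for ResNet50V2 (exact counts can vary with the Keras version).
- **Total parameters:** ~2.26M (2,257,984 in the backbone plus 12,810 in the head), versus ~23.6M for ResNet50V2, so about 10x smaller.
- **Trainable vs frozen:** Only the Dense classification head (12,810 params) is trainable; the entire MobileNetV2 backbone is frozen.
- MobileNetV2 uses depthwise separable convolutions, which drastically reduce the parameter count while still extracting useful features.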
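
The head size and the size ratio quoted above follow from simple arithmetic. A short sketch, assuming the standard MobileNetV2 width (1280 output channels) and using the rounded backbone sizes from the analysis:

```python
def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# MobileNetV2 (alpha=1.0) ends in 1280 feature channels, so after global
# average pooling the head maps 1280 features to 10 classes.
head_params = dense_params(1280, 10)

# Rounded backbone sizes as quoted in the analysis (approximate).
resnet50v2_params = 23.6e6
mobilenetv2_params = 2.2e6
size_ratio = resnet50v2_params / mobilenetv2_params

print(f"Head parameters: {head_params:,}")
print(f"ResNet50V2 / MobileNetV2 parameter ratio: {size_ratio:.1f}x")
```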

### Step 5: Train MobileNetV2 (Frozen Backbone)

In [None]:
# Compile the model
mobilenet_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=3, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1),
]

# Train with frozen backbone
import time
start_time = time.time()

history_mobilenet = mobilenet_model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=5,
    batch_size=64,
    callbacks=callbacks,
    verbose=1
)

mobilenet_frozen_time = time.time() - start_time
print(f"\nTraining time (frozen): {mobilenet_frozen_time:.1f} seconds")
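
It helps to know which images `validation_split=0.1` actually holds out. Keras takes the validation samples from the end of the arrays, before any shuffling. The sketch below mirrors that behavior in NumPy; `split_like_validation_split` is a hypothetical helper, not a Keras function.

```python
import math
import numpy as np


def split_like_validation_split(num_samples, validation_split):
    """Return (train_indices, val_indices), taking validation from the end."""
    split_at = int(math.floor(num_samples * (1.0 - validation_split)))
    indices = np.arange(num_samples)
    return indices[:split_at], indices[split_at:]


train_idx, val_idx = split_like_validation_split(50_000, 0.1)
steps_per_epoch = math.ceil(len(train_idx) / 64)  # batch_size=64

print(f"train samples: {len(train_idx)}, validation samples: {len(val_idx)}")
print(f"first validation index: {val_idx[0]}")
print(f"steps per epoch: {steps_per_epoch}")
```

So each epoch trains on the first 45,000 images and validates on the last 5,000, which is worth keeping in mind if the dataset is ever sorted by class.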

### Step 6: Evaluate MobileNetV2 (Frozen Backbone)

In [None]:
# Evaluate on test set
test_loss_m, test_acc_m = mobilenet_model.evaluate(x_test, y_test, verbose=0)
print(f"MobileNetV2 (frozen) test accuracy: {test_acc_m:.4f}")
print(f"MobileNetV2 (frozen) test loss:     {test_loss_m:.4f}")
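
Since the final `Dense(10)` layer outputs logits rather than probabilities, predictions need a softmax before they can be read as class confidences. A minimal NumPy sketch, using made-up logits in place of real `mobilenet_model.predict(...)` output:

```python
import numpy as np

class_names = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck"
]


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical logits for two images (illustration only, not model output).
logits = np.zeros((2, 10), dtype="float32")
logits[0, 3] = 4.0
logits[1, 8] = 2.5

probs = softmax(logits)
predicted = [class_names[i] for i in probs.argmax(axis=-1)]
print(predicted)
print(probs.max(axis=-1))
```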

### Step 7: Fine-tune MobileNetV2

Unfreeze the last 30 layers of the MobileNetV2 backbone and train with a much smaller learning rate (1e-5 instead of 1e-3), so the pretrained features adapt to CIFAR-10 without being overwritten.

In [None]:
# Unfreeze the last 30 layers for fine-tuning
mobilenet_base.trainable = True
for layer in mobilenet_base.layers[:-30]:
    layer.trainable = False

num_trainable = sum(layer.trainable for layer in mobilenet_base.layers)
print(f"Trainable layers in backbone: {num_trainable} / {len(mobilenet_base.layers)}")

# Recompile with a small learning rate
mobilenet_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

# Fine-tune
start_time = time.time()

history_ft = mobilenet_model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=5,
    batch_size=64,
    verbose=1
)

mobilenet_ft_time = time.time() - start_time
print(f"\nFine-tuning time: {mobilenet_ft_time:.1f} seconds")

# Evaluate fine-tuned model
test_loss_mft, test_acc_mft = mobilenet_model.evaluate(x_test, y_test, verbose=0)
print(f"\nMobileNetV2 (fine-tuned) test accuracy: {test_acc_mft:.4f}")
print(f"MobileNetV2 (fine-tuned) test loss:     {test_loss_mft:.4f}")

### Step 8: Compare Performance — MobileNetV2 vs ResNet50V2 vs Custom CNN

In [None]:
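# Results from the reference lab notebook (ResNet50V2).
# The custom CNN's test accuracy is not recorded here; add it to the table and
# chart below to complete the three-way comparison.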
resnet_frozen_acc = 0.8742
resnet_finetuned_acc = 0.9162

# Results from our MobileNetV2 model:
mobilenet_frozen_acc = float(test_acc_m)
mobilenet_finetuned_acc = float(test_acc_mft)

# Comparison table
print("=" * 60)
print(f"{'Model':<35} {'Test Accuracy':>15}")
print("=" * 60)
print(f"{'ResNet50V2 (frozen)':<35} {resnet_frozen_acc:>15.4f}")
print(f"{'ResNet50V2 (fine-tuned)':<35} {resnet_finetuned_acc:>15.4f}")
print(f"{'MobileNetV2 (frozen)':<35} {mobilenet_frozen_acc:>15.4f}")
print(f"{'MobileNetV2 (fine-tuned)':<35} {mobilenet_finetuned_acc:>15.4f}")
print("=" * 60)

# Bar chart comparison
models = ["ResNet50V2\n(frozen)", "ResNet50V2\n(fine-tuned)", "MobileNetV2\n(frozen)", "MobileNetV2\n(fine-tuned)"]
accuracies = [resnet_frozen_acc, resnet_finetuned_acc, mobilenet_frozen_acc, mobilenet_finetuned_acc]
colors = ["#4a90d9", "#2c5aa0", "#e8913a", "#c46d1a"]

plt.figure(figsize=(10, 5))
bars = plt.bar(models, accuracies, color=colors, edgecolor="black", linewidth=0.5)
for bar, acc in zip(bars, accuracies):
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.005,
             f"{acc:.4f}", ha="center", va="bottom", fontweight="bold")
plt.ylabel("Test Accuracy")
plt.title("CIFAR-10 Classification: Model Comparison")
plt.ylim(min(0.7, min(accuracies) - 0.05), 1.0)  # keep every bar visible if an accuracy falls below 0.7
plt.tight_layout()
plt.show()

### Step 9: Analysis Questions

**1. Which model achieved the highest accuracy?**

Among the reference results, fine-tuned ResNet50V2 scores highest at 91.62%; check the comparison table from your own run to confirm that MobileNetV2 stays below it. A higher score for ResNet50V2 is plausible because it is a deeper model with roughly ten times as many parameters (~23.6M), which lets it capture more complex feature hierarchies. MobileNetV2 typically comes close despite its much smaller size.

**2. Which model trained faster?**

MobileNetV2 is expected to train faster than ResNet50V2; to confirm it, compare `mobilenet_frozen_time` and `mobilenet_ft_time` with the ResNet50V2 timings measured on the same hardware. The main reason is computation rather than parameter count alone: the depthwise separable convolutions in MobileNetV2 need roughly an order of magnitude fewer multiply-add operations per image, which speeds up both training and inference.
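
For a fair speed comparison, total wall-clock time should be divided by the number of epochs that actually ran, since `EarlyStopping` can end training early. A small sketch with a hypothetical helper and made-up numbers (not measured results):

```python
def seconds_per_epoch(total_seconds, history_dict):
    """Average wall-clock seconds per completed epoch."""
    epochs_run = len(history_dict["loss"])
    return total_seconds / epochs_run


# Hypothetical timing and history, for illustration only (not measured results).
example_history = {"loss": [1.2, 0.9, 0.8]}
example_seconds = 150.0

rate = seconds_per_epoch(example_seconds, example_history)
print(f"{rate:.1f} s/epoch")
```

In this notebook the call would look like `seconds_per_epoch(mobilenet_frozen_time, history_mobilenet.history)`. The first epoch usually includes some one-off setup overhead, so averages over only a few epochs are rough.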

**3. How might the architecture explain the differences?**

- **ResNet50V2** uses standard convolutions with residual (skip) connections across roughly 190 Keras layers. The skip connections help gradients flow during training, which makes very deep networks trainable. Its large parameter count lets it learn rich, detailed feature representations, which tends to give higher accuracy at the cost of more computation and memory.

- **MobileNetV2** uses depthwise separable convolutions (a per-channel spatial filter followed by a 1x1 convolution that mixes channels) and inverted residual blocks with linear bottlenecks across roughly 154 Keras layers. This design sharply reduces parameters and computation while keeping reasonable accuracy. The trade-off is somewhat lower representational capacity than ResNet50V2.

- Both models benefit from **transfer learning** (pretrained on ImageNet), which gives them a strong starting point. **Fine-tuning** further improves performance by adapting the pretrained features to CIFAR-10's specific visual patterns.
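
The saving from depthwise separable convolutions can be worked out directly. The sketch below compares one standard convolution with its depthwise separable counterpart, assuming stride 1, "same" padding, and no biases; the layer shape (56x56 feature map, 3x3 kernel, 64 to 128 channels) is an arbitrary example rather than a specific layer from either network.

```python
def standard_conv_cost(h, w, k, c_in, c_out):
    """Parameters and multiply-adds of a standard k x k convolution."""
    params = k * k * c_in * c_out
    macs = h * w * params
    return params, macs


def depthwise_separable_cost(h, w, k, c_in, c_out):
    """Parameters and multiply-adds of a depthwise k x k plus pointwise 1x1 convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixing channels
    params = depthwise + pointwise
    macs = h * w * params
    return params, macs


std_params, std_macs = standard_conv_cost(56, 56, 3, 64, 128)
ds_params, ds_macs = depthwise_separable_cost(56, 56, 3, 64, 128)
ratio = std_params / ds_params

print(f"standard:  {std_params:,} params, {std_macs:,} multiply-adds")
print(f"separable: {ds_params:,} params, {ds_macs:,} multiply-adds")
print(f"reduction: {ratio:.1f}x")
```

For a 3x3 kernel the reduction approaches 9x as the number of output channels grows, which is where much of MobileNetV2's efficiency comes from.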