##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using model.summary() and observe:
- Network depth
- Number of parameters
- Trainable vs Frozen layers

3. Then compare its performance with ResNet and the custom CNN.
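
The inspection in step 2 can be partly automated. The sketch below is one possible helper and is not part of the original exercise: `count_params` and `trainable_report` are hypothetical names. It assumes only that the model exposes `trainable_weights` and `non_trainable_weights` whose items have a `.shape`, as Keras models do.

```python
import math


def count_params(weights):
    """Sum the number of scalar entries across a list of weight variables."""
    return sum(math.prod(tuple(w.shape)) for w in weights)


def trainable_report(model):
    """Return trainable, frozen, and total parameter counts.

    Hypothetical helper: it relies only on the `trainable_weights` and
    `non_trainable_weights` attributes of a Keras-style model.
    """
    trainable = count_params(model.trainable_weights)
    frozen = count_params(model.non_trainable_weights)
    total = trainable + frozen
    return {
        "trainable": trainable,
        "frozen": frozen,
        "total": total,
        "trainable_fraction": trainable / total if total else 0.0,
    }
```

Calling `trainable_report(model)` once with the backbone frozen and again after unfreezing shows how the split changes between the two phases. Non-trainable weights include the moving statistics of BatchNormalization layers, so the frozen count does not drop to zero even when every layer is marked trainable.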

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, applications

# Load CIFAR-10 Dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
y_train, y_test = y_train.squeeze().astype("int64"), y_test.squeeze().astype("int64")
x_train, x_test = x_train.astype("float32"), x_test.astype("float32")

#  Data Augmentation 
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
], name="augmentation")

#  Build MobileNetV2 Backbone
mobilenet_base = applications.MobileNetV2(
    weights='imagenet', 
    include_top=False, 
    input_shape=(224, 224, 3)
)
mobilenet_base.trainable = False  # Initially frozen

# Full Model Construction
model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    data_augmentation,
    layers.Resizing(224, 224, interpolation="bilinear"),
    layers.Lambda(applications.mobilenet_v2.preprocess_input),
    mobilenet_base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10)
], name="cifar10_mobilenetv2_comparison")

# ARCHITECTURE INSPECTION 
print("--- Architecture Observation ---")
print(f"Total layers in backbone (Network Depth): {len(mobilenet_base.layers)}")
model.summary() 


#  Phase 1: Feature Extraction (Frozen Backbone)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

print("\nStarting Phase 1: Feature Extraction...")
model.fit(x_train, y_train, epochs=3, batch_size=64, validation_split=0.1)

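#  Phase 2: Fine-Tuning (Unfreezing top layers)
# Unfreeze the backbone, then re-freeze everything except its last 30 layers.
# Note: BatchNormalization layers among those 30 also run in training mode and
# update their moving statistics, which can unsettle the first fine-tuning epoch.
mobilenet_base.trainable = True
for layer in mobilenet_base.layers[:-30]:
    layer.trainable = False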

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

print("\nStarting Phase 2: Fine-Tuning...")
model.fit(x_train, y_train, epochs=3, batch_size=64, validation_split=0.1)

--- Architecture Observation ---
Total layers in backbone (Network Depth): 154



Starting Phase 1: Feature Extraction...
Epoch 1/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 81s 107ms/step - accuracy: 0.5894 - loss: 1.1805 - val_accuracy: 0.7968 - val_loss: 0.5878
Epoch 2/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 75s 107ms/step - accuracy: 0.7377 - loss: 0.7491 - val_accuracy: 0.8356 - val_loss: 0.4862
Epoch 3/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 76s 108ms/step - accuracy: 0.7576 - loss: 0.6972 - val_accuracy: 0.8260 - val_loss: 0.5079

Starting Phase 2: Fine-Tuning...
Epoch 1/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 109s 142ms/step - accuracy: 0.6763 - loss: 0.9334 - val_accuracy: 0.8202 - val_loss: 0.5330
Epoch 2/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 99s 140ms/step - accuracy: 0.7704 - loss: 0.6615 - val_accuracy: 0.8368 - val_loss: 0.4694
Epoch 3/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 100s 142ms/step - accuracy: 0.7


In [4]:
# FINAL EVALUATION 
print("\n--- Final Model Evaluation ---")
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=1)
print(f"MobileNetV2 Final Test Accuracy: {test_acc:.4f}")
print(f"MobileNetV2 Final Test Loss: {test_loss:.4f}")


--- Final Model Evaluation ---
313/313 ━━━━━━━━━━━━━━━━━━━━ 16s 51ms/step - accuracy: 0.8412 - loss: 0.4535
MobileNetV2 Final Test Accuracy: 0.8460
MobileNetV2 Final Test Loss: 0.4455


### Questions:

- **Which model achieved the highest accuracy?**

ResNet50V2 achieved the highest accuracy of the three models. For reference, MobileNetV2 reached a test accuracy of 0.8460 in the run above.

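- **Which model trained faster?**

The custom CNN was the fastest to train. Of the two large pre-trained models, MobileNetV2 was significantly faster than ResNet50V2; in the run above it took about 75–81 s per epoch with the backbone frozen and about 99–109 s per epoch during fine-tuning.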
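
To put numbers behind a speed comparison, the per-epoch durations can be read from the Keras progress lines. The sketch below is a minimal example, assuming the cleaned log format shown in the MobileNetV2 output above; `epoch_seconds` is a hypothetical helper name, and the log strings are the timing portions of that output with the progress bar and metrics trimmed.

```python
import re

# Matches the whole-epoch duration, e.g. "81s" in "81s 107ms/step".
EPOCH_TIME = re.compile(r"(\d+)s\s+\d+ms/step")


def epoch_seconds(log_text):
    """Extract per-epoch durations (in seconds) from Keras progress lines."""
    return [int(seconds) for seconds in EPOCH_TIME.findall(log_text)]


# Timing portions of the MobileNetV2 run above (metrics trimmed).
PHASE_1_LOG = """
704/704 81s 107ms/step
704/704 75s 107ms/step
704/704 76s 108ms/step
"""
PHASE_2_LOG = """
704/704 109s 142ms/step
704/704 99s 140ms/step
704/704 100s 142ms/step
"""

phase_1 = epoch_seconds(PHASE_1_LOG)
phase_2 = epoch_seconds(PHASE_2_LOG)

print(f"Phase 1 (frozen backbone): {sum(phase_1)} s over {len(phase_1)} epochs")
print(f"Phase 2 (fine-tuning):     {sum(phase_2)} s over {len(phase_2)} epochs")
print(f"Total training time:       {sum(phase_1) + sum(phase_2)} s")
```

For this run the totals are 232 s for phase 1 and 308 s for phase 2, about 9 minutes overall. Applying the same function to the ResNet50V2 and custom CNN logs, which are not reproduced in this notebook, would give a like-for-like comparison, provided all three models were trained on the same hardware with the same batch size.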

- **How might the architecture explain the differences?**

**ResNet50V2 (Residual Learning):** Its higher accuracy is likely due in large part to residual blocks (skip connections). Their identity shortcuts let gradients flow directly to earlier layers, so the model can be much deeper (190 Keras layers in total) without the gradients vanishing.

**MobileNetV2 (Inverted Residuals):** This model is optimized for mobile efficiency rather than maximum depth. It uses depthwise separable convolutions, which for 3×3 kernels need roughly 8–9 times fewer multiply-add operations (FLOPs) than standard convolutions with the same input and output channels. This is a main reason its backbone has only about 2.3 million parameters compared to ResNet50V2's 23.5 million, and it helps explain the faster training despite a slight drop in accuracy.
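
The 8–9× figure can be checked with a short calculation. The sketch below counts multiply-accumulate operations for a single convolution layer in isolation; the function names and the 14×14 feature-map size are illustrative assumptions, not values taken from MobileNetV2 itself.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates for a standard k x k convolution (stride 1, same padding)."""
    return k * k * c_in * c_out * h * w


def separable_conv_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates for a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1 x 1 conv mixes the channels
    return depthwise + pointwise


def reduction_factor(k, c_in, c_out, h=14, w=14):
    """How many times fewer operations the separable version needs."""
    return standard_conv_macs(k, c_in, c_out, h, w) / separable_conv_macs(k, c_in, c_out, h, w)


# Illustrative channel widths; the factor approaches k * k = 9 as c_out grows.
for channels in (64, 256, 1024):
    factor = reduction_factor(3, channels, channels)
    print(f"{channels:>4} channels: {factor:.2f}x fewer operations")
```

The cost ratio simplifies to $1/c_{out} + 1/k^2$, so with 3×3 kernels the saving stays just below 9× and sits near 8× for narrower layers. A full inverted residual block also contains a 1×1 expansion convolution, so the saving for the whole network is not the same as this single-layer figure.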

**Custom CNN:** This model consists of only a few basic convolutional and pooling layers. Its speed comes from this simplicity, but it lacks the pre-trained weights and architectural depth needed to reach the accuracy of the pre-trained models.
