##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using `model.summary()` and observe:
   - Network depth
   - Number of parameters
   - Trainable vs. frozen layers

3. Then compare its performance with ResNet and the custom CNN.
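Step 2 asks for trainable vs. frozen counts. `model.summary()` reports these as parameter totals ("Trainable params" / "Non-trainable params"). In contrast, `len(model.trainable_weights)` counts weight *tensors*, so a Dense layer's kernel and bias count as 2.

Below is a minimal sketch of a helper that reports both. It assumes only that the model exposes Keras's `trainable_weights` and `non_trainable_weights` lists, each item having a `.shape`. The names `count_params`, `trainability_report`, and the stand-in object are illustrative, not part of Keras.

```python
import math
from types import SimpleNamespace


def count_params(weights):
    """Total number of scalar parameters in a list of weight tensors.

    Each item only needs a `.shape` attribute (Keras variables have one).
    """
    return sum(math.prod(int(d) for d in w.shape) for w in weights)


def trainability_report(model):
    """Contrast weight-tensor counts with parameter counts for a Keras-style model."""
    return {
        "trainable_tensors": len(model.trainable_weights),
        "frozen_tensors": len(model.non_trainable_weights),
        "trainable_params": count_params(model.trainable_weights),
        "frozen_params": count_params(model.non_trainable_weights),
    }


# Hypothetical stand-in whose only trainable layer is a Dense(1280 -> 10) head,
# so the sketch runs without TensorFlow.
fake_head = SimpleNamespace(
    trainable_weights=[SimpleNamespace(shape=(1280, 10)), SimpleNamespace(shape=(10,))],
    non_trainable_weights=[],
)
print(trainability_report(fake_head))
```

Called as `trainability_report(mobilenet)` on the frozen MobileNetV2 model, the trainable part should be the 2 tensors of the Dense head: 1280 × 10 + 10 = 12,810 parameters.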

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

In [1]:
```python
import time
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Reproducibility (optional)
tf.random.set_seed(42)
np.random.seed(42)

# Load CIFAR-10 (labels come as shape (N, 1); squeeze them to (N,) integer class ids)
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
y_train = y_train.squeeze().astype("int64")
y_test  = y_test.squeeze().astype("int64")
x_train = x_train.astype("float32")
x_test  = x_test.astype("float32")

print("Train:", x_train.shape, y_train.shape)
print("Test :", x_test.shape, y_test.shape)
```


```
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 11s 0us/step
Train: (50000, 32, 32, 3) (50000,)
Test : (10000, 32, 32, 3) (10000,)
```
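The training cells below call `fit(..., validation_split=0.1, batch_size=128)`. A quick arithmetic check, sketched in plain Python, shows how this splits the 50,000 training images and why the progress bars show 352 steps per epoch.

```python
import math

n_train_total = 50_000      # x_train.shape[0] printed above
validation_split = 0.1      # as passed to fit() in the training cells
batch_size = 128

# Keras holds out the *last* 10% of the arrays (before shuffling) for validation.
n_val = int(n_train_total * validation_split)
n_fit = n_train_total - n_val

# The final partial batch still counts as a step.
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)  # 45000 5000 352
```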


In [2]:
```python
# Small custom CNN baseline (fast)
cnn = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Rescaling(1./255),

    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),

    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),

    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.GlobalAveragePooling2D(),

    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(10)  # logits
], name="custom_cnn")

cnn.summary()

cnn.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

t0 = time.time()
hist_cnn = cnn.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=3,
    batch_size=128,
    verbose=1
)
cnn_time = time.time() - t0

cnn_loss, cnn_acc = cnn.evaluate(x_test, y_test, verbose=0)
print(f"Custom CNN test acc: {cnn_acc:.4f} | time: {cnn_time:.1f}s")
```


```
Epoch 1/3
352/352 ━━━━━━━━━━━━━━━━━━━━ 99s 276ms/step - accuracy: 0.2085 - loss: 2.0614 - val_accuracy: 0.3868 - val_loss: 1.6413
Epoch 2/3
352/352 ━━━━━━━━━━━━━━━━━━━━ 139s 267ms/step - accuracy: 0.3701 - loss: 1.6652 - val_accuracy: 0.4514 - val_loss: 1.5100
Epoch 3/3
352/352 ━━━━━━━━━━━━━━━━━━━━ 97s 276ms/step - accuracy: 0.4347 - loss: 1.5298 - val_accuracy: 0.4874 - val_loss: 1.3901
Custom CNN test acc: 0.4871 | time: 381.2s
```
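Both models end in `Dense(10)` without an activation and compile with `from_logits=True`, so the loss applies the softmax internally. Below is a NumPy sketch of that computation; the function name is illustrative, not a Keras API. It also gives a rough sanity check: a 10-class head whose logits are all equal predicts 0.1 per class, giving a loss of ln(10) ≈ 2.30.

```python
import numpy as np


def sparse_ce_from_logits(logits, labels):
    """Mean cross-entropy for integer labels and raw (unnormalized) logits.

    Same quantity as SparseCategoricalCrossentropy(from_logits=True):
    a numerically stable log-softmax, then the log-probability of the true class.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    shifted = logits - logits.max(axis=1, keepdims=True)   # stability shift
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


# Equal logits -> p = 0.1 for every class -> loss = ln(10)
uniform = np.zeros((4, 10))
print(sparse_ce_from_logits(uniform, [0, 3, 7, 9]))  # ≈ 2.3026
```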


In [3]:
```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

# Light augmentation (these random layers are only active during training)
augment = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
], name="augment")

# MobileNetV2 backbone (lightweight)
base_model = MobileNetV2(
    include_top=False,
    weights="imagenet",
    input_shape=(160, 160, 3)  # smaller than the default 224 -> less compute per image
)
base_model.trainable = False

# Build model
inputs = keras.Input(shape=(32, 32, 3))
x = augment(inputs)
x = layers.Resizing(160, 160, interpolation="bilinear")(x)  # upsample 32x32 -> 160x160
x = layers.Lambda(preprocess_input)(x)  # IMPORTANT: maps [0, 255] to the range the ImageNet weights expect
x = base_model(x, training=False)  # keep BatchNorm layers in inference mode
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(10)(x)  # logits

mobilenet = keras.Model(inputs, outputs, name="cifar10_mobilenetv2")

# Requirement: inspect architecture BEFORE training
mobilenet.summary()
# Note: len(...) counts weight tensors (e.g. a Dense kernel + bias = 2), not scalar
# parameters; the parameter totals are reported by summary() above.
print("Trainable weights:", len(mobilenet.trainable_weights), "Frozen weights:", len(mobilenet.non_trainable_weights))

mobilenet.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

t0 = time.time()
hist_mnet = mobilenet.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=3,
    batch_size=128,
    verbose=1
)
mnet_time = time.time() - t0

mnet_loss, mnet_acc = mobilenet.evaluate(x_test, y_test, verbose=0)
print(f"MobileNetV2 (frozen) test acc: {mnet_acc:.4f} | time: {mnet_time:.1f}s")
```


```
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_160_no_top.h5
9406464/9406464 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
Trainable weights: 2 Frozen weights: 260
Epoch 1/3
352/352 ━━━━━━━━━━━━━━━━━━━━ 1177s 3s/step - accuracy: 0.5345 - loss: 1.3691 - val_accuracy: 0.8236 - val_loss: 0.5248
Epoch 2/3
352/352 ━━━━━━━━━━━━━━━━━━━━ 1147s 3s/step - accuracy: 0.7310 - loss: 0.7749 - val_accuracy: 0.8358 - val_loss: 0.4802
Epoch 3/3
352/352 ━━━━━━━━━━━━━━━━━━━━ 1161s 3s/step - accuracy: 0.7517 - loss: 0.7164 - val_accuracy: 0.8384 - val_loss: 0.4748
MobileNetV2 (frozen) test acc: 0.8296 | time: 3487.3s
```
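The `Lambda(preprocess_input)` step matters because the ImageNet weights were trained on inputs in a specific range. For MobileNetV2, `preprocess_input` rescales pixels from [0, 255] to [-1, 1]. Below is a NumPy sketch of that mapping; the function name is illustrative and the code does not call Keras.

```python
import numpy as np


def mobilenet_v2_style_preprocess(x):
    """Scale pixels from [0, 255] to [-1, 1], i.e. x / 127.5 - 1.

    Same scaling as keras.applications.mobilenet_v2.preprocess_input,
    written in NumPy so it runs without TensorFlow.
    """
    return np.asarray(x, dtype=np.float32) / 127.5 - 1.0


pixels = np.array([0.0, 127.5, 255.0])
print(mobilenet_v2_style_preprocess(pixels))  # maps 0 -> -1, 127.5 -> 0, 255 -> 1
```

Because the mapping is linear, the built-in layer `layers.Rescaling(1 / 127.5, offset=-1)` should produce the same result as the `Lambda` wrapper. It also avoids the serialization caveats of `Lambda` layers when saving the model.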


In [4]:
```python
# Fine-tune last layers (small LR)
base_model.trainable = True
for layer in base_model.layers[:-20]:  # keep everything except the last 20 layers frozen
    layer.trainable = False

# Counts layers flagged trainable, including weightless ones such as ReLU and Add
print("Trainable backbone layers:", sum(l.trainable for l in base_model.layers), "/", len(base_model.layers))

# Recompile so the new trainable flags take effect; a low LR protects the pretrained features
mobilenet.compile(
    optimizer=keras.optimizers.Adam(1e-5),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

# Training continues from the weights reached in the frozen stage
t0 = time.time()
hist_mnet_ft = mobilenet.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=3,
    batch_size=128,
    verbose=1
)
mnet_ft_time = time.time() - t0

mnet_ft_loss, mnet_ft_acc = mobilenet.evaluate(x_test, y_test, verbose=0)
print(f"MobileNetV2 (fine-tuned) test acc: {mnet_ft_acc:.4f} | time: {mnet_ft_time:.1f}s")
```


```
Trainable backbone layers: 20 / 154
Epoch 1/3
352/352 ━━━━━━━━━━━━━━━━━━━━ 1419s 4s/step - accuracy: 0.6676 - loss: 0.9870 - val_accuracy: 0.8488 - val_loss: 0.4457
Epoch 2/3
352/352 ━━━━━━━━━━━━━━━━━━━━ 1383s 4s/step - accuracy: 0.7447 - loss: 0.7481 - val_accuracy: 0.8560 - val_loss: 0.4276
Epoch 3/3
352/352 ━━━━━━━━━━━━━━━━━━━━ 1378s 4s/step - accuracy: 0.7649 - loss: 0.6852 - val_accuracy: 0.8600 - val_loss: 0.4154
MobileNetV2 (fine-tuned) test acc: 0.8475 | time: 4185.6s
```
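The fine-tuning cell sets `trainable` per layer. The printed "20 / 154" counts layers, not layers that hold weights: activation and `Add` layers in the unfrozen tail have no weights, so fewer than 20 layers actually receive updates.

The toy sketch below illustrates the slicing rule and this difference. It uses hypothetical stand-in layers (`FakeLayer`), not real MobileNetV2 layers, and `unfreeze_last` is an illustrative helper rather than a Keras function.

```python
from dataclasses import dataclass


@dataclass
class FakeLayer:
    """Hypothetical stand-in for a Keras layer (no TensorFlow needed)."""
    name: str
    n_weights: int          # number of weight tensors (0 for ReLU, Add, ...)
    trainable: bool = True


def unfreeze_last(layers, k):
    """Freeze every layer except the last k; return (#trainable, #trainable with weights)."""
    for i, layer in enumerate(layers):
        layer.trainable = i >= len(layers) - k   # also handles k == 0 correctly
    trainable = [l for l in layers if l.trainable]
    with_weights = [l for l in trainable if l.n_weights > 0]
    return len(trainable), len(with_weights)


# Toy backbone: alternating conv-like layers (2 tensors) and activations (0 tensors).
toy = [FakeLayer(f"layer_{i}", n_weights=2 if i % 2 == 0 else 0) for i in range(154)]
print(unfreeze_last(toy, 20))  # (20, 10)
```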


1) Which model achieved the highest accuracy?

MobileNetV2 (fine-tuned) achieved the highest test accuracy: 0.8475.
It was followed by the frozen MobileNetV2 (0.8296) and the custom CNN (0.4871).

2) Which model trained faster?

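The custom CNN trained fastest: 381.2 s for 3 epochs, roughly 9× faster than the frozen MobileNetV2.
Among the transfer learning models:
The frozen MobileNetV2 (3487.3 s) trained faster than the fine-tuning stage (4185.6 s; about 3 s vs. 4 s per step). With the whole backbone frozen, gradients are computed only for the Dense head. Fine-tuning also backpropagates through and updates the last 20 backbone layers. Since fine-tuning starts from the frozen-stage weights, the complete fine-tuned model took 3487.3 + 4185.6 = 7672.9 s to train.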
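Because the runs differ so much in cost, ratios are easier to compare than raw seconds. The small sketch below uses the `fit()` times printed in the logs above; the dictionary is only a convenience for this comparison, not part of the notebook.

```python
# Wall-clock fit() times reported in the logs (seconds, 3 epochs each).
times = {
    "Custom CNN": 381.2,
    "MobileNetV2 (frozen)": 3487.3,
    "MobileNetV2 (fine-tune stage)": 4185.6,
}

baseline = times["Custom CNN"]
for name, t in times.items():
    print(f"{name:30s} {t:8.1f}s  {t / baseline:5.2f}x the custom CNN")

# The fine-tuned model reuses the frozen-stage weights, so its end-to-end cost is the sum.
total_finetuned = times["MobileNetV2 (frozen)"] + times["MobileNetV2 (fine-tune stage)"]
print(f"Fine-tuned pipeline total: {total_finetuned:.1f}s")
```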

3) How might the architecture explain the differences?

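MobileNetV2 starts from weights pretrained on ImageNet, so its filters already respond to generic visual patterns (edges, textures, shapes). Training only a new classification head on these features reached 0.8296 test accuracy in 3 epochs. The custom CNN, trained from scratch for the same 3 epochs, reached 0.4871.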

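Fine-tuning unfreezes the last 20 of the backbone's 154 layers (some of which, such as ReLU and Add layers, have no weights). It trains them with a small learning rate (1e-5), so the higher-level features can adapt to CIFAR-10. This likely explains most of the improvement from 0.8296 to 0.8475. However, the fine-tuning stage also adds 3 more epochs on top of the frozen stage, so part of the gain may simply come from longer training.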

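The custom CNN is much smaller (~111k parameters) and processes the 32×32 images directly, so each step is cheap. Trained from scratch for only 3 epochs, however, it cannot learn features as rich as a pretrained network's, hence the lower accuracy. Its validation accuracy was still climbing at the end (0.3868 → 0.4514 → 0.4874), so it had not converged.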
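The ~111k figure can be reproduced by hand from the layer definitions in the custom CNN cell; the rescaling, pooling, and dropout layers add no parameters. Below is a plain-Python sketch of the standard Conv2D and Dense counting formulas; the helper names are illustrative.

```python
def conv2d_params(k, c_in, c_out):
    """k x k kernel weights for every input/output channel pair, plus one bias per filter."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


layer_params = {
    "conv 3->32": conv2d_params(3, 3, 32),
    "conv 32->64": conv2d_params(3, 32, 64),
    "conv 64->128": conv2d_params(3, 64, 128),
    "dense 128->128": dense_params(128, 128),   # input is the 128-dim GlobalAveragePooling2D output
    "dense 128->10": dense_params(128, 10),
}
total = sum(layer_params.values())
print(layer_params)
print(total)  # 111050
```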

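MobileNetV2 takes longer per step for two reasons. First, the pipeline upsamples every image from 32×32 to 160×160, i.e. (160/32)² = 25 times as many pixels. Second, it runs a much deeper backbone (~2.26M parameters). Freezing layers skips their gradient computation but not their forward pass, so even the frozen backbone adds substantial compute to every step.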
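For a convolution, the work per layer grows with the spatial size of its output. Upsampling from 32×32 to 160×160 therefore multiplies the cost of each convolutional layer by about 25, frozen or not. The back-of-the-envelope sketch below uses one hypothetical 3×3, stride-2 layer with 32 filters, similar in shape to MobileNetV2's first convolution; `conv_macs` is an illustrative helper.

```python
def conv_macs(h_out, w_out, k, c_in, c_out):
    """Multiply-accumulates for a standard k x k convolution (bias ignored)."""
    return h_out * w_out * k * k * c_in * c_out


# Same 3x3, 3->32 convolution with stride 2 (output is half the input size),
# applied to a 32x32 input vs. a 160x160 input.
small = conv_macs(16, 16, 3, 3, 32)
large = conv_macs(80, 80, 3, 3, 32)
print(small, large, large / small)  # 221184 5529600 25.0
```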