##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using `model.summary()` and observe:
- Network depth
- Number of parameters
- Trainable vs Frozen layers

3. Then compare its performance with ResNet and the custom CNN.

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

### Load and Prepare CIFAR-10 Dataset

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load CIFAR-10
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

num_classes = 10
IMG_SIZE = 96   # CIFAR-10 images are 32x32; upsample to 96x96, a size the MobileNetV2 ImageNet weights support
BATCH_SIZE = 64

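def preprocess(images, labels):
    # Upsample the batch; tf.image.resize returns float32 with bilinear interpolation
    images = tf.image.resize(images, (IMG_SIZE, IMG_SIZE))
    images = tf.cast(images, tf.float32)
    # Scale pixels from [0, 255] to [-1, 1], as MobileNetV2 expects
    images = tf.keras.applications.mobilenet_v2.preprocess_input(images)
    return images, labels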

train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)) \
            .shuffle(10000).batch(BATCH_SIZE) \
            .map(preprocess).prefetch(tf.data.AUTOTUNE)

test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)) \
            .batch(BATCH_SIZE) \
            .map(preprocess).prefetch(tf.data.AUTOTUNE)
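
The `preprocess_input` step for MobileNetV2 rescales pixel values from [0, 255] to [-1, 1]. The following is a minimal pure-Python sketch of that scaling, assuming the `x / 127.5 - 1` rule; `scale_pixel` is a hypothetical helper used only for illustration and is not part of Keras.

```python
def scale_pixel(value: float) -> float:
    """Map a pixel intensity in [0, 255] to [-1, 1].

    Assumption: this mirrors the x / 127.5 - 1 scaling applied by the
    MobileNetV2 preprocessing; the helper name is hypothetical.
    """
    return value / 127.5 - 1.0


# Darkest, mid-grey, and brightest pixel values
for v in (0.0, 127.5, 255.0):
    print(v, "->", scale_pixel(v))
```

If a different backbone is chosen for the exercise, its own `preprocess_input` should be used instead, since the expected input range differs between model families.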



### Load Pretrained Model (Backbone)

In [2]:
mobilenet_base = tf.keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
    include_top=False,
    weights="imagenet"
)

# Freeze all backbone layers initially
mobilenet_base.trainable = False

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_96_no_top.h5
9406464/9406464 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step


### Build Classification Model

In [3]:
inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))

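# training=False keeps the backbone's BatchNormalization layers in inference mode,
# including later when the backbone is unfrozen for fine-tuning
x = mobilenet_base(inputs, training=False)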
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)

mobilenet_model = keras.Model(inputs, outputs)

### Inspect Architecture Before Training

In [4]:
mobilenet_model.summary()

# Counts top-level layers only; the whole MobileNetV2 backbone counts as one layer here
print("Network depth:", len(mobilenet_model.layers))
print("Total parameters:", mobilenet_model.count_params())
print("Trainable parameters:",
      sum(tf.size(w).numpy() for w in mobilenet_model.trainable_weights))
print("Frozen parameters:",
      sum(tf.size(w).numpy() for w in mobilenet_model.non_trainable_weights))

Network depth: 5
Total parameters: 2270794
Trainable parameters: 12810
Frozen parameters: 2257984
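
The printed numbers can be checked by hand. The sketch below reproduces the trainable and total parameter counts from the Dense head, and estimates the depth when the backbone's internal layers are counted individually. It assumes the backbone outputs 1280 feature channels (consistent with the 12810 trainable parameters reported) and uses the backbone layer count of 154 printed in the fine-tuning cell of this notebook; the constant names are illustrative.

```python
FEATURE_CHANNELS = 1280       # assumed width of the backbone's final feature map
NUM_CLASSES = 10
BACKBONE_PARAMS = 2_257_984   # "Frozen parameters" from the output above

# Dense head: one weight per (feature, class) pair plus one bias per class
head_params = FEATURE_CHANNELS * NUM_CLASSES + NUM_CLASSES
total_params = BACKBONE_PARAMS + head_params

print("Dense head parameters:", head_params)    # 12810
print("Total parameters:", total_params)        # 2270794

# Depth when the backbone is expanded into its own layers
TOP_LEVEL_LAYERS = 5          # "Network depth" from the output above
BACKBONE_LAYERS = 154         # from the fine-tuning cell's output
flattened_layers = TOP_LEVEL_LAYERS - 1 + BACKBONE_LAYERS

print("Layers including backbone internals:", flattened_layers)  # 158
```

The flattened count includes layers without weights (input, activation, padding, and addition layers), so it is an upper bound on depth rather than a count of convolutional layers.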


### Train with Frozen Backbone (Transfer Learning Stage)

In [5]:
mobilenet_model.compile(
    optimizer=keras.optimizers.Adam(),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

history = mobilenet_model.fit(
    train_ds,
    validation_data=test_ds,
    epochs=5
)

Epoch 1/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 87s 107ms/step - accuracy: 0.7902 - loss: 0.6195 - val_accuracy: 0.8424 - val_loss: 0.4425
Epoch 2/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 87s 111ms/step - accuracy: 0.8458 - loss: 0.4495 - val_accuracy: 0.8519 - val_loss: 0.4241
Epoch 3/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 87s 111ms/step - accuracy: 0.8563 - loss: 0.4202 - val_accuracy: 0.8580 - val_loss: 0.4091
Epoch 4/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 75s 96ms/step - accuracy: 0.8599 - loss: 0.4071 - val_accuracy: 0.8589 - val_loss: 0.4033
Epoch 5/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 74s 95ms/step - accuracy: 0.8585 - loss: 0.4024 - val_accuracy: 0.8602 - val_loss: 0.4055
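
The `782/782` in each progress line is the number of batches per epoch. A short sketch of where it comes from, assuming the standard CIFAR-10 split of 50,000 training and 10,000 test images; `steps_per_epoch` is a hypothetical helper name.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


BATCH_SIZE = 64
print("Training steps:", steps_per_epoch(50_000, BATCH_SIZE))    # 782
print("Evaluation steps:", steps_per_epoch(10_000, BATCH_SIZE))  # 157
```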


### Fine-Tuning Stage

In [6]:
# Unfreeze backbone
mobilenet_base.trainable = True

# Freeze early layers and fine-tune last layers only
for layer in mobilenet_base.layers[:-30]:
    layer.trainable = False

print("Trainable layers in backbone:",
      sum(l.trainable for l in mobilenet_base.layers),
      "/", len(mobilenet_base.layers))

mobilenet_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

history_ft = mobilenet_model.fit(
    train_ds,
    validation_data=test_ds,
    epochs=3
)

Trainable layers in backbone: 30 / 154
Epoch 1/3
782/782 ━━━━━━━━━━━━━━━━━━━━ 108s 132ms/step - accuracy: 0.8132 - loss: 0.6043 - val_accuracy: 0.8511 - val_loss: 0.4734
Epoch 2/3
782/782 ━━━━━━━━━━━━━━━━━━━━ 104s 133ms/step - accuracy: 0.8605 - loss: 0.4274 - val_accuracy: 0.8732 - val_loss: 0.4083
Epoch 3/3
782/782 ━━━━━━━━━━━━━━━━━━━━ 104s 134ms/step - accuracy: 0.8797 - loss: 0.3607 - val_accuracy: 0.8791 - val_loss: 0.3809
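
To make the two stages easier to compare, the logged values can be summarized as below. The numbers are copied from the training output in this notebook; `summarize_stage` is a hypothetical helper, and in a live session the same lists could be read from `history.history["val_accuracy"]` and `history_ft.history["val_accuracy"]`.

```python
def summarize_stage(val_accuracy, epoch_seconds):
    """Return final validation accuracy and total wall-clock time for a stage."""
    return {
        "final_val_accuracy": val_accuracy[-1],
        "total_seconds": sum(epoch_seconds),
    }


# Values transcribed from the logs above
frozen = summarize_stage(
    val_accuracy=[0.8424, 0.8519, 0.8580, 0.8589, 0.8602],
    epoch_seconds=[87, 87, 87, 75, 74],
)
fine_tuned = summarize_stage(
    val_accuracy=[0.8511, 0.8732, 0.8791],
    epoch_seconds=[108, 104, 104],
)

gain = round(fine_tuned["final_val_accuracy"] - frozen["final_val_accuracy"], 4)

print("Frozen backbone:", frozen)
print("Fine-tuned:", fine_tuned)
print("Accuracy gain from fine-tuning:", gain)
```

Validation accuracy dips in the first fine-tuning epoch (0.8511, below the 0.8602 reached with the frozen backbone) before recovering, which is a common pattern when newly unfrozen layers start to adapt.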


### Which model achieved the highest accuracy?

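ResNet50V2 (fine-tuned) achieved the highest test accuracy at approximately 0.9162, outperforming both MobileNetV2 and the custom CNN. In this notebook, MobileNetV2 reached 0.8602 with the backbone frozen and 0.8791 after fine-tuning, measured on the CIFAR-10 test set that is passed as `validation_data`.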

### Which model trained faster?

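ResNet trained faster than MobileNetV2 in terms of convergence: although MobileNetV2 is the lighter architecture, ResNet converged more quickly on the CIFAR-10 dataset. For reference, MobileNetV2 epochs in this notebook took about 74 to 87 s with the backbone frozen and about 104 to 108 s during fine-tuning.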

### How might the architecture explain the differences?

ResNet uses residual (skip) connections around standard convolutions, which allow gradients to flow through deeper layers more effectively and support richer feature learning and higher accuracy. MobileNetV2 also uses skip connections, in its inverted residual blocks, but builds those blocks from depthwise separable convolutions. These reduce computational cost and model size (about 2.26 million backbone parameters here) at the expense of some representational capacity compared to ResNet. The custom CNN, with its simpler and shallower architecture, lacks the depth and pretrained features of both transfer learning models, resulting in lower overall performance.
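
The parameter saving from depthwise separable convolutions can be illustrated with a small calculation. This is a sketch under simplifying assumptions: biases and batch normalization parameters are ignored, and the layer sizes (3x3 kernel, 128 input and output channels) are chosen only for illustration, not taken from either model.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard convolution: one k x k filter per (input, output) channel pair."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution.

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


standard = standard_conv_params(3, 128, 128)
separable = separable_conv_params(3, 128, 128)

print("Standard 3x3 conv weights:", standard)     # 147456
print("Separable 3x3 conv weights:", separable)   # 17536
print("Reduction factor:", round(standard / separable, 2))
```

For this illustrative layer the separable version needs roughly an eighth of the weights, which is the kind of saving that keeps MobileNetV2 small.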