##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using `model.summary()` and observe:
- Network depth
- Number of parameters
- Trainable vs Frozen layers

3. Then compare its performance with ResNet and the custom CNN.

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

## 1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.

You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

In [1]:
import time
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

SEED = 42
tf.random.set_seed(SEED)
np.random.seed(SEED)

# Load CIFAR-10
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
y_train = y_train.squeeze()
y_test  = y_test.squeeze()

IMG_SIZE = 96
BATCH_SIZE = 128
NUM_CLASSES = 10
AUTOTUNE = tf.data.AUTOTUNE

In [2]:
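def preprocess_for_mobilenet(x, y, training=False):
    # CIFAR-10 images arrive as uint8 in [0, 255]; cast to float32 without rescaling
    x = tf.cast(x, tf.float32)
    # Upsample from 32x32 to the input size used for the pretrained base
    x = tf.image.resize(x, (IMG_SIZE, IMG_SIZE))
    # MobileNetV2 preprocessing expects values in [0, 255] and maps them to [-1, 1]
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x)

    if training:
        # Light augmentation: random horizontal flip
        x = tf.image.random_flip_left_right(x)

    return x, y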

train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))\
    .shuffle(20000)\
    .map(lambda a,b: preprocess_for_mobilenet(a,b, True),
         num_parallel_calls=AUTOTUNE)\
    .batch(BATCH_SIZE).prefetch(AUTOTUNE)

test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test))\
    .map(lambda a,b: preprocess_for_mobilenet(a,b, False),
         num_parallel_calls=AUTOTUNE)\
    .batch(BATCH_SIZE).prefetch(AUTOTUNE)
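
The preprocessing function above hands pixel values in [0, 255] to `mobilenet_v2.preprocess_input`, which rescales them to [-1, 1]. As a rough illustration of that mapping, here is a minimal pure-Python sketch. It assumes the rescaling is `x / 127.5 - 1`; the helper name `scale_like_mobilenet_v2` is hypothetical and not part of TensorFlow.

```python
def scale_like_mobilenet_v2(pixel: float) -> float:
    """Map a pixel value in [0, 255] to [-1, 1] (assumed formula: x / 127.5 - 1)."""
    return pixel / 127.5 - 1.0


# Darkest, mid-range, and brightest pixel values
for value in (0.0, 127.5, 255.0):
    print(value, "->", scale_like_mobilenet_v2(value))
```

This is why the function must not leave the images in [0, 1] before calling `preprocess_input`: the pretrained weights were trained on inputs that span the full [-1, 1] range.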

In [3]:
base = tf.keras.applications.MobileNetV2(
    include_top=False,
    weights="imagenet",
    input_shape=(IMG_SIZE, IMG_SIZE, 3)
)

base.trainable = False  # freeze for stage 1

inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

mobilenet = keras.Model(inputs, outputs, name="MobileNetV2_CIFAR10")

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_96_no_top.h5
9406464/9406464 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step


## 2. Before training, inspect the architecture using model.summary() and observe:
- Network depth
- Number of parameters
- Trainable vs Frozen layers

In [4]:
mobilenet.summary()

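# Count parameters by summing the number of elements in each weight tensor
trainable_params = sum([np.prod(v.shape) for v in mobilenet.trainable_weights])
frozen_params = sum([np.prod(v.shape) for v in mobilenet.non_trainable_weights])

# Note: this counts top-level layers only; the whole MobileNetV2 base is nested as a single layer
print("Depth:", len(mobilenet.layers))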
print("Total params:", trainable_params + frozen_params)
print("Trainable:", trainable_params)
print("Frozen:", frozen_params)

Depth: 5
Total params: 2270794
Trainable: 12810
Frozen: 2257984
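
The trainable count printed above can be checked by hand, since only the final `Dense` layer is trainable while the base is frozen. The following is a small sketch of that arithmetic. It assumes the pooled feature vector has 1280 channels (MobileNetV2's final width at the default width multiplier); the helper `dense_param_count` is hypothetical.

```python
# Assumption: GlobalAveragePooling2D outputs 1280 features for MobileNetV2 (alpha = 1.0)
FEATURES = 1280
NUM_CLASSES = 10


def dense_param_count(inputs: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


head_params = dense_param_count(FEATURES, NUM_CLASSES)
frozen_params = 2_257_984  # "Frozen" value printed by the cell above

print("Head params:", head_params)
print("Total params:", head_params + frozen_params)
```

Both numbers agree with the cell output: the classification head accounts for all 12,810 trainable parameters, which is under 1% of the model.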


## Back to task 1: train the classification head, then fine-tune

In [5]:
# Stage 1: train only the new classification head (the base is frozen)
mobilenet.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

t0 = time.time()

mobilenet.fit(train_ds, epochs=5, validation_data=test_ds)

time_stage1 = time.time() - t0

Epoch 1/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 37s 91ms/step - accuracy: 0.7631 - loss: 0.7063 - val_accuracy: 0.8445 - val_loss: 0.4475
Epoch 2/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 35s 90ms/step - accuracy: 0.8430 - loss: 0.4589 - val_accuracy: 0.8521 - val_loss: 0.4250
Epoch 3/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 35s 90ms/step - accuracy: 0.8558 - loss: 0.4216 - val_accuracy: 0.8585 - val_loss: 0.4101
Epoch 4/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 35s 90ms/step - accuracy: 0.8617 - loss: 0.4037 - val_accuracy: 0.8556 - val_loss: 0.4193
Epoch 5/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 35s 90ms/step - accuracy: 0.8631 - loss: 0.3994 - val_accuracy: 0.8596 - val_loss: 0.4011


In [6]:
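# Stage 2: unfreeze the base, then re-freeze its first FINE_TUNE_AT layers
# so that only the later, more task-specific layers are updated
base.trainable = True

FINE_TUNE_AT = 100
for layer in base.layers[:FINE_TUNE_AT]:
    layer.trainable = False

# Recompile so the trainable changes take effect; use a 10x lower learning rate
# to avoid overwriting the pretrained weights
mobilenet.compile(
    optimizer=keras.optimizers.Adam(1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)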

t1 = time.time()

mobilenet.fit(train_ds, epochs=5, validation_data=test_ds)

time_stage2 = time.time() - t1

Epoch 1/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 63s 150ms/step - accuracy: 0.8689 - loss: 0.4090 - val_accuracy: 0.8604 - val_loss: 0.4312
Epoch 2/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 59s 150ms/step - accuracy: 0.9355 - loss: 0.1882 - val_accuracy: 0.8772 - val_loss: 0.4275
Epoch 3/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 58s 149ms/step - accuracy: 0.9583 - loss: 0.1174 - val_accuracy: 0.8916 - val_loss: 0.3823
Epoch 4/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 59s 150ms/step - accuracy: 0.9757 - loss: 0.0728 - val_accuracy: 0.9050 - val_loss: 0.3228
Epoch 5/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 59s 150ms/step - accuracy: 0.9832 - loss: 0.0491 - val_accuracy: 0.9058 - val_loss: 0.3425


In [7]:
test_loss, test_acc = mobilenet.evaluate(test_ds)
total_time = time_stage1 + time_stage2

print("MobileNetV2 Accuracy:", test_acc)
print("Training Time:", total_time)

79/79 ━━━━━━━━━━━━━━━━━━━━ 6s 75ms/step - accuracy: 0.9058 - loss: 0.3425
MobileNetV2 Accuracy: 0.9057999849319458
Training Time: 475.1380388736725


## 3. Then compare its performance with ResNet and the custom CNN.

In [10]:

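# Test accuracies for the custom CNN and ResNet are hard-coded here;
# they were not computed in this notebook
cnn_test_acc = 0.7027999758720398
resnet_test_acc = 0.9161999821662903

# MobileNetV2 test accuracy, copied from the evaluation cell above
mobilenet_test_acc = 0.9057999849319458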


results = [
    ("ResNet50V2 (fine-tuned)", resnet_test_acc),
    ("MobileNetV2", mobilenet_test_acc),
    ("Custom CNN", cnn_test_acc)
]

print("Model Comparison (CIFAR-10)")
print("-" * 60)
for name, acc in results:
    print(f"{name:<25} | Accuracy: {acc:.4f}")

Model Comparison (CIFAR-10)
------------------------------------------------------------
ResNet50V2 (fine-tuned)   | Accuracy: 0.9162
MobileNetV2               | Accuracy: 0.9058
Custom CNN                | Accuracy: 0.7028


# 1) Accuracy Comparison
- ResNet50V2 (fine-tuned): 0.9162 → highest accuracy
- MobileNetV2 (fine-tuned): 0.9058
- Custom CNN (trained from scratch): 0.7028

ResNet50V2 achieved the best test accuracy on CIFAR-10, about one percentage point ahead of MobileNetV2. Both transfer-learning models outperform the custom CNN by more than 20 percentage points.
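
To make the size of these gaps concrete, the reported accuracies can be turned into error rates and percentage-point gains over the custom CNN. This is only a small sketch using the rounded values from the comparison table; the dictionary and variable names are illustrative.

```python
# Rounded test accuracies from the comparison table
accuracies = {
    "ResNet50V2 (fine-tuned)": 0.9162,
    "MobileNetV2": 0.9058,
    "Custom CNN": 0.7028,
}

baseline = accuracies["Custom CNN"]

# Error rate (%) and gain over the custom CNN (percentage points) per model
errors = {name: (1 - acc) * 100 for name, acc in accuracies.items()}
gains = {name: (acc - baseline) * 100 for name, acc in accuracies.items()}

for name in accuracies:
    print(f"{name:<25} | error: {errors[name]:5.2f}% | gain over CNN: {gains[name]:+6.2f} pts")
```

Seen as error rates, the gap between the two pretrained models is small compared with the gap between either of them and the custom CNN.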


# 2) Training Speed
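- The custom CNN trained the fastest because it has the fewest layers and parameters.
- MobileNetV2 needed a moderate amount of time thanks to its lightweight architecture: about 475 s in total for 10 epochs, roughly 35 s per epoch with the base frozen and about 59 s per epoch once the top of the base was unfrozen.
- ResNet50V2 was the slowest because it is deeper and has roughly ten times as many parameters in its convolutional base as MobileNetV2.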
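
For MobileNetV2, the per-epoch times shown in the Keras progress logs can be cross-checked against the measured total. The sketch below simply re-adds the logged values, which Keras rounds to whole seconds; the list names are illustrative.

```python
# Per-epoch wall-clock seconds read off the training logs (rounded by Keras)
stage1_epochs = [37, 35, 35, 35, 35]  # base frozen, only the head trains
stage2_epochs = [63, 59, 58, 59, 59]  # top of the base unfrozen

stage1_total = sum(stage1_epochs)
stage2_total = sum(stage2_epochs)

stage1_mean = stage1_total / len(stage1_epochs)
stage2_mean = stage2_total / len(stage2_epochs)
slowdown = stage2_mean / stage1_mean

print(f"Stage 1: {stage1_total} s total, {stage1_mean:.1f} s/epoch")
print(f"Stage 2: {stage2_total} s total, {stage2_mean:.1f} s/epoch")
print(f"Combined: {stage1_total + stage2_total} s")
print(f"Fine-tuning epochs are about {slowdown:.2f}x slower")
```

The combined figure is consistent with the measured `total_time` of roughly 475 s, and it shows that backpropagating through the unfrozen layers adds noticeably to the cost of each epoch.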


# 3) Architecture Explanation
- ResNet50V2 uses residual (skip) connections, which keep gradients flowing through very deep networks so that they can be trained effectively.
- MobileNetV2 uses depthwise-separable convolutions inside inverted residual blocks, which cut the parameter and computation cost per layer and make it computationally efficient.
- The custom CNN is shallow and trained from scratch, so it trains quickly but reaches lower accuracy than the transfer-learning models, which start from ImageNet features.
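
The efficiency of depthwise-separable convolutions can be illustrated by counting weights. The following is a rough sketch that ignores biases and batch normalization. The layer sizes (3x3 kernel, 128 input and output channels) are hypothetical and chosen only for illustration; they are not taken from MobileNetV2 itself.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only
k, c_in, c_out = 3, 128, 128

standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)

print("Standard conv weights: ", standard)
print("Separable conv weights:", separable)
print(f"Reduction factor: {standard / separable:.1f}x")
```

For a 3x3 kernel the saving approaches a factor of nine as the channel count grows, which helps explain why MobileNetV2 is so much smaller than ResNet50V2.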
