##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using `model.summary()` and observe:
- Network depth
- Number of parameters
- Trainable vs. frozen layers

3. Then compare its performance with ResNet and the custom CNN.

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

In [5]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Fix label shape
y_train = y_train.squeeze()
y_test = y_test.squeeze()

print("Train shape:", x_train.shape)
print("Test shape :", x_test.shape)

Train shape: (50000, 32, 32, 3)
Test shape : (10000, 32, 32, 3)


In [6]:
IMG_SIZE = 160
NUM_CLASSES = 10

def build_efficientnet_model(train_backbone=False):
    
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False,
        weights="imagenet",
        input_shape=(IMG_SIZE, IMG_SIZE, 3)
    )
    
    backbone.trainable = train_backbone  # False keeps the pretrained weights frozen
    
    inputs = keras.Input(shape=(32, 32, 3))
    
    # Data augmentation
    x = layers.RandomFlip("horizontal")(inputs)
    x = layers.RandomRotation(0.1)(x)
    
    # Resize to match EfficientNet input
    x = layers.Resizing(IMG_SIZE, IMG_SIZE)(x)
    
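    # Preprocess for EfficientNet (a pass-through in Keras: the model rescales
    # internally, so it expects raw pixel values in the [0, 255] range)
    x = tf.keras.applications.efficientnet.preprocess_input(x)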
    
    # Backbone forward pass
    x = backbone(x, training=False)
    
    # Classification head
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.4)(x)
    outputs = layers.Dense(NUM_CLASSES)(x)  # logits
    
    model = keras.Model(inputs, outputs, name="EfficientNetB0_CIFAR10")
    
    return model, backbone


model, backbone = build_efficientnet_model(train_backbone=False)

In [7]:
model.summary()

print("\n=== Parameter Summary ===")
print("Total params       :", model.count_params())
# int(...) converts each scalar tensor so the totals print as plain integers
print("Trainable params   :", sum(int(tf.size(w)) for w in model.trainable_weights))
print("Non-trainable params:", sum(int(tf.size(w)) for w in model.non_trainable_weights))

print("\nBackbone layers:", len(backbone.layers))
print("Backbone trainable:", backbone.trainable)


=== Parameter Summary ===
Total params       : 4380077
Trainable params   : 330506
Non-trainable params: 4049571

Backbone layers: 238
Backbone trainable: False
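
The trainable count reported above can be checked by hand, since with a frozen backbone only the two `Dense` layers of the head are trained. This is a minimal sketch, assuming the EfficientNetB0 backbone emits 1280 feature channels after global average pooling (a value not printed in the output above); `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

BACKBONE_FEATURES = 1280  # assumed EfficientNetB0 output channels
HIDDEN_UNITS = 256        # Dense(256) in the classification head
NUM_CLASSES = 10          # Dense(NUM_CLASSES) logits layer

# Dropout and GlobalAveragePooling2D have no weights, so only the Dense layers count.
head_params = (
    dense_params(BACKBONE_FEATURES, HIDDEN_UNITS)
    + dense_params(HIDDEN_UNITS, NUM_CLASSES)
)
frozen_params = 4049571  # non-trainable count reported in the summary above

print("Head params :", head_params)
print("Total params:", head_params + frozen_params)
```

Under that assumption the head accounts for all 330506 trainable parameters, and adding the frozen count reproduces the reported total.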


In [8]:
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=3e-4),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

callbacks = [
    keras.callbacks.ReduceLROnPlateau(
        monitor="val_accuracy",
        factor=0.5,
        patience=2,
        verbose=1
    ),
    keras.callbacks.EarlyStopping(
        monitor="val_accuracy",
        patience=3,
        restore_best_weights=True,
        verbose=1
    )
]

history = model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=8,
    batch_size=64,
    callbacks=callbacks
)

Epoch 1/8
704/704 ━━━━━━━━━━━━━━━━━━━━ 71s 87ms/step - accuracy: 0.6282 - loss: 1.1006 - val_accuracy: 0.8900 - val_loss: 0.3284 - learning_rate: 3.0000e-04
Epoch 2/8
704/704 ━━━━━━━━━━━━━━━━━━━━ 58s 82ms/step - accuracy: 0.7824 - loss: 0.6311 - val_accuracy: 0.9024 - val_loss: 0.2867 - learning_rate: 3.0000e-04
Epoch 3/8
704/704 ━━━━━━━━━━━━━━━━━━━━ 59s 84ms/step - accuracy: 0.8004 - loss: 0.5836 - val_accuracy: 0.9020 - val_loss: 0.2755 - learning_rate: 3.0000e-04
Epoch 4/8
704/704 ━━━━━━━━━━━━━━━━━━━━ 59s 83ms/step - accuracy: 0.8086 - loss: 0.5613 - val_accuracy: 0.9092 - val_loss: 0.2562 - learning_rate: 3.0000e-04
Epoch 5/8
704/704 ━━━━━━━━━━━━━━━━━━━━ 59s 83ms/step - accuracy: 0.8179 - loss: 0.5244 - val_accuracy: 0.9106 - val_loss: 0.2560 - learning_rate: 3.0000e-04
Epoch 6/8
704/704 ━━━━━━━━━━━━━━

In [9]:
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)

print("\nEfficientNetB0 (Frozen) Test Accuracy:", test_acc)
print("EfficientNetB0 (Frozen) Test Loss:", test_loss)


EfficientNetB0 (Frozen) Test Accuracy: 0.9103000164031982
EfficientNetB0 (Frozen) Test Loss: 0.26787564158439636
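
Because the final `Dense` layer outputs logits, the reported accuracy is the fraction of test images whose largest logit matches the label. No softmax is needed for that, since softmax preserves the ordering of the scores. The sketch below illustrates this in plain Python with made-up logits; the values and the `top1_accuracy` and `softmax` helpers are illustrative and not taken from the trained model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1_accuracy(batch_scores, labels):
    """Fraction of rows whose highest-scoring index equals the integer label."""
    hits = 0
    for scores, label in zip(batch_scores, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        hits += int(predicted == label)
    return hits / len(labels)

# Toy batch: 4 samples, 3 classes (illustrative numbers only)
toy_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.5, 1.4, 0.0], [-0.3, 2.2, 0.9]]
toy_labels = [0, 2, 1, 1]

acc_from_logits = top1_accuracy(toy_logits, toy_labels)
acc_from_probs = top1_accuracy([softmax(row) for row in toy_logits], toy_labels)
print(acc_from_logits, acc_from_probs)
```

Both calls give the same accuracy, which is why the model can be evaluated directly on logits as long as the loss is built with `from_logits=True`.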


In [10]:
#  6. Fine-tune EfficientNetB0 (Unfreeze last layers)

# Unfreeze backbone
backbone.trainable = True

# Fine-tune only the last N layers (keep earlier layers frozen)
N = 30  # try 20, 30, 50
for layer in backbone.layers[:-N]:
    layer.trainable = False

print("Backbone trainable:", backbone.trainable)
print("Trainable backbone layers:", sum(l.trainable for l in backbone.layers), "/", len(backbone.layers))

# IMPORTANT: re-compile after changing trainable flags
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

callbacks_ft = [
    keras.callbacks.ReduceLROnPlateau(
        monitor="val_accuracy",
        factor=0.5,
        patience=1,
        verbose=1
    ),
    keras.callbacks.EarlyStopping(
        monitor="val_accuracy",
        patience=2,
        restore_best_weights=True,
        verbose=1
    )
]

history_ft = model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=5,
    batch_size=64,
    callbacks=callbacks_ft
)

#  Evaluate after fine-tuning
test_loss_ft, test_acc_ft = model.evaluate(x_test, y_test, verbose=0)

print("\nEfficientNetB0 (Fine-tuned) Test Accuracy:", test_acc_ft)
print("EfficientNetB0 (Fine-tuned) Test Loss:", test_loss_ft)

Backbone trainable: True
Trainable backbone layers: 30 / 238
Epoch 1/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 86s 104ms/step - accuracy: 0.7453 - loss: 0.7509 - val_accuracy: 0.8960 - val_loss: 0.3151 - learning_rate: 1.0000e-05
Epoch 2/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 69s 98ms/step - accuracy: 0.7980 - loss: 0.5921 - val_accuracy: 0.9026 - val_loss: 0.2829 - learning_rate: 1.0000e-05
Epoch 3/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 69s 98ms/step - accuracy: 0.8176 - loss: 0.5333 - val_accuracy: 0.9096 - val_loss: 0.2642 - learning_rate: 1.0000e-05
Epoch 4/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 83s 99ms/step - accuracy: 0.8200 - loss: 0.5177 - val_accuracy: 0.9126 - val_loss: 0.2528 - learning_rate: 1.0000e-05
Epoch 5/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 69s 98ms/step - accuracy: 0.8304 - loss: 0.4909 - val_accuracy: 0.9158 - val_loss: 0.2467 - learning_
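
The slice `backbone.layers[:-N]` in the fine-tuning cell selects every layer except the last `N`, which is why the output reports 30 trainable layers out of 238. A small sketch with a plain list standing in for the layer list (the `layer_i` names are placeholders, not real EfficientNet layer names):

```python
TOTAL_LAYERS = 238  # backbone layer count reported earlier
N = 30              # number of trailing layers left trainable

layer_names = [f"layer_{i}" for i in range(TOTAL_LAYERS)]

frozen = layer_names[:-N]     # everything except the last N entries
trainable = layer_names[-N:]  # the last N entries

print(len(frozen), "frozen /", len(trainable), "trainable")
print("First trainable layer:", trainable[0])

# Edge case: [:-0] is the same as [:0], an empty slice, so the loop would freeze nothing.
print("Frozen with N = 0:", len(layer_names[:-0]))
```

One consequence of the edge case: setting `N = 0` in the cell above would leave the whole backbone trainable rather than fully frozen.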

### 1. Which model achieved the highest accuracy?

The highest accuracy was achieved by **ResNet50V2 (fine-tuned)** with a test accuracy of **91.62%**.

Although EfficientNetB0 performed very strongly (91.03% frozen and 91.07% fine-tuned), fine-tuned ResNet50V2 slightly outperformed it.

The custom CNN achieved 70.28%, which was significantly lower than both pretrained models.
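
The comparison can be tabulated with a few lines of Python. The accuracies below are copied from the text above; the ResNet50V2 and custom CNN figures come from earlier parts of the exercise that are not reproduced in this notebook, and the dictionary name is only illustrative.

```python
# Test accuracies (in percent) as quoted in the answer above
reported_accuracy = {
    "Custom CNN": 70.28,
    "EfficientNetB0 (frozen)": 91.03,
    "EfficientNetB0 (fine-tuned)": 91.07,
    "ResNet50V2 (fine-tuned)": 91.62,
}

best_model = max(reported_accuracy, key=reported_accuracy.get)
best_acc = reported_accuracy[best_model]

# Print models from best to worst with their gap to the top score
for name, acc in sorted(reported_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    gap = best_acc - acc
    print(f"{name:<30} {acc:6.2f}%  (gap to best: {gap:5.2f} pts)")
```

The gaps between the pretrained models are well under one percentage point, so a single training run is weak evidence for ranking them; the gap to the custom CNN is more than 21 points.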


---

### 2. Which model trained faster?

The **custom CNN** trained the fastest because it is far smaller than the pretrained backbones, so each training step costs much less computation.

Among the transfer learning models:
- **EfficientNetB0 (frozen)** trained faster (about 59 s per epoch, versus about 69 s after unfreezing the last 30 backbone layers) since only the classifier head was updated.
- **ResNet50V2 (fine-tuned)** trained the slowest because many deep layers were unfrozen and updated during training.

In general, the more layers that are trainable, the more computation each step requires, because gradients must be propagated back through those layers and their weights updated.
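
The EfficientNetB0 epoch times can be roughly reconstructed from the training logs. A minimal sketch, assuming the settings used in the cells above (50,000 training images, `validation_split=0.1`, `batch_size=64`) and taking 83 ms/step and 98 ms/step as representative values read from the frozen and fine-tuned logs:

```python
import math

TRAIN_IMAGES = 50_000
VALIDATION_SPLIT = 0.1
BATCH_SIZE = 64

# Samples left for training after the validation split is held out
val_samples = round(TRAIN_IMAGES * VALIDATION_SPLIT)
train_samples = TRAIN_IMAGES - val_samples
steps_per_epoch = math.ceil(train_samples / BATCH_SIZE)

# Representative per-step times (ms) read from the logs; treat them as approximate
ms_per_step = {"frozen": 83, "fine-tuned": 98}
epoch_seconds = {name: steps_per_epoch * ms / 1000 for name, ms in ms_per_step.items()}

print("Steps per epoch:", steps_per_epoch)
for name, seconds in epoch_seconds.items():
    print(f"{name:<10} ~{seconds:.1f} s per epoch")
print("Slowdown from fine-tuning:", round(ms_per_step["fine-tuned"] / ms_per_step["frozen"], 2))
```

The step count matches the `704/704` shown in the progress bars, and the estimate suggests that unfreezing the last 30 layers made each epoch roughly 18% slower on this hardware.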


---

### 3. How might the architecture explain the differences?

The differences in performance can be explained by architectural design:

- **Custom CNN** is relatively shallow and lacks pretrained knowledge, so it has limited feature extraction capability compared to large ImageNet-trained models.

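- **EfficientNetB0** is the baseline of the EfficientNet family, whose larger variants are obtained by compound scaling (jointly scaling depth, width, and resolution). It is highly parameter-efficient (about 4 million backbone parameters here), and its pretrained features transferred very well to CIFAR-10, which is why it performed strongly even when frozen.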

- **ResNet50V2** uses deep residual connections that improve gradient flow and allow effective fine-tuning. When layers were unfrozen, the network adapted better to CIFAR-10, leading to the highest overall accuracy.

Overall, transfer learning significantly improved performance, and architectural design influenced how much each model benefited from fine-tuning.