##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using `model.summary()` and observe:
- Network depth
- Number of parameters
- Trainable vs Frozen layers

3. Then compare its performance with ResNet and the custom CNN.
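
For step 2, the trainable versus frozen split can also be computed directly instead of being read off the summary. The following is a minimal sketch, assuming only that the model exposes `trainable_weights` and `non_trainable_weights` lists whose entries have a `.shape`, as Keras models do. The helper name `count_params` and the stand-in `toy_model` are hypothetical and exist only so the sketch runs without TensorFlow installed.

```python
import math
from types import SimpleNamespace


def count_params(model):
    """Return (trainable, non_trainable) parameter counts for a Keras-style model.

    Relies only on `model.trainable_weights` and `model.non_trainable_weights`,
    each a list of variables that expose a `.shape` attribute.
    """
    def total(weights):
        # Number of scalars in a weight = product of its dimensions.
        return sum(math.prod(int(d) for d in w.shape) for w in weights)

    return total(model.trainable_weights), total(model.non_trainable_weights)


# Hypothetical stand-in with made-up shapes (not the real EfficientNet weights).
toy_model = SimpleNamespace(
    trainable_weights=[SimpleNamespace(shape=(1280, 10)), SimpleNamespace(shape=(10,))],
    non_trainable_weights=[SimpleNamespace(shape=(3, 3, 3, 32))],
)

trainable, non_trainable = count_params(toy_model)
print(f"trainable: {trainable:,}  non-trainable: {non_trainable:,}")
```

In the notebook, the same helper could be called as `count_params(model_eff)` once after freezing the backbone and again after unfreezing the top layers, to see how the split changes.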

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

Sarah Altheeb FA01

Model used: EfficientNetV2B0

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import EfficientNetV2B0

# -----------------------------
# 1) Setup & Load Data 
# -----------------------------
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()


x_train = x_train.astype("float32")
x_test  = x_test.astype("float32")
y_train = y_train.squeeze().astype("int64")
y_test  = y_test.squeeze().astype("int64")

# -----------------------------
# 2) Build EfficientNetV2B0 Backbone
# -----------------------------
# Note: with include_preprocessing=True, the Keras EfficientNetV2 models rescale
# inputs internally, so we pass raw pixel values in the 0-255 range.

# Define the input shape (Resizing to 224x224 is standard for these models to keep accuracy high)
input_shape = (224, 224, 3)

base_model = EfficientNetV2B0(
    include_top=False,
    weights="imagenet",
    input_shape=input_shape,
    include_preprocessing=True # Handles scaling automatically
)

# Freeze the base model
base_model.trainable = False

# -----------------------------
# 3) Build the Full Model
# -----------------------------
model_eff = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    
    # Augmentation
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    
    # Resize to what EfficientNet expects (224 is best for acc, 160 is faster)
    layers.Resizing(224, 224, interpolation="bicubic"),
    
    base_model,
    
    layers.GlobalAveragePooling2D(),
    layers.BatchNormalization(), # Helps with stability
    layers.Dropout(0.2),         # Helps prevent overfitting
    layers.Dense(10)             # Output layer (Logits)
], name="cifar10_efficientnet")

# Inspect architecture
model_eff.summary()

# -----------------------------
# 4) Compile & Train (Frozen)
# -----------------------------
model_eff.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=3, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2)
]

print("\n--- Training Frozen Model ---")
history_eff = model_eff.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=5,  # 5 epochs is usually enough for the head
    batch_size=64,
    callbacks=callbacks
)

# -----------------------------
# 5) Fine-Tuning (Unfreeze Top Layers)
# -----------------------------
print("\n--- Fine-Tuning ---")
base_model.trainable = True

# Fine-tune only the top layers to avoid destroying the pretrained features.
# EfficientNetV2B0 has about 270 layers; keep all but the last 50 frozen.
for layer in base_model.layers[:-50]:
    layer.trainable = False

# Recompile with a MUCH lower learning rate
model_eff.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5), # Low LR
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

history_eff_ft = model_eff.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=5,
    batch_size=32, # Smaller batch size for fine-tuning helps
    callbacks=callbacks
)

# -----------------------------
# 6) Evaluate & Compare
# -----------------------------
loss, acc_eff = model_eff.evaluate(x_test, y_test)
print(f"\nEfficientNetV2 Final Test Accuracy: {acc_eff:.4f}")

# Comparison Logic 
print("\n--- Model Comparison ---")

# ResNet accuracy is hard-coded here; it is not computed in this notebook.
resnet_acc = 0.9162

print(f"ResNet (Fine-Tuned): {resnet_acc:.4f}")
print(f"EfficientNet (Fine-Tuned): {acc_eff:.4f}")

if acc_eff > resnet_acc:
    print("Winner: EfficientNetV2B0")
else:
    print("Winner: ResNet50V2")


--- Training Frozen Model ---
Epoch 1/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 113s 146ms/step - accuracy: 0.6438 - loss: 1.0756 - val_accuracy: 0.9052 - val_loss: 0.2932 - learning_rate: 0.0010
Epoch 2/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 99s 141ms/step - accuracy: 0.7707 - loss: 0.6639 - val_accuracy: 0.9116 - val_loss: 0.2716 - learning_rate: 0.0010
Epoch 3/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 99s 140ms/step - accuracy: 0.7863 - loss: 0.6225 - val_accuracy: 0.9112 - val_loss: 0.2726 - learning_rate: 0.0010
Epoch 4/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 99s 140ms/step - accuracy: 0.7884 - loss: 0.6113 - val_accuracy: 0.9132 - val_loss: 0.2751 - learning_rate: 0.0010
Epoch 5/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 99s 140ms/step - accuracy: 0.7957 - loss: 0.5869 - val_accuracy: 0.9178 - val_loss: 0.2522 - learning_rate: 5.0000e-04

--- Fine-Tuning ---
Epo

1. Which model achieved the highest accuracy?

EfficientNetV2B0 achieved the highest accuracy at about 92%, compared with about 91% for ResNet50V2 and 87% for the custom CNN. A likely reason is that its architecture was found with neural architecture search (NAS) and balances depth, width, and input resolution better than the hand-designed ResNet.

2. Which model trained faster?

EfficientNetV2B0 is the computationally lighter model and is therefore expected to train faster. As shown in the model summaries, it has only ~5.9 million parameters, whereas ResNet50V2 has ~23.6 million. EfficientNetV2B0 thus needs about 4x less memory to store its weights and performs fewer calculations per forward and backward pass. Parameter count is only a proxy for speed, though; the time per epoch reported in the training logs is the direct comparison.
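
The figures quoted above can be checked with a few lines of arithmetic. This is a rough sketch, assuming the weights are stored as 32-bit floats (4 bytes each) and using the rounded parameter counts from the summaries; the helper name `weight_megabytes` is hypothetical. The epoch-time line uses the step count and step time printed in the frozen-phase log.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as float32


def weight_megabytes(num_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


effnet_params = 5.9e6    # rounded count from the model summary
resnet_params = 23.6e6   # rounded count from the model summary

effnet_mb = weight_megabytes(effnet_params)
resnet_mb = weight_megabytes(resnet_params)
ratio = resnet_params / effnet_params

# Epoch time implied by the log: 704 steps at roughly 140 ms per step.
epoch_seconds = 704 * 0.140

print(f"EfficientNetV2B0 weights: {effnet_mb:.1f} MB")
print(f"ResNet50V2 weights:       {resnet_mb:.1f} MB")
print(f"Size ratio:               {ratio:.1f}x")
print(f"Implied epoch time:       {epoch_seconds:.1f} s")
```

The implied epoch time is close to the 99 s per epoch shown in the log, which is the number to set against the corresponding ResNet log when answering this question.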

3. How might the architecture explain the differences?

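Parameter Efficiency: The large difference in size (5.9M vs 23.6M parameters) comes mainly from EfficientNet's MBConv (mobile inverted bottleneck) blocks, which use depthwise separable convolutions (EfficientNetV2 also uses Fused-MBConv blocks in its early stages). A depthwise separable convolution splits a standard convolution into a per-channel spatial filter followed by a 1x1 pointwise convolution, so it needs far fewer weights than the standard convolutions in ResNet blocks.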
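
To make the saving concrete, the weight counts of the two convolution types can be compared for a single layer. This is an illustrative sketch only: the layer sizes (3x3 kernel, 128 input and 128 output channels) are hypothetical, biases are ignored, and the function names are made up for this example.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise


kernel, c_in, c_out = 3, 128, 128  # hypothetical layer sizes

std = standard_conv_params(kernel, c_in, c_out)
sep = separable_conv_params(kernel, c_in, c_out)

print(f"standard:  {std:,} weights")
print(f"separable: {sep:,} weights")
print(f"reduction: {std / sep:.1f}x")
```

For this one layer the separable version needs roughly an eighth of the weights, which is the kind of per-layer saving that adds up across the network.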

Compound Scaling: ResNet was hand-designed for ImageNet and is scaled mainly by adding depth, whereas the EfficientNet family combines Neural Architecture Search (NAS) with compound scaling, which grows depth, width, and input resolution together. This balance allows it to learn strong features for CIFAR-10 without needing 23 million parameters.
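
Compound scaling can be sketched numerically. The coefficients below (1.2 for depth, 1.1 for width, 1.15 for resolution) are the ones reported in the original EfficientNet paper, not values taken from this notebook, and EfficientNetV2 adjusts the scaling rule further, so treat this as an illustration of the idea rather than the exact recipe behind EfficientNetV2B0. The function name `compound_scale` is hypothetical.

```python
# Assumption: base coefficients from the original EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for scaling exponent phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


# Compute cost grows roughly with depth * width^2 * resolution^2,
# so each unit increase of phi multiplies the cost by about this factor.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    depth, width, resolution = compound_scale(phi)
    print(f"phi={phi}: depth x{depth:.2f}, width x{width:.2f}, resolution x{resolution:.2f}")

print(f"approximate cost growth per step of phi: {flops_factor:.2f}x")
```

The point of the single exponent is that all three dimensions grow together, instead of only depth being increased as in the usual ResNet variants.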