##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using `model.summary()` and observe:
- Network depth
- Number of parameters
- Trainable vs. frozen layers

3. Then compare its performance with ResNet and the custom CNN.
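The inspection in step 2 can also be scripted instead of being read off `model.summary()` by eye. The sketch below is one possible helper, not part of the exercise: the name `describe_model` and the tiny stand-in model are illustrative assumptions, chosen so the example runs without downloading pretrained weights.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def describe_model(model):
    """Return layer and parameter counts for a Keras model (hypothetical helper)."""
    trainable = sum(int(np.prod(tuple(w.shape))) for w in model.trainable_weights)
    frozen = sum(int(np.prod(tuple(w.shape))) for w in model.non_trainable_weights)
    return {
        "layers": len(model.layers),
        "total_params": trainable + frozen,
        "trainable_params": trainable,
        "frozen_params": frozen,
    }


# Tiny stand-in model (assumption): two Dense layers, the first one frozen.
demo = keras.Sequential([
    layers.Input(shape=(8,)),
    layers.Dense(4, name="frozen_dense"),  # 8*4 weights + 4 biases = 36 parameters
    layers.Dense(2, name="head"),          # 4*2 weights + 2 biases = 10 parameters
])
demo.get_layer("frozen_dense").trainable = False

stats = describe_model(demo)
print(stats)
```

For the models in this notebook, the helper would be called on the pretrained base (for example `resnet_base`) as well as on the full model, because the `Sequential` wrapper counts the whole base as a single layer.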

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

Imports

In [12]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import time

from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input as resnet_preprocess

from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input as vgg_preprocess


Load CIFAR-10

In [13]:
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

print("Train shape:", x_train.shape)
print("Test shape:", x_test.shape)

num_classes = 10


Train shape: (50000, 32, 32, 3)
Test shape: (10000, 32, 32, 3)


Data Augmentation

In [14]:
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
])


Custom CNN

In [15]:
custom_model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Rescaling(1./255),
    data_augmentation,

    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10)
], name="custom_cnn")

custom_model.summary()

custom_model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

t0 = time.time()
history_custom = custom_model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=5,
    batch_size=64
)
custom_time = time.time() - t0

test_loss_custom, test_acc_custom = custom_model.evaluate(x_test, y_test)
print("Custom CNN Test Accuracy:", test_acc_custom)
print("Custom CNN Training Time:", custom_time)


Epoch 1/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 7s 7ms/step - accuracy: 0.2540 - loss: 1.9649 - val_accuracy: 0.3940 - val_loss: 1.5949
Epoch 2/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.4164 - loss: 1.5842 - val_accuracy: 0.4790 - val_loss: 1.4221
Epoch 3/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 5s 8ms/step - accuracy: 0.4673 - loss: 1.4563 - val_accuracy: 0.5178 - val_loss: 1.3148
Epoch 4/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.5088 - loss: 1.3614 - val_accuracy: 0.5272 - val_loss: 1.3628
Epoch 5/5
704/704 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.5389 - loss: 1.2827 - val_accuracy: 0.5774 - val_loss: 1.1767
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.5779 - loss: 1.1830
Custom CNN Test Accuracy: 0.5723000168800354
Custom CNN Training Time: 27.334829092025

ResNet50 (Frozen)

In [16]:
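# Load the ImageNet-pretrained ResNet50 convolutional base without its classifier head.
resnet_base = ResNet50(
    include_top=False,
    weights="imagenet",
    input_shape=(224, 224, 3)
)
# Freeze the base so that only the new Dense head is trained.
resnet_base.trainable = False

resnet_model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    data_augmentation,
    layers.Resizing(224, 224),          # upsample 32x32 CIFAR-10 images to 224x224
    layers.Lambda(resnet_preprocess),   # ResNet50-specific input preprocessing
    resnet_base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10)                    # logits for the 10 CIFAR-10 classes
], name="resnet50_model")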

resnet_model.summary()

resnet_model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

t0 = time.time()
history_resnet = resnet_model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=3,
    batch_size=64
)
resnet_time = time.time() - t0

test_loss_resnet, test_acc_resnet = resnet_model.evaluate(x_test, y_test)
print("ResNet Frozen Test Accuracy:", test_acc_resnet)
print("ResNet Frozen Training Time:", resnet_time)


Epoch 1/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 214s 295ms/step - accuracy: 0.6991 - loss: 0.8685 - val_accuracy: 0.8852 - val_loss: 0.3169
Epoch 2/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 205s 292ms/step - accuracy: 0.8242 - loss: 0.5006 - val_accuracy: 0.8968 - val_loss: 0.2986
Epoch 3/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 205s 291ms/step - accuracy: 0.8362 - loss: 0.4688 - val_accuracy: 0.9036 - val_loss: 0.2844
313/313 ━━━━━━━━━━━━━━━━━━━━ 42s 134ms/step - accuracy: 0.8980 - loss: 0.3027
ResNet Frozen Test Accuracy: 0.8985999822616577
ResNet Frozen Training Time: 624.6624710559845


ResNet Fine-tuning

In [None]:
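# Unfreeze the base, then re-freeze everything except its last 10 layers.
resnet_base.trainable = True

for layer in resnet_base.layers[:-10]:
    layer.trainable = False

# Recompile so the trainability change takes effect, and use a much lower
# learning rate than in the frozen stage to protect the pretrained weights.
resnet_model.compile(
    optimizer=keras.optimizers.Adam(1e-5),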
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

history_resnet_ft = resnet_model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=3,
    batch_size=64
)

test_loss_resnet_ft, test_acc_resnet_ft = resnet_model.evaluate(x_test, y_test)
print("ResNet Fine-tuned Test Accuracy:", test_acc_resnet_ft)


Epoch 1/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 233s 319ms/step - accuracy: 0.8598 - loss: 0.4069 - val_accuracy: 0.9102 - val_loss: 0.2587
Epoch 2/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 221s 314ms/step - accuracy: 0.8720 - loss: 0.3636 - val_accuracy: 0.9162 - val_loss: 0.2408
Epoch 3/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 221s 313ms/step - accuracy: 0.8783 - loss: 0.3495 - val_accuracy: 0.9182 - val_loss: 0.2323
313/313 ━━━━━━━━━━━━━━━━━━━━ 42s 135ms/step - accuracy: 0.9102 - loss: 0.2580
ResNet Fine-tuned Test Accuracy: 0.9136000275611877
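
The slice `resnet_base.layers[:-10]` unfreezes every layer type among the last ten, including any `BatchNormalization` layers. The Keras transfer-learning guide suggests keeping batch-normalization layers frozen during fine-tuning so that their moving statistics are not disturbed. The sketch below shows one possible variant; it is not what this notebook ran. `unfreeze_top_layers` is a hypothetical helper, and the small `toy_base` model is an assumed stand-in for `resnet_base` so that no weights need to be downloaded.

```python
from tensorflow import keras
from tensorflow.keras import layers


def unfreeze_top_layers(base, n):
    """Unfreeze the last n layers of `base`, leaving BatchNormalization frozen."""
    base.trainable = True
    for layer in base.layers[:-n]:
        layer.trainable = False
    for layer in base.layers[-n:]:
        if isinstance(layer, layers.BatchNormalization):
            layer.trainable = False


# Small stand-in for a pretrained base (assumption, for illustration only).
toy_base = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3, name="conv_a"),
    layers.BatchNormalization(name="bn_a"),
    layers.Conv2D(8, 3, name="conv_b"),
    layers.BatchNormalization(name="bn_b"),
], name="toy_base")

unfreeze_top_layers(toy_base, n=2)
trainable_names = [layer.name for layer in toy_base.layers if layer.trainable]
print(trainable_names)
```

As in the notebook's own fine-tuning cell, the model would need to be recompiled after changing `trainable` and before calling `fit`.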


VGG16 (Frozen)

In [None]:
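# Load the ImageNet-pretrained VGG16 convolutional base without its classifier head.
vgg_base = VGG16(
    include_top=False,
    weights="imagenet",
    input_shape=(224, 224, 3)
)
# Freeze the base so that only the new Dense head is trained.
vgg_base.trainable = False

vgg_model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    data_augmentation,
    layers.Resizing(224, 224),       # upsample 32x32 CIFAR-10 images to 224x224
    layers.Lambda(vgg_preprocess),   # VGG16-specific input preprocessing
    vgg_base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10)                 # logits for the 10 CIFAR-10 classes
], name="vgg16_model")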

vgg_model.summary()

vgg_model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

t0 = time.time()
history_vgg = vgg_model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=3,
    batch_size=64
)
vgg_time = time.time() - t0

test_loss_vgg, test_acc_vgg = vgg_model.evaluate(x_test, y_test)
print("VGG16 Frozen Test Accuracy:", test_acc_vgg)
print("VGG16 Frozen Training Time:", vgg_time)


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58889256/58889256 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


Epoch 1/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 243s 334ms/step - accuracy: 0.4981 - loss: 1.6813 - val_accuracy: 0.7988 - val_loss: 0.5969
Epoch 2/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 233s 331ms/step - accuracy: 0.7428 - loss: 0.7560 - val_accuracy: 0.8222 - val_loss: 0.5155
Epoch 3/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 233s 331ms/step - accuracy: 0.7650 - loss: 0.6866 - val_accuracy: 0.8256 - val_loss: 0.5114
313/313 ━━━━━━━━━━━━━━━━━━━━ 53s 157ms/step - accuracy: 0.8206 - loss: 0.5370
VGG16 Frozen Test Accuracy: 0.8241999745368958
VGG16 Frozen Training Time: 709.8589382171631


VGG16 Fine-tuning

In [17]:
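# Unfreeze the base, then re-freeze everything except its last 4 layers
# (the three block5 convolutions and the final pooling layer).
vgg_base.trainable = True

for layer in vgg_base.layers[:-4]:
    layer.trainable = False

# Recompile so the trainability change takes effect, and use a much lower
# learning rate than in the frozen stage to protect the pretrained weights.
vgg_model.compile(
    optimizer=keras.optimizers.Adam(1e-5),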
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

history_vgg_ft = vgg_model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=3,
    batch_size=64
)

test_loss_vgg_ft, test_acc_vgg_ft = vgg_model.evaluate(x_test, y_test)
print("VGG16 Fine-tuned Test Accuracy:", test_acc_vgg_ft)


Epoch 1/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 266s 374ms/step - accuracy: 0.7893 - loss: 0.6118 - val_accuracy: 0.8764 - val_loss: 0.3788
Epoch 2/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 263s 373ms/step - accuracy: 0.8432 - loss: 0.4536 - val_accuracy: 0.8924 - val_loss: 0.3184
Epoch 3/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 262s 372ms/step - accuracy: 0.8652 - loss: 0.3915 - val_accuracy: 0.8982 - val_loss: 0.3062
313/313 ━━━━━━━━━━━━━━━━━━━━ 47s 150ms/step - accuracy: 0.8988 - loss: 0.3189
VGG16 Fine-tuned Test Accuracy: 0.8991000056266785


Compare Results

In [18]:
print("\n==== Final Comparison ====\n")

print("Custom CNN Accuracy:", test_acc_custom)
print("ResNet Frozen Accuracy:", test_acc_resnet)
print("ResNet Fine-tuned Accuracy:", test_acc_resnet_ft)
print("VGG16 Frozen Accuracy:", test_acc_vgg)
print("VGG16 Fine-tuned Accuracy:", test_acc_vgg_ft)



==== Final Comparison ====

Custom CNN Accuracy: 0.5723000168800354
ResNet Frozen Accuracy: 0.8985999822616577
ResNet Fine-tuned Accuracy: 0.9136000275611877
VGG16 Frozen Accuracy: 0.8241999745368958
VGG16 Fine-tuned Accuracy: 0.8991000056266785


### Questions:

- Which model achieved the highest accuracy?

The fine-tuned ResNet50 achieved the highest test accuracy, about 91.36%. The fine-tuned VGG16 (89.91%) and the frozen ResNet50 (89.86%) were nearly tied behind it, followed by the frozen VGG16 (82.42%). The custom CNN, trained from scratch, reached only 57.23%. Overall, ResNet50 with fine-tuning gave the best performance on CIFAR-10.
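
As a quick check of the ranking and of how much fine-tuning helped each pretrained model, the reported test accuracies can be compared directly. This is a small sketch using the numbers printed in the comparison cell; the dictionary layout and variable names are illustrative assumptions.

```python
# Test accuracies copied from the "Final Comparison" output.
test_accuracy = {
    "Custom CNN": 0.5723000168800354,
    "ResNet50 (frozen)": 0.8985999822616577,
    "ResNet50 (fine-tuned)": 0.9136000275611877,
    "VGG16 (frozen)": 0.8241999745368958,
    "VGG16 (fine-tuned)": 0.8991000056266785,
}

# Model with the highest test accuracy.
best_model = max(test_accuracy, key=test_accuracy.get)
print("Best model:", best_model)

# Gain from fine-tuning, in percentage points.
fine_tuning_gain = {
    name: 100 * (test_accuracy[f"{name} (fine-tuned)"] - test_accuracy[f"{name} (frozen)"])
    for name in ("ResNet50", "VGG16")
}
for name, gain in fine_tuning_gain.items():
    print(f"{name}: +{gain:.2f} percentage points from fine-tuning")
```

Fine-tuning helped VGG16 considerably more than ResNet50 in this run, although ResNet50 still finished ahead.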

- Which model trained faster?

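The custom CNN trained fastest: about 27 seconds for 5 epochs, or roughly 5 seconds per epoch. The frozen ResNet50 took about 625 seconds and the frozen VGG16 about 710 seconds for 3 epochs each, which is more than 200 seconds per epoch. Part of the gap comes from depth and parameter count, and part from input size: the pretrained models process images resized to 224×224, while the custom CNN works directly on 32×32 inputs. The fine-tuning runs were not timed with `time.time()`, but their per-epoch logs add up to roughly 675 seconds for ResNet50 and 791 seconds for VGG16. So although the custom CNN was much less accurate, it was far cheaper to train.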
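
Because the custom CNN ran for 5 epochs and the pretrained models for 3, total training time is not a like-for-like comparison. The sketch below normalizes the reported wall-clock times to seconds per epoch; the variable names are illustrative, and the numbers are the ones printed by the training cells.

```python
# (total training time in seconds, number of epochs) from the training cells.
runs = {
    "Custom CNN": (27.334829092025, 5),
    "ResNet50 (frozen)": (624.6624710559845, 3),
    "VGG16 (frozen)": (709.8589382171631, 3),
}

seconds_per_epoch = {name: total / epochs for name, (total, epochs) in runs.items()}

for name, value in seconds_per_epoch.items():
    print(f"{name:<20} {value:7.1f} s/epoch")

# Ratio of the slowest per-epoch time to the fastest.
slowdown = max(seconds_per_epoch.values()) / min(seconds_per_epoch.values())
print(f"slowest / fastest: {slowdown:.0f}x")
```

These figures depend on the hardware used for this run, so they are best read as relative costs.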

- How might the architecture explain the differences?

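The custom CNN is small and was trained from scratch for only five epochs, so it trains quickly but learns limited features. VGG16 and ResNet50 are much deeper and pretrained on ImageNet, so they start from strong general-purpose features; even with the base frozen, a single Dense layer on top reached 82% to 90% test accuracy. ResNet50 likely did best in part because its skip (residual) connections make very deep networks easier to optimize, which tends to produce stronger features than VGG16's plain stack of convolutions. This experiment does not isolate that effect, however, so the explanation remains a hypothesis.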
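
To make the skip-connection idea concrete, here is a minimal NumPy sketch of a residual block, `y = x + F(x)`. It is a toy fully connected block with made-up weights, not ResNet50's actual convolutional block, and `plain_block` and `residual_block` are hypothetical names.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def plain_block(x, w1, w2):
    """Two stacked layers with no shortcut: output = F(x)."""
    return relu(x @ w1) @ w2


def residual_block(x, w1, w2):
    """The same two layers plus an identity shortcut: output = x + F(x)."""
    return x + plain_block(x, w1, w2)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # batch of 4 feature vectors

# With all-zero weights the learned branch F contributes nothing.
w1 = np.zeros((8, 8))
w2 = np.zeros((8, 8))

plain_out = plain_block(x, w1, w2)        # the input signal is lost
residual_out = residual_block(x, w1, w2)  # the input passes through unchanged

print(np.allclose(plain_out, 0.0))
print(np.allclose(residual_out, x))
```

Because the shortcut passes the input through even when the learned branch contributes nothing, a residual block can fall back to the identity mapping. This is one common explanation for why deep residual networks are easier to optimize than plain stacks of the same depth.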