# <center> <b> <font color='#052F4A'> Image Classification with CIFAR-10 </b> </font> </center>

### <font color='blue'> TABLE OF CONTENTS </font>

1. [Introduction](#1)
2. [Setup](#2)
3. [Load and preprocess data](#3)
4. [Feature Extraction](#4)
5. [Fine-Tuning](#5)
6. [Annex](#annex) <br>
    A. [VGG-16](#vgg-16)<br>
    B. [About TF's compile() and fit()](#compile_fit)
7. [References](#references)

<a name="1"></a>
## <b> <font color='darkred'> 1. Introduction  </b> </font>

### <font color='darkorange'> The problem </font>

The CIFAR-10 dataset is a well-known benchmark in computer vision. It consists of 60,000 color images of 32×32 pixels (50,000 for training and 10,000 for testing), evenly divided into 10 classes such as airplanes, automobiles, birds, and cats. The task is to assign each image to its class, which is challenging because of the low resolution and the high intra-class variability.

To address this problem, I will apply transfer learning, leveraging a pre-trained VGG-16 model as the base feature extractor. Instead of training a deep network from scratch, I will use the rich representations already learned by VGG-16 on ImageNet and adapt them to CIFAR-10, adding and training custom layers for classification on the target dataset.

### <font color='darkorange'> General Workflow for Transfer Learning </font>

- **Step 1: Define the Problem**
  - Identify the target task and dataset (e.g., classification, detection, segmentation).
  - Prepare and preprocess the data in a way that matches the input requirements of the base model.


- **Step 2: Select a Pre-trained Model**
  - Choose a model that has been trained on a large, general dataset (e.g., ImageNet for vision tasks).
  - Load it without its original classification head (`include_top=False`).
  - Decide which parts of the pre-trained network to use as a feature extractor.


- **Step 3: Feature Extraction**
  - Freeze the pre-trained layers so their weights are not updated.
  - Add a new task-specific classification (or regression) head.
  - Train only the newly added layers on the target dataset.
  - Purpose: leverage generic features (edges, shapes, textures, etc.) learned from the source dataset.


- **Step 4: Fine-Tuning**
  - Unfreeze some (or all) of the pre-trained layers.
  - Train both the pre-trained layers and the new head on the target dataset, typically with a lower learning rate.
  - Purpose: adapt higher-level representations to the specifics of the target task.


- **Step 5: Evaluation**
  - Assess model performance on a held-out test set.
  - Compare results between feature extraction and fine-tuning.
  - Use metrics appropriate to the problem (accuracy, F1, IoU, etc.).


- **Step 6: Deployment**
  - Save the trained model and preprocessing pipeline.
  - Export for inference on unseen data.


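⚠️ **Note on Batch Normalization:**  
In architectures that include BatchNorm, keep those layers in inference mode (`training=False` when calling the base model), even after unfreezing.  
Otherwise, their moving mean and variance are updated on the new data and can disrupt what the network has already learned. The Keras VGG-16 used here has no BatchNorm layers, so this matters only if you swap in another base model.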
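To make the BatchNorm note concrete, here is a minimal sketch on a standalone `BatchNormalization` layer (not on VGG-16). The toy input and the variable names are assumptions made for illustration. It shows that a call with `training=False` leaves the moving mean untouched, while a call with `training=True` updates it.

```python
import numpy as np
import tensorflow as tf

# Toy batch whose mean is far from 0, so any update of the moving mean is visible.
rng = np.random.default_rng(0)
x = tf.constant(rng.normal(loc=5.0, scale=2.0, size=(64, 8)).astype("float32"))

bn = tf.keras.layers.BatchNormalization()

# Inference mode: the layer uses its stored statistics and does not update them.
_ = bn(x, training=False)
mean_after_inference = bn.moving_mean.numpy().copy()

# Training mode: the layer normalizes with batch statistics and updates the moving mean.
_ = bn(x, training=True)
mean_after_training = bn.moving_mean.numpy().copy()

stats_unchanged_in_inference = bool(np.allclose(mean_after_inference, 0.0))
stats_changed_in_training = not np.allclose(mean_after_training, mean_after_inference)
print(stats_unchanged_in_inference, stats_changed_in_training)
```

For a base model, the equivalent would be calling it as `base_model(inputs, training=False)` when building the full model.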



### <font color='darkorange'> Workflow for this example </font>


- **Step 1: Problem Setup**
  - Load CIFAR-10 dataset (60,000 images, 10 classes).
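  - Preprocess images with VGG-16's `preprocess_input`; resizing to 224×224 happens inside the model.
  - Use the dataset's built-in train/test split (50,000 / 10,000 images).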


- **Step 2: Feature Extraction**
  - Load VGG-16 pre-trained on ImageNet (`include_top=False`).
  - Freeze all base layers.
  - Add custom classification head:
    - GlobalAveragePooling2D.
    - Dense layer(s) with ReLU activation (optional).
    - Final Dense layer with 10 units (softmax).
  - Train only the top classification layers.
  - Evaluate baseline performance.


- **Step 3: Fine-Tuning**
  - Unfreeze part of the base layers (in this example, the last block of VGG-16).
  - Recompile model with a lower learning rate.
  - Train both the unfrozen base layers and the classification head.


- **Step 4: Evaluation**
  - Evaluate on the test set.
  - Compare performance between feature extraction and fine-tuning stages.
  - Visualize metrics (accuracy, loss curves, confusion matrix).





<a name="2"></a>
## <b> <font color='darkred'> 2. Setup  </b> </font>

In [3]:
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

<a name="3"></a>
## <b> <font color='darkred'> 3. Load and preprocess data  </b> </font>

In [4]:
# Load CIFAR-10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

In [5]:
# Preprocess input for VGG16 (converts RGB to BGR and subtracts the ImageNet channel means; no rescaling to [0, 1])
x_train_preprocessed = preprocess_input(x_train)
x_test_preprocessed = preprocess_input(x_test)

# One-hot encode labels
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)
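For reference, the VGG-16 `preprocess_input` follows the "caffe" convention: channels are reordered from RGB to BGR and the ImageNet per-channel means are subtracted. A minimal NumPy sketch of that arithmetic follows. The function name `vgg16_preprocess_sketch` and the dummy batch are hypothetical, and the sketch is only meant to illustrate the transformation, not to replace the Keras function.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by "caffe"-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype="float32")

def vgg16_preprocess_sketch(images_rgb):
    """Hypothetical re-implementation for illustration only."""
    x = np.asarray(images_rgb).astype("float32")  # float copy; the input is left untouched
    x = x[..., ::-1]                              # RGB -> BGR
    return x - IMAGENET_BGR_MEAN                  # zero-center each channel

# Dummy batch of two mid-gray 32x32 images (assumption, stands in for x_train).
batch = np.full((2, 32, 32, 3), 128, dtype="uint8")
out = vgg16_preprocess_sketch(batch)
print(out.shape, out.dtype)
print(out[0, 0, 0])
```

The output keeps the input shape but is no longer in the 0 to 255 range, which is why the preprocessed arrays should not be displayed directly as images.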

In [6]:
x_train_preprocessed.shape

(50000, 32, 32, 3)

50,000 32×32 RGB images.

In [7]:
y_train_cat[0] # is one hot encoded

array([0., 0., 0., 0., 0., 0., 1., 0., 0., 0.])

Note that labels are one-hot encoded.
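To read a one-hot vector back as a class, take the index of its largest entry. The sketch below assumes the standard CIFAR-10 class ordering; the `CIFAR10_CLASSES` list is not defined anywhere in this notebook and is introduced here only for illustration.

```python
import numpy as np

# Standard CIFAR-10 class order (assumption: not defined elsewhere in this notebook).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

# The one-hot vector shown above for the first training label.
one_hot = np.array([0., 0., 0., 0., 0., 0., 1., 0., 0., 0.])

class_index = int(np.argmax(one_hot))      # position of the 1
class_name = CIFAR10_CLASSES[class_index]  # human-readable label
print(class_index, class_name)
```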

<a name="4"></a>
## <b> <font color='darkred'> 4. Feature Extraction  </b> </font>

We will:

- Load VGG-16 pre-trained on ImageNet (`include_top=False`).
- Freeze all base layers.
- Add custom classification head.

<img src="images/FeatureExtraction.png"/>


In [11]:
# Feature extraction using VGG16
def feature_extractor(inputs):
    vgg = tf.keras.applications.VGG16(
        input_shape=(224, 224, 3),
        include_top=False,
        weights='imagenet'
    )
    vgg.trainable = False  # Freeze feature extractor
    return vgg(inputs)

# Classifier head
def classifier(inputs):
    x = tf.keras.layers.GlobalAveragePooling2D()(inputs)  # 7x7x512 -> 512; a hidden Dense layer could be added after this
    x = tf.keras.layers.Dense(10, activation="softmax", name="classification")(x)
    return x

# Final model combining resize, feature extractor, and classifier
def final_model(inputs):
    # VGG-16 is instantiated with a 224x224 input, so upsample the 32x32 images first
    resize = tf.keras.layers.Resizing(height=224, width=224)(inputs)
    vgg_features = feature_extractor(resize)
    classification_output = classifier(vgg_features)
    return classification_output


In [12]:
# Define and compile the model
def define_compile_model():
    inputs = tf.keras.layers.Input(shape=(32, 32, 3))
    output = final_model(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=output)

    model.compile(
        optimizer='SGD',
        loss='categorical_crossentropy',  # labels are one-hot encoded
        metrics=['accuracy']
    )
    return model

# Instantiate the model
model = define_compile_model()
model.summary()

In [10]:
# Train
print("Stage 1: Training classifier only (feature extraction)")
model.fit(x_train_preprocessed, y_train_cat,
          validation_data=(x_test_preprocessed, y_test_cat),
          epochs=2,
          batch_size=32)

Stage 1: Training classifier only (feature extraction)
Epoch 1/2


2025-09-26 19:01:10.112929: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:84] Allocation of 614400000 exceeds 10% of free system memory.
2025-09-26 19:01:11.258389: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:84] Allocation of 411041792 exceeds 10% of free system memory.
2025-09-26 19:01:11.370905: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:84] Allocation of 411041792 exceeds 10% of free system memory.
2025-09-26 19:01:11.873263: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:84] Allocation of 102760448 exceeds 10% of free system memory.
2025-09-26 19:01:11.946875: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:84] Allocation of 205520896 exceeds 10% of free system memory.


221/1563 ━━━━━━━━━━━━━━━━━━━━ 1:27:01 4s/step - accuracy: 0.4490 - loss: 2.2339

KeyboardInterrupt: 
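The numbers in the output above can be reproduced with some quick arithmetic. The sketch below is a back-of-the-envelope check, not a profiler: `float32_bytes` is a hypothetical helper, and the 4 s/step figure is read off the progress bar. The byte counts happen to match the sizes in the allocator warnings, which suggests (but does not prove) that they come from the float32 training array and from per-batch activations in the first VGG-16 blocks.

```python
import math

num_train = 50_000
batch_size = 32
seconds_per_step = 4  # approximate value read off the progress bar

# Number of batches per epoch; the last batch is smaller than 32.
steps_per_epoch = math.ceil(num_train / batch_size)
epoch_hours = steps_per_epoch * seconds_per_step / 3600

def float32_bytes(*dims):
    """Hypothetical helper: memory of a float32 tensor with the given shape."""
    return 4 * math.prod(dims)

train_set_bytes = float32_bytes(num_train, 32, 32, 3)           # whole training set as float32
block1_conv_bytes = float32_bytes(batch_size, 224, 224, 64)     # one batch after a block1 conv
block1_pool_bytes = float32_bytes(batch_size, 112, 112, 64)     # one batch after block1 pooling
block2_conv_bytes = float32_bytes(batch_size, 112, 112, 128)    # one batch after a block2 conv

print(steps_per_epoch, round(epoch_hours, 2))
print(train_set_bytes, block1_conv_bytes, block1_pool_bytes, block2_conv_bytes)
```

At roughly 4 s per step, one epoch on this CPU takes well over an hour, which is consistent with the run being interrupted.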

<a name="5"></a>
## <b> <font color='darkred'> 5. Fine-tuning  </b> </font>

We will unfreeze the last VGG-16 block (`block5`) for fine-tuning.

We need to:

- Access the VGG-16 model nested inside our model (it was created in `feature_extractor`).

- Unfreeze only the `block5` layers.

- Recompile the model with a smaller learning rate.



<img src="images/FineTuning.png"/>


In [13]:
# Access the VGG16 layer in the model
vgg_layer = None
for layer in model.layers:
    if isinstance(layer, tf.keras.Model) and 'vgg16' in layer.name:
        vgg_layer = layer
        break

if vgg_layer is None:
    print("VGG16 layer not found!")
else:
    # Unfreeze last conv block (block5)
    vgg_layer.trainable = True
    for layer in vgg_layer.layers:
        if not layer.name.startswith('block5'):
            layer.trainable = False


In [14]:
vgg = model.get_layer('vgg16')
for layer in vgg.layers:
    print(layer.name, layer.trainable)

input_layer_3 False
block1_conv1 False
block1_conv2 False
block1_pool False
block2_conv1 False
block2_conv2 False
block2_pool False
block3_conv1 False
block3_conv2 False
block3_conv3 False
block3_pool False
block4_conv1 False
block4_conv2 False
block4_conv3 False
block4_pool False
block5_conv1 True
block5_conv2 True
block5_conv3 True
block5_pool True


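We can see that the `block5` layers are now trainable (`trainable=True`). `block5_pool` is flagged as well, but a pooling layer has no weights to train.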

In [15]:
model.summary()

We can see that our model now has more trainable parameters (7,084,554) compared to the 5,130 it had before.
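The trainable-parameter counts can be checked by hand. The sketch below assumes the layer shapes used in this notebook: a 512-dimensional pooled feature vector feeding a 10-unit Dense head, and three 3×3 convolutions with 512 input and 512 output channels in `block5`. The helper `conv_params` is hypothetical.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of a square 2D convolution (hypothetical helper)."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

# Classification head: Dense(10) on top of a 512-dimensional vector.
head_params = 512 * 10 + 10

# block5 of VGG-16: three 3x3 convolutions, 512 -> 512 channels each.
block5_params = 3 * conv_params(3, 512, 512)

trainable_after_unfreeze = head_params + block5_params
print(head_params, block5_params, trainable_after_unfreeze)
```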

In [None]:
# Recompile with a smaller learning rate
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9), # lower learning rate
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("Stage 2: Fine-tuning last VGG block")
model.fit(
    x_train_preprocessed, y_train_cat,
    validation_data=(x_test_preprocessed, y_test_cat),
    epochs=5,  # can increase
    batch_size=64
)

<a name="annex"></a>
## <b> <font color='darkred'> Annex  </b> </font>


<a name="vgg-16"></a>
### <b> <font color='darkorange'> A. VGG-16  </b> </font>

Let’s take a closer look at our base model: VGG-16.

In [16]:
vgg = tf.keras.applications.VGG16(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet'
)
vgg.trainable = False  # Freeze feature extractor

In [17]:
vgg.summary()

**Brief analysis:**

- With `input_shape=(224, 224, 3)`, the model takes 224×224 RGB images.
- Spatial resolution halves after each pooling layer: 224 → 112 → 56 → 28 → 14 → 7.
- The number of filters grows with depth: 64 → 128 → 256 → 512 → 512.
- The convolutional base outputs a 7×7×512 feature map, ready for flattening or global pooling before classification.

In short: the base trades spatial resolution for channel depth and ends with a compact but deep representation.


**Output shape calculation**

Input: (224, 224, 3)

VGG16 has 5 convolutional blocks with MaxPooling2D(pool_size=2) after each block.

So the spatial dimensions halve after each pooling layer:

| Block | Input size | After pooling |
| ----- | ---------- | ------------- |
| 1     | 224×224    | 112×112       |
| 2     | 112×112    | 56×56         |
| 3     | 56×56      | 28×28         |
| 4     | 28×28      | 14×14         |
| 5     | 14×14      | 7×7           |

Number of channels after last conv block = 512

**Note.** Convolutions preserve spatial size because of “same” padding.
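The same calculation can be written as a few lines of plain Python. This is a sketch under the assumptions stated above ("same"-padded convolutions, one 2×2 pooling with stride 2 per block); the function name `vgg16_block_output_shapes` is hypothetical.

```python
def vgg16_block_output_shapes(height, width):
    """Shape (H, W, C) after each of the five VGG-16 blocks (hypothetical helper)."""
    channels_per_block = [64, 128, 256, 512, 512]
    shapes = []
    for channels in channels_per_block:
        # Convolutions keep H and W ("same" padding); pooling halves them.
        height, width = height // 2, width // 2
        shapes.append((height, width, channels))
    return shapes

shapes_224 = vgg16_block_output_shapes(224, 224)
shapes_32 = vgg16_block_output_shapes(32, 32)

for block, shape in enumerate(shapes_224, start=1):
    print(f"block{block}: {shape}")
print("final map for a 32x32 input:", shapes_32[-1])
```

By the same arithmetic, feeding the 32×32 CIFAR-10 images without resizing would leave only a 1×1×512 map at the end of the base.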

<a name="compile_fit"></a>
### <b> <font color='darkorange'> B. About TF's compile() and fit()  </b> </font>


A few notes about `compile()` and `fit()`:

- `model.fit()` is what updates the model's weights.

- `model.compile()` only sets the training configuration: optimizer (and its learning rate), loss, and metrics.

- Calling `model.compile()` more than once does NOT reset the learned weights; they stay intact across recompiles.

- Changing `layer.trainable` flags changes which weights are updated in subsequent training. Recompile after changing them so the new setting is taken into account.

The two-stage workflow used in this notebook therefore works as intended:

- First, train the model with the base frozen.

- Then unfreeze some layers (here, the `block5` layers).

- Recompile the model with a lower learning rate.

- Finally, call `fit()` again to continue training the head together with the unfrozen layers.

This fine-tunes those layers without losing the progress from the first stage.



**In summary**

- `compile()` defines the loss function, the optimizer, and the metrics.

- It does not touch the layer weights, so you can recompile as often as needed without affecting pretrained or previously learned weights.
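A quick way to convince yourself is to compare weights before and after a second `compile()`. The sketch below uses a tiny Dense model on random data (both are assumptions chosen to keep the check fast) rather than the VGG-16 model from this notebook.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model and random data, for illustration only.
inputs = tf.keras.layers.Input(shape=(4,))
outputs = tf.keras.layers.Dense(3, activation="softmax")(inputs)
toy_model = tf.keras.Model(inputs=inputs, outputs=outputs)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4)).astype("float32")
y = tf.keras.utils.to_categorical(rng.integers(0, 3, size=64), 3)

# Stage 1: compile and train briefly.
toy_model.compile(optimizer="sgd", loss="categorical_crossentropy")
toy_model.fit(x, y, epochs=1, batch_size=16, verbose=0)
weights_after_fit = [w.copy() for w in toy_model.get_weights()]

# Stage 2: recompile with a different optimizer configuration, without training.
toy_model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
    loss="categorical_crossentropy",
)
weights_after_recompile = toy_model.get_weights()

weights_preserved = all(
    np.array_equal(before, after)
    for before, after in zip(weights_after_fit, weights_after_recompile)
)
print(weights_preserved)
```

One caveat worth keeping in mind: recompiling creates a new optimizer, so optimizer state such as momentum buffers starts fresh even though the layer weights are preserved.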



<a name="references"></a>
## <b> <font color='darkred'> References </b> </font>

- [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning)
- [TF Advanced Techniques Specialization](https://www.coursera.org/specializations/tensorflow-advanced-techniques)
