## Batch Normalization and Weight Decay
Batch normalization and weight decay are two techniques commonly used when training CNNs for computer vision tasks. Weight decay is a regularization technique: it helps prevent overfitting and improves generalization. Batch normalization primarily stabilizes and speeds up training, and it also has a mild regularizing effect.
<br>Gradient descent<br>
Gradient descent is an iterative way to adjust a model's weights and biases: it repeatedly steps in the direction that reduces the error (loss), helping the model make better predictions when it sees new data.
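
The update rule can be illustrated with a minimal sketch on a one-dimensional toy problem. The error function f(w) = (w - 3)^2, the learning rate and the number of steps are assumptions chosen only for illustration; they do not come from the lab.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=50):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        # Move the weight a small step in the direction that lowers the error
        w -= learning_rate * grad(w)
    return w


# Toy error function f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# Its minimum is at w = 3, so the result should end up close to 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

A real network applies the same rule to every weight at once, with the gradient computed by backpropagation.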
<br>Normalization vs standardization<br>
Normalization (min-max scaling) rescales data to the range between 0 and 1, while standardization transforms data using the formula (x - m)/sigma, where x is a data point, m is the mean, and sigma is the standard deviation, resulting in data with a mean of 0 and a standard deviation of 1.<br>
Exploding gradients are a problem in neural networks where the gradients (which guide how the model learns) become too large, causing the model to make huge, erratic changes to the weights and making the learning process fail.<br>
Stochastic Gradient Descent (SGD) is a variation of gradient descent where the model updates its weights using only one random data point at a time (or, in practice, a small mini-batch), instead of the entire dataset, making the learning process faster and more efficient for large datasets.<br>
Batch normalization standardizes the outputs of a layer (usually just before or after the activation function) over each mini-batch, ensuring that the data entering the next layer has a consistent scale. This helps speed up training and makes it more stable.
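
The difference between normalization and standardization described above can be checked on a small list of numbers. This is a sketch using only the Python standard library; the helper names and the sample data are assumptions made for illustration, and both helpers assume the values are not all identical.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Apply (x - m) / sigma so the result has mean 0 and standard deviation 1."""
    m, sigma = mean(values), pstdev(values)
    return [(v - m) / sigma for v in values]


data = [2.0, 4.0, 6.0, 8.0]
normalized = min_max_normalize(data)
standardized = standardize(data)

print(normalized)
print(mean(standardized), pstdev(standardized))
```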



### Batch normalization layer (BNL)
Batch Norm is a neural network layer that is now commonly used in many architectures. It often gets added as part of a Linear or Convolutional block and helps to stabilize the network during training.<br>
Batch normalization normalizes the input to a specific layer so that the distribution of the activations remains stable during training. By controlling the distribution of activations and gradients, it stabilizes the learning process, which leads to improved convergence, enables the use of higher learning rates, and has a regularizing effect (even though this is not its primary purpose). Since this layer performs normalization between two consecutive hidden layers, it is usually placed after the convolutional layers (before or after the activation function).
<br>BNL Parameters<br>
The batch normalization layer has 4 parameters per feature channel: 2 are trainable during backpropagation (β and γ, used to shift and scale the normalized distribution so that the model can perform well on the specific task), and 2 are non-trainable (the moving averages of the mean and variance, used for normalization at inference time). Even though the moving mean and variance are not learned through backpropagation, their values are estimated from the input data during the training phase, and are stored and used during inference.
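
To make the role of the four parameters concrete, here is a minimal pure-Python sketch of batch normalization for a single feature. It is an illustration rather than the Keras implementation; the function names and the sample batch are assumptions, while the defaults for `momentum` (0.99) and `eps` (0.001) follow the Keras layer defaults.

```python
import math


def batch_norm_train(batch, gamma, beta, moving_mean, moving_var,
                     momentum=0.99, eps=1e-3):
    """Training step: normalize with batch statistics, update moving averages."""
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    # Normalize, then scale by gamma and shift by beta (the trainable parameters)
    out = [gamma * (x - batch_mean) / math.sqrt(batch_var + eps) + beta
           for x in batch]
    # Non-trainable parameters: updated as moving averages, not by backpropagation
    moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
    moving_var = momentum * moving_var + (1 - momentum) * batch_var
    return out, moving_mean, moving_var


def batch_norm_inference(x, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Inference: normalize with the stored moving statistics."""
    return gamma * (x - moving_mean) / math.sqrt(moving_var + eps) + beta


out, moving_mean, moving_var = batch_norm_train(
    [1.0, 2.0, 3.0, 4.0], gamma=1.0, beta=0.0, moving_mean=0.0, moving_var=1.0)
print(out)
print(moving_mean, moving_var)
```

With γ = 1 and β = 0 the training output is centered on zero; the stored moving statistics move only slightly toward the batch statistics because the momentum is close to 1.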

In [None]:
from tensorflow.keras import models, layers
from tensorflow.keras.layers import Input

# BATCH NORMALIZATION LAYER
model = models.Sequential()

# Remove activation function from the Conv2D layer
model.add(Input(shape=(32, 32, 3)))
model.add(layers.Conv2D(32, (3, 3), padding="same"))

# Insert batch normalization
model.add(layers.BatchNormalization())

# Add activation function
model.add(layers.Activation("relu"))

# Remove activation function from the Conv2D layer
model.add(layers.Conv2D(32, (3, 3), padding="same"))

# Insert batch normalization
model.add(layers.BatchNormalization())

# Add activation function
model.add(layers.Activation("relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.2))

### Weight decay
Weight decay is a regularization technique that penalizes large weights during training. It does this by adding a term to the loss function that encourages smaller values for the model's weights, thus preventing overfitting. There are multiple versions of weight decay, depending on the type of vector norm that is added as a penalty to the loss function:

1. L1 - Uses the L1 vector norm (sum of the absolute values of the weights);
2. L2 - Uses the squared L2 vector norm (sum of the squared weights);
3. L1L2 - Uses both penalties (sum of the absolute and the squared weights).

By introducing weight decay, we ensure smaller values for the weights, which prevents the model from capturing noise instead of true patterns in the data and improves generalization abilities. Weight decay can be applied on the model’s layers by using the Keras regularizers module.
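
The three penalty variants listed above can be written out in a few lines. This is a sketch in plain Python; the weight values and the data loss of 0.30 are hypothetical numbers used only to show how the penalty is added to the loss.

```python
def l1_penalty(weights, factor):
    """factor * sum of absolute weights."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """factor * sum of squared weights."""
    return factor * sum(w * w for w in weights)


def l1_l2_penalty(weights, l1_factor, l2_factor):
    """Sum of the L1 and L2 penalties."""
    return l1_penalty(weights, l1_factor) + l2_penalty(weights, l2_factor)


weights = [0.5, -1.0, 2.0]   # hypothetical layer weights
data_loss = 0.30             # hypothetical loss on the training batch

# The regularized loss is the data loss plus the penalty term
total_loss = data_loss + l2_penalty(weights, 0.001)
print(total_loss)
```

Because the penalty grows with the size of the weights, minimizing the total loss pushes the optimizer toward smaller weights.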

In [None]:
from tensorflow.keras.regularizers import l2
from tensorflow.keras import models, layers
from tensorflow.keras.layers import Input

# L2 regularization with 0.001 regularization factor
model = models.Sequential()
model.add(Input(shape=(32, 32, 3)))
model.add(layers.Conv2D(32, (3, 3), padding="same", kernel_regularizer=l2(0.001)))

# Batch normalization layer
model.add(layers.BatchNormalization())

# Activation function
model.add(layers.Activation("relu"))

# L2 regularization with 0.001 regularization factor
# model.add(Input(shape=(32, 32, 3))) # already configured to use input shape (None, 32, 32, 3)
model.add(layers.Conv2D(32, (3, 3), padding="same", kernel_regularizer=l2(0.001)))

# Batch normalization layer
model.add(layers.BatchNormalization())

# Activation function
model.add(layers.Activation("relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.2))

### Transfer learning
Transfer learning is a machine learning technique where a model trained on one task (typically with a large dataset) is reused or "fine-tuned" to solve a different, but related, task. The key idea is that knowledge gained from solving one problem can be transferred to help solve a different problem more efficiently, particularly when the new task has limited data. In the context of Convolutional Neural Networks (CNNs), transfer learning leverages pre-trained models (typically trained on large, general-purpose datasets like ImageNet) and adapts them for specific tasks. Some popular pre-trained models for computer vision tasks are VGG16 (or VGG19), ResNet, Inception and MobileNet.<br>
A CNN architecture can be divided into two main parts:
1. Feature extraction layers - Typically consist of convolutional and pooling layers.
2. Classification layers - Dense (fully connected) layers.

Considering that, pre-trained models can be used in several different ways:
1. One-shot classification - A pre-trained model is loaded and used for prediction on the custom task without additional training.
2. Transfer learning - Feature extraction layers from the pre-trained model are loaded and frozen so their weights cannot be modified during the training process, while newly created dense layers are added on top of them and trained for the custom task on a new dataset.
3. Fine-tuning - Feature extraction layers from the pre-trained model are loaded and trained along with the classification layers, using a small learning rate, for the new custom task.


### Neural network VGG16
VGG16 is a deep neural network with 16 weight layers (13 convolutional and 3 fully connected) and a straightforward architecture. It exclusively uses 3x3 convolutional filters with a stride of 1 and 2x2 max-pooling layers with a stride of 2, arranged in 5 convolutional blocks (each with a varying number of Conv2D layers with ReLU activation, followed by a MaxPooling layer). It was originally trained on images of size 224x224x3.
https://keras.io/api/applications/vgg/#vgg16-function
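
The effect of the five pooling layers on the spatial resolution can be worked out with a short sketch. It assumes "same" padding in the convolutions, so only the pooling layers change the height and width; the helper name is made up for illustration.

```python
def vgg16_feature_map_size(input_size, num_blocks=5):
    """Spatial size of the VGG16 feature map after all convolutional blocks."""
    size = input_size
    for _ in range(num_blocks):
        # 3x3 convolutions with stride 1 and "same" padding keep the size;
        # each 2x2 max-pooling layer with stride 2 halves it
        size //= 2
    return size


print(vgg16_feature_map_size(224))  # original ImageNet resolution -> 7
print(vgg16_feature_map_size(32))   # CIFAR-10 resolution -> 1
```

With 32x32 inputs the convolutional base therefore ends in a 1x1 feature map, so the Flatten layer added on top of it receives one value per filter of the last block (512 values).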

In [None]:
from keras.applications import VGG16
from tensorflow.keras import models, layers
from tensorflow.keras.layers import Input

# Load the VGG16 model which is pre-trained on ImageNet data and exclude top layer
# When loading the VGG16 model, the input_shape parameter should be (32, 32, 3) to match the shape of
# the CIFAR-10 images, since the original VGG16 model was trained on images with different resolution (224x224x3)
vgg_base_model = VGG16(weights="imagenet", include_top=False, input_shape=(32, 32, 3))

# Freeze the VGG16 model layers to prevent training them
vgg_base_model.trainable = False

# On top of the VGG16 feature extraction model, we can add fully connected layers for classification purpose
model = models.Sequential()

# Add the base VGG16 model
model.add(vgg_base_model)

# Add fully connected layers on top of VGG16 base model
model.add(layers.Flatten())
model.add(layers.Dense(256))
model.add(layers.BatchNormalization())
model.add(layers.Activation("relu"))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation="softmax"))

# Finally, the code for loading the VGG16 model and unfreezing only some of its layers
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(32, 32, 3))

# Mark the base model as trainable as a whole, then freeze the early layers one by one
base_model.trainable = True
set_trainable = False

# Unfreeze layers from block5_conv1 onwards
for layer in base_model.layers:
  if layer.name == "block5_conv1":
    set_trainable = True
  layer.trainable = set_trainable




### Assignment 1
1. In the final model from the last lab exercise (Figure 4), insert a batch normalization layer between every convolutional layer and its activation function. Also, add batch normalization between the first dense layer and its activation function.

In [None]:
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, Input, MaxPooling2D)
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical

(trainX, trainy), (testX, testy) = cifar10.load_data() # load data

# One-hot encoding for labels
trainy = to_categorical(trainy, 10)
testy = to_categorical(testy, 10)

# Convert images to float32 and scale them
trainX = trainX.astype('float32') / 255.0
testX = testX.astype('float32') / 255.0

model = Sequential()

# First Conv2D Block
model.add(Input(shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(BatchNormalization())  # Batch Normalization
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(BatchNormalization())  # Batch Normalization
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(MaxPooling2D((3, 3)))
model.add(Dropout(0.2))  # Dropout after first MaxPooling2D

# Second Conv2D Block
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(BatchNormalization())  # Batch Normalization
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(BatchNormalization())  # Batch Normalization
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(MaxPooling2D((3, 3)))
model.add(Dropout(0.2))  # Dropout after second MaxPooling2D

# Third Conv2D Block (newly added)
model.add(Conv2D(128, (3, 3), padding='same'))
model.add(BatchNormalization())  # Batch Normalization
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(Conv2D(128, (3, 3), padding='same'))
model.add(BatchNormalization())  # Batch Normalization
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(MaxPooling2D((3, 3)))
model.add(Dropout(0.2))  # Dropout after third MaxPooling2D

# Flatten Layer
model.add(Flatten())

# Dense Layer with Batch Normalization
model.add(Dense(128))
model.add(BatchNormalization())  # Batch Normalization
model.add(Activation('relu'))    # Activation after Batch Normalization

# Dropout after Dense layer
model.add(Dropout(0.2))  # Dropout after first Dense layer

# Output Layer
model.add(Dense(10, activation='softmax'))

# Compile the model
opt = SGD(learning_rate=0.001, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(trainX, trainy, epochs=5, batch_size=64, validation_data=(testX, testy), verbose=1)

# Print accuracy vs loss
for epoch in range(5):
    print(f"Epoch {epoch + 1}: Accuracy = {history.history['accuracy'][epoch]}, Loss = {history.history['loss'][epoch]}")

# Print final loss and accuracy
print(f"\nFinal Loss: {history.history['loss'][-1]}, Final Accuracy: {history.history['accuracy'][-1]}")


### Plot the accuracy vs loss
Plot the behavior of accuracy and loss over the epochs.
1. What can you conclude from the plots?

In [None]:
# Plot accuracy and loss
plt.figure(figsize=(8, 6))
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['loss'], label='Training Loss')
plt.title('Training Accuracy and Loss Over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Value')
plt.legend()
plt.show()


### Reduce the batch normalization momentum
Instability in validation accuracy is a common issue that can arise from interactions between batch normalization and the rest of the training setup (e.g., optimizer, learning rate). The easiest fix to try is to reduce the batch normalization momentum (default 0.99 in Keras), while keeping the base architecture the same.
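
The momentum controls how quickly the stored moving statistics follow the statistics of the current batches. The sketch below applies the moving-average update (moving = momentum * moving + (1 - momentum) * batch value) for two momentum values; the constant batch mean of 1.0 and the 20 steps are assumptions chosen to make the difference easy to see.

```python
def track_moving_mean(batch_means, momentum, initial=0.0):
    """Return the moving mean after each batch for a given momentum."""
    moving = initial
    history = []
    for batch_mean in batch_means:
        moving = momentum * moving + (1 - momentum) * batch_mean
        history.append(moving)
    return history


batch_means = [1.0] * 20  # hypothetical: every batch has mean 1.0

slow = track_moving_mean(batch_means, momentum=0.99)
fast = track_moving_mean(batch_means, momentum=0.8)

print(slow[-1])  # still far from 1.0 after 20 batches
print(fast[-1])  # already close to 1.0
```

A lower momentum lets the statistics used at inference time catch up with the training statistics sooner, which is one plausible reason why validation metrics can become steadier.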

In [None]:
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, Input, MaxPooling2D)
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical

(trainX, trainy), (testX, testy) = cifar10.load_data() # load data

# One-hot encoding for labels
trainy = to_categorical(trainy, 10)
testy = to_categorical(testy, 10)

# Convert images to float32 and scale them
trainX = trainX.astype('float32') / 255.0
testX = testX.astype('float32') / 255.0

model = Sequential()

# First Conv2D Block
model.add(Input(shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), padding='same'))
# model.add(BatchNormalization())  # Batch Normalization
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(Conv2D(32, (3, 3), padding='same'))
# model.add(BatchNormalization())  # Batch Normalization
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(MaxPooling2D((3, 3)))
model.add(Dropout(0.2))  # Dropout after first MaxPooling2D

# Second Conv2D Block
model.add(Conv2D(64, (3, 3), padding='same'))
# model.add(BatchNormalization())  # Batch Normalization
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(Conv2D(64, (3, 3), padding='same'))
# model.add(BatchNormalization())  # Batch Normalization
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(MaxPooling2D((3, 3)))
model.add(Dropout(0.2))  # Dropout after second MaxPooling2D

# Third Conv2D Block (newly added)
model.add(Conv2D(128, (3, 3), padding='same'))
# model.add(BatchNormalization())  # Batch Normalization
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(Conv2D(128, (3, 3), padding='same'))
# model.add(BatchNormalization())  # Batch Normalization
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(MaxPooling2D((3, 3)))
model.add(Dropout(0.2))  # Dropout after third MaxPooling2D

# Flatten Layer
model.add(Flatten())

# Dense Layer with Batch Normalization
model.add(Dense(128))
#model.add(BatchNormalization())  # Batch Normalization
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization

# Dropout after Dense layer
model.add(Dropout(0.2))  # Dropout after first Dense layer

# Output Layer
model.add(Dense(10, activation='softmax'))

# Compile the model
opt = SGD(learning_rate=0.001, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(trainX, trainy, epochs=5, batch_size=64, validation_data=(testX, testy), verbose=1)

# Print accuracy vs loss
for epoch in range(5):
    print(f"Epoch {epoch + 1}: Accuracy = {history.history['accuracy'][epoch]}, Loss = {history.history['loss'][epoch]}")

# Print final loss and accuracy
print(f"\nFinal Loss: {history.history['loss'][-1]}, Final Accuracy: {history.history['accuracy'][-1]}")

# Plot accuracy and loss
plt.figure(figsize=(8, 6))
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['loss'], label='Training Loss')
plt.title('Training Accuracy and Loss Over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Value')
plt.legend()
plt.show()


### Adding weight decay
In addition to the previously added batch normalization (with momentum 0.8), add weight decay. More precisely, add the L2 regularizer with regularization factor of 0.001 to every Conv2D and first Dense layer.

In [None]:
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, Input, MaxPooling2D)
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.regularizers import l2  # needed for kernel_regularizer=l2(...)
from tensorflow.keras.utils import to_categorical

(trainX, trainy), (testX, testy) = cifar10.load_data() # load data

# One-hot encoding for labels
trainy = to_categorical(trainy, 10)
testy = to_categorical(testy, 10)

# Convert images to float32 and scale them
trainX = trainX.astype('float32') / 255.0
testX = testX.astype('float32') / 255.0

model = Sequential()

# First Conv2D Block
model.add(Input(shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), padding='same', kernel_regularizer=l2(0.001)))
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(Conv2D(32, (3, 3), padding='same', kernel_regularizer=l2(0.001)))
# model.add(BatchNormalization())  # Batch Normalization
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(MaxPooling2D((3, 3)))
model.add(Dropout(0.2))  # Dropout after first MaxPooling2D

# Second Conv2D Block
model.add(Conv2D(64, (3, 3), padding='same', kernel_regularizer=l2(0.001)))
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(Conv2D(64, (3, 3), padding='same', kernel_regularizer=l2(0.001)))
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(MaxPooling2D((3, 3)))
model.add(Dropout(0.2))  # Dropout after second MaxPooling2D

# Third Conv2D Block (newly added)
model.add(Conv2D(128, (3, 3), padding='same', kernel_regularizer=l2(0.001)))
# model.add(BatchNormalization())  # Batch Normalization
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(Conv2D(128, (3, 3), padding='same', kernel_regularizer=l2(0.001)))
# model.add(BatchNormalization())  # Batch Normalization
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization
model.add(MaxPooling2D((3, 3)))
model.add(Dropout(0.2))  # Dropout after third MaxPooling2D

# Flatten Layer
model.add(Flatten())

# Dense Layer with Batch Normalization
model.add(Dense(128, kernel_regularizer=l2(0.001)))
model.add(BatchNormalization(momentum=0.8)) # Batch Normalization Reduced
model.add(Activation('relu'))    # Activation after Batch Normalization

# Dropout after Dense layer
model.add(Dropout(0.2))  # Dropout after first Dense layer

# Output Layer
model.add(Dense(10, activation='softmax'))

# Compile the model
opt = SGD(learning_rate=0.001, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(trainX, trainy, epochs=5, batch_size=64, validation_data=(testX, testy), verbose=1)

# Print accuracy vs loss
for epoch in range(5):
    print(f"Epoch {epoch + 1}: Accuracy = {history.history['accuracy'][epoch]}, Loss = {history.history['loss'][epoch]}")

# Print final loss and accuracy
print(f"\nFinal Loss: {history.history['loss'][-1]}, Final Accuracy: {history.history['accuracy'][-1]}")

# Plot accuracy and loss
plt.figure(figsize=(8, 6))
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['loss'], label='Training Loss')
plt.title('Training Accuracy and Loss Over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Value')
plt.legend()
plt.show()


### Model comparison
Compare the accuracy obtained by using these three models. The expected accuracy per model type is:

| Model | Expected accuracy |
|---|---|
| Baseline model | 73 - 76% |
| Baseline model + BN | 80 - 82% |
| Baseline model + BN + L2 | 83 - 85% |

1. Explain the increase in accuracy after every modification.

### Testing batch normalization, momentum and l2
Try different values for the batch normalization momentum and the L2 regularization factor. Record the behavior of these models. Did you achieve better performance, and if so, with which values of these parameters?
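
One way to organize such an experiment is a small grid search that records the result of every combination. The sketch below is only a scaffold: the candidate values are arbitrary assumptions, and `train_and_evaluate` is a hypothetical placeholder that would have to build, train and evaluate the model from the previous section.

```python
from itertools import product


def train_and_evaluate(bn_momentum, l2_factor):
    """Hypothetical placeholder: build the model with these settings,
    train it and return the test accuracy."""
    raise NotImplementedError("plug in the model-building and training code")


def run_grid(grid, evaluate):
    """Evaluate every (momentum, l2) pair and return results sorted by accuracy."""
    results = []
    for bn_momentum, l2_factor in grid:
        results.append({
            "bn_momentum": bn_momentum,
            "l2_factor": l2_factor,
            "accuracy": evaluate(bn_momentum, l2_factor),
        })
    return sorted(results, key=lambda r: r["accuracy"], reverse=True)


momentum_values = [0.7, 0.8, 0.9, 0.99]   # assumed candidate values
l2_values = [0.0001, 0.001, 0.01]         # assumed candidate values
grid = list(product(momentum_values, l2_values))
print(len(grid))  # number of models that would be trained

# ranked = run_grid(grid, train_and_evaluate)  # run once the placeholder is implemented
```

Each combination means a full training run, so it is worth keeping the grid small.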

### Assignment 2
1) Using the provided code samples in the VGG16 section, load the VGG16 model by excluding the top layer. Freeze all VGG16 model layers to prevent training them. Build the transfer learning model by adding the base VGG16 model and the same fully connected layers as shown in the code snippet above.

In [None]:
from keras.applications import VGG16
from tensorflow.keras import models, layers
from tensorflow.keras.layers import Input

# Load the VGG16 model pre-trained on ImageNet data, excluding the top layer.
# Set input shape to (32, 32, 3) for CIFAR-10 compatibility.
vgg_base_model = VGG16(weights="imagenet", include_top=False, input_shape=(32, 32, 3))

# Freeze all layers of the VGG16 model to prevent them from being trained initially.
vgg_base_model.trainable = False

# Build the transfer learning model.
model = models.Sequential()

# Add the base VGG16 model.
model.add(vgg_base_model)

# Add fully connected layers on top for classification.
model.add(layers.Flatten())
model.add(layers.Dense(256))
model.add(layers.BatchNormalization())
model.add(layers.Activation("relu"))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation="softmax"))  # 10 classes for CIFAR-10.

# Print the model summary to verify the architecture.
model.summary()

# OPTIONAL: Unfreeze specific layers from block5_conv1 onward for fine-tuning.
set_trainable = False
for layer in vgg_base_model.layers:
    if layer.name == "block5_conv1":
        set_trainable = True
    layer.trainable = set_trainable

# Print a summary to verify the trainable status of each layer.
for layer in vgg_base_model.layers:
    print(f"Layer: {layer.name}, Trainable: {layer.trainable}")


2) Compile this model using a smaller learning rate (e.g., lr = 0.0001). Train and evaluate the model on the CIFAR-10 dataset. What accuracy is achieved? Is this result satisfactory?

In [16]:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values to the range [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Convert class labels to one-hot encoding
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.0001), loss="categorical_crossentropy", metrics=["accuracy"])

# Train the model
history = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2, verbose=1)

# Evaluate the model
test_loss_i, test_accuracy_i = model.evaluate(x_test, y_test, verbose=2)
print(f"Iteration I Test Accuracy: {test_accuracy_i:.2f}")

Epoch 1/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 16s 17ms/step - accuracy: 0.5162 - loss: 1.4481 - val_accuracy: 0.7161 - val_loss: 0.8138
Epoch 2/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.7517 - loss: 0.7339 - val_accuracy: 0.7982 - val_loss: 0.5834
Epoch 3/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 9s 14ms/step - accuracy: 0.8025 - loss: 0.5742 - val_accuracy: 0.8112 - val_loss: 0.5515
Epoch 4/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 15ms/step - accuracy: 0.8462 - loss: 0.4492 - val_accuracy: 0.8131 - val_loss: 0.5566
Epoch 5/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 15ms/step - accuracy: 0.8803 - loss: 0.3578 - val_accuracy: 0.8215 - val_loss: 0.5296
Epoch 6/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 9s 14ms/step - accuracy: 0.9072 - loss: 0.2821 - val_accuracy: 0.8202 - val_loss: 0.5341
Epoch 7/10
625

3) Perform the model fine-tuning by unfreezing the last three layers of the base VGG16 model. Evaluate this model and explain the differences in results obtained using transfer learning with and without fine-tuning.<br>
Answer<br>
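The fine-tuned model achieved a slightly higher test accuracy than the model from step 2 (0.8606 vs. 0.8415 in the results table at the end of this assignment), indicating improved performance after fine-tuning. However, during fine-tuning the training accuracy rose above 0.99 while the validation loss increased from epoch to epoch (from 0.46 to 0.53 over the logged epochs), which suggests that the model is starting to overfit.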

In [17]:
# Unfreeze the last 3 layers of the VGG16 model (block5_conv2, block5_conv3, block5_pool)
vgg_base_model.trainable = True
for layer in vgg_base_model.layers[:-3]:
    layer.trainable = False

# Re-compile the model so that the changed trainable flags take effect
model.compile(optimizer=Adam(learning_rate=0.0001), loss="categorical_crossentropy", metrics=["accuracy"])

# Continue training with the last 3 layers unfrozen
history_fine_tune = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2, verbose=1)

# Evaluate the fine-tuned model (Iteration II)
test_loss_ii, test_accuracy_ii = model.evaluate(x_test, y_test, verbose=2)
print(f"Iteration II Test Accuracy: {test_accuracy_ii:.2f}, Test Loss: {test_loss_ii:.2f}")


Epoch 1/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 0.9860 - loss: 0.0592 - val_accuracy: 0.8628 - val_loss: 0.4617
Epoch 2/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.9923 - loss: 0.0390 - val_accuracy: 0.8642 - val_loss: 0.4745
Epoch 3/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.9951 - loss: 0.0289 - val_accuracy: 0.8623 - val_loss: 0.4991
Epoch 4/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - accuracy: 0.9958 - loss: 0.0252 - val_accuracy: 0.8626 - val_loss: 0.5054
Epoch 5/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 8ms/step - accuracy: 0.9966 - loss: 0.0201 - val_accuracy: 0.8643 - val_loss: 0.5239
Epoch 6/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.9968 - loss: 0.0185 - val_accuracy: 0.8630 - val_loss: 0.5348
Epoch 7/10
625/625

4) Load the VGG16 model again, but this time only up to the third convolutional block. Unfreeze the layers of the third convolutional block. Build the new transfer learning model by adding the block3 model and the same fully connected layers as shown in the code snippet above.

In [18]:
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras import models

# Load the VGG16 model and take only up to block3
vgg_base_model = VGG16(weights="imagenet", include_top=False, input_shape=(32, 32, 3))
block3_output = vgg_base_model.get_layer("block3_pool").output
block3_model = Model(inputs=vgg_base_model.input, outputs=block3_output)

# Freeze all layers up to block3, unfreeze block3 for fine-tuning
for layer in block3_model.layers:
    if "block3" in layer.name:
        layer.trainable = True
    else:
        layer.trainable = False

# Build the model
model = models.Sequential()
model.add(block3_model)

# Add fully connected layers on top of the block3 model
model.add(layers.Flatten())
model.add(layers.Dense(256))
model.add(layers.BatchNormalization())
model.add(layers.Activation("relu"))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation="softmax"))  # 10 classes for CIFAR-10

# Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Train the model with block 3 unfrozen (Iteration III)
history_iii = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2, verbose=1)

# Evaluate the model after fine-tuning block 3 (Iteration III)
test_loss_iii, test_accuracy_iii = model.evaluate(x_test, y_test, verbose=2)
print(f"Iteration III Test Accuracy: {test_accuracy_iii:.2f}, Test Loss: {test_loss_iii:.2f}")

Epoch 1/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 12s 13ms/step - accuracy: 0.5634 - loss: 1.2923 - val_accuracy: 0.5587 - val_loss: 1.3348
Epoch 2/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 7s 11ms/step - accuracy: 0.7302 - loss: 0.7818 - val_accuracy: 0.7164 - val_loss: 0.7995
Epoch 3/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 11ms/step - accuracy: 0.7745 - loss: 0.6485 - val_accuracy: 0.7547 - val_loss: 0.7366
Epoch 4/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 8s 12ms/step - accuracy: 0.8153 - loss: 0.5423 - val_accuracy: 0.7680 - val_loss: 0.7027
Epoch 5/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.8453 - loss: 0.4509 - val_accuracy: 0.8022 - val_loss: 0.6029
Epoch 6/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 7s 11ms/step - accuracy: 0.8719 - loss: 0.3709 - val_accuracy: 0.8069 - val_loss: 0.6171
Epoch 7/10
625/

5) Modify the previous model by unfreezing the layers of the second block along with the layers of the third block.

In [19]:
# Freeze all layers up to block2, unfreeze block2 and block3 for fine-tuning
for layer in block3_model.layers:
  if "block2" in layer.name or "block3" in layer.name:
    layer.trainable = True
  else:
    layer.trainable = False

# Re-compile the model after unfreezing layers
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Train the fine-tuned model again (Iteration IV)
history_iv = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2, verbose=1)

# Evaluate the fine-tuned model (Iteration IV)
test_loss_iv, test_accuracy_iv = model.evaluate(x_test, y_test, verbose=2)
print(f"Iteration IV Test Accuracy: {test_accuracy_iv:.2f}, Test Loss: {test_loss_iv:.2f}")

Epoch 1/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 14s 16ms/step - accuracy: 0.8924 - loss: 0.3213 - val_accuracy: 0.7693 - val_loss: 0.7935
Epoch 2/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 15ms/step - accuracy: 0.9403 - loss: 0.1755 - val_accuracy: 0.8046 - val_loss: 0.7072
Epoch 3/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 9s 15ms/step - accuracy: 0.9519 - loss: 0.1429 - val_accuracy: 0.7816 - val_loss: 0.9111
Epoch 4/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 11s 16ms/step - accuracy: 0.9542 - loss: 0.1355 - val_accuracy: 0.7778 - val_loss: 0.9401
Epoch 5/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 20s 16ms/step - accuracy: 0.9629 - loss: 0.1067 - val_accuracy: 0.7870 - val_loss: 0.9753
Epoch 6/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 16ms/step - accuracy: 0.9620 - loss: 0.1148 - val_accuracy: 0.8114 - val_loss: 0.8353
Epoch 7/10
62

6) Finally

In [21]:
import pandas as pd

# Assuming you have the accuracy values from each model iteration
results = [
    {"iteration": "I", "loaded_vgg16_blocks": "All", "fine_tuning_performed": "No", "unfreezed_layers": "-", "accuracy": test_accuracy_i},
    {"iteration": "II", "loaded_vgg16_blocks": "All", "fine_tuning_performed": "Yes", "unfreezed_layers": "Last 3", "accuracy": test_accuracy_ii},
    {"iteration": "III", "loaded_vgg16_blocks": "First 3", "fine_tuning_performed": "Yes", "unfreezed_layers": "Block 3 layers", "accuracy": test_accuracy_iii},
    {"iteration": "IV", "loaded_vgg16_blocks": "First 3", "fine_tuning_performed": "Yes", "unfreezed_layers": "Block 2 and block 3 layers", "accuracy": test_accuracy_iv}
]

# Convert the results to a DataFrame
df = pd.DataFrame(results)

# Print the table
print(df)


  iteration loaded_vgg16_blocks fine_tuning_performed            unfreezed_layers  accuracy
0         I                 All                    No                           -    0.8415
1        II                 All                   Yes                      Last 3    0.8606
2       III             First 3                   Yes              Block 3 layers    0.8135
3        IV             First 3                   Yes  Block 2 and block 3 layers    0.7871
