---
# Model T - Transfer Learning, Feature Extraction, Data Augmentation, Adaptive Moment Estimation (Adam)
- **128 x 128 x 3** Image size.  
- **64** Batch size.
- Build the **full model** with **VGG16** convolutional base and a **Classifier** and **Train it**.
    - **Data Augmentation Pipeline**
        - Random **Horizontal** Flip
        - Random Rotation **5%**
        - Random Zoom **5%**
        - Random Contrast **5%**
        - Random Brightness **5%**
    - **Feature Extraction**.
        - **VGG16** Convolutional Base **(Frozen)**.
        - **4 x 4 x 512** Feature Maps.
    - **Classifier**: 
        - Adaptive Moment Estimation **(Adam)** optimizer.
        - **0.001** Initial Learning rate.
        - **Sparse Categorical Cross-Entropy** loss function.
        - **Reduce Learning Rate on Plateau** callback with a **0.1** factor and **3** patience.
        - **Early Stopping** callback with **6** patience.
        - **Model Checkpoint** callback to save the best model based on validation loss.
        - **4 x 4 x 512** Tensor before the **Flatten** layer.
        - **512** Dense layer with **ReLU** activation.
        - **10** Dense output layer with **Softmax** activation.
        - **Dropout** layers with **0.5** rate after the Flatten and Dense layers.
        - **L2** regularization with **0.0001** rate on the Dense layers.
    - **4 199 946** Trainable Parameters.
    - **30 Epochs**.  
- **Model Evaluation**.

---
#### Imports and Setup

In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
print(f'TensorFlow version: {tf.__version__}')
tf.get_logger().setLevel('ERROR')
tf.autograph.set_verbosity(3)
import matplotlib.pyplot as plt
import pickle
import numpy as np
from tensorflow.keras.utils import image_dataset_from_directory
from tensorflow import keras
from tensorflow.keras import callbacks, layers, optimizers
from keras import regularizers, Model
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay ,accuracy_score, precision_score, recall_score, f1_score, roc_curve, auc
from sklearn.preprocessing import label_binarize
from itertools import cycle

TensorFlow version: 2.15.0


---
#### Group Datasets

In [2]:
IMG_SIZE = 128

train_dirs = [f'../data/train1_resized_{IMG_SIZE}', f'../data/train3_resized_{IMG_SIZE}', f'../data/train4_resized_{IMG_SIZE}', f'../data/train5_resized_{IMG_SIZE}']
validation_dir = f'../data/train2_resized_{IMG_SIZE}'
test_dir = f'../data/test_resized_{IMG_SIZE}'

---
#### Create Datasets

In [3]:
BATCH_SIZE = 64
NUM_CLASSES = 10

train_datasets = [image_dataset_from_directory(directory, image_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE) for directory in train_dirs]

train_dataset = train_datasets[0]
for dataset in train_datasets[1:]:
    train_dataset = train_dataset.concatenate(dataset)

train_dataset = train_dataset.shuffle(buffer_size=1000).prefetch(buffer_size=tf.data.AUTOTUNE)
validation_dataset = image_dataset_from_directory(validation_dir, image_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE).prefetch(buffer_size=tf.data.AUTOTUNE)
test_dataset = image_dataset_from_directory(test_dir, image_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE).prefetch(buffer_size=tf.data.AUTOTUNE)

class_names = train_datasets[0].class_names

for data_batch, labels_batch in train_dataset.take(1):
    print('data batch shape:', data_batch.shape)
    print('labels batch shape:', labels_batch.shape)

Found 10000 files belonging to 10 classes.
Found 10000 files belonging to 10 classes.
Found 10000 files belonging to 10 classes.
Found 10000 files belonging to 10 classes.
Found 10000 files belonging to 10 classes.
Found 10000 files belonging to 10 classes.
data batch shape: (64, 128, 128, 3)
labels batch shape: (64,)


- We define the image size of 128 x 128 x 3, batch size of 64 and create an array with the label's names.  
- We create the train dataset by concatenating them, we **shuffle** the samples before each epoch and **prefetch** them to memory.  
- We do the same for the validation and test dataset except **shuffling** which is **unwanted** for these datasets.

---
#### Data Augmentation Pipeline

In [4]:
data_augmentation = keras.Sequential(
    [
        # keras.layers.RandomCrop(height=16, width=16), # This layer is commented out because it is not compatible with the current model architecture.
        keras.layers.RandomFlip("horizontal"),
        # keras.layers.RandomTranslation(0.1, 0.1), # This layer is commented out because it didn't improve the model performance.
        keras.layers.RandomRotation(0.05),
        keras.layers.RandomZoom(0.05),
        keras.layers.RandomContrast(0.05),
        keras.layers.RandomBrightness(0.05),
    ]
)

---
#### Loading the VGG16 Model

In [5]:
from tensorflow.keras.applications.vgg16 import VGG16
conv_base = VGG16(weights='imagenet', include_top=False)
conv_base.trainable = False
conv_base.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, None, None, 3)]   0         
                                                                 
 block1_conv1 (Conv2D)       (None, None, None, 64)    1792      
                                                                 
 block1_conv2 (Conv2D)       (None, None, None, 64)    36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, None, None, 64)    0         
                                                                 
 block2_conv1 (Conv2D)       (None, None, None, 128)   73856     
                                                                 
 block2_conv2 (Conv2D)       (None, None, None, 128)  

---
#### Model Arquitecture

In [6]:
inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
x = data_augmentation(inputs)
x = keras.applications.vgg16.preprocess_input(x)
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(512, activation="relu", kernel_regularizer=regularizers.L2(1e-4))(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax", kernel_regularizer=regularizers.L2(1e-4))(x)
model = keras.Model(inputs, outputs)
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 128, 128, 3)]     0         
                                                                 
 sequential (Sequential)     (None, 128, 128, 3)       0         
                                                                 
 tf.__operators__.getitem (  (None, 128, 128, 3)       0         
 SlicingOpLambda)                                                
                                                                 
 tf.nn.bias_add (TFOpLambda  (None, 128, 128, 3)       0         
 )                                                               
                                                                 
 vgg16 (Functional)          (None, None, None, 512)   14714688  
                                                                 
 flatten (Flatten)           (None, 8192)              0     

---
#### Model Compilation

In [7]:
initial_learning_rate = 0.001
optimizer = optimizers.Adam(learning_rate=initial_learning_rate)
loss_function = keras.losses.SparseCategoricalCrossentropy()

lr_scheduler = callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
early_stopping = callbacks.EarlyStopping(monitor='val_loss', patience=6, restore_best_weights=True, verbose=1)
save_best_model = callbacks.ModelCheckpoint(filepath='../models/06_model_t_tl_data_augm_adam.h5', save_best_only=True, monitor='val_loss', verbose=1)

callbacks = [lr_scheduler, early_stopping, save_best_model]

model.compile(
    loss=loss_function,
    optimizer=optimizer,
    metrics=["accuracy"])

---
#### Model Training

In [8]:
history = model.fit(
    train_dataset,
    epochs=50,
    validation_data=validation_dataset,
    callbacks=callbacks)

Epoch 1/50
Epoch 1: val_loss improved from inf to 0.68523, saving model to ../models/06_model_t_tl_feat_ext_data_augm_adam.h5
Epoch 2/50


  saving_api.save_model(


Epoch 2: val_loss improved from 0.68523 to 0.61996, saving model to ../models/06_model_t_tl_feat_ext_data_augm_adam.h5
Epoch 3/50
Epoch 3: val_loss improved from 0.61996 to 0.60610, saving model to ../models/06_model_t_tl_feat_ext_data_augm_adam.h5
Epoch 4/50
Epoch 4: val_loss did not improve from 0.60610
Epoch 5/50
Epoch 5: val_loss did not improve from 0.60610
Epoch 6/50
Epoch 6: ReduceLROnPlateau reducing learning rate to 0.00010000000474974513.

Epoch 6: val_loss did not improve from 0.60610
Epoch 7/50
Epoch 7: val_loss did not improve from 0.60610
Epoch 8/50
Epoch 8: val_loss did not improve from 0.60610
Epoch 9/50
Epoch 9: val_loss improved from 0.60610 to 0.59534, saving model to ../models/06_model_t_tl_feat_ext_data_augm_adam.h5
Epoch 10/50

KeyboardInterrupt: 

---
#### Save Model History

In [None]:
with open("../history/06_model_t_tl_data_augm_adam.pkl", "wb") as file:
    pickle.dump(history.history, file)

---
#### Model Evaluation

In [None]:
val_loss, val_acc = model.evaluate(validation_dataset)
print(f'Classifier Validation Loss: {val_loss:.2f}')
print(f'Classifier Validation Accuracy: {val_acc:.2%}')

---
#### Model Visualization

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.tight_layout()
plt.show()

- Analyzing the training and validation, accuracy and loss over the epochs:
    - We see that the model begins overfitting slightly after the **24th** epoch.
    - The validation accuracy stops improving significantly after the **26th** epoch while the training accuracy keeps improving.
    - The validation loss stops improving significantly after the **24th** epoch while the training loss keeps improving.
    - The best model, based on validation loss, is saved on the **29th** epoch.   

---
#### Model Testing

In [None]:
test_labels = []
test_predictions = []
test_probabilities = []

for images, labels in test_dataset:
    test_labels.extend(labels.numpy())
    predictions = model.predict(images)
    test_predictions.extend(np.argmax(predictions, axis=-1))
    test_probabilities.extend(predictions)

test_labels = np.array(test_labels)
test_predictions = np.array(test_predictions)
test_probabilities = np.array(test_probabilities)

---
#### Confusion Matrix

In [None]:
cm = confusion_matrix(test_labels, test_predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)
disp.plot(cmap=plt.cm.Blues, xticks_rotation=90)
plt.show()

- Looking at the confusion matrix, we see that:  
    - The model still has a hard time distinguishing between the categories 003_cat and 005_dog but with less error.  
    - The model has a below average performance on the categories 003_cat, 005_dog and 002_bird, in which we see a very high false positive rate.
    - The model also has a hard time distinguishing between some other categories but the error is not as significant.  
    - The model has an above average performance on the categories 001_automobile, 006_frog, 008_ship and 009_truck.
    - The model has shown a performance increase with higher accuracy across all categories.

---
#### ROC Curve Analysis

In [None]:
test_labels_bin = label_binarize(test_labels, classes=range(NUM_CLASSES))

false_positive_rate = dict()
true_positive_rate = dict()
roc_auc = dict()

for i in range(NUM_CLASSES):
    false_positive_rate[i], true_positive_rate[i], _ = roc_curve(test_labels_bin[:, i], test_probabilities[:, i])
    roc_auc[i] = auc(false_positive_rate[i], true_positive_rate[i])

plt.figure(figsize=(10, 8))
colors = cycle(['aqua', 'darkorange', 'cornflowerblue', 'blue', 'green', 'red', 'purple', 'brown', 'pink', 'grey'])
for i, color in zip(range(NUM_CLASSES), colors):
    plt.plot(false_positive_rate[i], true_positive_rate[i], color=color, lw=2, label=f'Class {class_names[i]} (AUC = {roc_auc[i]:.2f})')

plt.plot([0, 1], [0, 1], 'k--', lw=2)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.show()

- Looking at the ROC curve:
    - We see that the model has a good performance on the ROC curve for most categories.  
    - The categories 003_cat, 002_bird and 005_dog have the worst AUC (Area Under Curve) performance.
    - The other categories have the same performance but with higher AUC.
    - The category 001_automobile, 008_ship and 009_truck has the best AUC performance.
    - The overall AUC performance increases as the false positive rate decreases and the true positive rate increases.
    - **A perfect AUC of 1.0 would mean that the model classifies all images either true positives or true negatives**.

---
#### Performance Metrics
- **Accuracy** is the proportion of correctly predicted instances out of the total instances.  
- **Precision** is the ratio of true positive predictions to the total predicted positives. Macro precision calculates this for each class independently and then averages them.  
- **Weighted precision** calculates the precision for each class, then averages them, weighted by the number of true instances for each class.  
- **Recall** is the ratio of true positive predictions to the total actual positives. Macro recall calculates this for each class independently and then averages them.  
- **Weighted recall** calculates the recall for each class, then averages them, weighted by the number of true instances for each class.  
- The **F1-score** is the harmonic mean of precision and recall. Macro F1-score calculates this for each class independently and then averages them.  
- **Weighted F1-score** calculates the F1-score for each class, then averages them, weighted by the number of true instances for each class.  

In [None]:
acc = accuracy_score(y_true =  test_labels, y_pred = test_predictions)
print(f'Accuracy : {np.round(acc*100,2)}%')
precision = precision_score(y_true =  test_labels, y_pred = test_predictions, average='macro')
print(f'Precision - Macro: {np.round(precision*100,2)}%')
recall = recall_score(y_true =  test_labels, y_pred = test_predictions, average='macro')
print(f'Recall - Macro: {np.round(recall*100,2)}%')
f1 = f1_score(y_true =  test_labels, y_pred = test_predictions, average='macro')
print(f'F1-score - Macro: {np.round(f1*100,2)}%')
precision = precision_score(y_true =  test_labels, y_pred = test_predictions, average='weighted')
print(f'Precision - Weighted: {np.round(precision*100,2)}%')
recall = recall_score(y_true =  test_labels, y_pred = test_predictions, average='weighted')
print(f'Recall - Weighted: {np.round(recall*100,2)}%')
f1 = f1_score(y_true =  test_labels, y_pred = test_predictions, average='weighted')
print(f'F1-score - Weighted: {np.round(f1*100,2)}%')

- **Since the dataset is balanced, the **MACRO** average is a good metric to evaluate the model.**

# Conclusion
### Summary
- Before this notebook:
    - We resized our images to be the 128 x 128 x 3.
    - Our reasoning was up scaling by a factor of 4.

- In this notebook:
    - We extracted feature maps from our train and validation datasets using the convolutional base of the VGG16.
    - We trained a classifier with those extracted features:
        - We used the Root Mean Squared Propagation (RMSProp) optimizer with an initial learning rate of 0.001.
        - We kept the same 30 epochs.
        - We evaluated the classifier. 
        - Overfitting was observed after **15 epochs**, but the best classifier was saved at the **20th epoch**.
    - We then joined the VGG16 Convolutional Base with our Classifier
    - We tested the resulting model
        - The model showed some difficulty distinguishing between certain categories, particularly cats and dogs.
        - Overfitting was observed after **15 epochs**, but the best model was saved at the **20th epoch**.
        - We evaluated the model using a confusion matrix to analyze its performance on each category.
        - We evaluated the model using ROC curves for a deeper performance analysis.
        - **The model achieved an accuracy of 81.83% on the test set**.
    - Performance on the test set was good, with:
        - Macro F1-score: 89.29%
        - Weighted F1-score: 89.29%
        - Macro precision: 89.32%
        - Weighted precision: 89.32%
        - Macro recall: 89.29%
        - Weighted recall: 89.29%

### Future Work
- In the next notebook:
    - Implement and train a transfer learning model with the VGG16 convolutional base frozen and a new classifier.
    - Experiment with data augmentation to improve classifier generalization.
    - Test the model performance.