<a href="https://colab.research.google.com/github/CesarRoldan99/CE888_CesarRoldan/blob/main/Assignment2/VGG_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Science and Decision Making - Second Assignment

The cells below load, preprocess and split the data, then train, save and test the model.

*Keep the Libraries section intact: the rest of the code depends on these imports. The requirements are listed in a README.txt file in the repository.*

In [None]:
# Libraries.
from tensorflow import keras
from tensorflow.keras.preprocessing import image_dataset_from_directory
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras import Model, layers
import matplotlib.pyplot as plt

*The training and test image sets were uploaded to Google Drive. To run your own experiments, both sets must be in your Drive: upload the images, right-click the folder, and copy its path into the cell below.*

In [None]:
# Directories of data.
d_training = "/content/drive/MyDrive/FLAME_DATASET/Training_R_V2"
d_test = "/content/drive/MyDrive/FLAME_DATASET/Test_R"
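
*The paths above assume that Google Drive is mounted at `/content/drive`. The following is a minimal sketch of mounting Drive and checking that the folders exist before loading any images. The helper names `mount_drive` and `missing_dirs` are hypothetical and not part of the original notebook.*

```python
import os


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def missing_dirs(paths):
    """Return the subset of `paths` that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]
```

*In Colab, call `mount_drive()` first and then `missing_dirs([d_training, d_test])`; an empty list means both folders were found.*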

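*This section can be modified or left as it is. The `validation_split`, `subset`, `label_mode`, `shuffle` and `seed` arguments are extremely important: the training and validation calls must share the same `seed` and `validation_split` so that the two subsets do not overlap. Modify with care!*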

In [None]:
# Image loading into train, validation, and test.
train = image_dataset_from_directory(d_training,
                                     batch_size=32,
                                     labels="inferred",
                                     shuffle=True,
                                     seed=100,
                                     validation_split=0.2,
                                     subset="training",
                                     label_mode="binary")

validation = image_dataset_from_directory(d_training,
                                          batch_size=32,
                                          labels="inferred",
                                          shuffle=True,
                                          seed=100,
                                          validation_split=0.2,
                                          subset="validation",
                                          label_mode="binary")

test = image_dataset_from_directory(d_test,
                                    batch_size=32,
                                    labels="inferred",
                                    shuffle=True,
                                    label_mode="binary")
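
*To see why the training and validation calls share `seed=100`, the sketch below mimics the idea of a seeded split in plain Python: shuffle the file list with a fixed seed and hold out the last 20%. It is an illustration only, not Keras' actual implementation, and the function `split_files` and the file names are hypothetical.*

```python
import random


def split_files(files, validation_split, seed, subset):
    """Illustrative split: shuffle with a fixed seed, hold out the tail."""
    shuffled = sorted(files)
    random.Random(seed).shuffle(shuffled)
    n_val = int(validation_split * len(shuffled))
    if subset == "training":
        return shuffled[:-n_val] if n_val else shuffled
    if subset == "validation":
        return shuffled[-n_val:] if n_val else []
    raise ValueError("subset must be 'training' or 'validation'")


files = [f"img_{i:03d}.jpg" for i in range(100)]  # hypothetical file names
train_files = split_files(files, 0.2, seed=100, subset="training")
val_files = split_files(files, 0.2, seed=100, subset="validation")

# Same seed: the two subsets partition the data with no shared files.
overlap_same_seed = set(train_files) & set(val_files)

# Different seed for the validation call: files may leak between subsets.
val_other_seed = split_files(files, 0.2, seed=7, subset="validation")
leak = set(train_files) & set(val_other_seed)
```

*With matching seeds the overlap is empty; with different seeds, `leak` will generally contain validation files that were also used for training.*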

*Augmentation section. This applies only to the images in the training set. The resizing layer NEEDS to be modified if another model is used: VGG19 takes a 224 x 224 x 3 input. The rescaling layer is also important, as normalizing the data makes fitting the model more efficient.*

In [None]:
# Preprocessing Sequence
augmentation = keras.Sequential(
    [
        layers.experimental.preprocessing.Resizing(224, 224),
        layers.experimental.preprocessing.RandomFlip(mode="horizontal_and_vertical"),
        layers.experimental.preprocessing.RandomRotation(0.1),
        layers.experimental.preprocessing.Rescaling(1.0 / 255)
    ]
)

# Augmentation of data
train_gen = train.map(lambda x, y: (augmentation(x, training=True), y))
train_gen = train_gen.prefetch(buffer_size=32)
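
*As a quick check of what the augmentation parameters mean: the factor passed to `RandomRotation` is a fraction of a full turn, so `0.1` allows rotations of up to 36 degrees in either direction, and `Rescaling(1.0 / 255)` maps 8-bit pixel values into the range [0, 1]. The helper names below are hypothetical and only reproduce this arithmetic.*

```python
import math


def rotation_range_degrees(factor):
    """Angle range for a rotation factor given as a fraction of 360 degrees."""
    limit = factor * 360.0
    return -limit, limit


def rescale(pixel, scale=1.0 / 255):
    """Scale a raw pixel value the way a Rescaling(scale) layer would."""
    return pixel * scale


low, high = rotation_range_degrees(0.1)  # roughly (-36, 36) degrees
darkest, brightest = rescale(0), rescale(255)  # 0.0 and roughly 1.0
```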

*This section resizes and normalizes the data for the validation and test sets.*

In [None]:
# Data normalization
norm = keras.Sequential(
    [
        layers.experimental.preprocessing.Resizing(224, 224),
        layers.experimental.preprocessing.Rescaling(1.0 / 255)
    ]
)

# Validation and test data are only resized and rescaled, so inference mode is used.
val_gen = validation.map(lambda x, y: (norm(x, training=False), y))
val_gen = val_gen.prefetch(buffer_size=32)

test_gen = test.map(lambda x, y: (norm(x, training=False), y))
test_gen = test_gen.prefetch(buffer_size=32)

*Loading the pre-trained VGG19 model with Keras. Important: when using the ImageNet weights with `include_top=True`, the number of classes must be 1000. The original fully connected layers are replaced with a new head for binary classification.*

In [None]:
# Loading pretrained model
base_model = keras.applications.VGG19(include_top=True,
                                      weights="imagenet",
                                      classes=1000)

# Feature extraction
img_input = base_model.inputs

x = base_model.get_layer("block5_pool").output
x = Flatten(name='flatten')(x)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dropout(0.1)(x)
x = Dense(2048, activation="relu", name="fc2")(x)
x = Dense(1, activation='sigmoid', name='out')(x)

model = Model(img_input, x)

for layer in model.layers[:-5]:
    layer.trainable = False
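
*The slice `model.layers[:-5]` freezes everything except the last five layers (`flatten`, `fc1`, the dropout layer, `fc2` and `out`), of which only the three dense layers carry weights. The sketch below works out how many parameters remain trainable, assuming the 7 x 7 x 512 output of `block5_pool` for a 224 x 224 input. The helper `dense_params` is hypothetical.*

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


flat = 7 * 7 * 512  # flattened block5_pool output for a 224 x 224 x 3 input

head = {
    "fc1": dense_params(flat, 4096),
    "fc2": dense_params(4096, 2048),
    "out": dense_params(2048, 1),
}
trainable_total = sum(head.values())

for name, count in head.items():
    print(f"{name}: {count:,}")
print(f"trainable total: {trainable_total:,}")
```

*The figures can be compared against the "Trainable params" line printed by `model.summary()`.*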


*Compilation of the model using the RMSprop optimizer, binary cross-entropy loss, and accuracy as the main metric.*

In [None]:
# Compilation of model
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
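
*For reference, binary cross-entropy for a label y and a predicted probability p is -(y log p + (1 - y) log(1 - p)). The plain-Python sketch below evaluates it for single predictions; the clipping constant `eps` is an assumption added to avoid `log(0)`, and the function names are hypothetical.*

```python
import math


def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Cross-entropy of one prediction; eps clipping is an assumption."""
    p = min(max(p_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def mean_bce(labels, probs):
    """Average cross-entropy over a batch of predictions."""
    losses = [binary_crossentropy(y, p) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)


confident_right = binary_crossentropy(1, 0.9)  # small loss
confident_wrong = binary_crossentropy(1, 0.1)  # large loss
undecided = binary_crossentropy(1, 0.5)        # equals log(2)
```

*A confident correct prediction is penalized lightly and a confident wrong one heavily, which is the behavior the sigmoid output layer is trained against.*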

*Fitting the model on the augmented training images for ten epochs, validating on the normalized validation set.*

In [None]:
# Training of model
# batch_size is not passed: train_gen and val_gen are already batched (32 images per batch).
history = model.fit(train_gen,
                    epochs=10,
                    validation_data=val_gen)

*This section plots both accuracy and loss during the training.*

In [None]:
# Plotting the accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# Plotting the loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
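
*Besides plotting, the `history.history` dictionary can be inspected directly, for example to find the epoch with the lowest validation loss. The sketch below uses a made-up dictionary with the same keys; the numbers are placeholders, not results from this notebook, and `best_epoch` is a hypothetical helper.*

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch, value) where `key` is lowest."""
    values = history_dict[key]
    best_index = min(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]


# Placeholder values shaped like history.history; not real training output.
toy_history = {
    "loss": [0.9, 0.6, 0.4, 0.3],
    "val_loss": [0.8, 0.5, 0.55, 0.7],
}
epoch, value = best_epoch(toy_history)
```

*After training, `best_epoch(history.history)` gives the same information for the real run. A validation loss that rises while the training loss keeps falling is a common sign of overfitting.*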

*Saving the model. The directory needs to be changed to a folder in your own Drive. Run this cell only if you want to save the model.*

In [None]:
# Save model
Save_Dir = "/content/drive/MyDrive/FLAME_DATASET/Saved_Model"
model.save(Save_Dir)

*Evaluation of the model using the normalized test set.*

In [None]:
# Evaluate model
# batch_size is not passed: test_gen is already batched (32 images per batch).
results = model.evaluate(test_gen,
                         verbose=1)
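
*With the compile settings used above, `results` holds the test loss followed by the test accuracy. The accuracy counts a prediction as correct when the sigmoid output, thresholded at 0.5, matches the label. The sketch below reproduces that calculation on toy values; the labels and probabilities are illustrative, and `binary_accuracy` here is a hypothetical plain-Python helper.*

```python
def binary_accuracy(labels, probs, threshold=0.5):
    """Fraction of predictions whose thresholded probability matches the label."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    hits = sum(int(p > threshold) == int(y) for y, p in zip(labels, probs))
    return hits / len(labels)


# Toy labels and sigmoid outputs, for illustration only.
toy_labels = [1, 0, 1, 1]
toy_probs = [0.9, 0.2, 0.4, 0.7]
toy_accuracy = binary_accuracy(toy_labels, toy_probs)  # third sample is missed
```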