<a href="https://colab.research.google.com/github/chenoa23/CV-Projects/blob/main/VGG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><img src="./imgs/logo--vggnet.png" width="800" height="400" align="center"></center>

<h3 style="background-color:#0071c5;color:white;text-align: center;padding-top: 5px;padding-bottom: 5px;border-radius: 15px 50px;"><strong><centre>Introduction</centre></strong></h3>

VGGNet is a convolutional neural network architecture proposed by Karen Simonyan and Andrew Zisserman of the University of Oxford in 2014. In the ILSVRC 2014 (ImageNet) competition it took first place in the localization task and second place in the classification task. It remains a widely used baseline among vision model architectures.

<blockquote> 📌 The original VGGNet paper is available <a href='https://arxiv.org/abs/1409.1556'>here</a> </blockquote>


<h3 style="background-color:#0071c5;color:white;text-align: center;padding-top: 5px;padding-bottom: 5px;border-radius: 15px 50px;"><strong><centre>Architecture(VGG-16) 📚 </centre></strong></h3>

VGG-16 consists of two parts:
1. The first part contains thirteen (13) convolutional layers arranged in five blocks, with each block followed by a max-pooling layer.
2. The second part consists of three fully connected layers.

The architecture of VGG-16 is described by the following figure:
<hr>
<center><img src="https://cdn-images-1.medium.com/max/850/1*_Lg1i7wv1pLpzp2F4MLrvw.png" width="600" height="600" align="center"></center>

A 224 * 224 RGB image is the input to the VGG-based ConvNet. The preprocessing step takes an RGB image with pixel values ranging from 0 to 255 and subtracts from each pixel the mean RGB value computed over the entire ImageNet training set.

Architecture walkthrough:
- The first two layers are convolutional layers with 64 filters each. Because "same" convolutions are used, the result is a 224 * 224 * 64 volume. Throughout the network the filters are always 3 * 3 with a stride of 1.
- Next, a max-pooling layer with a 2 * 2 window and stride 2 reduces the height and width of the volume from 224 * 224 * 64 to 112 * 112 * 64.
- This is followed by two more convolutional layers with 128 filters each, giving a 112 * 112 * 128 volume.
- A second pooling layer reduces the volume to 56 * 56 * 128.
- Three convolutional layers with 256 filters each follow, and a down-sampling layer then reduces the size to 28 * 28 * 256.
- Two more stacks follow, each with three convolutional layers of 512 filters and each ending in a max-pool layer, giving 14 * 14 * 512 and then 7 * 7 * 512.
- After the final pooling layer, the 7 * 7 * 512 volume is flattened and passed through two fully connected (FC) layers with 4096 units each, followed by a 1000-unit FC layer with a softmax output over the 1000 classes.
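
The shape bookkeeping in this walkthrough can be checked with a short sketch. It assumes the standard VGG-16 layout (3 * 3 "same" convolutions in five blocks, each followed by a 2 * 2 max-pool with stride 2); the names `VGG16_CONFIG` and `trace_shapes` are illustrative and not part of any library.

```python
# VGG-16 configuration: integers are conv layers (number of 3x3 filters),
# "M" is a 2x2 max-pool with stride 2.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def trace_shapes(height=224, width=224, channels=3, config=VGG16_CONFIG):
    """Return the (height, width, channels) volume after every layer."""
    shapes = []
    for layer in config:
        if layer == "M":
            # 2x2 max-pool, stride 2: halves height and width (floor division)
            height, width = height // 2, width // 2
        else:
            # 3x3 "same" convolution, stride 1: spatial size is unchanged,
            # the channel count becomes the number of filters
            channels = layer
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes()
for layer, shape in zip(VGG16_CONFIG, shapes):
    name = "max-pool" if layer == "M" else f"conv3-{layer}"
    print(f"{name:<10} -> {shape[0]} x {shape[1]} x {shape[2]}")

conv_layers = sum(1 for layer in VGG16_CONFIG if layer != "M")
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(conv_layers, "conv layers; flattened size:", flattened)
```

The flattened size is the length of the vector that enters the first fully connected layer.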


<h3 style="background-color:#0071c5;color:white;text-align: center;padding-top: 5px;padding-bottom: 5px;border-radius: 15px 50px;"><strong><centre>Transfer learning with VGG-16 on TF-Flower Dataset</centre></strong></h3>

In this section, we apply VGG-16 to the classification of flower images.

- Note: The VGG networks were not trained to classify different kinds of flowers.

Sample Images from the dataset:
<center><img src="https://miro.medium.com/max/1400/1*5IUj-C5CSCAad4ptrH1ddA.png" width="600" height="400" align="center"></center>

## Importing Libraries

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds

# visualkeras is optional and only used to visualise the architecture
# (install with `pip install visualkeras`)
try:
    import visualkeras
except ImportError:
    print("visualkeras is not installed (pip install visualkeras)")

# Libraries to build model
from tensorflow.keras import layers, models
from tensorflow.keras.utils import to_categorical

# Library to implement early stopping
from tensorflow.keras.callbacks import EarlyStopping

# Libraries for VGG-16 transfer learning
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

### 1. Loading TF-Flower dataset

In [None]:
## Loading images and labels
(train_ds, train_labelsn), (test_ds, test_labelsn) = tfds.load(
    "tf_flowers",
    split=["train[:70%]", "train[70%:]"],  # Train/test split: first 70% / remaining 30%, no overlap
    batch_size=-1,  # Load each split as a single batch of tensors
    as_supervised=True,  # Return (image, label) pairs
)

Downloading and preparing dataset 218.21 MiB (download: 218.21 MiB, generated: 221.83 MiB, total: 440.05 MiB) to /root/tensorflow_datasets/tf_flowers/3.0.1...



Dataset tf_flowers downloaded and prepared to /root/tensorflow_datasets/tf_flowers/3.0.1. Subsequent calls will reuse this data.
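
A held-out test set is only informative if it shares no examples with the training set. The sketch below, in plain Python, maps TFDS-style percent slices to index ranges and counts how many indices two candidate test slices share with `train[:70%]`. The split size of 3670 and the simple rounding are assumptions made for illustration; TFDS may place slice boundaries slightly differently.

```python
NUM_EXAMPLES = 3670  # assumed size of the tf_flowers "train" split


def percent_slice(start_pct, stop_pct, n=NUM_EXAMPLES):
    """Index range covered by a slice like train[start%:stop%] (rounded)."""
    return range(round(n * start_pct / 100), round(n * stop_pct / 100))


train_idx = set(percent_slice(0, 70))          # train[:70%]
overlapping_test = set(percent_slice(0, 30))   # train[:30%]
disjoint_test = set(percent_slice(70, 100))    # train[70%:]

print("train size:", len(train_idx))
print("train[:30%] shared with train:", len(train_idx & overlapping_test))
print("train[70%:] shared with train:", len(train_idx & disjoint_test))
```

A test slice taken from the start of the split lies entirely inside the training slice, whereas a slice that starts where the training slice ends shares nothing with it.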


### 2. Preprocessing data
This task includes the following steps:

- Resizing the images to a fixed size accepted by the Keras model
- Transforming the labels into a format the model understands, i.e., one-hot (categorical) vectors
- Preprocessing the input data with VGG16's `preprocess_input` function, which converts the images from RGB to BGR and zero-centers each channel using the ImageNet means

In [None]:
## Resizing images
train_ds = tf.image.resize(train_ds, (150, 150))
test_ds = tf.image.resize(test_ds, (150, 150))

## Transforming labels to correct format
train_labels = to_categorical(train_labelsn, num_classes=5)
test_labels = to_categorical(test_labelsn, num_classes=5)

## Preprocessing input with VGG16's preprocess_input (RGB -> BGR, ImageNet mean subtraction)
train_ds = preprocess_input(train_ds)
test_ds = preprocess_input(test_ds)
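
To make the label and input transformations concrete, here is a minimal plain-Python sketch of what they do to a single label and a single pixel. It is not the Keras implementation: `one_hot` and `vgg_preprocess_pixel` are hypothetical helper names, and the per-channel means are the values commonly used for VGG-style ("caffe" mode) preprocessing, stated here as an assumption.

```python
# Per-channel ImageNet means in BGR order (assumed VGG-style preprocessing)
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def one_hot(label, num_classes=5):
    """Integer class index -> one-hot list, e.g. 2 -> [0, 0, 1, 0, 0]."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def vgg_preprocess_pixel(rgb):
    """Apply VGG-style preprocessing to a single (R, G, B) pixel."""
    r, g, b = rgb
    bgr = (b, g, r)  # reorder channels RGB -> BGR
    # zero-center each channel with the ImageNet mean; no rescaling
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_MEAN_BGR))


print(one_hot(2))
print(vgg_preprocess_pixel((255.0, 128.0, 0.0)))
```

The pixel values are shifted but not scaled to [0, 1], which is why the inputs passed to the network can be negative.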

### 3. Building the VGG-16 model

In [None]:
## Loading VGG16 model
# include_top=False removes the fully connected classification layers that were
# trained on ImageNet, leaving only the convolutional feature extractor.
base_model = VGG16(weights="imagenet", include_top=False, input_shape=train_ds[0].shape)
base_model.trainable = False  # Freeze the pretrained weights

# base_model.summary()

# visualkeras.layered_view(base_model)

Since we removed the classification layers, we are left with a model that only generates features, so we add the layers needed for classification.

In [None]:
flatten_layer = layers.Flatten()
dense_layer_1 = layers.Dense(50, activation='relu')
dense_layer_2 = layers.Dense(20, activation='relu')
prediction_layer = layers.Dense(5, activation='softmax')


model = models.Sequential([
    base_model,
    flatten_layer,
    dense_layer_1,
    dense_layer_2,
    prediction_layer])

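base_model.summary()

# visualkeras is optional (pip install visualkeras); skip the diagram if it is missing
try:
    import visualkeras
    visualkeras.layered_view(base_model)
except ImportError:
    print("visualkeras not installed; skipping architecture diagram")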
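
As a rough check on the size of the new classification head, the sketch below counts its trainable parameters for 150 * 150 inputs. It assumes the frozen VGG-16 base applies five 2 * 2 max-pools and ends with 512 channels; `pooled_size` and `dense_params` are illustrative helpers, not Keras functions.

```python
def pooled_size(size, num_pools=5):
    """Spatial size after `num_pools` 2x2 max-pools with stride 2."""
    for _ in range(num_pools):
        size //= 2
    return size


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


side = pooled_size(150)            # 150 -> 75 -> 37 -> 18 -> 9 -> 4
features = side * side * 512       # length of the flattened feature vector
head = [dense_params(features, 50), dense_params(50, 20), dense_params(20, 5)]

print("flattened features:", features)
print("trainable parameters per dense layer:", head)
print("total trainable parameters:", sum(head))
```

Almost all of the trainable parameters sit in the first dense layer, because it connects to every element of the flattened feature map.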

### 4. Training the model

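We use early stopping to reduce the risk of overfitting: training halts once the validation accuracy has not improved for 5 consecutive epochs, and the best weights seen so far are restored.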



In [None]:

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)


es = EarlyStopping(monitor='val_accuracy', mode='max', patience=5, restore_best_weights=True)

# Uncomment the line below to train the model. Training takes about 45 minutes.
# model.fit(train_ds, train_labels, epochs=50, validation_split=0.2, batch_size=32, callbacks=[es])
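
The patience rule behind early stopping can be illustrated with a simplified simulation in plain Python. This is a sketch of the idea rather than the Keras callback itself; `early_stopping_epoch` is a hypothetical helper and the accuracy values are made up for illustration.

```python
def early_stopping_epoch(val_accuracy, patience=5):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation accuracy seen so far.
    """
    best_value, best_epoch, wait = float("-inf"), -1, 0
    for epoch, value in enumerate(val_accuracy):
        if value > best_value:
            # new best: remember it and reset the patience counter
            best_value, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # ran out of epochs before patience was exhausted
    return len(val_accuracy) - 1, best_epoch


# Hypothetical validation accuracies, one per epoch (illustrative, not measured)
history = [0.70, 0.80, 0.85, 0.84, 0.86, 0.85, 0.86, 0.85, 0.84, 0.85, 0.83]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=5)
print(f"stopped after epoch {stop_epoch}, restoring weights from epoch {best_epoch}")
```

Restoring the weights from the best epoch is what `restore_best_weights=True` requests, so the final model is the one with the highest validation accuracy rather than the last one trained.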

### 5. Save the trained model

<p style="color:red;"> Note: Run the code in this section only if the model has been trained. Otherwise, proceed to load a previously saved model.</p>


Since the model takes a long time to train, we can save it for reuse later and avoid training it again. A model that has been trained on a dataset can then be loaded and evaluated on that same dataset without retraining. Keras 3 saves and loads whole models only as `.keras` files (or legacy `.h5` files), so the model is saved with a `.keras` extension rather than as a SavedModel directory.

In [None]:
# Save the whole model in the native Keras format
# tf.keras.models.save_model(model, './VGG-model.keras', overwrite=True)

### 6. Evaluate the model

Load the saved model and evaluate the accuracy of its predictions. A legacy SavedModel directory such as `./VGG-model` cannot be opened with `load_model()` in Keras 3: it raises `ValueError: File format not supported` and only offers `keras.layers.TFSMLayer` for inference-only reloading.

In [None]:
loaded_model = tf.keras.models.load_model('./VGG-model.keras')

In [None]:
score = loaded_model.evaluate(test_ds, test_labels)

In [None]:
print("------> Accuracy of model on test data: %.2f%%" % (score[1] * 100))
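
`evaluate` returns the loss followed by the compiled metrics, so `score[1]` is the accuracy. As a minimal sketch of how that metric is computed from one-hot labels and softmax outputs, the plain-Python example below uses a hypothetical helper, `categorical_accuracy`, and made-up predictions.

```python
def categorical_accuracy(one_hot_labels, predicted_probs):
    """Fraction of samples whose highest-probability class matches the label."""
    def argmax(values):
        return max(range(len(values)), key=values.__getitem__)

    correct = sum(
        argmax(label) == argmax(probs)
        for label, probs in zip(one_hot_labels, predicted_probs)
    )
    return correct / len(one_hot_labels)


# Hypothetical labels and softmax outputs for three images and three classes
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.5, 0.3, 0.2]]
print("accuracy: %.2f" % (categorical_accuracy(labels, probs) * 100))
```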



<h3 style="background-color:#0071c5;color:white;text-align: center;padding-top: 5px;padding-bottom: 5px;border-radius: 15px 50px;"><strong><centre>Conclusion</centre></strong></h3>

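1. The VGG-16 architecture has been explained and applied to flower classification through transfer learning.
2. The model gave 97.46% accuracy on the test images in the author's run; such a figure measures generalization only when the test split shares no examples with the training split.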


**Reference:**

📌 Simonyan, Karen & Zisserman, Andrew. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 1409.1556.