# Example 3.4.1

# Transfer Learning: Training and testing CNN extensions

This example shows an implementation of EfficientNet based on the Keras example available at this [link](https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/).

The experiment loads the CIFAR-10 dataset, consisting of a set of images of 10 classes corresponding to ..., and classifies them. The network is trained from scratch, without pre-trained weights.

## Import the required libraries

Let us first import the libraries required for classifying the CIFAR-10 dataset described previously in Example 3.3.1. The CNN architecture used here is __EfficientNet-B0__.

To load the model, we first need to install the _efficientnet_ library.

Use the following command to install it:

      !pip install efficientnet

In [None]:
#!pip install efficientnet

Collecting efficientnet
  Downloading efficientnet-1.1.1-py3-none-any.whl (18 kB)
Collecting keras-applications<=1.0.8,>=1.0.7 (from efficientnet)
  Downloading Keras_Applications-1.0.8-py3-none-any.whl (50 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m50.7/50.7 kB[0m [31m1.2 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: keras-applications, efficientnet
Successfully installed efficientnet-1.1.1 keras-applications-1.0.8


In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#import the required libraries
import numpy as np
import pandas as pd
import random
import tensorflow as tf
from sklearn.utils.multiclass import unique_labels
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import itertools
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
#efficientnet.tfkeras builds tf.keras models, so all Keras imports come from tensorflow.keras
from tensorflow.keras import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.layers import Flatten,Dense,Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Model,save_model,load_model
from efficientnet import tfkeras as efficientnet
#Use a serif font (Times New Roman first) in all figures
plt.rcParams['font.family'] = 'serif'
plt.rcParams['font.serif'] = ['Times New Roman'] + plt.rcParams['font.serif']
#Seed the random number generators for reproducibility
np.random.seed(37)
random.seed(1254)
tf.random.set_seed(89)










## Dataset
__CIFAR-10 Dataset__:
- The dataset consists of 60,000 color images of size 32x32 pixels (50,000 training images and 10,000 test images).
- It includes 10 different categories/classes, and the classes are mutually exclusive.

Further details on the dataset can be found [here](https://www.cs.toronto.edu/~kriz/cifar.html).
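The labels returned by `cifar10.load_data()` are integers from 0 to 9 stored in arrays of shape (N, 1). A minimal sketch, assuming the standard class ordering given on the CIFAR-10 page (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck), of mapping these integers to class names and counting images per class; `CIFAR10_CLASSES`, `label_names`, and `class_counts` are hypothetical names introduced only for illustration.

```python
import numpy as np

# Assumed CIFAR-10 class names, in label order 0..9
CIFAR10_CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']

def label_names(labels):
    """Map integer labels of shape (N,) or (N, 1) to class-name strings."""
    labels = np.asarray(labels).ravel()
    return [CIFAR10_CLASSES[i] for i in labels]

def class_counts(labels, num_classes=10):
    """Count how many images belong to each class."""
    return np.bincount(np.asarray(labels).ravel(), minlength=num_classes)

# Tiny hand-made label array shaped like the cifar10.load_data() labels
y_demo = np.array([[3], [8], [8], [0]])
print(label_names(y_demo))   # ['cat', 'ship', 'ship', 'airplane']
print(class_counts(y_demo))  # [1 0 0 1 0 0 0 0 2 0]
```

On the real `y_train`, `class_counts` can be used to check that the classes are balanced before and after splitting.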




### Loading the dataset

Let's load the data using the following steps:
1. Load the CIFAR-10 dataset directly from the Keras datasets module, which is imported with:
        from keras.datasets import cifar10



In [None]:
#Loading Cifar-10 dataset
(x_train,y_train),(x_test,y_test)=cifar10.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


### Train test data split
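2. Next, we split the original 50,000 training images into a training set (70%) and a validation set (30%); the 10,000 test images remain a separate test set. We first import _train_test_split_ using the following: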

        from sklearn.model_selection import train_test_split


In [None]:
#Train-validation-test split
x_train,x_val,y_train,y_val=train_test_split(x_train,y_train,test_size=.3)
#Display the size of the CIFAR10 dataset
print((x_train.shape,y_train.shape))
print((x_val.shape,y_val.shape))
print((x_test.shape,y_test.shape))

((35000, 32, 32, 3), (35000, 1))
((15000, 32, 32, 3), (15000, 1))
((10000, 32, 32, 3), (10000, 1))
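
Conceptually, `train_test_split(..., test_size=.3)` shuffles the sample indices and holds out ceil(0.3 * n) of them. A minimal NumPy sketch of that idea follows; `shuffle_split` is a hypothetical helper, and scikit-learn's own function additionally supports options such as `random_state` and `stratify`.

```python
import numpy as np

def shuffle_split(x, y, test_size=0.3, seed=None):
    """Hypothetical helper: shuffle sample indices and hold out ceil(test_size * n) of them."""
    rng = np.random.default_rng(seed)
    n = len(x)
    n_test = int(np.ceil(test_size * n))
    perm = rng.permutation(n)                  # random order of sample indices
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# 50,000 dummy samples split 70/30, like the CIFAR-10 training set above
x_dummy = np.arange(50000).reshape(-1, 1)
y_dummy = np.arange(50000) % 10
x_tr, x_va, y_tr, y_va = shuffle_split(x_dummy, y_dummy, test_size=0.3, seed=0)
print(x_tr.shape, x_va.shape)  # (35000, 1) (15000, 1)
```

Passing `stratify=y_train` to `train_test_split` would additionally keep the class proportions the same in the training and validation parts.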


### One-hot encoding of labels
3. Next, we one-hot encode the labels.
__One-hot encoding__ converts each integer class label into a binary vector with one column per class: the column of the true class is set to 1 and all other columns are set to 0. For CIFAR-10, each label becomes a vector of length 10.

   To do this, we first import _to_categorical_ using the following:

        from tensorflow.keras.utils import to_categorical




In [None]:
#Onehot Encoding the labels(convert each categorical value into a new categorical column and assign a binary value of 1 or 0 to those columns)
y_train=to_categorical(y_train)
y_val=to_categorical(y_val)
y_test=to_categorical(y_test)
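
For integer labels, the one-hot encoding performed by `to_categorical` can be reproduced by selecting rows of an identity matrix. A minimal NumPy sketch (the helper `one_hot` is hypothetical and not part of Keras):

```python
import numpy as np

def one_hot(labels, num_classes):
    """One-hot encode integer labels of shape (N,) or (N, 1) into shape (N, num_classes)."""
    labels = np.asarray(labels).ravel()
    # Row i of the identity matrix has a single 1 in column i
    return np.eye(num_classes, dtype='float32')[labels]

y_demo = np.array([[6], [9], [0]])
encoded = one_hot(y_demo, 10)
print(encoded.shape)               # (3, 10)
print(np.argmax(encoded, axis=1))  # [6 9 0]
```

Taking `np.argmax` along axis 1 undoes the encoding, which is exactly what the confusion-matrix cell does later with `y_test`.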


### Image Data Augmentation

4. The next step in the example is to perform Image Data Augmentation.

__Data augmentation__ artificially increases the size and diversity of the training set by applying random transformations (such as rotations and flips) to the existing images. This usually helps the model generalize better to unseen data. __Keras__ provides a built-in class, _ImageDataGenerator_, that generates batches of augmented images on the fly while the model is being trained.

We import the ImageDataGenerator class using the following command:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

Several augmentation operations are available, such as shifting, flipping, rotating, and zooming. The _ImageDataGenerator_ class lets us select these operations and set their parameters.

In this example, we are using the following operations for augmenting the Image data:
- __Rotation__: Each image is rotated by a random angle drawn uniformly from [-_rotation_range_, +_rotation_range_] degrees, so rotations can be clockwise or counterclockwise. Here, _rotation_range_ = 15.
- __Horizontal flip__: Mirrors the image left to right by reversing the order of its pixel columns; with _horizontal_flip_ = True, each image is flipped with a probability of 0.5. A __vertical flip__ reverses the order of the rows instead.
- __Zoom__: Zooms the image in or out by a random factor, resampling the pixels by interpolation; pixels that fall outside the original image are filled according to _fill_mode_ (by default, with the nearest pixel). The _zoom_range_ parameter sets the zoom range: if you specify _zoom_range_ = _a_ = _0.1_, the zoom factor is drawn from [1-_a_, 1+_a_] = [0.9, 1.1]. In Keras, factors below 1 zoom in (the content appears larger) and factors above 1 zoom out.
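The random parameters behind these three operations can be sketched in a few lines of NumPy. This is a minimal illustration, not Keras's implementation: it performs the flip directly but only samples the rotation angle and zoom factors, leaving the interpolation and `fill_mode` handling to Keras. The function names are hypothetical, and drawing a separate zoom factor for each axis is an assumption about how `ImageDataGenerator` samples zoom.

```python
import numpy as np

def random_horizontal_flip(image, rng, p=0.5):
    """Mirror an (H, W, C) image left to right with probability p by reversing its columns."""
    if rng.random() < p:
        return image[:, ::-1, :]
    return image

def sample_augmentation_params(rng, rotation_range=15, zoom_range=0.1):
    """Draw a rotation angle (degrees) and one zoom factor per axis from the configured ranges."""
    angle = float(rng.uniform(-rotation_range, rotation_range))
    zx, zy = rng.uniform(1 - zoom_range, 1 + zoom_range, size=2)
    return angle, float(zx), float(zy)

rng = np.random.default_rng(0)
image = np.arange(6).reshape(2, 3, 1)                # tiny 2x3 single-channel "image"
flipped = random_horizontal_flip(image, rng, p=1.0)  # p=1.0 forces a flip for the demo
print(flipped[..., 0])                               # columns reversed: rows [2 1 0] and [5 4 3]
print(sample_augmentation_params(rng))               # angle in [-15, 15], zooms in [0.9, 1.1]
```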

More details on _ImageDataGenerator_ can be found on [link1](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) and [link2](https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/).

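Using the _ImageDataGenerator_ class, we initialize _train_generator_, _val_generator_, and _test_generator_ and call _fit_ on the respective data. Note that _fit_ only computes the dataset statistics needed for feature-wise normalization or ZCA whitening; with the rotation, flip, and zoom options used here, it does not change anything.

With this approach, the augmentation is applied on the fly during training: each batch drawn through a generator's _flow_ method is randomly transformed, while the stored arrays keep their original size.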



In [None]:
#Image Data Augmentation
train_generator = ImageDataGenerator(rotation_range=15, horizontal_flip=True, zoom_range=.1 )
val_generator = ImageDataGenerator(rotation_range=15, horizontal_flip=True, zoom_range=.1)
test_generator = ImageDataGenerator(rotation_range=15, horizontal_flip= True, zoom_range=.1)
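#fit() only computes data statistics needed for featurewise normalization or ZCA whitening;
#with the rotation, flip, and zoom options above it has no effect
train_generator.fit(x_train)
val_generator.fit(x_val)
test_generator.fit(x_test)
#Display the dataset shapes (augmentation does not change the stored arrays; labels are now one-hot encoded)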
print((x_train.shape,y_train.shape))
print((x_val.shape,y_val.shape))
print((x_test.shape,y_test.shape))



((35000, 32, 32, 3), (35000, 10))
((15000, 32, 32, 3), (15000, 10))
((10000, 32, 32, 3), (10000, 10))


## Defining the Model

The model used here is **EfficientNetB0**. EfficientNets are a family of CNNs obtained by uniformly scaling the depth, width, and input resolution of a baseline network (B0, found by neural architecture search) with a single compound coefficient; the same compound scaling method was also shown to improve scaled-up MobileNets and ResNet. This family of models is known for its balance between efficiency and accuracy. The original paper can be found at this [link](http://proceedings.mlr.press/v97/tan19a/tan19a.pdf), and a summary of the latest updates on the model can be found [here](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet). The EfficientNet family consists of 8 different versions (_B0_ to _B7_). More details regarding the models and the differences between the versions can be found [here](https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/). Each version has a different default input resolution, from 224x224 for B0 up to 600x600 for B7.

We first import the efficientnet model using the following:

      from efficientnet import tfkeras as efficientnet



### Removing the top layers and training from scratch

Next, we instantiate the EfficientNetB0 model without its top (classification) layers by passing _include_top=False_.

      effi = efficientnet.EfficientNetB0(include_top=False, weights=None, input_shape=(32,32,3),classes=y_train.shape[1])

Unlike in the transfer learning approach, here we set _weights=None_ (the Python value, not the string "None"). The base model therefore starts from randomly initialized weights instead of pre-trained ones, so the whole network is trained from scratch. The top layers can be added in the same way as in the transfer learning example. Note that the _classes_ argument only takes effect when _include_top=True_; with the top removed, the number of classes is set by the final dense layer that we add ourselves.
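EfficientNet-B0 halves the spatial resolution five times (the stride-2 stem convolution plus four stride-2 stages), for a total stride of 32. A quick sketch of that arithmetic, assuming 'same' padding so that each stride-2 layer maps a size s to ceil(s / 2), shows why a 32x32 CIFAR-10 image is reduced to the 1x1x1280 feature map reported in the model summary, whereas the default 224x224 input would give 7x7. The function name is hypothetical.

```python
import math

def feature_map_sizes(input_size, num_stride2_layers=5):
    """Spatial sizes after repeated stride-2 layers with 'same' padding: s -> ceil(s / 2)."""
    sizes = [input_size]
    size = input_size
    for _ in range(num_stride2_layers):
        size = math.ceil(size / 2)
        sizes.append(size)
    return sizes

print(feature_map_sizes(32))   # [32, 16, 8, 4, 2, 1]
print(feature_map_sizes(224))  # [224, 112, 56, 28, 14, 7]
```

The first step (32 to 16) matches the 16x16 output of `stem_conv` in the summary below; the very small final map is one consequence of feeding 32x32 images to an architecture designed for 224x224 inputs.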

In [None]:
#Defining the model

effi = efficientnet.EfficientNetB0(include_top=False, weights=None, input_shape=(32,32,3),classes=y_train.shape[1])
effi.summary()

Model: "efficientnet-b0"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_1 (InputLayer)        [(None, 32, 32, 3)]          0         []                            
                                                                                                  
 stem_conv (Conv2D)          (None, 16, 16, 32)           864       ['input_1[0][0]']             
                                                                                                  
 stem_bn (BatchNormalizatio  (None, 16, 16, 32)           128       ['stem_conv[0][0]']           
 n)                                                                                               
                                                                                                  
 stem_activation (Activatio  (None, 16, 16, 32)           0         ['stem_bn[0][0]'

### Adding top layers to the model

Next, we add new layers on top of the base model: a _Flatten_ layer followed by 5 dense layers.

Since the layers are stacked linearly, we could use the _Sequential_ class from Keras (more info can be found [here](https://keras.io/api/models/sequential/)). The code below instead uses the functional API, which builds the same stack by passing the base model's output tensor through each new layer and then wrapping the result in a _Model_.

- First, we take the output of the base model _effi_ and flatten it, then pass it through the dense layers.
- The last dense layer has 10 outputs, one per class, with a softmax activation so that the outputs form a probability distribution over the classes.

Finally, we print the model summary for the entire structure.
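The parameter counts of the added layers can be checked by hand: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. In this sketch, the first four values match the Dense rows printed in the summary below, and the last one is for the 10-unit output layer. The helper name is hypothetical.

```python
def dense_params(n_in, n_out):
    """Number of trainable parameters of a Dense layer: weights plus one bias per unit."""
    return n_in * n_out + n_out

# Flatten turns the (1, 1, 1280) feature map into 1280 features
layer_sizes = [1280, 1024, 512, 256, 128, 10]
head = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(head)       # [1311744, 524800, 131328, 32896, 1290]
print(sum(head))  # 2002058
```

Roughly two million parameters are therefore added on top of the base model's 4,049,564.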

In [None]:

# Define the input layer
inputs = effi.input

# Add the base model output
x = effi.output

# Add custom layers on top of the base model
#x = GlobalAveragePooling2D()(x)
x = Flatten()(x)
x = Dense(1024, activation='relu')(x)
x = Dense(512, activation='relu')(x)
x = Dense(256, activation='relu')(x)
x = Dense(128, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)

# Create the model
model = Model(inputs=inputs, outputs=outputs)

# Display the model summary
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 efficientnet-b0 (Functiona  (None, 1, 1, 1280)        4049564   
 l)                                                              
                                                                 
 flatten_1 (Flatten)         (None, 1280)              0         
                                                                 
 dense_5 (Dense)             (None, 1024)              1311744   
                                                                 
 dense_6 (Dense)             (None, 512)               524800    
                                                                 
 dense_7 (Dense)             (None, 256)               131328    
                                                                 
 dense_8 (Dense)             (None, 128)               32896     
                                                      

## Training

### Defining Parameters for training

First, let's initialize the _batch size_, _learning rate_, and _number of epochs_ used during training.

- We also use the _ReduceLROnPlateau_ callback. Let us import it using the following command:

        from keras.callbacks import ReduceLROnPlateau

  __ReduceLROnPlateau__ (reduce learning rate on plateau) lowers the learning rate when a monitored metric (here, the validation loss) has stopped improving. It has the following parameters:
  - _monitor_: the quantity to be monitored; here, '_val_loss_'.
  - _factor_: the factor by which the learning rate is reduced, new_lr = lr * factor (here, 0.01).
  - _patience_: the number of epochs with no improvement after which the learning rate is reduced (here, 3).
  - _min_lr_: a lower bound on the learning rate (here, 1e-5).


- Next, let us initialize the optimizer. In this example we use the __Adam__ optimizer, a stochastic gradient descent method based on adaptive estimates of the first-order and second-order moments of the gradients. We pass the learning rate (here, 0.001) to the optimizer when initializing it. We import the Adam optimizer using the following command:

         from keras.optimizers import Adam

- __model.compile__ is the final step in configuring the model for training. It uses the following arguments:
  - __optimizer__: String (name of optimizer) or optimizer instance.
  - __loss__: the loss function. For multi-class classification with one-hot encoded labels we use categorical cross-entropy; with only two classes, binary cross-entropy, a special case of categorical cross-entropy, is used instead. Other loss functions can also be used; please refer to [this](https://keras.io/api/losses/) for more information on the different kinds of loss functions.
  - __metrics__: the metrics reported during training and validation. They are monitored but not optimized; here, we monitor the __accuracy__.
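The two formulas behind `model.compile` can be sketched in NumPy: categorical cross-entropy averages -sum_k y_k log(p_k) over the batch, and one Adam step updates biased first- and second-moment estimates, corrects their bias, and scales the step by the ratio of the two. This is the textbook form from the Adam paper under Keras's default hyperparameters (beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-7); Keras arranges the epsilon term slightly differently internally, and the function names here are hypothetical.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k]) for one-hot targets."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update at step t (starting from 1); m and v are the running moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

y_true = np.array([[0.0, 1.0, 0.0]])
y_pred = np.array([[0.2, 0.7, 0.1]])
print(categorical_crossentropy(y_true, y_pred))  # about 0.3567, i.e. -log(0.7)

p, m, v = adam_step(np.array([1.0]), np.array([0.5]), 0.0, 0.0, t=1)
print(p)  # about [0.999]: the first step moves by roughly lr in the direction of -grad
```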



### Training the model

- The last step is to fit the model on the training and validation data. Since we use the _ImageDataGenerator_ class for data augmentation, we pass the batches produced by the generators' _flow_ method to __model.fit__, together with the number of epochs, the steps per epoch (the training set size divided by the batch size, rounded down), the validation batches from _val_generator_, and the number of validation steps. In current versions of Keras, _.fit()_ accepts such generators directly (the older _.fit_generator()_ method is deprecated), which is useful when the dataset is too large to fit into memory or when data augmentation must be applied on the fly.
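To see what the `ReduceLROnPlateau` callback passed to `fit` does between epochs, here is a simplified re-implementation of its logic for a loss-like metric. It is a sketch, not Keras's code: it assumes mode 'min', the default `min_delta=1e-4`, and no cooldown, and the function name and the validation-loss sequence are made up for illustration.

```python
def simulate_reduce_lr(val_losses, lr=0.001, factor=0.01, patience=3, min_lr=1e-5, min_delta=1e-4):
    """Return the learning rate in effect after each epoch for a sequence of validation losses."""
    best = float('inf')
    wait = 0
    lrs = []
    for loss in val_losses:
        if loss < best - min_delta:       # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:                             # no improvement this epoch
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                wait = 0
        lrs.append(lr)
    return lrs

# Made-up validation losses: three epochs without improvement after epoch 2
lrs = simulate_reduce_lr([1.0, 0.8, 0.81, 0.80, 0.82, 0.7])
print(lrs)  # 0.001 for four epochs, then about 1e-05
```

With the values used in this example (`factor=.01`, `min_lr=1e-5`, initial learning rate 0.001), a single reduction already takes the learning rate to its lower bound, so later plateaus cannot reduce it further.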

In [None]:
#Defining the parameters
batch_size= 32
epochs=100

#Reduce the learning rate when a metric (here, the validation loss) has stopped improving. It has the following parameters:
#monitor: quantity to be monitored.
#factor: factor by which the learning rate will be reduced. new_lr = lr * factor.
#patience: number of epochs with no improvement after which learning rate will be reduced.
#min_lr: lower bound on the learning rate.
lrr= ReduceLROnPlateau(monitor='val_loss',factor=.01,patience=3,min_lr=1e-5)

#Alternative optimizer: SGD is gradient descent (with momentum). It has the following parameters:
#learning_rate: a Tensor or floating point value that corresponds to the learning rate. Default value is 0.01.
#momentum: float hyperparameter >= 0 that accelerates gradient descent in the relevant direction and dampens oscillations. Default value is 0.
#sgd=SGD(lr=learn_rate,momentum=.9)

#Adam is the optimizer that implements the Adam algorithm. Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments.
adam = Adam(learning_rate=0.001)
#Configures the model for training. It uses the following arguments:
#optimizer: String (name of optimizer) or optimizer instance.
#loss: Loss function. For multi class classification we use categorical cross entropy. But when you have only two classes we use binary cross entropy that is a special case of the categorical cross entropy.
#metrics: list of metrics reported during training and validation.
model.compile(optimizer=adam,loss='categorical_crossentropy',metrics=['accuracy'])
#Training the model
#.fit() accepts the generators directly, which is useful when the dataset is too large to fit into memory or when data augmentation needs to be applied.
history=model.fit(train_generator.flow(x_train, y_train, batch_size = batch_size), epochs = epochs, steps_per_epoch = x_train.shape[0]//batch_size, validation_data = val_generator.flow(x_val, y_val, batch_size = batch_size), validation_steps = x_val.shape[0]//batch_size,  callbacks = [lrr], verbose = 1)


Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

## Visualizing the loss and accuracy curves

We can use the matplotlib.pyplot library to visualize how the loss and the accuracy evolve over the epochs, for both training and validation.
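`history.history` is a dictionary that maps each logged quantity ('loss', 'accuracy', 'val_loss', 'val_accuracy') to a list with one value per epoch, which is what the plotting cell reads. A minimal sketch for locating the best epoch in such a dictionary; `best_epoch` is a hypothetical helper and the numbers in `demo` are made up, not results from this notebook.

```python
import numpy as np

def best_epoch(history_dict, monitor='val_loss', mode='min'):
    """Return the 1-based epoch with the best monitored value, and that value."""
    values = np.asarray(history_dict[monitor])
    idx = int(np.argmin(values)) if mode == 'min' else int(np.argmax(values))
    return idx + 1, float(values[idx])

# Made-up history.history-style dictionary
demo = {'val_loss': [1.9, 1.4, 1.1, 1.2, 1.3],
        'val_accuracy': [0.30, 0.50, 0.60, 0.62, 0.61]}
print(best_epoch(demo))                              # (3, 1.1)
print(best_epoch(demo, 'val_accuracy', mode='max'))  # (4, 0.62)
```

On the real run, `best_epoch(history.history)` gives the epoch whose validation loss was lowest, which is a useful reference point when reading the curves.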

In [None]:
#plot to visualize the loss and accuracy against number of epochs
plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.plot(history.history['loss'], label='Training Loss',color='black', linestyle='dashed')
plt.plot(history.history['val_loss'], label='Validation Loss',color='black')
plt.legend()
plt.xlabel('Number of epochs', fontsize=15)
plt.ylabel('Loss', fontsize=15)

plt.subplot(1,2,2)
plt.plot(history.history['accuracy'], label='Train Accuracy',color='black', linestyle='dashed')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy',color='black')
plt.legend()
plt.xlabel('Number of epochs', fontsize=14)
plt.ylabel('Accuracy', fontsize=14)
plt.savefig('/content/drive/MyDrive/DL_Book_Notebooks/Ch3/loss_acc_notf_effb0.pdf')
plt.show()


## Saving the model

The model can be saved in H5 format using the __model.save__ command. This is useful for testing the model later on new data without retraining it.

In [None]:
#Save the model in an H5 file
model.save('/content/drive/MyDrive/DL_Book_Notebooks/Ch3/model_notf_effb0.h5')
del model

## Loading a saved model

The function __load_model__ loads a saved model with the same architecture and weights. The _load_model_ and _save_model_ functions are imported from Keras with:

       from keras.models import save_model,load_model

In [None]:
#Load the saved model with the same architecture and weights.
model = load_model("/content/drive/MyDrive/DL_Book_Notebooks/Ch3/model_notf_effb0.h5")

## Visualizing the performance using a confusion matrix


The Seaborn library is used for visualizing the confusion matrix. The following imports are required for plotting the confusion matrix.

    import seaborn as sns
    from sklearn.metrics import confusion_matrix


In [None]:
#Plotting the confusion matrix
y_test=np.argmax(y_test,axis=1)
y_pred = np.argmax(model.predict(x_test),axis=1)
cm=confusion_matrix(y_test,y_pred)
classes=['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
sns.heatmap(cm.astype('float') / cm.sum(axis=1)[:, np.newaxis], annot=True,
            fmt='.2f', xticklabels=classes, cbar=True, yticklabels=classes,cmap='Greys')
plt.title('Confusion matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.savefig('/content/drive/MyDrive/DL_Book_Notebooks/Ch3/confusion_notf_effb0.pdf')
plt.show()
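
The heatmap above shows a row-normalized confusion matrix: entry (i, j) counts test images of true class i predicted as class j, and each row is divided by its total, so the diagonal holds the per-class recall. A minimal NumPy sketch of both steps on a made-up toy example (the helper names are hypothetical; `sklearn.metrics.confusion_matrix` uses the same rows-true, columns-predicted convention):

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # unbuffered add handles repeated (i, j) pairs
    return cm

def row_normalize(cm):
    """Divide each row by its total, as done for the heatmap above."""
    return cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

y_true_demo = np.array([0, 0, 1, 1, 1, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 2])
cm = confusion_counts(y_true_demo, y_pred_demo, 3)
print(cm)                            # rows: [1 1 0], [0 2 1], [0 0 1]
print(row_normalize(cm).diagonal())  # per-class recall: 0.5, about 0.667, 1.0
```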


## Overall accuracy

The overall accuracy can be obtained using __accuracy_score__ which can be imported as follows:

    from sklearn.metrics import accuracy_score

In [None]:
#Classification accuracy
from sklearn.metrics import accuracy_score
acc_score = accuracy_score(y_test, y_pred)
print('Accuracy Score = ', acc_score)
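
The overall accuracy is simply the fraction of test images whose predicted label equals the true label; equivalently, it is the sum of the diagonal of the (unnormalized) confusion matrix divided by the total number of samples. A minimal NumPy sketch of both views on a made-up toy example (the helper names are hypothetical):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def accuracy_from_confusion(cm):
    """Correct predictions lie on the diagonal of the confusion matrix."""
    return float(np.trace(cm) / cm.sum())

y_true_demo = np.array([0, 0, 1, 1, 1, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 2])
cm_demo = np.array([[1, 1, 0], [0, 2, 1], [0, 0, 1]])  # confusion matrix of the labels above
print(accuracy(y_true_demo, y_pred_demo))  # 4 of 6 correct, about 0.667
print(accuracy_from_confusion(cm_demo))    # same value
```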