### Fit a CNN model on the dataset which has been assigned to you. Print a classification report to see the model metrics on train and test datasets.

In [28]:
import pandas as pd
from tensorflow import keras
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (InputLayer, BatchNormalization, Dropout, Flatten, Dense,
                                     Activation, MaxPool2D, MaxPooling2D, Conv2D, Reshape,
                                     GlobalAveragePooling2D)
from tensorflow.keras.callbacks import ModelCheckpoint
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [10]:
train = "dataset/train"
test = "dataset/test"
labels = pd.read_csv('dataset/train.csv')       # used below via x_col='image_ID', y_col='label'
labels_test = pd.read_csv('dataset/test.csv')   # used below via x_col='image_ID'

In [12]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMAGE_SHAPE = (224, 224)
BATCH_SIZE = 32

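# ResNet (TF Hub feature vector): rescale pixel values from [0, 255] to [0, 1]
train_datagen_res = ImageDataGenerator(rescale=1/255.)
test_datagen_res = ImageDataGenerator(rescale=1/255.)

# EfficientNet: no rescaling. The tf.keras.applications EfficientNet models include their own
# Rescaling layer and expect inputs in [0, 255]. Note that TF Hub image feature-vector models
# (such as efficientnet_url below) generally expect inputs in [0, 1] instead.
train_datagen_eff = ImageDataGenerator(rescale=None)
test_datagen_eff = ImageDataGenerator(rescale=None)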


print("Training images for resnet:")
train_data_res = train_datagen_res.flow_from_dataframe(dataframe=labels, 
                                                       directory=train,
                                                       x_col='image_ID',
                                                       y_col='label',
                                                       target_size=IMAGE_SHAPE,
                                                       batch_size=BATCH_SIZE,
                                                       class_mode="categorical")

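# class_mode=None (here and for test_data_eff below) yields images only, without labels:
# such a generator works with predict(), but cannot produce meaningful validation loss/accuracy.
print("Testing images for resnet:")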
test_data_res = test_datagen_res.flow_from_dataframe(dataframe=labels_test, 
                                                     directory=test,
                                                     x_col='image_ID',
                                                     target_size=IMAGE_SHAPE,
                                                     batch_size=BATCH_SIZE,
                                                     class_mode=None)

print("Training images for efficientnet:")
train_data_eff = train_datagen_eff.flow_from_dataframe(dataframe=labels, 
                                                       directory=train,
                                                       x_col='image_ID',
                                                       y_col='label',
                                                       target_size=IMAGE_SHAPE,
                                                       batch_size=BATCH_SIZE,
                                                       class_mode="categorical")

print("Testing images for efficientnet:")
test_data_eff = test_datagen_eff.flow_from_dataframe(dataframe=labels_test, 
                                                     directory=test,
                                                     x_col='image_ID',
                                                     target_size=IMAGE_SHAPE,
                                                     batch_size=BATCH_SIZE,
                                                     class_mode=None)


Training images for resnet:
Found 8227 validated image filenames belonging to 7 classes.
Testing images for resnet:
Found 2056 validated image filenames.
Training images for efficientnet:
Found 8227 validated image filenames belonging to 7 classes.
Testing images for efficientnet:
Found 2056 validated image filenames.
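Because the test generators use `class_mode=None`, they carry no labels, so they cannot give real validation metrics or a test classification report. A minimal sketch, assuming `labels` has the `image_ID` and `label` columns used above, that holds out a stratified 20% of the labelled training data with pandas; the helper name `stratified_holdout` and the 20% fraction are hypothetical choices, not part of the original notebook.

```python
import pandas as pd

def stratified_holdout(df, label_col="label", val_frac=0.2, seed=42):
    """Split df into (train_df, val_df), keeping each class's proportion in both parts."""
    # Sample val_frac of the rows *within each class* (requires pandas >= 1.1)
    val_df = df.groupby(label_col, group_keys=False).sample(frac=val_frac, random_state=seed)
    # Every row not sampled for validation stays in the training split
    train_df = df.drop(index=val_df.index)
    return train_df.reset_index(drop=True), val_df.reset_index(drop=True)

# Usage in the notebook (assumes the `labels` DataFrame from In [10]):
# train_df, val_df = stratified_holdout(labels)
```

The two frames can then replace `labels` in `flow_from_dataframe`; giving the validation generator `y_col='label'`, `class_mode="categorical"` and `shuffle=False` keeps its `classes` aligned with `predict()` output. The built-in alternative, `ImageDataGenerator(validation_split=0.2)` with `subset="training"` / `subset="validation"`, does not stratify by class.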


In [14]:
# Resnet 50 V2 feature vector
resnet_url = "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4"

# Original: EfficientNetB0 feature vector (version 1)
efficientnet_url = "https://tfhub.dev/tensorflow/efficientnet/b0/feature-vector/1"

In [16]:
pip install tensorflow_hub

Collecting tensorflow_hub
  Downloading tensorflow_hub-0.12.0-py2.py3-none-any.whl (108 kB)
Installing collected packages: tensorflow-hub
Successfully installed tensorflow-hub-0.12.0
Note: you may need to restart the kernel to use updated packages.


In [17]:
import tensorflow_hub as hub
from tensorflow.keras import layers

def create_model(model_url, num_classes=7): #Since there are 7 output classes in this dataset

  # Download the pretrained model and save it as a Keras layer
  feature_extractor_layer = hub.KerasLayer(model_url,
                                           trainable=False, # freeze the underlying patterns
                                           name='feature_extraction_layer',
                                           input_shape=IMAGE_SHAPE+(3,)) # define the input image shape
  
  # Create our own model
  model = tf.keras.Sequential([
    feature_extractor_layer, # use the feature extraction layer as the base
    layers.Dense(num_classes, activation='softmax', name='output_layer') # create our own output layer      
  ])

  return model

In [18]:
# Create model
resnet_model = create_model(resnet_url, num_classes=7)

# Compile
resnet_model.compile(loss='categorical_crossentropy',
                     optimizer=tf.keras.optimizers.Adam(),
                     metrics=['accuracy'])

# Fit the model
resnet_history = resnet_model.fit(train_data_res,
                                  epochs=2,
                                  steps_per_epoch=len(train_data_res),
                                  validation_data=test_data_res,
                                  validation_steps=len(test_data_res))

Epoch 1/2
Epoch 2/2


In [20]:
resnet_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 feature_extraction_layer (K  (None, 2048)             23564800  
 erasLayer)                                                      
                                                                 
 output_layer (Dense)        (None, 7)                 14343     
                                                                 
Total params: 23,579,143
Trainable params: 14,343
Non-trainable params: 23,564,800
_________________________________________________________________


In [21]:
# Create model
efficientnet_model = create_model(model_url=efficientnet_url,
                                  num_classes=7)

# Compile EfficientNet model
efficientnet_model.compile(loss='categorical_crossentropy',
                           optimizer=tf.keras.optimizers.Adam(),
                           metrics=['accuracy'])

# Fit EfficientNet model 
efficientnet_history = efficientnet_model.fit(train_data_eff,
                                              epochs=2,
                                              steps_per_epoch=len(train_data_eff),
                                              validation_data=test_data_eff,
                                              validation_steps=len(test_data_eff))  

Epoch 1/2
Epoch 2/2


In [22]:
efficientnet_model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 feature_extraction_layer (K  (None, 1280)             4049564   
 erasLayer)                                                      
                                                                 
 output_layer (Dense)        (None, 7)                 8967      
                                                                 
Total params: 4,058,531
Trainable params: 8,967
Non-trainable params: 4,049,564
_________________________________________________________________


In [24]:
#Functional API
# Create a functional model with data augmentation
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras.models import Sequential

# Build data augmentation layer
data_augmentation = Sequential([
  preprocessing.RandomFlip('horizontal'),
  preprocessing.RandomHeight(0.2),
  preprocessing.RandomWidth(0.2),
  preprocessing.RandomZoom(0.2),
  preprocessing.RandomRotation(0.2),                 
], name="data_augmentation")

# Setup input shape and base model, unfreezing the base model layers
input_shape = (224, 224, 3)
base_model = tf.keras.applications.EfficientNetB7(include_top=False)
base_model.trainable = True #Should be set to True for fine tuning

# Freeze all layers except the last 10
for layer in base_model.layers[:-10]:
  layer.trainable = False

# Create input layer
inputs = layers.Input(shape=input_shape, name="input_layer")

# Add in data augmentation Sequential model as a layer
x = data_augmentation(inputs)

# training=False keeps the base model's BatchNormalization layers in inference mode;
# the 10 unfrozen top layers are still updated by fit()
x = base_model(x, training=False)

# Pool output features of base model
x = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)

# Put a dense layer on as the output
outputs = layers.Dense(7, activation="softmax", name="output_layer")(x)

# Make a model with inputs and outputs
model_tuned = keras.Model(inputs, outputs)

# Compile the model
model_tuned.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

# Fit the model
model_final_tuned = model_tuned.fit(train_data_eff,
                    epochs=3,
                    steps_per_epoch=len(train_data_eff),
                    validation_data=test_data_eff,
                    validation_steps=int(0.25 * len(test_data_eff)))  # validate on fewer steps (25% of the batches)

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [25]:
# Check which layers of the EfficientNetB7 base model are trainable
for layer_number, layer in enumerate(base_model.layers):
  print(layer_number, layer.name, layer.trainable)

0 input_2 False
1 rescaling_1 False
2 normalization_1 False
3 tf.math.truediv_1 False
4 stem_conv_pad False
5 stem_conv False
6 stem_bn False
7 stem_activation False
8 block1a_dwconv False
9 block1a_bn False
10 block1a_activation False
11 block1a_se_squeeze False
12 block1a_se_reshape False
13 block1a_se_reduce False
14 block1a_se_expand False
15 block1a_se_excite False
16 block1a_project_conv False
17 block1a_project_bn False
18 block1b_dwconv False
19 block1b_bn False
20 block1b_activation False
21 block1b_se_squeeze False
22 block1b_se_reshape False
23 block1b_se_reduce False
24 block1b_se_expand False
25 block1b_se_excite False
26 block1b_project_conv False
27 block1b_project_bn False
28 block1b_drop False
29 block1b_add False
30 block1c_dwconv False
31 block1c_bn False
32 block1c_activation False
33 block1c_se_squeeze False
34 block1c_se_reshape False
35 block1c_se_reduce False
36 block1c_se_expand False
37 block1c_se_excite False
38 block1c_project_conv False
39 block1c_project_b

328 block4e_se_reshape False
329 block4e_se_reduce False
330 block4e_se_expand False
331 block4e_se_excite False
332 block4e_project_conv False
333 block4e_project_bn False
334 block4e_drop False
335 block4e_add False
336 block4f_expand_conv False
337 block4f_expand_bn False
338 block4f_expand_activation False
339 block4f_dwconv False
340 block4f_bn False
341 block4f_activation False
342 block4f_se_squeeze False
343 block4f_se_reshape False
344 block4f_se_reduce False
345 block4f_se_expand False
346 block4f_se_excite False
347 block4f_project_conv False
348 block4f_project_bn False
349 block4f_drop False
350 block4f_add False
351 block4g_expand_conv False
352 block4g_expand_bn False
353 block4g_expand_activation False
354 block4g_dwconv False
355 block4g_bn False
356 block4g_activation False
357 block4g_se_squeeze False
358 block4g_se_reshape False
359 block4g_se_reduce False
360 block4g_se_expand False
361 block4g_se_excite False
362 block4g_project_conv False
363 block4g_project_bn F

640 block6f_se_reshape False
641 block6f_se_reduce False
642 block6f_se_expand False
643 block6f_se_excite False
644 block6f_project_conv False
645 block6f_project_bn False
646 block6f_drop False
647 block6f_add False
648 block6g_expand_conv False
649 block6g_expand_bn False
650 block6g_expand_activation False
651 block6g_dwconv False
652 block6g_bn False
653 block6g_activation False
654 block6g_se_squeeze False
655 block6g_se_reshape False
656 block6g_se_reduce False
657 block6g_se_expand False
658 block6g_se_excite False
659 block6g_project_conv False
660 block6g_project_bn False
661 block6g_drop False
662 block6g_add False
663 block6h_expand_conv False
664 block6h_expand_bn False
665 block6h_expand_activation False
666 block6h_dwconv False
667 block6h_bn False
668 block6h_activation False
669 block6h_se_squeeze False
670 block6h_se_reshape False
671 block6h_se_reduce False
672 block6h_se_expand False
673 block6h_se_excite False
674 block6h_project_conv False
675 block6h_project_bn F

In [26]:
model_tuned.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_layer (InputLayer)    [(None, 224, 224, 3)]     0         
                                                                 
 data_augmentation (Sequenti  (None, 224, 224, 3)      0         
 al)                                                             
                                                                 
 efficientnetb7 (Functional)  (None, None, None, 2560)  64097687 
                                                                 
 global_average_pooling_laye  (None, 2560)             0         
 r (GlobalAveragePooling2D)                                      
                                                                 
 output_layer (Dense)        (None, 7)                 17927     
                                                                 
Total params: 64,115,614
Trainable params: 5,353,127
Non-tr
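The task asks for a classification report on the train and test data, which the notebook does not yet print. A minimal NumPy sketch of the per-class precision, recall, F1 and support that such a report contains; `sklearn.metrics.classification_report(y_true, y_pred, target_names=...)` prints the same table if scikit-learn is available. The helper names and the generator `gen` in the comments are hypothetical.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, class_names):
    """Precision, recall, F1 and support for each class (integer-encoded labels)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    report = {}
    for idx, name in enumerate(class_names):
        tp = int(np.sum((y_pred == idx) & (y_true == idx)))  # correctly predicted as idx
        fp = int(np.sum((y_pred == idx) & (y_true != idx)))  # wrongly predicted as idx
        fn = int(np.sum((y_pred != idx) & (y_true == idx)))  # idx samples that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[name] = {"precision": precision, "recall": recall,
                        "f1": f1, "support": tp + fn}
    return report

def print_report(report):
    """Print the metrics as a table similar to sklearn's classification_report."""
    print(f"{'class':>12} {'precision':>9} {'recall':>9} {'f1':>9} {'support':>9}")
    for name, m in report.items():
        print(f"{name:>12} {m['precision']:9.2f} {m['recall']:9.2f} "
              f"{m['f1']:9.2f} {m['support']:9d}")

# Usage with a labelled Keras generator created with shuffle=False (hypothetical `gen`):
#   probs = model.predict(gen)
#   y_pred = probs.argmax(axis=1)
#   y_true = gen.classes
#   class_names = sorted(gen.class_indices, key=gen.class_indices.get)
#   print_report(per_class_metrics(y_true, y_pred, class_names))
```

For the train-set report, a second training generator with `shuffle=False` is needed: `train_data_res` and `train_data_eff` shuffle by default, so their `classes` would not line up with the order of the predictions.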

### What is Stride, Padding & Pooling? Explain with an example.
Strides
Stride is the number of pixels by which the filter moves across the input matrix at each step. With a stride of 1, the filter moves one pixel at a time; with a stride of 2, it jumps two pixels at a time, and so on. Stride therefore controls how the filter convolves over the input: a larger stride visits fewer positions, which produces a smaller output feature map and less computation, but can skip over fine details. For an n x n input and an f x f filter with no padding, the output is floor((n - f) / s) + 1 pixels wide for stride s; for example, a 7x7 input with a 3x3 filter gives a 5x5 output with stride 1 and a 3x3 output with stride 2.
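A minimal NumPy sketch of how stride changes the output size, using a toy 7x7 "image" and a 3x3 filter of ones (both made up for illustration). Like CNN convolution layers, it computes a cross-correlation, i.e. the kernel is not flipped.

```python
import numpy as np

def conv2d_strided(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding), moving `stride` pixels per step."""
    n_h, n_w = image.shape
    k_h, k_w = kernel.shape
    out_h = (n_h - k_h) // stride + 1
    out_w = (n_w - k_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the current window
            out[i, j] = np.sum(image[r:r + k_h, c:c + k_w] * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)  # toy 7x7 input
kernel = np.ones((3, 3))                          # toy 3x3 filter that sums each window

print(conv2d_strided(image, kernel, stride=1).shape)  # (5, 5)
print(conv2d_strided(image, kernel, stride=2).shape)  # (3, 3)
```

In Keras the same setting is the `strides` argument of `Conv2D`, e.g. `Conv2D(32, (3, 3), strides=2)`.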

Padding
Padding plays an important role in building CNNs. Without padding, each convolution shrinks the image: an n x n input convolved with an f x f filter at stride 1 gives an (n - f + 1) x (n - f + 1) output. In an image classification network with many convolution layers, the image would shrink after every layer, which we usually don't want.

Second, as the kernel slides over the image, it passes over pixels in the middle many more times than pixels at the edges and corners, so information near the borders contributes less to the output.

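Padding overcomes both problems: it adds an extra border of pixels, usually zeros, around the image. With the right amount of padding, the output keeps the same spatial size as the input (for a 3x3 filter at stride 1, one pixel of padding on each side), and pixels at the edges are covered by more filter positions.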
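A small NumPy sketch of both effects of padding, using a toy 4x4 image and a 3x3 filter (made up for illustration). `output_size` applies the general formula floor((n + 2p - f) / s) + 1, and the hypothetical helper `coverage` counts how many filter positions touch each original pixel.

```python
import numpy as np

def output_size(n, f, p=0, s=1):
    """Spatial output size for an n x n input, f x f filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

def coverage(n, f, p=0, s=1):
    """Count how many filter positions touch each pixel of an n x n input."""
    size = n + 2 * p
    counts = np.zeros((size, size), dtype=int)
    out = output_size(n, f, p, s)
    for i in range(out):
        for j in range(out):
            counts[i * s:i * s + f, j * s:j * s + f] += 1
    return counts[p:p + n, p:p + n]  # drop the padding border, keep original pixels

image = np.arange(1, 17).reshape(4, 4)
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)            # (6, 6): one ring of zeros on every side
print(output_size(4, 3, p=0))  # 2: without padding the 4x4 image shrinks to 2x2
print(output_size(4, 3, p=1))  # 4: one pixel of padding keeps it 4x4
print(coverage(4, 3, p=0)[0, 0], coverage(4, 3, p=0)[1, 1])  # 1 4
print(coverage(4, 3, p=1)[0, 0], coverage(4, 3, p=1)[1, 1])  # 4 9
```

Without padding the corner pixel is seen by 1 filter position versus 4 for an inner pixel; with padding it is seen by 4 versus 9. In Keras, `Conv2D(..., padding="valid")` applies no padding and `padding="same"` pads so that, at stride 1, the output size equals the input size.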

Pooling
The pooling layer is another building block of a CNN. Its function is to progressively reduce the spatial size of the feature maps produced by the previous layers, which lowers the computational cost and the number of parameters in later layers (for example, a Dense layer after Flatten), and so reduces network complexity. Pooling, also known as downsampling or subsampling, shrinks each feature map while retaining its essential features: max pooling keeps the largest value in each window, and average pooling keeps the mean. For example, 2x2 max pooling with stride 2 turns a 4x4 feature map into a 2x2 map. In a typical CNN, a rectified linear activation function (ReLU) is applied to each value in the feature map before pooling. ReLU is a simple and effective nonlinearity: it keeps positive values unchanged and sets negative values to zero.
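A minimal NumPy sketch of ReLU followed by 2x2 pooling with stride 2 on a toy 4x4 feature map (values made up for illustration). In Keras the equivalent layers are `MaxPooling2D(pool_size=(2, 2), strides=2)` and `AveragePooling2D`.

```python
import numpy as np

def relu(x):
    """Keep positive values, set negative values to zero."""
    return np.maximum(x, 0)

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Max or average pooling over size x size windows moved `stride` pixels at a time."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    reduce = np.max if mode == "max" else np.mean
    h, w = fmap.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce(window)
    return out

fmap = np.array([[ 1, -3,  2,  1],
                 [ 4,  6, -5,  0],
                 [-3,  1,  1,  2],
                 [ 0,  2,  4, -3]])

activated = relu(fmap)                   # negatives become 0
print(pool2d(activated))                 # 2x2 map of window maxima
print(pool2d(activated, mode="avg"))     # 2x2 map of window means
```

Pooling has no trainable weights; it only reduces the height and width of each feature map, here from 4x4 to 2x2.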
Reference: https://www.codingninjas.com/codestudio/library/convolution-layer-padding-stride-and-pooling-in-cnn

### What is overfitting? How to overcome overfitting in an ML model?
Overfitting occurs when a model fits the training data well but does not generalize to new, unseen data. In other words, the model has learned patterns specific to the training data that do not hold in other data.

We can identify overfitting by monitoring validation metrics such as loss or accuracy. Typically, after a certain number of epochs the validation metric stops improving and then starts getting worse (validation loss rises, validation accuracy falls), while the training metric keeps improving because the model continues to fit the training data ever more closely.

There are several ways to reduce overfitting in deep learning models. The best option is to get more training data. Unfortunately, in real-world situations this is often not possible due to time, budget, or technical constraints.

Another way to reduce overfitting is to lower the model's capacity to memorize the training data, for example by using fewer layers or units, or by adding dropout. The model then has to focus on the relevant patterns in the training data, which results in better generalization.
Early stopping, which controls the number of training iterations, is another technique for avoiding overfitting. It works only when the model learns iteratively, for example epoch by epoch with gradient descent.
In an iterative learning process, there is a point up to which the model keeps learning the features we want it to learn. After that point, it starts learning noise in the training data, which leads to overfitting. Early stopping monitors a validation metric and stops training once that metric stops improving.
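Keras provides this as a callback, e.g. `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)` passed to `fit(..., callbacks=[...])`. The plain-Python sketch below reproduces the patience logic on a hypothetical list of per-epoch validation losses, so the stopping rule can be inspected without training a model; the helper name is illustrative.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for patience-based early stopping on a loss curve."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:  # improved by more than min_delta
            best_loss = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:          # no improvement for `patience` epochs: stop
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # ran out of epochs without stopping

# Hypothetical validation losses: improving until epoch 2, then overfitting
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
print(early_stopping_epoch(val_losses, patience=2))  # (4, 2)
```

Training stops at epoch 4, and the weights from epoch 2 (the lowest validation loss) are the ones worth keeping, which is what `restore_best_weights=True` does in Keras.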
Cross-validation is another machine learning technique that helps address overfitting. Like ensemble learning, cross-validation divides the dataset, but it uses the parts differently.

In cross-validation, the training data is split into several smaller train/test splits. Evaluating the model on each split gives a more reliable estimate of how it will perform on unseen data than a single split does. For example, to predict the likelihood of an event, we could use algorithms such as k-nearest neighbors, support vector machines, or logistic regression. Cross-validation lets us compare them and choose the one that generalizes best rather than the one that merely fits the training data best, which is how it helps prevent overfitting.
Usually we split the data 75/25 or 80/20 into training and test sets. The question is which part of the dataset to keep for training and which for testing. Cross-validation avoids this choice by using 100% of the data for testing, each sample exactly once.
For example, with four blocks of 25%, cross-validation might first train on the first 75% of the data, test on the last 25%, and store the result; then test on the first 25% and train on the remaining 75%, and so on, until every 25% block has been tested once. The results from all rounds are then compared or averaged.
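A plain-Python sketch of k-fold cross-validation with k=4, matching the 25% blocks described above; the order in which blocks are held out does not affect the result. The "model" here is a deliberately trivial stand-in (predict the mean of the training fold) chosen only to show the train/test loop; scikit-learn's `KFold` and `cross_val_score` are the library equivalents.

```python
def k_fold_indices(n_samples, k=4):
    """Yield (train_idx, test_idx) for k contiguous, non-shuffled folds."""
    # The first n_samples % k folds get one extra sample, so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_val_mse(y, k=4):
    """Cross-validate a trivial model that predicts the training-fold mean."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train_idx) / len(train_idx)  # "fit" on train folds
        mse = sum((y[i] - prediction) ** 2 for i in test_idx) / len(test_idx)
        scores.append(mse)  # score on the held-out fold
    return scores, sum(scores) / len(scores)

for train_idx, test_idx in k_fold_indices(8, k=4):
    print("train:", train_idx, "test:", test_idx)
```

Each index appears in exactly one test fold, so every sample is used for testing once and for training in the other k - 1 rounds.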
Regularization is another powerful, and arguably the most widely used, technique to avoid overfitting. It adds a penalty on the size of the model's coefficients (weights) to the training loss. This pushes the coefficients towards zero, giving a simpler model that fits less of the noise in the training data and therefore makes smaller errors on unseen data.
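As an illustration, here is a NumPy sketch of L2 regularization in its simplest form, ridge regression, which minimizes ||Xw - y||^2 + lam * ||w||^2 and has the closed-form solution w = (X^T X + lam * I)^-1 X^T y. The data is synthetic and the intercept is omitted for brevity. In Keras, the analogous option is a weight penalty such as `Dense(..., kernel_regularizer=tf.keras.regularizers.l2(1e-4))`.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression (no intercept): w = (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data: 20 samples, 5 features, two of which are irrelevant
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=20)

# Larger lam = stronger penalty = coefficients pulled closer to zero
for lam in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_fit(X, y, lam)
    print(f"lam={lam:>5}: ||w|| = {np.linalg.norm(w):.3f}, w = {np.round(w, 2)}")
```

With `lam=0.0` this is ordinary least squares; as `lam` grows, the norm of the coefficient vector shrinks, which is the "shift towards zero" described above.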
