




<h3>Natural Image Classification using Pytorch VGGNet and CNN</h3>

This is an image classification problem. I have solved it using two method VGGNet and CNN. <a href="https://drive.google.com/open?id=19L7j75M9iB4T0YpZ6sboQZ_h-5ryrcVC">Dataset</a>

In [2]:
"""Getting the data from google drive"""

from google.colab import drive
drive.mount('/content/drive')



Enter your authorization code:
··········
Mounted at /content/drive


<h2>Image Classification using CNN</h2>

In [0]:
"""Importing Necessary Libraries"""
import pandas as pd
import numpy as np
import os
from keras.models import Sequential
from keras.layers import Convolution2D,BatchNormalization
from keras.layers import MaxPooling2D,Dropout
from keras.layers import Flatten
from keras.layers import Dense
import cv2
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

from keras.preprocessing.image import img_to_array
import random
from keras.preprocessing.image import ImageDataGenerator

In [0]:
"""Importing the data folder and giving a shuffle"""
dataset=[]
labels=[]
random.seed(42)
imagePaths = sorted(list(os.listdir("/content/drive/My Drive/Colab Notebooks/natural_images")))
random.shuffle(imagePaths)

In [0]:
for images in imagePaths:
    path=sorted(list(os.listdir("/content/drive/My Drive/Colab Notebooks/natural_images/"+images)))
    for i in path:
        image = cv2.imread("/content/drive/My Drive/Colab Notebooks/natural_images/"+images+'/'+i) #using opencv to read image
        image = cv2.resize(image, (128,128)) 
        image = img_to_array(image) #converting image info to array
        dataset.append(image)
 
        l = label = images
        labels.append(l)

In [0]:
"""Converting to numpay array"""
dataset = np.array(dataset, dtype="float") / 255.0
labels = np.array(labels)

"""Here we are using LabelBinarizer to scale data because it does not need data in integer encoded form first to convert into its respective encoding"""
lb = LabelBinarizer()
labels = lb.fit_transform(labels)

In [0]:
"""Splitting dataset into train and test"""

x_train,x_test,y_train,y_test=train_test_split(dataset,labels,test_size=0.2,random_state=42)

<h3>Creating CNN model</h3>

We will build a sequential classifier. Here we will create full connected CNN model.

In first layer as input will come here first, we have patch size 3x3 with input shaper 128 as we have resized image in 128 size. We are using ReLU as an activation function because ReLU is sparsity and reduced likelihood of vanishing gradient. Sigmoid or others might generate some non-zero value resulting in dense representations. In this layer, we are using padding value "Same" because tensorflow then tries to spread evenly on both side. To reduce variance, reduce computation complexity and extract low level features from neighbourhood, we need to perform pooling. For that here we have used MaxPooling2D with pool size (2,2) because it extracts the most important features like edges whereas, average pooling extracts features so smoothly. we are using filter size 32 in layer 1. After that, we have used drop out to reduce overfitting an image.

For second layer, we have increased the filter size by keeping all other attributes are same. We have done same with other layers.

In the last layer we have first flatten the output to get result in array. 

At the end, we have connected all the layer to make it a fully connected CNN and give output dimension 8 as we have 8 classes. Then we have use softmax as activation function because 
<ul>
  <li>Each value ranges between 0 and 1</li>
  <li>The sum of all values is always 1</li>
  </ul>
  In multi class problem for example 5, output can look like [0, 0, 0, 1, 0] which is easy to understand and compute.

In [11]:
modelClassifier = Sequential()

# Step 1 - Convolution

modelClassifier.add(Convolution2D(32, (3, 3), input_shape = (128, 128, 3), activation = 'relu',padding='same'))

modelClassifier.add(MaxPooling2D(pool_size = (2, 2)))
modelClassifier.add(Dropout(0.25))


modelClassifier.add(Convolution2D(64, (3, 3), activation = 'relu',padding='same'))

modelClassifier.add(Convolution2D(64, (3, 3), activation = 'relu',padding='same'))

modelClassifier.add(MaxPooling2D(pool_size = (2, 2)))
modelClassifier.add(Dropout(0.25))


modelClassifier.add(Convolution2D(128, (3, 3), activation = 'relu',padding='same'))

modelClassifier.add(Convolution2D(128, (3, 3), activation = 'relu',padding='same'))


modelClassifier.add(MaxPooling2D(pool_size = (2, 2)))
modelClassifier.add(Dropout(0.25))

# Step 3 - Flattening
modelClassifier.add(Flatten())
modelClassifier.add(Dense(1024,activation='relu'))
modelClassifier.add(BatchNormalization())
modelClassifier.add(Dropout(0.5))

# Step 4 - Full connection
modelClassifier.add(Dense(output_dim = 8, activation = 'softmax'))

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.




In [12]:
"""Now we need compile our model"""

"""
For optimizer, we are using Adam optimizer because Adam combines the best properties of the AdaGrad and RMSProp algorithms to provide an optimization algorithm that can handle sparse gradients on noisy problems.
Besides, it is easy to configure.

We are using categorical crossentropy as loss function because

if we use cross-entropy error, the (output) * (1 – output) term goes away. So, the weight changes don’t get smaller and smaller and so training isn’t s likely to stall out.
"""
modelClassifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

modelClassifier.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 128, 128, 32)      896       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 64, 64, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 64, 64, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 64, 64, 64)        18496     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 64, 64, 64)        36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 32, 32, 64)        0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 32, 32, 64)        0         
__________

In [0]:
train_datagen = ImageDataGenerator(rescale = 1./255, #here we need to rescale data as we have set this in our dataset earlier
                                   zoom_range = 0.2,
                                   horizontal_flip=True,
                                   vertical_flip=True,
                                   shear_range=0.2,
                                  )

In [15]:
modelClassifier.fit_generator(train_datagen.flow(x_train,y_train,batch_size=512),
                         epochs = 20,
                         steps_per_epoch=10,
                         validation_data=(x_test,y_test),
                         )

Instructions for updating:
Use tf.cast instead.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7fd0ce2e8470>

In [18]:
"""Testing whether it can detect image successfully or not"""

list=['airplane','car','cat','dog','flower','fruit','motorbike','person']
image = cv2.imread('/content/drive/My Drive/Colab Notebooks/natural_images/flower/flower_0076.jpg')


# pre-process the image for classification
image = cv2.resize(image, (128,128))
image = image.astype("float") / 255.0
image = img_to_array(image)
image = np.expand_dims(image, axis=0)


pred = modelClassifier.predict(image)[0]

for i in range(8):
    if pred[i]>0.5:
        print(list[i])

flower


We can see from above result that, we have sucessfully detected the class of a given image.

<h2>Image Classification using VGGNet</h2>


For VGGNet, I have used a pre train model to get higher accuracy.  We train a model on the natural image dataset available on <a href="https://drive.google.com/open?id=19L7j75M9iB4T0YpZ6sboQZ_h-5ryrcVC">this drive link</a> using transfer learning techniques to extract features from a pre-trained model to achieve high accuracy classification of this dataset.

In [0]:
"""Importing necessary libraries"""

from torchvision import transforms, datasets, models
import torch
from torch import optim, cuda
from torch.utils.data import DataLoader, sampler, random_split
import torch.nn as nn

from PIL import Image
import numpy as np
import pandas as pd
import os
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import random

In [2]:
classes = []
image_classes = []
n_image = []
height = []
width = []
dimension = []


# identifing classes using folder names
for folder in os.listdir('/content/drive/My Drive/Colab Notebooks/natural_images'):
    classes.append(folder)
    
    # Getting the number of image
    images = os.listdir('/content/drive/My Drive/Colab Notebooks/natural_images/'+folder)
    n_image.append(len(images))
    for i in images:
        image_classes.append(folder)
        img = np.array(Image.open('/content/drive/My Drive/Colab Notebooks/natural_images/'+folder+'/'+i))
        height.append(img.shape[0])
        width.append(img.shape[1])
    dimension.append(img.shape[2])
    
df = pd.DataFrame({
    'classes': classes,
    'number': n_image,
    "dim": dimension
})
print("Random heights:" + str(height[10]), str(height[123]))
print("Random Widths:" + str(width[10]), str(width[123]))

Random heights:110 100
Random Widths:271 100


In [3]:
image_df = pd.DataFrame({
    "classes": image_classes,
    "height": height,
    "width": width
})
image_dataframe = image_df.groupby("classes").describe()
image_dataframe

Unnamed: 0_level_0,height,height,height,height,height,height,height,height,width,width,width,width,width,width,width,width
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
classes,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2
airplane,100.0,95.15,19.978461,56.0,80.0,92.0,110.0,169.0,100.0,296.6,13.862966,256.0,289.0,297.0,304.0,344.0
car,100.0,100.0,0.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,0.0,100.0,100.0,100.0,100.0,100.0
cat,100.0,305.57,92.240053,101.0,233.25,311.0,370.25,497.0,100.0,308.09,102.849905,117.0,223.5,304.5,399.5,496.0
dog,100.0,304.33,106.701893,50.0,230.75,320.5,372.25,493.0,100.0,274.96,112.835262,57.0,187.75,266.0,354.75,493.0
flower,100.0,304.65,142.4269,59.0,187.75,316.0,373.5,714.0,100.0,363.78,188.117231,70.0,198.5,365.0,485.0,996.0
fruit,100.0,100.0,0.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,0.0,100.0,100.0,100.0,100.0,100.0
motorbike,100.0,108.2,11.737878,71.0,100.75,108.0,116.0,142.0,100.0,191.77,10.10336,155.0,187.75,193.0,198.0,213.0
person,100.0,256.0,0.0,256.0,256.0,256.0,256.0,256.0,100.0,256.0,0.0,256.0,256.0,256.0,256.0,256.0


As we can see from the output that the dataset is well balanced.

In [0]:
# Data augmentation and normalization for training and testing
# normalization for validation
# transform.compose clubs all the transforms provided to it.

transform_images = {
    # Train data using data augmentation
    'train':
    transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.95, 1.0)), #This will extract a patch of size (256, 224) from input image and scale down to (.95,1)
        transforms.RandomRotation(degrees=15), #rotating image to an angle to get better view
        transforms.ColorJitter(), #giving default colorjitter so that it does not change  brightness, contrast and saturation of image.
        transforms.RandomHorizontalFlip(), #using default to check what is probability of an image being flipped (default 0.5)
        transforms.CenterCrop(size=224),  # Cropping the center part of image from 256 size. Image net standards
        transforms.ToTensor(), #converting input image to PyTorch tensor
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])  # scaling input data. Imagenet standards
    ]),
    # Validation data using data augmentation
    'val':
    transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    # Test data using data augmentation
    'test':
    transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}


In [5]:
batch_size_list = [128, 256, 512] #setting up many batch size to use them for letter

batch = batch_size_list[0] #selecting a batch size

#getting all the data
full_data = datasets.ImageFolder(root='/content/drive/My Drive/Colab Notebooks/natural_images')

# getting length of train, test and validation dataset length
train_data_len = int(len(full_data)*0.8) #getting 80% of dataset as train
valid_data_len = int((len(full_data) - train_data_len)/2) #getting half of rest data as validation set
test_data_len = int(len(full_data) - train_data_len - valid_data_len) #getting rest of dataset as test data

# splitting dataset into train, test and validation set
train_data, val_data, test_data = random_split(full_data, [train_data_len, valid_data_len, test_data_len])
train_data.dataset.transform = transform_images['train']
val_data.dataset.transform = transform_images['val']
test_data.dataset.transform = transform_images['test']

print("Length of train dataset: {} \nLength of validation dataset: {} \nLength of test dataset: {}".format(len(train_data), len(val_data), len(test_data)))

#loading dataset using dataloader from pytorch
train_loader = DataLoader(train_data, batch_size=batch, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch, shuffle=True)

Length of train dataset: 5519 
Length of validation dataset: 690 
Length of test dataset: 690


In [6]:
#getting training data
train_iterator = iter(train_loader)
features, labels = next(train_iterator)
print(features.shape, labels.shape)

torch.Size([128, 3, 224, 224]) torch.Size([128])


In [7]:
"""Getting pre trained VGGNet Model"""
model = models.vgg16(pretrained=True)
model

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d

As we can see that, this model uses Conv2d with kernel size (3,3), stride (1,1) and padding (1,1). It uses ReLU inplace and MaxPool2d with kernel size 2

In [0]:
# We need to freeze early layers
for param in model.parameters():
    param.requires_grad = False
    

After freezing the pre-trained layers of the network, we need to define classifier layer which we will train to suit our dataset and use case

In [0]:
"""Model classifier with dropout"""

no_classes = 8
no_inputs = model.classifier[6].in_features
# we are going to use 6th classifier
# in which case input_features=4096, output_features=1000

# Add on classifier
model.classifier[6] = nn.Sequential(
    nn.Linear(no_inputs, 256),
    nn.ReLU(),
    nn.Dropout(0.4), #handling dropout
    nn.Linear(256, no_classes),
    nn.LogSoftmax(dim=1))

In [10]:
"""
Here for loss function we are using Negative Logarithmic Likelihood
Because it is used when the model outputs a probability for each class, rather than just the most likely class. It is a “soft” measurement of accuracy that incorporates the idea of probabilistic confidence.
"""
scale = nn.NLLLoss()

"""
For optimizer, we are using Adam optimizer because Adam combines the best properties of the AdaGrad and RMSProp algorithms to provide an optimization algorithm that can handle sparse gradients on noisy problems.
Besides, it is easy to configure
"""
optimizer = optim.Adam(model.parameters(), lr=0.001)

model.cuda()

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d

In [11]:
"""Converting class to integer number so that it is easy to execute"""
model.class_to_idx = full_data.class_to_idx
model.idx_to_class = {
    idx: class_
    for class_, idx in model.class_to_idx.items()
}
print("List of class index: ")
list(model.idx_to_class.items())

List of class index: 


[(0, 'airplane'),
 (1, 'car'),
 (2, 'cat'),
 (3, 'dog'),
 (4, 'flower'),
 (5, 'fruit'),
 (6, 'motorbike'),
 (7, 'person')]

In [0]:
def train(model,criterion,optimizer,train_loader,val_loader,save_location,early_stop=3,n_epochs=20,print_every=2):
   
#Initializing some variables
  valid_loss_min = np.Inf
  stop_count = 0
  valid_max_acc = 0
  history = []
  model.epochs = 0
  
  #Loop starts here
  for epoch in range(n_epochs):
    
    train_loss = 0
    valid_loss = 0
    
    train_acc = 0
    valid_acc = 0
    
    model.train()
    ii = 0
    
    for data, label in train_loader:
      ii += 1
      data, label = data.cuda(), label.cuda()
      optimizer.zero_grad()
      output = model(data)
      
      loss = criterion(output, label)
      loss.backward()
      optimizer.step()
      
      # Track train loss by multiplying average loss by number of examples in batch
      train_loss += loss.item() * data.size(0)
      
      # Calculate accuracy by finding max log probability
      _, pred = torch.max(output, dim=1) # first output gives the max value in the row(not what we want), second output gives index of the highest val
      correct_tensor = pred.eq(label.data.view_as(pred)) # using the index of the predicted outcome above, torch.eq() will check prediction index against label index to see if prediction is correct(returns 1 if correct, 0 if not)
      accuracy = torch.mean(correct_tensor.type(torch.FloatTensor)) #tensor must be float to calc average
      train_acc += accuracy.item() * data.size(0)
      if ii%15 == 0:
        print(f'Epoch: {epoch}\t{100 * (ii + 1) / len(train_loader):.2f}% complete.')
      
    model.epochs += 1
    with torch.no_grad():
      model.eval()
      
      for data, label in val_loader:
        data, label = data.cuda(), label.cuda()
        
        output = model(data)
        loss = criterion(output, label)
        valid_loss += loss.item() * data.size(0)
        
        _, pred = torch.max(output, dim=1)
        correct_tensor = pred.eq(label.data.view_as(pred))
        accuracy = torch.mean(correct_tensor.type(torch.FloatTensor))
        valid_acc += accuracy.item() * data.size(0)
        
      train_loss = train_loss / len(train_loader.dataset)
      valid_loss = valid_loss / len(val_loader.dataset)
      
      train_acc = train_acc / len(train_loader.dataset)
      valid_acc = valid_acc / len(val_loader.dataset)
      
      history.append([train_loss, valid_loss, train_acc, valid_acc])
      
      if (epoch + 1) % print_every == 0:
        print(f'\nEpoch: {epoch} \tTraining Loss: {train_loss:.4f} \tValidation Loss: {valid_loss:.4f}')
        print(f'\t\tTraining Accuracy: {100 * train_acc:.2f}%\t Validation Accuracy: {100 * valid_acc:.2f}%')
        
      if valid_loss < valid_loss_min:
        torch.save(model.state_dict(), save_location)
        stop_count = 0
        valid_loss_min = valid_loss
        valid_best_acc = valid_acc
        best_epoch = epoch
        
      else:
        stop_count += 1
        
        # Below is the case where we handle the early stop case
        if stop_count >= early_stop:
          print(f'\nEarly Stopping Total epochs: {epoch}. Best epoch: {best_epoch} with loss: {valid_loss_min:.2f} and acc: {100 * valid_acc:.2f}%')
          model.load_state_dict(torch.load(save_location))
          model.optimizer = optimizer
          history = pd.DataFrame(history, columns=['train_loss', 'valid_loss', 'train_acc','valid_acc'])
          return model, history
        
  model.optimizer = optimizer
  print(f'\nBest epoch: {best_epoch} with loss: {valid_loss_min:.2f} and acc: {100 * valid_acc:.2f}%')
  
  history = pd.DataFrame(history, columns=['train_loss', 'valid_loss', 'train_acc', 'valid_acc'])
  return model, history

In [13]:
model, history = train(model, scale, optimizer, train_loader, val_loader, save_location='./natural_images_using_vggnet.pt', early_stop=6, n_epochs=30, print_every=2)

Epoch: 0	36.36% complete.
Epoch: 0	70.45% complete.
Epoch: 1	36.36% complete.
Epoch: 1	70.45% complete.

Epoch: 1 	Training Loss: 0.0215 	Validation Loss: 0.0500
		Training Accuracy: 99.53%	 Validation Accuracy: 99.28%

Best epoch: 1 with loss: 0.05 and acc: 99.28%


In [19]:
history

Unnamed: 0,train_loss,valid_loss,train_acc,valid_acc
0,0.145062,0.05383,0.964486,0.991304
1,0.021486,0.04999,0.995289,0.992754


In [0]:
"""Getting accuracy on test data"""
def accuracy(model, test_loader, loss):
  with torch.no_grad():
    model.eval()
    test_acc = 0
    for data, label in test_loader:
      data, label = data.cuda(), label.cuda()
      
      output = model(data)
      
      _, pred = torch.max(output, dim=1)
      correct_tensor = pred.eq(label.data.view_as(pred))
      accuracy = torch.mean(correct_tensor.type(torch.FloatTensor))
      test_acc += accuracy.item() * data.size(0)
      
    test_acc = test_acc / len(test_loader.dataset)
    return test_acc

In [23]:
model.load_state_dict(torch.load('./natural_images.pt'))
test_acc = accuracy(model.cuda(), test_loader, scale)
print(f'The model has achieved an accuracy of {100 * test_acc:.2f}% on the test dataset')

The model has achieved an accuracy of 99.71% on the test dataset


<h2>Discussion</h2>

For this problem, we can that
<table>
  <thead>
    <th>Spec</th>
    <th>CNN with cv2</th>
    <th>VGGNet</th>
  </thead>
  <tbody>
    <tr>
      <td>Validation loss</td>
      <td>0.1891</td>
      <td>0.04999</td>
    </tr>
    <tr>
      <td>Test score</td>
      <td> 0.8034</td>
      <td>0.9971</td>
    </tr>
  </tbody>
  </table>
  
  VGGNet is performing better than CNN with CV2 model. Because VGGNet  is characterized by its simplicity, using only 3×3 convolutional layers stacked on top of each other in increasing depth. Reducing volume size is handled by max pooling.
  Unfortunately, there are two major drawbacks with VGGNet:
  <ul>
  <li>It is slow to train.</li>
  <li>The network architecture weights themselves are quite large</li>
</ul>

There are some difficulties I have faced during this problem. In google colab, to run VGGNet sometimes, runtime memory becomes full and i had to manage session variables.
  

<h2>Reference</h2>

1. https://keras.io/preprocessing/image/
2. https://www.quora.com/What-are-the-advantages-of-the-Adam-and-RMSProp-optimization-algorithms-over-gradient-descent-or-stochastic-gradient-descent
3. https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
4. https://www.quora.com/What-is-the-benefit-of-using-average-pooling-rather-than-max-pooling
5. https://machinelearningmastery.com/how-to-reduce-overfitting-with-dropout-regularization-in-keras/
6. https://keras.io/layers/convolutional/
7. https://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/
8. https://www.quora.com/What-is-the-benefit-of-using-softmax-function-in-the-last-layer-of-DNN-What-is-the-relation-between-cross-entropy-and-loss-functions
9. https://medium.com/themlblog/image-data-augmentation-using-keras-a6a61edbc59f
10. https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/