Implementation of the VGG16 Model Using the CIFAR-10 Dataset
VGG16 is also called the OxfordNet model, after the Visual Geometry Group at Oxford that developed it.
The number 16 refers to its 16 layers that have weights (13 convolutional and 3 fully connected).
Apart from the fully connected classifier at the end, it uses only convolution and max-pooling layers.
Every convolution uses a 3 x 3 kernel.
Every max pool uses a 2 x 2 window.
It has about 138 million parameters in total.
The original model was trained on ImageNet data.
It reaches 92.7% top-5 test accuracy on ImageNet.
A deeper variant, VGG19, has a total of 19 layers with weights.
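The layer and parameter counts above can be checked with a little arithmetic. The sketch below is not part of the original notebook. It assumes the standard VGG16 configuration for 224 x 224 ImageNet inputs (a 7 x 7 x 512 feature map feeding the classifier, and 1000 output classes). The names `VGG16_CONFIG` and `count_vgg16` are made up for illustration.

```python
# Standard VGG16 layout (assumed): numbers are conv output channels, "M" is a 2x2 max pool.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]

def count_vgg16(num_classes=1000, fc_input=512 * 7 * 7):
    """Return (weight_layers, parameters) for the assumed VGG16 layout."""
    layers, params, in_ch = 0, 0, 3            # start from a 3-channel RGB image
    for item in VGG16_CONFIG:
        if item == "M":
            continue                           # pooling layers have no weights
        params += 3 * 3 * in_ch * item + item  # 3x3 kernel weights plus one bias per filter
        layers += 1
        in_ch = item
    for fan_in, fan_out in [(fc_input, 4096), (4096, 4096), (4096, num_classes)]:
        params += fan_in * fan_out + fan_out   # fully connected weights plus biases
        layers += 1
    return layers, params

layers, params = count_vgg16()
print(layers, params)  # 16 138357544
```

For the smaller classifier built later in this notebook (3 x 3 pooled features, 10 classes), `count_vgg16(num_classes=10, fc_input=512 * 3 * 3)` gives about 50.4 million parameters under the same assumptions.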

In [None]:
# Libraries needed to build the VGG16 model

import torch
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import time
import torch.nn.functional as F
import torch.nn as nn
import matplotlib.pyplot as plt
from torchvision import models


In [None]:
# check GPU availability
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


In [None]:
# Convert the images to tensors, the format PyTorch models operate on.
# Compose chains three transforms: Resize scales each 32x32 CIFAR-10 image to 224x224,
# ToTensor converts it to a float tensor with values in [0, 1], and Normalize rescales
# each channel to [-1, 1], which usually helps training.

transform = transforms.Compose(
    [transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# Training dataset: root is the directory where the data is stored, train=True selects the training split,
# download=True fetches the data if it is missing, and transform applies the pipeline defined above.

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
# DataLoader wraps an iterable around the Dataset to enable easy access to the samples.
# num_workers (not set here, so loading runs in the main process) can be raised to load data in parallel worker processes.

# batch_size=32 feeds 32 samples to the model at a time, which keeps memory use and computation per step manageable.
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True)

# Same procedure for the test data, but without shuffling.
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32,
                                         shuffle=False)



Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz




Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
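To make the effect of `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` concrete: it applies `(x - mean) / std` per channel, so values that `ToTensor` scaled into [0, 1] end up in [-1, 1]. Here is a minimal pure-Python sketch of that arithmetic. The helper `normalize_pixel` is hypothetical and not a torchvision function.

```python
def normalize_pixel(value_0_255, mean=0.5, std=0.5):
    """Mimic ToTensor followed by Normalize for one 8-bit channel value."""
    x = value_0_255 / 255.0        # ToTensor: scale 0..255 to 0.0..1.0
    return (x - mean) / std        # Normalize: subtract the mean, divide by the std

for v in (0, 255):
    print(v, normalize_pixel(v))   # darkest value maps to -1.0, brightest to 1.0
```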


**Preparing the VGG16 model with five layer blocks, because VGG16 has five blocks of convolutional layers**

In [None]:


# Define the model class, which extends torch.nn.Module
class VGG16(torch.nn.Module):

    def __init__(self, num_classes):                  # constructor; num_classes sets the size of the output layer

        # Initialize the superclass
        super().__init__()

        # Note: VGG16 has 5 blocks of convolutional layers, so we create 5 blocks here.
        # Each block is a Sequential container, which applies its layers in the order they are listed.

        self.block_1 = torch.nn.Sequential(
                torch.nn.Conv2d(in_channels=3,                # input channels: 3, because the images are RGB
                                out_channels=64,              # output channels: number of filters; this becomes the next layer's in_channels
                                kernel_size=(3, 3),           # height x width of each convolutional filter
                                stride=(1, 1),                # the filter moves 1 pixel at a time across the image
                                padding=1),                   # 1 pixel of zero padding on each side keeps the spatial size unchanged
                torch.nn.ReLU(),                              # non-linear activation: passes positive values and sets negative values to zero
                torch.nn.Conv2d(in_channels=64,               # same settings as above
                                out_channels=64,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),
                torch.nn.MaxPool2d(kernel_size=(2, 2),        # max pooling keeps the largest value in each 2x2 window, halving height and width
                                   stride=(2, 2))
        )
        
        self.block_2 = torch.nn.Sequential(                     # the remaining blocks follow the same pattern with more channels
                torch.nn.Conv2d(in_channels=64,
                                out_channels=128,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),
                torch.nn.Conv2d(in_channels=128,
                                out_channels=128,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),
                torch.nn.MaxPool2d(kernel_size=(2, 2),
                                   stride=(2, 2))
        )
        
        self.block_3 = torch.nn.Sequential(        
                torch.nn.Conv2d(in_channels=128,
                                out_channels=256,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),
                torch.nn.Conv2d(in_channels=256,
                                out_channels=256,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),        
                torch.nn.Conv2d(in_channels=256,
                                out_channels=256,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),
                torch.nn.MaxPool2d(kernel_size=(2, 2),
                                   stride=(2, 2))
        )
        
          
        self.block_4 = torch.nn.Sequential(   
                torch.nn.Conv2d(in_channels=256,
                                out_channels=512,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),        
                torch.nn.Conv2d(in_channels=512,
                                out_channels=512,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),        
                torch.nn.Conv2d(in_channels=512,
                                out_channels=512,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),            
                torch.nn.MaxPool2d(kernel_size=(2, 2),
                                   stride=(2, 2))
        )
        
        self.block_5 = torch.nn.Sequential(
                torch.nn.Conv2d(in_channels=512,
                                out_channels=512,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),            
                torch.nn.Conv2d(in_channels=512,
                                out_channels=512,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),            
                torch.nn.Conv2d(in_channels=512,
                                out_channels=512,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.ReLU(),    
                torch.nn.MaxPool2d(kernel_size=(2, 2),
                                   stride=(2, 2))             
        )
            
        height, width = 3, 3                                      # spatial size of the feature map after adaptive average pooling
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(512*height*width, 4096),              # fully connected layer on the flattened 512 x 3 x 3 features
            torch.nn.ReLU(True),                                  # True sets inplace=True, so ReLU overwrites its input to save memory
            torch.nn.Dropout(p=0.5),                              # randomly zeroes activations with probability 0.5 during training to reduce overfitting
            torch.nn.Linear(4096, 4096),
            torch.nn.ReLU(True),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(4096, num_classes),
        )

        # Initialize conv and linear weights with Kaiming (He) uniform initialization and set biases to zero

        for m in self.modules():
            if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)):
                torch.nn.init.kaiming_uniform_(m.weight, mode='fan_in', nonlinearity='relu')
                if m.bias is not None:
                    m.bias.detach().zero_()

        self.avgpool = torch.nn.AdaptiveAvgPool2d((height, width))                 # adaptive average pooling: averages over regions so the output is always height x width
        
        
    def forward(self, x):                                             # forward() applies the layers defined in the constructor, in order, to the input batch x

        x = self.block_1(x)                                      
        x = self.block_2(x)
        x = self.block_3(x)
        x = self.block_4(x)
        x = self.block_5(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1) # flatten
        
        logits = self.classifier(x)                                     # fully connected layers produce one raw score (logit) per class

        return logits
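To see why the classifier's first linear layer takes `512*3*3` inputs, it helps to follow the feature-map size through the network. The sketch below assumes the standard output-size formula `floor((n + 2*padding - kernel) / stride) + 1` for convolution and pooling. `out_size` is an illustrative helper, not a PyTorch function.

```python
def out_size(n, kernel, stride, padding=0):
    """Output length along one spatial dimension for a conv or pool layer."""
    return (n + 2 * padding - kernel) // stride + 1

size = 224                                   # input height/width after Resize((224, 224))
convs_per_block = [2, 2, 3, 3, 3]            # conv layers in block_1 .. block_5
sizes = []
for n_convs in convs_per_block:
    for _ in range(n_convs):
        size = out_size(size, kernel=3, stride=1, padding=1)  # 3x3 conv, padding 1: size unchanged
    size = out_size(size, kernel=2, stride=2)                 # 2x2 max pool: size halved
    sizes.append(size)

print(sizes)            # [112, 56, 28, 14, 7]
flat = 512 * 3 * 3      # AdaptiveAvgPool2d((3, 3)) output, flattened over 512 channels
print(flat)             # 4608
```

Because the adaptive pooling layer fixes its output at 3 x 3, the classifier's input size does not depend on the 7 x 7 map that block_5 produces for 224 x 224 images.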

In [None]:
model = VGG16(num_classes=10)                                       # CIFAR-10 has 10 classes, so the output layer has 10 units

model = model.to(device)                                            # move the model's parameters to the GPU if one is available

# Stochastic gradient descent (SGD) is the optimizer that updates the weights during training

optimizer = torch.optim.SGD(model.parameters(),
                            momentum=0.9,                           # momentum accumulates past gradients, which smooths the updates and speeds up convergence
                            lr=0.01)                                # learning rate: the step size of each update, a small positive hyperparameter
# The scheduler multiplies the learning rate by 0.1 when the monitored metric stops improving.
# mode='max' means the metric is one to maximize, such as validation accuracy.
# It only takes effect if scheduler.step(metric) is called once per epoch.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                       factor=0.1,
                                                       mode='max',
                                                       verbose=True)

# Cross-entropy loss, the standard loss for multi-class classification
criterion = nn.CrossEntropyLoss()
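The momentum update can be illustrated without PyTorch. The sketch below follows the update rule documented for `torch.optim.SGD` with default dampening (`buf = momentum * buf + grad`, then `w = w - lr * buf`). It applies that rule to a hypothetical one-dimensional loss `(w - 3)^2`; the toy loss and the variable names are assumptions for illustration only.

```python
lr, momentum = 0.01, 0.9      # same hyperparameters as the optimizer above
w, buf = 0.0, 0.0             # parameter and momentum buffer

for step in range(500):
    grad = 2.0 * (w - 3.0)            # gradient of the toy loss (w - 3)^2
    buf = momentum * buf + grad       # accumulate a decaying sum of past gradients
    w = w - lr * buf                  # step along the accumulated direction

print(round(w, 4))                    # w ends up very close to the minimum at 3
```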

In [None]:
# validate() computes the loss and accuracy on the test set. No gradients are computed or backpropagated here; backpropagation is only needed during training.
def validate(model, test_dataloader):
    model.eval()                                  # evaluation mode: disables dropout
    val_running_loss = 0.0
    val_running_correct = 0
    with torch.no_grad():                         # skip gradient tracking to save memory and time
        for i, data in enumerate(test_dataloader):
            data, target = data[0].to(device), data[1].to(device)
            output = model(data)
            loss = criterion(output, target)

            val_running_loss += loss.item()
            _, preds = torch.max(output.data, 1)  # index of the largest logit is the predicted class
            val_running_correct += (preds == target).sum().item()

    # Sum of per-batch mean losses divided by the number of samples, so the value is scaled down by roughly the batch size
    val_loss = val_running_loss/len(test_dataloader.dataset)
    val_accuracy = 100. * val_running_correct/len(test_dataloader.dataset)
    print(f'Validation Loss: {val_loss:.4f}, Validation Acc: {val_accuracy:.2f}')

    return val_loss, val_accuracy
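For intuition about what `criterion` and `torch.max` compute for a single sample, here is a pure-Python sketch. It assumes the usual definition of cross-entropy on logits (the negative log of the softmax probability of the true class). The logits are made-up numbers and the function names are illustrative.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)                                   # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def predict(logits):
    """Index of the largest logit, i.e. the predicted class."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 0.5, -1.0]         # hypothetical scores for 3 classes
print(predict(logits))            # 0
print(cross_entropy(logits, 0) < cross_entropy(logits, 2))  # True
```

The loss is small when the true class has the largest logit and grows as the true class's logit falls behind the others.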

In [None]:
# fit() runs one training epoch: it computes the loss, backpropagates the gradients, and updates the weights.

def fit(model, train_dataloader):
    model.train()
    train_running_loss = 0.0
    train_running_correct = 0
    for i, data in enumerate(train_dataloader):
        data, target = data[0].to(device), data[1].to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        train_running_loss += loss.item()
        _, preds = torch.max(output.data, 1)
        train_running_correct += (preds == target).sum().item()
        loss.backward()
        optimizer.step()
    train_loss = train_running_loss/len(train_dataloader.dataset)
    train_accuracy = 100. * train_running_correct/len(train_dataloader.dataset)
    print(f'Train Loss: {train_loss:.4f}, Train Acc: {train_accuracy:.2f}')
    
    return train_loss, train_accuracy
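One detail of `fit()` and `validate()` is worth spelling out. `loss.item()` is already the mean loss over a batch (the default reduction of `nn.CrossEntropyLoss`), and the sum of these batch means is divided by the number of samples rather than the number of batches. The reported loss is therefore roughly the mean per-sample loss divided by the batch size. A sketch with a hypothetical constant batch loss:

```python
import math

num_samples, batch_size = 50000, 32            # CIFAR-10 training set, batch size used above
num_batches = math.ceil(num_samples / batch_size)

batch_mean_loss = 0.8                          # hypothetical mean loss of every batch
running_loss = batch_mean_loss * num_batches   # what the loop accumulates via loss.item()

reported = running_loss / num_samples          # what fit() returns
per_batch = running_loss / num_batches         # conventional mean loss per sample

print(num_batches)                             # 1563
print(round(reported, 4), round(per_batch, 4)) # 0.025 0.8
```

Dividing by `len(train_dataloader)` instead would report the conventional per-sample mean. The logged values in this notebook use the original division.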

In [None]:
# Train and validate the model for 10 epochs. In each epoch, fit() and validate() return the loss and accuracy, which are stored for later inspection.

train_loss , train_accuracy = [], []
val_loss , val_accuracy = [], []
start = time.time()
for epoch in range(10):
    train_epoch_loss, train_epoch_accuracy = fit(model, trainloader)
    val_epoch_loss, val_epoch_accuracy = validate(model, testloader)
    train_loss.append(train_epoch_loss)
    train_accuracy.append(train_epoch_accuracy)
    val_loss.append(val_epoch_loss)
    val_accuracy.append(val_epoch_accuracy)
    scheduler.step(val_epoch_accuracy)        # mode='max': lower the learning rate if validation accuracy plateaus
end = time.time()
print((end-start)/60, 'minutes')

Train Loss: 0.0256, Train Acc: 73.44

Validation Loss: 0.0158, Validation Acc: 82.35

Train Loss: 0.0148, Train Acc: 83.38

Validation Loss: 0.0138, Validation Acc: 84.97

Train Loss: 0.0117, Train Acc: 86.85

Validation Loss: 0.0128, Validation Acc: 86.00

Train Loss: 0.0094, Train Acc: 89.54

Validation Loss: 0.0125, Validation Acc: 86.36

Train Loss: 0.0074, Train Acc: 91.75

Validation Loss: 0.0127, Validation Acc: 86.57

Train Loss: 0.0057, Train Acc: 93.75

Validation Loss: 0.0127, Validation Acc: 86.91

Train Loss: 0.0043, Train Acc: 95.23

Validation Loss: 0.0129, Validation Acc: 86.99

Train Loss: 0.0032, Train Acc: 96.53

Validation Loss: 0.0133, Validation Acc: 87.42

Train Loss: 0.0023, Train Acc: 97.67

Validation Loss: 0.0138, Validation Acc: 87.48

Train Loss: 0.0018, Train Acc: 98.32

Validation Loss: 0.0144, Validation Acc: 87.36
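
The logged numbers above show a typical overfitting pattern: training accuracy keeps rising while validation loss bottoms out early. The sketch below re-enters the validation values from the log and locates the best epochs. Nothing beyond the logged numbers is assumed.

```python
# Values copied from the training log above (epochs 1..10)
val_loss = [0.0158, 0.0138, 0.0128, 0.0125, 0.0127,
            0.0127, 0.0129, 0.0133, 0.0138, 0.0144]
val_acc = [82.35, 84.97, 86.00, 86.36, 86.57,
           86.91, 86.99, 87.42, 87.48, 87.36]
train_acc_final = 98.32

best_loss_epoch = val_loss.index(min(val_loss)) + 1   # epoch with the lowest validation loss
best_acc_epoch = val_acc.index(max(val_acc)) + 1      # epoch with the highest validation accuracy
gap = train_acc_final - val_acc[-1]                   # train/validation accuracy gap after epoch 10

print(best_loss_epoch, best_acc_epoch, round(gap, 2))  # 4 9 10.96
```

Validation loss is lowest at epoch 4 and validation accuracy peaks at epoch 9, so stopping earlier or adding regularization might be worth trying.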

