# Implementing VGG-16 on Cat vs Dog Dataset

VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy in ImageNet, which is a dataset of over 14 million images belonging to 1000 classes. It makes the improvement over AlexNet by replacing large kernel-sized filters (11 and 5 in the first and second convolutional layer, respectively) with multiple 3×3 kernel-sized filters one after another. VGG16 was trained for weeks and was using NVIDIA Titan Black GPU’s.

<img src = "https://neurohive.io/wp-content/uploads/2018/11/vgg16-1-e1542731207177.png" width = 700>

### Importing required libraries

In [1]:
import numpy as np
import os
import matplotlib.pyplot as plt
import cv2
from tqdm.notebook import tqdm
REBUILD_DATA = False    #Flag that indicates whether data needs preprocessing or not.

import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as transforms
from tqdm.notebook import tqdm

In [2]:
IMG_SIZE = 224

###  Switching to GPU

In [3]:
print("GPU available:",torch.cuda.is_available())
if torch.cuda.is_available():
    device = torch.device('cuda:0')
    print("Switched to GPU")
else:
    device = torch.cuda.device('cpu')
    print("Working on CPU")


GPU available: True
Switched to GPU


T1 = torch.empty((450,1024,1024),dtype = torch.float64)
T1 = T1.to(device)
print("Allocated space:",torch.cuda.memory_allocated(device)/(1024*1024))

T2 = torch.empty((200,1024,1024),dtype = torch.float64)
T2 = T2.to(device)
print("Allocated space:",torch.cuda.memory_allocated(device)/(1024*1024))

T3 = torch.empty((50,1024,1024),dtype = torch.float64)
T3 = T3.to(device)
print("Allocated space:",torch.cuda.memory_allocated(device)/(1024*1024))

T4 = torch.empty((20,1024,1024),dtype = torch.float64)
T4 = T4.to(device)
print("Allocated space:",torch.cuda.memory_allocated(device)/(1024*1024))

T5 = torch.empty((5,1024,1024),dtype = torch.float64)
T5 = T5.to(device)
print("Allocated space:",torch.cuda.memory_allocated(device)/(1024*1024))

In [4]:
torch.cuda.memory_allocated(device)/(1024*1024*1024)

0.0

### Importing and preprocessing images

The Methode below takes a looooong time to load and process data. Maybe its because the vstack function has bad time complexity. So, as the trainind_data gets bigger and bigger, adding new data to it becomes even slower. We'll instead do the preprocessing in the Dataset object, which will make sure the data doesnt get processed all at once. This was, even bad time complexity wont effect so much. This functino is run __only once__

In [5]:
class DogsVsCats():
    X=0
    Y=0
    IMG_SIZE
    def __init__(self, size):
        self.IMG_SIZE = size    #Input size for VGG-16
        self.CATS = os.path.join(os.getcwd(),'PetImages\\Cat')  
        self.DOGS = os.path.join(os.getcwd(),'PetImages\\Dog')
        self.LABELS = {self.CATS:0,self.DOGS:1}
        self.training_data = []
        self.catcount = 0
        self.dogcount = 0
    def __call__(self):
        self.flag = True
        for label in self.LABELS:
            print(label)
            for f in tqdm(os.listdir(label)):
                try:
                    path = os.path.join(label,f)
                    img = cv2.imread(path)
                    img = cv2.resize(img, (self.IMG_SIZE,self.IMG_SIZE))
#                   img = np.transpose(img,(2,0,1))
#                     self.Y = np.array([self.LABELS[label]])
#                     self.X = np.vstack((self.X,img))
#                     self.Y = np.block([self.Y,self.LABELS[label]])
                    self.training_data.append([np.array(img),self.LABELS[label]])
                    if label==self.CATS:
                        self.catcount+=1
                    elif label==self.DOGS: 
                        self.dogcount+=1
                except Exception as e:
                    print("Image ",f," failed to load!")
                    pass
        training_data = (self.X,self.Y)
        np.random.shuffle(self.training_data)
        np.save("training_data.npy",self.training_data)
        print("Cats: ",self.catcount)
        print("Dogs: ",self.dogcount)

Data should be preprocessed only once. On subsequent runs, we set the flag <code>REBUILD_DATA</code> to False.

In [6]:
if REBUILD_DATA:
    dogsvcats = DogsVsCats(IMG_SIZE)
    dogsvcats()

Now lets print an image to see what our data looks like.

cats = os.path.join(os.getcwd(),"PetImages\\Cat")
path = os.path.join(os.getcwd(),cats,os.listdir(cats)[0])
img = plt.imread(path)
plt.imshow(img)
plt.show()
cv2.imshow("Cat",img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Well thats a cute doggo!

### Creating Dataset class

The dataset object recieves numpy array which have images in Channels-last format. Pytorch likes her channels first. So we tranpose the array. Any other transform passed as argument is also performed. Then the image and label is returned as tuple.

In [7]:
class dataset(Dataset):
    def __init__(self, data, transform = None, test = False):
        self.data = data
        self.Size = round(self.data.shape[0]/10)
        if test:
            self.data = self.data[-self.Size:]
        else:
            self.data = self.data[:-self.Size]
        self.len = self.data.shape[0]
        self.transform = transform
    def __len__(self):
        return self.len
    def __getitem__(self, idx):
        X,Y = self.data[idx,0],self.data[idx,1]
        if self.transform:
            X = self.transform(X)
        return X,Y

###  Defining custom Module for VGG-16

The general architecture of VGG-16 is very simple. Instead of varying hyperparameters, it uses repititive blocks with same hyperparameters and relies on its depth for learning complex features. A side effect is that VGG-16 requires a very long time to train. Training locally on ImageNet would probably take a month. We will define our custom module block wise.
<img src = "https://neurohive.io/wp-content/uploads/2018/11/vgg16.png" width = 800>


In [8]:
class VGG16(nn.Module):
    def __init__(self):
        super(VGG16,self).__init__()
        #Block 1
        self.conv1 = nn.Conv2d(in_channels = 3, out_channels = 64, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv1.weight, nonlinearity='relu')
        self.conv2 = nn.Conv2d(in_channels = 64, out_channels = 64, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv2.weight, nonlinearity='relu')
        self.pool1 = nn.MaxPool2d(kernel_size = 2, stride = 2, )
        #Block 2
        self.conv3 = nn.Conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv3.weight, nonlinearity='relu')
        self.conv4 = nn.Conv2d(in_channels = 128, out_channels = 128, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv4.weight, nonlinearity='relu')
        self.pool2 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        #Block 3
        self.conv5 = nn.Conv2d(in_channels = 128, out_channels = 256, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv5.weight, nonlinearity='relu')
        self.conv6 = nn.Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv6.weight, nonlinearity='relu')
        self.conv7 = nn.Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv7.weight, nonlinearity='relu')
        self.pool3 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        #Block 4
        self.conv8 = nn.Conv2d(in_channels = 256, out_channels = 512, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv8.weight, nonlinearity='relu')
        self.conv9 = nn.Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv9.weight, nonlinearity='relu')
        self.conv10 = nn.Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv10.weight, nonlinearity='relu')
        self.pool4 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        #Block 5
        self.conv11 = nn.Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv11.weight, nonlinearity='relu')
        self.conv12 = nn.Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv12.weight, nonlinearity='relu')
        self.conv13 = nn.Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, padding = 1)
        nn.init.kaiming_uniform_(self.conv13.weight, nonlinearity='relu')
        self.pool5 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        #Block 6
        self.fc1 = nn.Linear(in_features = 25088, out_features = 4096)
        self.fc2 = nn.Linear(in_features = 4096, out_features = 4096)
        self.fc2 = nn.Linear(in_features = 4096, out_features = 2)
        self.output = nn.Softmax(dim = 1)
    def forward(self,X):
        #Block 1
        X = self.conv1(X)
        X = F.relu(X)
        X = self.conv2(X)
        X = F.relu(X)
        X = self.pool1(X)
        #Block 2
        X = self.conv3(X)
        X = F.relu(X)
        X = self.conv4(X)
        X = F.relu(X)
        X = self.pool2(X)
        #Block3
        X = self.conv5(X)
        X = F.relu(X)
        X = self.conv6(X)
        X = F.relu(X)
        X = self.conv7(X)
        X = F.relu(X)
        X = self.pool3(X)
        #Block 4
        X = self.conv8(X)
        X = F.relu(X)
        X = self.conv9(X)
        X = F.relu(X)
        X = self.conv10(X)
        X = F.relu(X)
        X = self.pool4(X)
        #Block 5
        X = self.conv11(X)
        X = F.relu(X)
        X = self.conv12(X)
        X = F.relu(X)
        X = self.conv13(X)
        X = F.relu(X)
        X = self.pool5(X)
        #Block 6
        X = torch.flatten(X,1)
        X = self.fc1(X)
        X = F.relu(X)
        X = self.fc2(X)
        X = self.output(X)
        return X
    

That a huge class. Phew!

In [9]:
def train_model(model,optimizer,criterion, train_loader, test_loader, EPOCHS):
    '''
    Function for training the neural network.
    '''
    try:
        assert TEST_SIZE!= None
    except:
        print("Test size is not known. I need it to calculate accuracy.")
    TRAIN_LOSS = []
    ACCURACY = []    #Accuracy on test set
    model.train()
    for epoch in range(EPOCHS):
        
        #torch.cuda.empty_cache()  #Clears cache to make more VRAM available
        print("Epoch: ",epoch)
        print("\tTraining-")
        LOSS = 0        
        for x,y in tqdm(train_loader):
            x = x.to(device)
            y = y.to(device)
            optimizer.zero_grad()
            yhat = model(x)
            loss = criterion(yhat,y)
            loss.backward()
            optimizer.step()
            LOSS+= loss.data
        TRAIN_LOSS.append(LOSS)
    model.eval()
    print("\tValidating")
    correct = 0
    with torch.no_grad():
        for x,y in tqdm(test_loader):
            x = x.to(device)
            y = y.to(device)
            yhat = model(x)
            _,label = torch.max(yhat.data,1)
            correct += (label==y).sum().item()
        accuracy = correct/TEST_SIZE*100
        print("Test set accuracy for epoch ",epoch+1," = ", accuracy,"%")
        ACCURACY.append(accuracy)
    data = {"Loss":TRAIN_LOSS, "Accuracy":ACCURACY}
    return model,optimizer, data        

###  Reading preprocessed data

In [10]:
data = np.load("training_data.npy", allow_pickle = True)

###  Creating custom Dataset objects

In [11]:
tf = transforms.Compose([transforms.ToTensor()])
train_data = dataset(data, transform = tf)
test_data = dataset(data, transform = tf, test=True)

### Creating model, optimizer and loss function objects

In [12]:
model = VGG16()
model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters()) #Default learning rate is 0.001

###  Passing a debug data mini-batch for 1 epoch

TEST_SIZE = 256
debug_data = dataset(data[:TEST_SIZE], transform = tf)
debug_loader = DataLoader(debug_data)

#Calling train_model() function
Model, Optimizer, Data = train_model(model, optimizer, criterion,debug_loader, debug_loader,1)
torch.save(Model.state_dict(),os.path.join(os.getcwd(),"debug_model"))

###  Passing whole data for 1 epoch 

In [13]:
torch.cuda.ipc_collect()  #Checks if any unused tensors can be cleared to free up space
#torch.cuda.empty_cache()

In [None]:
TEST_SIZE =  data.shape[0]-round(data.shape[0]/10)
train_loader = DataLoader(train_data)
test_loader = DataLoader(test_data)
#T2.to(torch.device('cpu'))
#Calling train_model function
torch.cuda.empty_cache()
torch.cuda.ipc_collect()  #Checks if any unused tensors can be cleared to free up space
Model, Optimizer, Data = train_model(model, optimizer, criterion,train_loader, test_loader,10)
torch.save(Model.state_dict(),os.path.join(os.getcwd(),"trained_model_1epoch"))

Epoch:  0
	Training-


HBox(children=(FloatProgress(value=0.0, max=22451.0), HTML(value='')))


Epoch:  1
	Training-


HBox(children=(FloatProgress(value=0.0, max=22451.0), HTML(value='')))


Epoch:  2
	Training-


HBox(children=(FloatProgress(value=0.0, max=22451.0), HTML(value='')))


Epoch:  3
	Training-


HBox(children=(FloatProgress(value=0.0, max=22451.0), HTML(value='')))


Epoch:  4
	Training-


HBox(children=(FloatProgress(value=0.0, max=22451.0), HTML(value='')))

In [None]:
torch.cuda.memory_stats(device)

In [None]:
torch.cuda.memory_summary()

In [None]:
torch.cuda.memory_allocated(device)/(1024*1024*1024)