Until now, we created individual blocks for convolution, max pooling and activation functions, and explicitly passed the output of each layer as the input to the next. When building a whole architecture, we can instead use nn.Sequential, a container module that applies the layers given to it one after another, so we don't have to carry each output over to the next layer by hand.
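A minimal sketch of the difference, using small, arbitrary layer sizes (8 output channels, a 32x32 random input) chosen only for illustration. The same three layer objects are applied once by hand and once through nn.Sequential, and the results match:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Three individual layers (sizes are arbitrary, for illustration only)
conv = nn.Conv2d(3, 8, 3, padding=1)
relu = nn.ReLU()
pool = nn.MaxPool2d(2, stride=2)

x = torch.randn(1, 3, 32, 32)

# Manual chaining: each output is passed explicitly to the next layer
manual = pool(relu(conv(x)))

# nn.Sequential does the chaining for us, in the order the layers are given
block = nn.Sequential(conv, relu, pool)
chained = block(x)

print(manual.shape)                  # torch.Size([1, 8, 16, 16])
print(torch.equal(manual, chained))  # True
```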

Below, we create a VGG16 class that inherits from torch's nn.Module class, which gives us utilities to inspect our architecture more easily in later steps (for example, printing a summary of its layers and parameters). Initializing this class builds the architecture we want, and the forward() method defines how an input flows through the network, so calling the model on an input performs one full forward pass.
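To illustrate what subclassing nn.Module provides, here is a sketch with a hypothetical, much smaller `TinyNet` built the same way as the VGG16 class: submodules assigned as attributes in `__init__` are registered automatically, their parameters can be listed and counted, and calling the model runs `forward()`. The layer sizes and the 8x8 input are arbitrary assumptions for the example:

```python
import torch
from torch import nn

class TinyNet(nn.Module):  # hypothetical toy model, same pattern as VGG16
    def __init__(self, n_classes):
        super().__init__()  # required so nn.Module can register submodules
        self.features = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU())
        self.classifier = nn.Linear(4 * 8 * 8, n_classes)

    def forward(self, img):
        x = self.features(img)
        x = x.view(x.size(0), -1)  # flatten everything except the batch dimension
        return self.classifier(x)

model = TinyNet(n_classes=3)

# Submodules assigned as attributes are registered automatically
for name, child in model.named_children():
    print(name, type(child).__name__)  # features Sequential, then classifier Linear

# ...and so are their parameters: conv 4*3*3*3 + 4 = 112, linear 256*3 + 3 = 771
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 883

# Calling the model runs forward(); prefer this over calling forward() directly
out = model(torch.randn(2, 3, 8, 8))
print(out.shape)  # torch.Size([2, 3])
```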

In [1]:
from torch import nn
from torchsummary import summary
import cv2
import torchvision.transforms as transforms
import torch

class VGG16(nn.Module):
    
    def __init__(self, n_classes):
        super().__init__()
    
        self.convolutions = nn.Sequential(
            # conv1
            nn.Conv2d(3, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            
            # conv2
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),

            # conv3
            nn.Conv2d(128, 256, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),

            # conv4
            nn.Conv2d(256, 512, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),

            # conv5
            nn.Conv2d(512, 512, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )

        self.fully_connected = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(),
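            nn.Linear(4096, n_classes),
            # Softmax turns the class scores into probabilities that sum to 1.
            # Note: nn.CrossEntropyLoss expects raw scores, so drop this layer if training with it.
            nn.Softmax(dim=1)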
        )

        
    def forward(self, img):

        # Apply convolution operations
        x = self.convolutions(img)
        # Flatten [N, 512, 7, 7] feature maps into [N, 512 * 7 * 7] vectors
        x = x.view(x.size(0), -1)

        # Apply fully connected operations
        x = self.fully_connected(x)

        return x
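The `512 * 7 * 7` input size of the first nn.Linear follows from the layer settings: a 3x3 convolution with padding=1 and stride 1 keeps the spatial size, and each MaxPool2d(2, stride=2) halves it, so five pooling stages take 224 down to 224 / 2^5 = 7. A small sketch of this arithmetic, using the standard output-size formula floor((size + 2 * padding - kernel) / stride) + 1 with dilation 1; `conv_out` is a hypothetical helper name:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output-size formula shared by Conv2d and MaxPool2d (dilation = 1)
    return (size + 2 * padding - kernel) // stride + 1

size = 224
convs_per_block = [2, 2, 3, 3, 3]  # number of convs in conv1..conv5 of the VGG16 class
for n_convs in convs_per_block:
    for _ in range(n_convs):
        size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3, padding=1 keeps size
    size = conv_out(size, kernel=2, stride=2)                 # 2x2 max pool halves it
    print(size)  # 112, 56, 28, 14, 7

flat = 512 * size * size
print(flat)  # 25088 = 512 * 7 * 7, the in_features of the first Linear layer
```

This is also why the input images are resized to 224x224 later on: with a different input size, the flattened vector would no longer have 25088 elements and the first nn.Linear would fail with a shape mismatch.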

Create a VGG16 model object with 3 classes: daisy, dandelion and rose. Note that the VGG16 architecture was originally designed for the ImageNet dataset, which has 1000 classes, so here only the size of the last layer changes. Print the model summary to see the layers in the architecture with their output shapes and parameter counts.

In [2]:
nb_classes = 3  # 1000 in the original ImageNet version of VGG16
model = VGG16(nb_classes)

summary(model, input_size=(3, 224, 224), batch_size=1, device="cpu")


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [1, 64, 224, 224]           1,792
              ReLU-2          [1, 64, 224, 224]               0
            Conv2d-3          [1, 64, 224, 224]          36,928
              ReLU-4          [1, 64, 224, 224]               0
         MaxPool2d-5          [1, 64, 112, 112]               0
            Conv2d-6         [1, 128, 112, 112]          73,856
              ReLU-7         [1, 128, 112, 112]               0
            Conv2d-8         [1, 128, 112, 112]         147,584
              ReLU-9         [1, 128, 112, 112]               0
        MaxPool2d-10           [1, 128, 56, 56]               0
           Conv2d-11           [1, 256, 56, 56]         295,168
             ReLU-12           [1, 256, 56, 56]               0
           Conv2d-13           [1, 256, 56, 56]         590,080
             ReLU-14           [1, 256,
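The parameter counts in the summary can be checked by hand: a Conv2d layer has out_channels filters of size in_channels x k x k plus one bias per output channel, and a Linear layer has in x out weights plus out biases. A sketch with hypothetical helper names (`conv2d_params`, `linear_params`, `total_params`) that reproduces the rows shown above and applies the same formulas to the whole network; the totals are computed here, not read from the summary, but they should agree with the totals torchsummary prints at the end of its report:

```python
def conv2d_params(in_ch, out_ch, kernel):
    # one kernel x kernel x in_ch filter per output channel, plus one bias each
    return out_ch * (in_ch * kernel * kernel) + out_ch

def linear_params(in_f, out_f):
    # weight matrix plus bias vector
    return in_f * out_f + out_f

print(conv2d_params(3, 64, 3))    # 1792   (Conv2d-1)
print(conv2d_params(64, 64, 3))   # 36928  (Conv2d-3)
print(conv2d_params(64, 128, 3))  # 73856  (Conv2d-6)

# (in, out) channels of the 13 conv layers in the VGG16 class
conv_cfg = [(3, 64), (64, 64), (64, 128), (128, 128),
            (128, 256), (256, 256), (256, 256),
            (256, 512), (512, 512), (512, 512),
            (512, 512), (512, 512), (512, 512)]
conv_total = sum(conv2d_params(i, o, 3) for i, o in conv_cfg)

def total_params(n_classes):
    fc = (linear_params(512 * 7 * 7, 4096)
          + linear_params(4096, 4096)
          + linear_params(4096, n_classes))  # only this layer depends on n_classes
    return conv_total + fc

print(conv_total)          # 14714688
print(total_params(3))     # 134272835
print(total_params(1000))  # 138357544 with 1000 ImageNet classes
```

By this arithmetic, the first fully connected layer alone holds about 102.8 million of the roughly 134 million parameters, and going from 1000 to 3 classes only shrinks the last layer.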

Send one image through the network and check the result.

In [3]:
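# OpenCV loads images as H x W x C arrays in BGR channel order
img = cv2.imread("data_flowers/daisy/100080576_f52e8ee070_n.jpg")
if img is None:
    raise FileNotFoundError("Could not read the image, check the path")
# torchvision expects RGB, so swap the channel order first
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

transform = transforms.Compose([
    transforms.ToPILImage(),        # numpy array -> PIL image
    transforms.Resize((224, 224)),  # the fully connected part expects 224x224 inputs
    transforms.ToTensor()           # PIL image -> C x H x W float tensor in [0, 1]
])

img = transform(img)
img = img.unsqueeze(0)  # add batch dimension: [1, 3, 224, 224]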

model.eval()  # switch off dropout so inference is deterministic
output = model(img)

print(output)

print(torch.max(output))


tensor([[0.3363, 0.3355, 0.3283]], grad_fn=<SoftmaxBackward0>)
tensor(0.3363, grad_fn=<MaxBackward1>)


There are 3 values, one probability per class. They are very close to each other, so even if we pick the maximum, it doesn't look like this model can reliably tell these 3 types of flower apart, right?
That's because we haven't trained our model yet! We just created the architecture and ran it for inference. This was only to show you the logic, so wait until we are done with the training step ;)
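To turn the probabilities into a prediction, take the index of the maximum along the class dimension rather than only the maximum value. A sketch using the values printed above, assuming the class order matches the order listed earlier (daisy, dandelion, rose); the real order will depend on how the training data is loaded later:

```python
import torch

# Assumed class order, matching the order listed earlier in this notebook
class_names = ["daisy", "dandelion", "rose"]

# The probabilities printed above (untrained model, random weights)
output = torch.tensor([[0.3363, 0.3355, 0.3283]])

prob, idx = torch.max(output, dim=1)  # max value AND its index along the class axis
predicted = class_names[idx.item()]
print(predicted, round(prob.item(), 4))  # daisy 0.3363

# A model with no preference among 3 classes would output 1/3 for each
print(torch.allclose(output, torch.full_like(output, 1 / 3), atol=0.01))  # True
```

The last check makes the point of the paragraph above concrete: all three values sit within 0.01 of 1/3, so the untrained model is essentially guessing, and "daisy" wins only by a tiny random margin.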