## VGG

The VGG architecture was developed in 2014 by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group (hence the name VGG) at the University of Oxford. The model showed significant improvements over earlier models in the 2014 ImageNet challenge, also known as ILSVRC.



In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [None]:
import torch.nn as nn


class VGG19(nn.Module):
    def __init__(self, num_classes=1000):
        super(VGG19, self).__init__()

        # Feature extraction layers: Convolutional and pooling layers
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(
                3, 64, kernel_size=3, padding=1
            ),  # 3 input channels, 64 output channels, 3x3 kernel, 1 padding
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(
                kernel_size=2, stride=2
            ),  # Max pooling with 2x2 kernel and stride 2
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(),
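            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 5: four more 512-channel convolutions and a final pooling stage.
            # VGG19 has 16 conv layers in total; without this block a 224x224 input
            # reaches the classifier as 14x14 feature maps, not the expected 7x7.
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),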
        )

        # Fully connected layers for classification
        self.classifier = nn.Sequential(
            nn.Linear(
                512 * 7 * 7, 4096
            ),  # 512 channels, 7x7 spatial dimensions after max pooling
            nn.ReLU(),
            nn.Dropout(0.5),  # Dropout layer with 0.5 dropout probability
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes),  # Output layer with 'num_classes' output units
        )

    def forward(self, x):
        x = self.feature_extractor(x)  # Pass input through the feature extractor layers
        x = x.view(x.size(0), -1)  # Flatten the output for the fully connected layers
        x = self.classifier(x)  # Pass flattened output through the classifier layers
        return x
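
The `512 * 7 * 7` input size of the first linear layer follows from the input resolution. As a quick sanity check, the sketch below assumes the standard 224x224 ImageNet input (an assumption; the class itself does not fix the input size) and tracks the spatial size: 3x3 convolutions with `padding=1` leave it unchanged, while each of the five 2x2 max-pooling stages in the VGG19 architecture halves it.

```python
import torch
import torch.nn as nn

# A 3x3 convolution with padding=1 preserves height and width.
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.zeros(1, 3, 224, 224)  # assumed ImageNet-style input size
conv_out_shape = tuple(conv(x).shape)

# Apply five pooling stages and record the spatial size after each one.
sizes = [x.shape[-1]]
for _ in range(5):
    x = pool(x)
    sizes.append(x.shape[-1])

# 512 channels times the final height and width
flat_features = 512 * sizes[-1] * sizes[-1]
print(conv_out_shape)
print(sizes)
print(flat_features)
```

With any other input resolution the flattened size changes, so the first `nn.Linear` layer would need to be resized accordingly.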

### timm

timm (or PyTorch Image Models) is a Python library that provides a collection of pre-trained deep learning models, primarily focused on computer vision tasks, along with utilities for training, fine-tuning, and inference.



In [None]:
!pip install -q timm


In [None]:
import timm
import torch

# Load a pre-trained MobileNet model
model_name = "mobilenetv3_large_100"

model = timm.create_model(model_name, pretrained=True)

# If you want to use the model for inference
model.eval()

# Forward pass with a dummy input
# Batch size 1, 3 color channels, 224x224 image
input_tensor = torch.rand(1, 3, 224, 224)

# Inference does not need gradients, so disable autograd tracking
with torch.no_grad():
    output = model(input_tensor)
print(output, output.shape)

tensor([[-2.7906e+00,  9.9168e-02,  4.3141e-01, -2.1836e-02,  7.9626e-01,
         -1.2775e-01,  4.4802e-01, -9.9817e-02, -7.1898e-03, -4.3294e-01,
          3.5207e-01,  7.1959e-01,  1.3107e+00,  6.6033e-01,  9.0555e-01,
          1.0772e+00,  1.7348e+00, -6.7611e-03,  2.0565e+00,  1.3547e+00,
          6.1916e-01,  3.6509e+00,  2.3372e+00,  3.2081e+00,  9.6124e-01,
         -2.3309e-01, -1.2464e+00, -6.4247e-01, -8.4977e-01, -1.7333e+00,
         -7.3374e-01, -8.0598e-01, -1.6248e+00,  9.5085e-01,  3.4862e+00,
         -1.3481e+00, -9.9721e-01, -1.0850e+00,  1.5498e-02, -7.8427e-01,
         -3.4210e-01,  9.9255e-03,  1.8521e+00,  6.5071e-01,  2.3228e-01,
          3.6012e-01, -7.1645e-02, -2.5739e-01, -6.5575e-01, -1.5350e+00,
          1.2933e-01,  1.3116e-01, -5.1844e-02,  4.5994e-01,  1.3465e+00,
         -2.3955e-02,  5.6554e-01, -1.0697e+00,  1.8011e+00, -4.9006e-01,
          1.6897e+00, -6.2983e-02,  2.9604e-01,  2.8298e-01, -2.5532e-01,
          3.1485e+00,  1.8067e+00,  6.
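
The printed values are raw, unnormalized scores (logits), one per ImageNet-1k class. A minimal sketch of turning such scores into probabilities and top-5 predictions follows. It uses random stand-in logits rather than the model's real output, so the indices it prints are meaningless, and mapping indices to class names would need a label file that is not shown here.

```python
import torch

torch.manual_seed(0)
# Stand-in for `output`; assumes the model returns a (1, 1000) logit tensor
logits = torch.randn(1, 1000)

# Softmax over the class dimension turns logits into probabilities
probs = torch.softmax(logits, dim=1)

# Five most likely class indices with their probabilities, highest first
top5_probs, top5_idx = probs.topk(5, dim=1)
print(top5_idx)
print(top5_probs)
```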

# ResNet (residual network)

Deeper neural networks were long assumed to be more effective, on the expectation that adding layers improves model performance.

As networks became deeper, the extracted features could be further enriched, as seen with VGG16 and VGG19.

A question arose: “Is learning better networks as easy as stacking more layers?” One obstacle to answering this question, the problem of vanishing/exploding gradients, had been largely addressed by normalized initialization and intermediate normalization layers.

However, a new issue emerged: the degradation problem. As neural networks became deeper, accuracy saturated and then degraded rapidly. An experiment comparing shallow and deep plain networks revealed that deeper models exhibited higher training and test errors, suggesting a fundamental challenge in training deeper architectures effectively. This degradation was not caused by overfitting, since the training error itself increased with depth. In principle, a deeper network could match a shallower one by letting the added layers act as identity mappings, but in practice the added layers failed to approximate the identity function.

ResNet’s residual connections unlocked the potential of extreme depth, pushing accuracy beyond that of previous architectures.

![](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/ResnetBlock.png)

Shortcut connections perform identity mapping, and their output is added to the output of the stacked layers. Identity shortcut connections add neither extra parameters nor computational complexity. By bypassing layers, they create direct paths for information flow and let the stacked layers learn a residual function F(x) = H(x) - x rather than the desired mapping H(x) directly, so the block outputs F(x) + x.
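
As an illustration, a basic residual block can be sketched in PyTorch as below. This is a simplified sketch, not the exact block from any particular ResNet variant: the class name `ResidualBlock`, the channel count, and the input size are illustrative choices, and the block assumes the input and output have the same shape so that the identity shortcut needs no projection.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Basic residual block computing ReLU(residual(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        # Stacked layers that learn the residual function
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = torch.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # Identity shortcut: add the input back, with no extra parameters
        return torch.relu(residual + x)


block = ResidualBlock(channels=64).eval()
x = torch.rand(2, 64, 56, 56)  # illustrative batch of feature maps
with torch.no_grad():
    y = block(x)
print(y.shape)
```

If the stacked layers output zero, the block simply passes its (non-negative) input through, which is why extra residual blocks can fall back to an identity mapping more easily than plain stacked layers.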



https://towardsdatascience.com/what-is-residual-connection-efb07cab0d55