# Exploratory model structure analysis

In this notebook, we explore the structure of pre-trained torchvision image classification models to understand how they are organized and what their top-level layers are named. This helps us identify the "classification head" of each model so that we can replace it with a custom classification head for our task.

In [1]:
import torch
from torch import nn
import torchvision
import os
from typing import List, Any, Tuple
from torchinfo import summary
from torchvision.datasets import VisionDataset

# List all pre-trained classification models in torchvision

We use the [list_models](https://pytorch.org/vision/stable/generated/torchvision.models.list_models.html#torchvision.models.list_models) API to list the pre-trained classification models provided by torchvision; the version used here reports 80 of them. The [pre-trained models page](https://pytorch.org/vision/stable/models.html) also covers pre-trained models for other image tasks such as segmentation, object detection, and keypoint detection, as well as pre-trained video classification models.

In [2]:
classification_models = torchvision.models.list_models(module=torchvision.models)
print(len(classification_models), "classification models:", classification_models)

80 classification models: ['alexnet', 'convnext_base', 'convnext_large', 'convnext_small', 'convnext_tiny', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b2', 'efficientnet_b3', 'efficientnet_b4', 'efficientnet_b5', 'efficientnet_b6', 'efficientnet_b7', 'efficientnet_v2_l', 'efficientnet_v2_m', 'efficientnet_v2_s', 'googlenet', 'inception_v3', 'maxvit_t', 'mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mnasnet1_3', 'mobilenet_v2', 'mobilenet_v3_large', 'mobilenet_v3_small', 'regnet_x_16gf', 'regnet_x_1_6gf', 'regnet_x_32gf', 'regnet_x_3_2gf', 'regnet_x_400mf', 'regnet_x_800mf', 'regnet_x_8gf', 'regnet_y_128gf', 'regnet_y_16gf', 'regnet_y_1_6gf', 'regnet_y_32gf', 'regnet_y_3_2gf', 'regnet_y_400mf', 'regnet_y_800mf', 'regnet_y_8gf', 'resnet101', 'resnet152', 'resnet18', 'resnet34', 'resnet50', 'resnext101_32x8d', 'resnext101_64x4d', 'resnext50_32x4d', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'shufflenet_v2_x1_5', 'shufflene
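
With 80 names in the list, it can help to narrow it down to one model family before inspecting anything. The sketch below filters a list of names with shell-style wildcards from Python's standard `fnmatch` module. The helper `select_models` and the short `sample_models` list are illustrative only (the names are copied from the output above); in the notebook you would pass `classification_models` instead.

```python
from fnmatch import fnmatchcase

# A few names copied from the list printed above (not the full list of 80)
sample_models = [
    "alexnet", "convnext_base", "convnext_tiny", "densenet121",
    "resnet18", "resnet50", "resnet152", "resnext50_32x4d",
]


def select_models(names, include="*", exclude=None):
    """Keep names matching `include` and not matching `exclude` (shell-style wildcards)."""
    selected = [n for n in names if fnmatchcase(n, include)]
    if exclude is not None:
        selected = [n for n in selected if not fnmatchcase(n, exclude)]
    return selected


resnets = select_models(sample_models, include="resnet*")
non_res = select_models(sample_models, exclude="res*")
print(resnets)  # ['resnet18', 'resnet50', 'resnet152']
print(non_res)  # ['alexnet', 'convnext_base', 'convnext_tiny', 'densenet121']
```

Recent torchvision releases may also accept filter patterns directly in `list_models`; the API page linked above is the place to confirm which arguments your installed version supports.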

# View the alexnet model structure

In [3]:
alexnet = torchvision.models.alexnet(weights=None)
alexnet

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
 

# View the ConvNeXt Base model

[ConvNeXt](https://pytorch.org/vision/stable/models/convnext.html) models, like many other pre-trained models, come in several variants of different sizes. Authors typically indicate the size with a suffix such as *base*, *small*, *tiny*, *large*, or *huge*.

In [4]:
convnext_base = torchvision.models.convnext_base(weights=None)
convnext_base

ConvNeXt(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 128, kernel_size=(4, 4), stride=(4, 4))
      (1): LayerNorm2d((128,), eps=1e-06, elementwise_affine=True)
    )
    (1): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=128)
          (1): Permute()
          (2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=128, out_features=512, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=512, out_features=128, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=128)
          (1): Permute()
          (2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (3): Linear(
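
One way to make the "size" of a variant concrete is to count its parameters. The following is a minimal sketch, assuming a hypothetical helper `count_parameters`; it is demonstrated on a small stand-in that mirrors the two `Linear` layers of the `CNBlock` printed above, so it runs without building a full model.

```python
from torch import nn


def count_parameters(module: nn.Module, trainable_only: bool = False) -> int:
    """Return the number of scalar parameters in `module`."""
    return sum(
        p.numel()
        for p in module.parameters()
        if p.requires_grad or not trainable_only
    )


# Stand-in for the Linear -> GELU -> Linear part of a CNBlock (dimensions from the output above)
mlp = nn.Sequential(
    nn.Linear(128, 512),
    nn.GELU(),
    nn.Linear(512, 128),
)

print(count_parameters(nn.Linear(128, 512)))  # 128*512 weights + 512 biases = 66048
print(count_parameters(mlp))                  # 66048 + (512*128 + 128) = 131712
```

Calling the same helper on `convnext_base` and on the other ConvNeXt variants gives one number per variant, which makes the suffixes directly comparable.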

# Explore the children modules for alexnet

Every torchvision model is an `nn.Module` composed of named child modules. Let's enumerate the top-level children of the alexnet model.

In [5]:
dict(alexnet.named_children()).keys()

dict_keys(['features', 'avgpool', 'classifier'])
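
`named_children()` only returns the direct children of a module, whereas `named_modules()` walks the whole tree and yields dotted names. The sketch below shows the difference on a hypothetical `TinyNet`, whose top-level names deliberately mimic the AlexNet layout above; it is a stand-in, not a torchvision model.

```python
from torch import nn


class TinyNet(nn.Module):
    """Small stand-in with the same top-level child names as AlexNet."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(8, 10)

    def forward(self, x):
        x = self.avgpool(self.features(x))
        return self.classifier(x.flatten(1))


tiny = TinyNet()

# Direct children only
child_names = [name for name, _ in tiny.named_children()]
# Every module in the tree; the root itself is listed first with an empty name
module_names = [name for name, _ in tiny.named_modules()]

print(child_names)   # ['features', 'avgpool', 'classifier']
print(module_names)  # ['', 'features', 'features.0', 'features.1', 'avgpool', 'classifier']
```

For locating a classification head, the top-level view from `named_children()` is usually enough, which is why the rest of this notebook relies on it.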

# Explore child layers for all pre-trained classification models in torchvision

In [6]:
from collections import Counter

# Count how often each top-level child name occurs across all classification models
layer_count = Counter()

for model_name in classification_models:
    # weights=None builds the architecture only; no pre-trained weights are downloaded
    m = getattr(torchvision.models, model_name)(weights=None)
    layer_count.update(name for name, _ in m.named_children())

print(dict(layer_count))
    



{'features': 39, 'avgpool': 59, 'classifier': 38, 'conv1': 15, 'maxpool1': 2, 'conv2': 1, 'conv3': 1, 'maxpool2': 2, 'inception3a': 1, 'inception3b': 1, 'maxpool3': 1, 'inception4a': 1, 'inception4b': 1, 'inception4c': 1, 'inception4d': 1, 'inception4e': 1, 'maxpool4': 1, 'inception5a': 1, 'inception5b': 1, 'aux1': 1, 'aux2': 1, 'dropout': 2, 'fc': 31, 'Conv2d_1a_3x3': 1, 'Conv2d_2a_3x3': 1, 'Conv2d_2b_3x3': 1, 'Conv2d_3b_1x1': 1, 'Conv2d_4a_3x3': 1, 'Mixed_5b': 1, 'Mixed_5c': 1, 'Mixed_5d': 1, 'Mixed_6a': 1, 'Mixed_6b': 1, 'Mixed_6c': 1, 'Mixed_6d': 1, 'Mixed_6e': 1, 'AuxLogits': 1, 'Mixed_7a': 1, 'Mixed_7b': 1, 'Mixed_7c': 1, 'stem': 16, 'blocks': 1, 'layers': 4, 'trunk_output': 15, 'bn1': 10, 'relu': 10, 'maxpool': 14, 'layer1': 10, 'layer2': 10, 'layer3': 10, 'layer4': 10, 'stage2': 4, 'stage3': 4, 'stage4': 4, 'conv5': 4, 'norm': 6, 'permute': 6, 'flatten': 6, 'head': 6, 'conv_proj': 5, 'encoder': 5, 'heads': 5}
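
In the counts above, `classifier` (38), `fc` (31), `head` (6) and `heads` (5) add up to 80, which suggests that each model exposes its head under one of these four names. The sketch below looks up which of them a given model uses. The helper `find_head_name`, the candidate tuple, and the two stand-in classes are assumptions made for illustration; the result should still be checked by printing the model.

```python
from typing import Optional, Sequence

from torch import nn

# Candidate head names taken from the counts printed above
HEAD_CANDIDATES = ("classifier", "fc", "head", "heads")


def find_head_name(
    model: nn.Module, candidates: Sequence[str] = HEAD_CANDIDATES
) -> Optional[str]:
    """Return the first candidate that is a top-level child of `model`, else None."""
    children = dict(model.named_children())
    for name in candidates:
        if name in children:
            return name
    return None


class FcStyle(nn.Module):
    """Stand-in for models whose head is a single Linear layer called `fc`."""

    def __init__(self):
        super().__init__()
        self.layer1 = nn.Conv2d(3, 8, kernel_size=3)
        self.fc = nn.Linear(8, 10)


class ClassifierStyle(nn.Module):
    """Stand-in for models whose head is called `classifier`."""

    def __init__(self):
        super().__init__()
        self.features = nn.Conv2d(3, 8, kernel_size=3)
        self.classifier = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 10))


print(find_head_name(FcStyle()))          # fc
print(find_head_name(ClassifierStyle()))  # classifier
```

In the notebook, the same call can be applied to each model built in the loop above, for example `find_head_name(alexnet)`.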


# Focus on classification layer for vgg16, resnet50, resnet152

Since we'll be using these pre-trained models for our custom Flowers 102 classification task, let's focus on the classification heads of these models.

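Note that the name and structure of the classification head vary by model family. In VGG16 it is called `classifier` and is a `Sequential` containing several `Linear` layers, while in the ResNets it is a single `Linear` layer called `fc`. This is not a universal rule: as the counts above show, some models name the head `head` or `heads`, and a `classifier` is not always a multi-layer `Sequential` (in DenseNet it is a single `Linear`). If you wish to use another of the 80 models above, you'll have to list its layers manually and determine what the **classification head** for that model is and how it is structured.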

In [7]:
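# vgg16_bn is the batch-normalized VGG16 variant; its classifier has the same layout as plain vgg16
vgg16 = torchvision.models.vgg16_bn(weights=None)
resnet50 = torchvision.models.resnet50(weights=None)
resnet152 = torchvision.models.resnet152(weights=None)
# Print only the classification head of each model
print("vgg16\n", vgg16.classifier)
print("resnet50\n", resnet50.fc)
print("resnet152\n", resnet152.fc)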

vgg16
 Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)
resnet50
 Linear(in_features=2048, out_features=1000, bias=True)
resnet152
 Linear(in_features=2048, out_features=1000, bias=True)
