<a href="https://www.kaggle.com/code/aisuko/lightweight-networks-and-mobilenet?scriptVersionId=164483767" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Overview

We have seen that complex networks require significant resources, such as GPUs, for training and for fast inference. However, it turns out that a model with a significantly smaller number of parameters can, in most cases, still be trained to perform reasonably well. In other words, an increase in model complexity typically yields only a small (non-proportional) increase in model performance. Increasing the number of CNN layers and/or the number of neurons in the classifier allowed us to gain a few percent of accuracy at most. Earlier notebooks cover this: see [Convolutional Neural Network](https://www.kaggle.com/code/aisuko/convolutional-neural-network) and [Multilayer Dense Convolutional Neural Network](https://www.kaggle.com/code/aisuko/multilayer-dense-convolutional-neural-network).

This leads us to the idea of experimenting with `lightweight network architectures` in order to train faster models. This is especially important if we want to run our models on mobile devices. This notebook uses the Cats and Dogs dataset.

In [1]:
import os
import torch
import warnings

if torch.cuda.is_available():
    torch_device = 'cuda'
else:
    torch_device = 'cpu'

warnings.filterwarnings('ignore')

print(torch_device)

cuda


In [2]:
if not os.path.exists('data/kagglecatsanddogs_5340.zip'):
    !wget -P data https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip

--2024-02-27 05:15:39--  https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip
Resolving download.microsoft.com (download.microsoft.com)... 104.123.44.196, 2a02:26f0:6d00:3b6::317f, 2a02:26f0:6d00:39f::317f
Connecting to download.microsoft.com (download.microsoft.com)|104.123.44.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 824887076 (787M) [application/octet-stream]
Saving to: ‘data/kagglecatsanddogs_5340.zip’


2024-02-27 05:15:44 (147 MB/s) - ‘data/kagglecatsanddogs_5340.zip’ saved [824887076/824887076]



In [3]:
import glob
import zipfile
import torchvision
from PIL import Image


def check_image(fn):
    # Return True if PIL can open and verify the file, False otherwise
    try:
        with Image.open(fn) as im:
            im.verify()
        return True
    except Exception:
        return False

    
def check_image_dir(path):
    for fn in glob.glob(path):
        if not check_image(fn):
            print("Corrupt image: {}".format(fn))
            os.remove(fn)

            
def common_transform():
    # Normalize each channel with the ImageNet mean and standard deviation expected by the pre-trained model.
    std_normalize = torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                     std=[0.229, 0.224, 0.225])
    # torchvision.transforms.Compose chains the preprocessing steps; no random augmentation is applied here.
    trans = torchvision.transforms.Compose([
        torchvision.transforms.Resize(256), # resize so the shorter side is 256 pixels, keeping the aspect ratio
        torchvision.transforms.CenterCrop(224), # crop the central 224x224 region
        torchvision.transforms.ToTensor(), # convert the image to a tensor with pixel values in the range [0, 1]
        std_normalize])
    return trans


def load_cats_dogs_dataset():
    if not os.path.exists('data/PetImages'):
        with zipfile.ZipFile('data/kagglecatsanddogs_5340.zip', 'r') as zip_ref:
            zip_ref.extractall('data')
    
    check_image_dir('data/PetImages/Cat/*.jpg')
    check_image_dir('data/PetImages/Dog/*.jpg')

    dataset = torchvision.datasets.ImageFolder('data/PetImages', transform=common_transform())
    trainset, testset = torch.utils.data.random_split(dataset, [20000, len(dataset) - 20000])
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=2) # num_workers: how many subprocesses to use for data loading
    testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=2)
    return dataset, trainloader, testloader

dataset, trainloader, testloader = load_cats_dogs_dataset()

Corrupt image: data/PetImages/Cat/666.jpg
Corrupt image: data/PetImages/Dog/11702.jpg
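
The preprocessing in `common_transform` can be traced by hand. The sketch below is plain Python with no torchvision dependency. It assumes a hypothetical 500x375 input image, and the helpers `resize_shorter_side`, `center_crop_box` and `normalize_pixel` are illustrative names, not torchvision functions. They approximate the documented behaviour: `Resize(256)` scales the shorter side to 256 while keeping the aspect ratio, `CenterCrop(224)` takes a central 224x224 window, and `Normalize` computes `(x - mean) / std` per channel.

```python
MEAN = [0.485, 0.456, 0.406]  # ImageNet channel means used in common_transform
STD = [0.229, 0.224, 0.225]   # ImageNet channel standard deviations


def resize_shorter_side(width, height, size=256):
    """Scale so the shorter side equals `size`, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a central crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Apply (x - mean) / std to one RGB pixel already scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, MEAN, STD)]


# Hypothetical 500x375 photo, used for illustration only
w, h = resize_shorter_side(500, 375)
box = center_crop_box(w, h)
print((w, h))                   # image size after the resize step
print(box)                      # window kept by the centre crop
print(normalize_pixel(MEAN))    # a pixel equal to the channel means maps to zeros
```

Whatever the original image size, every image therefore reaches the network as a 3x224x224 tensor, which is the input size passed to `summary` later in this notebook.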


# MobileNet

In the previous notebook, we saw the [ResNet architecture](https://www.kaggle.com/code/aisuko/pre-trained-models-and-transfer-learning) for image classification. A more lightweight analog of ResNet is **MobileNet**, which uses so-called *Inverted Residual Blocks*. Let's load a pre-trained MobileNet and see how it works:

In [4]:
model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', pretrained=True)
model.to(torch_device).eval()

Downloading: "https://github.com/pytorch/vision/zipball/v0.10.0" to /root/.cache/torch/hub/v0.10.0.zip
Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
100%|██████████| 13.6M/13.6M [00:00<00:00, 86.9MB/s]


MobileNetV2(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )
    (1): InvertedResidual(
      (conv): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): InvertedResidual(
      (conv): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(96, eps=

In [5]:
# Apply the pre-trained model to one image from the dataset and print the predicted class
sample_image = dataset[0][0].unsqueeze(0).to(torch_device) # unsqueeze(0): add a batch dimension of size 1 at position 0
res = model(sample_image) # forward pass; the output holds one raw score (logit) per ImageNet class
print(res[0].argmax()) # index of the highest-scoring class

tensor(281, device='cuda:0')
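
The value printed above is the index of the largest of the 1000 raw scores (logits) the network produces, one per ImageNet class. In the standard ImageNet-1k ordering, index 281 is "tabby, tabby cat". To turn scores into probabilities and list the top candidates, something like the following can be used. It is a minimal sketch that substitutes a hypothetical five-class logit tensor for `res`, so it runs without the model or the dataset.

```python
import torch

# Stand-in for `res`: a hypothetical batch of one image with 5 class scores (logits)
logits = torch.tensor([[1.0, 3.0, 0.5, 2.0, -1.0]])

probs = torch.softmax(logits, dim=1)             # probabilities that sum to 1 per row
top_p, top_idx = torch.topk(probs, k=3, dim=1)   # three most likely classes

print(top_idx[0].tolist())   # class indices, most likely first
print(top_p[0].tolist())     # matching probabilities
```

Softmax preserves the ordering of the scores, so the arg-max of the probabilities is the same index as the arg-max of the logits.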


# Transfer Learning (Fine-tuning)

In [6]:
# Freeze the original weights
for x in model.parameters():
    x.requires_grad = False
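
A quick way to confirm what freezing does is to count the trainable parameters before and after. The sketch below uses a small hypothetical stand-in network rather than MobileNet, so it runs without any download. `count_trainable` is an illustrative helper, not a PyTorch function. A newly created layer has `requires_grad=True` by default, which is why a replaced classifier head stays trainable even though the rest of the network is frozen.

```python
import torch.nn as nn


def count_trainable(module):
    """Number of parameters that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Hypothetical tiny network standing in for a pre-trained backbone plus head
net = nn.Sequential(
    nn.Linear(8, 4),   # "backbone": 8*4 weights + 4 biases = 36 parameters
    nn.ReLU(),
    nn.Linear(4, 3),   # original "head": 4*3 weights + 3 biases = 15 parameters
)
print(count_trainable(net))  # every parameter is trainable at first

# Freeze all existing weights, as in the cell above
for p in net.parameters():
    p.requires_grad = False
print(count_trainable(net))  # nothing is trainable now

# Replace the head with a fresh 2-class layer; new layers are trainable by default
net[2] = nn.Linear(4, 2)     # 4*2 weights + 2 biases = 10 parameters
print(count_trainable(net))  # only the new head is trainable
```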

In [7]:
import torch.nn as nn
from torchinfo import summary

# Replace the classifier head (originally Dropout followed by Linear(1280, 1000))
model.classifier = nn.Linear(1280,2)  # a single linear layer with 2 outputs, one per class (cat, dog)
model = model.to(torch_device)
summary(model, input_size=(1, 3, 224, 224))

Layer (type:depth-idx)                             Output Shape              Param #
MobileNetV2                                        [1, 2]                    --
├─Sequential: 1-1                                  [1, 1280, 7, 7]           --
│    └─Conv2dNormActivation: 2-1                   [1, 32, 112, 112]         --
│    │    └─Conv2d: 3-1                            [1, 32, 112, 112]         (864)
│    │    └─BatchNorm2d: 3-2                       [1, 32, 112, 112]         (64)
│    │    └─ReLU6: 3-3                             [1, 32, 112, 112]         --
│    └─InvertedResidual: 2-2                       [1, 16, 112, 112]         --
│    │    └─Sequential: 3-4                        [1, 16, 112, 112]         (896)
│    └─InvertedResidual: 2-3                       [1, 24, 56, 56]           --
│    │    └─Sequential: 3-5                        [1, 24, 56, 56]           (5,136)
│    └─InvertedResidual: 2-4                       [1, 24, 56, 56]           --
│    │    └─Sequential
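
The first few parameter counts in the summary can be reproduced by hand, which also shows where MobileNet's savings come from. The sketch below is plain arithmetic based on the layer shapes printed earlier: a 3x3 stem convolution from 3 to 32 channels, then a 3x3 depthwise convolution with `groups=32` followed by a 1x1 projection to 16 channels. `conv_params` and `bn_params` are illustrative helpers, not library functions.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weights in a bias-free k x k convolution with the given grouping."""
    return (c_in // groups) * c_out * k * k


def bn_params(channels):
    """Learnable scale and shift of a BatchNorm2d layer."""
    return 2 * channels


stem = conv_params(3, 32, 3)                     # Conv2d(3, 32, kernel_size=3)
depthwise = conv_params(32, 32, 3, groups=32)    # one 3x3 filter per channel
pointwise = conv_params(32, 16, 1)               # 1x1 projection to 16 channels
block = depthwise + bn_params(32) + pointwise + bn_params(16)

# For comparison: a regular 3x3 convolution mapping 32 -> 16 channels directly
regular = conv_params(32, 16, 3)

print(stem, bn_params(32), block)      # compare with the (864), (64) and (896) entries
print(regular, depthwise + pointwise)  # regular convolution vs. depthwise + pointwise
```

The depthwise-plus-pointwise pair needs 800 weights, where a regular 3x3 convolution with the same input and output channels would need 4608.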

# Training

In [8]:
def validate(net, dataloader, loss_fn=nn.NLLLoss()):
    # Note: the default NLLLoss expects log-probabilities; pass CrossEntropyLoss when the network outputs raw logits
    net.eval() # evaluation mode: disables dropout and makes batch norm use its running statistics
    count,acc,loss =0,0,0.0
    with torch.no_grad(): # deactivate autograd to save memory and speed up computations
        for features, labels in dataloader:
            features,labels = features.to(torch_device), labels.to(torch_device)
            out=net(features) # forward pass of the mini-batch through the network to obtain the outputs
            loss += loss_fn(out,labels).item() # accumulate the batch loss (a per-batch mean with the default reduction)
            preds=torch.max(out,dim=1)[1] # index of the largest output per example, i.e. the predicted class
            acc+=(preds==labels).sum().item() # accumulate the correct predictions
            count+=len(labels) # accumulate the total number of examples
    return loss/count, acc/count # summed batch losses and correct predictions, each divided by the number of examples

def train_long(net, train_loader, test_loader, epochs=5, lr=0.01, optimizer=None, loss_fn=nn.NLLLoss(), print_freq=10):
    optimizer = optimizer or torch.optim.Adam(net.parameters(), lr=lr) # use Adam optimizer if not provided
    for epoch in range(epochs):
        net.train() # training mode: enables dropout and batch norm statistics updates (does not change requires_grad)
        total_loss,acc,count =0.0,0,0
        for i, (features, labels) in enumerate(train_loader):
            lbls = labels.to(torch_device)
            optimizer.zero_grad() # reset the gradients to zero before each batch to avoid accumulation
            out=net(features.to(torch_device)) # forward pass of the mini-batch through the network to obtain the outputs
            loss = loss_fn(out, lbls) # compute the loss
            loss.backward() # compute the gradients of the loss with respect to the trainable parameters
            optimizer.step() # update the trainable parameters using the gradients to minimize the loss
            total_loss+=loss.item() # accumulate the loss as a Python float so the autograd graph is not kept alive
            _,preds=torch.max(out,dim=1) # index of the largest output per example, i.e. the predicted class
            acc+=(preds==lbls).sum().item() # accumulate the correct predictions
            count+=len(lbls) # accumulate the total number of examples
            if i%print_freq==0:
                print(f'Epoch {epoch}, iter {i}, loss={total_loss/count:.3f}, acc={acc/count:.3f}')
        vl, va = validate(net, test_loader, loss_fn=loss_fn)
        print(f'Epoch {epoch}, val_loss={vl:.3f}, val_acc={va:.3f}')

train_long(model, trainloader, testloader, loss_fn=torch.nn.CrossEntropyLoss(),epochs=1, print_freq=90)

Epoch 0, iter 0, loss=0.024, acc=0.438
Epoch 0, iter 90, loss=0.007, acc=0.925
Epoch 0, iter 180, loss=0.005, acc=0.940
Epoch 0, iter 270, loss=0.006, acc=0.941
Epoch 0, iter 360, loss=0.006, acc=0.945
Epoch 0, iter 450, loss=0.006, acc=0.949
Epoch 0, iter 540, loss=0.006, acc=0.949
Epoch 0, val_loss=0.008, val_acc=0.944
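
The loss values printed above look small because `CrossEntropyLoss` already averages over each mini-batch by default, and the loops then divide the accumulated batch means by the number of samples. The sketch below uses made-up per-sample losses (an assumption for illustration only) to contrast that reported quantity with the true per-sample mean.

```python
# Hypothetical per-sample losses for two mini-batches of size 4
batches = [
    [0.2, 0.4, 0.6, 0.8],
    [0.1, 0.3, 0.5, 0.7],
]

sum_of_batch_means = 0.0
sum_of_sample_losses = 0.0
count = 0
for batch in batches:
    batch_mean = sum(batch) / len(batch)             # what a mean-reduced loss returns
    sum_of_batch_means += batch_mean
    sum_of_sample_losses += batch_mean * len(batch)  # undo the per-batch averaging
    count += len(batch)

reported = sum_of_batch_means / count       # the quantity the notebook's loops print
per_sample = sum_of_sample_losses / count   # average loss per example

print(round(reported, 4), round(per_sample, 4))
```

With equal-sized batches the two quantities differ by exactly the batch size, so with `batch_size=32` a printed loss of 0.006 corresponds to roughly 0.19 per example.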


# Summary

Notice that MobileNet achieves almost the same accuracy as VGG-16, and only slightly lower accuracy than the full-scale ResNet. The main advantage of small models, such as MobileNet or ResNet-18, is that they can be used on mobile devices.