# ResNet - Repurposing/Fine-tuning
## Introduction

This notebook attempts to repurpose and fine-tune a ResNet model for the task of American Sign Language detection, as part of the DSPRO2 project at HSLU.

## Setup
This section installs the dependencies and imports the required libraries.

In [1]:
%pip install -r requirements.txt -q

Note: you may need to restart the kernel to use updated packages.


In [1]:
import wandb
import torch
import torch.nn as nn
import torchvision.models as visionmodels
import torchvision.transforms.v2 as transforms
import lightning as L

from lightning.pytorch.loggers import WandbLogger

import nbformat

from typing import Callable

import os

# Our own modules
import models.sweep_helper as sweep_helper

from datapipeline.asl_image_data_module import ASLImageDataModule
from datapipeline.asl_kaggle_image_data_module import ASLKaggleImageDataModule, DEFAULT_TRANSFORMS
from models.asl_model import ASLModel
from models.training import sweep, train_model, PROJECT_NAME, ENTITY_NAME

INFO:fiftyone.zoo.datasets:Downloading split 'train' to 'C:\Users\kybur\fiftyone\open-images-v7\train' if necessary
INFO:fiftyone.utils.openimages:Necessary images already downloaded
INFO:fiftyone.zoo.datasets:Existing download of split 'train' is sufficient
INFO:fiftyone.zoo.datasets:Loading existing dataset 'open-images-v7-train-1000'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
INFO:fiftyone.zoo.datasets:Downloading split 'validation' to 'C:\Users\kybur\fiftyone\open-images-v7\validation' if necessary
INFO:fiftyone.utils.openimages:Necessary images already downloaded
INFO:fiftyone.zoo.datasets:Existing download of split 'validation' is sufficient
INFO:fiftyone.zoo.datasets:Loading existing dataset 'open-images-v7-validation-1000'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
INFO:fiftyone.zoo.datasets:Downloading split 'test' to 'C:\Users\kybur\fiftyone\open-images-v7\test' if necessary
INFO:fiftyone.utils.openimages:Necessary images already downloaded
INFO:fiftyone.zoo.datasets:Existing download of split 'test' is sufficient
INFO:fiftyone.zoo.datasets:Loading existing dataset 'open-images-v7-test-1000'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use


  from .autonotebook import tqdm as notebook_tqdm


In [2]:
os.environ["WANDB_NOTEBOOK_NAME"] = "./dspro2/efficientnet.ipynb"

## Preprocessing
No general data preprocessing is necessary; however, random transforms are applied to the images during training. The images are resized to 224x224 pixels, the resolution at which the ResNet model was pretrained. They are also normalized using the mean and standard deviation of the ImageNet dataset, on which the ResNet model was pretrained.
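
As a rough illustration of the normalization step, the sketch below standardizes a single 8-bit RGB pixel in plain Python. The mean and standard deviation are the ImageNet statistics that also appear later in this notebook; the helper name `normalize_pixel` is made up for this example, and the actual pipeline would apply `transforms.Normalize` to whole image tensors instead.

```python
# ImageNet per-channel statistics (R, G, B), as used in the prediction transforms below.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel.

    Hypothetical helper for illustration only.
    """
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


# A black and a white pixel mark the extremes of the normalized value range.
print(normalize_pixel((0, 0, 0)))
print(normalize_pixel((255, 255, 255)))
```

After this step each channel is roughly zero-centered with unit variance over ImageNet, which is the input distribution the pretrained weights were fitted to.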

The following cells load the dataset and set up these transforms.

In [3]:
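# Linux path (overridden by the next assignment)
PATH = "/exchange/dspro2/silent-speech/ASL_Pictures_Dataset"
# Local Windows path; this is the value that takes effect
PATH = r"C:\Temp\silent-speech"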

In [4]:
# datamodule = ASLImageDataModule(path=PATH, val_split_folder="Validation", batch_size=32, num_workers=128)
datamodule = ASLKaggleImageDataModule(path=PATH, train_transforms=DEFAULT_TRANSFORMS.TRAIN, valid_transforms=DEFAULT_TRANSFORMS.VALID, test_transforms=DEFAULT_TRANSFORMS.TEST, batch_size=32, num_workers=20)

## Models

In [5]:
NUM_CLASSES = 28

In [6]:
class ASLResNet34(nn.Module):
    """Pretrained ResNet with a new classification head and optionally unfrozen final stages."""

    def __init__(self, resnet_model: visionmodels.resnet.ResNet, dropout: float = 0.2, unfreeze_layers: int = 0, num_classes: int = NUM_CLASSES):
        super().__init__()

        self.resnet_num_layers = 4  # layer1 ... layer4
        self.model = resnet_model
        # Freeze the whole pretrained backbone first.
        self.model.requires_grad_(False)

        # Replace the ImageNet head; the new layers are trainable by default.
        self.model.fc = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(self.model.fc.in_features, num_classes)
        )

        unfreeze_layers = min(unfreeze_layers, self.resnet_num_layers)

        # Unfreeze the last `unfreeze_layers` stages, starting from layer4.
        for layer_num in range(self.resnet_num_layers, self.resnet_num_layers - unfreeze_layers, -1):
            self.model.get_submodule(f"layer{layer_num}").requires_grad_(True)

    def forward(self, x):
        return self.model(x)

    def get_main_params(self):
        # Parameters of the new classification head.
        yield from self.model.fc.parameters()

    def get_finetune_params(self):
        # Trainable backbone parameters, i.e. everything unfrozen outside the head.
        for name, param in self.model.named_parameters():
            if not name.startswith("fc") and param.requires_grad:
                yield param
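
To make the effect of `unfreeze_layers` easier to see, the following sketch reproduces only the layer-selection logic of the constructor above in plain Python. The function name `layers_to_unfreeze` is hypothetical and not part of the project code.

```python
RESNET_NUM_LAYERS = 4  # the class above assumes stages layer1 ... layer4


def layers_to_unfreeze(unfreeze_layers: int, num_layers: int = RESNET_NUM_LAYERS) -> list[str]:
    """Return the stage names that would be unfrozen, last stage first.

    Mirrors the clamping and the descending range used in ASLResNet34.__init__.
    """
    unfreeze_layers = min(unfreeze_layers, num_layers)
    return [f"layer{n}" for n in range(num_layers, num_layers - unfreeze_layers, -1)]


for n in (0, 1, 2, 10):
    print(n, layers_to_unfreeze(n))
```

With `unfreeze_layers=1` only `layer4` is fine-tuned, and values above 4 are clamped to all four stages. The stem (`conv1`, `bn1`) is never unfrozen by this logic.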

## Training

In [7]:
DROPOUT = "dropout"

In [8]:
def get_pretrained_resnet_model():
    resnet_model = visionmodels.resnet34(weights=visionmodels.ResNet34_Weights.DEFAULT)
    return resnet_model

In [9]:
UNFREEZE_LAYERS = "unfreeze_layers"


def get_asl_resnet_model(resnet_model: visionmodels.resnet.ResNet, dropout: float, unfreeze_layers: int = 0) -> nn.Module:
    model = ASLResNet34(resnet_model, dropout=dropout, unfreeze_layers=unfreeze_layers)
    return model

In [10]:
def get_resnet_model_from_config(config: dict) -> nn.Module:
    resnet_model = get_pretrained_resnet_model()
    model = get_asl_resnet_model(resnet_model, config[DROPOUT], config[UNFREEZE_LAYERS])
    return model

In [12]:
run_id = 0
SEED = 42


def train_resnet():
    train_model("resnet", get_resnet_model_from_config, datamodule, get_optimizer=sweep_helper.get_optimizer_with_finetune_group, seed=SEED)

In [13]:
sweep_config = {
    "name": "ResNet34",
    "method": "bayes",
    "metric": {
        "name": f"{ASLModel.VALID_ACCURACY}",
        "goal": "maximize"
    },
    "early_terminate": {
        "type": "hyperband",
        "min_iter": 5
    },
    "parameters": {
        UNFREEZE_LAYERS: {
            "value": 1
        },
        DROPOUT: {
            "min": 0.1,
            "max": 0.5
        },
        sweep_helper.OPTIMIZER: {
            "parameters": {
                sweep_helper.TYPE: {
                    "value": sweep_helper.OptimizerType.RMSPROP
                },
                sweep_helper.LEARNING_RATE: {
                    "min": 1e-5,
                    "max": 1e-3,
                    "distribution": "log_uniform_values"
                },
                sweep_helper.FINETUNE_LEARNING_RATE: {
                    "min": 1e-7,
                    "max": 1e-5,
                    "distribution": "log_uniform_values"
                },
                sweep_helper.WEIGHT_DECAY: {
                    "min": 0,
                    "max": 1e-3,
                },
                sweep_helper.MOMENTUM: {
                    "min": 0.8,
                    "max": 0.99
                }
            }
        },
        sweep_helper.LEARNING_RATE_SCHEDULER: {
            "parameters": {
                sweep_helper.TYPE: {
                    "values": [sweep_helper.LearningRateSchedulerType.STEP, sweep_helper.LearningRateSchedulerType.EXPONENTIAL]
                },
                sweep_helper.STEP_SIZE: {"value": 5},
                sweep_helper.GAMMA: {
                    "min": 0.1,
                    "max": 0.9
                }
            }
        }
    }
}
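
The sweep samples `gamma` from the same range for both scheduler types, but the two schedules react to it very differently. The sketch below assumes that `STEP` and `EXPONENTIAL` map to PyTorch's `StepLR` and `ExponentialLR` (the `sweep_helper` implementation is not shown here, so this is an assumption) and evaluates their closed-form learning rates in plain Python. The function names are made up for this example.

```python
def step_lr(initial_lr: float, gamma: float, epoch: int, step_size: int = 5) -> float:
    """Learning rate under a step schedule: multiply by gamma every `step_size` epochs."""
    return initial_lr * gamma ** (epoch // step_size)


def exponential_lr(initial_lr: float, gamma: float, epoch: int) -> float:
    """Learning rate under an exponential schedule: multiply by gamma every epoch."""
    return initial_lr * gamma ** epoch


initial_lr = 1e-3  # upper end of the swept learning-rate range
for gamma in (0.1, 0.9):  # ends of the swept gamma range
    for epoch in (0, 5, 10):
        print(gamma, epoch, step_lr(initial_lr, gamma, epoch), exponential_lr(initial_lr, gamma, epoch))
```

Under this assumption, `gamma = 0.1` shrinks the learning rate by a factor of 10^2 after ten epochs with the step schedule (step size 5), but by a factor of 10^10 with the exponential schedule, so low `gamma` values may effectively stop training early in exponential runs.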

In [16]:
sweep(sweep_config=sweep_config, count=30, training_procedure=train_resnet)

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


Create sweep with ID: 936doymo
Sweep URL: https://wandb.ai/dspro2-silent-speech/silent-speech/sweeps/936doymo


[34m[1mwandb[0m: Agent Starting Run: hh8kuici with config:
[34m[1mwandb[0m: 	dropout: 0.2988697278905561
[34m[1mwandb[0m: 	learning_rate_scheduler: {'gamma': 0.16548742023922475, 'step_size': 5, 'type': 'step'}
[34m[1mwandb[0m: 	optimizer: {'finetune_learning_rate': 5.791687333197608e-06, 'learning_rate': 4.671117090322782e-05, 'momentum': 0.8479938708012824, 'type': 'rmsprop', 'weight_decay': 0.0008869032076292996}
[34m[1mwandb[0m: 	unfreeze_layers: 1
Seed set to 42
[34m[1mwandb[0m: Currently logged in as: [33mv8-luky[0m ([33mdspro2-silent-speech[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Seed set to 42
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 3080') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision


Split folders already exist, skipping distribution.


c:\Users\kybur\Repos\HSLU\dspro2\.venv\Lib\site-packages\lightning\pytorch\loggers\wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type               | Params | Mode 
--------------------------------------------------------------
0 | model          | ASLResNet34        | 21.3 M | train
1 | criterion      | CrossEntropyLoss   | 0      | train
2 | train_accuracy | MulticlassAccuracy | 0      | train
3 | valid_accuracy | MulticlassAccuracy | 0      | train
4 | test_accuracy  | MulticlassAccuracy | 0      | train
--------------------------------------------------------------
13.1 M    Trainable params
8.2 M     Non-trainable params
21.3 M    Total params
85.196    Total estimated model params size (MB)
123       Modules in train mode
0         Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

c:\Users\kybur\Repos\HSLU\dspro2\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:420: Consider setting `persistent_workers=True` in 'val_dataloader' to speed up the dataloader worker initialization.


## Evaluation

In [57]:
from models.evaluation import Evaluation

In [58]:
architecture = get_asl_resnet_model(get_pretrained_resnet_model(), 0, 0)

evaluation = Evaluation("resnet-8-eval", project=PROJECT_NAME, entity=ENTITY_NAME, model_architecture=architecture, artifact="dspro2-silent-speech/silent-speech/model-28omy6zd:v2", datamodule=datamodule)

In [59]:
model = evaluation.get_model()

[34m[1mwandb[0m: Downloading large artifact model-28omy6zd:v2, 233.66MB. 1 files... 
[34m[1mwandb[0m:   1 of 1 files downloaded.  
Done. 0:0:0.8


In [60]:
model.eval()

ASLModel(
  (model): ASLResNet34(
    (model): ResNet(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): BasicBlock(
          (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): BasicBlock(
          (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNor

In [61]:
trainer = L.Trainer(accelerator="auto")
trainer.test(model, datamodule)

You are using the plain ModelCheckpoint callback. Consider using LitModelCheckpoint which with seamless uploading to Model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Split folders already exist, skipping distribution.


c:\Users\kybur\Repos\HSLU\dspro2\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:420: Consider setting `persistent_workers=True` in 'test_dataloader' to speed up the dataloader worker initialization.

Detected KeyboardInterrupt, attempting graceful shutdown ...


NameError: name 'exit' is not defined

In [65]:
import torchvision.datasets as datasets
from torch.utils.data import DataLoader

# Deterministic preprocessing: resize, convert to a float tensor scaled to [0, 1],
# and normalize with the ImageNet statistics.
data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToImage(),
    transforms.ToDtype(torch.float32, scale=True),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # ImageNet stats
])

class PredictDataModule(L.LightningDataModule):
    def __init__(self, path: str, batch_size: int = 32, num_workers: int = 0):
        super().__init__()
        self.path = path
        self.batch_size = batch_size
        self.num_workers = num_workers

    def setup(self, stage: str):
        # Use the configured path instead of a hard-coded folder.
        # ImageFolder expects one subfolder per class under the root.
        self.predict_dataset = datasets.ImageFolder(root=self.path, transform=data_transforms, allow_empty=True)

    def predict_dataloader(self):
        return DataLoader(self.predict_dataset, batch_size=self.batch_size, num_workers=self.num_workers)

In [66]:
predict_datamodule = PredictDataModule(path=r"C:\Users\kybur\Downloads", batch_size=32, num_workers=20)
preds = trainer.predict(model, datamodule=predict_datamodule)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
c:\Users\kybur\Repos\HSLU\dspro2\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:420: Consider setting `persistent_workers=True` in 'predict_dataloader' to speed up the dataloader worker initialization.


Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00,  9.13it/s]


In [69]:
for pred in preds:
    probabilities = nn.Softmax(dim=-1)(pred)
    # Use a separate name so the loop variable is not overwritten.
    predicted_classes = torch.argmax(probabilities, dim=-1)
    # print(probabilities)
    print(predicted_classes)

tensor([ 0,  0, 14,  0,  0,  0, 21,  0,  0])


In [71]:
datamodule.test_dataset.classes[14], datamodule.test_dataset.classes[21]

('Nothing', 'T')
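
The prediction loop above applies a softmax before taking the argmax. Assuming the model outputs raw logits (the `ASLModel` prediction step is not shown here), the softmax is only needed when the probabilities themselves are of interest: it is strictly increasing, so the predicted class is the same with or without it. The plain-Python sketch below illustrates this with made-up logits; `softmax` and `argmax` are illustrative helpers, not project code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (first one in case of ties)."""
    return max(range(len(values)), key=values.__getitem__)


logits = [1.2, -0.3, 4.5, 0.0]  # made-up scores for four classes
probabilities = softmax(logits)

# The winning class is identical before and after the softmax.
print(argmax(logits), argmax(probabilities))
```

Keeping the probabilities can still be useful, for example to inspect how confident the model is on the images that were mostly assigned to class 0.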