# CNN
## Introduction

This notebook is an attempt to repurpose and finetune an EfficientNet model to the task of American Sign Language detection for the DSPRO2 project at HSLU.

## Setup
In this section all the necessary libraries are imported.

In [2]:
%pip install multidict==6.0.4
%pip install fiftyone
%pip install kornia

Collecting multidict==6.0.4
  Using cached multidict-6.0.4-cp312-cp312-linux_x86_64.whl
Installing collected packages: multidict
  Attempting uninstall: multidict
    Found existing installation: multidict 6.3.2
    Uninstalling multidict-6.3.2:
      Successfully uninstalled multidict-6.3.2
Successfully installed multidict-6.0.4
Note: you may need to restart the kernel to use updated packages.
Collecting scipy (from fiftyone)
  Downloading scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting typing-extensions>=4.5.0 (from strawberry-graphql>=0.262.4->fiftyone)
  Using cached typing_extensions-4.13.2-py3-none-any.whl.metadata (3.0 kB)
Downloading scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m37.3/37.3 MB[0m [31m118.9 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hUsing cached typing_extensions-4.13.2-py3-none-any.whl (45 kB)
Installin

In [3]:
%pip install --upgrade --force-reinstall -r requirements.txt

Collecting aiohappyeyeballs==2.6.1 (from -r requirements.txt (line 1))
  Using cached aiohappyeyeballs-2.6.1-py3-none-any.whl.metadata (5.9 kB)
Collecting aiohttp==3.11.16 (from -r requirements.txt (line 2))
  Using cached aiohttp-3.11.16-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting aiosignal==1.3.2 (from -r requirements.txt (line 3))
  Using cached aiosignal-1.3.2-py2.py3-none-any.whl.metadata (3.8 kB)
Collecting annotated-types==0.7.0 (from -r requirements.txt (line 4))
  Downloading annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting asttokens==3.0.0 (from -r requirements.txt (line 5))
  Downloading asttokens-3.0.0-py3-none-any.whl.metadata (4.7 kB)
Collecting attrs==25.3.0 (from -r requirements.txt (line 6))
  Using cached attrs-25.3.0-py3-none-any.whl.metadata (10 kB)
Collecting certifi==2025.1.31 (from -r requirements.txt (line 7))
  Downloading certifi-2025.1.31-py3-none-any.whl.metadata (2.5 kB)
Collecting charset-normal

In [1]:
import wandb
import torch
import torch.nn as nn
import torchvision.models as visionmodels
import torchvision.transforms.v2 as transforms
import lightning as L

from lightning.pytorch.loggers import WandbLogger

import nbformat

from typing import Callable

import os

# Our own modules
import models.sweep_helper as sweep_helper

from datapipeline.asl_image_data_module import ASLImageDataModule
from datapipeline.asl_kaggle_image_data_module import ASLKaggleImageDataModule, DEFAULT_TRANSFORMS
from datapipeline.asl_transforms import ExtractHand, RandomBackgroundNoise, RandomRealLifeBackground
from models.asl_model import ASLModel
from models.training import sweep, train_model

Downloading split 'train' to '/home/jovyan/fiftyone/open-images-v7/train' if necessary
Necessary images already downloaded
Existing download of split 'train' is sufficient
Loading 'open-images-v7' split 'train'
 100% |███████████████| 1000/1000 [5.6s elapsed, 0s remaining, 223.5 samples/s]      
Dataset 'open-images-v7-train-1000' created
Downloading split 'validation' to '/home/jovyan/fiftyone/open-images-v7/validation' if necessary
Necessary images already downloaded
Existing download of split 'validation' is sufficient
Loading 'open-images-v7' split 'validation'
 100% |███████████████| 1000/1000 [2.1s elapsed, 0s remaining, 468.2 samples/s]      
Dataset 'open-images-v7-validation-1000' created
Downloading split 'test' to '/home/jovyan/fiftyone/open-images-v7/test' if necessary
Necessary images already downloaded
Existing download of split 'test' is sufficient
Loading 'open-images-v7' split 'test'
 100% |███████████████| 1000/1000 [2.0s elapsed, 0s remaining, 514.3 samples/s]      


In [2]:
os.environ["WANDB_NOTEBOOK_NAME"] = "cnn.ipynb"

## Preprocessing
No general data preprocessing is necessary, however there will be random transforms applied to the images during training. The images are resized to 224x224 pixels, which is the input size of the EfficientNet model. The images are also normalized using the mean and standard deviation of the ImageNet dataset, which is the dataset on which the EfficientNet model was pretrained.

The following cells will show the loading of the dataset and the preparation of the mentioned transforms.

In [3]:
PATH = "/exchange/dspro2/silent-speech/ASL_Pictures_Dataset"
PATH = r"C:\Temp\silent-speech"

In [14]:
# datamodule = ASLImageDataModule(path=PATH, val_split_folder="Validation", batch_size=32, num_workers=128)
datamodule = ASLKaggleImageDataModule(path=PATH, train_transforms=DEFAULT_TRANSFORMS.TRAIN, valid_transforms=DEFAULT_TRANSFORMS.VALID, test_transforms=DEFAULT_TRANSFORMS.TEST, batch_size=32, num_workers=20)

## Models

In [5]:
NUM_CLASSES = 28

In [6]:
class ASLCNN_fixed_layers(nn.Module):
    def __init__(self, kernel_size: int, dropout: float =0.2, hidden_dim: int =128, adaptive_pool_size: int =4):
        super().__init__()
        self.kernel_size = kernel_size
        self.hidden_dim = hidden_dim
        self.adaptive_pool_size = adaptive_pool_size
        self.model = nn.Sequential(
            # nn.Conv2d(input_channel, output) output can be chosen freely.
            nn.Conv2d(3, 32, kernel_size=self.kernel_size, stride=1, padding=1),
            nn.ReLU(),            
            nn.MaxPool2d(2),
            
            nn.Conv2d(32, 64, kernel_size=self.kernel_size, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            
            nn.Conv2d(64, 128, kernel_size=self.kernel_size, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            
            nn.AdaptiveAvgPool2d((self.adaptive_pool_size, self.adaptive_pool_size)), # To prevent OOM
            nn.Flatten(),
            
            nn.LazyLinear(self.hidden_dim), # Automatic adaptation to output -"compression"
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(self.hidden_dim, NUM_CLASSES),
        )


    def forward(self, x):
        x = self.model(x)
        return x

    # TODO CHECK
    def get_main_params(self):
        yield from self.model.classifier.parameters()

    def get_finetune_params(self):
        yield from self.model.features.parameters()

## Training

In [7]:
def get_asl_cnn_model(kernel_size:int, dropout:float, hidden_dim:int, adaptive_pool_size:int):
    cnn_model = ASLCNN_fixed_layers(kernel_size=kernel_size, dropout=dropout, hidden_dim=hidden_dim, adaptive_pool_size=adaptive_pool_size)
    print('cnn_model instantiated')
    return cnn_model

def get_cnn_model_from_config(config: dict) -> nn.Module:
    cnn_model = get_asl_cnn_model(config[KERNEL_SIZE], config[DROPOUT], config[HIDDEN_DIM], config[ADAPTIVE_POOL_SIZE])
    print('cnn model from config returned')
    return cnn_model

In [9]:
run_id = 0
SEED = 42

def train_cnn():
    train_model("cnn_with_fixed_layers", get_cnn_model_from_config, datamodule, get_optimizer=sweep_helper.get_optimizer, seed=SEED)

In [13]:
DROPOUT = "DROPOUT"
HIDDEN_DIM = "HIDDEN_DIM"
KERNEL_SIZE = "KERNEL_SIZE"
ADAPTIVE_POOL_SIZE = "adaptive_pool_size"

cnn_sweep_config = {
    "name": "cnn-fixed-layer",
    "method": "bayes",
    "metric": {
        "name": f"{ASLModel.VALID_ACCURACY}",
        "goal": "maximize"
    },
    "early_terminate": {
        "type": "hyperband",
        "min_iter": 5
    },
    "parameters": {
        KERNEL_SIZE: {
            "values": [3, 5]
        },
        DROPOUT: {
            "min": 0.1,
            "max": 0.5
        },
        HIDDEN_DIM: {
            "values": [64, 128, 256]
        },
        ADAPTIVE_POOL_SIZE: {
            "values": [2, 4, 8]
        },
        sweep_helper.OPTIMIZER: {
            "parameters": {
                sweep_helper.TYPE: {
                    "values": [sweep_helper.OptimizerType.RMSPROP,
                              sweep_helper.OptimizerType.ADAMW,]
                },
                sweep_helper.LEARNING_RATE: {
                    "min": 1e-5,
                    "max": 1e-3,
                    "distribution": "log_uniform_values"
                },
                sweep_helper.FINETUNE_LEARNING_RATE: {
                    "min": 1e-7,
                    "max": 1e-5,
                    "distribution": "log_uniform_values"
                },
                sweep_helper.WEIGHT_DECAY: {
                    "min": 0,
                    "max": 1e-3,
                },
                sweep_helper.MOMENTUM: {
                    "min": 0.8,
                    "max": 0.99
                },
            }
        },
        sweep_helper.LEARNING_RATE_SCHEDULER: {
            "parameters": {
                sweep_helper.TYPE: {
                    "values": [
                        sweep_helper.LearningRateSchedulerType.STEP,
                        sweep_helper.LearningRateSchedulerType.EXPONENTIAL
                    ]
                },
                sweep_helper.STEP_SIZE: {"value": 5},
                sweep_helper.GAMMA: {
                    "min": 0.1,
                    "max": 0.9
                }
            }
        }
    }
}


In [14]:
# Donner le nom de l'équipe et du projet comme dans le modèle, pour que ça envoie au bon endroit.
#Uncomment line below to start the sweep
#sweep(sweep_config=cnn_sweep_config, count=5, training_procedure=train_cnn)

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


Create sweep with ID: a78tm5hq
Sweep URL: https://wandb.ai/dspro2-silent-speech/silent-speech/sweeps/a78tm5hq


[34m[1mwandb[0m: Agent Starting Run: wh3yz7k1 with config:
[34m[1mwandb[0m: 	DROPOUT: 0.28915505934977037
[34m[1mwandb[0m: 	HIDDEN_DIM: 64
[34m[1mwandb[0m: 	KERNEL_SIZE: 3
[34m[1mwandb[0m: 	adaptive_pool_size: 2
[34m[1mwandb[0m: 	learning_rate_scheduler: {'gamma': 0.37531067053629064, 'step_size': 5, 'type': 'exponential'}
[34m[1mwandb[0m: 	optimizer: {'finetune_learning_rate': 2.90483873151357e-07, 'learning_rate': 5.391722838593032e-05, 'momentum': 0.8218145887916356, 'type': 'adamw', 'weight_decay': 0.0007671746539818698}
Seed set to 42
[34m[1mwandb[0m: Currently logged in as: [33mshse13[0m ([33mdspro2-silent-speech[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Seed set to 42
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA A16') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision


cnn_model instantiated
cnn model from config returned


/opt/conda/lib/python3.12/site-packages/lightning/pytorch/loggers/wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/opt/conda/lib/python3.12/site-packages/lightning/pytorch/utilities/model_summary/model_summary.py:477: The total number of parameters detected may be inaccurate because the model contains an instance of `UninitializedParameter`. To get an accurate number, set `self.example_input_array` in your LightningModule.

  | Name           | Type                | Params | Mode 
---------------------------------------------------------------
0 | model          | ASLCNN_fixed_layers | 95.1 K | train
1 | criterion      | CrossEntropyLoss    | 0      | train
2 | train_accuracy | MulticlassAccuracy  | 0      | train
3 | valid_accuracy | MulticlassAccuracy  | 0      | train
4 | test_accuracy  | 

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved. New best score: 0.034
Metric train_accuracy improved. New best score: 0.000


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.051 >= min_delta = 0.0. New best score: 0.085
Metric train_accuracy improved by 0.167 >= min_delta = 0.0. New best score: 0.167


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.009 >= min_delta = 0.0. New best score: 0.094


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.009 >= min_delta = 0.0. New best score: 0.103
Metric train_accuracy improved by 0.167 >= min_delta = 0.0. New best score: 0.333


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Monitored metric valid_accuracy did not improve in the last 5 records. Best score: 0.103. Signaling Trainer to stop.
Monitored metric train_accuracy did not improve in the last 5 records. Best score: 0.333. Signaling Trainer to stop.


0,1
epoch,▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▃▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇██
lr-AdamW,█████▄▄▄▄▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr-AdamW-momentum,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr-AdamW-weight_decay,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_accuracy,▂▂▂▂█▁▅▃▂▅▃▃▁▂▂▅▃▆▂▃▇▅▁▆▄▄▃▃▄▃▂▄▆▂▃▅▃▅▃▃
train_loss,▇▇▇▆▄▄▅▅▅▅▅▅▁▄▄▄▇▄▄▂▅▅█▇▅▄▆▃▁▂▃▃▂▁▃▅▄▄▃▂
trainer/global_step,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▅▅▅▅▅▅▅▅▅▅▆▆▆▇▇▇▇████
valid_accuracy,▁▆▅▅▇█▆▆▇▇▆
valid_loss,█▅▅▁▂▁▃▃▄▄▃

0,1
epoch,10.0
lr-AdamW,0.0
lr-AdamW-momentum,0.9
lr-AdamW-weight_decay,0.00077
train_accuracy,0.0625
train_loss,3.28128
trainer/global_step,56957.0
valid_accuracy,0.08547
valid_loss,3.24425


[34m[1mwandb[0m: Agent Starting Run: fcxce3fk with config:
[34m[1mwandb[0m: 	DROPOUT: 0.3569691689929215
[34m[1mwandb[0m: 	HIDDEN_DIM: 128
[34m[1mwandb[0m: 	KERNEL_SIZE: 3
[34m[1mwandb[0m: 	adaptive_pool_size: 8
[34m[1mwandb[0m: 	learning_rate_scheduler: {'gamma': 0.157316896126446, 'step_size': 5, 'type': 'exponential'}
[34m[1mwandb[0m: 	optimizer: {'finetune_learning_rate': 5.255982421089441e-07, 'learning_rate': 1.200842714958225e-05, 'momentum': 0.8293977200953548, 'type': 'rmsprop', 'weight_decay': 0.0005727954169725348}
Seed set to 42


Seed set to 42
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


cnn_model instantiated
cnn model from config returned


/opt/conda/lib/python3.12/site-packages/lightning/pytorch/loggers/wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type                | Params | Mode 
---------------------------------------------------------------
0 | model          | ASLCNN_fixed_layers | 96.9 K | train
1 | criterion      | CrossEntropyLoss    | 0      | train
2 | train_accuracy | MulticlassAccuracy  | 0      | train
3 | valid_accuracy | MulticlassAccuracy  | 0      | train
4 | test_accuracy  | MulticlassAccuracy  | 0      | train
---------------------------------------------------------------
96.9 K    Trainable params
0         Non-trainable params
96.9 K    Total params
0.387     Total estimated model params size (MB)
21        Modules in train mode
0         Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved. New best score: 0.402
Metric train_accuracy improved. New best score: 0.167


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.103 >= min_delta = 0.0. New best score: 0.504
Metric train_accuracy improved by 0.167 >= min_delta = 0.0. New best score: 0.333


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric train_accuracy improved by 0.333 >= min_delta = 0.0. New best score: 0.667


Validation: |          | 0/? [00:00<?, ?it/s]

Monitored metric valid_accuracy did not improve in the last 5 records. Best score: 0.504. Signaling Trainer to stop.


0,1
epoch,▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇█████
lr-RMSprop,██████▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr-RMSprop-momentum,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr-RMSprop-weight_decay,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_accuracy,▁▃▃▂▃▅▆▄▄▄▆▆▄▆▅▅▆▆▆▅▆▆▅▄▆▇▆▅▆█▆▄▆▄▄▄▄▅▆▅
train_loss,██▆▇▅▄▃▄▃▄▄▃▂▃▄▄▁▃▁▅▃▂▅▃▁▄▃▄▃▃▃▆▃▁▃▃▄▁▃▃
trainer/global_step,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇█
valid_accuracy,▁█▅▇▆▆▅
valid_loss,█▃▄▁▄▃▃

0,1
epoch,6.0
lr-RMSprop,0.0
lr-RMSprop-momentum,0.8294
lr-RMSprop-weight_decay,0.00057
train_accuracy,0.09375
train_loss,2.58141
trainer/global_step,36245.0
valid_accuracy,0.46154
valid_loss,1.78255


[34m[1mwandb[0m: Agent Starting Run: 5em0k8l2 with config:
[34m[1mwandb[0m: 	DROPOUT: 0.3873718073995073
[34m[1mwandb[0m: 	HIDDEN_DIM: 256
[34m[1mwandb[0m: 	KERNEL_SIZE: 3
[34m[1mwandb[0m: 	adaptive_pool_size: 8
[34m[1mwandb[0m: 	learning_rate_scheduler: {'gamma': 0.7557015208586656, 'step_size': 5, 'type': 'step'}
[34m[1mwandb[0m: 	optimizer: {'finetune_learning_rate': 1.985679900059601e-07, 'learning_rate': 2.971796904025888e-05, 'momentum': 0.957088053459289, 'type': 'adamw', 'weight_decay': 0.0005969174483618739}
Seed set to 42


Seed set to 42
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


cnn_model instantiated
cnn model from config returned


/opt/conda/lib/python3.12/site-packages/lightning/pytorch/loggers/wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type                | Params | Mode 
---------------------------------------------------------------
0 | model          | ASLCNN_fixed_layers | 100 K  | train
1 | criterion      | CrossEntropyLoss    | 0      | train
2 | train_accuracy | MulticlassAccuracy  | 0      | train
3 | valid_accuracy | MulticlassAccuracy  | 0      | train
4 | test_accuracy  | MulticlassAccuracy  | 0      | train
---------------------------------------------------------------
100 K     Trainable params
0         Non-trainable params
100 K     Total params
0.402     Total estimated model params size (MB)
21        Modules in train mode
0         Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved. New best score: 0.197
Metric train_accuracy improved. New best score: 0.000


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.154 >= min_delta = 0.0. New best score: 0.350
Metric train_accuracy improved by 0.333 >= min_delta = 0.0. New best score: 0.333


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.128 >= min_delta = 0.0. New best score: 0.479


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.188 >= min_delta = 0.0. New best score: 0.667


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric train_accuracy improved by 0.500 >= min_delta = 0.0. New best score: 0.833


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.085 >= min_delta = 0.0. New best score: 0.752


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Monitored metric train_accuracy did not improve in the last 5 records. Best score: 0.833. Signaling Trainer to stop.


0,1
epoch,▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▄▄▄▄▄▅▅▅▅▅▅▅▅▅▆▆▆▇▇▇█████
lr-AdamW,████████████████▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▁▁▁▁▁
lr-AdamW-momentum,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr-AdamW-weight_decay,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_accuracy,▂▁▂▂▃▂▃▂▃▄▅▄▅▅▇▆▆▇▅▇▇▇█▆█▆▇▅▆█▇█▆████▆▇▆
train_loss,██▇█▇▇▆▆▆▅▅▅▆▄▅▅▄▅▄▄▅▄▄▄▄▃▃▂▃▂▂▂▃▂▂▁▁▁▂▁
trainer/global_step,▁▁▁▂▂▂▂▂▃▃▃▃▃▃▃▃▃▃▃▄▄▄▅▅▆▆▆▆▆▇▇▇▇▇▇█████
valid_accuracy,▁▃▅▇▆▇▇███▇
valid_loss,█▆▅▃▃▂▂▁▂▁▁

0,1
epoch,10.0
lr-AdamW,2e-05
lr-AdamW-momentum,0.9
lr-AdamW-weight_decay,0.0006
train_accuracy,0.5625
train_loss,1.18163
trainer/global_step,56957.0
valid_accuracy,0.70085
valid_loss,0.75875


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: e18zxurx with config:
[34m[1mwandb[0m: 	DROPOUT: 0.36439995516415624
[34m[1mwandb[0m: 	HIDDEN_DIM: 128
[34m[1mwandb[0m: 	KERNEL_SIZE: 5
[34m[1mwandb[0m: 	adaptive_pool_size: 2
[34m[1mwandb[0m: 	learning_rate_scheduler: {'gamma': 0.5347582644491832, 'step_size': 5, 'type': 'step'}
[34m[1mwandb[0m: 	optimizer: {'finetune_learning_rate': 2.821299765457936e-07, 'learning_rate': 0.0005821971270945832, 'momentum': 0.9614346108293133, 'type': 'adamw', 'weight_decay': 0.0007264262597527481}
Seed set to 42


Seed set to 42
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


cnn_model instantiated
cnn model from config returned


/opt/conda/lib/python3.12/site-packages/lightning/pytorch/loggers/wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type                | Params | Mode 
---------------------------------------------------------------
0 | model          | ASLCNN_fixed_layers | 262 K  | train
1 | criterion      | CrossEntropyLoss    | 0      | train
2 | train_accuracy | MulticlassAccuracy  | 0      | train
3 | valid_accuracy | MulticlassAccuracy  | 0      | train
4 | test_accuracy  | MulticlassAccuracy  | 0      | train
---------------------------------------------------------------
262 K     Trainable params
0         Non-trainable params
262 K     Total params
1.049     Total estimated model params size (MB)
21        Modules in train mode
0         Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved. New best score: 0.632
Metric train_accuracy improved. New best score: 0.333


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.085 >= min_delta = 0.0. New best score: 0.718
Metric train_accuracy improved by 0.500 >= min_delta = 0.0. New best score: 0.833


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.094 >= min_delta = 0.0. New best score: 0.812
Metric train_accuracy improved by 0.167 >= min_delta = 0.0. New best score: 1.000


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.060 >= min_delta = 0.0. New best score: 0.872


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.009 >= min_delta = 0.0. New best score: 0.880


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.051 >= min_delta = 0.0. New best score: 0.932


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.043 >= min_delta = 0.0. New best score: 0.974
Monitored metric train_accuracy did not improve in the last 5 records. Best score: 1.000. Signaling Trainer to stop.


0,1
epoch,▁▁▁▁▁▂▂▂▂▂▃▃▃▃▄▄▄▅▅▅▅▅▅▅▆▆▆▇▇▇▇▇▇███████
lr-AdamW,███████████████████████████▁▁▁▁▁▁▁▁▁▁▁▁▁
lr-AdamW-momentum,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr-AdamW-weight_decay,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_accuracy,▁▃▅▆▆▇▇▅▆▆▆▇▅▇▇▇▆▆▇▇▆▇▆▆█▇▇█▆▇█▇▇▇▇▇██▇█
train_loss,█▇▅▄▃▄▄▄▄▃▄▂▄▄▃▂▂▁▁▁▂▁▂▂▂▁▁▂▂▂▁▁▁▁▁▁▁▁▂▁
trainer/global_step,▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
valid_accuracy,▁▃▅▆▆▇▇█
valid_loss,█▅▅▃▄▂▃▁

0,1
epoch,7.0
lr-AdamW,0.00031
lr-AdamW-momentum,0.9
lr-AdamW-weight_decay,0.00073
train_accuracy,0.9375
train_loss,0.2793
trainer/global_step,41423.0
valid_accuracy,0.97436
valid_loss,0.10888


[34m[1mwandb[0m: Agent Starting Run: f2ls0247 with config:
[34m[1mwandb[0m: 	DROPOUT: 0.3921593467720055
[34m[1mwandb[0m: 	HIDDEN_DIM: 256
[34m[1mwandb[0m: 	KERNEL_SIZE: 5
[34m[1mwandb[0m: 	adaptive_pool_size: 2
[34m[1mwandb[0m: 	learning_rate_scheduler: {'gamma': 0.5700421186762996, 'step_size': 5, 'type': 'step'}
[34m[1mwandb[0m: 	optimizer: {'finetune_learning_rate': 2.507693347899161e-07, 'learning_rate': 0.0003640022645743407, 'momentum': 0.8812241020684453, 'type': 'adamw', 'weight_decay': 0.0006674434055487764}
Seed set to 42


Seed set to 42
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


cnn_model instantiated
cnn model from config returned


/opt/conda/lib/python3.12/site-packages/lightning/pytorch/loggers/wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type                | Params | Mode 
---------------------------------------------------------------
0 | model          | ASLCNN_fixed_layers | 265 K  | train
1 | criterion      | CrossEntropyLoss    | 0      | train
2 | train_accuracy | MulticlassAccuracy  | 0      | train
3 | valid_accuracy | MulticlassAccuracy  | 0      | train
4 | test_accuracy  | MulticlassAccuracy  | 0      | train
---------------------------------------------------------------
265 K     Trainable params
0         Non-trainable params
265 K     Total params
1.063     Total estimated model params size (MB)
21        Modules in train mode
0         Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved. New best score: 0.701
Metric train_accuracy improved. New best score: 0.500


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.034 >= min_delta = 0.0. New best score: 0.735
Metric train_accuracy improved by 0.500 >= min_delta = 0.0. New best score: 1.000


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.154 >= min_delta = 0.0. New best score: 0.889


Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.026 >= min_delta = 0.0. New best score: 0.915


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric valid_accuracy improved by 0.068 >= min_delta = 0.0. New best score: 0.983


Validation: |          | 0/? [00:00<?, ?it/s]

Monitored metric train_accuracy did not improve in the last 5 records. Best score: 1.000. Signaling Trainer to stop.


0,1
epoch,▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▃▃▃▅▅▅▅▅▅▅▆▆▇▇▇██████████
lr-AdamW,██████████████████████████████████▁▁▁▁▁▁
lr-AdamW-momentum,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr-AdamW-weight_decay,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_accuracy,▁▁▂▃▄▄▅▄▅▆▆▅▅▆▆▆▇▆▆▇▆▇▇▇█▇█▇▇█▇▆███▇▇███
train_loss,█▆▆▆█▅▅▅▅▄▃▃▄▃▂▃▃▂▂▃▂▂▂▁▂▂▃▂▂▂▂▂▁▂▁▁▁▁▁▁
trainer/global_step,▁▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇███
valid_accuracy,▁▂▆▆▆██
valid_loss,█▇▃▂▃▁▃

0,1
epoch,6.0
lr-AdamW,0.00021
lr-AdamW-momentum,0.9
lr-AdamW-weight_decay,0.00067
train_accuracy,0.96875
train_loss,0.13267
trainer/global_step,36245.0
valid_accuracy,0.96581
valid_loss,0.32556


## Evaluation

In [8]:
from models.evaluation import Evaluation
from models.training import PROJECT_NAME, ENTITY_NAME

In [15]:
#architecture = get_asl_efficientnet_model("finetune", get_pretrained_efficientnet_model("b0"), 0, 1)
#architecture = get_asl_cnn_model(config[KERNEL_SIZE], config[DROPOUT], config[HIDDEN_DIM], config[ADAPTIVE_POOL_SIZE])
architecture = get_asl_cnn_model(5,  0.36439995516415624, 128, 2)
PATH = "/exchange/dspro2/silent-speech/Test_Images"
#Rerun datamodule cell to update path

evaluation = Evaluation("cnn-with-fixed-layers-4-eval", project=PROJECT_NAME, entity=ENTITY_NAME, model_architecture=architecture, artifact="dspro2-silent-speech/silent-speech/model-e18zxurx:v3", datamodule=datamodule)

cnn_model instantiated


In [16]:
evaluation()

[34m[1mwandb[0m:   1 of 1 files downloaded.  
You are using the plain ModelCheckpoint callback. Consider using LitModelCheckpoint which with seamless uploading to Model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Downloading from https://www.kaggle.com/api/v1/datasets/download/kapillondhe/american-sign-language?dataset_version_number=1...


100%|██████████| 4.64G/4.64G [02:35<00:00, 32.0MB/s]

Extracting files...



/opt/conda/lib/python3.12/site-packages/lightning/pytorch/loggers/wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: |          | 0/? [00:00<?, ?it/s]

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      test_accuracy         0.9285714030265808
        test_loss           0.1564255803823471
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


0,1
epoch,▁
test_accuracy,▁
test_loss,▁
trainer/global_step,▁

0,1
epoch,0.0
test_accuracy,0.92857
test_loss,0.15643
trainer/global_step,0.0


In [30]:
trainer = L.Trainer(accelerator="auto")

You are using the plain ModelCheckpoint callback. Consider using LitModelCheckpoint which with seamless uploading to Model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


In [46]:
trainer.test(model, datamodule)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Split folders already exist, skipping distribution.


c:\Users\kybur\Repos\HSLU\dspro2\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:420: Consider setting `persistent_workers=True` in 'test_dataloader' to speed up the dataloader worker initialization.


Testing DataLoader 0: 100%|██████████| 1037/1037 [01:39<00:00, 10.43it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   test_accuracy_epoch      0.9975572228431702
     test_loss_epoch       0.012213773094117641
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


[{'test_loss_epoch': 0.012213773094117641,
  'test_accuracy_epoch': 0.9975572228431702}]

In [43]:
import torchvision.datasets as datasets
import lightning as L
from torch.utils.data import Dataset, DataLoader

data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToImage(),
    transforms.ToDtype(torch.float32, scale=True),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) # ImageNet stats
])

class PredictDataModule(L.LightningDataModule):
    def __init__(self, path: str, batch_size: int = 32, num_workers: int = 0):
        super().__init__()
        self.path = path
        self.batch_size = batch_size
        self.num_workers = num_workers

    def setup(self, stage: str):
        self.predict_dataset = datasets.ImageFolder(root=r"C:\Users\kybur\Downloads", transform=data_transforms, allow_empty=True)

    def predict_dataloader(self):
        return torch.utils.data.DataLoader(self.predict_dataset, batch_size=self.batch_size, num_workers=self.num_workers)

In [44]:
predict_datamodule = PredictDataModule(path=PATH, batch_size=32, num_workers=20)
preds = trainer.predict(model, datamodule=predict_datamodule)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00,  8.35it/s]


In [45]:
for pred in preds:
    probabilities = nn.Softmax(dim=-1)(pred)
    pred = torch.argmax(probabilities, dim=-1)
    # print(probabilities)
    print(pred)

tensor([17, 17, 14, 19, 14, 14,  7, 14,  7])


In [40]:
datamodule.test_dataset.classes[7],datamodule.test_dataset.classes[17],datamodule.test_dataset.classes[19]

('H', 'Q', 'S')