[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io)

# AI Program

## Deep Learning - Image to Image - Image Segmentation

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 28/11/2025 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0099DeepLearningObjectDetection.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Image Processing and Computer Vision
import skimage as ski

# Deep Learning
import torch
import torch.nn            as nn
import torch.nn.functional as F
from torch.optim.optimizer import Optimizer
from torch.optim.lr_scheduler import LRScheduler
from torch.utils.data import DataLoader, Dataset
from torch.utils.tensorboard import SummaryWriter
import torchinfo
from torchmetrics.classification import MulticlassAccuracy
import torchvision
from torchvision.transforms import v2 as TorchVisionTrns

# Miscellaneous
from enum import auto, Enum
from enum import unique
import math
import os
from platform import python_version
import random
import shutil
from zipfile import ZipFile

# Typing
from typing import Callable, Dict, Generator, List, Optional, Self, Set, Tuple, Union
from numpy.typing import NDArray
from torch import Tensor

# Visualization
import matplotlib.pyplot as plt

# Jupyter
from IPython import get_ipython

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

# Improve performance by benchmarking
torch.backends.cudnn.benchmark = True

# Reproducibility (Per PyTorch Version on the same device)
# torch.manual_seed(seedNum)
# torch.backends.cudnn.deterministic = True
# torch.backends.cudnn.benchmark     = False #<! Makes things slower

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

PROJECT_NAME     = 'FixelCourses'
DATA_FOLDER_PATH = 'DataSets'
BASE_FOLDER      = os.getcwd()[:(len(os.getcwd()) - (os.getcwd()[::-1].lower().find(PROJECT_NAME.lower()[::-1])))]

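TENSOR_BOARD_BASE   = 'TB'

# MNIST constants used by the cells below (Assumed values, matching the 28x28 grayscale MNIST images)
L_CLASSES   = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
TU_IMG_SIZE = (28, 28, 1) #<! (H, W, C)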

In [None]:
# Download Auxiliary Modules for Google Colab
if runInGoogleColab:
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataManipulation.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DataVisualization.py
    !wget https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/master/AIProgram/2024_02/DeepLearningPyTorch.py

In [None]:
# Courses Packages

from DataManipulation import DownloadKaggleDataset
from DataVisualization import PlotLabelsHistogram, PlotMnistImages, PlotScatterData
from DeepLearningBlocks import NNMode
from DeepLearningPyTorch import ToTensor
from DeepLearningPyTorch import TrainModel

* <font color='blue'>(**!**)</font> Go through `GenLabeldDataEllipse()`.
* <font color='blue'>(**!**)</font> Go through `ObjectDetectionDataset`.
* <font color='blue'>(**!**)</font> Go through `YoloGrid`.

In [None]:
# General Auxiliary Functions

def TensorImageNumpy( tZ: Tensor ) -> NDArray:
    """
    Converts a PyTorch Tensor to a Numpy Array.
    """
    mZ = tZ.squeeze()
    mX = mZ.detach().cpu().numpy()

    return mX

## Auto Encoder

The _Object Detection_ task generalizes the _Object Localization_ task in 2 ways:

1. Support for many objects in the same image.
2. Detection of whether there is any object at all.

This notebook demonstrates:
 - Generating a synthetic data set.
 - Generating the _target_ data in the YOLO form.
 - Building a model for _Object Detection_.
 - Training a model with a composed objective.

<br>

* <font color='brown'>(**#**)</font> The _Object Detection_ in this notebook is applied in _YOLO_ style: Single Pass, Grid and Anchors.

In [None]:
# Parameters

# Data
dataFolder      = os.path.join(BASE_FOLDER, DATA_FOLDER_PATH)
kggleUser       = 'girish17019'
kaggleDataset   = 'mobile-phone-defect-segmentation-dataset'
tmpFileName     = 'TMP.zip'
dCls            = {0: 'None', 1: 'Scratch', 2: 'Stain', 3: 'Oil'} #<! Defect Classes
lCls            = list(dCls.keys())
numCls          = len(lCls) #<! Number of classes

# Model
latDim     = 2
latDimFctr = 8 #<! Linear Layer

# Training
batchSize   = 512
numWorkers  = 2 #<! Number of workers
numEpochs   = 5

# Visualization
numImg = 3

## Generate / Load Data

The data is the [Mobile Phone Screen Surface Defect Segmentation Dataset](https://github.com/jianzhang96/MSD).  

* <font color='brown'>(**#**)</font> The data is downloaded by the Kaggle version of the dataset: [Kaggle - Mobile Phone Screen Surface Defect Segmentation Dataset](https://www.kaggle.com/datasets/girish17019/mobile-phone-defect-segmentation-dataset).

In [None]:
# Download and Parse the Dataset

DownloadKaggleDataset(kggleUser, kaggleDataset, tmpFileName)

In [None]:
# Extract Files
# Will create:
# - `FixelCourses/DataSets/MSD/Images` - Contains all images.
# - `FixelCourses/DataSets/MSD/Masks` - Contains all masks.
# Files are: `<Class>_<ImgIdxCls>.png`

datasetFolderPath = os.path.join(dataFolder, 'MSD')
imgDatasetFolderPath = os.path.join(datasetFolderPath, 'Images')
maskDatasetFolderPath = os.path.join(datasetFolderPath, 'Masks')

# Delete existing folders if any
if os.path.exists(datasetFolderPath):
    shutil.rmtree(datasetFolderPath)

os.makedirs(imgDatasetFolderPath, exist_ok = True)
os.makedirs(maskDatasetFolderPath, exist_ok = True)

with ZipFile(tmpFileName, 'r') as zipFile:
    lFiles = zipFile.namelist()
    for file in lFiles:
        fileName = os.path.basename(file)
        # Make the class explicit in the name
        fileName = fileName.replace('Scr', 'Scratch')
        fileName = fileName.replace('Sta', 'Stain')
        if file.startswith('oil/') or file.startswith('scratch/') or file.startswith('stain/'):
            with open(os.path.join(imgDatasetFolderPath, fileName), 'wb') as hFile:
                hFile.write(zipFile.read(file))
        elif file.startswith('good/'):
            # Do not have masks, create an empty mask
            fileName = 'None_' + fileName
            with open(os.path.join(imgDatasetFolderPath, fileName), 'wb') as hFile:
                hFile.write(zipFile.read(file))
            
            mI = ski.io.imread(os.path.join(imgDatasetFolderPath, fileName))
            mI = np.zeros_like(mI)
            ski.io.imsave(os.path.join(maskDatasetFolderPath, fileName), mI, check_contrast = False)
        elif file.startswith('ground_truth_1/') or file.startswith('ground_truth_2/'):
            with open(os.path.join(maskDatasetFolderPath, fileName), 'wb') as hFile:
                hFile.write(zipFile.read(file))

* <font color='red'>(**?**)</font> Go through files using the OS's image viewer. Specifically the mask images. What can you say about the classes per image?

<!-- Each image mask contains only a single class. Hence predicting the mask class can be done in a global manner and not in a per pixel manner. -->

### DataSet

Generate a `DataSet` class as a loader of the data.

* <font color='brown'>(**#**)</font> Since each image contains a single class, the mask can be represented as a binary mask for the defect area and a global class label by a classification head.
* <font color='brown'>(**#**)</font> There are images with no defects. Hence a `None` class should be added.
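
The representation described above can be sketched as follows. This is a minimal sketch, assuming the `<Class>_<ImgIdxCls>` file naming used when extracting the files and that any non zero mask pixel marks a defect. The helpers `ParseClassIdx()` and `BinarizeMask()`, the dictionary `D_CLS_IDX` and the file name in the example are hypothetical and are not part of the course packages:

```python
import numpy as np

# Class names as in `dCls`, mapped back to the class index
D_CLS_IDX = {'None': 0, 'Scratch': 1, 'Stain': 2, 'Oil': 3}

def ParseClassIdx( fileName: str ) -> int:
    # Assumes the `<Class>_<ImgIdxCls>.<ext>` naming convention
    clsName = fileName.split('_')[0]
    return D_CLS_IDX[clsName]

def BinarizeMask( mM: np.ndarray ) -> np.ndarray:
    # Any non zero pixel (In any channel) is considered a defect
    if mM.ndim == 3:
        mB = np.any(mM > 0, axis = 2)
    else:
        mB = mM > 0
    return mB.astype(np.uint8)

# Toy example: A synthetic 4x4 mask with a 2x2 defect
mMask = np.zeros((4, 4), dtype = np.uint8)
mMask[1:3, 1:3] = 255

clsIdx = ParseClassIdx('Stain_0007.png') #<! Hypothetical file name
mB     = BinarizeMask(mMask)

print(f'Class index: {clsIdx}, Number of defect pixels: {mB.sum()}')
```

With this split the segmentation head only has to predict a single channel mask, while the class is predicted once per image.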

In [None]:
# The Dataset Class

class MSDDataset(Dataset):
    def __init__( self, imgFolderPath: str, maskFolderPath: str, /, *, hTrns: Optional[Callable] = None, lImgFormats: List = ['jpg', 'jpeg', 'png'] ) -> None:
        """
        Mobile Phone Defect Segmentation Dataset.

        Parameters
        ----------
        imgFolderPath : str
            Path to the folder containing images.
        maskFolderPath : str
            Path to the folder containing masks.
        hTrns : Optional[Callable], optional
            Transform to be applied on a sample, by default None. Must be TorchVision v2 transforms compatible.
        """
        super().__init__()

        lImgFiles = os.listdir(imgFolderPath)
        lImgFiles = [ff for ff in lImgFiles if (ff.split('.')[-1].lower() in lImgFormats) and (os.path.isfile(os.path.join(maskFolderPath, ff.split('.')[0] + '.png')))]

        self._imgFolderPath  = imgFolderPath
        self._maskFolderPath = maskFolderPath
        self._hTrns          = hTrns

        self.lImgFiles = lImgFiles #<! Keep only supported images which have a matching mask




In [None]:
# Loader Transform

oTrns = TorchVisionTrns.Compose([
    TorchVisionTrns.ToImage(),
    TorchVisionTrns.ToDtype(torch.float, scale = True),
])

In [None]:
# Data Set

dsTrain = torchvision.datasets.MNIST(root = dataFolder, train = True,  transform = oTrns, download = True)
dsVal   = torchvision.datasets.MNIST(root = dataFolder, train = False, transform = oTrns, download = True)

print(f'The training data set RAW data shape: {dsTrain.data.shape}')
print(f'The validation data set RAW data shape: {dsVal.data.shape}')

* <font color='brown'>(**#**)</font> One could use negative values for the bounding box. The model will extrapolate the object dimensions.

In [None]:
# Data Loader

dlTrain = torch.utils.data.DataLoader(dsTrain, shuffle = True, batch_size = batchSize, num_workers = numWorkers, persistent_workers = True)
dlVal   = torch.utils.data.DataLoader(dsVal, shuffle = False, batch_size = 2 * batchSize, num_workers = numWorkers, persistent_workers = True)

* <font color='red'>(**?**)</font> Why are lists used instead of arrays for the labels and the bounding boxes?

In [None]:
# Element of the Data Set / Data Sample

tX, valY = dsTrain[0]

print(f'The features shape: {tX.shape}')
print(f'The label         : {valY}')

* <font color='brown'>(**#**)</font> Since the labels are in the same contiguous container as the bounding box parameters, their type is `Float`.
* <font color='brown'>(**#**)</font> The bounding box is using absolute values. In practice it is commonly normalized to the image dimensions.

### Plot the Data

In [None]:
# Plot the Data

mX = np.reshape(dsTrain.data.numpy(), (dsTrain.data.shape[0], -1))
vY = dsTrain.targets.numpy()

hF = PlotMnistImages(mX, vY, 3, 3)

In [None]:
# Histogram of Labels

hA = PlotLabelsHistogram(vY, lClass = L_CLASSES)
plt.show()

* <font color='red'>(**?**)</font> Explain the amount of samples in the histogram per class and in total.

## Train Classifier

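The classifier is trained in order to serve as a scorer: its accuracy on the reconstructed images is used to score the quality of the reconstruction.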

In [None]:
oClsModel = nn.Sequential(
#===========================Fill This===========================#
# 1. Create the 3rd model.
# 2. Use 3 layers.
# !! You may use different kernel size, dropout probability, max pooling, etc...

    nn.Identity(),
    
    nn.Conv2d(in_channels = 1, out_channels = 30, kernel_size = 7, bias = False),
    nn.MaxPool2d(kernel_size = 2),
    nn.BatchNorm2d(num_features = 30),
    nn.ReLU(),
    
    nn.Conv2d(in_channels = 30, out_channels = 60, kernel_size = 5, bias = False),
    nn.MaxPool2d(kernel_size = 2),
    nn.BatchNorm2d(num_features = 60),
    nn.ReLU(),
            
    nn.Conv2d(in_channels = 60,  out_channels = 120, kernel_size = 3, bias = False),
    nn.BatchNorm2d(num_features = 120),
    nn.ReLU(),
    
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(120, 10),
#===============================================================#
)

torchinfo.summary(oClsModel, (256, *(TU_IMG_SIZE[::-1])), col_names = ['kernel_size', 'output_size', 'num_params'], device = 'cpu', row_settings = ['depth', 'var_names'])

In [None]:
# Check GPU Availability

runDevice   = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') #<! The 1st CUDA device

In [None]:
# Loss and Score
hL = nn.CrossEntropyLoss()
hS = MulticlassAccuracy(num_classes = 10, average = 'micro')
hL = hL.to(runDevice) #<! Not required!
hS = hS.to(runDevice)

In [None]:
oClsModel = oClsModel.to(runDevice) #<! Transfer model to device
oOpt = torch.optim.AdamW(oClsModel.parameters(), lr = 6e-4, betas = (0.9, 0.99), weight_decay = 1e-3) #<! Define optimizer
oRunModel, lTrainLoss, lTrainScore, lValLoss, lValScore, lLearnRate = TrainModel(oClsModel, dlTrain, dlVal, oOpt, numEpochs, hL, hS)

In [None]:
# Plot Training Phase

hF, vHa = plt.subplots(nrows = 1, ncols = 3, figsize = (15, 5))
vHa = np.ravel(vHa)

hA = vHa[0]
hA.plot(lTrainLoss, lw = 2, label = 'Train')
hA.plot(lValLoss, lw = 2, label = 'Validation')
hA.set_title('Classification Loss')
hA.set_xlabel('Epoch')
hA.set_ylabel('Loss')
hA.legend()

hA = vHa[1]
hA.plot(lTrainScore, lw = 2, label = 'Train')
hA.plot(lValScore, lw = 2, label = 'Validation')
hA.set_title('Classification Score')
hA.set_xlabel('Epoch')
hA.set_ylabel('Score')
hA.legend()

hA = vHa[2]
hA.plot(lLearnRate, lw = 2)
hA.set_title('Learn Rate Scheduler')
hA.set_xlabel('Epoch')
hA.set_ylabel('Learn Rate');

### The Model

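The model is an auto encoder: an encoder maps the image into a latent vector of dimension `latDim` and a decoder reconstructs the image from the latent vector.  
Several variants of the architecture are built below.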

In [None]:
# Encoder Model

def BuildModelEncoder( inSize: int, numChnl: int, latDim: int, latDimFctr: int = latDimFctr ) -> nn.Module:

    numLayers  = math.floor(math.log2(inSize))
    lLayers    = [nn.Identity()]
    inChannels = numChnl
    
    for ii in range(numLayers):
        outChannels = 2 * inChannels
        lLayers.append(nn.Conv2d(inChannels, outChannels, 3, padding = 'same', bias = False))
        lLayers.append(nn.BatchNorm2d(outChannels))
        lLayers.append(nn.ReLU())
        lLayers.append(nn.MaxPool2d(2))
        inChannels = outChannels

    lLayers.append(nn.Conv2d(outChannels, latDimFctr * latDim, 1))
    lLayers.append(nn.AdaptiveAvgPool2d((1, 1)))
    lLayers.append(nn.Flatten())
    lLayers.append(nn.Linear(latDimFctr * latDim, latDim))

    oModel = nn.Sequential(*lLayers)

    return oModel 
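
The number of layers in `BuildModelEncoder()` is set so that the repeated `MaxPool2d(2)` brings the image down to a single pixel. A minimal sketch which traces the spatial size per layer, assuming a square input. The function `EncoderSizes()` is hypothetical and for illustration only:

```python
import math

def EncoderSizes( inSize: int ) -> list:
    # Same layer count rule as in `BuildModelEncoder()`
    numLayers = math.floor(math.log2(inSize))
    lSize     = [inSize]
    for _ in range(numLayers):
        lSize.append(lSize[-1] // 2) #<! `MaxPool2d(2)` floors odd sizes

    return lSize

print(EncoderSizes(28)) #<! MNIST image size
```

For a 28x28 image this gives 4 layers with the sizes 28, 14, 7, 3 and 1. Since each layer doubles the number of channels, a single channel input ends with 16 channels before the 1x1 convolution.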

In [None]:
# Build Encoder
oModel = BuildModelEncoder(TU_IMG_SIZE[0], 1, latDim)

# Model Information
torchinfo.summary(oModel, (batchSize, *(TU_IMG_SIZE[::-1])), col_names = ['kernel_size', 'output_size', 'num_params'], device = 'cpu', row_settings = ['depth', 'var_names'])

In [None]:
# Decoder Model

def BuildModelDecoder( latDim: int, numChnl: int, outSize: int, latDimFctr: int = latDimFctr ) -> nn.Module:

    numLayers  = math.floor(math.log2(outSize))
    lLayers    = [nn.Identity(), nn.Linear(latDim, latDimFctr * latDim), nn.Unflatten(1, (latDimFctr * latDim, 1, 1))]
    inChannels = latDimFctr * latDim
    
    for ii in range(numLayers):
        outChannels = 2 * inChannels
        lLayers.append(nn.Upsample(scale_factor = 2))
        lLayers.append(nn.Conv2d(inChannels,  outChannels,  kernel_size = 3, padding = 'same', bias = False))
        lLayers.append(nn.BatchNorm2d(outChannels))
        lLayers.append(nn.ReLU())
        inChannels = outChannels    

    lLayers.append(nn.Conv2d(outChannels, numChnl, 1))
    lLayers.append(nn.AdaptiveAvgPool2d((outSize, outSize)))
    lLayers.append(nn.Sigmoid()) #<! Force into [0, 1]

    oModel = nn.Sequential(*lLayers)

    return oModel 

In [None]:
# Build Decoder
oModel = BuildModelDecoder(latDim, 1, TU_IMG_SIZE[0])

# Model Information
torchinfo.summary(oModel, (batchSize, latDim), col_names = ['kernel_size', 'output_size', 'num_params'], device = 'cpu', row_settings = ['depth', 'var_names'])

In [None]:
# The Model

def BuildModel( inSize: int, latDim: int, numChnl: int ) -> nn.Module:

    oModel = nn.Sequential(
        BuildModelEncoder(inSize, numChnl, latDim),
        BuildModelDecoder(latDim, numChnl, inSize),
    )

    # Trick to name a module (Does not work, changes the name of any `Sequential` module)
    # oModel.__class__.__name__ = 'AutoEncoder'

    return oModel 

In [None]:
# Build Model
oModel = BuildModel(TU_IMG_SIZE[0], latDim, TU_IMG_SIZE[-1])


# Model Information
torchinfo.summary(oModel, (batchSize, *(TU_IMG_SIZE[::-1])), col_names = ['kernel_size', 'output_size', 'num_params'], device = 'cpu', row_settings = ['depth', 'var_names'])

In [None]:
# Residual Block
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        return torch.relu(out)

# Adaptive Encoder
class Encoder(nn.Module):
    def __init__(self, imgChnls: int, imgSize: int, latDim: int, lNumChnl: List[int], minSize: int):
        super(Encoder, self).__init__()
        self.init = nn.Identity()
        self.initial = nn.Conv2d(imgChnls, lNumChnl[0], kernel_size = 3, stride = 1, padding = 1)

        # Compute how many downsampling steps to take
        self.numLayers = int(math.log2(imgSize // minSize))  #<! Number of DownSamples

        lNumChnl = lNumChnl[:self.numLayers]  # Increase channels

        lLayers = []
        inChnls = lNumChnl[0]
        for outChnls in lNumChnl:
            lLayers.append(ResidualBlock(inChnls, outChnls, stride = 2))
            lLayers.append(ResidualBlock(outChnls, outChnls))  # Extra residual layer for depth
            inChnls = outChnls
        
        self.res_blocks = nn.Sequential(*lLayers)

        self.final = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),  # Global pooling
            nn.Flatten(),
            nn.Linear(inChnls, latDim)
        )

    def forward(self, x):
        x = self.init(x)
        x = torch.relu(self.initial(x))
        x = self.res_blocks(x)
        x = self.final(x)
        return x

# Adaptive Decoder
class Decoder(nn.Module):
    def __init__(self, latDim: int, imgChnl: int, imgSize: int, lNumChnl: List[int], minSize: int):
        super(Decoder, self).__init__()
        
        self.init      = nn.Identity()
        self.numLayers = int(math.log2(imgSize // minSize)) + 1 # Mirror the encoder
        self.minSize   = minSize

        lNumChnl = lNumChnl[::-1][:self.numLayers]  # Reverse order

        self.linear = nn.Linear(latDim, lNumChnl[0] * minSize * minSize)  # Map to feature space

        lLayers = []
        inChnls = lNumChnl[0]
        for outChnls in lNumChnl:
            lLayers.append(nn.Identity())
            lLayers.append(nn.ConvTranspose2d(inChnls, outChnls, kernel_size = 3, stride = 2, padding = 1))
            lLayers.append(nn.ReLU())
            inChnls = outChnls

        lLayers.append(nn.Conv2d(inChnls, imgChnl, kernel_size = 3, stride = 1, padding = 1))
        lLayers.append(nn.AdaptiveAvgPool2d((imgSize, imgSize)))
        lLayers.append(nn.Sigmoid())  # Output in range [0, 1] (A `ReLU` before it would limit the output to [0.5, 1])

        self.deconv_blocks = nn.Sequential(*lLayers)

    def forward(self, x):
        x = self.init(x)
        x = self.linear(x)
        x = x.view(x.shape[0], -1, self.minSize, self.minSize)  # Reshape to feature map
        x = self.deconv_blocks(x)
        return x

# Adaptive AutoEncoder
class AutoEncoder(nn.Module):
    def __init__(self, inChnls: int, imgSize: int, latDim: int, lNumChnl: List[int] = [16 * (ii + 1) for ii in range(8)], minSize: int = 4):
        super(AutoEncoder, self).__init__()
        self.init    = nn.Identity()
        self.encoder = Encoder(inChnls, imgSize, latDim, lNumChnl, minSize)
        self.decoder = Decoder(latDim, inChnls, imgSize, lNumChnl, minSize)

    def forward(self, x):
        x = self.init(x)
        latent = self.encoder(x)
        reconstructed = self.decoder(latent)
        return reconstructed

In [None]:
# Build Model
oModel = AutoEncoder(TU_IMG_SIZE[-1], TU_IMG_SIZE[0], latDim)

# Model Information
torchinfo.summary(oModel, (batchSize, *(TU_IMG_SIZE[::-1])), col_names = ['kernel_size', 'output_size', 'num_params'], depth = 3, device = 'cpu', row_settings = ['depth', 'var_names'])

In [None]:
# Encoder
class Encoder(nn.Module):
    def __init__(self, latDim: int ) -> None:
        super(Encoder, self).__init__()
        
        self.InputLayer = nn.Identity()
        self.InitConv   = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size = 3, stride = 1, padding = 'same'),
            nn.ReLU(),
        )
        self.DownSample001 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size = 3, stride = 1, padding = 'same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 2),
        )
        self.DownSample002 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size = 3, stride = 1, padding = 'same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 2),
        )

        self.Latent = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size = 3, stride = 1, padding = 'same'),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, latDim),
        )

    def forward(self, x):
        x = self.InputLayer(x)
        x = self.InitConv(x)
        x = self.DownSample001(x)
        x = self.DownSample002(x)
        x = self.Latent(x)
        return x

# Decoder
class Decoder(nn.Module):
    def __init__(self, latDim: int) -> None:
        super(Decoder, self).__init__()
        
        self.InputLayer = nn.Identity()
        self.Latent = nn.Sequential(
            nn.Linear(latDim, 256 * 7 * 7),
            nn.Unflatten(1, (256, 7, 7)),
        )

        self.UpSample001 = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size = 3, stride = 2, padding = 1, output_padding = 1),
            nn.ReLU(),
        )

        self.UpSample002 = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size = 3, stride = 2, padding = 1, output_padding = 1),
            nn.ReLU(),
        )

        self.FinalConv = nn.Sequential(
            nn.Conv2d(64, 1, kernel_size = 3, stride = 1, padding = 'same'),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.InputLayer(x)
        x = self.Latent(x)
        x = self.UpSample001(x)
        x = self.UpSample002(x)
        x = self.FinalConv(x)
        return x

# AutoEncoder
class AutoEncoder(nn.Module):
    def __init__(self, latDim: int) -> None:
        super(AutoEncoder, self).__init__()
        self.InputLayer = nn.Identity()
        self.Encoder    = Encoder(latDim)
        self.Decoder    = Decoder(latDim)

    def forward(self, x):
        x = self.InputLayer(x)
        x = self.Encoder(x)
        x = self.Decoder(x)
        return x
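
The `ConvTranspose2d` parameters in the decoder above are chosen so that each layer exactly doubles the spatial size. A minimal sketch of the output size formula documented for `nn.ConvTranspose2d`, assuming `dilation = 1`. The function `ConvTransposeOutSize()` is hypothetical and for illustration only:

```python
def ConvTransposeOutSize( inSize: int, kernelSize: int, stride: int = 1, padding: int = 0, outputPadding: int = 0 ) -> int:
    # Output size of `nn.ConvTranspose2d` along a single axis (Assuming `dilation = 1`)
    return (inSize - 1) * stride - 2 * padding + kernelSize + outputPadding

# The decoder above: 2 layers with `kernel_size = 3, stride = 2, padding = 1, output_padding = 1`
lSize = [7]
for _ in range(2):
    lSize.append(ConvTransposeOutSize(lSize[-1], 3, stride = 2, padding = 1, outputPadding = 1))

print(lSize)

# The same layer without `output_padding`
sizeNoPad = ConvTransposeOutSize(7, 3, stride = 2, padding = 1)
print(sizeNoPad)
```

The sizes go from 7 to 14 to 28, which matches the `256 * 7 * 7` features of the latent layer and the 28x28 image. Without `output_padding` a 7x7 input yields 13x13, so the size is not exactly doubled.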

In [None]:
# Build Model
oModel = AutoEncoder(latDim)

# Model Information
torchinfo.summary(oModel, (batchSize, *(TU_IMG_SIZE[::-1])), col_names = ['kernel_size', 'output_size', 'num_params'], depth = 3, device = 'cpu', row_settings = ['depth', 'var_names'])

## Train the Model

This section trains the model.  

* <font color='brown'>(**#**)</font> The training loop must be adapted to the new loss function.

### Image Localization Loss

The loss is a composite of 2 loss functions:

$$\ell\left(\hat{\boldsymbol{y}},\boldsymbol{y}\right)=\lambda_{\text{MSE}}\cdot\ell_{\text{MSE}}\left(\hat{\boldsymbol{y}}_{\text{bbox}},\boldsymbol{y}_{\text{bbox}}\right)+\lambda_{\text{CE}}\cdot\ell_{\text{CE}}\left(\hat{\boldsymbol{y}}_{\text{label}},\boldsymbol{y}_{\text{label}}\right)$$

Where $\lambda_{\text{MSE}}$ and $\lambda_{\text{CE}}$ are the weights of each loss.

* <font color='brown'>(**#**)</font> In practice a single $\lambda$ is sufficient, as only the ratio between the weights affects the minimizer of the loss.
* <font color='brown'>(**#**)</font> The MSE is not an optimal loss function. It will be replaced by the _Log Euclidean_ loss.
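
The composite loss above can be sketched as a PyTorch module. This is a minimal sketch and not the loss trained in this notebook. It assumes the model output is laid out as `[class logits, bounding box]` and the target as `[class index, bounding box]`. The class `CompositeLoss` and the toy variable names are hypothetical:

```python
import torch
import torch.nn as nn

class CompositeLoss( nn.Module ):
    def __init__( self, numCls: int, lambdaMse: float = 1.0, lambdaCe: float = 1.0 ) -> None:
        super().__init__()

        self.numCls    = numCls
        self.lambdaMse = lambdaMse
        self.lambdaCe  = lambdaCe
        self.oMseLoss  = nn.MSELoss()
        self.oCeLoss   = nn.CrossEntropyLoss()

    def forward( self, mYHat: torch.Tensor, mY: torch.Tensor ) -> torch.Tensor:
        # Assumed layout of the output: [class logits (numCls), bounding box (4)]
        # Assumed layout of the target: [class index (1), bounding box (4)]
        mseLoss = self.oMseLoss(mYHat[:, self.numCls:], mY[:, 1:])
        ceLoss  = self.oCeLoss(mYHat[:, :self.numCls], mY[:, 0].to(torch.long))

        return (self.lambdaMse * mseLoss) + (self.lambdaCe * ceLoss)

# Toy data
torch.manual_seed(0)
numClsToy  = 4
numSamples = 8
mYHatToy   = torch.randn(numSamples, numClsToy + 4)
vLabelToy  = torch.randint(0, numClsToy, (numSamples, ))
mBoxToy    = torch.rand(numSamples, 4)
mYToy      = torch.cat((vLabelToy[:, None].to(torch.float), mBoxToy), dim = 1)

valLoss1 = CompositeLoss(numClsToy, 1.0, 0.5)(mYHatToy, mYToy)
valLoss2 = CompositeLoss(numClsToy, 2.0, 1.0)(mYHatToy, mYToy)

print(torch.allclose(valLoss2, 2.0 * valLoss1))
```

Multiplying both weights by the same factor only scales the loss, which is why a single weight is enough to set the balance between the 2 terms.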

In [None]:
# Run Device

runDevice = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') #<! The 1st CUDA device

In [None]:
# Reconstruction Loss
class RecLoss( nn.Module ):
    def __init__( self ) -> None:
        super(RecLoss, self).__init__()

        self.oMseLoss = nn.MSELoss()
    
    def forward( self: Self, mZ: torch.Tensor, mX: torch.Tensor, mY: torch.Tensor ) -> Tensor:
        # mZ: Reconstructed images, mX: Input images, mY: Labels (Not used by the loss)

        valLoss = self.oMseLoss(mZ, mX)
        
        return valLoss

In [None]:
# Classification Score
class ClsScore( nn.Module ):
    def __init__( self, oModel: nn.Module, numCls: int ) -> None:
        super(ClsScore, self).__init__()

        self.oModel   = oModel.eval()
        self.AccScore = MulticlassAccuracy(num_classes = numCls, average = 'micro')
    
    def forward( self: Self, mZ: torch.Tensor, mX: torch.Tensor, mY: torch.Tensor ) -> Tensor:

        mYHat    = self.oModel(mZ)
        valScore = self.AccScore(mYHat, mY)
        
        return valScore

In [None]:
# Loss and Score Function

hL = RecLoss()
hS = ClsScore(oClsModel, 10) #<! Must match the classifier output (10 classes), not `numCls` (4 defect classes)

hL = hL.to(runDevice)
hS = hS.to(runDevice)

In [None]:
# Training Loop

numEpochs = 100

oModel = oModel.to(runDevice)
oOpt = torch.optim.AdamW(oModel.parameters(), lr = 1e-5, betas = (0.9, 0.99), weight_decay = 1e-5) #<! Define optimizer
oSch = torch.optim.lr_scheduler.OneCycleLR(oOpt, max_lr = 5e-3, total_steps = numEpochs)
oModel, lTrainLoss, lTrainScore, lValLoss, lValScore, lLearnRate = TrainModelSelfSupervised(oModel, dlTrain, dlVal, oOpt, numEpochs, hL, hS, oSch = oSch)
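
A minimal sketch of the learning rate generated by `OneCycleLR` with the parameters above, assuming a single scheduler step per epoch as `total_steps = numEpochs` suggests. The dummy layer and the names `oOptToy`, `oSchToy` and `lLr` are hypothetical and used only to inspect the schedule:

```python
import torch

numSteps = 100 #<! Matches `numEpochs` above

oLayer  = torch.nn.Linear(2, 1) #<! Dummy model, only its parameters are needed
oOptToy = torch.optim.AdamW(oLayer.parameters(), lr = 1e-5)
oSchToy = torch.optim.lr_scheduler.OneCycleLR(oOptToy, max_lr = 5e-3, total_steps = numSteps)

lLr = []
for ii in range(numSteps):
    lLr.append(oSchToy.get_last_lr()[0])
    oOptToy.step() #<! No gradients, hence no update. Keeps the order of the steps valid
    if ii < (numSteps - 1):
        oSchToy.step()

print(f'Initial: {lLr[0]:0.2e}, Maximum: {max(lLr):0.2e}, Final: {lLr[-1]:0.2e}')
```

The scheduler overrides the `lr` given to the optimizer: with the default `div_factor = 25` the schedule starts at `max_lr / 25`, climbs to `max_lr` and then anneals towards a much smaller value.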

In [None]:
# Plot Training Phase

hF, vHa = plt.subplots(nrows = 1, ncols = 3, figsize = (12, 5))
vHa = np.ravel(vHa)

hA = vHa[0]
hA.plot(lTrainLoss, lw = 2, label = 'Train')
hA.plot(lValLoss, lw = 2, label = 'Validation')
hA.set_title('Reconstruction Loss')
hA.set_xlabel('Epoch')
hA.set_ylabel('Loss')
hA.legend()

hA = vHa[1]
hA.plot(lTrainScore, lw = 2, label = 'Train')
hA.plot(lValScore, lw = 2, label = 'Validation')
hA.set_title('Classification Score')
hA.set_xlabel('Epoch')
hA.set_ylabel('Score')
hA.legend()

hA = vHa[2]
hA.plot(lLearnRate, lw = 2)
hA.set_title('Learn Rate Scheduler')
hA.set_xlabel('Epoch')
hA.set_ylabel('Learn Rate');

In [None]:
# Inference Mode

oModel = oModel.eval()

In [None]:
# Sample from Train
tX, valY = dsTrain[7]

tX = tX.to(runDevice).unsqueeze(0)

with torch.inference_mode():
    tZ = oModel(tX)
mZ = TensorImageNumpy(tZ)
mX = TensorImageNumpy(tX)

hF, vHa = plt.subplots(nrows = 1, ncols = 2, figsize = (6, 3))
vHa = vHa.flat

hA = vHa[0]
hA.imshow(mX, cmap = 'gray');
hA = vHa[1]
hA.imshow(mZ, cmap = 'gray');

In [None]:
# Sample from Validation
tX, valY = dsVal[23]

tX = tX.to(runDevice).unsqueeze(0)

with torch.inference_mode():
    tZ = oModel(tX)
mZ = TensorImageNumpy(tZ)
mX = TensorImageNumpy(tX)

hF, vHa = plt.subplots(nrows = 1, ncols = 2, figsize = (6, 3))
vHa = vHa.flat

hA = vHa[0]
hA.imshow(mX, cmap = 'gray');
hA = vHa[1]
hA.imshow(mZ, cmap = 'gray');

In [None]:
# Latent Representation of a Sample from Validation
tX, valY = dsVal[23]

tX = tX.to(runDevice).unsqueeze(0)

with torch.inference_mode():
    tE = oModel.Encoder(tX)

tE

In [None]:
# Plot Prediction
# TODO: Check classification

# rndIdx = np.random.randint(numSamplesVal)

# mX, vY = dsVal[rndIdx]
# valY    = int(vY[0])
# vB      = vY[1:]
# with torch.no_grad():
#     tX = torch.tensor(mX)
#     tX = torch.unsqueeze(tX, 0)
#     tX = tX.to(runDevice)
#     mYHat = oModel(tX).detach().cpu().numpy()

# vYHat       = mYHat[0]
# valYHat     = np.argmax(vYHat[:numCls])
# vBHat       = vYHat[numCls:]

# hA = PlotBox(np.transpose(mX, (1, 2, 0)), L_CLASSES[valY], vB)
# hA = PlotBBox(hA, L_CLASSES[valYHat], vBHat)

* <font color='red'>(**?**)</font> What would be the results if the generated data had more small ellipses?
* <font color='green'>(**@**)</font> Display the _accuracy_ and _IoU_ scores and _MSE_ and _CE_ loss over the epochs.   
  It will require updating the Loss, Score classes and the training function.

In [None]:
# Encoding

lEnc = []
lY   = []

for tX, vY in dlTrain:
    with torch.inference_mode():
        vZ = oModel.Encoder(tX.to(runDevice))
        vZ = vZ.squeeze().detach().cpu().numpy()
        vY = vY.squeeze().detach().cpu().numpy()
        lEnc.append(vZ)
        lY.append(vY)

In [None]:
mE = np.vstack(lEnc)
vY = np.hstack(lY)

In [None]:
hA = PlotScatterData(mE, vY)

In [None]:
tX = torch.tensor([-20.0, 0.0])
tX = torch.unsqueeze(tX, 0)
tX = tX.to(runDevice)

with torch.inference_mode():
    tZ = oModel.Decoder(tX)
mZ = TensorImageNumpy(tZ)

hF, hA = plt.subplots(nrows = 1, ncols = 1, figsize = (3, 3))

hA.imshow(mZ, cmap = 'gray');

Each image contains only a single class of segment. Hence a model which predicts any class with 