<h1>Inference in Production<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Lightning-Module" data-toc-modified-id="Lightning-Module-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Lightning Module</a></span></li><li><span><a href="#Get-the-Checkpoint" data-toc-modified-id="Get-the-Checkpoint-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Get the Checkpoint</a></span></li><li><span><a href="#Convert-to-ONNX-Format" data-toc-modified-id="Convert-to-ONNX-Format-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Convert to ONNX Format</a></span></li><li><span><a href="#Sample-Inference" data-toc-modified-id="Sample-Inference-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Sample Inference</a></span></li><li><span><a href="#References" data-toc-modified-id="References-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>References</a></span></li></ul></div>

1. Suppose your team is working on a project that involves several ML problems.
2. Suppose you pick one problem and solve it with the PyTorch framework, while a colleague does the same using TensorFlow.
3. Both problems are part of the same larger project. How do you arrive at a common format for sharing the ML models?


<a target="_blank" href="https://onnx.ai/">ONNX: Open Neural Network Exchange</a> is one such open format. It allows models to be exchanged between different <a target="_blank" href="https://onnx.ai/supported-tools">ML frameworks and tools</a>.


**In this notebook, we will see how to convert a saved PyTorch Lightning checkpoint to an ONNX model. As an example, we take a checkpoint saved by the MNIST training notebook.**

## ***Installing the [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/) library***

In [1]:
!pip install lightning
!pip install onnx
!pip install onnxruntime


Collecting lightning
  Downloading lightning-2.4.0-py3-none-any.whl.metadata (38 kB)
Downloading lightning-2.4.0-py3-none-any.whl (810 kB)
Installing collected packages: lightning
Successfully installed lightning-2.4.0
Collecting onnxruntime
  Downloading onnxruntime-1.19.2-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)
Collecting coloredlogs (from onnxruntime)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime)
  Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Downloading onnxruntime-1.19.2-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (13.2 MB)
Downloading 

## ***Downloading the model files***

In [2]:
!mkdir /kaggle/working/lightning_logs

!wget "https://raw.githubusercontent.com/RadimKozl/OpenCV_academy_my_work/refs/heads/main/lightning_logs.zip" -O /kaggle/working/lightning_logs/lightning_logs.zip

!ls /kaggle/working/

!unzip /kaggle/working/lightning_logs/lightning_logs.zip -d /kaggle/working/lightning_logs/

!rm /kaggle/working/lightning_logs/lightning_logs.zip

--2024-10-04 18:41:10--  https://raw.githubusercontent.com/RadimKozl/OpenCV_academy_my_work/refs/heads/main/lightning_logs.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 234749 (229K) [application/zip]
Saving to: '/kaggle/working/lightning_logs/lightning_logs.zip'


2024-10-04 18:41:10 (4.06 MB/s) - '/kaggle/working/lightning_logs/lightning_logs.zip' saved [234749/234749]

lightning_logs
Archive:  /kaggle/working/lightning_logs/lightning_logs.zip
   creating: /kaggle/working/lightning_logs/version_0/
   creating: /kaggle/working/lightning_logs/version_0/checkpoints/
  inflating: /kaggle/working/lightning_logs/version_0/events.out.tfevents.1728052476.5246e9c3bd7b.30.0  
  inflating: /kaggle/working/lightning_logs/version_0/hparams.yaml  
  inflating: /ka

In [3]:
import onnxruntime
import pytorch_lightning as pl
from torchmetrics import Accuracy
from torchmetrics import MeanMetric
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import warnings
warnings.filterwarnings("ignore", category=UserWarning)  # filter UserWarning

torch.multiprocessing.set_start_method('spawn', force=True)

## Lightning Module

The Lightning module provides the model definition that is needed to load the model from a checkpoint.

In [4]:
class LeNet5(pl.LightningModule):  # here nn.Module is replaced by LightningModule
    def __init__(self, learning_rate=0.01, num_classes=10):
        super().__init__()

        # Save the arguments as hyperparameters.
        self.save_hyperparameters()
        self.num_classes = num_classes

        # convolution layers
        self._body = nn.Sequential(
            # First convolution Layer
            # input size = (32, 32), output size = (28, 28)
            nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
            # ReLU activation
            nn.ReLU(inplace=True),
            # Max pool 2-d
            nn.MaxPool2d(kernel_size=2),

            # Second convolution layer
            # input size = (14, 14), output size = (10, 10)
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            # output size = (5, 5)
        )

        # Fully connected layers
        self._head = nn.Sequential(
            # First fully connected layer
            # in_features = total number of weights in last conv layer = 16 * 5 * 5
            nn.Linear(in_features=16 * 5 * 5, out_features=120),

            # ReLU activation
            nn.ReLU(inplace=True),

            # second fully connected layer
            # in_features = output of last linear layer = 120
            nn.Linear(in_features=120, out_features=84),

            # ReLU activation
            nn.ReLU(inplace=True),

            # Third fully connected layer. It is also the output layer
            # in_features = output of last linear layer = 84
            # and out_features = number of classes = 10 (MNIST data 0-9)
            nn.Linear(in_features=84, out_features=self.num_classes))

        acc_obj = Accuracy(task="multiclass", num_classes=self.num_classes)
        # use .clone() so that each metric can maintain its own state
        self.train_acc = acc_obj.clone()
        self.valid_acc = acc_obj.clone()

        # Using average meter to accumulate losses and get mean of the metrics
        average_meter = MeanMetric()
        self.train_loss = average_meter.clone()
        self.valid_loss = average_meter.clone()

    def forward(self, x):
        # apply feature extractor
        x = self._body(x)
        # flatten the output of conv layers
        # dimension should be batch_size * number_of weights_in_last conv_layer
        x = x.view(x.size()[0], -1)
        # apply classification head
        x = self._head(x)
        return x

    def on_train_epoch_start(self):
        super().on_train_epoch_start()

        # Reset state variables for train metrics to 
        # their default values before start of each epoch
    
        self.train_acc.reset()
        self.train_loss.reset()

    def on_validation_epoch_start(self):
        super().on_validation_epoch_start()
        
        # Reset state variables for validation metrics to 
        # their default values before start of each epoch
        
        self.valid_acc.reset()
        self.valid_loss.reset()
        
    def training_step(self, batch, batch_idx):

        # get data and labels from batch
        data, target = batch

        # get prediction
        output = self(data)

        # calculate batch loss
        loss = F.cross_entropy(output, target)

        # get probability score using softmax
        prob = F.softmax(output, dim=1)

        # get the index of the max probability
        pred = prob.data.max(dim=1)[1]

        # Using Module API
        # calculate and accumulate batch accuracy
        acc = self.train_acc(pred, target)

        # accumulate batch loss
        self.train_loss(loss)
        # # -----------------

        # LOG METRICS to a logger. Default: Tensorboard
        self.log("train/batch_loss", loss, prog_bar=False)

        # logging and adding current batch_acc to progress_bar
        self.log("train/batch_acc", acc, prog_bar=True)

        # Using Module API, we only need to return the loss
        return loss
       
    def on_train_epoch_end(self):
        # Lightning 2.x hook; it replaces the removed training_epoch_end
        # Using Module API
        # Compute epoch loss and accuracy
        avg_train_loss = self.train_loss.compute()
        avg_train_acc = self.train_acc.compute()
        # # -----------------

        self.log("train/loss", avg_train_loss, prog_bar=True)
        self.log("train/acc", avg_train_acc, prog_bar=True)
        # Set X-axis as epoch number for epoch-level metrics
        self.log("step", self.current_epoch)

    def validation_step(self, batch, batch_idx):

        # get data and labels from batch
        data, target = batch

        # get prediction
        output = self(data)

        # calculate loss
        loss = F.cross_entropy(output, target)

        # get probability score using softmax
        prob = F.softmax(output, dim=1)

        # get the index of the max probability
        pred = torch.argmax(prob, dim=1)
        
        # Using Module API
        # accumulate validation accuracy and loss
        self.valid_acc(pred, target)
        self.valid_loss(loss)
        
        
    def on_validation_epoch_end(self):
        # Lightning 2.x hook; it replaces the removed validation_epoch_end
        # Using Module API
        avg_val_loss = self.valid_loss.compute()
        avg_val_acc = self.valid_acc.compute()
        
        self.log("valid/acc", avg_val_acc, prog_bar=True)
        self.log("valid/loss", avg_val_loss, prog_bar=True)
        # use epoch as X-axis
        self.log("step", self.current_epoch)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(),
                               lr=self.hparams.learning_rate)

## Get the Checkpoint

We load one of the checkpoints saved during the last training run.

We have written a helper function for this. It takes the PyTorch Lightning training log directory and a run version number, and returns the corresponding `.ckpt` path. If no version is given, it uses the highest version found in the log directory.

You will recognize this function from the last lesson.
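The lookup described above can also be sketched with `pathlib`. This is a minimal, hypothetical variant, not the helper used in this notebook: the name `find_latest_checkpoint` and the regular expression are illustrative, and it assumes the default `version_<n>/checkpoints/*.ckpt` layout shown in the unzip output.

```python
import re
from pathlib import Path

# Hypothetical pattern: matches directory names such as "version_0", "version_12".
VERSION_DIR = re.compile(r"^version_(\d+)$")


def find_latest_checkpoint(logs_dir, run_version=None):
    """Return the path of a .ckpt file for the given (or highest) run version."""
    logs_dir = Path(logs_dir)

    if run_version is None:
        # Collect the numeric suffix of every version_<n> directory.
        versions = [
            int(m.group(1))
            for d in logs_dir.iterdir()
            if d.is_dir() and (m := VERSION_DIR.match(d.name))
        ]
        if not versions:
            raise FileNotFoundError(f"no version_* directory in {logs_dir}")
        run_version = max(versions)

    ckpt_dir = logs_dir / f"version_{run_version}" / "checkpoints"
    # Sort by name so the choice is deterministic; the last name wins.
    ckpts = sorted(ckpt_dir.glob("*.ckpt"))
    if not ckpts:
        raise FileNotFoundError(f"no .ckpt file in {ckpt_dir}")
    return ckpts[-1]
```

Sorting by file name is a design choice of this sketch: it makes the result reproducible when a run directory holds several checkpoints, whereas the order returned by a plain directory listing is not guaranteed.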

In [5]:
import os

def get_latest_run_version_ckpt_epoch_no(lightning_logs_dir='/kaggle/working/lightning_logs', run_version=None):
    if run_version is None:
        run_version = 0
        for dir_name in os.listdir(lightning_logs_dir):
            if 'version' in dir_name:
                if int(dir_name.split('_')[1]) > run_version:
                    run_version = int(dir_name.split('_')[1])
                
    checkpoints_dir = os.path.join(lightning_logs_dir, 'version_{}'.format(run_version), 'checkpoints')
    
    files = os.listdir(checkpoints_dir)
    ckpt_filename = None
    for file in files:
        if file.endswith('.ckpt'):
            ckpt_filename = file
        
    if ckpt_filename is None:
        # Fail loudly instead of returning an undefined ckpt_path
        raise FileNotFoundError('No .ckpt file found in {}'.format(checkpoints_dir))

    return os.path.join(checkpoints_dir, ckpt_filename)

**Get the path of the `.ckpt` model.**

In [6]:
# get checkpoint path
ckpt_path = get_latest_run_version_ckpt_epoch_no(run_version=0)
print('ckpt_path: {}'.format(ckpt_path))

ckpt_path: /kaggle/working/lightning_logs/version_0/checkpoints/ckpt_009.ckpt


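## Convert to ONNX Format

We have written a function that converts a `.ckpt` model to an `.onnx` model.

The function takes the model definition, the `.ckpt` path, and the `.onnx` file path as arguments. It converts the `.ckpt` file to `.onnx` and returns the `.onnx` path. If no `.onnx` path is given, it is derived from the `.ckpt` path by swapping the file extension.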
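Deriving the output path by swapping the extension can be sketched with `pathlib`, which does not depend on the extension being exactly four characters long. The helper name `default_onnx_path` is hypothetical and is not part of this notebook.

```python
from pathlib import Path


def default_onnx_path(ckpt_path):
    """Return ckpt_path with its final extension replaced by .onnx."""
    return str(Path(ckpt_path).with_suffix(".onnx"))


# The resulting file name is ckpt_009.onnx, in the same directory as the checkpoint.
print(default_onnx_path("lightning_logs/version_0/checkpoints/ckpt_009.ckpt"))
```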

Note that `input_sample` is passed to the conversion method `to_onnx`. This sample input fixes the input size, so the same input size must be used at inference time.

See the details <a target="_blank" href="https://pytorch-lightning.readthedocs.io/en/stable/common/production_inference.html">here</a>.
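Why the input size matters for this particular network can be checked with a little arithmetic. The sketch below follows the layer parameters of the `LeNet5` definition above (5x5 convolutions without padding, 2x2 max pooling, 16 output channels); the function names are illustrative only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_flat_features(side):
    """Number of features that reach the first Linear layer for a square input."""
    side = conv_out(side, kernel=5)            # Conv2d(1, 6, kernel_size=5)
    side = conv_out(side, kernel=2, stride=2)  # MaxPool2d(kernel_size=2)
    side = conv_out(side, kernel=5)            # Conv2d(6, 16, kernel_size=5)
    side = conv_out(side, kernel=2, stride=2)  # MaxPool2d(kernel_size=2)
    return 16 * side * side


print(lenet5_flat_features(32))  # 400, matches in_features=16 * 5 * 5
print(lenet5_flat_features(28))  # 256, does not match the first Linear layer
```

A 32x32 input yields exactly the 400 features the classification head expects, which is why the sample input uses the shape `(1, 1, 32, 32)`.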

In [7]:
def convert_to_onnx_model(model_class, ckpt_path, onnx_path=None):
    
    # ONNX filename
    if onnx_path is None:
        onnx_path = ckpt_path[:-4] + 'onnx'
        
    # Load the checkpoint
    ckpt_model = model_class.load_from_checkpoint(ckpt_path)
    
    # Freeze the network
    ckpt_model.freeze()
    
    ckpt_model.eval()
    
    # Add a sample input. Here input shape = (batch_size, num_channel, height, width)
    input_sample = torch.randn((1, 1, 32, 32))
    
    # convert to ONNX model
    ckpt_model.to_onnx(onnx_path, input_sample, export_params=True)
    
    return onnx_path

**Convert `.ckpt` to `.onnx`.**

In [8]:
# initiate the model
model = LeNet5()

# convert the checkpoint to onnx format
onnx_model_path = convert_to_onnx_model(LeNet5, ckpt_path)
print('onnx_model_path: {}'.format(onnx_model_path))

onnx_model_path: /kaggle/working/lightning_logs/version_0/checkpoints/ckpt_009.onnx


## Sample Inference

**Steps for inference with the `.onnx` model:**

- Start a session. This is a one-time operation.

- Get the input name from the session. This is also a one-time operation.

- Prepare the input.

- Run the session with the input.

In [9]:
import numpy as np

# init a session
sess = onnxruntime.InferenceSession(onnx_model_path)

# get input name from session
input_name = sess.get_inputs()[0].name

# prepare inputs
inputs = {input_name: np.random.randn(1, 1, 32, 32).astype(np.float32)}

# get output
outputs = sess.run(None, inputs)

print(outputs)

[array([[-0.3288347 , -1.1747322 ,  0.5060428 , -0.27771673,  0.2461939 ,
        -0.70036477, -0.47910607,  0.8947681 ,  0.10573294,  0.03755153]],
      dtype=float32)]
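The values printed above are raw logits, one per class. As a minimal sketch, they can be turned into probabilities and a predicted class with a softmax followed by an argmax, mirroring what `validation_step` does during training. The logits below are copied from the printed output, and plain Python is used so the example runs without extra dependencies. Because the input was random noise, the resulting class carries no meaning.

```python
import math

# Logits copied from the printed output above (one row, ten classes).
logits = [-0.3288347, -1.1747322, 0.5060428, -0.27771673, 0.2461939,
          -0.70036477, -0.47910607, 0.8947681, 0.10573294, 0.03755153]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
# Index of the largest probability, i.e. the predicted digit.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class)  # 7
```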


## References


1. <a target="_blank" href="https://pytorch-lightning.readthedocs.io/en/stable/common/production_inference.html">https://pytorch-lightning.readthedocs.io/en/stable/common/production_inference.html</a>
2. <a target="_blank" href="https://docs.microsoft.com/en-us/windows/ai/windows-ml/get-onnx-model">https://docs.microsoft.com/en-us/windows/ai/windows-ml/get-onnx-model</a>
3. <a target="_blank" href="https://onnx.ai/">https://onnx.ai/</a>
