<a href="https://colab.research.google.com/github/PacktPublishing/Modern-Computer-Vision-with-PyTorch-2E/blob/main/Chapter18/convert_to_onnx.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
try:
  from torch_snippets import *
except ImportError:
  %pip install torch-snippets gitPython lovely-tensors
  from torch_snippets import *

from git import Repo

repository_url = 'https://github.com/sizhky/quantization'
destination_directory = '/content/quantization'
if exists(destination_directory):
  repo = Repo(destination_directory)
else:
  repo = Repo.clone_from(repository_url, destination_directory)

%cd {destination_directory}
%pip install -qq -r requirements.txt # this will take about 5 min of time
%pip install onnxruntime-gpu onnx
%pip install -U torchvision
# print(repo.git.pull('origin', 'main'))

# Train

In [None]:
# Change to `DEBUG=false` in the line below
# to train on a larger dataset
%env DEBUG=true
!make train

env: DEBUG=true
python -m src.defect_classification.train
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100% 528M/528M [00:07<00:00, 78.3MB/s]
Downloading readme: 100% 495/495 [00:00<00:00, 2.57MB/s]
Downloading data: 100% 306M/306M [00:07<00:00, 38.9MB/s]
Downloading data: 100% 305M/305M [00:06<00:00, 46.0MB/s]
Downloading data: 100% 263M/263M [00:06<00:00, 41.7MB/s]
Generating train split: 100% 2331/2331 [00:02<00:00, 1049.98 examples/s]
Generating valid split: 100% 1004/1004 [00:01<00:00, 884.39 examples/s]
Class Balance
 
```↯ AttrDict ↯
train
  non_defect - 50 (int)
  defect - 50 (int)
valid
  non_defect - 50 (int)
  defect - 50 (int)

```

Map: 100% 100/100 [00:19<00:00,  5.22 examples/s]
Map: 100% 100/100 [00:19<00:00,  5.03 examples/s]
Epoch: 1 train_epoch_loss=0.689
Epoch: 11 train_epoch_loss=0.592
Epoch: 21 train_epoch_loss=0.478
Saved model to model.pth


# Conversion to ONNX

In [None]:
sys.path.append('src')
from defect_classification.model import SDD
from defect_classification.train import process_example, DefectsDataset
from datasets import load_dataset

In [None]:
# Load the dataset
val_ds = load_dataset('sizhkhy/kolektor_sdd2', split="valid[:50]+valid[-50:]")
val_ds = val_ds.map(process_example).remove_columns(['split', 'path'])
val_ds.set_format("pt", columns=["image", "label"], output_all_columns=True)
val_ds = DefectsDataset(val_ds)
val_dl = DataLoader(val_ds, batch_size=32, shuffle=True, drop_last=True)

In [None]:
# Load the model
device = 'cpu'
model = torch.load('model.pth').to(device)

Before converting the model to ONNX format, we create an input tensor and use it to make predictions with the PyTorch model. There are two motivations for this.

First, by comparing the outputs of the PyTorch model with those of the ONNX model, we can verify that both produce matching results (up to floating-point tolerance). This confirms that the conversion from PyTorch to ONNX succeeded and that we can rely on the converted model for inference.

Second, by measuring the time each model takes to generate predictions, we can compare their speed. This helps determine which model is better suited for deployment in a production environment, where fast inference is critical.

In the following cell, we create an input tensor and run the PyTorch model on it. A few cells below, we compare these outputs and timings with those of the ONNX model.
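
The warm-up-then-time pattern used in the cells that follow can be sketched as a small helper. This is a minimal sketch and not part of the notebook: `time_inference`, `warmup`, and `repeats` are hypothetical names, and `predict` stands in for any callable, such as the PyTorch model or a wrapper around an ONNX session.

```python
import time


def time_inference(predict, batch, warmup=1, repeats=3):
    """Return (last_output, mean_seconds) for predict(batch).

    Hypothetical helper: `predict` is any callable that takes one batch.
    """
    # Warm-up calls run the model but are excluded from the measurement.
    for _ in range(warmup):
        predict(batch)

    timings = []
    output = None
    for _ in range(repeats):
        start = time.perf_counter()
        output = predict(batch)
        timings.append(time.perf_counter() - start)

    # Average over several runs to smooth out one-off fluctuations.
    return output, sum(timings) / len(timings)


# Stand-in for a model: doubles every value in the batch.
output, seconds = time_inference(lambda xs: [2 * x for x in xs], [1, 2, 3])
print(output, f"{seconds:.6f}s")
```

Averaging over a few repeats gives a steadier number than a single `%time` call, at the cost of running the model more often.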

In [None]:
# Time the PyTorch model on one validation batch
model.eval()
i, _ = next(iter(val_dl))
with torch.no_grad():
    # first prediction is model warmup
    model(i.to(device))
    print(f'Time taken by pytorch model on sample input')
    %time pred_pytorch_model = model(i.to(device))
    pred_pytorch_model = pred_pytorch_model.cpu().numpy().reshape(-1)


Time taken by pytorch model on sample input
CPU times: user 15.2 s, sys: 2.81 s, total: 18 s
Wall time: 18 s


Let's convert the model into ONNX format.

Specifying input and output names for the model. Naming the inputs and outputs labels the corresponding tensors in the exported graph, so other frameworks and tools can feed inputs and fetch outputs by name instead of relying on auto-generated identifiers. The names act as a common interface for any runtime that consumes the exported model.
  - input_names = ['image']: This sets the input name(s) of the model. In this case, there is only one input, named 'image'.
  - output_names = ['label']: This sets the output name(s) of the model. Again, there is only one output, named 'label'.

Defining dynamic axes

The dynamic_axes dictionary marks which axes of the named tensors may change size at run time. In this case:
  - {0: 'batch_size'}: This labels the first axis (axis 0) of both the input and output tensors as 'batch_size', so the exported model accepts any batch size. Axes that are not listed keep the fixed sizes of the example input.
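
The dictionary passed as dynamic_axes is keyed by tensor name, and each value maps an axis index to a label. As a minimal sketch of that structure, the helper below builds the mapping for a list of tensor names; `batch_dynamic_axes` is a hypothetical name introduced here for illustration and is not part of PyTorch or the notebook.

```python
def batch_dynamic_axes(tensor_names, axis=0, label="batch_size"):
    """Mark the same axis as dynamic for every named tensor.

    Hypothetical helper: returns a dict shaped like the `dynamic_axes`
    argument, i.e. {tensor_name: {axis_index: axis_label}}.
    """
    return {name: {axis: label} for name in tensor_names}


dynamic_axes = batch_dynamic_axes(["image", "label"])
print(dynamic_axes)
# {'image': {0: 'batch_size'}, 'label': {0: 'batch_size'}}
```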

Specifying the ONNX file path
  - onnx_file_path = 'sdd_base.onnx': This sets the file name and path for the exported ONNX model, which will be named 'sdd_base.onnx'.

The final line exports the PyTorch model to the specified ONNX file:
  - torch.onnx.export(model, i[:1].to(device), f, export_params=True, verbose=False, opset_version=13, do_constant_folding=True, input_names=input_names, output_names=output_names, dynamic_axes=dynamic_axes)

Here's what each argument does:
  - model: This is the PyTorch model to be exported.
  - i[:1].to(device): This is an example input (a batch containing one image). The exporter runs the model on it to record the operations that make up the ONNX graph.
  - f: This is the file object opened in write binary mode ('wb') earlier.
  - export_params=True: This tells PyTorch to export the model's parameters along with the graph structure.
  - verbose=False: This sets the verbosity level for the export process. In this case, it will not print any output.
  - opset_version=13: This specifies the version of the ONNX operator set (opset) that the exported graph targets. The opset version determines which operators, and which versions of them, the graph may use.
  - do_constant_folding=True: This enables constant folding, which replaces operations whose inputs are all constants with precomputed values and so simplifies the exported graph.
  - input_names=input_names, output_names=output_names, and dynamic_axes=dynamic_axes : These specify the input names, output names, and dynamic axes for the exported ONNX model, respectively.

In [None]:
input_names = ['image']
output_names = ['label']
dynamic_axes = {'image': {0: 'batch_size'}, 'label': {0: 'batch_size'}}
onnx_file_path = 'sdd_base.onnx'
with open(onnx_file_path, 'wb') as f:
    torch.onnx.export(
        model,
        i[:1].to(device),
        f,
        export_params=True,
        verbose=False,
        opset_version=13,
        do_constant_folding=True,
        input_names=input_names,
        output_names=output_names,
        dynamic_axes=dynamic_axes
    )


Now that we have an ONNX model, let's load it, run a prediction, and measure the prediction time.

Loading the ONNX model on CPU
The next line loads the ONNX model sdd_base.onnx using the InferenceSession constructor. The providers parameter is set to ['CPUExecutionProvider'], so the model runs on the CPU rather than the GPU.
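
The providers list is an ordered preference. The sketch below shows one way to build it so that a GPU provider is used when present and the CPU provider otherwise. It is an illustration under assumptions: `choose_providers` is a hypothetical helper, and in a real session the `available` list would come from `onnxruntime.get_available_providers()`.

```python
def choose_providers(
    available,
    preferred=("CUDAExecutionProvider", "CPUExecutionProvider"),
):
    """Keep the preferred providers that are actually available, in order.

    Hypothetical helper: `available` is a list of provider names, for
    example the result of onnxruntime.get_available_providers().
    """
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"None of {list(preferred)} found in {list(available)}")
    return chosen


# On a CPU-only machine, only the CPU provider survives the filter.
print(choose_providers(["CPUExecutionProvider"]))
# ['CPUExecutionProvider']
```

The returned list can be passed as the providers argument of InferenceSession.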

Getting input and output names
The following two lines get the input and output names from the loaded ONNX model:
  - input_name = session.get_inputs()[0].name: This gets the name of the first input tensor in the model.
  - output_name = session.get_outputs()[0].name: This gets the name of the first output tensor in the model.

Preparing sample input data
The next line prepares a sample input tensor for making predictions:
  - input = i.numpy(): This converts i, the PyTorch tensor batch used earlier, to a NumPy array, which is the input type the ONNX session expects.

Making the first prediction (model warmup)
The final lines make the first prediction using the loaded ONNX model:
  - pred_onnx = session.run(None, {input_name: input})[0]: This runs the ONNX model on the prepared input data. Passing None as the first argument requests all of the model's outputs, returned as a list. The {input_name: input} dictionary maps the input name to the prepared input data.

Warming up a model means running it once on a sample input so that one-time initialization costs are paid up front and subsequent predictions are consistently faster.

In [None]:
from onnxruntime import InferenceSession
# load the onnx model on cpu
session = InferenceSession('sdd_base.onnx', providers=['CPUExecutionProvider'])
# make sample prediction
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

input = i.numpy()

# first prediction is model warmup
pred_onnx = session.run(None, {input_name: input})[0]
print(f'Time taken by ONNX model on same input')
%time pred_onnx = session.run(None, {input_name: input})
pred____onnx_model = pred_onnx[0].reshape(-1)

Time taken by ONNX model on same input
CPU times: user 13.7 s, sys: 468 ms, total: 14.1 s
Wall time: 15.3 s


In [None]:
print('Both the pytorch and onnx model\'s predictions are identical - ')
np.allclose(
    pred_pytorch_model,
    pred____onnx_model,
)

Both the pytorch and onnx model's predictions are identical - 


True
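
np.allclose reports only whether the two arrays agree within tolerance, not how far apart they are. As a minimal sketch, the helper below also returns the largest absolute difference, which is useful when deciding whether a tolerance is appropriate. `compare_predictions` is a hypothetical name, the sample values are made up for illustration, and the tolerances shown are the np.allclose defaults.

```python
import numpy as np


def compare_predictions(reference, candidate, rtol=1e-5, atol=1e-8):
    """Return (is_close, max_abs_diff) for two prediction arrays.

    Hypothetical helper: arrays are considered close when
    |reference - candidate| <= atol + rtol * |candidate| elementwise.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    is_close = bool(np.allclose(reference, candidate, rtol=rtol, atol=atol))
    return is_close, max_abs_diff


# Illustrative values only, not outputs of the notebook's models.
is_close, max_abs_diff = compare_predictions([0.25, 0.75], [0.25, 0.7500001])
print(is_close, max_abs_diff)
```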