# Project Overview: Model Packaging and Deployment Formats

This project demonstrates how to package and deploy machine learning models using various formats and frameworks. The goal is to explore the trade-offs, compatibility, and performance characteristics of each approach to better inform deployment choices in production settings.

The packaging formats and frameworks covered in this notebook include:

- **ONNX** – Open Neural Network Exchange format for cross-framework interoperability  
- **TorchScript** – PyTorch-native serialized program format that can run without the original Python code  
- **PyTorch JIT** – The `torch.jit` tracing and scripting API that produces TorchScript programs  
- **TensorRT** – NVIDIA’s high-performance deep learning inference optimizer  
- **TensorFlow SavedModel** – Standard format for TensorFlow model export and deployment  
- **JAX** – High-performance numerical computing with composable function transformations

This comparison is aimed at practitioners who need to understand how to efficiently serialize, deploy, and run models in various production environments.


In [1]:
import logging

# Set up the logger
logger = logging.getLogger("onnx_export")
logger.setLevel(logging.INFO)

# Add a StreamHandler if it doesn't exist
if not logger.handlers:
    handler = logging.StreamHandler()
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

# Test the logger
logger.info("Starting ONNX export...")

2025-05-28 13:55:13,883 - onnx_export - INFO - Starting ONNX export...
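
The `if not logger.handlers` guard above matters in a notebook: re-running the cell would otherwise attach a second handler and every message would be printed twice. A minimal sketch of the same idea wrapped in a reusable function follows. The function name `get_logger` and the logger name `onnx_export_demo` are hypothetical, and disabling propagation is an assumption about what is wanted, not something the original cell does.

```python
import logging


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with exactly one StreamHandler attached."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Only attach a handler the first time, so repeated calls stay idempotent.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    # Assumption: stop messages from also reaching root-logger handlers.
    logger.propagate = False
    return logger


first = get_logger("onnx_export_demo")
second = get_logger("onnx_export_demo")  # simulates re-running the cell
```

Calling the function twice returns the same logger object with a single handler, which is the behavior the guard is meant to protect.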


In [2]:
! uv add onnx onnxscript torch onnxruntime torchvision

Resolved 181 packages in 1ms
Uninstalled 1 package in 1ms
Installed 1 package in 3ms
 ~ torch-to-any==0.1.0 (from file:///Users/dimda/torch-to-any/torch-to-any)


In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageClassifierModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x: torch.Tensor):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
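
The `16 * 5 * 5` input size of `fc1` follows from the spatial size of the feature map after both convolution and pooling stages, given the 32×32 input used for export later in the notebook. The arithmetic can be sketched in plain Python as below. The helper names `conv_out` and `pool_out` are hypothetical, and the sketch assumes stride 1 and no padding for the convolutions (the `nn.Conv2d` defaults) and a pooling stride equal to the kernel size (the `max_pool2d` default).

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int) -> int:
    """Spatial output size of max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1


size = 32                 # input height and width
size = conv_out(size, 5)  # conv1: 32 -> 28
size = pool_out(size, 2)  # pool:  28 -> 14
size = conv_out(size, 5)  # conv2: 14 -> 10
size = pool_out(size, 2)  # pool:  10 -> 5

flattened = 16 * size * size  # 16 output channels from conv2
print(flattened)  # 400
```

A different input resolution would change `flattened`, so `fc1` would need to be resized accordingly.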

In [4]:
# Export the PyTorch model to ONNX

torch_model = ImageClassifierModel()
# Create example inputs for exporting the model. The inputs should be a tuple of tensors.
example_inputs = (torch.randn(1, 1, 32, 32),)
onnx_program = torch.onnx.export(torch_model, example_inputs, dynamo=True)
onnx_program.optimize()
onnx_program.save("image_classifier_model.onnx")

[torch.onnx] Obtain model graph for `ImageClassifierModel([...]` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `ImageClassifierModel([...]` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅


In [5]:
# Load the exported file and check that it is a well-formed ONNX model
import onnx

onnx_model = onnx.load("image_classifier_model.onnx")
onnx.checker.check_model(onnx_model)
logger.info("ONNX model is valid.")

2025-05-28 13:55:16,625 - onnx_export - INFO - ONNX model is valid.


In [6]:
import onnxruntime

onnx_inputs = [tensor.numpy(force=True) for tensor in example_inputs]
logger.info(f"Input size: {onnx_inputs[0].shape}")

ort_session = onnxruntime.InferenceSession(
    "./image_classifier_model.onnx", providers=["CPUExecutionProvider"]
)

onnxruntime_input = {
    input_arg.name: input_value
    for input_arg, input_value in zip(ort_session.get_inputs(), onnx_inputs)
}

# ONNX Runtime returns a list of outputs; this model has a single output, so take the first
onnxruntime_outputs = ort_session.run(None, onnxruntime_input)[0]

2025-05-28 13:55:16,657 - onnx_export - INFO - Input size: (1, 1, 32, 32)
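
Since the overview lists performance characteristics as one of the things to compare, a small timing helper may be useful at this point. The sketch below uses only the standard library. The function name `benchmark` and its parameters are hypothetical, and the workload timed here is a stand-in so that the block runs on its own.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> dict:
    """Time a zero-argument callable and return latency statistics in milliseconds."""
    # Warm-up calls are discarded so one-time setup cost does not skew the numbers.
    for _ in range(warmup):
        fn()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1e3)
    return {
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
    }


# Stand-in workload; it only keeps the example self-contained.
stats = benchmark(lambda: sum(range(10_000)))
print(stats)
```

In this notebook the callable could be `lambda: ort_session.run(None, onnxruntime_input)` for ONNX Runtime, with a matching lambda around `torch_model(*example_inputs)` for PyTorch. Timings from a single-sample input on CPU are only a rough indication.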


In [7]:
# Compare the ONNX Runtime output with the PyTorch output
torch_outputs = torch_model(*example_inputs)

# Both outputs have shape (1, 10), so len() is the batch size and the loop compares row by row
assert len(torch_outputs) == len(onnxruntime_outputs)
for torch_output, onnxruntime_output in zip(torch_outputs, onnxruntime_outputs):
    torch.testing.assert_close(torch_output, torch.tensor(onnxruntime_output))

logger.info("PyTorch and ONNX Runtime output matched!")
logger.info(f"Output length: {len(onnxruntime_outputs)}")
logger.info(f"Sample output: {onnxruntime_outputs}")

2025-05-28 13:55:16,689 - onnx_export - INFO - PyTorch and ONNX Runtime output matched!
2025-05-28 13:55:16,690 - onnx_export - INFO - Output length: 1
2025-05-28 13:55:16,691 - onnx_export - INFO - Sample output: [[-0.1482003   0.0178181   0.01821857  0.11436789  0.04181811 -0.04224392
   0.03377488 -0.0180761   0.03347242 -0.0055116 ]]
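
The comparison above passes when every element satisfies `|actual - expected| <= atol + rtol * |expected|`, which is the criterion `torch.testing.assert_close` applies. For `float32` tensors its default tolerances are `rtol=1.3e-6` and `atol=1e-5`. The sketch below reproduces that check in plain Python on the logged sample output. The helper names and the two perturbation sizes are hypothetical, chosen only to show one passing and one failing case.

```python
def is_close(actual: float, expected: float, rtol: float = 1.3e-6, atol: float = 1e-5) -> bool:
    """Element-wise closeness test with float32-style default tolerances."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def all_close(actual_row, expected_row) -> bool:
    """True when every pair of elements is close."""
    return all(is_close(a, e) for a, e in zip(actual_row, expected_row))


# Sample output logged by the notebook.
expected = [
    -0.1482003, 0.0178181, 0.01821857, 0.11436789, 0.04181811,
    -0.04224392, 0.03377488, -0.0180761, 0.03347242, -0.0055116,
]

# Hypothetical perturbations: one within tolerance, one well outside it.
small_drift = [value + 1e-7 for value in expected]
large_drift = [value + 1e-3 for value in expected]

small_ok = all_close(small_drift, expected)
large_ok = all_close(large_drift, expected)
print(small_ok, large_ok)  # True False
```

Differences around `1e-7` are typical of floating-point reordering between runtimes, while a drift of `1e-3` would usually point to a real export problem.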
