# ONNX API Tutorial

## Introduction

ONNX (Open Neural Network Exchange) is an open format for representing machine learning models. It enables interoperability between different ML frameworks, allowing you to train a model in one framework (TensorFlow, PyTorch, scikit-learn) and deploy it in another for optimized inference.

ONNX provides essential tools for production ML deployment:

- **Model Conversion**: Convert models from TensorFlow, PyTorch, and other frameworks to ONNX format
- **Model Verification**: Validate model structure and ensure correctness
- **Optimized Inference**: Run models with ONNX Runtime, which is often faster than inference in the training framework (the gain depends on the model, batch size, and hardware)
- **Hardware Acceleration**: Support for CPU, GPU, and specialized accelerators
- **Cross-Platform Deployment**: Deploy on cloud, edge devices, mobile, and browsers

This document provides a concise overview of ONNX's core APIs and usage patterns, along with wrapper utilities for time series forecasting.

---

## Imports

In [None]:
import onnx
from onnx import checker, helper, numpy_helper
import onnxruntime as ort
import tf2onnx
import tensorflow as tf
from tensorflow import keras
import numpy as np
import time
import os

print("ONNX version:", onnx.__version__)
print("ONNX Runtime version:", ort.__version__)
print("Available execution providers:", ort.get_available_providers())

ONNX version: 1.17.0
ONNX Runtime version: 1.23.2
Available execution providers: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']


---

## Part 1: Native ONNX API

### 1.1 Model Loading and Inspection

The `onnx` package provides the core API for loading and inspecting ONNX models.

`onnx.load()` reads an ONNX model from disk and returns a `ModelProto` object that contains:
- Graph structure (nodes, edges)
- Input/output specifications
- Operator definitions
- Metadata

In [None]:
# Create a simple model for demonstration
# (In practice, you'd convert an existing trained model)

# First, let's create and save a simple Keras model
# Note: use_cudnn=False is required for ONNX compatibility
model = keras.Sequential([
    keras.layers.Input(shape=(10, 5)),
    keras.layers.LSTM(32, use_cudnn=False),
    keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')

# Save it
os.makedirs('models', exist_ok=True)
model.save('models/demo_model.keras')

# Note: Keras 3 requires the model to be called at least once before export
dummy_input = np.random.randn(1, 10, 5).astype(np.float32)
_ = model(dummy_input)
model.export('models/demo_model.onnx', format='onnx')
print("Model converted to ONNX successfully")

onnx_model = onnx.load('models/demo_model.onnx')

print("Model loaded successfully")
print(f"IR Version: {onnx_model.ir_version}")
print(f"Producer: {onnx_model.producer_name}")
print(f"Opset Version: {onnx_model.opset_import[0].version}")

Saved artifact at 'models/demo_model.onnx'.
Model converted to ONNX successfully
Model loaded successfully
IR Version: 8
Producer: tf2onnx
Opset Version: 15


### 1.2 Model Verification

`onnx.checker.check_model()` validates the ONNX model structure to ensure it conforms to the ONNX specification. This is critical before deployment.

In [None]:
try:
    onnx.checker.check_model(onnx_model)
    print("Model is valid")
except onnx.checker.ValidationError as e:
    print(f"Model is invalid: {e}")

Model is valid


### 1.3 Graph Inspection

The ONNX graph contains nodes (operations) and initializers (weights/parameters). You can inspect the graph structure to understand the model architecture.

In [4]:
# Inspect graph inputs
print("\nGraph Inputs:")
for input_tensor in onnx_model.graph.input:
    print(f"  Name: {input_tensor.name}")
    shape = [dim.dim_value for dim in input_tensor.type.tensor_type.shape.dim]
    print(f"  Shape: {shape}")
    print(f"  Type: {input_tensor.type.tensor_type.elem_type}")

# Inspect graph outputs
print("\nGraph Outputs:")
for output_tensor in onnx_model.graph.output:
    print(f"  Name: {output_tensor.name}")
    shape = [dim.dim_value for dim in output_tensor.type.tensor_type.shape.dim]
    print(f"  Shape: {shape}")

# Inspect nodes (operations)
print(f"\nNumber of nodes: {len(onnx_model.graph.node)}")
print("First 5 operations:")
for i, node in enumerate(onnx_model.graph.node[:5]):
    print(f"  {i+1}. {node.op_type}: {node.name}")


Graph Inputs:
  Name: keras_tensor
  Shape: [0, 10, 5]
  Type: 1

Graph Outputs:
  Name: Identity:0
  Shape: [0, 1]

Number of nodes: 13
First 5 operations:
  1. Transpose: sequential_1/lstm_1/transpose
  2. Shape: sequential_1/lstm_1/Shape
  3. Cast: sequential_1/lstm_1/Shape__45
  4. Slice: sequential_1/lstm_1/strided_slice
  5. Concat: sequential_1/lstm_1/zeros/packed_Concat__61
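
The `0` in `[0, 10, 5]` above is not a real size. For a dynamic dimension the exporter stores a symbolic name in `dim_param` and leaves `dim_value` unset, which reads back as `0`. A minimal sketch of a helper that keeps the symbolic name follows. `readable_shape` is a hypothetical name, and the stand-in objects in the demo only mimic the two fields of an ONNX dimension so the snippet runs without a model.

```python
from types import SimpleNamespace


def readable_shape(dims):
    """Return a shape list that keeps symbolic dimension names.

    `dims` is any iterable of objects exposing `dim_param` (str) and
    `dim_value` (int), e.g. `value_info.type.tensor_type.shape.dim`.
    """
    shape = []
    for dim in dims:
        if dim.dim_param:          # symbolic (dynamic) dimension, e.g. batch size
            shape.append(dim.dim_param)
        elif dim.dim_value > 0:    # fixed dimension
            shape.append(dim.dim_value)
        else:                      # neither field set: size is unknown
            shape.append(None)
    return shape


# Stand-ins for the protobuf dimension objects (illustration only).
demo_dims = [
    SimpleNamespace(dim_param="unk__71", dim_value=0),
    SimpleNamespace(dim_param="", dim_value=10),
    SimpleNamespace(dim_param="", dim_value=5),
]
print(readable_shape(demo_dims))
```

With the model loaded above, passing `onnx_model.graph.input[0].type.tensor_type.shape.dim` should yield a named first dimension instead of `0`. The `Type: 1` printed for the input is the enum value of `TensorProto.FLOAT`.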


---

## Part 2: ONNX Runtime API

ONNX Runtime is a high-performance inference engine for ONNX models. It provides optimized execution across different hardware platforms.

### 2.1 Creating an Inference Session

`onnxruntime.InferenceSession()` creates a session for running inference. You can specify:
- **Execution providers**: CPU, CUDA, TensorRT, DirectML, etc.
- **Session options**: Optimization level, thread settings, profiling

In [None]:
session = ort.InferenceSession(
    'models/demo_model.onnx',
    providers=['CPUExecutionProvider']
)

print("Inference session created")
print(f"Providers: {session.get_providers()}")

Inference session created
Providers: ['CPUExecutionProvider']


### 2.2 Getting Input/Output Metadata

The session provides methods to query input and output specifications.

In [None]:
input_name = session.get_inputs()[0].name
input_shape = session.get_inputs()[0].shape
input_type = session.get_inputs()[0].type

print(f"\nInput name: {input_name}")
print(f"Input shape: {input_shape}")
print(f"Input type: {input_type}")

output_name = session.get_outputs()[0].name
output_shape = session.get_outputs()[0].shape

print(f"\nOutput name: {output_name}")
print(f"Output shape: {output_shape}")


Input name: keras_tensor
Input shape: ['unk__71', 10, 5]
Input type: tensor(float)

Output name: Identity:0
Output shape: ['unk__72', 1]
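
Because dynamic dimensions are reported as strings such as `'unk__71'`, an input array can be checked against the session metadata before calling `run()`. The following is a minimal sketch; `shape_matches` is a hypothetical helper, not part of ONNX Runtime.

```python
def shape_matches(expected, actual):
    """Check a concrete shape against ONNX Runtime shape metadata.

    `expected` is a list as returned by `session.get_inputs()[0].shape`:
    integers for fixed dimensions, strings or None for dynamic ones.
    `actual` is a concrete shape such as `array.shape`.
    """
    if len(expected) != len(actual):
        return False  # rank mismatch
    for want, got in zip(expected, actual):
        if isinstance(want, int) and want != got:
            return False  # fixed dimension with the wrong size
    return True  # dynamic dimensions accept any size


print(shape_matches(['unk__71', 10, 5], (1, 10, 5)))   # True
print(shape_matches(['unk__71', 10, 5], (1, 10, 4)))   # False
print(shape_matches(['unk__71', 10, 5], (10, 5)))      # False
```

In the notebook this could be called as `shape_matches(input_shape, sample_input.shape)` once `sample_input` exists.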


### 2.3 Running Inference

`session.run()` executes the model on input data. It takes:
- Output names (or None for all outputs)
- Input dictionary mapping input names to numpy arrays

In [None]:
sample_input = np.random.randn(1, 10, 5).astype(np.float32)

outputs = session.run([output_name], {input_name: sample_input})

print(f"\nInput shape: {sample_input.shape}")
print(f"Output shape: {outputs[0].shape}")
print(f"Prediction: {outputs[0][0]}")


Input shape: (1, 10, 5)
Output shape: (1, 1)
Prediction: [0.03550987]
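
Passing `None` as the first argument to `run()` returns every output in the order reported by `get_outputs()`. For models with several outputs it can help to key the results by name. A minimal sketch, with `run_named` as a hypothetical helper that accepts any object behaving like an `InferenceSession`:

```python
def run_named(session, feeds):
    """Run all outputs and return them keyed by output name.

    `session` is expected to behave like `onnxruntime.InferenceSession`:
    `get_outputs()` yields objects with a `.name`, and `run(None, feeds)`
    returns one array per output, in the same order.
    """
    names = [out.name for out in session.get_outputs()]
    values = session.run(None, feeds)
    return dict(zip(names, values))
```

For the session above, `run_named(session, {input_name: sample_input})` should return a dictionary with the single key `'Identity:0'`.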


### 2.4 Session Options and Optimization

`SessionOptions` allows you to configure optimization levels and execution settings.

**Graph Optimization Levels:**
- `ORT_DISABLE_ALL`: No optimizations
- `ORT_ENABLE_BASIC`: Basic optimizations (constant folding, redundant node elimination)
- `ORT_ENABLE_EXTENDED`: Basic optimizations plus more complex node fusions
- `ORT_ENABLE_ALL`: All of the above plus layout optimizations (the default)

In [None]:
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.intra_op_num_threads = 4

optimized_session = ort.InferenceSession(
    'models/demo_model.onnx',
    sess_options=sess_options,
    providers=['CPUExecutionProvider']
)

print("Optimized session created")
print("Optimization level: ENABLE_ALL")
print(f"Threads: {sess_options.intra_op_num_threads}")

Optimized session created
Optimization level: ENABLE_ALL
Threads: 4
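
The hard-coded `4` above assumes the machine has at least four CPUs. One way to avoid oversubscribing a smaller machine is to clamp the requested value, sketched below. `choose_intra_op_threads` and its `reserve` parameter are hypothetical; note that `os.cpu_count()` counts logical CPUs, which may be more than the number of physical cores.

```python
import os


def choose_intra_op_threads(requested=4, reserve=0):
    """Clamp a requested thread count to the CPUs this machine reports.

    `reserve` leaves that many CPUs free for other work; the result is
    always at least 1.
    """
    available = os.cpu_count() or 1      # cpu_count() may return None
    usable = max(1, available - reserve)
    return min(requested, usable)


print(choose_intra_op_threads(requested=4))
```

The result could then be assigned with `sess_options.intra_op_num_threads = choose_intra_op_threads(4)`.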


### 2.5 Execution Providers

ONNX Runtime supports multiple execution providers for hardware acceleration.
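
A provider that the installed package lists as available can still fail to initialize, for example when its GPU libraries are missing. ONNX Runtime then logs a warning and continues with the next provider. Comparing the requested list with `session.get_providers()` makes such a fallback explicit. A minimal sketch, with `provider_fallback` as a hypothetical helper and hand-written example values:

```python
def provider_fallback(requested, active):
    """Return the requested providers that did not become active.

    `requested` is the list passed as `providers=` when creating the session;
    `active` is what `session.get_providers()` returns afterwards.
    """
    return [p for p in requested if p not in active]


# Values mirroring a machine where the CUDA provider is listed as available
# but cannot be loaded (illustrative only).
requested = ['CUDAExecutionProvider', 'CPUExecutionProvider']
active = ['CPUExecutionProvider']

missing = provider_fallback(requested, active)
if missing:
    print(f"Fell back: {missing} not active, running on {active[0]}")
```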

In [None]:
available_providers = ort.get_available_providers()
print("\nAvailable Execution Providers:")
for provider in available_providers:
    print(f"  - {provider}")

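# Create session with preferred provider order.
# Providers are listed in priority order; ONNX Runtime falls back to a later
# entry when an earlier one cannot be used.
preferred_providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']

# Keep only providers this onnxruntime build reports. A reported provider can
# still fail to load at session creation (e.g. missing CUDA libraries), which
# produces a warning and a fallback to the next provider in the list.
available_preferred = [p for p in preferred_providers if p in ort.get_available_providers()]
if not available_preferred:
    available_preferred = ['CPUExecutionProvider']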

session_gpu = ort.InferenceSession(
    'models/demo_model.onnx',
    providers=available_preferred
)

print(f"\nActive provider: {session_gpu.get_providers()[0]}")


Available Execution Providers:
  - TensorrtExecutionProvider
  - CUDAExecutionProvider
  - CPUExecutionProvider

Active provider: CPUExecutionProvider


2025-12-11 08:09:31.100303120 [E:onnxruntime:Default, provider_bridge_ort.cc:2251 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1844 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcurand.so.10: cannot open shared object file: No such file or directory
2025-12-11 08:09:31.100392108 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1013 CreateExecutionProviderFactoryInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.


### 2.6 Performance Profiling

ONNX Runtime can profile execution to identify bottlenecks.

In [None]:
sess_options = ort.SessionOptions()
sess_options.enable_profiling = True

profiled_session = ort.InferenceSession(
    'models/demo_model.onnx',
    sess_options=sess_options,
    providers=['CPUExecutionProvider']
)

# Run inference with profiling
for _ in range(10):
    _ = profiled_session.run([output_name], {input_name: sample_input})

prof_file = profiled_session.end_profiling()
print(f"\nProfiling results saved to: {prof_file}")


Profiling results saved to: onnxruntime_profile__2025-12-11_08-09-31.json
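
The profile can be summarized per operator type to see where time goes. The sketch below assumes the file is a JSON array of trace events in which per-operator entries have `cat` equal to `"Node"`, a duration `dur` in microseconds, and the operator type under `args.op_name`. That layout is an assumption to check against your own file. `summarize_profile` is a hypothetical helper, and the sample events are hand-written, not real measurements.

```python
import json
from collections import defaultdict


def summarize_profile(events):
    """Sum the `dur` field per operator type, largest total first."""
    totals = defaultdict(int)
    for event in events:
        if event.get("cat") != "Node":
            continue  # skip session-level and other non-operator events
        op = event.get("args", {}).get("op_name", event.get("name", "unknown"))
        totals[op] += event.get("dur", 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hand-written sample in the assumed layout (not real measurements).
sample = json.loads("""
[
  {"cat": "Session", "name": "model_run", "dur": 900},
  {"cat": "Node", "name": "MatMul_1_kernel_time", "dur": 120, "args": {"op_name": "MatMul"}},
  {"cat": "Node", "name": "MatMul_2_kernel_time", "dur": 80, "args": {"op_name": "MatMul"}},
  {"cat": "Node", "name": "Sigmoid_1_kernel_time", "dur": 30, "args": {"op_name": "Sigmoid"}}
]
""")

for op, micros in summarize_profile(sample):
    print(f"{op:10s} {micros} us")
```

For the file written above, the events would be loaded with `json.load(open(prof_file))` and passed to the same function.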


---

## Part 3: Model Conversion API

### 3.1 TensorFlow to ONNX Conversion

`tf2onnx` provides the API for converting TensorFlow models to ONNX format.

**Method 1: Using Keras model export (Keras 3)**

In [None]:
keras_model = keras.models.load_model('models/demo_model.keras')

# Call the model once before export (required by Keras 3)
dummy_input = np.random.randn(1, 10, 5).astype(np.float32)
_ = keras_model(dummy_input)

keras_model.export('models/keras_export.onnx', format='onnx')

print("Model exported to ONNX via Keras")

Saved artifact at 'models/keras_export.onnx'.
Model exported to ONNX via Keras


**Method 2: Using tf2onnx programmatically**

Note: `tf2onnx.convert.from_keras()` has compatibility issues with Keras 3. Use Method 1 or convert via SavedModel format instead.

In [12]:
import subprocess
import sys

# First export to SavedModel
keras_model.export('models/temp_saved_model')

# Then convert SavedModel to ONNX with the tf2onnx command-line converter.
# sys.executable runs it with the same interpreter as this notebook.
result = subprocess.run([
    sys.executable, '-m', 'tf2onnx.convert',
    '--saved-model', 'models/temp_saved_model',
    '--output', 'models/tf2onnx_export.onnx',
    '--opset', '13'
], capture_output=True, text=True)

if result.returncode == 0:
    print("Model converted with tf2onnx via SavedModel")
else:
    print("Conversion failed:", result.stderr)

INFO:tensorflow:Assets written to: models/temp_saved_model/assets


Saved artifact at 'models/temp_saved_model'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 10, 5), dtype=tf.float32, name='input_layer')
Output Type:
  TensorSpec(shape=(None, 1), dtype=tf.float32, name=None)
Captures:
  131828792251536: TensorSpec(shape=(), dtype=tf.resource, name=None)
  131828792260752: TensorSpec(shape=(), dtype=tf.resource, name=None)
  131828792262096: TensorSpec(shape=(), dtype=tf.resource, name=None)
  131828792262480: TensorSpec(shape=(), dtype=tf.resource, name=None)
  131828792261328: TensorSpec(shape=(), dtype=tf.resource, name=None)
Model converted with tf2onnx via SavedModel


### 3.2 Handling CuDNN Layers

CuDNN-optimized LSTM layers in TensorFlow are not directly compatible with ONNX. The solution is to recreate the model with `use_cudnn=False`.

In [13]:
def convert_keras_to_onnx_without_cudnn(model_path, onnx_path):
    """
    Convert Keras model to ONNX by disabling CuDNN optimization.

    CuDNN LSTM operations are not supported in ONNX, so we recreate
    the model with use_cudnn=False before conversion.
    """
    # Load original model
    original_model = keras.models.load_model(model_path, compile=False)
    config = original_model.get_config()

    # Modify LSTM layers to disable CuDNN
    for layer_config in config['layers']:
        if layer_config['class_name'] == 'LSTM':
            layer_config['config']['use_cudnn'] = False
        elif layer_config['class_name'] == 'Bidirectional':
            if layer_config['config']['layer']['class_name'] == 'LSTM':
                layer_config['config']['layer']['config']['use_cudnn'] = False

    # Recreate model from modified config
    if isinstance(original_model, keras.Sequential):
        new_model = keras.Sequential.from_config(config)
    else:
        new_model = keras.Model.from_config(config)

    # Copy weights
    new_model.set_weights(original_model.get_weights())

    # Build the model by calling it once
    input_shape = new_model.input_shape
    dummy_shape = tuple(1 if dim is None else dim for dim in input_shape)
    dummy_input = np.zeros(dummy_shape, dtype=np.float32)
    new_model(dummy_input)

    # Export to ONNX
    new_model.export(onnx_path, format='onnx')

    return onnx_path

# Use the function
onnx_path = convert_keras_to_onnx_without_cudnn(
    'models/demo_model.keras',
    'models/demo_no_cudnn.onnx'
)
print(f"ONNX model saved: {onnx_path}")

Saved artifact at 'models/demo_no_cudnn.onnx'.
ONNX model saved: models/demo_no_cudnn.onnx
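
The function above patches LSTM layers at the top level and directly inside `Bidirectional`. A recursive walk over the config reaches layers at any nesting depth. The sketch below operates on plain dictionaries, so it runs without Keras. `disable_cudnn` is a hypothetical helper, the demo config is hand-written, and including `GRU` assumes it accepts the same `use_cudnn` argument as `LSTM` in Keras 3.

```python
RNN_CLASSES = {"LSTM", "GRU"}  # GRU included as an assumption, see above


def disable_cudnn(config):
    """Recursively set use_cudnn=False on every LSTM/GRU layer config.

    Walks nested dicts and lists, mutates `config` in place, and returns
    the number of layer configs that were patched.
    """
    patched = 0
    if isinstance(config, dict):
        inner = config.get("config")
        if config.get("class_name") in RNN_CLASSES and isinstance(inner, dict):
            inner["use_cudnn"] = False
            patched += 1
        for value in config.values():
            patched += disable_cudnn(value)
    elif isinstance(config, list):
        for item in config:
            patched += disable_cudnn(item)
    return patched


# Hand-written config in the shape produced by model.get_config().
demo_config = {
    "name": "sequential",
    "layers": [
        {"class_name": "InputLayer", "config": {"batch_shape": [None, 10, 5]}},
        {"class_name": "Bidirectional",
         "config": {"layer": {"class_name": "LSTM", "config": {"units": 32}}}},
        {"class_name": "GRU", "config": {"units": 16}},
        {"class_name": "Dense", "config": {"units": 1}},
    ],
}
print(disable_cudnn(demo_config))
```

In `convert_keras_to_onnx_without_cudnn`, a call such as `disable_cudnn(config)` could stand in for the explicit loop over `config['layers']`.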


---

## Part 4: Wrapper Utilities

To simplify common ONNX operations for time series forecasting, we provide wrapper utilities that abstract away boilerplate code.

### 4.1 Conversion Wrapper

`convert_to_onnx()` wraps the conversion process with automatic CuDNN handling.

In [14]:
def convert_to_onnx(model_path: str, onnx_path: str) -> str:
    """
    Convert Keras model to ONNX format.

    Automatically handles CuDNN LSTM layers by recreating the model
    with use_cudnn=False before conversion.

    Args:
        model_path: Path to Keras model (.keras or .h5)
        onnx_path: Path to save ONNX model

    Returns:
        Path to saved ONNX model
    """
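    # dirname is '' for a bare filename, and os.makedirs('') raises
    out_dir = os.path.dirname(onnx_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    # Fail loudly instead of returning a path that was never written
    if not model_path.endswith(('.keras', '.h5')):
        raise ValueError(f"Unsupported model format: {model_path}")

    return convert_keras_to_onnx_without_cudnn(model_path, onnx_path)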

# Usage
onnx_model_path = convert_to_onnx(
    model_path='models/demo_model.keras',
    onnx_path='models/wrapper_demo.onnx'
)
print(f"Converted: {onnx_model_path}")

Saved artifact at 'models/wrapper_demo.onnx'.
Converted: models/wrapper_demo.onnx


### 4.2 Verification Wrapper

`verify_onnx()` wraps the verification process and returns structured results.

In [15]:
def verify_onnx(onnx_path: str) -> dict:
    """
    Verify ONNX model and return structured results.

    Args:
        onnx_path: Path to ONNX model

    Returns:
        Dictionary with validation results:
        - is_valid: bool
        - error: str or None
        - opset_version: int
        - num_nodes: int
    """
    model = onnx.load(onnx_path)

    try:
        onnx.checker.check_model(model)
        is_valid = True
        error = None
    except Exception as e:
        is_valid = False
        error = str(e)

    return {
        'is_valid': is_valid,
        'error': error,
        'opset_version': model.opset_import[0].version,
        'num_nodes': len(model.graph.node)
    }

# Usage
verification = verify_onnx('models/wrapper_demo.onnx')
print("\nVerification Results:")
print(f"  Valid: {verification['is_valid']}")
print(f"  Opset: {verification['opset_version']}")
print(f"  Nodes: {verification['num_nodes']}")


Verification Results:
  Valid: True
  Opset: 15
  Nodes: 13
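
`opset_import` holds one entry per operator domain, so index `0` is not guaranteed to be the standard ONNX operator set. The default domain is identified by an empty string (or `ai.onnx`). A minimal sketch of a lookup that selects it explicitly follows. `default_opset_version` is a hypothetical helper, and the stand-in entries only mimic the `domain` and `version` fields.

```python
from types import SimpleNamespace


def default_opset_version(opset_imports):
    """Return the opset version of the default ONNX domain, or None."""
    for entry in opset_imports:
        if entry.domain in ("", "ai.onnx"):
            return entry.version
    return None


# Stand-ins for model.opset_import entries (illustration only).
demo_imports = [
    SimpleNamespace(domain="com.microsoft", version=1),
    SimpleNamespace(domain="", version=15),
]
print(default_opset_version(demo_imports))
```

Inside `verify_onnx()`, `default_opset_version(model.opset_import)` could replace `model.opset_import[0].version`.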


### 4.3 Inference Session Wrapper

`ONNXInferenceSession` provides a simplified interface for inference with helpful utility methods.

In [16]:
class ONNXInferenceSession:
    """
    Wrapper around onnxruntime.InferenceSession for simplified inference.

    Provides convenient methods for getting input/output shapes and
    running predictions without manually managing input/output names.
    """

    def __init__(self, model_path: str, providers=None):
        """
        Initialize inference session.

        Args:
            model_path: Path to ONNX model
            providers: List of execution providers (default: CPUExecutionProvider)
        """
        if providers is None:
            providers = ['CPUExecutionProvider']

        self.session = ort.InferenceSession(model_path, providers=providers)
        self.input_name = self.session.get_inputs()[0].name
        self.output_name = self.session.get_outputs()[0].name

    def get_input_shape(self):
        """Get expected input shape."""
        return self.session.get_inputs()[0].shape

    def get_output_shape(self):
        """Get output shape."""
        return self.session.get_outputs()[0].shape

    def predict(self, X):
        """
        Run inference on input data.

        Args:
            X: Input numpy array

        Returns:
            Predictions as numpy array
        """
        X = X.astype(np.float32)
        outputs = self.session.run([self.output_name], {self.input_name: X})
        return outputs[0]

# Usage
session = ONNXInferenceSession('models/wrapper_demo.onnx')

print(f"Input shape: {session.get_input_shape()}")
print(f"Output shape: {session.get_output_shape()}")

predictions = session.predict(sample_input)
print(f"Predictions: {predictions[0]}")

Input shape: ['unk__260', 10, 5]
Output shape: ['unk__261', 1]
Predictions: [0.03550987]


### 4.4 Framework Comparison Utility

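`compare_frameworks_inference()` benchmarks TensorFlow vs ONNX Runtime. It times a single call to each framework, so the TensorFlow figure includes any one-time cost of the first `predict()` call; read the reported speedup as a rough indication rather than a steady-state measurement.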

In [17]:
def compare_frameworks_inference(keras_model_path, onnx_model_path, test_input):
    """
    Compare TensorFlow and ONNX Runtime inference performance.

    Args:
        keras_model_path: Path to Keras model
        onnx_model_path: Path to ONNX model
        test_input: Input data for inference

    Returns:
        Dictionary with comparison results:
        - tensorflow_time: float (seconds)
        - onnx_time: float (seconds)
        - speedup: float
        - max_difference: float
        - mean_difference: float
        - numerically_close: bool
    """
    # Load models
    tf_model = keras.models.load_model(keras_model_path)
    onnx_session = ONNXInferenceSession(onnx_model_path)

    test_input = test_input.astype(np.float32)

    # TensorFlow inference (perf_counter is a monotonic, high-resolution timer)
    start = time.perf_counter()
    tf_pred = tf_model.predict(test_input, verbose=0)
    tf_time = time.perf_counter() - start

    # ONNX inference
    start = time.perf_counter()
    onnx_pred = onnx_session.predict(test_input)
    onnx_time = time.perf_counter() - start

    # Compare results
    max_diff = np.max(np.abs(tf_pred - onnx_pred))
    mean_diff = np.mean(np.abs(tf_pred - onnx_pred))
    numerically_close = np.allclose(tf_pred, onnx_pred, rtol=1e-5, atol=1e-6)

    return {
        'tensorflow_time': tf_time,
        'onnx_time': onnx_time,
        'speedup': tf_time / onnx_time,
        'max_difference': max_diff,
        'mean_difference': mean_diff,
        'numerically_close': numerically_close
    }

# Usage
comparison = compare_frameworks_inference(
    keras_model_path='models/demo_model.keras',
    onnx_model_path='models/wrapper_demo.onnx',
    test_input=np.random.randn(100, 10, 5).astype(np.float32)
)

print("\nPerformance Comparison:")
print(f"  TensorFlow: {comparison['tensorflow_time']:.6f}s")
print(f"  ONNX:       {comparison['onnx_time']:.6f}s")
print(f"  Speedup:    {comparison['speedup']:.2f}x")
print(f"  Numerically equivalent: {comparison['numerically_close']}")

Performance Comparison:
  TensorFlow: 0.428387s
  ONNX:       0.000549s
  Speedup:    780.87x
  Numerically equivalent: True
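
The single-call timings above include whatever one-time work each framework does on its first call, which probably accounts for much of a speedup as large as 780x. A steadier comparison discards a few warm-up runs and reports the median of repeated calls. The sketch below uses only the standard library; `benchmark` is a hypothetical helper and the demo workload is a stand-in.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=20):
    """Time `fn()` after warm-up runs; return summary statistics in seconds."""
    for _ in range(warmup):
        fn()  # discarded: absorbs tracing, caching, and lazy initialization
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }


# Stand-in workload; replace with a real inference call.
stats = benchmark(lambda: sum(range(10_000)))
print(f"median: {stats['median']:.6f}s")
```

Inside `compare_frameworks_inference()`, the two timed sections could become `benchmark(lambda: tf_model.predict(test_input, verbose=0))` and `benchmark(lambda: onnx_session.predict(test_input))`, with the speedup computed from the two medians.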


---

## Summary

### Native ONNX API

| Component | Purpose | Key Methods |
|-----------|---------|-------------|
| `onnx` | Model loading and inspection | `load()`, `checker.check_model()` |
| `onnxruntime` | Optimized inference | `InferenceSession()`, `run()` |
| `tf2onnx` | TensorFlow conversion | `python -m tf2onnx.convert --saved-model` (CLI) |

### Wrapper Utilities

| Function | Purpose | Benefits |
|----------|---------|----------|
| `convert_to_onnx()` | Model conversion | Automatic CuDNN handling |
| `verify_onnx()` | Model validation | Structured results |
| `ONNXInferenceSession` | Simplified inference | Clean API, no name management |
| `compare_frameworks_inference()` | Benchmarking | Performance and accuracy validation |

### Key Advantages

- **Performance**: Often faster inference than the training framework, depending on model and hardware
- **Portability**: Deploy across cloud, edge, and mobile targets
- **Interoperability**: Convert from TensorFlow, PyTorch, scikit-learn, and other frameworks
- **Optimization**: Hardware-specific optimizations
- **Production Ready**: Built-in verification and profiling

For a complete end-to-end stock price forecasting example using ONNX, see `onnx_forecasting.example.ipynb`.