# 🚀 ONNX Optimization Benchmark

This notebook benchmarks ONNX Runtime inference against PyTorch inference for the KubeSentiment project's sentiment model.

## 🎯 Learning Objectives

By the end of this notebook, you will:
1. Understand the benefits of ONNX for model inference.
2. Learn how to convert a PyTorch model to ONNX.
3. Benchmark the performance of the ONNX runtime against the PyTorch runtime.

## 📦 Setup and Dependencies

First, let's install the required dependencies and set up our environment.

In [None]:
# Install required packages for this notebook
# %pip installs into the environment of the running kernel
%pip install -r ../requirements.txt

### ✅ Version Check
Let's check the versions of the installed libraries to ensure our environment is reproducible.

In [None]:
# List installed packages and their versions to record the environment
%pip list
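
`pip list` prints every installed package. As a narrower alternative, the sketch below reports only the libraries this notebook imports, using the standard-library `importlib.metadata`. The helper name `package_versions` and the package list are assumptions for illustration; adjust the list to match `requirements.txt`.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Assumed list: the libraries imported in the cells of this notebook
for name, version in package_versions(["torch", "transformers", "onnxruntime", "numpy"]).items():
    print(f"{name}: {version}")
```

Distribution names can differ from import names; GPU builds of ONNX Runtime, for example, are published as `onnxruntime-gpu`.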

## 🤖 What is ONNX?

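ONNX (Open Neural Network Exchange) is an open format for representing machine learning models. Models built in frameworks such as PyTorch, TensorFlow, and Keras can be exported to ONNX and then executed by any compatible runtime. ONNX Runtime is a separate inference engine for ONNX models; it applies graph optimizations when it loads a model, which is where most of the speedup over framework-native inference comes from.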

### Converting the Model to ONNX

The first step is to convert the PyTorch model to the ONNX format. We can do this using the `torch.onnx.export()` function.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Create a dummy input for the model
dummy_input = tokenizer("This is a test sentence.", return_tensors="pt")

# Export the model to ONNX
torch.onnx.export(
    model,
    (dummy_input['input_ids'], dummy_input['attention_mask']),
    "sentiment_model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence"},
        "attention_mask": {0: "batch_size", 1: "sequence"},
        "logits": {0: "batch_size"},
    },
    opset_version=11,
)
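
Before timing anything, it is worth confirming that the exported graph reproduces the PyTorch logits. The sketch below is a dependency-free comparison over nested lists of floats. The names `max_abs_diff` and `outputs_match` and the `1e-4` tolerance are illustrative assumptions, not part of the KubeSentiment code, and the sample values are made up to show the call shape.

```python
def max_abs_diff(reference, candidate):
    """Largest absolute element-wise gap between two [batch][class] logit lists."""
    if len(reference) != len(candidate):
        raise ValueError("batch sizes differ")
    worst = 0.0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("row lengths differ")
        for ref_value, cand_value in zip(ref_row, cand_row):
            worst = max(worst, abs(ref_value - cand_value))
    return worst


def outputs_match(reference, candidate, atol=1e-4):
    """True when every logit agrees within the absolute tolerance `atol`."""
    return max_abs_diff(reference, candidate) <= atol


# Made-up logits standing in for the PyTorch and ONNX Runtime outputs
pytorch_logits = [[-2.5, 2.75]]
onnx_logits = [[-2.5, 2.75]]
print(outputs_match(pytorch_logits, onnx_logits))
```

In this notebook, the reference would be `model(**dummy_input).logits.tolist()` computed under `torch.no_grad()`, and the candidate would be the first element returned by an ONNX Runtime session's `run()` call, converted with `.tolist()`. With NumPy arrays at hand, `np.testing.assert_allclose` does the same job in one line.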

### Benchmarking the ONNX Runtime

Now that we have the model in ONNX format, we can benchmark its performance against the PyTorch runtime.

In [None]:
import time
import onnxruntime as ort
import numpy as np

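# Create an ONNX Runtime session; naming the CPU provider keeps both runtimes on the same device
ort_session = ort.InferenceSession("sentiment_model.onnx", providers=["CPUExecutionProvider"])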

# Prepare the input
input_ids = dummy_input["input_ids"].numpy()
attention_mask = dummy_input["attention_mask"].numpy()

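# Warm up both runtimes so one-time initialization cost is not timed
for _ in range(10):
    with torch.no_grad():
        model(**dummy_input)
    ort_session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})

# Benchmark the PyTorch runtime
pytorch_times = []
for _ in range(100):
    start_time = time.perf_counter()  # monotonic, high-resolution clock
    with torch.no_grad():
        model(**dummy_input)
    pytorch_times.append(time.perf_counter() - start_time)

# Benchmark the ONNX runtime
onnx_times = []
for _ in range(100):
    start_time = time.perf_counter()
    ort_session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
    onnx_times.append(time.perf_counter() - start_time)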

# Compare the results
print(f"PyTorch average inference time: {np.mean(pytorch_times) * 1000:.2f} ms")
print(f"ONNX average inference time: {np.mean(onnx_times) * 1000:.2f} ms")
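
A mean over 100 runs can be skewed by a few slow iterations. The sketch below times any zero-argument callable and reports the median and 95th percentile alongside the mean, using only the standard library. The `benchmark` helper, its parameters, and the dummy workload are assumptions for illustration.

```python
import statistics
import time


def benchmark(fn, runs=100, warmup=10):
    """Time `fn` over `runs` calls after `warmup` untimed calls; results in milliseconds."""
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000)
    return {
        "runs": len(samples_ms),
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
        "p95_ms": statistics.quantiles(samples_ms, n=20)[-1],
    }


# Dummy workload standing in for a model call
stats = benchmark(lambda: sum(range(10_000)), runs=50, warmup=5)
print(f"median {stats['median_ms']:.3f} ms, p95 {stats['p95_ms']:.3f} ms")
```

In this notebook, the ONNX path could be timed with `benchmark(lambda: ort_session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask}))`, and the PyTorch path with a small function that calls `model(**dummy_input)` under `torch.no_grad()`.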