### ONNX - Open Neural Network Exchange

ONNX is an open format for saving machine learning models with interoperability and performance in mind. A model built in one framework can be exported to ONNX and run on any machine that has an ONNX runtime. Models built with PyTorch, scikit-learn, or TensorFlow can therefore be shared and run across environments. This notebook shows how to convert ML models to ONNX and compares inference with onnxruntime against the native Python scikit-learn runtime.

In [None]:
!pip install skl2onnx onnxruntime pandas numpy joblib


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m



### Standard scikit-learn models example using SVC

In [2]:
import skl2onnx
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx import convert_sklearn
import joblib # The model was saved using joblib so we need to use joblib to load it
import onnxruntime as rt

skl2onnx.get_latest_tested_opset_version()

21

### Load the trained model and save it in ONNX format

The model is a pipeline consisting of a StandardScaler and an SVC classifier.

In [3]:

with open("models/SVC_model.pkl", "rb") as f:
    model = joblib.load(f)


# The model takes 18 numeric features per row. Callers pass a dictionary with
# 18 fields, which is converted to a DataFrame before being passed to the model.
initial_type = [("input", FloatTensorType([None, 18]))]

# zipmap=False disables the dictionary output, so the output is a plain list of
# predictions instead. This is important for the Rust onnxruntime, which needs
# to be able to parse the model.
# raw_types=True is meant to ensure that the model weights keep their raw type
# and are not coerced into onnxruntime types.
model_onnx = convert_sklearn(
    model,
    "pipeline_svc",
    initial_type,
    target_opset=14,
    options={"zipmap": False, "raw_types": True},
)

# And save.
with open("onnx_models/pipeline_svc.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())

with open("../inference_server/onnx_models/pipeline_svc.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())

print("Model converted to ONNX.")
model_onnx

Model converted to ONNX.


ir_version: 7
producer_name: "skl2onnx"
producer_version: "1.18.0"
domain: "ai.onnx"
model_version: 0
doc_string: ""
graph {
  node {
    input: "input"
    output: "variable"
    name: "Scaler"
    op_type: "Scaler"
    attribute {
      name: "offset"
      floats: 41.910767
      floats: 0.20064285
      floats: 0.20182857
      floats: 0.19885714
      floats: 0.19875714
      floats: 0.19991429
      floats: 0.07495714
      floats: 0.0392
      floats: 27.327028
      floats: 5.528383
      floats: 138.13716
      floats: 0.41451427
      floats: 0.00015714286
      floats: 0.0926
      floats: 0.040114287
      floats: 0.09314286
      floats: 0.3501
      floats: 0.06472857
      type: FLOATS
    }
    attribute {
      name: "scale"
      floats: 0.04439394
      floats: 2.4969952
      floats: 2.4914982
      floats: 2.5053847
      floats: 2.5058584
      floats: 2.500402
      floats: 3.797629
      floats: 5.1527667
      floats: 0.15104596
      floats: 0.9331418
      fl
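
The `Scaler` node at the top of the graph is how the pipeline's StandardScaler appears after conversion. As a minimal sketch, assuming the ONNX-ML `Scaler` semantics of `(x - offset) * scale`, with the per-feature mean stored as `offset` and the reciprocal of the standard deviation stored as `scale`, the node can be reproduced in NumPy. The two features used here (age at position 0, bmi at position 8) are picked for illustration; their constants are copied from the printed graph.

```python
import numpy as np

def onnx_scaler(x, offset, scale):
    """Mimic the ONNX-ML Scaler operator: subtract offset, then multiply by scale."""
    return (x - offset) * scale

# Constants copied from the printed graph (positions 0 and 8 of each attribute).
offset = np.array([41.910767, 27.327028], dtype=np.float32)
scale = np.array([0.04439394, 0.15104596], dtype=np.float32)

# Age and bmi values taken from the test_false record in this notebook.
x = np.array([[32.0, 27.32]], dtype=np.float32)
scaled = onnx_scaler(x, offset, scale)

# Equivalent StandardScaler form: (x - mean) / std, with std = 1 / scale.
std = 1.0 / scale
reference = (x - offset) / std

print(scaled)
```

Both forms give the same standardized values, which is why the converted graph needs no division at inference time.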

### Run the model in the ONNX runtime

In [5]:
import pandas as pd
import numpy as np

test_false =  {
            "age": 32.0,
            "race_african_american": 0,
            "race_asian": 0,
            "race_caucasian": 0,
            "race_hispanic": 0,
            "race_other": 1,
            "hypertension": 0,
            "heart_disease": 0,
            "bmi": 27.32,
            "hbA1c_level": 5.0,
            "blood_glucose_level": 100,
            "gender_Male": False,
            "gender_Other": False,
            "smoking_history_current": False,
            "smoking_history_ever": False,
            "smoking_history_former": False,
            "smoking_history_never": True,
            "smoking_history_not_current": False
    }

test_true =  {
      "age": 56.0,
      "race_african_american": 0,
      "race_asian": 1,
      "race_caucasian": 0,
      "race_hispanic": 0,
      "race_other": 0,
      "hypertension": 0,
      "heart_disease": 0,
      "bmi": 27.32,
      "hbA1c_level": 7.5,
      "blood_glucose_level": 155,
      "gender_Male": True,
      "gender_Other": False,
      "smoking_history_current": False,
      "smoking_history_ever": False,
      "smoking_history_former": False,
      "smoking_history_never": False,
      "smoking_history_not_current": True
}


session = rt.InferenceSession("onnx_models/pipeline_svc.onnx", providers=["CPUExecutionProvider"])

# Get model input and output details
input_shape = session.get_inputs()[0].shape  # Shape of the first input tensor
output_shape = session.get_outputs()[0].shape  # Shape of the first output tensor

print(input_shape)
print(output_shape)


[None, 18]
[None]


In [57]:
# Check input details
for input_tensor in session.get_inputs():
    print(f"Input Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

Input Name: input, Data Type: tensor(float), Shape: [None, 18]


In [8]:
## Let's test the model with some test data; a negative result is expected
# Create test input as a Pandas DataFrame
df = pd.DataFrame([test_false])

# Convert the DataFrame to a float32 NumPy array to match the model's tensor(float) input
X_test = df.to_numpy(dtype=np.float32)
X_test

array([[ 32.  ,   0.  ,   0.  ,   0.  ,   0.  ,   1.  ,   0.  ,   0.  ,
         27.32,   5.  , 100.  ,   0.  ,   0.  ,   0.  ,   0.  ,   0.  ,
          1.  ,   0.  ]], dtype=float32)
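
The ONNX input is a plain float tensor, so features are matched by position rather than by name. The sketch below builds the input row from a dictionary using an explicit column order, which guards against a record whose keys arrive in a different order. `FEATURE_ORDER` and `to_model_input` are illustrative names that are not part of the original notebook, and the order is assumed to be the key order of `test_false` above.

```python
import numpy as np

# Assumed training column order, copied from the keys of test_false.
FEATURE_ORDER = [
    "age", "race_african_american", "race_asian", "race_caucasian",
    "race_hispanic", "race_other", "hypertension", "heart_disease",
    "bmi", "hbA1c_level", "blood_glucose_level", "gender_Male",
    "gender_Other", "smoking_history_current", "smoking_history_ever",
    "smoking_history_former", "smoking_history_never",
    "smoking_history_not_current",
]

def to_model_input(record, feature_order=FEATURE_ORDER):
    """Return a (1, n_features) float32 array with columns in feature_order."""
    missing = [name for name in feature_order if name not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # float() also turns True/False into 1.0/0.0.
    row = [float(record[name]) for name in feature_order]
    return np.array([row], dtype=np.float32)

# A record whose keys are deliberately in reverse order.
record = dict.fromkeys(reversed(FEATURE_ORDER), 0)
record.update({
    "age": 32.0, "race_other": 1, "bmi": 27.32, "hbA1c_level": 5.0,
    "blood_glucose_level": 100, "smoking_history_never": True,
})

x = to_model_input(record)
print(x.shape, x.dtype)
```

The resulting array can be passed to the session in the same way as `X_test`, for example `session.run(None, {"input": x})`.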

### Test predictions with onnxruntime on 100k rows

The ONNX runtime runs inference on 100k rows in about 25 seconds.

In [6]:
### Load csv
import pandas as pd
df = pd.read_csv('assets/processed_data.csv')
df.drop(columns=['diabetes'], inplace=True)
X_test = df.to_numpy(dtype=np.float32)
session = rt.InferenceSession("onnx_models/pipeline_svc.onnx", providers=["CPUExecutionProvider"])
pred_onx = session.run(None, {"input": X_test})
print("predict", pred_onx[0])
print("predict_proba", pred_onx[1][:1])



predict [0 0 0 ... 0 0 0]
predict_proba [[0.96662086 0.03337909]]
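
Because the model was converted with `zipmap=False`, the second output is a plain 2-D array with one column per class. With the converter's default ZipMap output, probabilities would instead come back as one dictionary per row, keyed by class label. The sketch below shows both shapes side by side, using the probability row printed above; `zipmap_to_array` is a hypothetical helper, not an skl2onnx function.

```python
import numpy as np

def zipmap_to_array(rows, classes):
    """Convert ZipMap-style output (one dict per row) to an (n, n_classes) array."""
    return np.array([[row[c] for c in classes] for row in rows], dtype=np.float32)

# Shape returned with zipmap=False: one row per sample, one column per class.
proba_array = np.array([[0.96662086, 0.03337909]], dtype=np.float32)

# Shape returned with the default ZipMap: a list of {class_label: probability}.
proba_zipmap = [{0: 0.96662086, 1: 0.03337909}]

converted = zipmap_to_array(proba_zipmap, classes=[0, 1])

# Most probable class per row.
most_probable = proba_array.argmax(axis=1)

print(converted.shape, most_probable)
```

The array form is easier to consume from non-Python runtimes, which is the reason given earlier for disabling ZipMap.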


### Test predictions using the joblib-pickled model on 100k rows

With the joblib-pickled model, inference took 55 to 60 seconds.

In [9]:
### Load csv
import pandas as pd
df = pd.read_csv('assets/processed_data.csv')
df.drop(columns=['diabetes'], inplace=True)
X_test = df.to_numpy(dtype=np.float32)

model = joblib.load('models/SVC_model.pkl')
pred = model.predict(X_test)

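# model.predict returns class labels only, so pred[0] and pred[1] are the
# predicted labels for the first two rows, not a (label, probability) pair.
print("first prediction", pred[0])
print("second prediction", pred[1])
pred



first prediction 0
second prediction 0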


array([0, 0, 0, ..., 0, 0, 0])
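
A speed comparison is only meaningful if both runtimes return the same predictions. The following is a minimal sketch of that check, assuming the ONNX labels (`pred_onx[0]`) and the scikit-learn labels (`pred`) are kept in separate variables. The small arrays are stand-ins for illustration, not results from this notebook, and `agreement_rate` is a hypothetical helper.

```python
import numpy as np

def agreement_rate(labels_a, labels_b):
    """Fraction of rows on which two label arrays agree."""
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.mean(a == b))

# Stand-in label arrays; in the notebook these would be pred_onx[0] and pred.
onnx_labels = np.array([0, 0, 1, 0], dtype=np.int64)
sklearn_labels = np.array([0, 0, 1, 1], dtype=np.int64)

rate = agreement_rate(onnx_labels, sklearn_labels)
print(f"agreement: {rate:.2%}")
```

Small disagreements can come from float32 rounding near the decision boundary, so a rate slightly below 100% is not necessarily a conversion bug.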

## Discussion

This notebook compares inference speed between a model in ONNX format and the same model as a traditional pickle. From the previous cells, the ONNX model running in the ONNX runtime outperforms the .pkl model: about 25 seconds versus 55 to 60 seconds on 100k rows, which is roughly 2 to 2.5 times faster on a 16 GB, 4-CPU machine. A likely reason is that the ONNX runtime is multithreaded by default and uses all CPU cores during inference, since its native code is not constrained by the Python GIL.
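
The timings quoted in this notebook can be reproduced with a small helper around `time.perf_counter`. This is a sketch, not the method used to obtain the figures above; `time_call` and `speedup` are hypothetical names, and the toy workload stands in for `session.run` or `model.predict`.

```python
import time

def time_call(fn, *args, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best

def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds

# Toy workload standing in for an inference call.
elapsed = time_call(sum, range(100_000))
print(f"toy workload: {elapsed:.6f} s")

# Figures reported in this notebook: ~25 s (ONNX) vs 55 to 60 s (joblib).
low, high = speedup(55, 25), speedup(60, 25)
print(f"speedup: {low:.1f}x to {high:.1f}x")
```

In the notebook itself the calls would look like `time_call(session.run, None, {"input": X_test})` and `time_call(model.predict, X_test)`.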


The gap is expected to widen on a machine with more compute resources (say, 20 cores). ONNX models can also be run on a GPU (CUDA) for accelerated compute.
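
Selecting the GPU is a matter of passing a different `providers` list to `InferenceSession`. The sketch below picks providers from whatever the installed build reports and falls back to the CPU; `pick_providers` is a hypothetical helper, and the commented usage assumes an onnxruntime build with CUDA support.

```python
def pick_providers(available,
                   preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(f"none of {preferred} found in {available}")
    return chosen

# Example provider lists as a CPU-only and a GPU-enabled build might report them.
cpu_only = pick_providers(["CPUExecutionProvider"])
with_gpu = pick_providers(
    ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
)
print(cpu_only, with_gpu)

# Usage with onnxruntime (not executed here):
# import onnxruntime as rt
# providers = pick_providers(rt.get_available_providers())
# session = rt.InferenceSession("onnx_models/pipeline_svc.onnx", providers=providers)
```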

In the Rust crate in this project, similar results were seen using the same support vector classifier model with the Rust onnxruntime. There is no significant difference between running the ONNX model from Rust and from the Python onnxruntime.



## Conclusion

The ONNX runtime is roughly 2 to 2.5 times faster than the Python scikit-learn runtime (about 25 seconds versus 55 to 60 seconds on 100k rows) on an Ubuntu 22.04 machine with 16 GB RAM and 4 CPUs. There is no significant difference between running the ONNX model in the Rust ONNX runtime and in the Python ONNX runtime.

### Reference

- https://onnx.ai/sklearn-onnx/introduction.html
- https://github.com/microsoft/onnxruntime