### Convert models to ONNX
This notebook shows how to convert trained `sklearn` and `lightgbm` models to the `.onnx` format. <br>
Note that different models need different converter packages, for instance:
- `sklearn` models -> `skl2onnx` -> `onnx`
- `lightgbm` models -> `onnxmltools` -> `onnx`
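The mapping above can be expressed as a small dispatch helper. This is a hypothetical sketch (`pick_converter` is not part of any library): it only looks at the top-level package that defines the model's class and returns the name of the converter package listed above, without importing either converter.

```python
def pick_converter(model) -> str:
    """Return the converter package suggested for `model` (hypothetical helper).

    The decision is based only on the top-level package of the model's class,
    e.g. RandomForestRegressor lives in "sklearn.ensemble._forest" and
    LGBMRegressor in "lightgbm.sklearn".
    """
    top_level = type(model).__module__.split(".")[0]
    # mapping taken from the list above: sklearn -> skl2onnx, lightgbm -> onnxmltools
    converters = {
        "sklearn": "skl2onnx",
        "lightgbm": "onnxmltools",
    }
    try:
        return converters[top_level]
    except KeyError:
        raise ValueError(
            f"no converter registered for models from {top_level!r}"
        ) from None
```

Dispatching on the class's module keeps the helper free of heavy imports; any other model family has to be added to the mapping explicitly.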

First, let's load some test data (ERA5 monthly) and train a random forest model and a LightGBM model using `pycaret`.

In [2]:
from excited_workflow.source_datasets import datasets

ds_era5 = datasets["era5_monthly"].load(freq="monthly")

In [3]:
# for demonstration purposes, select only a small subset of the data
region_na = {
    "time": slice("2011-01", "2020-12"),
    "latitude": slice(35, 45),
    "longitude": slice(-100, -80),
}

ds_na = ds_era5.sel(region_na)
ds_na = ds_na.compute()

In [4]:
# convert to pandas dataframe
df_train = ds_na.to_dataframe().dropna().reset_index()
df_train.head(3)

| | longitude | latitude | time | d2m | mslhf | msshf | sp | ssr | str | t2m | tp | tvh | tvl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -100.0 | 35.0 | 2011-01-01 | 265.97821 | -9.946487 | -23.942078 | 94947.640625 | 9983198.0 | -6937689.0 | 275.272034 | 0.000137 | 0.0 | 2.000061 |
| 1 | -100.0 | 35.0 | 2011-02-01 | 267.362335 | -16.657578 | -33.071938 | 94850.507812 | 11984707.0 | -7182347.0 | 277.808563 | 0.000757 | 0.0 | 2.000061 |
| 2 | -100.0 | 35.0 | 2011-03-01 | 273.300537 | -31.519485 | -54.177166 | 94689.757812 | 16281364.0 | -8147597.5 | 285.099396 | 0.000285 | 0.0 | 2.000061 |


In [5]:
import pycaret.regression

# input features and target variable
X_keys = ["d2m", "mslhf", "msshf", "ssr", "str"]
y_key = "t2m"

df_pycaret = df_train[X_keys + [y_key]]
# keep every 10th row to keep the demo fast
df_reduced = df_pycaret[::10]

# set up a regression experiment and keep both trained models
pycs = pycaret.regression.setup(df_reduced, target=y_key)
best = pycs.compare_models(include=["rf", "lightgbm"], n_select=2, round=2)

| | Description | Value |
|---|---|---|
| 0 | Session id | 7581 |
| 1 | Target | t2m |
| 2 | Target type | Regression |
| 3 | Original data shape | (39852, 6) |
| 4 | Transformed data shape | (39852, 6) |
| 5 | Transformed train set shape | (27896, 6) |
| 6 | Transformed test set shape | (11956, 6) |
| 7 | Numeric features | 5 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE | TT (Sec) |
|---|---|---|---|---|---|---|---|---|
| rf | Random Forest Regressor | 0.25 | 0.14 | 0.37 | 1.0 | 0.0 | 0.0 | 3.57 |
| lightgbm | Light Gradient Boosting Machine | 0.37 | 0.25 | 0.5 | 1.0 | 0.0 | 0.0 | 0.16 |


PyCaret supports ONNX, and models trained with PyCaret can be converted to ONNX by following this tutorial:
https://pycaret.gitbook.io/docs/learn-pycaret/official-blog/deploy-pycaret-models-on-edge-with-onnx-runtime

With `skl2onnx`, we can convert our random forest regressor (an `sklearn` model) to an ONNX model.

More details can be found in the API summary:
https://onnx.ai/sklearn-onnx/api_summary.html#skl2onnx.to_onnx

In [5]:
rfr = best[0]
rfr

In [6]:
# convert the model trained with pycaret to ONNX
from skl2onnx import to_onnx

# one sample row is enough for to_onnx to infer the input dtype and number of features
X_sample = pycs.get_config('X_train')[:1]
model_onnx = to_onnx(rfr, X_sample.to_numpy())

In [None]:
# save model
with open("./rfr.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())

In [7]:
# run inference with the ONNX model
from onnxruntime import InferenceSession

sess = InferenceSession(model_onnx.SerializeToString(), providers=["CPUExecutionProvider"])
X_test = pycs.get_config('X_test').to_numpy()
# 'X' is the input name that to_onnx assigns; run() returns a list of outputs
predictions_onnx = sess.run(None, {'X': X_test})[0]

Verify that the ONNX model reproduces the predictions of the original random forest model.

In [8]:
import numpy as np

predictions_best = rfr.predict(X_test)
# the ONNX output has shape (n_samples, 1), so compare its first column
np.allclose(predictions_onnx[:, 0], predictions_best, equal_nan=True)

True

For the LightGBM model, we need `onnxmltools` to convert it to ONNX. <br>

Note that `onnxmltools` is only compatible with `lightgbm<=3.3.5`.
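Given the version constraint in the note above, it can help to check the installed `lightgbm` before converting. A minimal sketch, assuming the `3.3.5` upper bound from the note still applies (it may change with newer `onnxmltools` releases); `parse_release` and `lightgbm_supported` are hypothetical helper names, and only `importlib.metadata` from the standard library is used.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# upper bound taken from the note above (assumption: may change in newer onnxmltools)
MAX_SUPPORTED_LIGHTGBM = (3, 3, 5)


def parse_release(v: str) -> tuple:
    """Turn '3.3.5' into (3, 3, 5) and '4.1.0.post1' into (4, 1, 0)."""
    parts = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, e.g. '5rc1' -> 5
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '3.3'
    return tuple(parts)


def lightgbm_supported(installed: str) -> bool:
    """True if `installed` is at or below the noted upper bound."""
    return parse_release(installed) <= MAX_SUPPORTED_LIGHTGBM


try:
    installed = version("lightgbm")
except PackageNotFoundError:
    print("lightgbm is not installed")
else:
    status = "within" if lightgbm_supported(installed) else "above"
    print(f"lightgbm {installed} is {status} the noted upper bound (3.3.5)")
```

The parser only compares the numeric release part; `packaging.version.Version` would be a more complete alternative if that package is available.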

In [9]:
lightgbm = best[1]
lightgbm

Note that the user needs to tell the converter the type of the input. For instance,
`initial_types=[('X', FloatTensorType([None, X_test.shape[1]]))]`
indicates that the input name is `X`, the input type is `FloatTensorType` (float32), the batch size is unknown (`None`), and the number of features is `X_test.shape[1]`.
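Because `FloatTensorType` declares a float32 input, `onnxruntime` rejects float64 arrays for that input, and only the feature dimension is fixed since the batch dimension is `None`. A minimal numpy-only sketch of preparing inputs accordingly; `to_float_tensor_input` is a hypothetical helper name, not part of `onnxmltools` or `onnxruntime`.

```python
import numpy as np


def to_float_tensor_input(X, n_features=None):
    """Return X as a 2-D, C-contiguous float32 array (hypothetical helper).

    Accepts numpy arrays or pandas DataFrames. Any number of rows is allowed,
    matching the `None` batch dimension; the column count is checked if given.
    """
    arr = np.ascontiguousarray(np.asarray(X), dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {arr.shape}")
    if n_features is not None and arr.shape[1] != n_features:
        raise ValueError(f"expected {n_features} features, got {arr.shape[1]}")
    return arr
```

In this notebook it could be used as `sess.run(None, {'X': to_float_tensor_input(X_test, X_test.shape[1])})`, which makes the dtype expectation explicit even if the feature columns were stored as float64.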

In [10]:
import onnxmltools
from skl2onnx.common.data_types import FloatTensorType

# Convert the LightGBM model into ONNX.
# The initial_types argument is a Python list; each element is a tuple of an
# input name and a data type (the types are defined in onnxconverter_common/data_types.py).
lightgbm_onnx = onnxmltools.convert_lightgbm(
    lightgbm,
    initial_types=[('X', FloatTensorType([None, X_test.shape[1]]))],
)

The maximum opset needed by this model is only 8.


In [None]:
# save model
with open("./lightgbm.onnx", "wb") as f:
    f.write(lightgbm_onnx.SerializeToString())

In [11]:
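# run inference with the converted LightGBM model; 'X' matches initial_types above
sess = InferenceSession(lightgbm_onnx.SerializeToString(), providers=["CPUExecutionProvider"])
predictions_onnx = sess.run(None, {'X': X_test})[0]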

Verify that the ONNX model reproduces the LightGBM predictions within an absolute tolerance of `1e-2`.

In [16]:
predictions_lightgbm = lightgbm.predict(X_test)
# a looser absolute tolerance than for the random forest is used here
np.allclose(predictions_onnx[:, 0], predictions_lightgbm, atol=1e-2, equal_nan=True)

True
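Both checks above only return `True` or `False`. When a check fails, or when choosing a tolerance such as `atol=1e-2`, it helps to also see the largest absolute difference. Small differences are most likely due to float32 arithmetic in the ONNX graph (for values around 285 K, adjacent float32 numbers are about 3e-5 apart), but that is an assumption, not something measured here. A minimal numpy-only sketch; `compare_predictions` is a hypothetical helper name.

```python
import numpy as np


def compare_predictions(onnx_output, reference, atol=1e-8, rtol=1e-5):
    """Compare ONNX output (often shaped (n, 1)) with 1-D reference predictions.

    Returns (all_close, max_abs_diff) so a failing check also shows how far off it is.
    """
    onnx_flat = np.asarray(onnx_output, dtype=np.float64).ravel()
    ref_flat = np.asarray(reference, dtype=np.float64).ravel()
    if onnx_flat.shape != ref_flat.shape:
        raise ValueError(f"shape mismatch: {onnx_flat.shape} vs {ref_flat.shape}")
    ok = np.allclose(onnx_flat, ref_flat, rtol=rtol, atol=atol, equal_nan=True)
    diff = np.abs(onnx_flat - ref_flat)
    max_diff = float(np.nanmax(diff)) if diff.size else 0.0
    return bool(ok), max_diff
```

For example, `compare_predictions(predictions_onnx, predictions_lightgbm, atol=1e-2)` reports the worst-case deviation alongside the pass/fail result. To check the file on disk rather than the in-memory model, the session can also be created from the path, e.g. `InferenceSession("./lightgbm.onnx", providers=["CPUExecutionProvider"])`.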