## BYOM In-Vantage Scoring with PMML and ONNX

In this notebook, we show how to work with the Bring Your Own Model (BYOM) pattern and BYOM In-Vantage Scoring. This pattern lets you use whatever data science platform you want for model development and experimentation, including the vast majority of popular data science libraries and transformations. The only constraint is that you must be able to convert the model to one of the following open formats:

- ONNX
- PMML
- H2O (MOJO)
- H2O (Driverless AI)

ONNX is becoming more popular by the day. It is an efficient model format, originally created by Microsoft and Facebook and now maintained as an open-source project under the Linux Foundation, and its adoption by other companies and libraries as the standard open format is increasing rapidly. While the name (Open Neural Network Exchange) suggests it is primarily related to neural networks, it can also represent most scikit-learn estimators and preprocessing steps.


In this example, we will show you how you can develop in a notebook or other third-party tooling, produce a model and convert it to both `onnx` and `pmml` formats for deploying in Vantage with ModelOps.

In [1]:
import os
import pandas as pd
import getpass

from teradataml import (
    create_context, 
    remove_context,
    get_context,
    get_connection,
    DataFrame,
    retrieve_byom,
    PMMLPredict,
    configure)

In [2]:
username = input("Username")
password = getpass.getpass("password")

create_context(host="3.238.151.85", username=username, password=password, logmech="TDNEGO")

Password ··········


Engine(teradatasql://AOA_MASTER:***@3.238.151.85)
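
The cell above hard-codes the host address. As a minimal sketch, the connection settings could instead be read from environment variables so the notebook can be shared without edits. The variable names `VANTAGE_HOST` and `VANTAGE_LOGMECH` and the helper `connection_settings` are hypothetical, chosen for illustration only.

```python
import os

def connection_settings(env=None):
    """Build host/logmech keyword arguments for create_context.

    VANTAGE_HOST and VANTAGE_LOGMECH are hypothetical variable names,
    not a teradataml convention.
    """
    env = os.environ if env is None else env
    host = env.get("VANTAGE_HOST")
    if not host:
        raise KeyError("VANTAGE_HOST is not set")
    # Fall back to the logon mechanism used earlier in this notebook.
    return {"host": host, "logmech": env.get("VANTAGE_LOGMECH", "TDNEGO")}

# Demonstration with an explicit mapping instead of the real environment.
settings = connection_settings({"VANTAGE_HOST": "vantage.example.com"})
print(settings)
```

In the notebook, the result would be passed on as `create_context(username=username, password=password, **connection_settings())`.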

In [3]:
from xgboost import XGBClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline


train_pdf = DataFrame.from_query("""
SELECT 
    F.*, D.hasdiabetes 
FROM pima_patient_features F
JOIN pima_patient_diagnoses D
    ON F.patientid = D.patientid
""").to_pandas(all_rows=True)

features = ["NumTimesPrg", "Age", "PlGlcConc", "BloodP", "SkinThick", "TwoHourSerIns", "BMI", "DiPedFunc"]
target = "HasDiabetes"

# split data into X and y
X_train = train_pdf[features]
y_train = train_pdf[target]

model = Pipeline([('scaler', MinMaxScaler()),
                  ('xgb', XGBClassifier(eta=0.2, max_depth=6))])

model.fit(X_train, y_train)


Pipeline(steps=[('scaler', MinMaxScaler()),
                ('xgb', XGBClassifier(eta=0.2, max_depth=6))])

#### Convert the model to PMML

You can use either the sklearn2pmml or the nyoka Python library to convert the model to PMML. nyoka is a pure-Python package, whereas sklearn2pmml needs a Java runtime, so nyoka is preferable here.

In [4]:
from nyoka import xgboost_to_pmml

xgboost_to_pmml(pipeline=model, col_names=features, target_name=target, pmml_f_name="model.pmml")
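
Before importing the file anywhere, it can be worth checking that the exported PMML declares the fields you expect. The sketch below uses only the standard library to list the `DataField` names in a PMML document. The helper name `pmml_data_fields` and the tiny hand-written PMML fragment are illustrative assumptions, not output from nyoka.

```python
import xml.etree.ElementTree as ET

def pmml_data_fields(pmml_text):
    """Return the DataField names declared in a PMML document."""
    root = ET.fromstring(pmml_text)
    # PMML elements live in a versioned namespace, so compare local tag names only.
    return [
        el.get("name")
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "DataField"
    ]

# Hand-written fragment used only to demonstrate the helper.
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary numberOfFields="3">
    <DataField name="Age" optype="continuous" dataType="double"/>
    <DataField name="BMI" optype="continuous" dataType="double"/>
    <DataField name="HasDiabetes" optype="categorical" dataType="integer"/>
  </DataDictionary>
</PMML>"""

fields = pmml_data_fields(sample)
print(fields)
```

For the file written above, the same helper could be applied to the contents of `model.pmml`, and the result compared against `features + [target]`.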

#### Convert the model to ONNX

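We can also convert the model to ONNX format. This is a bit more involved because the client libraries for converting from scikit-learn/XGBoost to ONNX are not yet as mature: skl2onnx has no built-in converter for `XGBClassifier`, so we first register the one provided by onnxmltools.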

In [7]:
import numpy as np
from skl2onnx import to_onnx, update_registered_converter
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost

# Register the onnxmltools XGBoost converter so skl2onnx can handle the 'xgb' pipeline step
update_registered_converter(
    XGBClassifier, 'XGBoostXGBClassifier',
    calculate_linear_classifier_output_shapes, convert_xgboost,
    options={'nocl': [True, False], 'zipmap': [True, False, 'columns']})

# Convert the whole pipeline (scaler + classifier); the float32 sample is used to infer input types
model_onnx = to_onnx(model, X_train.astype(np.float32), target_opset=15)

# Serialize the ONNX graph to disk
with open("model.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())


## Import into ModelOps to Operationalize

Go to the ModelOps UI and import this as a new model version, then follow the workflow to deploy. You can also import programmatically via the ModelOps Python SDK.

You may be wondering why you can't just insert the ONNX or PMML model directly into the database table. The answer is that you can. However, with ModelOps you get full governance around the model deployment, including data drift monitoring, model monitoring, and alerting.


### View Published Models

Once deployed via ModelOps, we can view the models published to Vantage by querying the table they are published to. This information is also available via the AOA APIs.



In [13]:
pd.options.display.max_colwidth = 250
pd.read_sql("SELECT TOP 2 * FROM aoa_byom_models", get_connection())

| | model_version | model_id | model_type | project_id | deployed_at | model |
|---|---|---|---|---|---|---|
| 0 | e613d303-4fd0-41f1-ba36-39821023e5c1 | 6b973430-46f9-42b0-8f42-1b8e23aa2b10 | PMML | 23e1df4b-b630-47a1-ab80-7ad5385fcd8d | 2022-02-01 13:16:01.400 | `b'<?xml version="1.0" encoding="UTF-8"?>\n<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">\n <Header copyright="Copyright (c) 2018 Software AG" description="Default Description">\n <Application name="Nyoka" version="4.3.0"/>\n ...` |
| 1 | 540fe2b2-d0a6-4843-8d74-9c0c102773ea | 6b973430-46f9-42b0-8f42-1b8e23aa2b10 | PMML | 23e1df4b-b630-47a1-ab80-7ad5385fcd8d | 2022-02-01 15:06:41.000 | `b'<?xml version="1.0" encoding="UTF-8"?>\n<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">\n <Header copyright="Copyright (c) 2018 Software AG" description="Default Description">\n <Application name="Nyoka" version="4.3.0"/>\n ...` |


## On-Demand Scoring

In [6]:
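# Database where the BYOM scoring functions (such as PMMLPredict) are installed
configure.byom_install_location = "mldb"

model_version = "6f2c99e6-8d05-48e3-953d-bd37ace233b1"

# Select the published model row for this version
model = DataFrame.from_query(f"""
SELECT * FROM aoa_byom_models 
    WHERE model_version='{model_version}'
""")

# Score pima_test in-database and carry PatientId through to the output
preds = PMMLPredict(
    modeldata=model,
    newdata=DataFrame("pima_test"),
    accumulate=['PatientId'])

preds.result.to_pandas().head(10)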

| | PatientId | prediction | json_report |
|---|---|---|---|
| 0 | 1 | 0 | `{"probability_0":0.9850747504768098,"predicted_HasDiabetes":0,"probability_1":0.014925249523190227}` |
| 1 | 3 | 0 | `{"probability_0":0.9945255942085638,"predicted_HasDiabetes":0,"probability_1":0.005474405791436156}` |
| 2 | 8 | 1 | `{"probability_0":0.05020155260184678,"predicted_HasDiabetes":1,"probability_1":0.9497984473981532}` |
| 3 | 9 | 1 | `{"probability_0":0.28263442350828416,"predicted_HasDiabetes":1,"probability_1":0.7173655764917158}` |
| 4 | 13 | 1 | `{"probability_0":0.08177880479006427,"predicted_HasDiabetes":1,"probability_1":0.9182211952099357}` |
| 5 | 17 | 1 | `{"probability_0":0.23832592730256774,"predicted_HasDiabetes":1,"probability_1":0.7616740726974323}` |
| 6 | 19 | 1 | `{"probability_0":0.37077762620144294,"predicted_HasDiabetes":1,"probability_1":0.629222373798557}` |
| 7 | 26 | 1 | `{"probability_0":0.18449399225529128,"predicted_HasDiabetes":1,"probability_1":0.8155060077447087}` |
| 8 | 43 | 1 | `{"probability_0":0.006277949339332567,"predicted_HasDiabetes":1,"probability_1":0.9937220506606674}` |
| 9 | 44 | 0 | `{"probability_0":0.7477334297352949,"predicted_HasDiabetes":0,"probability_1":0.25226657026470506}` |
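
The class probabilities arrive as a JSON string in the `json_report` column, so they need to be parsed before they can be used as numbers. A minimal sketch using the standard library follows. The two rows are copied from the output above, and the helper name `positive_probability` is hypothetical.

```python
import json

# (PatientId, prediction, json_report) rows copied from the scoring output above.
rows = [
    (1, 0, '{"probability_0":0.9850747504768098,"predicted_HasDiabetes":0,'
           '"probability_1":0.014925249523190227}'),
    (8, 1, '{"probability_0":0.05020155260184678,"predicted_HasDiabetes":1,'
           '"probability_1":0.9497984473981532}'),
]

def positive_probability(json_report):
    """Extract the probability of the positive class from one json_report value."""
    return json.loads(json_report)["probability_1"]

# Map each patient id to its probability of the positive class.
scores = {patient_id: positive_probability(report) for patient_id, _, report in rows}
print(scores)
```

On the pandas result, the same helper could be applied column-wise, for example with `Series.map` on the `json_report` column.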


In [7]:
query = f"""
SELECT * FROM mldb.PMMLPredict (
    ON (SELECT * FROM pima_test) AS DataTable
    ON (SELECT * FROM aoa_byom_models 
            WHERE model_version='{model_version}') AS ModelTable DIMENSION
    USING
      Accumulate ('patientid')
) AS td;
"""

pd.read_sql(query, get_connection()).head(10)

| | PatientId | prediction | json_report |
|---|---|---|---|
| 0 | 1 | 0 | `{"probability_0":0.9850747504768098,"predicted_HasDiabetes":0,"probability_1":0.014925249523190227}` |
| 1 | 3 | 0 | `{"probability_0":0.9945255942085638,"predicted_HasDiabetes":0,"probability_1":0.005474405791436156}` |
| 2 | 8 | 1 | `{"probability_0":0.05020155260184678,"predicted_HasDiabetes":1,"probability_1":0.9497984473981532}` |
| 3 | 9 | 1 | `{"probability_0":0.28263442350828416,"predicted_HasDiabetes":1,"probability_1":0.7173655764917158}` |
| 4 | 13 | 1 | `{"probability_0":0.08177880479006427,"predicted_HasDiabetes":1,"probability_1":0.9182211952099357}` |
| 5 | 17 | 1 | `{"probability_0":0.23832592730256774,"predicted_HasDiabetes":1,"probability_1":0.7616740726974323}` |
| 6 | 19 | 1 | `{"probability_0":0.37077762620144294,"predicted_HasDiabetes":1,"probability_1":0.629222373798557}` |
| 7 | 26 | 1 | `{"probability_0":0.18449399225529128,"predicted_HasDiabetes":1,"probability_1":0.8155060077447087}` |
| 8 | 43 | 1 | `{"probability_0":0.006277949339332567,"predicted_HasDiabetes":1,"probability_1":0.9937220506606674}` |
| 9 | 44 | 0 | `{"probability_0":0.7477334297352949,"predicted_HasDiabetes":0,"probability_1":0.25226657026470506}` |
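
Both scoring cells interpolate `model_version` into SQL text with an f-string. If the value ever comes from outside the notebook, one cautious option is to check that it really is a UUID before building the query. The sketch below does that with the standard library. The helper name `pmml_predict_query` is hypothetical, and the SQL text is the same as in the cell above.

```python
import uuid

def pmml_predict_query(model_version):
    """Build the PMMLPredict query after checking that model_version is a canonical UUID."""
    # uuid.UUID raises ValueError for strings that are not valid UUIDs.
    canonical = str(uuid.UUID(model_version))
    # Reject alternative spellings (braces, missing hyphens) so only the plain form is used.
    if canonical != model_version.lower():
        raise ValueError(f"not a canonical UUID: {model_version!r}")
    return f"""
SELECT * FROM mldb.PMMLPredict (
    ON (SELECT * FROM pima_test) AS DataTable
    ON (SELECT * FROM aoa_byom_models
            WHERE model_version='{canonical}') AS ModelTable DIMENSION
    USING
      Accumulate ('patientid')
) AS td;
"""

query = pmml_predict_query("6f2c99e6-8d05-48e3-953d-bd37ace233b1")
print(query)
```

The returned string can be passed to `pd.read_sql(query, get_connection())` exactly as before. A malformed value fails early with a `ValueError` instead of reaching the database.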
