This tutorial and the assets can be downloaded as part of the [Wallaroo Tutorials repository](https://github.com/WallarooLabs/Wallaroo_Tutorials/tree/main/model_conversion/xgboost-autoconversion).

## Introduction

The following tutorial is a brief example of how to convert an [XGBoost](https://xgboost.readthedocs.io/en/stable/index.html) Regression ML model with the `convert_model` method and upload it into your Wallaroo instance.

This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service. It demonstrates how to:

* Convert an `XGBoost` Regression ML model and upload it into the Wallaroo engine.
* Run a sample inference on the converted model in a Wallaroo instance.

This tutorial provides the following:

* `xgb_reg.pickle`: A pretrained `XGBoost` Regression model with 25 columns.
* `xgb_regression_eval.json`: Test data to perform a sample inference.

## Conversion Steps

The Wallaroo autoconverter method `convert_model(path, source_type, conversion_arguments)` takes three parameters.  For `XGBoost` conversions these are:

* `path` (STRING):  The path to the ML model file.
* `source_type` (ModelConversionSource): The type of ML model to be converted.  As of this time Wallaroo auto-conversion supports the following source types and their associated `ModelConversionSource`:
  * **sklearn**: `ModelConversionSource.SKLEARN`
  * **xgboost**: `ModelConversionSource.XGBOOST`
  * **keras**: `ModelConversionSource.KERAS`
* `conversion_arguments`:  The arguments for the conversion based on the type of model being converted.  These are:
  * `wallaroo.ModelConversion.ConvertXGBoostArgs`: Used for `XGBoost` models and takes the following parameters:
    * `name`: The name of the model being converted.
    * `comment`: Any comments for the model.
    * `number_of_columns`: The number of columns the model was trained on.
    * `input_type`: A [TensorFlow DType](https://www.tensorflow.org/api_docs/python/tf/dtypes/DType) called in the format `ModelConversionInputType.{type}`, where `{type}` is `Float32`, `Double`, etc., depending on the model.

### Import Libraries

The first step is to import the libraries needed.

In [12]:
import wallaroo

from wallaroo.ModelConversion import ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError

### Connect to Wallaroo

Connect to your Wallaroo instance and store the connection into the variable `wl`.

In [13]:
# Login through local Wallaroo instance

#wl = wallaroo.Client()

# SSO login through keycloak

wallarooPrefix = "YOUR PREFIX"
wallarooSuffix = "YOUR SUFFIX"

wallarooPrefix = "sparkly-apple-3026"
wallarooSuffix = "wallaroo.community"

# wallarooPrefix = "squishy-wallaroo-6187"
# wallarooSuffix = "wallaroo.dev"

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
                    auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
                    auth_type="sso")

ERROR:root:Keycloak token refresh got error: 400 - {"error":"invalid_grant","error_description":"Invalid refresh token"}


Please log into the following URL in a web browser:

	https://sparkly-apple-3026.keycloak.wallaroo.community/auth/realms/master/device?user_code=TCUP-ETBW

Login successful!
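
The two endpoint URLs in the cell above follow a single pattern, so they can be derived from the prefix and suffix in one place. The following is a minimal sketch that assumes this pattern holds for your instance; `build_endpoints` is a hypothetical helper for illustration and is not part of the Wallaroo SDK.

```python
def build_endpoints(prefix: str, suffix: str) -> dict:
    """Return the API and Keycloak URLs for a Wallaroo instance.

    Hypothetical helper: it only mirrors the URL pattern used in the
    connection cell and does not contact the instance.
    """
    if not prefix or not suffix:
        raise ValueError("both prefix and suffix are required")
    return {
        "api_endpoint": f"https://{prefix}.api.{suffix}",
        "auth_endpoint": f"https://{prefix}.keycloak.{suffix}",
    }


endpoints = build_endpoints("sparkly-apple-3026", "wallaroo.community")
print(endpoints["api_endpoint"])
print(endpoints["auth_endpoint"])
```

The resulting dictionary could then be passed on as `wallaroo.Client(**endpoints, auth_type="sso")`, which is equivalent to spelling out both keyword arguments by hand.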


### Arrow Support

As of the 2023.1 release, Wallaroo provides support for dataframe and Arrow for inference inputs.  This tutorial allows users to adjust their experience based on whether they have enabled Arrow support in their Wallaroo instance or not.

If Arrow support has been enabled, set `arrowEnabled=True`. If it is disabled or you're not sure, set `arrowEnabled=False`.

The examples below are shown in an Arrow-enabled environment.

In [21]:
import os
arrowEnabled=True
os.environ["ARROW_ENABLED"]=f"{arrowEnabled}"
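
Environment variables are always strings, so the cell above stores the text `True` or `False` rather than a boolean. The sketch below shows one way to read the flag back safely. `arrow_enabled_from_env` is a hypothetical helper name, and treating a missing or unrecognized value as disabled is an assumption chosen to match the "not sure" advice above.

```python
import os


def arrow_enabled_from_env(default: bool = False) -> bool:
    """Read ARROW_ENABLED back as a boolean (hypothetical helper)."""
    raw = os.environ.get("ARROW_ENABLED")
    if raw is None:
        return default
    # Only the literal text "true" (any casing) counts as enabled.
    return raw.strip().lower() == "true"


os.environ["ARROW_ENABLED"] = f"{True}"   # stored as the string "True"
print(arrow_enabled_from_env())

os.environ["ARROW_ENABLED"] = f"{False}"  # stored as the string "False"
print(arrow_enabled_from_env())
```

Comparing the text explicitly avoids a common pitfall: `bool("False")` evaluates to `True` in Python, because any non-empty string is truthy.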

### Configuration and Methods

The following sets the workspace, pipeline, and model names, along with the model file name used when uploading and converting the `XGBoost` model.

The function `get_workspace(name)` returns the workspace with the requested name, creating it if it does not exist.  The function `get_pipeline(name)` returns the pipeline with the requested name, creating it in the current workspace if it does not exist.

In [15]:
workspace_name = 'xgboost-regression-autoconvert-workspace'
pipeline_name = 'xgboost-regression-autoconvert-pipeline'
model_name = 'xgb-regression-model'
model_file_name = 'xgb_reg.pickle'

def get_workspace(name):
    # Return the workspace called `name`, creating it if it does not exist.
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Return the pipeline called `name`, building it if it does not exist.
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
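
Both helper functions above follow the same get-or-create pattern. The sketch below isolates that pattern so it can be tried without a Wallaroo connection. `FakeWorkspace`, `get_or_create`, and `create_workspace` are hypothetical stand-ins for illustration only; the single assumption carried over from the tutorial code is that each object exposes a `name()` method.

```python
class FakeWorkspace:
    """In-memory stand-in for a workspace object (illustration only)."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


def get_or_create(existing, name, create):
    """Return the first item whose name() matches, else call create(name)."""
    for item in existing:
        if item.name() == name:
            return item
    return create(name)


workspaces = [FakeWorkspace("alpha"), FakeWorkspace("beta")]


def create_workspace(name):
    # Stand-in for a call that creates the workspace on the server.
    ws = FakeWorkspace(name)
    workspaces.append(ws)
    return ws


found = get_or_create(workspaces, "beta", create_workspace)     # already exists
created = get_or_create(workspaces, "gamma", create_workspace)  # gets created
print(found.name(), created.name(), len(workspaces))
```

Against a live instance, the same function would be called as `get_or_create(wl.list_workspaces(), workspace_name, wl.create_workspace)`.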

### Set the Workspace and Pipeline

Set or create the workspace and pipeline based on the names configured earlier.

In [16]:
workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline

0,1
name,xgboost-regression-autoconvert-pipeline
created,2023-02-22 17:36:31.143795+00:00
last_updated,2023-02-22 17:36:31.143795+00:00
deployed,(none)
tags,
versions,1220c119-45e9-4ff4-bdfd-8ff2f95486d5
steps,


### Set the Model Autoconvert Parameters

Set the parameters for converting the `xgb-regression-model`.

In [17]:
#the number of columns
NF = 25

model_conversion_args = ConvertXGBoostArgs(
    name=model_name,
    comment="xgboost regression model test",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Float32
)
model_conversion_type = ModelConversionSource.XGBOOST
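
Because `number_of_columns` describes the width the model was trained on, it can be worth checking that the rows you plan to send for inference have the same width. The following is a minimal sketch under stated assumptions: `check_row_width` is a hypothetical helper, and the rows are made-up in-memory values, since the layout of the tutorial's JSON files is not assumed here.

```python
def check_row_width(rows, expected_columns):
    """Return the indices of rows whose length differs from expected_columns."""
    return [i for i, row in enumerate(rows) if len(row) != expected_columns]


NF = 25  # same value as number_of_columns above

# Made-up example rows: the first has the right width, the second does not.
rows = [
    [0.0] * NF,
    [0.0] * (NF - 1),
]

bad_rows = check_row_width(rows, NF)
print(bad_rows)
```

An empty list means every row matches the expected width; any index returned points to a row to inspect before running the inference.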

### Upload and Convert the Model

Now we can upload and convert the model.  Once finished, it will be stored as `{unique-file-id}-converted.onnx`.

In [18]:
# convert and upload
model_wl = wl.convert_model(model_file_name, model_conversion_type, model_conversion_args)

## Test Inference

With the model uploaded and converted, we can run a sample inference.

### Deploy the Pipeline

Add the uploaded and converted `model_wl` as a step in the pipeline, then deploy it.

In [19]:
pipeline.add_model_step(model_wl).deploy()

0,1
name,xgboost-regression-autoconvert-pipeline
created,2023-02-22 17:36:31.143795+00:00
last_updated,2023-02-22 17:36:36.097137+00:00
deployed,True
tags,
versions,"45081f0c-1991-4ce0-907c-6135ca328084, 1220c119-45e9-4ff4-bdfd-8ff2f95486d5"
steps,xgb-regression-model


In [24]:
pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.48.0.117',
   'name': 'engine-58bffff6f9-j55cb',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'xgboost-regression-autoconvert-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'xgb-regression-model',
      'version': '0c3e4ea4-684b-4ce1-843e-fce88acb3751',
      'sha': '7414d26cb5495269bc54bcfeebd269d7c74412cbfca07562fc7cb184c55b6f8e',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.48.0.116',
   'name': 'engine-lb-74b4969486-n267q',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}
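
Deployment can take a moment, so it may help to wait for the top-level `status` field shown above to report `Running` before sending data. The sketch below is one hedged way to do that. `wait_until_running` is a hypothetical helper, the timeout and interval values are arbitrary, and the `Starting` value used by the stand-in is invented for the demonstration rather than taken from Wallaroo.

```python
import time


def wait_until_running(get_status, timeout=60.0, interval=2.0):
    """Poll get_status() until it reports 'Running' or the timeout expires.

    get_status is any zero-argument callable returning a dict with a
    top-level 'status' key, like the status output shown above.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("status") == "Running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pipeline not running after {timeout} seconds")
        time.sleep(interval)


# Stand-in for pipeline.status(): two made-up 'Starting' replies, then 'Running'.
_fake_states = iter(["Starting", "Starting", "Running"])
final = wait_until_running(
    lambda: {"status": next(_fake_states)}, timeout=5.0, interval=0.01
)
print(final["status"])
```

With a deployed pipeline this would be called as `wait_until_running(pipeline.status)`, passing the method itself rather than its result so that it is re-evaluated on each poll.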

### Run the Inference

Use `xgb_regression_eval.df.json` (Arrow enabled) or `xgb_regression_eval.json` (Arrow disabled) as our `sample_data` and perform the inference.

In [26]:
if arrowEnabled is True:
    sample_data = 'xgb_regression_eval.df.json'
    result = pipeline.infer_from_file(sample_data)
    display(result)
else:
    sample_data = 'xgb_regression_eval.json'
    result = pipeline.infer_from_file(sample_data)
    display(result[0].data())

Unnamed: 0,error_message
0,There was an unexpected error retrieving the i...


### Undeploy the Pipeline

With the tests complete, we will undeploy the pipeline to return its resources to the Wallaroo instance.

In [None]:
pipeline.undeploy()

0,1
name,xgboost-regression-autoconvert-pipeline
created,2023-02-22 17:34:32.945311+00:00
last_updated,2023-02-22 17:34:38.173755+00:00
deployed,False
tags,
versions,"aa4361e0-0b74-4179-a91b-c2a65a4b4553, eda7b006-e7ef-47a5-99bf-29405e88f687"
steps,xgb-regression-model
