## Steps

### Import libraries

The first step is to import the required libraries.


```python
import wallaroo
from wallaroo.object import EntityNotFoundError

# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)
```

### Connect to Wallaroo

Connect to your Wallaroo instance and save the connection as the variable `wl`.

```python
# Login through local Wallaroo instance

wl = wallaroo.Client()

# SSO login through keycloak

# wallarooPrefix = "YOUR PREFIX"
# wallarooSuffix = "YOUR SUFFIX"

# wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
#                     auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
#                     auth_type="sso")
```
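The commented SSO example derives both endpoints from a prefix and a suffix. The following is a minimal sketch of that URL construction. `sso_endpoints` is a hypothetical helper, not part of the Wallaroo SDK, and it assumes the `{prefix}.api.{suffix}` and `{prefix}.keycloak.{suffix}` host patterns shown above.

```python
def sso_endpoints(prefix: str, suffix: str) -> tuple:
    """Hypothetical helper: build the API and Keycloak auth URLs
    using the host patterns from the commented SSO login example."""
    api_endpoint = f"https://{prefix}.api.{suffix}"
    auth_endpoint = f"https://{prefix}.keycloak.{suffix}"
    return api_endpoint, auth_endpoint

# Example usage with placeholder values
api, auth = sso_endpoints("acme", "example.com")
```

The two strings would then be passed as `api_endpoint` and `auth_endpoint` to `wallaroo.Client(..., auth_type="sso")`, as in the commented code.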

### Arrow Support

As of the 2023.1 release, Wallaroo supports both DataFrame and Arrow formats for inference inputs. This tutorial adjusts its steps depending on whether Arrow support is enabled in your Wallaroo instance.

If Arrow support is enabled, set `arrowEnabled=True`. If it is disabled or you are not sure, set `arrowEnabled=False`. The code below sets this variable from the `ARROW_ENABLED` environment variable.

The example outputs below were generated with Arrow support disabled (`arrowEnabled` prints `False`).

```python
import os
# Uncomment the line below to set the ARROW_ENABLED environment variable to True. Otherwise, leave as is.
# os.environ["ARROW_ENABLED"]="True"

if "ARROW_ENABLED" not in os.environ or os.environ["ARROW_ENABLED"].casefold() == "False".casefold():
    arrowEnabled = False
else:
    arrowEnabled = True
print(arrowEnabled)
```

```
False
```
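The check above treats an unset variable, or the value `false` in any letter case, as disabled. Every other value counts as enabled. A minimal sketch of the same rule as a testable function (the name `arrow_enabled` is hypothetical) makes one quirk visible: values such as `"0"` or `"no"` switch Arrow mode **on**.

```python
import os
from typing import Mapping

def arrow_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Mirror the notebook's check: unset or "false" (any case) means disabled."""
    value = env.get("ARROW_ENABLED")
    if value is None:
        return False
    # Any value other than "false" (case-insensitive) counts as enabled,
    # matching the original if/else logic.
    return value.casefold() != "false"
```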


### Set Variables

The following variables set the names of the workspace and pipeline to create or reuse, and the names and file paths of the models to upload. Adjust them to match your Wallaroo instance and organization requirements.

```python
workspace_name = 'housing-shadow-nbz'
pipeline_name = 'hp-shadow'
champion_model_name = 'housing-rf'
champion_model_file = 'models/rf_model.onnx'
shadow_model_01_name = 'housing-xgb'
shadow_model_01_file = 'models/xgb_model.onnx'
shadow_model_02_name = 'housing-gbr'
shadow_model_02_file = 'models/gbr_model.onnx'
```

### Workspace and Pipeline

The following creates or connects to an existing workspace based on the variable `workspace_name`, and creates or connects to a pipeline based on the variable `pipeline_name`.

```python
def get_workspace(name):
    # Reuse the workspace with this name if it exists; otherwise create it
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Reuse the pipeline with this name if it exists; otherwise build it
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
```

```python
workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
```

| name | hp-shadow |
|---|---|
| created | 2023-03-14 00:54:59.178748+00:00 |
| last_updated | 2023-03-14 00:54:59.178748+00:00 |
| deployed | (none) |
| tags |  |
| versions | 10d62450-7b81-4787-8286-0fd551827dc8 |
| steps |  |
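Both helpers follow a get-or-create pattern: look the object up by name and create it only if it is missing, so re-running the notebook reuses the existing workspace and pipeline. The following is a minimal, SDK-free sketch of that pattern. The `Registry` class is hypothetical and stands in for the Wallaroo client.

```python
from typing import Callable, Dict

class Registry:
    """Hypothetical stand-in for a client that stores objects by name."""

    def __init__(self) -> None:
        self._items: Dict[str, object] = {}
        self.created = 0  # counts how many objects were actually created

    def get_or_create(self, name: str, factory: Callable[[str], object]) -> object:
        # Return the existing object if one is registered under this name
        if name in self._items:
            return self._items[name]
        # Otherwise create it once and remember it
        item = factory(name)
        self._items[name] = item
        self.created += 1
        return item

registry = Registry()
first = registry.get_or_create("housing-shadow-nbz", lambda n: {"name": n})
second = registry.get_or_create("housing-shadow-nbz", lambda n: {"name": n})
```

Running the lookup twice returns the same object and creates it only once, which is the behavior `get_workspace` and `get_pipeline` aim for.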


### Load the Models

Upload the models into the current workspace using the names and files set earlier, and save them as `champion`, `model2`, and `model3`.

```python
champion = wl.upload_model(champion_model_name, champion_model_file).configure()
model2 = wl.upload_model(shadow_model_01_name, shadow_model_01_file).configure()
model3 = wl.upload_model(shadow_model_02_name, shadow_model_02_file).configure()
```
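Each upload reads its model file from the given path. The following is a minimal pre-flight sketch that reports missing files before any upload is attempted. `missing_model_files` is a hypothetical helper, and the paths are the ones set in the variables above.

```python
from pathlib import Path
from typing import Dict, List

def missing_model_files(models: Dict[str, str]) -> List[str]:
    """Return the names of models whose file path does not exist."""
    return [name for name, path in models.items() if not Path(path).is_file()]

# Model names mapped to the file paths used in this tutorial
models = {
    "housing-rf": "models/rf_model.onnx",
    "housing-xgb": "models/xgb_model.onnx",
    "housing-gbr": "models/gbr_model.onnx",
}
missing = missing_model_files(models)
if missing:
    print("Missing model files:", missing)
```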

### Create Shadow Deployment

A shadow deployment is created using the `add_shadow_deploy(champion, challengers[])` method, where:

* `champion`: The model that produces the pipeline's primary inference results. Its results are returned through the Inference Object's `data` element.
* `challengers[]`: A list of challenger models that receive the same inference inputs as the champion. Their results are returned through the Inference Object's `shadow_data` element.
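Conceptually, a shadow deployment sends every input to the champion and to each challenger. It returns the champion's result as the primary output and keeps the challengers' results alongside it. The sketch below models that flow with plain Python callables. `shadow_infer` and the keys of the returned dictionary are illustrative, chosen to mirror the `data` and `shadow_data` names above. They do not reflect how Wallaroo implements shadow deployments.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[Sequence[float]], float]

def shadow_infer(champion: Model,
                 challengers: Dict[str, Model],
                 features: Sequence[float]) -> Dict[str, object]:
    """Illustrative shadow deployment: every model sees the same input."""
    # Primary result, returned to the caller
    data = champion(features)
    # Shadow results, recorded for comparison but not used as the answer
    shadow_data = {name: model(features) for name, model in challengers.items()}
    return {"data": data, "shadow_data": shadow_data}

# Toy models standing in for the real champion and challengers
result = shadow_infer(
    champion=lambda x: sum(x),
    challengers={"double": lambda x: 2 * sum(x), "zero": lambda x: 0.0},
    features=[1.0, 2.0, 3.0],
)
```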

```python
pipeline.add_shadow_deploy(champion, [model2, model3])
```

| name | hp-shadow |
|---|---|
| created | 2023-03-14 00:54:59.178748+00:00 |
| last_updated | 2023-03-14 00:54:59.178748+00:00 |
| deployed | (none) |
| tags |  |
| versions | 10d62450-7b81-4787-8286-0fd551827dc8 |
| steps |  |


```python
pipeline.deploy()
```

```
Waiting for deployment - this will take up to 45s ............. ok
```

| name | hp-shadow |
|---|---|
| created | 2023-03-14 00:54:59.178748+00:00 |
| last_updated | 2023-03-14 00:55:14.819430+00:00 |
| deployed | True |
| tags |  |
| versions | 4cbc99fb-9946-420e-981d-fe0ef29a7606, 10d62450-7b81-4787-8286-0fd551827dc8 |
| steps | housing-rf |


### Run Test Inference

Run a test inference using the data from `sample_data_file`. The input file depends on whether Arrow support is enabled.

```python
# Arrow-enabled instances use the DataFrame JSON file; others use the tensor JSON file
if arrowEnabled:
    sample_data_file = 'data/xtest-1.df.json'
else:
    sample_data_file = 'data/xtest-1.json'
response = pipeline.infer_from_file(sample_data_file)
display(response)
```

```
[InferenceResult({'check_failures': [],
  'elapsed': 216622,
  'model_name': 'housing-rf',
  'model_version': 'b4e0daf2-9070-43b9-b757-dc2a3d204087',
  'original_data': {'tensor': [[4.0,
                                2.5,
                                2900.0,
                                5505.0,
                                2.0,
                                0.0,
                                0.0,
                                3.0,
                                8.0,
                                2900.0,
                                0.0,
                                47.6063,
                                -122.02,
                                2970.0,
                                5251.0,
                                12.0,
                                0.0,
                                0.0]]},
  'outputs': [{'Float': {'data': [718013.6875],
                         'dim': [1, 1],
                         'dtype': 'Float',
                         '
```
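Each result stores its predictions under `outputs`, keyed by data type. The following is a minimal sketch that pulls the first `Float` value out of a result dictionary shaped like the one printed above. The helper name is hypothetical. The sketch assumes the `outputs[i]['Float']['data']` layout shown in the output, which is the same path the per-model log code in this tutorial reads.

```python
from typing import Any, Dict

def first_float_output(result: Dict[str, Any]) -> float:
    """Return the first Float prediction from a raw inference result dict."""
    for output in result["outputs"]:
        if "Float" in output:
            return output["Float"]["data"][0]
    raise ValueError("no Float output found")

# Trimmed copy of the structure printed above
sample = {
    "model_name": "housing-rf",
    "outputs": [{"Float": {"data": [718013.6875], "dim": [1, 1], "dtype": "Float"}}],
}
prediction = first_float_output(sample)
```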

### View Pipeline Logs

With the inference complete, retrieve the log data with the pipeline `logs` method. Note that for **each** inference request, the logs return **one entry per model**. In this example, a single inference request produces three log entries: one for the champion and one for each of the two challengers.

```python
pipeline.logs()
```

| Timestamp | Output | Input | Anomalies |
|---|---|---|---|
| 2023-14-Mar 00:57:13 | [array([[659806.]])] | [[4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0]] | 0 |
| 2023-14-Mar 00:57:13 | [array([[704901.875]])] | [[4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0]] | 0 |
| 2023-14-Mar 00:57:13 | [array([[718013.6875]])] | [[4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0]] | 0 |
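Because each request produces one log entry per model, grouping entries by their input is a quick way to check the count. The following is a minimal pandas sketch that uses the three rows shown above, rebuilt by hand as a plain DataFrame. It does not query Wallaroo.

```python
import pandas as pd

features = ("[[4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, "
            "47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0]]")

# Rebuilt from the log output above: one row per model for a single request
logs = pd.DataFrame({
    "Timestamp": ["2023-14-Mar 00:57:13"] * 3,
    "Output": [659806.0, 704901.875, 718013.6875],
    "Input": [features] * 3,
})

# Count log entries per distinct input; with one champion and two
# challengers, each request is expected to contribute three rows.
entries_per_request = logs.groupby("Input").size()
```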


### View Logs Per Model

The logs can also be displayed per model.

For Arrow-enabled Wallaroo instances, the logs are returned as a DataFrame with one row per model. The following code displays the model name and the inference output for each row.

For Arrow-disabled Wallaroo instances, use the pipeline `logs_shadow_deploy()` method to view the inputs and results of the shadow deployed models. The results are grouped by input.

```python
import json
if arrowEnabled:
    # Arrow enabled: each log row carries a JSON message with the model name and outputs
    logs = pipeline.logs()
    for index, row in logs.iterrows():
        convertedjson = json.loads(row['message'])
        displayModelName = convertedjson['model_name']
        displayOutputs = str(convertedjson['outputs'][0]['Float']['data'][0])
        display([displayModelName, displayOutputs])
else:
    # Arrow disabled: display each log entry, then the shadow deploy view grouped by input
    logs = pipeline.logs()
    for log in logs:
        display(log.model_name, log.output)
    shadow_logs = pipeline.logs_shadow_deploy()
    display(shadow_logs)
```

```
'housing-xgb'
[array([[659806.]])]
'housing-gbr'
[array([[704901.875]])]
'housing-rf'
[array([[718013.6875]])]
```

Log Entry 0

Input: `[[4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0]]`

| Model Type | Model Name | Output | Timestamp | Model Version | Elapsed |
|---|---|---|---|---|---|
| Primary | housing-rf | [array([[718013.6875]])] | 2023-03-14T00:57:13.131000 | b4e0daf2-9070-43b9-b757-dc2a3d204087 | 216622 |
| Challenger | housing-gbr | [{'Float': {'v': 1, 'dim': [1, 1], 'data': [704901.875]}}] |  |  |  |
| Challenger | housing-xgb | [{'Float': {'v': 1, 'dim': [1, 1], 'data': [659806.0]}}] |  |  |  |
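With shadow results side by side, a natural next step is comparing each challenger against the champion. The following is a minimal sketch for the Arrow-enabled path. Like the code above, it assumes that each log row has a `message` column holding JSON with `model_name` and `outputs[0]['Float']['data'][0]`. The rows are hand-built from the predictions shown above, and `challenger_deltas` and `make_message` are hypothetical helpers.

```python
import json
import pandas as pd

def challenger_deltas(logs: pd.DataFrame, champion: str) -> dict:
    """Return each challenger's prediction minus the champion's prediction."""
    predictions = {}
    for _, row in logs.iterrows():
        message = json.loads(row["message"])
        predictions[message["model_name"]] = message["outputs"][0]["Float"]["data"][0]
    baseline = predictions[champion]
    return {name: value - baseline
            for name, value in predictions.items() if name != champion}

def make_message(model_name: str, value: float) -> str:
    # Hand-built JSON in the assumed log message layout
    return json.dumps({"model_name": model_name,
                       "outputs": [{"Float": {"data": [value]}}]})

logs = pd.DataFrame({"message": [
    make_message("housing-xgb", 659806.0),
    make_message("housing-gbr", 704901.875),
    make_message("housing-rf", 718013.6875),
]})
deltas = challenger_deltas(logs, champion="housing-rf")
```

For this input, both challengers predict a lower price than the champion.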


### Undeploy the Pipeline

With the tutorial complete, undeploy the pipeline to return its resources to the system.

```python
pipeline.undeploy()
```

```
Waiting for undeployment - this will take up to 45s ..................................... ok
```

| name | hp-shadow |
|---|---|
| created | 2023-03-14 00:54:59.178748+00:00 |
| last_updated | 2023-03-14 00:55:14.819430+00:00 |
| deployed | False |
| tags |  |
| versions | 4cbc99fb-9946-420e-981d-fe0ef29a7606, 10d62450-7b81-4787-8286-0fd551827dc8 |
| steps | housing-rf |
