This tutorial and the assets can be downloaded as part of the [Wallaroo Tutorials repository](https://github.com/WallarooLabs/Wallaroo_Tutorials/blob/main/wallaroo-observe-tutorials/pipeline-log-tutorial).

## Pipeline Log Tutorial

This tutorial demonstrates how to retrieve and export Wallaroo pipeline logs for standard, shadow deploy, and A/B testing pipelines.

This tutorial will demonstrate how to:

1. Select or create a workspace and pipeline, and upload the control model, then additional models for A/B testing and shadow deploy.
1. Add a pipeline step with the champion model, then deploy the pipeline and perform sample inferences.
1. Display the various log types for a standard deployed pipeline.
1. Replace the champion model pipeline step with a shadow deploy step that compares the champion model against two challengers.
1. Perform sample inferences with a shadow deployed step, then display the log files for a shadow deployed pipeline.
1. Swap out the shadow deployed pipeline step with an A/B pipeline step.
1. Perform sample inferences with an A/B pipeline step, then display the log files for an A/B pipeline step.
1. Undeploy the pipeline.

This tutorial provides the following:

* Models:
  * `models/rf_model.onnx`: The champion model that has been used in this environment for some time.
  * `models/xgb_model.onnx` and `models/gbr_model.onnx`: Rival models that will be tested against the champion.
* Data:
  * `data/xtest-1.df.json` and `data/xtest-1k.df.json`: DataFrame JSON inference inputs with 1 input and 1,000 inputs, respectively.
  * `data/xtest-1k.arrow`: Apache Arrow inference inputs with 1,000 inputs.

## Prerequisites

* A deployed Wallaroo instance
* The following Python libraries installed:
  * [`wallaroo`](https://pypi.org/project/wallaroo/): The Wallaroo SDK. Included with the Wallaroo JupyterHub service by default.
  * [`pandas`](https://pypi.org/project/pandas/): Pandas, mainly used for pandas DataFrame support.
  * [`pyarrow`](https://pypi.org/project/pyarrow/): PyArrow, used for Apache Arrow support.

## Initial Steps

### Import libraries

The first step is to import the libraries needed for this notebook.

```python
import wallaroo
from wallaroo.object import EntityNotFoundError

import pyarrow as pa

# used to display DataFrame information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)

import datetime

import os
```

### Connect to the Wallaroo Instance

Next, connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the `wallaroo.Client()` command, which provides a URL to grant the SDK permission to your specific Wallaroo environment.  When displayed, enter the URL into a browser and confirm permissions.  Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use `wl = wallaroo.Client()`.  For more information on Wallaroo Client settings, see the [Client Connection guide](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-client/).

```python
# Login through local Wallaroo instance

wl = wallaroo.Client()
```

### Create Workspace

We will create a workspace to manage our pipeline and models. The following variables set the names of our sample workspace, pipeline, and control model; the workspace is then created if it does not already exist and set as the current workspace.

```python
workspace_name = 'logworkspace'
main_pipeline_name = 'logpipeline-test'
model_name_control = 'logcontrol'
model_file_name_control = './models/rf_model.onnx'
```

```python
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)

wl.set_current_workspace(workspace)
```

{'name': 'logworkspace', 'id': 27, 'archived': False, 'created_by': 'c97d480f-6064-4537-b18e-40fb1864b4cd', 'created_at': '2024-02-09T16:21:07.131681+00:00', 'models': [], 'pipelines': []}

## Standard Pipeline

### Upload The Champion Model

For our example, we will upload the champion model that has been trained to derive house prices from a variety of inputs. The model file is `rf_model.onnx`, and it is uploaded with the name `logcontrol`.

```python
housing_model_control = (wl.upload_model(model_name_control, 
                                         model_file_name_control, 
                                         framework=wallaroo.framework.Framework.ONNX)
                                         .configure(tensor_fields=["tensor"])
                        )
```

### Build the Pipeline

This pipeline is an example of an existing situation where a model is deployed and used for inferences in a production environment. We'll call it `logpipeline-test`, set `logcontrol` as a pipeline step, then run a few sample inferences.

```python
mainpipeline = wl.build_pipeline(main_pipeline_name)
# in case this pipeline was run before
mainpipeline.clear()
mainpipeline.add_model_step(housing_model_control)

deploy_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
    .cpus(0.25)\
    .build()

mainpipeline.deploy(deployment_config=deploy_config)
```

Waiting for deployment - this will take up to 45s .......... ok


| Field | Value |
|---|---|
| name | logpipeline-test |
| created | 2024-02-09 16:21:09.406182+00:00 |
| last_updated | 2024-02-09 16:30:53.067304+00:00 |
| deployed | True |
| arch |  |
| tags |  |
| versions | e2b9d903-4015-4d09-902b-9150a7196cea, 9df38be1-d2f4-4be1-9022-8f0570a238b9, 3078b49f-3eff-48d1-8d9b-a8780b329ecc, 21bff9df-828f-40e7-8a22-449a2e636b44, f78a7030-bd25-4bf7-ba0d-a18cfe3790e0, 10c1ac25-d626-4413-8d5d-1bed42d0e65c, b179b693-b6b6-4ff9-b2a4-2a639d88bc9b, da7b9cf0-81e8-452b-8b70-689406dc9548, a9a9b62c-9d37-427f-99af-67725558bf9b, 1c14591a-96b4-4059-bb63-2d2bc4e308d5, add660ac-0ebf-4a24-bb6d-6cdc875866c8 |
| steps | logcontrol |
| published | False |


### Testing

We'll use two inferences as a quick sample test: one for a house that should be valued at around \$700k, the other for a house valued at around \$1.5 million. We'll also save the start and end times of these inferences for later log retrieval.
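Each sample input below is a one-row pandas DataFrame whose `tensor` column holds a single list of 18 feature values. As a minimal local sketch (pandas and the standard library only, no Wallaroo calls), the following shows that shape and how a start/end timestamp pair can bracket a unit of work; the helper `make_tensor_frame` is hypothetical and not part of the tutorial or the SDK.

```python
import datetime

import pandas as pd


def make_tensor_frame(features):
    """Hypothetical helper: wrap one feature vector as a one-row DataFrame
    with a single `tensor` column, matching the inputs used in this tutorial."""
    return pd.DataFrame({"tensor": [list(features)]})


# The 18 feature values used for the ~$700k sample house.
normal_features = [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0,
                   2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0]

window_start = datetime.datetime.now()
frame = make_tensor_frame(normal_features)
window_end = datetime.datetime.now()

print(frame.shape)              # one row, one column
print(len(frame["tensor"][0]))  # 18 features inside that single cell
print(window_start <= window_end)
```

In the tutorial itself, `dataframe_start` and `dataframe_end` play the role of `window_start` and `window_end` and are later passed to `logs(start_datetime=..., end_datetime=...)`.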

```python
dataframe_start = datetime.datetime.now()

normal_input = pd.DataFrame.from_records({"tensor": [
            [
                4.0, 
                2.5, 
                2900.0, 
                5505.0, 
                2.0, 
                0.0, 
                0.0, 
                3.0, 
                8.0, 
                2900.0, 
                0.0, 
                47.6063, 
                -122.02, 
                2970.0, 
                5251.0, 
                12.0, 
                0.0, 
                0.0
            ]
        ]
    }
)
result = mainpipeline.infer(normal_input)
display(result)
```

|  | time | in.tensor | out.variable | anomaly.count |
|---|---|---|---|---|
| 0 | 2024-02-09 16:31:04.817 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0] | [718013.7] | 0 |


```python
large_house_input = pd.DataFrame.from_records(
    {
        'tensor': [
            [
                4.0, 
                3.0, 
                3710.0, 
                20000.0, 
                2.0, 
                0.0, 
                2.0, 
                5.0, 
                10.0, 
                2760.0, 
                950.0, 
                47.6696, 
                -122.261, 
                3970.0, 
                20000.0, 
                79.0, 
                0.0, 
                0.0
            ]
        ]
    }
)
large_house_result = mainpipeline.infer(large_house_input)
display(large_house_result)

import time
time.sleep(10)
dataframe_end = datetime.datetime.now()
```

|  | time | in.tensor | out.variable | anomaly.count |
|---|---|---|---|---|
| 0 | 2024-02-09 16:31:04.917 | [4.0, 3.0, 3710.0, 20000.0, 2.0, 0.0, 2.0, 5.0, 10.0, 2760.0, 950.0, 47.6696, -122.261, 3970.0, 20000.0, 79.0, 0.0, 0.0] | [1514079.4] | 0 |


As one last sample, we'll run through roughly 1,000 inferences at once and show a few of the results.  For this example we'll use an Apache Arrow table, which has a smaller file size compared to uploading a pandas DataFrame JSON file.  The inference result is returned as an arrow table, which we'll convert into a pandas DataFrame to display the first 20 results.

```python
batch_inferences = mainpipeline.infer_from_file('./data/xtest-1k.arrow')

large_inference_result = batch_inferences.to_pandas()
display(large_inference_result.head(20))
```

|  | time | in.tensor | out.variable | anomaly.count |
|---|---|---|---|---|
| 0 | 2024-02-09 16:31:15.018 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0] | [718013.75] | 0 |
| 1 | 2024-02-09 16:31:15.018 | [2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0, 8.0, 2170.0, 0.0, 47.7109, -122.017, 2310.0, 7419.0, 6.0, 0.0, 0.0] | [615094.56] | 0 |
| 2 | 2024-02-09 16:31:15.018 | [3.0, 2.5, 1300.0, 812.0, 2.0, 0.0, 0.0, 3.0, 8.0, 880.0, 420.0, 47.5893, -122.317, 1300.0, 824.0, 6.0, 0.0, 0.0] | [448627.72] | 0 |
| 3 | 2024-02-09 16:31:15.018 | [4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0, 9.0, 2500.0, 0.0, 47.5759, -121.994, 2560.0, 8475.0, 24.0, 0.0, 0.0] | [758714.2] | 0 |
| 4 | 2024-02-09 16:31:15.018 | [3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.0, 7.0, 2200.0, 0.0, 47.7659, -122.341, 1690.0, 8038.0, 62.0, 0.0, 0.0] | [513264.7] | 0 |
| 5 | 2024-02-09 16:31:15.018 | [3.0, 2.0, 2140.0, 4923.0, 1.0, 0.0, 0.0, 4.0, 8.0, 1070.0, 1070.0, 47.6902, -122.339, 1470.0, 4923.0, 86.0, 0.0, 0.0] | [668288.0] | 0 |
| 6 | 2024-02-09 16:31:15.018 | [4.0, 3.5, 3590.0, 5334.0, 2.0, 0.0, 2.0, 3.0, 9.0, 3140.0, 450.0, 47.6763, -122.267, 2100.0, 6250.0, 9.0, 0.0, 0.0] | [1004846.5] | 0 |
| 7 | 2024-02-09 16:31:15.018 | [3.0, 2.0, 1280.0, 960.0, 2.0, 0.0, 0.0, 3.0, 9.0, 1040.0, 240.0, 47.602, -122.311, 1280.0, 1173.0, 0.0, 0.0, 0.0] | [684577.2] | 0 |
| 8 | 2024-02-09 16:31:15.018 | [4.0, 2.5, 2820.0, 15000.0, 2.0, 0.0, 0.0, 4.0, 9.0, 2820.0, 0.0, 47.7255, -122.101, 2440.0, 15000.0, 29.0, 0.0, 0.0] | [727898.1] | 0 |
| 9 | 2024-02-09 16:31:15.018 | [3.0, 2.25, 1790.0, 11393.0, 1.0, 0.0, 0.0, 3.0, 8.0, 1790.0, 0.0, 47.6297, -122.099, 2290.0, 11894.0, 36.0, 0.0, 0.0] | [559631.1] | 0 |


### Standard Pipeline Logs

Pipeline logs with standard pipeline steps are retrieved either with:

* Pipeline `logs` which returns either a pandas DataFrame or Apache Arrow table.
* Pipeline `export_logs`, which saves the logs as either a pandas DataFrame JSON file or an Apache Arrow table file.

For full details, see the Wallaroo Documentation Pipeline Log Management guide.

#### Pipeline Log Method

The Pipeline `logs` method includes the following parameters.  For a complete list, see the [Wallaroo SDK Essentials Guide: Pipeline Log Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipeline-logs/).

| Parameter | Type | Description |
|---|---|---|
| `limit` | **Int** (*Optional*) | Limits how many log records to display.  Defaults to `100`.  If there are more pipeline logs than are being displayed, the **Warning** message `Pipeline log record limit exceeded` will be displayed.  For example, if 100 log records were requested and there are a total of 1,000, the warning message will be displayed. |
| `start_datetime` and `end_datetime` | **DateTime** (*Optional*) | Limits logs to all logs between the `start` and `end` DateTime parameters.  **Both parameters must be provided**. Submitting a `logs()` request with only `start_datetime` or `end_datetime` will generate an exception.<br />If `start_datetime` and `end_datetime` are provided as parameters, then the records are returned in **chronological** order, with the oldest record displayed first. |
| `dataset` | **List** (*Optional*) | The datasets to be returned. The available datasets are:<ul><li>`*`: Default. This translates to `["time", "in", "out", "anomaly"]`.</li><li>`time`: The DateTime of the inference request.</li><li>`in`: All inputs listed as `in.{variable_name}`.</li><li>`out`: All outputs listed as `out.{variable_name}`.</li><li>`anomaly`: Flags whether an [anomaly was detected](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline-validation/). `0` indicates no checks were triggered; `1` or greater indicates an anomaly was detected. Each validation is displayed in the returned logs as part of the `anomaly` dataset as `anomaly.{validation_name}`. For more information on anomaly detection, see [Wallaroo SDK Essentials Guide: Anomaly Detection](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline-validation/).</li><li>`metadata`: Returns metadata. **IMPORTANT NOTE**: See [Metadata Requests Restrictions](#metadata-requests-restrictions) for specifications on how this dataset can be used with other datasets.<ul><li>Returns in the `metadata.elapsed` field:<ul><li>A list of time in nanoseconds for:<ul><li>The time to serialize the input.</li><li>How long each step took.</li></ul></li></ul></li><li>Returns in the `metadata.last_model` field:<ul><li>A dict with each Python step as:<ul><li>`model_name`: The name of the model in the pipeline step.</li><li>`model_sha`: The sha hash of the model in the pipeline step.</li></ul></li></ul></li><li>Returns in the `metadata.pipeline_version` field:<ul><li>The pipeline version as a UUID value.</li></ul></li></ul></li><li>`metadata.elapsed`: **IMPORTANT NOTE**: See [Metadata Requests Restrictions](#metadata-requests-restrictions) for specifications on how this dataset can be used with other datasets.<ul><li>Returns in the `metadata.elapsed` field:<ul><li>A list of time in nanoseconds for:<ul><li>The time to serialize the input.</li><li>How long each step took.</li></ul></li></ul></li></ul></li></ul> |
| `arrow` | **Boolean** (*Optional*) | Defaults to **False**.  If `arrow` is set to `True`, then the logs are returned as an [Apache Arrow table](https://arrow.apache.org/).  If `arrow=False`, then the logs are returned as a pandas DataFrame. |
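To make the `start_datetime`/`end_datetime` rules concrete, the following is a local pandas sketch, not the SDK's implementation: a hypothetical `filter_log_window` helper that enforces the both-or-neither rule, returns windowed records oldest first, and otherwise returns the newest records first. Treating both bounds as inclusive, and applying `limit` after filtering, are assumptions of the sketch.

```python
import datetime

import pandas as pd


def filter_log_window(logs, start_datetime=None, end_datetime=None, limit=100):
    """Hypothetical local helper mirroring the documented rules for `logs()`:
    both bounds or neither; windowed results oldest first, otherwise newest first."""
    if (start_datetime is None) != (end_datetime is None):
        raise ValueError("start_datetime and end_datetime must be provided together")
    if start_datetime is None:
        # No window: newest records first (reverse chronological order).
        return logs.sort_values("time", ascending=False).head(limit)
    # Window given: keep records inside it (inclusive here, an assumption), oldest first.
    mask = (logs["time"] >= start_datetime) & (logs["time"] <= end_datetime)
    return logs[mask].sort_values("time").head(limit)


# Timestamps and outputs taken from the sample log output in this tutorial.
logs = pd.DataFrame({
    "time": pd.to_datetime(["2024-02-09 16:28:44.753",
                            "2024-02-09 16:31:04.817",
                            "2024-02-09 16:31:04.917"]),
    "out.variable": [[718013.7], [718013.7], [1514079.4]],
})

start = datetime.datetime(2024, 2, 9, 16, 31, 0)
end = datetime.datetime(2024, 2, 9, 16, 31, 15)
print(filter_log_window(logs, start, end))
print(filter_log_window(logs))
```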

##### Pipeline Log Warnings

If the total number of logs exceeds either the set limit or 10 MB in file size, the following warning is returned:

`Warning: There are more logs available. Please set a larger limit or request a file using export_logs.`

If the total number of logs **requested**, either through the limit or through the `start_datetime` and `end_datetime` request, is greater than 10 MB in size, the following warning is displayed:

`Warning: Pipeline log size limit exceeded. Only displaying 509 log messages. Please request a file using export_logs.`

The following examples display the standard logs, then the metadata logs, then the logs between the `dataframe_start` and `dataframe_end` periods (limited to 50 records), and finally the logs retrieved as an Arrow table.

```python
# pipeline log retrieval - reverse chronological order

regular_logs = mainpipeline.logs()

display("Standard Logs")
display(len(regular_logs))
display(regular_logs)

# Display metadata

metadatalogs = mainpipeline.logs(dataset=["time", "out.variable", "metadata"])
display("Metadata Logs")
# Only showing the pipeline version for space reasons
display(metadatalogs.loc[:, ["time", "out.variable", "metadata.pipeline_version"]])

# Display logs restricted by date and limit

display("Logs restricted by date")
dated_logs = mainpipeline.logs(start_datetime=dataframe_start, end_datetime=dataframe_end, limit=50)

display(len(dated_logs))
display(dated_logs)

# pipeline log retrieval as an Apache Arrow table
display(mainpipeline.logs(arrow=True))
```

Pipeline log schema has changed over the logs requested 1 newest records retrieved successfully, newest record seen was at <datetime>. Please request additional records separately


'Standard Logs'

1

|  | time | in.tensor | out.variable | anomaly.count |
|---|---|---|---|---|
| 0 | 2024-02-09 16:28:44.753 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0] | [718013.7] | 0 |


Pipeline log schema has changed over the logs requested 1 newest records retrieved successfully, newest record seen was at <datetime>. Please request additional records separately


'Metadata Logs'

|  | time | out.variable | metadata.pipeline_version |
|---|---|---|---|
| 0 | 2024-02-09 16:28:44.753 | [718013.7] | 21bff9df-828f-40e7-8a22-449a2e636b44 |


'Logs restricted by date'

2

|  | time | in.tensor | out.variable | anomaly.count |
|---|---|---|---|---|
| 0 | 2024-02-09 16:31:04.817 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0] | [718013.7] | 0 |
| 1 | 2024-02-09 16:31:04.917 | [4.0, 3.0, 3710.0, 20000.0, 2.0, 0.0, 2.0, 5.0, 10.0, 2760.0, 950.0, 47.6696, -122.261, 3970.0, 20000.0, 79.0, 0.0, 0.0] | [1514079.4] | 0 |


Pipeline log schema has changed over the logs requested 1 newest records retrieved successfully, newest record seen was at <datetime>. Please request additional records separately


```
pyarrow.Table
time: timestamp[ms]
in.tensor: list<item: double> not null
  child 0, item: double
out.variable: list<inner: float not null> not null
  child 0, inner: float not null
anomaly.count: uint32 not null
----
time: [[2024-02-09 16:28:44.753]]
in.tensor: [[[4,2.5,2900,5505,2,...,2970,5251,12,0,0]]]
out.variable: [[[718013.7]]]
anomaly.count: [[0]]
```

```python
result = mainpipeline.infer(normal_input, dataset=["*", "metadata.pipeline_version"])
display(result)
```

|  | time | in.tensor | out.variable | anomaly.count | metadata.pipeline_version |
|---|---|---|---|---|---|
| 0 | 2024-02-09 16:31:30.617 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0] | [718013.7] | 0 |  |


The following displays the pipeline metadata logs.

#### Standard Pipeline Steps Log Requests

Affected pipeline steps:

* `add_model_step`
* `replace_with_model_step`

For log file requests, the following metadata dataset requests for standard pipeline steps are available:

* `metadata`

These must be paired with specific columns.  `*` is **not** available when paired with `metadata`.

* `in`: All input fields.
* `out`: All output fields.
* `time`: The DateTime the inference request was made. 
* `in.{input_fields}`: Any input fields (`tensor`, etc.)
* `out.{output_fields}`: Any output fields (`out.house_price`, `out.variable`, etc.)
* `anomaly.count`:  Any anomalies detected from validations.
* `anomaly.{validation}`: The validation that triggered the anomaly detection and whether it is `True` (indicating an anomaly was detected) or `False`.
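The dataset names above behave like column prefixes in the returned DataFrame (`in` covers `in.tensor`, `out` covers `out.variable`, and so on). As a local illustration only, a hypothetical `select_datasets` helper (not part of the Wallaroo SDK) expands `*` to `["time", "in", "out", "anomaly"]` and keeps the matching columns of an already-retrieved log DataFrame. The row values are copied from the standard and metadata log outputs above, with the feature list shortened.

```python
import pandas as pd

# `*` expands to these datasets, per the `dataset` parameter description.
DEFAULT_DATASETS = ["time", "in", "out", "anomaly"]


def select_datasets(logs, dataset):
    """Hypothetical helper: keep columns named exactly like a requested dataset
    or starting with `<dataset>.` (for example `in` -> `in.tensor`)."""
    names = []
    for name in dataset:
        names.extend(DEFAULT_DATASETS if name == "*" else [name])
    keep = [col for col in logs.columns
            if any(col == n or col.startswith(n + ".") for n in names)]
    return logs[keep]


# Column layout copied from the log output in this tutorial.
logs = pd.DataFrame({
    "time": pd.to_datetime(["2024-02-09 16:28:44.753"]),
    "in.tensor": [[4.0, 2.5, 2900.0]],  # feature list shortened for brevity
    "out.variable": [[718013.7]],
    "anomaly.count": [0],
    "metadata.pipeline_version": ["21bff9df-828f-40e7-8a22-449a2e636b44"],
})

print(list(select_datasets(logs, ["*"]).columns))
print(list(select_datasets(logs, ["time", "out"]).columns))
```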

The following requests the metadata and displays the time, output variable, and last model from the metadata.

```python
# Display metadata

metadatalogs = mainpipeline.logs(dataset=["time", "out", "metadata"])
display("Metadata Logs")
display(metadatalogs.loc[:, ["time", "out.variable", "metadata.last_model"]])
```


Pipeline log schema has changed over the logs requested 2 newest records retrieved successfully, newest record seen was at <datetime>. Please request additional records separately


'Metadata Logs'

|  | time | out.variable | metadata.last_model |
|---|---|---|---|
| 0 | 2024-02-09 16:28:44.753 | [718013.7] | {"model_name":"logcontrol","model_sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"} |
| 1 | 2024-02-09 16:31:30.617 | [718013.7] | {"model_name":"logcontrol","model_sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"} |
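In the output above, `metadata.last_model` is displayed as a JSON-formatted string. Assuming it arrives as a string (the sketch also accepts an already-parsed dict, since the exact type is not stated), the standard `json` module can split the model name and SHA into their own columns. The values below are copied from that output; no Wallaroo calls are made.

```python
import json

import pandas as pd

# Values copied from the metadata log output above.
metadatalogs = pd.DataFrame({
    "time": pd.to_datetime(["2024-02-09 16:28:44.753", "2024-02-09 16:31:30.617"]),
    "metadata.last_model": [
        '{"model_name":"logcontrol","model_sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}',
    ] * 2,
})


def parse_last_model(value):
    # Accept either a JSON string or a dict, since the exact type is an assumption.
    return json.loads(value) if isinstance(value, str) else value


parsed = metadatalogs["metadata.last_model"].map(parse_last_model)
metadatalogs["model_name"] = parsed.map(lambda d: d["model_name"])
metadatalogs["model_sha"] = parsed.map(lambda d: d["model_sha"])

print(metadatalogs[["time", "model_name"]])
```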


#### Pipeline Limits

In a previous step we performed 1,000 inferences at once. If we attempt to pull all of the logs at once with a large limit, we may run into the size limit for this pipeline and receive a warning message indicating that the pipeline size limits were exceeded and that we should use `export_logs` instead, such as:

`Warning: Pipeline log size limit exceeded. Only displaying 1000 log messages (of 10000 requested). Please request a file using export_logs.`
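The warnings suggest either requesting additional records separately or using `export_logs`. One way to request records separately is to split a long time range into smaller windows and pass each pair to `logs(start_datetime=..., end_datetime=...)`. The following is a minimal sketch of that splitting, assuming smaller windows stay under the size limit; `split_window` is a hypothetical helper, and the commented line only suggests how it might be combined with this tutorial's `mainpipeline`.

```python
import datetime


def split_window(start, end, parts):
    """Hypothetical helper: split [start, end] into `parts` consecutive,
    equal-length (start, end) windows that together cover the whole range.
    Adjacent windows share a boundary, so a record stamped exactly on it could
    be returned twice if the server treats both bounds as inclusive."""
    if parts < 1 or end <= start:
        raise ValueError("need parts >= 1 and end > start")
    step = (end - start) / parts
    bounds = [start + step * i for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


start = datetime.datetime(2024, 2, 9, 16, 0, 0)
end = datetime.datetime(2024, 2, 9, 17, 0, 0)
windows = split_window(start, end, 4)
for window_start, window_end in windows:
    print(window_start.time(), "->", window_end.time())

# Possible use with this tutorial's pipeline (not executed here):
# frames = [mainpipeline.logs(start_datetime=s, end_datetime=e) for s, e in windows]
```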

```python
logs = mainpipeline.logs(limit=10000)
display(logs)
```

Pipeline log schema has changed over the logs requested 2 newest records retrieved successfully, newest record seen was at <datetime>. Please request additional records separately


|  | time | in.tensor | out.variable | anomaly.count |
|---|---|---|---|---|
| 0 | 2024-02-09 16:28:44.753 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0] | [718013.7] | 0 |
| 1 | 2024-02-09 16:31:30.617 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0] | [718013.7] | 0 |



#### Pipeline export_logs Method

The Pipeline method `export_logs` returns the Pipeline records as either a DataFrame JSON file, or an Apache Arrow table file.  For a complete list, see the [Wallaroo SDK Essentials Guide: Pipeline Log Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipeline-logs/).

The `export_logs` method takes the following parameters:

| Parameter | Type | Description |
|---|---|---|
| `directory` | **String** (*Optional*) (*Default*: `logs`) | Logs are exported to a file from current working directory to `directory`.|
| `data_size_limit` | **String** (*Optional*) (*Default*: `100MB`) | The maximum size for the exported data in bytes.  Note that file size is approximate to the request; a request of `10MiB` may return 10.3MB of data.  The fields are in the format "{size as number} {unit value}", and can include a space, so "10 MiB" and "10MiB" are the same.  The accepted unit values are:  <ul><li>`KiB` (for KiloBytes)</li><li>`MiB` (for MegaBytes)</li><li>`GiB` (for GigaBytes)</li><li>`TiB` (for TeraBytes)</li></ul>  |
| `file_prefix` | **String** (*Optional*) (*Default*: The name of the pipeline) | The name of the exported files.  By default, this will be the name of the pipeline, and the files are segmented by pipeline version between the limits or the start and end period.  For example: `logpipeline-1.json`, etc. |
| `limit` | **Int** (*Optional*) | Limits how many log records to display.  Defaults to `100`.  If there are more pipeline logs than are being displayed, the **Warning** message `Pipeline log record limit exceeded` will be displayed.  For example, if 100 log records were requested and there are a total of 1,000, the warning message will be displayed. |
| `start` and `end` | **DateTime** (*Optional*) | Limits logs to all logs between the `start` and `end` DateTime parameters.  **Both parameters must be provided**. Submitting an `export_logs()` request with only `start` or `end` will generate an exception.<br />If `start` and `end` are provided as parameters, then the records are returned in **chronological** order, with the oldest record displayed first. |
| `dataset` | **List** (*Optional*) | The datasets to be returned. The available datasets are:<ul><li>`*`: Default. This translates to `["time", "in", "out", "anomaly"]`.</li><li>`time`: The DateTime of the inference request.</li><li>`in`: All inputs listed as `in.{variable_name}`.</li><li>`out`: All outputs listed as `out.{variable_name}`.</li><li>`anomaly`: Flags whether an [anomaly was detected](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline-validation/). `0` indicates no checks were triggered; `1` or greater indicates an anomaly was detected. Each validation is displayed in the returned logs as part of the `anomaly` dataset as `anomaly.{validation_name}`. For more information on anomaly detection, see [Wallaroo SDK Essentials Guide: Anomaly Detection](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline-validation/).</li><li>`metadata`: Returns metadata. **IMPORTANT NOTE**: See [Metadata Requests Restrictions](#metadata-requests-restrictions) for specifications on how this dataset can be used with other datasets.<ul><li>Returns in the `metadata.elapsed` field:<ul><li>A list of time in nanoseconds for:<ul><li>The time to serialize the input.</li><li>How long each step took.</li></ul></li></ul></li><li>Returns in the `metadata.last_model` field:<ul><li>A dict with each Python step as:<ul><li>`model_name`: The name of the model in the pipeline step.</li><li>`model_sha`: The sha hash of the model in the pipeline step.</li></ul></li></ul></li><li>Returns in the `metadata.pipeline_version` field:<ul><li>The pipeline version as a UUID value.</li></ul></li></ul></li><li>`metadata.elapsed`: **IMPORTANT NOTE**: See [Metadata Requests Restrictions](#metadata-requests-restrictions) for specifications on how this dataset can be used with other datasets.<ul><li>Returns in the `metadata.elapsed` field:<ul><li>A list of time in nanoseconds for:<ul><li>The time to serialize the input.</li><li>How long each step took.</li></ul></li></ul></li></ul></li></ul> |
| `arrow` | **Boolean** (*Optional*) | Defaults to **False**.  If `arrow` is set to `True`, then the logs are returned as an [Apache Arrow table](https://arrow.apache.org/).  If `arrow=False`, then the logs are returned as JSON in pandas DataFrame format. |
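The `data_size_limit` format described above (a number, an optional space, then `KiB`, `MiB`, `GiB`, or `TiB`) can be illustrated with a small parser. This is a hypothetical local helper, not the SDK's own parsing, and it assumes the binary units mean powers of 1,024. Note that the documented default is written as `100MB`, which this sketch deliberately does not accept since `MB` is not one of the four listed units.

```python
import re

# Binary unit multipliers (assumption: KiB = 1024 bytes, and so on).
UNITS = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}


def parse_size_limit(text):
    """Hypothetical helper: convert strings like "10 MiB" or "10MiB" to bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(KiB|MiB|GiB|TiB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size limit: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit])


print(parse_size_limit("10 MiB"))  # 10485760
print(parse_size_limit("10MiB"))   # 10485760, the space is optional
```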

The following examples demonstrate saving a DataFrame version of the `mainpipeline` logs, then an Arrow version.

```python
# Save the DataFrame version of the log file

mainpipeline.export_logs()
display(os.listdir('./logs'))

# Save the Apache Arrow version of the log file
mainpipeline.export_logs(arrow=True)
display(os.listdir('./logs'))
```


Note: The logs with different schemas are written to separate files in the provided directory.

```
['logpipeline-test-1.arrow',
 'logpipeline-test-2.arrow',
 'logpipeline-test-2.json',
 'logpipeline-1.json',
 'logpipeline-test-1.json',
 'logpipeline-1.arrow']
```


Note: The logs with different schemas are written to separate files in the provided directory.

```
['logpipeline-test-1.arrow',
 'logpipeline-test-2.arrow',
 'logpipeline-test-2.json',
 'logpipeline-1.json',
 'logpipeline-test-1.json',
 'logpipeline-1.arrow']
```
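The exported file names follow a `{file_prefix}-{segment}.{json|arrow}` pattern, with separate files written when the log schema differs. A minimal sketch, using the directory listing shown above, groups the files by prefix and format and orders each group by segment number so the pieces could be loaded in order. The `logpipeline-1.*` files presumably come from an earlier pipeline named `logpipeline` exported to the same directory; the grouping logic itself is an illustration, not part of the SDK.

```python
import re
from collections import defaultdict

# Directory listing copied from the export output above.
files = ['logpipeline-test-1.arrow', 'logpipeline-test-2.arrow',
         'logpipeline-test-2.json', 'logpipeline-1.json',
         'logpipeline-test-1.json', 'logpipeline-1.arrow']

# The segment number is the last `-<digits>` before the extension,
# so a prefix such as "logpipeline-test" keeps its own hyphen.
PATTERN = re.compile(r"(?P<prefix>.+)-(?P<segment>\d+)\.(?P<fmt>json|arrow)$")

segments = defaultdict(list)
for name in files:
    m = PATTERN.match(name)
    if m:
        segments[(m["prefix"], m["fmt"])].append((int(m["segment"]), name))

for key in sorted(segments):
    # Sort each group by segment number so files can be read in order.
    print(key, [name for _, name in sorted(segments[key])])
```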

## Shadow Deploy Pipelines

Let's assume that after analyzing the assay information we want to test two challenger models to our control.  We do that with the Shadow Deploy pipeline step.

In Shadow Deploy, the pipeline step is added with the `add_shadow_deploy` method, with the champion model listed first, then an array of challenger models after. **All** inference data is fed to **all** models, with the champion results displayed in the `out.variable` column, and the shadow results in the format `out_{model name}.variable`. For example, since we named our challenger models `logcontrolchallenger01` and `logcontrolchallenger02`, the columns `out_logcontrolchallenger01.variable` and `out_logcontrolchallenger02.variable` have the shadow deployed model results.
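Because the champion and challenger outputs land in separate columns, they can be compared row by row once retrieved. A minimal pandas sketch, using the first five rows of the shadow inference output from this tutorial and assuming each cell holds a one-element list (as displayed), computes each challenger's difference from the champion and the mean absolute difference per challenger.

```python
import pandas as pd

# First five rows of the shadow deploy output shown in this tutorial.
shadow_outputs = pd.DataFrame({
    "out.variable": [[718013.75], [615094.56], [448627.72], [758714.2], [513264.7]],
    "out_logcontrolchallenger01.variable": [[659806.0], [732883.5], [419508.84], [634028.8], [427209.44]],
    "out_logcontrolchallenger02.variable": [[704901.9], [695994.44], [416164.8], [655277.2], [426854.66]],
})

# Unwrap the single-element lists into plain floats.
values = shadow_outputs.apply(lambda col: col.map(lambda v: v[0]))

champion = values["out.variable"]
challengers = [c for c in values.columns if c.startswith("out_")]
diffs = values[challengers].sub(champion, axis=0)  # challenger minus champion

print(diffs.round(2))
print(diffs.abs().mean().round(2))  # mean absolute difference per challenger
```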

For this example, we will remove the previous pipeline step, then replace it with a shadow deploy step with `rf_model.onnx` as our champion, and models `xgb_model.onnx` and `gbr_model.onnx` as the challengers.  We'll deploy the pipeline and prepare it for sample inferences.

```python
# Upload the challenger models

model_name_challenger01 = 'logcontrolchallenger01'
model_file_name_challenger01 = './models/xgb_model.onnx'

model_name_challenger02 = 'logcontrolchallenger02'
model_file_name_challenger02 = './models/gbr_model.onnx'

housing_model_challenger01 = (wl.upload_model(model_name_challenger01, 
                                              model_file_name_challenger01, 
                                              framework=wallaroo.framework.Framework.ONNX)
                                              .configure(tensor_fields=["tensor"])
                            )
housing_model_challenger02 = (wl.upload_model(model_name_challenger02, 
                                              model_file_name_challenger02, 
                                              framework=wallaroo.framework.Framework.ONNX)
                                              .configure(tensor_fields=["tensor"])
                            )
```


```python
# Undeploy the pipeline
mainpipeline.undeploy()

mainpipeline.clear()

# Add the new shadow deploy step with our challenger models
mainpipeline.add_shadow_deploy(housing_model_control, [housing_model_challenger01, housing_model_challenger02])

# Deploy the pipeline with the new shadow step
deploy_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
    .cpus(0.25)\
    .build()

mainpipeline.deploy(deployment_config=deploy_config)
```

Waiting for undeployment - this will take up to 45s ................................... ok
Waiting for deployment - this will take up to 45s ........ ok


| Field | Value |
|---|---|
| name | logpipeline-test |
| created | 2024-02-09 16:21:09.406182+00:00 |
| last_updated | 2024-02-09 16:33:08.547068+00:00 |
| deployed | True |
| arch |  |
| tags |  |
| versions | e143a2d5-5641-4dcc-8ae4-786fd777a30a, e2b9d903-4015-4d09-902b-9150a7196cea, 9df38be1-d2f4-4be1-9022-8f0570a238b9, 3078b49f-3eff-48d1-8d9b-a8780b329ecc, 21bff9df-828f-40e7-8a22-449a2e636b44, f78a7030-bd25-4bf7-ba0d-a18cfe3790e0, 10c1ac25-d626-4413-8d5d-1bed42d0e65c, b179b693-b6b6-4ff9-b2a4-2a639d88bc9b, da7b9cf0-81e8-452b-8b70-689406dc9548, a9a9b62c-9d37-427f-99af-67725558bf9b, 1c14591a-96b4-4059-bb63-2d2bc4e308d5, add660ac-0ebf-4a24-bb6d-6cdc875866c8 |
| steps | logcontrol |
| published | False |


### Shadow Deploy Sample Inference

We'll now use our same sample data for an inference to our shadow deployed pipeline, then display the first 20 results with just the comparative outputs.

```python
shadow_date_start = datetime.datetime.now()

shadow_result = mainpipeline.infer_from_file('./data/xtest-1k.arrow')

shadow_outputs = shadow_result.to_pandas()
# .loc is end-inclusive, so 0:19 selects the first 20 rows
display(shadow_outputs.loc[0:19, ['out.variable', 'out_logcontrolchallenger01.variable', 'out_logcontrolchallenger02.variable']])

shadow_date_end = datetime.datetime.now()
```

|  | out.variable | out_logcontrolchallenger01.variable | out_logcontrolchallenger02.variable |
|---|---|---|---|
| 0 | [718013.75] | [659806.0] | [704901.9] |
| 1 | [615094.56] | [732883.5] | [695994.44] |
| 2 | [448627.72] | [419508.84] | [416164.8] |
| 3 | [758714.2] | [634028.8] | [655277.2] |
| 4 | [513264.7] | [427209.44] | [426854.66] |
| 5 | [668288.0] | [615501.9] | [632556.1] |
| 6 | [1004846.5] | [1139732.5] | [1100465.2] |
| 7 | [684577.2] | [498328.88] | [528278.06] |
| 8 | [727898.1] | [722664.4] | [659439.94] |
| 9 | [559631.1] | [525746.44] | [534331.44] |


### Shadow Deploy Logs

Pipelines with a shadow deployed step include the shadow inference result in the same format as the inference result:  inference results from shadow deployed models are displayed as `out_{model name}.{output variable}`.

```python
# display logs with shadow deployed steps

display(mainpipeline.logs(start_datetime=shadow_date_start, end_datetime=shadow_date_end).loc[:, ["time", "out.variable", "out_logcontrolchallenger01.variable", "out_logcontrolchallenger02.variable"]])
```



|  | time | out.variable | out_logcontrolchallenger01.variable | out_logcontrolchallenger02.variable |
|---|---|---|---|---|
| 0 | 2024-02-09 16:33:18.093 | [718013.75] | [659806.0] | [704901.9] |
| 1 | 2024-02-09 16:33:18.093 | [615094.56] | [732883.5] | [695994.44] |
| 2 | 2024-02-09 16:33:18.093 | [448627.72] | [419508.84] | [416164.8] |
| 3 | 2024-02-09 16:33:18.093 | [758714.2] | [634028.8] | [655277.2] |
| 4 | 2024-02-09 16:33:18.093 | [513264.7] | [427209.44] | [426854.66] |
| ... | ... | ... | ... | ... |
| 495 | 2024-02-09 16:33:18.093 | [873315.0] | [779848.6] | [771244.75] |
| 496 | 2024-02-09 16:33:18.093 | [721143.6] | [607252.1] | [610430.56] |
| 497 | 2024-02-09 16:33:18.093 | [1048372.4] | [844343.56] | [900959.4] |
| 498 | 2024-02-09 16:33:18.093 | [244566.38] | [251694.84] | [246188.81] |


For log file requests, the following metadata dataset requests for testing pipeline steps are available:

* `metadata`

These must be paired with specific columns.  `*` is **not** available when paired with `metadata`.

* `in`: All input fields.
* `out`: All output fields.
* `time`: The DateTime the inference request was made.
* `in.{input_fields}`: Any input fields (`tensor`, etc.).
* `out.{output_fields}`: Any output fields matching the specific `output_field` (`out.house_price`, `out.variable`, etc.).
* `out_`: All output fields from the shadow deployed challenger steps, in the format `out_{model name}.{output_field}` (for example, `out_logcontrolchallenger01.variable`).
* `anomaly.count`:  Any anomalies detected from validations.
* `anomaly.{validation}`: The validation that triggered the anomaly detection and whether it is `True` (indicating an anomaly was detected) or `False`.
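When several shadow deployed models are involved, the `dataset` list can be assembled rather than typed out by hand. The helpers below are a hypothetical sketch (`shadow_log_dataset` and `check_dataset` are not part of the Wallaroo SDK) that builds the column list in the formats listed above and rejects the `*` plus `metadata` combination before the request is sent.

```python
from typing import Iterable, List


def shadow_log_dataset(shadow_models: Iterable[str],
                       output_field: str = "variable",
                       include_metadata: bool = True) -> List[str]:
    """Build a `dataset` list covering the control output and each shadow output."""
    dataset = ["time", f"out.{output_field}"]
    dataset += [f"out_{name}.{output_field}" for name in shadow_models]
    if include_metadata:
        dataset.append("metadata")
    return dataset


def check_dataset(dataset: List[str]) -> None:
    """Raise if `*` is requested together with `metadata`, which is not available."""
    if "*" in dataset and "metadata" in dataset:
        raise ValueError("'*' cannot be combined with 'metadata'; list the columns explicitly")


columns = shadow_log_dataset(["logcontrolchallenger01", "logcontrolchallenger02"])
check_dataset(columns)
print(columns)
```

The resulting list could then be passed as `mainpipeline.logs(dataset=columns, start_datetime=shadow_date_start, end_datetime=shadow_date_end)`.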

The following example retrieves the logs from a pipeline with shadow deployed models and displays the shadow deployed model outputs along with the `metadata.elapsed` and `anomaly.count` fields.

```python
metadatalogs = mainpipeline.logs(dataset=["time",
                                          "out_logcontrolchallenger01.variable",
                                          "out_logcontrolchallenger02.variable",
                                          "metadata",
                                          "anomaly.count"
                                          ],
                                 start_datetime=shadow_date_start,
                                 end_datetime=shadow_date_end
                                 )

display(metadatalogs.loc[:, ["out_logcontrolchallenger01.variable",
                             "out_logcontrolchallenger02.variable",
                             "metadata.elapsed",
                             "anomaly.count"
                             ]
                        ])
```



|&nbsp;|out_logcontrolchallenger01.variable|out_logcontrolchallenger02.variable|metadata.elapsed|anomaly.count|
|---|---|---|---|---|
|0|[659806.0]|[704901.9]|[325472, 124071]|0|
|1|[732883.5]|[695994.44]|[325472, 124071]|0|
|2|[419508.84]|[416164.8]|[325472, 124071]|0|
|3|[634028.8]|[655277.2]|[325472, 124071]|0|
|4|[427209.44]|[426854.66]|[325472, 124071]|0|
|...|...|...|...|...|
|495|[779848.6]|[771244.75]|[325472, 124071]|0|
|496|[607252.1]|[610430.56]|[325472, 124071]|0|
|497|[844343.56]|[900959.4]|[325472, 124071]|0|
|498|[251694.84]|[246188.81]|[325472, 124071]|0|
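Each `metadata.elapsed` entry is a list of integers. A minimal pandas sketch for spreading that list into separate numbered columns, for example to average or plot them, follows. The column names `elapsed_0`, `elapsed_1`, ... and the helper `expand_elapsed` are hypothetical, and the sketch accepts either Python lists or JSON strings because the exact cell type in your DataFrame is an assumption here. This section does not describe what each position measures or its units, so check the Wallaroo documentation before interpreting the values.

```python
import json

import pandas as pd


def expand_elapsed(logs: pd.DataFrame, column: str = "metadata.elapsed") -> pd.DataFrame:
    """Return `logs` with each position of the `column` lists as its own column."""
    # Accept either real lists or JSON-encoded strings such as "[325472, 124071]".
    values = logs[column].map(lambda v: json.loads(v) if isinstance(v, str) else v)
    expanded = pd.DataFrame(values.tolist(), index=logs.index)
    expanded.columns = [f"elapsed_{i}" for i in expanded.columns]
    return logs.join(expanded)


# Values copied from the metadata output above: one list and one JSON string.
sample = pd.DataFrame({"metadata.elapsed": [[325472, 124071], "[325472, 124071]"]})
print(expand_elapsed(sample))
```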


The following demonstrates exporting the shadow deployed logs to the directory `shadow`.

```python
# Save shadow deployed log files as pandas DataFrame
mainpipeline.export_logs(directory="shadow", file_prefix="shadowdeploylogs")
display(os.listdir('./shadow'))
```


Note: The logs with different schemas are written to separate files in the provided directory.

['shadowdeploylogs-2.json', 'shadowdeploylogs-1.json']
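Because logs with different schemas land in separate files, it can be convenient to load them back into one DataFrame. A minimal sketch using the standard `json` module and pandas follows. It assumes each exported file holds JSON records either as a single array or as one record per line; the exact layout written by `export_logs` is not shown in this section, so inspect one of your files first. `load_exported_logs` is a hypothetical helper name.

```python
import json
from pathlib import Path

import pandas as pd


def load_exported_logs(directory: str) -> pd.DataFrame:
    """Read every *.json file in `directory` into one DataFrame.

    Columns missing from a file's schema are filled with NaN, and the
    `source_file` column records which file each row came from.
    """
    frames = []
    for path in sorted(Path(directory).glob("*.json")):
        text = path.read_text()
        try:
            records = json.loads(text)  # a single JSON array (or object)
        except json.JSONDecodeError:
            # fall back to one JSON record per line
            records = [json.loads(line) for line in text.splitlines() if line.strip()]
        if isinstance(records, dict):
            records = [records]
        frame = pd.DataFrame(records)
        frame["source_file"] = path.name
        frames.append(frame)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

For this tutorial, the call would be `load_exported_logs("shadow")`.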

## A/B Testing Pipeline

A/B testing allows inference requests to be split between a control model and one or more challenger models.  For full details, see the [Pipeline Management Guide: A/B Testing](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipeline/#ab-testing).

When the inference results and log entries are displayed, they include the column `out._model_split` which displays:

| Field | Type | Description |
|---|---|---|
| `name` | String | The model name used for the inference.  |
| `version` | String| The version of the model. |
| `sha` | String | The sha hash of the model version. |

For this example, the shadow deployed step is removed and replaced with an A/B testing step with the ratio 1:1:1, so inference requests are split randomly between the control model and the two challenger models in roughly equal proportions. A set of sample inferences is run, then the pipeline logs are displayed.

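For reference, the following snippet shows a random split step that divides requests 2:1 between a control model and a single challenger model:

```python
# wl is a wallaroo.Client(); control and challenger are previously uploaded models
pipeline = (wl.build_pipeline("randomsplitpipeline-demo")
            .add_random_split([(2, control), (1, challenger)], "session_id"))
```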
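Reading the weights as relative proportions, which is how the 1:1:1 ratio above is described, each model's expected share of requests is its weight divided by the sum of all weights. The hypothetical helper below only does that arithmetic: strings stand in for the model objects passed to `add_random_split`, and the actual routing of each request is performed by Wallaroo.

```python
from fractions import Fraction
from typing import Dict, Sequence, Tuple


def expected_shares(weighted: Sequence[Tuple[int, str]]) -> Dict[str, Fraction]:
    """Return each model's expected fraction of requests for integer weights."""
    total = sum(weight for weight, _ in weighted)
    return {name: Fraction(weight, total) for weight, name in weighted}


print(expected_shares([(2, "control"), (1, "challenger")]))
print(expected_shares([(1, "logcontrol"),
                       (1, "logcontrolchallenger01"),
                       (1, "logcontrolchallenger02")]))
```

With a 2:1 split the control is expected to receive two thirds of the requests; with 1:1:1 each model is expected to receive one third.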

```python
mainpipeline.undeploy()

# remove the shadow deploy steps
mainpipeline.clear()

# Add the a/b test step to the pipeline
mainpipeline.add_random_split([(1, housing_model_control), (1, housing_model_challenger01), (1, housing_model_challenger02)], "session_id")

deploy_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
    .cpus(0.25)\
    .build()

mainpipeline.deploy(deployment_config=deploy_config)

# Perform sample inferences of 20 rows and display the results
ab_date_start = datetime.datetime.now()
abtesting_inputs = pd.read_json('./data/xtest-1k.df.json')

for index, row in abtesting_inputs.sample(20).iterrows():
    display(mainpipeline.infer(row.to_frame('tensor').reset_index()).loc[:,["out._model_split", "out.variable"]])

ab_date_end = datetime.datetime.now()
```

Waiting for undeployment - this will take up to 45s ..................................... ok
Waiting for deployment - this will take up to 45s ......... ok


|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger01","version":"5b63884e-3f09-4e90-9f09-213350b9c445","sha":"31e92d6ccb27b041a324a7ac22cf95d9d6cc3aa7e8263a229f7c4aec4938657c"}]|[300542.5]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger01","version":"5b63884e-3f09-4e90-9f09-213350b9c445","sha":"31e92d6ccb27b041a324a7ac22cf95d9d6cc3aa7e8263a229f7c4aec4938657c"}]|[580584.3]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrol","version":"1f93edce-3f3e-4d29-be29-6a4e9303da05","sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}]|[447162.84]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrol","version":"1f93edce-3f3e-4d29-be29-6a4e9303da05","sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}]|[581002.94]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger01","version":"5b63884e-3f09-4e90-9f09-213350b9c445","sha":"31e92d6ccb27b041a324a7ac22cf95d9d6cc3aa7e8263a229f7c4aec4938657c"}]|[944906.25]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger02","version":"6fc54099-7151-48d7-9e57-6d989fb9bb1c","sha":"ed6065a79d841f7e96307bb20d5ef22840f15da0b587efb51425c7ad60589d6a"}]|[488997.9]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrol","version":"1f93edce-3f3e-4d29-be29-6a4e9303da05","sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}]|[373955.94]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger01","version":"5b63884e-3f09-4e90-9f09-213350b9c445","sha":"31e92d6ccb27b041a324a7ac22cf95d9d6cc3aa7e8263a229f7c4aec4938657c"}]|[868765.4]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger02","version":"6fc54099-7151-48d7-9e57-6d989fb9bb1c","sha":"ed6065a79d841f7e96307bb20d5ef22840f15da0b587efb51425c7ad60589d6a"}]|[499459.2]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrol","version":"1f93edce-3f3e-4d29-be29-6a4e9303da05","sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}]|[559631.06]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger01","version":"5b63884e-3f09-4e90-9f09-213350b9c445","sha":"31e92d6ccb27b041a324a7ac22cf95d9d6cc3aa7e8263a229f7c4aec4938657c"}]|[344156.25]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger02","version":"6fc54099-7151-48d7-9e57-6d989fb9bb1c","sha":"ed6065a79d841f7e96307bb20d5ef22840f15da0b587efb51425c7ad60589d6a"}]|[296829.75]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger02","version":"6fc54099-7151-48d7-9e57-6d989fb9bb1c","sha":"ed6065a79d841f7e96307bb20d5ef22840f15da0b587efb51425c7ad60589d6a"}]|[532923.94]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger02","version":"6fc54099-7151-48d7-9e57-6d989fb9bb1c","sha":"ed6065a79d841f7e96307bb20d5ef22840f15da0b587efb51425c7ad60589d6a"}]|[878232.2]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger02","version":"6fc54099-7151-48d7-9e57-6d989fb9bb1c","sha":"ed6065a79d841f7e96307bb20d5ef22840f15da0b587efb51425c7ad60589d6a"}]|[996693.6]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger02","version":"6fc54099-7151-48d7-9e57-6d989fb9bb1c","sha":"ed6065a79d841f7e96307bb20d5ef22840f15da0b587efb51425c7ad60589d6a"}]|[544343.3]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrol","version":"1f93edce-3f3e-4d29-be29-6a4e9303da05","sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}]|[379076.28]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger01","version":"5b63884e-3f09-4e90-9f09-213350b9c445","sha":"31e92d6ccb27b041a324a7ac22cf95d9d6cc3aa7e8263a229f7c4aec4938657c"}]|[585684.3]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrolchallenger01","version":"5b63884e-3f09-4e90-9f09-213350b9c445","sha":"31e92d6ccb27b041a324a7ac22cf95d9d6cc3aa7e8263a229f7c4aec4938657c"}]|[573976.44]|

|&nbsp;|out._model_split|out.variable|
|---|---|---|
|0|[{"name":"logcontrol","version":"1f93edce-3f3e-4d29-be29-6a4e9303da05","sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}]|[310164.06]|
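With only 20 samples the split is approximate: tallying the `name` entries in the 20 results above gives 6 requests for `logcontrol`, 7 for `logcontrolchallenger01`, and 7 for `logcontrolchallenger02`. A minimal standard-library sketch of that tally follows. `split_counts` is a hypothetical name, and it accepts either JSON strings (as displayed above) or already-parsed lists of dicts, since the exact cell type is an assumption.

```python
import json
from collections import Counter
from typing import Iterable, List, Union


def split_counts(values: Iterable[Union[str, List[dict]]]) -> Counter:
    """Count how many inference results each model in `out._model_split` produced."""
    counts: Counter = Counter()
    for value in values:
        entries = json.loads(value) if isinstance(value, str) else value
        for entry in entries:
            counts[entry["name"]] += 1
    return counts


# Entries trimmed from the results above (version and sha omitted);
# the third is given as an already-parsed list.
sample = [
    '[{"name":"logcontrolchallenger01"}]',
    '[{"name":"logcontrol"}]',
    [{"name": "logcontrolchallenger02"}],
    '[{"name":"logcontrol"}]',
]
print(split_counts(sample))
```

In the tutorial loop, the values would come from each result's `out._model_split` column.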


```python
## Get the logs with the a/b testing information

metadatalogs = mainpipeline.logs(dataset=["time",
                                          "out",
                                          "metadata"
                                          ]
                                 )

display(metadatalogs.loc[:, ['out.variable', 'metadata.last_model']])
```

Pipeline log schema has changed over the logs requested 2 newest records retrieved successfully, newest record seen was at <datetime>. Please request additional records separately


|&nbsp;|out.variable|metadata.last_model|
|---|---|---|
|0|[718013.7]|{"model_name":"logcontrol","model_sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}|
|1|[718013.7]|{"model_name":"logcontrol","model_sha":"e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6"}|


```python
# Save a/b testing log files as DataFrame
mainpipeline.export_logs(directory="abtesting",
                         file_prefix="abtests",
                         start_datetime=ab_date_start,
                         end_datetime=ab_date_end)
display(os.listdir('./abtesting'))
```

['abtests-1.json']

The following exports the metadata with the log files.

```python
# Save a/b testing log files, including the metadata, as DataFrame
mainpipeline.export_logs(directory="abtesting-metadata",
                         file_prefix="abtests",
                         start_datetime=ab_date_start,
                         end_datetime=ab_date_end,
                         dataset=["time", "out", "metadata"])
display(os.listdir('./abtesting-metadata'))
```

['abtests-1.json']

## Anomaly Detection Logs

Wallaroo provides **validations** to detect anomalous data from inference inputs and outputs.  Validations are added to a Wallaroo pipeline with the `wallaroo.pipeline.add_validations` method.

Adding validations takes the format:

```python
pipeline.add_validations(
    validation_name_01 = polars.col(in|out.{column_name}) EXPRESSION,
    validation_name_02 = polars.col(in|out.{column_name}) EXPRESSION
    ...{additional rules}
)
```

* `validation_name`: The user provided name of the validation.  The names must match Python variable naming requirements.
  * **IMPORTANT NOTE**: Using the name `count` as a validation name **returns an error**.  Any validation rules named `count` are dropped upon request and an error returned.
* `polars.col(in|out.{column_name})`: Specifies the **input** or **output** for a specific field (also called a "column") in an inference result. Wallaroo inference results name these fields `in.{field_name}` for **inputs** and `out.{field_name}` for **outputs**.
  * More than one field can be selected, as long as they follow the rules of the [polars 0.18 Expressions library](https://docs.pola.rs/docs/python/version/0.18/reference/expressions/index.html).
* `EXPRESSION`: The expression to validate. When the expression returns **True**, an anomaly is detected.

The [`polars` library version 0.18.5](https://docs.pola.rs/docs/python/version/0.18/index.html) is used to create the validation rule.  This is installed by default with the Wallaroo SDK.  Polars expressions give organizations a wide range of comparisons for tracking anomalous data from their ML models.

When validations are added to a pipeline, inference request outputs return the following fields:

| Field | Type | Description |
|---|---|---|
| **anomaly.count** | **Integer** | The total of all validations that returned **True**. |
| **anomaly.{validation name}** | **Bool** | The output of the validation `{validation_name}`. |

When validation returns `True`, **an anomaly is detected**.

For example, adding the validation `fraud` to the following pipeline returns `anomaly.count` of `1` when the validation `fraud` returns `True`.  The validation `fraud` returns `True` when the **output** field **dense_1** at index **0** is greater than 0.9.

```python
import polars as pl
import wallaroo

wl = wallaroo.Client()

# `model` is assumed to be a model already uploaded to this Wallaroo instance
sample_pipeline = wl.build_pipeline("sample-pipeline")
sample_pipeline.add_model_step(model)

# add the validation
sample_pipeline.add_validations(
    fraud=pl.col("out.dense_1").list.get(0) > 0.9,
    )

# deploy the pipeline
sample_pipeline.deploy()

# sample inference
display(sample_pipeline.infer_from_file("dev_high_fraud.json", data_format='pandas-records'))
```

|&nbsp;|time|in.tensor|out.dense_1|anomaly.count|anomaly.fraud|
|---|---|---|---|---|---|
|0|2024-02-02 16:05:42.152|[1.0678324729, 18.1555563975, -1.6589551058, 5...]|[0.981199]|1|True|
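The relationship between the individual `anomaly.{validation name}` fields and `anomaly.count` can be modeled in plain Python. This is only an illustration of the documented output fields, not how Wallaroo evaluates validations (Wallaroo applies the polars expressions itself). `evaluate_validations` is a hypothetical helper, and each validation is written as an ordinary Python predicate instead of a polars expression.

```python
from typing import Any, Callable, Dict, Mapping

# A validation returns True when the row is anomalous.
Validation = Callable[[Mapping[str, Any]], bool]


def evaluate_validations(row: Mapping[str, Any],
                         validations: Dict[str, Validation]) -> Dict[str, Any]:
    """Return anomaly.{name} flags for one inference row plus their total in anomaly.count."""
    if "count" in validations:
        # mirrors the rule above: `count` cannot be used as a validation name
        raise ValueError("'count' is reserved for anomaly.count")
    flags = {f"anomaly.{name}": bool(check(row)) for name, check in validations.items()}
    return {"anomaly.count": sum(flags.values()), **flags}


# Plain-Python stand-in for pl.col("out.dense_1").list.get(0) > 0.9
validations = {"fraud": lambda row: row["out.dense_1"][0] > 0.9}
print(evaluate_validations({"out.dense_1": [0.981199]}, validations))
```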



### Anomaly Detection Inference Requests Example

For this example, we create the validation rule `too_high`, which detects houses with a predicted value greater than 1,000,000, and show the output for houses that trigger that validation.

For these examples, we clear the existing pipeline steps and rebuild the pipeline with only the control model and the new validation, so the logs for these samples are "clean".

```python
import polars as pl

mainpipeline.undeploy()
mainpipeline.clear()
mainpipeline.add_model_step(housing_model_control)
mainpipeline.add_validations(
    too_high=pl.col("out.variable").list.get(0) > 1000000.0
)

deploy_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
    .cpus(0.25)\
    .build()

mainpipeline.deploy(deployment_config=deploy_config)
```

Waiting for undeployment - this will take up to 45s ...................................... ok
Waiting for deployment - this will take up to 45s ......... ok


|name|logpipeline-test|
|---|---|
|created|2024-02-09 16:21:09.406182+00:00|
|last_updated|2024-02-09 16:53:37.061953+00:00|
|deployed|True|
|arch||
|tags||
|versions|764c7706-c996-42e9-90ff-87b1b496f98d, 05c46dbc-9d72-40d5-bc4c-7fee7bc3e971, 9a4d76f5-9905-4063-8bf8-47e103987515, d5e4882a-3c17-4965-b059-66432a50a3cd, 00b3d5e7-4644-4138-b73d-b0511b3c9e2a, e143a2d5-5641-4dcc-8ae4-786fd777a30a, e2b9d903-4015-4d09-902b-9150a7196cea, 9df38be1-d2f4-4be1-9022-8f0570a238b9, 3078b49f-3eff-48d1-8d9b-a8780b329ecc, 21bff9df-828f-40e7-8a22-449a2e636b44, f78a7030-bd25-4bf7-ba0d-a18cfe3790e0, 10c1ac25-d626-4413-8d5d-1bed42d0e65c, b179b693-b6b6-4ff9-b2a4-2a639d88bc9b, da7b9cf0-81e8-452b-8b70-689406dc9548, a9a9b62c-9d37-427f-99af-67725558bf9b, 1c14591a-96b4-4059-bb63-2d2bc4e308d5, add660ac-0ebf-4a24-bb6d-6cdc875866c8|
|steps|logcontrol|
|published|False|


```python
import datetime
import time
import pytz


inference_start = datetime.datetime.now(pytz.utc)

# adding sleep to ensure log distinction
time.sleep(15)

results = mainpipeline.infer_from_file('./data/test-1000.df.json')

inference_end = datetime.datetime.now(pytz.utc)

# first 20 results
display(results.head(20))

# only results that trigger the anomaly too_high
results.loc[results['anomaly.too_high'] == True]
```

|&nbsp;|time|in.tensor|out.variable|anomaly.count|anomaly.too_high|
|---|---|---|---|---|---|
|0|2024-02-09 16:54:02.507|[4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0]|[718013.75]|0|False|
|1|2024-02-09 16:54:02.507|[2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0, 8.0, 2170.0, 0.0, 47.7109, -122.017, 2310.0, 7419.0, 6.0, 0.0, 0.0]|[615094.56]|0|False|
|2|2024-02-09 16:54:02.507|[3.0, 2.5, 1300.0, 812.0, 2.0, 0.0, 0.0, 3.0, 8.0, 880.0, 420.0, 47.5893, -122.317, 1300.0, 824.0, 6.0, 0.0, 0.0]|[448627.72]|0|False|
|3|2024-02-09 16:54:02.507|[4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0, 9.0, 2500.0, 0.0, 47.5759, -121.994, 2560.0, 8475.0, 24.0, 0.0, 0.0]|[758714.2]|0|False|
|4|2024-02-09 16:54:02.507|[3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.0, 7.0, 2200.0, 0.0, 47.7659, -122.341, 1690.0, 8038.0, 62.0, 0.0, 0.0]|[513264.7]|0|False|
|5|2024-02-09 16:54:02.507|[3.0, 2.0, 2140.0, 4923.0, 1.0, 0.0, 0.0, 4.0, 8.0, 1070.0, 1070.0, 47.6902, -122.339, 1470.0, 4923.0, 86.0, 0.0, 0.0]|[668288.0]|0|False|
|6|2024-02-09 16:54:02.507|[4.0, 3.5, 3590.0, 5334.0, 2.0, 0.0, 2.0, 3.0, 9.0, 3140.0, 450.0, 47.6763, -122.267, 2100.0, 6250.0, 9.0, 0.0, 0.0]|[1004846.5]|1|True|
|7|2024-02-09 16:54:02.507|[3.0, 2.0, 1280.0, 960.0, 2.0, 0.0, 0.0, 3.0, 9.0, 1040.0, 240.0, 47.602, -122.311, 1280.0, 1173.0, 0.0, 0.0, 0.0]|[684577.2]|0|False|
|8|2024-02-09 16:54:02.507|[4.0, 2.5, 2820.0, 15000.0, 2.0, 0.0, 0.0, 4.0, 9.0, 2820.0, 0.0, 47.7255, -122.101, 2440.0, 15000.0, 29.0, 0.0, 0.0]|[727898.1]|0|False|
|9|2024-02-09 16:54:02.507|[3.0, 2.25, 1790.0, 11393.0, 1.0, 0.0, 0.0, 3.0, 8.0, 1790.0, 0.0, 47.6297, -122.099, 2290.0, 11894.0, 36.0, 0.0, 0.0]|[559631.1]|0|False|


|&nbsp;|time|in.tensor|out.variable|anomaly.count|anomaly.too_high|
|---|---|---|---|---|---|
|6|2024-02-09 16:54:02.507|[4.0, 3.5, 3590.0, 5334.0, 2.0, 0.0, 2.0, 3.0, 9.0, 3140.0, 450.0, 47.6763, -122.267, 2100.0, 6250.0, 9.0, 0.0, 0.0]|[1004846.5]|1|True|
|30|2024-02-09 16:54:02.507|[4.0, 3.0, 3710.0, 20000.0, 2.0, 0.0, 2.0, 5.0, 10.0, 2760.0, 950.0, 47.6696, -122.261, 3970.0, 20000.0, 79.0, 0.0, 0.0]|[1514079.8]|1|True|
|40|2024-02-09 16:54:02.507|[4.0, 4.5, 5120.0, 41327.0, 2.0, 0.0, 0.0, 3.0, 10.0, 3290.0, 1830.0, 47.7009, -122.059, 3360.0, 82764.0, 6.0, 0.0, 0.0]|[1204324.8]|1|True|
|63|2024-02-09 16:54:02.507|[4.0, 3.0, 4040.0, 19700.0, 2.0, 0.0, 0.0, 3.0, 11.0, 4040.0, 0.0, 47.7205, -122.127, 3930.0, 21887.0, 27.0, 0.0, 0.0]|[1028923.06]|1|True|
|110|2024-02-09 16:54:02.507|[4.0, 2.5, 3470.0, 20445.0, 2.0, 0.0, 0.0, 4.0, 10.0, 3470.0, 0.0, 47.547, -122.219, 3360.0, 21950.0, 51.0, 0.0, 0.0]|[1412215.3]|1|True|
|130|2024-02-09 16:54:02.507|[4.0, 2.75, 2620.0, 13777.0, 1.5, 0.0, 2.0, 4.0, 9.0, 1720.0, 900.0, 47.58, -122.285, 3530.0, 9287.0, 88.0, 0.0, 0.0]|[1223839.1]|1|True|
|133|2024-02-09 16:54:02.507|[5.0, 2.25, 3320.0, 13138.0, 1.0, 0.0, 2.0, 4.0, 9.0, 1900.0, 1420.0, 47.759, -122.269, 2820.0, 13138.0, 51.0, 0.0, 0.0]|[1108000.1]|1|True|
|154|2024-02-09 16:54:02.507|[4.0, 2.75, 3800.0, 9606.0, 2.0, 0.0, 0.0, 3.0, 9.0, 3800.0, 0.0, 47.7368, -122.208, 3400.0, 9677.0, 6.0, 0.0, 0.0]|[1039781.25]|1|True|
|160|2024-02-09 16:54:02.507|[5.0, 3.5, 4150.0, 13232.0, 2.0, 0.0, 0.0, 3.0, 11.0, 4150.0, 0.0, 47.3417, -122.182, 3840.0, 15121.0, 9.0, 0.0, 0.0]|[1042119.1]|1|True|
|210|2024-02-09 16:54:02.507|[4.0, 3.5, 4300.0, 70407.0, 2.0, 0.0, 0.0, 3.0, 10.0, 2710.0, 1590.0, 47.4472, -122.092, 3520.0, 26727.0, 22.0, 0.0, 0.0]|[1115275.0]|1|True|


### Anomaly Detection Logs

Pipeline logs retrieved with `wallaroo.pipeline.logs` include the anomaly fields `anomaly.count` and `anomaly.{validation name}`.
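To see how often each validation fired across a batch of logs, the boolean `anomaly.*` columns can be summed. A minimal pandas sketch follows, assuming the logs DataFrame uses the column names shown in this tutorial; `anomaly_summary` is a hypothetical helper, and the sample rows are copied from the anomaly log output in this example.

```python
import pandas as pd


def anomaly_summary(logs: pd.DataFrame) -> pd.Series:
    """Count the rows flagged True by each anomaly.{validation} column."""
    flag_columns = [column for column in logs.columns
                    if column.startswith("anomaly.") and column != "anomaly.count"]
    return logs[flag_columns].astype(bool).sum()


# Three rows copied from the anomaly log output in this example (in.tensor omitted).
sample_logs = pd.DataFrame({
    "out.variable": [[581003.0], [706823.56], [1060847.5]],
    "anomaly.count": [0, 0, 1],
    "anomaly.too_high": [False, False, True],
})
print(anomaly_summary(sample_logs))
```

Applied to the `logs` DataFrame retrieved in this example, the same call would report how many of the retrieved records tripped `too_high`.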

```python
logs = mainpipeline.logs(limit=1000)
display(logs)
display(logs.loc[logs['anomaly.too_high'] == True])
```



|&nbsp;|time|in.tensor|out.variable|anomaly.count|anomaly.too_high|
|---|---|---|---|---|---|
|0|2024-02-09 16:49:26.521|[3.0, 2.0, 2005.0, 7000.0, 1.0, 0.0, 0.0, 3.0, 7.0, 1605.0, 400.0, 47.6039, -122.298, 1750.0, 4500.0, 34.0, 0.0, 0.0]|[581003.0]|0|False|
|1|2024-02-09 16:49:26.521|[3.0, 1.75, 2910.0, 37461.0, 1.0, 0.0, 0.0, 4.0, 7.0, 1530.0, 1380.0, 47.7015, -122.164, 2520.0, 18295.0, 47.0, 0.0, 0.0]|[706823.56]|0|False|
|2|2024-02-09 16:49:26.521|[4.0, 3.25, 2910.0, 1880.0, 2.0, 0.0, 3.0, 5.0, 9.0, 1830.0, 1080.0, 47.616, -122.282, 3100.0, 8200.0, 100.0, 0.0, 0.0]|[1060847.5]|1|True|
|3|2024-02-09 16:49:26.521|[4.0, 1.75, 2700.0, 7875.0, 1.5, 0.0, 0.0, 4.0, 8.0, 2700.0, 0.0, 47.454, -122.144, 2220.0, 7875.0, 46.0, 0.0, 0.0]|[441960.38]|0|False|
|4|2024-02-09 16:49:26.521|[3.0, 2.5, 2900.0, 23550.0, 1.0, 0.0, 0.0, 3.0, 10.0, 1490.0, 1410.0, 47.5708, -122.153, 2900.0, 19604.0, 27.0, 0.0, 0.0]|[827411.0]|0|False|
|...|...|...|...|...|...|
|995|2024-02-09 16:49:26.521|[3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.0, 7.0, 2200.0, 0.0, 47.7659, -122.341, 1690.0, 8038.0, 62.0, 0.0, 0.0]|[513264.7]|0|False|
|996|2024-02-09 16:49:26.521|[4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0, 9.0, 2500.0, 0.0, 47.5759, -121.994, 2560.0, 8475.0, 24.0, 0.0, 0.0]|[758714.2]|0|False|
|997|2024-02-09 16:49:26.521|[3.0, 2.5, 1300.0, 812.0, 2.0, 0.0, 0.0, 3.0, 8.0, 880.0, 420.0, 47.5893, -122.317, 1300.0, 824.0, 6.0, 0.0, 0.0]|[448627.72]|0|False|
|998|2024-02-09 16:49:26.521|[2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0, 8.0, 2170.0, 0.0, 47.7109, -122.017, 2310.0, 7419.0, 6.0, 0.0, 0.0]|[615094.56]|0|False|


|&nbsp;|time|in.tensor|out.variable|anomaly.count|anomaly.too_high|
|---|---|---|---|---|---|
|2|2024-02-09 16:49:26.521|[4.0, 3.25, 2910.0, 1880.0, 2.0, 0.0, 3.0, 5.0, 9.0, 1830.0, 1080.0, 47.616, -122.282, 3100.0, 8200.0, 100.0, 0.0, 0.0]|[1060847.5]|1|True|
|26|2024-02-09 16:49:26.521|[5.0, 2.0, 3540.0, 9970.0, 2.0, 0.0, 3.0, 3.0, 9.0, 3540.0, 0.0, 47.7108, -122.277, 2280.0, 7195.0, 44.0, 0.0, 0.0]|[1085835.8]|1|True|
|34|2024-02-09 16:49:26.521|[6.0, 4.0, 5310.0, 12741.0, 2.0, 0.0, 2.0, 3.0, 10.0, 3600.0, 1710.0, 47.5696, -122.213, 4190.0, 12632.0, 48.0, 0.0, 0.0]|[2016006.0]|1|True|
|58|2024-02-09 16:49:26.521|[4.0, 3.75, 3770.0, 4000.0, 2.5, 0.0, 0.0, 5.0, 9.0, 2890.0, 880.0, 47.6157, -122.287, 2800.0, 5000.0, 98.0, 0.0, 0.0]|[1182821.0]|1|True|
|80|2024-02-09 16:49:26.521|[4.0, 3.25, 5180.0, 19850.0, 2.0, 0.0, 3.0, 3.0, 12.0, 3540.0, 1640.0, 47.562, -122.162, 3160.0, 9750.0, 9.0, 0.0, 0.0]|[1295531.2]|1|True|
|87|2024-02-09 16:49:26.521|[3.0, 2.25, 2960.0, 8330.0, 1.0, 0.0, 3.0, 4.0, 10.0, 2260.0, 700.0, 47.7035, -122.385, 2960.0, 8840.0, 62.0, 0.0, 0.0]|[1178314.0]|1|True|
|98|2024-02-09 16:49:26.521|[4.0, 2.25, 4470.0, 60373.0, 2.0, 0.0, 0.0, 3.0, 11.0, 4470.0, 0.0, 47.7289, -122.127, 3210.0, 40450.0, 26.0, 0.0, 0.0]|[1208638.0]|1|True|
|171|2024-02-09 16:49:26.521|[5.0, 3.5, 3760.0, 10207.0, 2.0, 0.0, 0.0, 3.0, 10.0, 3150.0, 610.0, 47.5605, -122.225, 3550.0, 12118.0, 46.0, 0.0, 0.0]|[1489624.5]|1|True|
|172|2024-02-09 16:49:26.521|[4.0, 2.5, 3340.0, 10422.0, 2.0, 0.0, 0.0, 3.0, 10.0, 3340.0, 0.0, 47.6515, -122.197, 1770.0, 9490.0, 18.0, 0.0, 0.0]|[1103101.4]|1|True|
|181|2024-02-09 16:49:26.521|[4.0, 4.0, 4620.0, 130208.0, 2.0, 0.0, 0.0, 3.0, 10.0, 4620.0, 0.0, 47.5885, -121.939, 4620.0, 131007.0, 1.0, 0.0, 0.0]|[1164589.4]|1|True|


### Undeploy Main Pipeline

With the examples and tutorial complete, we will undeploy the main pipeline and return its resources to the Wallaroo instance.

```python
mainpipeline.undeploy()
```

Waiting for undeployment - this will take up to 45s ..................................... ok


|name|logpipeline-test|
|---|---|
|created|2024-02-09 16:21:09.406182+00:00|
|last_updated|2024-02-09 16:53:37.061953+00:00|
|deployed|False|
|arch||
|tags||
|versions|764c7706-c996-42e9-90ff-87b1b496f98d, 05c46dbc-9d72-40d5-bc4c-7fee7bc3e971, 9a4d76f5-9905-4063-8bf8-47e103987515, d5e4882a-3c17-4965-b059-66432a50a3cd, 00b3d5e7-4644-4138-b73d-b0511b3c9e2a, e143a2d5-5641-4dcc-8ae4-786fd777a30a, e2b9d903-4015-4d09-902b-9150a7196cea, 9df38be1-d2f4-4be1-9022-8f0570a238b9, 3078b49f-3eff-48d1-8d9b-a8780b329ecc, 21bff9df-828f-40e7-8a22-449a2e636b44, f78a7030-bd25-4bf7-ba0d-a18cfe3790e0, 10c1ac25-d626-4413-8d5d-1bed42d0e65c, b179b693-b6b6-4ff9-b2a4-2a639d88bc9b, da7b9cf0-81e8-452b-8b70-689406dc9548, a9a9b62c-9d37-427f-99af-67725558bf9b, 1c14591a-96b4-4059-bb63-2d2bc4e308d5, add660ac-0ebf-4a24-bb6d-6cdc875866c8|
|steps|logcontrol|
|published|False|
