## Statsmodel Forecast A/B Testing

A/B testing is one method of comparing models against each other.  This demonstration shows how to use the Wallaroo pipeline steps `add_random_split` and `replace_with_random_split` to randomly route inference input data to control and challenger models.

## Prerequisites

* A Wallaroo instance version 2023.2.1 or greater.

## References

* [Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Python Models](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-upload-python/)
* [Wallaroo SDK Essentials Guide: Pipeline Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline/)
* [Wallaroo SDK Essentials: Inference Guide: Parallel Inferences](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-inferences/#parallel-inferences)

## A/B Testing

A/B testing is a method that provides the ability to test out ML models for performance, accuracy or other useful benchmarks.  A/B testing is contrasted with the Wallaroo Shadow Deployment feature.  In both cases, two sets of models are added to a pipeline step:

* Control or Champion model:  The model currently used for inferences.
* Challenger model(s): One or more models that are to be compared to the champion model.

The two features differ in this way:

| Feature | Description |
|---|---|
| A/B Testing | Each inference is submitted to either the champion ML model or a challenger ML model, so each model receives a subset of the inferences. |
| Shadow Deploy | All inferences are submitted to the champion model and one or more challenger models. |

Wallaroo implements A/B testing via a pipeline step as either `wallaroo.pipeline.add_random_split` to set a new pipeline step or `wallaroo.pipeline.replace_with_random_split` to replace an existing step.
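The difference between the two routing strategies in the table above can be sketched in plain Python. This is a conceptual illustration only: the `control` and `challenger` functions, `ab_route`, and `shadow_route` are hypothetical stand-ins and are not part of the Wallaroo SDK.

```python
import random

# Hypothetical stand-in models: each returns (model label, prediction).
def control(x):
    return ("control", x * 1.0)

def challenger(x):
    return ("challenger", x * 1.1)

def ab_route(x, weighted_models, rng):
    """A/B testing: exactly one model, picked by weight, handles the request."""
    weights = [w for w, _ in weighted_models]
    models = [m for _, m in weighted_models]
    chosen = rng.choices(models, weights=weights, k=1)[0]
    return [chosen(x)]

def shadow_route(x, champion, challengers):
    """Shadow deploy: the champion and every challenger handle the request."""
    return [champion(x)] + [c(x) for c in challengers]

rng = random.Random(0)  # seeded only to make the sketch repeatable
ab_results = ab_route(100, [(2, control), (1, challenger)], rng)
shadow_results = shadow_route(100, control, [challenger])

print(len(ab_results))      # one result per request
print(len(shadow_results))  # one result per model
```

In the A/B case only the chosen model's output exists for a given request, so models are compared across many requests rather than on the same request.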

## Tutorial Steps

### Import Libraries

The first step is to import the libraries that we will need.

In [23]:
import json
import os
import datetime

import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework

# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
import numpy as np

from resources import simdb
from resources import util

pd.set_option('display.max_colwidth', None)

In [24]:
display(wallaroo.__version__)

'2023.2.1'

### Connect to the Wallaroo Instance

The next step is to connect to Wallaroo through the Wallaroo client.  The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the `wallaroo.Client()` command, which provides a URL to grant the SDK permission to your specific Wallaroo environment.  When displayed, enter the URL into a browser and confirm permissions.  Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use `wl = wallaroo.Client()`.  For more information on Wallaroo Client settings, see the [Client Connection guide](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-client/).

In [25]:
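# Login through the local Wallaroo instance (internal JupyterHub service)

wl = wallaroo.Client()

# Alternatively, connect to a Wallaroo instance through SSO.
# This replaces the connection above; set the prefix and suffix for your environment.

wallarooPrefix = "doc-test."
wallarooSuffix = "wallaroocommunity.ninja"

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}api.{wallarooSuffix}",
                    auth_endpoint=f"https://{wallarooPrefix}keycloak.{wallarooSuffix}",
                    auth_type="sso")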

### Set Configurations

The following will set the workspace, model name, and pipeline that will be used for this example.  If the workspace or pipeline already exist, they will be assigned for use in this example.  If they do not exist, they will be created based on the names listed below.

Workspace names must be unique.  To allow this tutorial to run in the same Wallaroo instance for multiple users, set the `suffix` variable or share the workspace with other users.

#### Set Configurations References

* [Wallaroo SDK Essentials Guide: Workspace Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-workspace/)
* [Wallaroo SDK Essentials Guide: Pipeline Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline/)

In [26]:
# used for unique workspace names

suffix='john'

workspace_name = f'forecast-model-workshop{suffix}'

pipeline_name = 'forecast-workshop-pipeline'
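Where several users share one Wallaroo instance, one way to keep workspace names unique is to generate the suffix instead of typing it. The following is a minimal sketch using only the Python standard library; the helper name `make_suffix` is hypothetical and not part of the Wallaroo SDK.

```python
import random
import string

def make_suffix(length=4, rng=None):
    """Return a short string of random lowercase letters."""
    rng = rng or random.Random()
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

suffix = make_suffix()
workspace_name = f"forecast-model-workshop{suffix}"
print(workspace_name)
```

Because this notebook retrieves models that were uploaded in an earlier notebook, the suffix has to match the one used there. Generate it once and reuse the same value across notebooks rather than creating a new one on every run.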

### Set the Workspace, Pipeline and Models

The workspace will be either used or created if it does not exist, along with the pipeline.

The models were uploaded in the Upload and Deploy notebook.

In [27]:
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

# Get the most recent version of a model in the workspace
# Assumes that the most recent version is the first in the list of versions.
# wl.get_current_workspace().models() returns a list of models in the current workspace

def get_model(mname):
    modellist = wl.get_current_workspace().models()
    model = [m.versions()[0] for m in modellist if m.name() == mname]
    if len(model) <= 0:
        raise KeyError(f"model {mname} not found in this workspace")
    return model[0]

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)

# retrieve three previously uploaded models:  the control and two challengers

control_model_name = 'forecast-control-model'
challenger01_model_name = 'forecast-challenger01-model'
challenger02_model_name = 'forecast-challenger02-model'

# get the most recent version of each model from the workspace

bike_day_model = get_model(control_model_name)

challenger_model_01 = get_model(challenger01_model_name)

challenger_model_02 = get_model(challenger02_model_name)


### Deploy the Pipeline

We will now add the control model as a step for the pipeline, then deploy it.  The pipeline configuration will allow for multiple replicas of the pipeline to be deployed and spun up in the cluster.  Each pipeline replica will use 0.25 cpu and 512 Mi RAM.

In [28]:
# Set the deployment to allow for additional engines to run
# Undeploy and clear the pipeline in case it was used in other demonstrations
pipeline.undeploy()
pipeline.clear()
deploy_config = (wallaroo.DeploymentConfigBuilder()
                        .replica_count(1)
                        .replica_autoscale_min_max(minimum=2, maximum=5)
                        .cpus(0.25)
                        .memory("512Mi")
                        .build()
                    )

pipeline.add_model_step(bike_day_model)
pipeline.deploy(deployment_config = deploy_config)

| Field | Value |
|---|---|
| name | forecast-workshop-pipeline |
| created | 2023-08-02 15:50:59.480547+00:00 |
| last_updated | 2023-08-02 15:53:36.992849+00:00 |
| deployed | True |
| tags | |
| versions | 3abf03dd-8eab-4a8d-8432-aa85a30c0eda, 5ec5e8dc-7492-498b-9652-b3733e4c87f7, 1d89287b-4eff-47ec-a7bb-8cedaac1f33f |
| steps | forecast-control-model |
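As a rough sizing aid, the deployment configuration above (0.25 cpu and `512Mi` per replica, autoscaling between 2 and 5 replicas) implies a range of total resources. The sketch below is plain arithmetic and assumes that usage scales linearly with the replica count and that nothing else in the deployment consumes resources; both are simplifying assumptions, not statements about how Wallaroo schedules workloads.

```python
# Values copied from the deployment configuration in this tutorial.
CPUS_PER_REPLICA = 0.25
MEMORY_MI_PER_REPLICA = 512
MIN_REPLICAS, MAX_REPLICAS = 2, 5

def footprint(replicas):
    """Return (total cpus, total memory in Mi) for a given replica count."""
    return replicas * CPUS_PER_REPLICA, replicas * MEMORY_MI_PER_REPLICA

min_cpu, min_mem = footprint(MIN_REPLICAS)
max_cpu, max_mem = footprint(MAX_REPLICAS)

print(f"minimum: {min_cpu} cpu, {min_mem} Mi")
print(f"maximum: {max_cpu} cpu, {max_mem} Mi")
```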


### Run Inference

Run a test inference to verify the pipeline is operational, using the sample test data stored in `./data/testdata_standard.df.json`.

In [31]:
inferencedata = pd.read_json("./data/testdata_standard.df.json")
display(inferencedata)

results = pipeline.infer(inferencedata)

display(results)

| | count |
|---|---|
| 0 | [1526, 1550, 1708, 1005, 1623, 1712, 1530, 1605, 1538, 1746, 1472, 1589, 1913, 1815, 2115, 2475, 2927, 1635, 1812, 1107, 1450, 1917, 1807, 1461, 1969, 2402, 1446, 1851] |



| | time | in.count | out._model_split | out.forecast | out.weekly_average | check_failures |
|---|---|---|---|---|---|---|
| 0 | 2023-08-02 16:26:33.022 | [1526, 1550, 1708, 1005, 1623, 1712, 1530, 1605, 1538, 1746, 1472, 1589, 1913, 1815, 2115, 2475, 2927, 1635, 1812, 1107, 1450, 1917, 1807, 1461, 1969, 2402, 1446, 1851] | [{"name":"forecast-challenger01-model","version":"5bb81da9-f6ce-4d09-8435-9b27b0846c00","sha":"77d1045ee551cb101435be265344b4483d53892e67cc0a20a70cf7e2ccfdd4a0"}] | [1703, 1757, 1737, 1744, 1742, 1743, 1742] | 1738.285714 | 0 |
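The `out.weekly_average` value in the result above appears to be the arithmetic mean of the seven values in `out.forecast`. That relationship is inferred from the displayed output and is not stated in the tutorial, so the following is only a quick consistency check.

```python
# out.forecast values copied from the inference result above
forecast = [1703, 1757, 1737, 1744, 1742, 1743, 1742]

# Arithmetic mean of the seven forecast values
weekly_average = sum(forecast) / len(forecast)

print(round(weekly_average, 6))
```

The rounded result can be compared with the `out.weekly_average` column to confirm that the two output fields agree.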


### Create A/B Step

Here we will configure a pipeline with three models: the control model with a weight of 2 and two challenger models with a weight of 1 each.  On average the control model receives 2/4, or half, of the inference requests and each challenger receives a quarter.  Because this is a random split, the observed counts over a small number of inferences may deviate from a strict 2:1:1 ratio, but the more inferences are run, the closer the observed split is likely to be to that ratio.

The control model was already added as the first pipeline step (index 0), so we will replace it with the random split step, then deploy the pipeline to apply the configuration.
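How split weights translate into expected shares, and how the observed shares settle as more inferences are run, can be illustrated with a short simulation. This sketch uses Python's `random.choices` as a stand-in for the split; it does not call Wallaroo and makes no claim about how Wallaroo draws its random numbers. The weights are the ones passed to `replace_with_random_split` in this tutorial.

```python
import random
from collections import Counter

# Weights used for the random split step in this tutorial.
weights = {
    "forecast-control-model": 2,
    "forecast-challenger01-model": 1,
    "forecast-challenger02-model": 1,
}

# Expected share of traffic is weight / sum of all weights.
total = sum(weights.values())
expected_share = {name: w / total for name, w in weights.items()}

def simulate(n, seed=42):
    """Route n simulated requests and return the observed share per model."""
    rng = random.Random(seed)
    picks = rng.choices(list(weights), weights=list(weights.values()), k=n)
    counts = Counter(picks)
    return {name: counts[name] / n for name in weights}

print("expected:", expected_share)
for n in (10, 1000, 100000):
    print(n, simulate(n))
```

With only ten requests the observed shares can be far from the expected ones, which is why a handful of test inferences is not enough to judge whether the split is configured correctly.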

In [32]:
pipeline.replace_with_random_split(0, 
                                   [(2, bike_day_model), 
                                    (1, challenger_model_01), 
                                    (1, challenger_model_02)], 
                                    "session_id"
                                    )
pipeline.deploy()

| Field | Value |
|---|---|
| name | forecast-workshop-pipeline |
| created | 2023-08-02 15:50:59.480547+00:00 |
| last_updated | 2023-08-02 16:26:42.554024+00:00 |
| deployed | True |
| tags | |
| versions | ab4e58bf-3b75-4bf6-b6b3-f703fe61e7af, 3773f5c5-e4c5-4e46-a839-6945af15ca13, 3abf03dd-8eab-4a8d-8432-aa85a30c0eda, 5ec5e8dc-7492-498b-9652-b3733e4c87f7, 1d89287b-4eff-47ec-a7bb-8cedaac1f33f |
| steps | forecast-control-model |
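The random split call above passes `"session_id"` as its last argument. Assuming this names an input field used as a hash key, so that requests carrying the same value are routed to the same model, the general technique can be sketched as follows. This is a generic illustration of hash-based weighted routing; the function `pick_by_key` is hypothetical and is not a description of Wallaroo's internal implementation.

```python
import hashlib

# Weights and model names from the random split step in this tutorial.
WEIGHTED = [
    (2, "forecast-control-model"),
    (1, "forecast-challenger01-model"),
    (1, "forecast-challenger02-model"),
]

def pick_by_key(key_value, weighted=WEIGHTED):
    """Map a key to a model deterministically, in proportion to the weights."""
    digest = hashlib.sha256(str(key_value).encode("utf-8")).digest()
    total = sum(w for w, _ in weighted)
    # Turn the first 8 bytes of the digest into a point in [0, total].
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0
    for weight, name in weighted:
        cumulative += weight
        if point < cumulative:
            return name
    return weighted[-1][1]  # guard against floating point edge cases

# The same key always maps to the same model.
print(pick_by_key("session-a") == pick_by_key("session-a"))
```

A keyed split of this kind trades per-request randomness for consistency: a given session always sees the same model, while the overall traffic across many sessions still follows the weights approximately.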


### Inference via Random Split

A sample inference will be run.  Because this model accepts a DataFrame as input and returns a DataFrame as output, we can see which model was used from the `out._model_split` field, which is automatically included when a Random Split pipeline step is used.

For the later log steps, we will run the inference 10 times to view the results, and how they change with the same input depending on which model the inference request is submitted to.

In [54]:
for i in range(10):
    results = pipeline.infer(inferencedata)
    results['model'] = results['out._model_split'].apply(lambda x: json.loads(x[0])['name'])
    display(results.loc[:, ['time', 'model', 'out.forecast', 'out.weekly_average']])
    # display(results)

| | time | model | out.forecast | out.weekly_average |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:46.329 | forecast-challenger02-model | [1814, 1814, 1814, 1814, 1814, 1814, 1814] | 1814.0 |

| | time | model | out.forecast | out.weekly_average |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:46.898 | forecast-challenger02-model | [1814, 1814, 1814, 1814, 1814, 1814, 1814] | 1814.0 |

| | time | model | out.forecast | out.weekly_average |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:47.349 | forecast-control-model | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 |

| | time | model | out.forecast | out.weekly_average |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:47.851 | forecast-control-model | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 |

| | time | model | out.forecast | out.weekly_average |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:48.325 | forecast-control-model | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 |

| | time | model | out.forecast | out.weekly_average |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:48.780 | forecast-challenger02-model | [1814, 1814, 1814, 1814, 1814, 1814, 1814] | 1814.0 |

| | time | model | out.forecast | out.weekly_average |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:49.307 | forecast-challenger01-model | [1703, 1757, 1737, 1744, 1742, 1743, 1742] | 1738.285714 |

| | time | model | out.forecast | out.weekly_average |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:49.809 | forecast-control-model | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 |

| | time | model | out.forecast | out.weekly_average |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:50.280 | forecast-challenger02-model | [1814, 1814, 1814, 1814, 1814, 1814, 1814] | 1814.0 |

| | time | model | out.forecast | out.weekly_average |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:50.762 | forecast-control-model | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 |
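The loop above extracts the model name from `out._model_split` with `json.loads(x[0])['name']`. The sketch below shows that parsing step in isolation, assuming (as that expression implies) that each cell is a list holding a single JSON string. It then tallies the models reported by the ten results above. The sample cell value and the list of model names are copied from this tutorial's output; the helper name `model_name` is hypothetical.

```python
import json
from collections import Counter

# Sample out._model_split cell, copied from the earlier inference output.
model_split_cell = [
    '{"name":"forecast-challenger01-model",'
    '"version":"5bb81da9-f6ce-4d09-8435-9b27b0846c00",'
    '"sha":"77d1045ee551cb101435be265344b4483d53892e67cc0a20a70cf7e2ccfdd4a0"}'
]

def model_name(cell):
    """Parse the first JSON entry of the cell and return the model name."""
    return json.loads(cell[0])["name"]

print(model_name(model_split_cell))

# Models reported by the ten inference results above, in order.
observed = [
    "forecast-challenger02-model",
    "forecast-challenger02-model",
    "forecast-control-model",
    "forecast-control-model",
    "forecast-control-model",
    "forecast-challenger02-model",
    "forecast-challenger01-model",
    "forecast-control-model",
    "forecast-challenger02-model",
    "forecast-control-model",
]
counts = Counter(observed)
for name, count in counts.most_common():
    print(f"{name}: {count} of {len(observed)}")
```

With a sample this small the tally does not match the 2:1:1 weights exactly, which is expected for a random split.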


To view what model was used for the inference, we look at the pipeline logs with the dataset `metadata` to retrieve additional information.

The following retrieves the `model_name` parameter from the `metadata.last_model` metadata.  This allows us to easily identify which model was used with which inference request and output.

In [56]:
def get_log_model(df: pd.DataFrame):
    return df['metadata.last_model'].apply(lambda x: json.loads(x)['model_name'])

In [58]:
logs = pipeline.logs(limit=10, dataset=['time', 'out.forecast', 'out.weekly_average', 'metadata'])
logs['model'] = get_log_model(logs)

logs.loc[:, ["time", "out.forecast", "out.weekly_average", "model"]]



| | time | out.forecast | out.weekly_average | model |
|---|---|---|---|---|
| 0 | 2023-08-02 16:39:50.280 | [1814, 1814, 1814, 1814, 1814, 1814, 1814] | 1814.0 | forecast-challenger02-model |
| 1 | 2023-08-02 16:39:48.325 | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 | forecast-control-model |
| 2 | 2023-08-02 16:39:47.349 | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 | forecast-control-model |
| 3 | 2023-08-02 16:39:46.898 | [1814, 1814, 1814, 1814, 1814, 1814, 1814] | 1814.0 | forecast-challenger02-model |
| 4 | 2023-08-02 16:37:46.965 | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 | forecast-control-model |
| 5 | 2023-08-02 16:37:45.974 | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 | forecast-control-model |
| 6 | 2023-08-02 16:37:45.032 | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 | forecast-control-model |
| 7 | 2023-08-02 16:37:44.291 | [1703, 1757, 1737, 1744, 1742, 1743, 1742] | 1738.285714 | forecast-challenger01-model |
| 8 | 2023-08-02 16:37:35.428 | [1764, 1749, 1743, 1741, 1740, 1740, 1740] | 1745.285714 | forecast-control-model |
| 9 | 2023-08-02 16:37:34.029 | [1814, 1814, 1814, 1814, 1814, 1814, 1814] | 1814.0 | forecast-challenger02-model |


### Undeploy the Pipeline

Undeploy the pipeline to return its resources to the Wallaroo instance.

In [59]:
pipeline.undeploy()

| Field | Value |
|---|---|
| name | forecast-workshop-pipeline |
| created | 2023-08-02 15:50:59.480547+00:00 |
| last_updated | 2023-08-02 16:26:42.554024+00:00 |
| deployed | False |
| tags | |
| versions | ab4e58bf-3b75-4bf6-b6b3-f703fe61e7af, 3773f5c5-e4c5-4e46-a839-6945af15ca13, 3abf03dd-8eab-4a8d-8432-aa85a30c0eda, 5ec5e8dc-7492-498b-9652-b3733e4c87f7, 1d89287b-4eff-47ec-a7bb-8cedaac1f33f |
| steps | forecast-control-model |
