## Statsmodel Forecast AB Testing

A/B testing is one method of comparing models against each other.  This demonstration shows how to use the Wallaroo pipeline steps `add_random_split` and `replace_with_random_split` to randomly distribute inference input data among control and challenger models.

## Prerequisites

* A Wallaroo instance version 2023.2.1 or greater.

## References

* [Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Python Models](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-upload-python/)
* [Wallaroo SDK Essentials Guide: Pipeline Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline/)
* [Wallaroo SDK Essentials: Inference Guide: Parallel Inferences](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-inferences/#parallel-inferences)

## A/B Testing

### Import Libraries

The first step is to import the libraries that we will need.

In [1]:
import json
import os
import datetime

import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework

# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
import numpy as np

from resources import simdb
from resources import util

pd.set_option('display.max_colwidth', None)

In [2]:
display(wallaroo.__version__)

'2023.2.1'

### Initialize connection

Start a connection to the Wallaroo instance and save the connection into the variable `wl`.

In [66]:
# Login through local Wallaroo instance

wl = wallaroo.Client()

wallarooPrefix = "doc-test."
wallarooSuffix = "wallaroocommunity.ninja"

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}api.{wallarooSuffix}", 
                    auth_endpoint=f"https://{wallarooPrefix}keycloak.{wallarooSuffix}", 
                    auth_type="sso")

### Set Configurations

The following will set the workspace, model name, and pipeline that will be used for this example.  If the workspace or pipeline already exists, it will be assigned for use in this example.  If it does not exist, it will be created based on the names listed below.

Workspace names must be unique.  To allow this tutorial to run in the same Wallaroo instance for multiple users, the `suffix` variable can be generated from a random set of 4 ASCII characters.  To use the same workspace across the tutorial notebooks, hard code `suffix` (as done below) and verify that the workspace name created is unique across the Wallaroo instance.

In [67]:
# used for unique workspace names

suffix='-jch'

workspace_name = f'forecast-model-workshop{suffix}'

pipeline_name = 'forecast-workshop-pipeline'
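
For readers who prefer a random suffix over the hard-coded one, the following is a minimal sketch using only the Python standard library. The helper name `make_suffix` and the `example_` variable are illustrative and are not part of the Wallaroo SDK or this tutorial's resources.

```python
import random
import string

def make_suffix(length=4):
    """Return a hyphen followed by `length` random lowercase ASCII letters."""
    letters = random.choices(string.ascii_lowercase, k=length)
    return "-" + "".join(letters)

# Hypothetical usage: build a workspace name that is unlikely to collide
# with another user's workspace in the same instance.
example_workspace_name = f"forecast-model-workshop{make_suffix()}"
print(example_workspace_name)
```

A random suffix changes on every run, so hard coding remains the better choice when several notebooks need to share one workspace.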

### Set the Workspace and Pipeline

The workspace will be either used or created if it does not exist, along with the pipeline.

In [68]:
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)

### Upload Model

The Python models created in "Forecast and Parallel Infer with Statsmodel: Model Creation" will now be uploaded: one control model and two challengers.  Note that the framework and runtime are set to `python`.

In [69]:
# upload three models:  the control and two challengers

control_model_name = 'forecast-control-model'
control_model_file = './forecast_standard.py'

challenger01_model_name = 'forecast-challenger01-model'
challenger01_model_file = './forecast_alternate01.py'

challenger02_model_name = 'forecast-challenger02-model'
challenger02_model_file = './forecast_alternate02.py'

# upload the models

control_model = wl.upload_model(control_model_name, control_model_file, Framework.PYTHON).configure(runtime="python")

challenger_model_01 = wl.upload_model(challenger01_model_name, challenger01_model_file, Framework.PYTHON).configure(runtime="python")

challenger_model_02 = wl.upload_model(challenger02_model_name, challenger02_model_file, Framework.PYTHON).configure(runtime="python")


### Deploy the Pipeline

We will now add the uploaded control model as a step for the pipeline, then deploy it.  The pipeline configuration will allow for multiple replicas of the pipeline to be deployed and spun up in the cluster.  Each pipeline replica will use 0.25 CPU and 512 Mi RAM.

In [70]:
# Set the deployment to allow for additional engines to run
# deploy_config = (wallaroo.DeploymentConfigBuilder()
#                         .replica_count(1)
#                         .replica_autoscale_min_max(minimum=2, maximum=5)
#                         .cpus(0.25)
#                         .memory("512Mi")
#                         .build()
#                     )

pipeline.add_model_step(control_model)
# pipeline.deploy()

0,1
name,forecast-workshop-pipeline
created,2023-07-25 17:41:56.086042+00:00
last_updated,2023-07-25 17:51:43.782085+00:00
deployed,True
tags,
versions,"c6679593-811b-435d-9dbb-a5258616cd37, b3ebff58-b8c5-4fae-962a-2072c28f1867, 39de1087-fc8e-4910-b398-d8530f3e9145, c3f88026-20c8-4daf-ba9b-5b8c3c0dfb35, fc94da61-008a-46d1-845f-f8944f34ca4b, 8c5b1e66-b107-4e1d-97c0-08873021cdd6, 646240d7-a214-4770-96de-4cdc1205d86e, 6b6dfa50-f60f-4c96-b397-d8ed2ed01da5, 9a8aee13-2d68-403b-aa0c-016d7417e102, 929a2ce8-88e4-407d-8686-6886c7421030, db0c914a-1f75-4c2e-a22e-0ac95a116302, bac4c9a8-9d05-4256-977f-b20e3bc2da3b"
steps,forecast-control-model


### Run Inference

For this example, we will forecast bike rentals by looking back one month from "today", which will be set as 2011-02-22.  The data from the month before that date, beginning with 2011-01-23, is used to generate a forecast of bike rentals over the next week from "today", which will be 2011-02-23 to 2011-03-01.

In [57]:
# inferencedata = json.load(open("./data/testdata_dict.json"))

# results = pipeline.infer(inferencedata)

# display(results)

[{'forecast': [1764, 1749, 1743, 1741, 1740, 1740, 1740]}]

In [58]:
pipeline.replace_with_random_split(0, 
                                   [(1, control_model), 
                                    (1, challenger_model_01), 
                                    (1, challenger_model_02)], 
                                    "session_id"
                                    )
pipeline.deploy()

0,1
name,forecast-workshop-pipeline
created,2023-07-25 17:41:56.086042+00:00
last_updated,2023-07-25 17:51:43.782085+00:00
deployed,True
tags,
versions,"c6679593-811b-435d-9dbb-a5258616cd37, b3ebff58-b8c5-4fae-962a-2072c28f1867, 39de1087-fc8e-4910-b398-d8530f3e9145, c3f88026-20c8-4daf-ba9b-5b8c3c0dfb35, fc94da61-008a-46d1-845f-f8944f34ca4b, 8c5b1e66-b107-4e1d-97c0-08873021cdd6, 646240d7-a214-4770-96de-4cdc1205d86e, 6b6dfa50-f60f-4c96-b397-d8ed2ed01da5, 9a8aee13-2d68-403b-aa0c-016d7417e102, 929a2ce8-88e4-407d-8686-6886c7421030, db0c914a-1f75-4c2e-a22e-0ac95a116302, bac4c9a8-9d05-4256-977f-b20e3bc2da3b"
steps,forecast-control-model


### Replace Pipeline Step with Random Step

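The `replace_with_random_split` call above assigns equal weights (1:1:1), so each of the three models receives roughly one third of the inference requests.  With weights of 2:1:1, the control would instead receive 50% of the inference requests and the other two models 25% each.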
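
To build intuition for how split weights translate into traffic shares, the following sketch simulates a weighted random split locally. It does not call Wallaroo: the model names are placeholder strings, and `simulate_split` is a hypothetical helper written for this illustration.

```python
import random
from collections import Counter

def simulate_split(weights, n_requests, seed=0):
    """Route n_requests to model names at random, in proportion to their weights."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=n_requests)
    return Counter(picks)

# Weights play the same role as the (weight, model) tuples in a random split step.
equal = simulate_split({"control": 1, "challenger01": 1, "challenger02": 1}, 10_000)
skewed = simulate_split({"control": 2, "challenger01": 1, "challenger02": 1}, 10_000)

for label, counts in (("1:1:1", equal), ("2:1:1", skewed)):
    shares = {name: round(count / 10_000, 2) for name, count in counts.items()}
    print(label, shares)
```

The printed shares should land close to the weight proportions, with small deviations due to randomness.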

In [74]:
import time

inferencedata = json.load(open("./data/testdata_dict.json"))

# run an initial inference against the deployed pipeline
results = pipeline.infer(inferencedata)

# record a time window around the next inference so its log entry can be retrieved
start_time = datetime.datetime.now()

time.sleep(5)

results = pipeline.infer(inferencedata)

end_time = datetime.datetime.now()

display(results)

[{'forecast': [1814, 1814, 1814, 1814, 1814, 1814, 1814]}]

In [73]:
pipeline.logs(start_datetime=start_time, end_datetime=end_time)

Unnamed: 0,time,in.json,out.json,check_failures
0,2023-07-25 18:37:44.321,"{""cnt"":[1526,1550,1708,1005,1623,1712,1530,1605,1538,1746,1472,1589,1913,1815,2115,2475,2927,1635,1812,1107,1450,1917,1807,1461,1969,2402,1446,1851]}","{""forecast"":[1703,1757,1737,1744,1742,1743,1742]}",0
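
Because each request may be routed to a different model, repeated inferences can return different forecasts. One way to see how the results are distributed is to tally the distinct forecasts returned. The sketch below uses a stand-in list in the same shape as the `pipeline.infer` output; the forecast values are copied from outputs shown in this document purely for illustration, and `tally_forecasts` is a hypothetical helper, not an SDK function.

```python
from collections import Counter

def tally_forecasts(results_list):
    """Count how often each distinct forecast appears across inference results."""
    counts = Counter()
    for results in results_list:
        # each result has the shape [{'forecast': [...]}]
        counts[tuple(results[0]["forecast"])] += 1
    return counts

# Stand-in data: in practice this list would be collected from repeated
# pipeline.infer(inferencedata) calls.
sample_results = [
    [{"forecast": [1814, 1814, 1814, 1814, 1814, 1814, 1814]}],
    [{"forecast": [1764, 1749, 1743, 1741, 1740, 1740, 1740]}],
    [{"forecast": [1814, 1814, 1814, 1814, 1814, 1814, 1814]}],
    [{"forecast": [1703, 1757, 1737, 1744, 1742, 1743, 1742]}],
]

for forecast, count in tally_forecasts(sample_results).most_common():
    print(count, list(forecast))
```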


### Sample Inferences

Run the following inference several times (for example, 10) to see how the results vary as requests are routed to the different models.

In [9]:
# retrieve forecast schedule
first_day, analysis_days = util.get_forecast_days()

print(f'Running analysis on {first_day}')

Running analysis on 2011-02-22


In [27]:
# connect to the SQL database
conn = simdb.get_db_connection()
print(f'Bike rentals table: {simdb.tablename}')

# create the query and retrieve data
query = util.mk_dt_range_query(tablename=simdb.tablename, forecast_day=first_day)
print(query)
data = pd.read_sql_query(query, conn)
data.head()

Bike rentals table: bikerentals
select cnt from bikerentals where date > DATE(DATE('2011-02-22'), '-1 month') AND date <= DATE('2011-02-22')


Unnamed: 0,cnt
0,986
1,1416
2,1985
3,506
4,431
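
The `DATE(..., '-1 month')` modifier in the query above is SQLite date syntax, so the lookback window can be checked with Python's built-in `sqlite3` module. This sketch assumes the sample database is SQLite and uses a throwaway in-memory database rather than the tutorial's `simdb` connection.

```python
import sqlite3

# In-memory database used only to evaluate the date expression.
check_conn = sqlite3.connect(":memory:")
(lower_bound,) = check_conn.execute(
    "SELECT DATE(DATE('2011-02-22'), '-1 month')"
).fetchone()
check_conn.close()

# The query uses a strict `date > lower_bound`, so the first day included
# is the day after the lower bound.
print(lower_bound)
```

The lower bound evaluates to 2011-01-22, so with the strict comparison the window starts on 2011-01-23 and ends on 2011-02-22 inclusive.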


In [11]:
pd.read_sql_query("select date, cnt from bikerentals where date > DATE(DATE('2011-02-22'), '-1 month') AND date <= DATE('2011-02-22') LIMIT 5", conn)

Unnamed: 0,date,cnt
0,2011-01-23,986
1,2011-01-24,1416
2,2011-01-25,1985
3,2011-01-26,506
4,2011-01-27,431


In [12]:
# send data to model for forecast

results = pipeline.infer(data.to_dict(orient='list'))[0]
results


{'forecast': [1462, 1483, 1497, 1507, 1513, 1518, 1521]}

In [13]:
# annotate with the appropriate dates (the next seven days)
resultframe = pd.DataFrame({
    'date' : util.get_forecast_dates(first_day),
    'forecast' : results['forecast']
})

# write the new data to the db table "bikeforecast"
resultframe.to_sql('bikeforecast', conn, index=False, if_exists='append')

# display the db table
query = "select date, forecast from bikeforecast"
pd.read_sql_query(query, conn)

Unnamed: 0,date,forecast
0,2011-02-23,1462
1,2011-02-24,1483
2,2011-02-25,1497
3,2011-02-26,1507
4,2011-02-27,1513
5,2011-02-28,1518
6,2011-03-01,1521
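
The `util.get_forecast_dates` helper is not shown in this notebook. As a minimal sketch of what such a helper could look like, the function below assumes it returns the seven calendar days following the forecast day as ISO date strings, which matches the table above. The name `next_seven_days` is hypothetical.

```python
import datetime

def next_seven_days(forecast_day):
    """Return the 7 ISO-format dates that follow forecast_day (YYYY-MM-DD)."""
    start = datetime.date.fromisoformat(forecast_day)
    return [(start + datetime.timedelta(days=i)).isoformat() for i in range(1, 8)]

print(next_seven_days("2011-02-22"))
```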


### Four Weeks of Inference Data

Now we'll go back, starting at the "current date" of 2011-03-01, and fetch each week's data across the month.  This will be used to submit 5 inference requests through the Pipeline `parallel_infer` method.

The inference data is saved into the `inference_data` List - each element in the list will be a separate inference request.

In [14]:
# get our list of items to run through

inference_data = []

content_type = "application/json"

days = []

for day in analysis_days:
    print(f"Current date: {day}")
    days.append(day)
    query = util.mk_dt_range_query(tablename=simdb.tablename, forecast_day=day)
    print(query)
    data = pd.read_sql_query(query, conn)
    inference_data.append(data.to_dict(orient='list'))

Current date: 2011-03-01
select cnt from bikerentals where date > DATE(DATE('2011-03-01'), '-1 month') AND date <= DATE('2011-03-01')
Current date: 2011-03-08
select cnt from bikerentals where date > DATE(DATE('2011-03-08'), '-1 month') AND date <= DATE('2011-03-08')
Current date: 2011-03-15
select cnt from bikerentals where date > DATE(DATE('2011-03-15'), '-1 month') AND date <= DATE('2011-03-15')
Current date: 2011-03-22
select cnt from bikerentals where date > DATE(DATE('2011-03-22'), '-1 month') AND date <= DATE('2011-03-22')
Current date: 2011-03-29
select cnt from bikerentals where date > DATE(DATE('2011-03-29'), '-1 month') AND date <= DATE('2011-03-29')
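
The analysis days printed above are spaced one week apart. `util.get_forecast_days` is not shown here; as a rough sketch, a list like this could be generated as follows, assuming a fixed start date and a weekly step. The name `weekly_days` is hypothetical.

```python
import datetime

def weekly_days(start_day, count):
    """Return `count` ISO dates beginning at start_day, spaced 7 days apart."""
    start = datetime.date.fromisoformat(start_day)
    return [(start + datetime.timedelta(weeks=i)).isoformat() for i in range(count)]

print(weekly_days("2011-03-01", 5))
```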


### Parallel Inference Request

The List `inference_data` will be submitted.  Recall that the pipeline deployment can spin up as many as 5 replicas.

The pipeline `parallel_infer(tensor_list, timeout, num_parallel, retries)` **asynchronous** method performs an inference as defined by the pipeline steps and takes the following arguments:

* **tensor_list** (*REQUIRED List*): The data submitted to the pipeline for inference as a List of the supported data types:
  * [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html):  Data submitted as a pandas DataFrame are returned as a pandas DataFrame.
  * [Apache Arrow](https://arrow.apache.org/) (**Preferred**): Data submitted as an Apache Arrow are returned as an Apache Arrow.
* **timeout** (*OPTIONAL int*): A timeout in seconds before the inference throws an exception.  The default is 15 seconds per call to accommodate large, complex models.  Note that for a batch inference, this is **per list item**: with 10 inference requests, each would have a default timeout of 15 seconds.
* **num_parallel** (*OPTIONAL int*):  The number of parallel threads used for the submission.  **This should be no more than four times the number of pipeline replicas**.
* **retries** (*OPTIONAL int*):  The number of retries per inference request submitted.

`parallel_infer` is an asynchronous method that returns the Python callback list of tasks.  It should be called with the `await` keyword to retrieve the callback results.

For more details, see the Wallaroo [parallel inferences guide](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-inferences/#parallel-inferences).
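
To illustrate what the `timeout`, `num_parallel`, and `retries` arguments describe, here is a small standard-library sketch of bounded parallel submission with per-item timeouts and retries. It is not Wallaroo's implementation: `bounded_parallel` and `fake_infer` are hypothetical names, and `fake_infer` merely stands in for a real inference call.

```python
import asyncio

async def bounded_parallel(items, worker, timeout=15, num_parallel=4, retries=0):
    """Run worker(item) for every item, at most num_parallel at a time.

    Each attempt is limited to `timeout` seconds; a failed or timed-out item
    is retried up to `retries` more times. Results keep the input order.
    """
    semaphore = asyncio.Semaphore(num_parallel)

    async def run_one(item):
        async with semaphore:  # limits how many requests are in flight
            for attempt in range(retries + 1):
                try:
                    return await asyncio.wait_for(worker(item), timeout)
                except Exception:
                    if attempt == retries:
                        raise  # out of retries: surface the last error

    return await asyncio.gather(*(run_one(item) for item in items))

async def fake_infer(data):
    """Stand-in for an inference request: returns the integer mean as a 'forecast'."""
    await asyncio.sleep(0.01)
    return [{"forecast": [sum(data["cnt"]) // len(data["cnt"])]}]

demo_inputs = [{"cnt": [10, 20, 30]}, {"cnt": [5, 15]}]
demo_results = asyncio.run(
    bounded_parallel(demo_inputs, fake_infer, num_parallel=2, retries=2)
)
print(demo_results)
```

In a notebook, where an event loop is already running, `await bounded_parallel(...)` would be used in place of `asyncio.run(...)`, which mirrors how `parallel_infer` is awaited in the next cell.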

In [24]:
parallel_results = await pipeline.parallel_infer(tensor_list=inference_data, timeout=20, num_parallel=16, retries=2)

display(parallel_results)

[[{'forecast': [1764, 1749, 1743, 1741, 1740, 1740, 1740]}],
 [{'forecast': [1735, 1858, 1755, 1841, 1770, 1829, 1780]}],
 [{'forecast': [1878, 1851, 1858, 1856, 1857, 1856, 1856]}],
 [{'forecast': [2363, 2316, 2277, 2243, 2215, 2192, 2172]}],
 [{'forecast': [2225, 2133, 2113, 2109, 2108, 2108, 2108]}]]

### Upload into Database

We'll pair each set of inference results with the day it was forecast from.  Then we can upload the results into the sample database and display them.

In [25]:
# merge the days and the results

days_results = list(zip(days, parallel_results))

In [28]:
# upload to the database
for day_result in days_results:
    resultframe = pd.DataFrame({
        'date' : util.get_forecast_dates(day_result[0]),
        'forecast' : day_result[1][0]['forecast']
    })
    resultframe.to_sql('bikeforecast', conn, index=False, if_exists='append')

On April 1st, we can compare March forecasts to actuals.

In [29]:
query = f'''SELECT bikeforecast.date AS date, forecast, cnt AS actual
            FROM bikeforecast LEFT JOIN bikerentals
            ON bikeforecast.date = bikerentals.date
            WHERE bikeforecast.date >= DATE('2011-03-01')
            AND bikeforecast.date <  DATE('2011-04-01')
            ORDER BY 1'''

print(query)


comparison = pd.read_sql_query(query, conn)
comparison

SELECT bikeforecast.date AS date, forecast, cnt AS actual
            FROM bikeforecast LEFT JOIN bikerentals
            ON bikeforecast.date = bikerentals.date
            WHERE bikeforecast.date >= DATE('2011-03-01')
            AND bikeforecast.date <  DATE('2011-04-01')
            ORDER BY 1


Unnamed: 0,date,forecast,actual
0,2011-03-02,1764,2134
1,2011-03-03,1749,1685
2,2011-03-04,1743,1944
3,2011-03-05,1741,2077
4,2011-03-06,1740,605
5,2011-03-07,1740,1872
6,2011-03-08,1740,2133
7,2011-03-09,1735,1891
8,2011-03-10,1858,623
9,2011-03-11,1755,1977
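
One simple way to summarize how far the forecasts are from the actuals is the mean absolute error (MAE). The sketch below computes it in plain Python over the ten rows displayed above; with the notebook's DataFrame, the same figure could be obtained with `(comparison['forecast'] - comparison['actual']).abs().mean()`. The choice of MAE is an illustration, not a metric prescribed by this tutorial.

```python
# (forecast, actual) pairs copied from the comparison rows shown above
rows = [
    (1764, 2134), (1749, 1685), (1743, 1944), (1741, 2077), (1740, 605),
    (1740, 1872), (1740, 2133), (1735, 1891), (1858, 623), (1755, 1977),
]

def mean_absolute_error(pairs):
    """Average of |forecast - actual| over all pairs."""
    return sum(abs(forecast - actual) for forecast, actual in pairs) / len(pairs)

mae = mean_absolute_error(rows)
print(f"MAE over {len(rows)} days: {mae:.1f}")
```

Two days (2011-03-06 and 2011-03-10) have unusually low actual counts and account for more than half of the total absolute error in these rows.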


### Undeploy the Pipeline

Undeploy the pipeline and return the resources to the Wallaroo instance.

In [30]:
conn.close()
pipeline.undeploy()

0,1
name,bikedaypipe
created,2023-07-14 15:53:07.284131+00:00
last_updated,2023-07-14 15:56:07.413409+00:00
deployed,False
tags,
versions,"9c67dd93-014c-4cc9-9b44-549829e613ad, 258dafaf-c272-4bda-881b-5998a4a9be26"
steps,bikedaymodel


