## Model Insights Canned Data Loader

The Model Insights Canned Data Loader supports the [Model Insights Tutorial](model-insights.ipynb). That tutorial is best run with actual historical inference data from a Wallaroo pipeline. For organizations that do not have access to such data, this loader uploads canned historical data into a sample workspace and pipeline with a supplied model.

Run this loader only if your organization does not have its own historical data for the Model Insights Tutorial.

This loader uses the following default workspace, pipeline, and model names. If you change them, update the [Model Insights Tutorial](model-insights.ipynb) to match.

* `workspace_name` = 'housepricedrift'
* `pipeline_name` = 'housepricepipe'
* `model_name` = 'housepricemodel'

## Steps

### Load Libraries

The first step is to load the libraries the data loader requires. These include the Python script `upload_data.py`, which is included with this loader.

In [1]:
import datetime as dt
from datetime import datetime, timedelta
import json
import joblib
import requests

import wallaroo
import wallaroo_client
from wallaroo.assay_config import *
from wallaroo.object import EntityNotFoundError

import wallaroo.assay

from upload_data import upload_data # custom code to load canned data

### Workspace, Pipeline and Model

Set the name of the workspace, pipeline and model that will be used to upload the canned historical data.

In [2]:
workspace_name = 'housepricedrift'
pipeline_name = 'housepricepipe'
model_name = 'housepricemodel'

### Connect to Wallaroo Instance

Connect with your Wallaroo instance.

In [3]:
# Get the wallaroo client
wl = wallaroo.Client()

### Create the Workspace, Pipeline and Upload Model

The following segments will use the existing workspace, pipeline and model with the names defined above.  If they do not exist, then they will be created.

In [4]:
def get_workspace(name):
    # Reuse the workspace with this name if it exists; otherwise create it.
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

workspace = get_workspace(workspace_name)
wl.set_current_workspace(workspace)

{'name': 'houseprice-drift2', 'id': 8, 'archived': False, 'created_by': 'd962c620-c758-4c39-bce2-710d77024e38', 'created_at': '2022-06-27T19:38:19.621168+00:00', 'models': [], 'pipelines': []}
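
The lookup-then-create logic in `get_workspace` above can be written as a small reusable helper. The following is a minimal sketch, not part of the loader: `get_or_create`, `make_workspace`, and the dictionary-based workspaces are hypothetical stand-ins so that the example runs without a Wallaroo instance. Unlike `get_workspace`, it returns the first match rather than the last.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def get_or_create(existing: Iterable[T], name: str,
                  name_of: Callable[[T], str],
                  create: Callable[[str], T]) -> T:
    """Return the first item whose name matches, or create a new one."""
    match = next((item for item in existing if name_of(item) == name), None)
    return match if match is not None else create(name)


# Hypothetical stand-in data: plain dicts instead of Wallaroo workspaces.
workspaces = [{"name": "alpha"}, {"name": "housepricedrift"}]
created = []


def make_workspace(name):
    """Stand-in for a create call; records what was created."""
    ws = {"name": name}
    created.append(ws)
    return ws


# An existing name is reused; an unknown name triggers creation.
found = get_or_create(workspaces, "housepricedrift", lambda ws: ws["name"], make_workspace)
fresh = get_or_create(workspaces, "newspace", lambda ws: ws["name"], make_workspace)
```

With the client from this notebook, the equivalent call would presumably be `get_or_create(wl.list_workspaces(), workspace_name, lambda ws: ws.name(), wl.create_workspace)`.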

In [5]:
# Get or create a pipeline. The pipeline is not used to run inferences in this demo;
# it is used to create the associated assays. These names are the default names used by upload_data()

try:
    pipeline = wl.pipelines_by_name(pipeline_name)[0]
except EntityNotFoundError:
    price = wl.upload_model(model_name, 'keras_ccfraud.onnx')
    p = wl.build_pipeline(pipeline_name)
    p = p.add_model_step(price)
    pipeline = p.deploy()
    
topic = wl.get_topic_name(pipeline.id())

Waiting for deployment - this will take up to 45s .... ok


### Upload the Canned Historical Data

If there is no history of past inferences, then this historical data will be uploaded into the pipeline.  If historical data exists, then this canned historical data will **not** be uploaded.

This process may take 5-10 minutes depending on the performance of your cluster.

In [6]:
# Check to see if there are any inferences at all
past_inferences = wl.get_raw_pipeline_inference_logs(pipeline_name, datetime.fromtimestamp(0), datetime.now(), model_name, limit=1)
if len(past_inferences) == 0:
    canned_inference_records = joblib.load('inference_records.pkl')
    uploaded_logs =  upload_data(canned_inference_records, pipeline = pipeline_name, model = model_name, topic=topic)
    num_uploaded_logs = len(uploaded_logs)
    print(f"\n Uploaded {num_uploaded_logs} canned logs")

Url: http://plateau:3030/topic/workspace-8-pipeline-houseprice-pipe2-inference/partition/part-1
PARAMS 1640995200000 {'time': '2022-01-01T00:00:00+00:00'}
0 2022-01-01 00:00:00+00:00, 10000 2022-01-06 12:26:35.730000+00:00, 20000 2022-01-12 00:53:11.460000+00:00, 30000 2022-01-17 13:19:47.190000+00:00, 40000 2022-01-23 01:46:22.920000+00:00, 50000 2022-01-28 14:12:58.650000+00:00, 
Final: 56174 2022-01-31 23:59:12.333702+00:00

 Uploaded 56175 canned logs
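
The progress output above is consistent with the 56,175 records being spaced evenly across January 2022. The sketch below reproduces that spacing with standard-library `datetime` arithmetic. It is inferred from the printed timestamps only and is an assumption about, not a copy of, what `upload_data` does; `spaced_timestamps` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def spaced_timestamps(start, end, count):
    """Yield `count` timestamps evenly spaced over the interval [start, end)."""
    # Dividing a timedelta by an int rounds to the nearest microsecond.
    step = (end - start) / count
    for i in range(count):
        yield start + i * step


start = datetime(2022, 1, 1, tzinfo=timezone.utc)
end = datetime(2022, 2, 1, tzinfo=timezone.utc)
stamps = list(spaced_timestamps(start, end, 56175))

print(stamps[10000])  # 2022-01-06 12:26:35.730000+00:00
print(stamps[-1])     # 2022-01-31 23:59:12.333702+00:00
```

The two printed values match the `10000` and `Final` entries in the upload log, which is why even spacing is a plausible reading of the canned data.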


### Undeploy the Pipeline

Undeploy the pipeline and return the resources to the Wallaroo instance.

In [7]:
pipeline.undeploy()

Waiting for undeployment - this will take up to 45s ................................ ok


| Field | Value |
|---|---|
| name | houseprice-pipe2 |
| created | 2022-06-27 19:38:20.622514+00:00 |
| last_updated | 2022-06-27 19:38:20.706593+00:00 |
| deployed | False |
| tags | |
| steps | houseprice-model2 |
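
Because the upload can take several minutes, it may be worth making sure the pipeline is undeployed even if the upload fails partway through. The following is a minimal sketch of that idea using `try`/`finally`. `FakePipeline`, `run_then_undeploy`, and `failing_upload` are hypothetical names used so the example runs without a Wallaroo instance; only the `undeploy()` method name is taken from this notebook.

```python
class FakePipeline:
    """Hypothetical stand-in for a deployed pipeline."""

    def __init__(self):
        self.deployed = True

    def undeploy(self):
        self.deployed = False
        return self


def run_then_undeploy(pipeline, work):
    """Run `work()` and undeploy the pipeline afterwards, even if it raises."""
    try:
        return work()
    finally:
        pipeline.undeploy()


def failing_upload():
    """Simulates an upload step that fails."""
    raise RuntimeError("simulated upload failure")


demo = FakePipeline()
try:
    run_then_undeploy(demo, failing_upload)
except RuntimeError as err:
    outcome = f"upload failed: {err}"

# The pipeline was undeployed although the upload raised an error.
print(demo.deployed, outcome)
```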
