This tutorial and the assets can be downloaded as part of the [Wallaroo Tutorials repository](https://github.com/WallarooLabs/Wallaroo_Tutorials/tree/main/model_conversion/statsmodels).

## Introduction

Organizations can deploy a Machine Learning (ML) model built with the [statsmodels](https://www.statsmodels.org/stable/index.html) library directly into Wallaroo through the following process.  This conversion process transforms the model into an open format that can be run across different frameworks at compiled C-language speeds.

This example provides the following:

* `train-statsmodel.ipynb`: A sample Jupyter Notebook that trains a sample model.  The model predicts how many bikes will be rented on each of the next 7 days, based on the previous 7 days' bike rentals, temperature, and wind speed.  Additional files to support this example are:
  * `day.csv`: Data used to train the sample `statsmodel` example.
  * `infer.py`: The inference script that is part of the `statsmodel`.
* `convert-statsmodel-tutorial.ipynb`: A sample Jupyter Notebook that demonstrates how to upload, convert, and deploy the `statsmodel` example into a Wallaroo instance.    Additional files to support this example are:
  * `bike_day_model.pkl`: A `statsmodel` ML model trained from the `train-statsmodel.ipynb` Notebook.

    **IMPORTANT NOTE:** The `statsmodel` ML model is stored in the `.pkl` file as a pickled dictionary.  The Python runtime expects this dictionary to have two keys, `model` and `script`:

    * `model`: the pickled model, which is automatically loaded into the Python runtime with the name `model`.
    * `script`: the text of the Python script to be run, in a format similar to the existing Python script steps (i.e. defining a `wallaroo_json` method that operates on the data).  In this case, the file `infer.py` is the script used.

  * `bike_day_eval.json`: Evaluation data used to test the model's performance.
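
The two-key dictionary described in the note above can be sketched as follows. This is a minimal illustration that assumes only what the note states: a pickled dictionary with the keys `model` and `script`. The function name `build_package` and the stand-in values are hypothetical, and the source does not say whether the `model` value is stored as an object or as separately pickled bytes; this sketch stores the object directly. The Notebook `train-statsmodel.ipynb` remains the reference for how `bike_day_model.pkl` is actually produced.

```python
import pickle


def build_package(model, script_text):
    """Serialize a model and its inference script as one pickled dictionary.

    Hypothetical helper: it only mirrors the two keys named in the note above.
    """
    package = {
        "model": model,         # the trained model object
        "script": script_text,  # the full text of the inference script
    }
    return pickle.dumps(package)


# Stand-in values for illustration only; a real package would hold the trained
# statsmodels object and the contents of infer.py.
stand_in_model = {"coefficients": [0.1, 0.2]}
stand_in_script = "def wallaroo_json(data):\n    return data\n"

payload = build_package(stand_in_model, stand_in_script)
restored = pickle.loads(payload)
print(sorted(restored.keys()))  # prints ['model', 'script']
```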

### Prerequisites

Before uploading and running an inference with a `statsmodel` model in Wallaroo, the following are required:

* An installed Wallaroo instance.
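* The following Python libraries:
  * `os`: Part of the Python standard library, so no separate installation is needed.
  * [`wallaroo`](https://pypi.org/project/wallaroo/): The Wallaroo SDK. Included with the Wallaroo JupyterHub service by default.
  * [`pandas`](https://pypi.org/project/pandas/): Used in this tutorial to load the evaluation data.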
  

## Steps

This tutorial walks through the following steps:

1. Upload the `statsmodel` ML model `bike_day_model.pkl` into a Wallaroo instance.
2. Deploy the model into a pipeline.
3. Run a test inference.
4. Undeploy the pipeline.

### Import Libraries

The first step is to import the libraries that we will need.

In [87]:
import wallaroo
from wallaroo.object import EntityNotFoundError

import os
# Run the following line only to set the OS environment variable ARROW_ENABLED to "True".  Otherwise, remove it or comment it out.
os.environ["ARROW_ENABLED"] = "True"

import pandas as pd

### Initialize connection

Start a connection to the Wallaroo instance and save the connection into the variable `wl`.

In [88]:
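# Use ONE of the two login methods below.

# Option 1: Login through a local Wallaroo instance

wl = wallaroo.Client()

# Option 2: SSO login through Keycloak
# Replace the placeholders with the prefix and suffix of your Wallaroo instance.

wallarooPrefix = "YOUR PREFIX"
wallarooSuffix = "YOUR SUFFIX"

# Example values from the run recorded in this tutorial; remove these two lines when using your own.
wallarooPrefix = "doc-test"
wallarooSuffix = "wallaroocommunity.ninja"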

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
                    auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
                    auth_type="sso")
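
The SSO connection above derives both endpoints from the same prefix and suffix. The sketch below isolates that URL pattern, copied from the f-strings in the cell above, so it can be checked without a live instance. The helper name `build_endpoints` is hypothetical and not part of the Wallaroo SDK.

```python
def build_endpoints(prefix, suffix):
    """Return the API and Keycloak URLs for a given DNS prefix and suffix."""
    return {
        "api_endpoint": f"https://{prefix}.api.{suffix}",
        "auth_endpoint": f"https://{prefix}.keycloak.{suffix}",
    }


endpoints = build_endpoints("doc-test", "wallaroocommunity.ninja")
print(endpoints["api_endpoint"])   # https://doc-test.api.wallaroocommunity.ninja
print(endpoints["auth_endpoint"])  # https://doc-test.keycloak.wallaroocommunity.ninja
```

The resulting dictionary could then be unpacked into the client call, for example `wallaroo.Client(auth_type="sso", **endpoints)`, using the same keyword names as the cell above.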

### Set Configurations

The following sets the workspace, model name, and pipeline that will be used for this example.  If the workspace or pipeline already exists, it will be assigned for use in this example.  If it does not exist, it will be created based on the names listed below.

In [89]:
workspace_name = 'bikedayevalworkspace2'
pipeline_name = 'bikedayevalpipeline2'
model_name = 'bikedaymodel2'
model_file_name = 'bike_day_model.pkl'

### Set the Workspace and Pipeline

This sample code will create or use the existing workspace named in `workspace_name` (`bikedayevalworkspace2`) as the current workspace, then create or retrieve the pipeline named in `pipeline_name`.

In [90]:
def get_workspace(name):
    # Return the existing workspace with this name, or create it if none is found.
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Return the existing pipeline with this name, or build a new one if none is found.
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline

|  |  |
|---|---|
| name | bikedayevalpipeline2 |
| created | 2023-03-31 13:46:40.869639+00:00 |
| last_updated | 2023-03-31 14:16:28.298221+00:00 |
| deployed | False |
| tags |  |
| versions | 77ccc416-9081-4b2e-8ec7-8a01a4a3882c, e856e7fb-8177-450e-81a9-1682a7173228, 79f98d55-5b64-4484-af0e-629021a0af7b, d2b156ed-6faf-4f69-81d7-94befdc400c8, ad996a55-05fa-4868-93b9-5c97c7b0e2b3, c9283735-bce7-4bfc-89f6-fb0a7c4832e5, 2957dd52-7784-467f-967e-b4e3029ff60e |
| steps | bikedaymodel2 |


### Upload Pickled Package Statsmodel Model

Upload the statsmodel stored in the pickled package `bike_day_model.pkl`.  See the Notebook `train-statsmodel.ipynb` for more details on creating this package.

Note that the uploaded model is configured to use the `python` runtime.

In [91]:
bike_day_model = wl.upload_model(model_name, model_file_name).configure(runtime="python")
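
Before uploading, the package can be checked locally for the two keys the Python runtime expects. The following is a minimal sketch that assumes the dictionary layout described in the introduction. `check_package` is a hypothetical helper, and loading the real `bike_day_model.pkl` this way would also require the libraries used to train it (such as `statsmodels`) to be installed. Only unpickle files from a trusted source.

```python
import os
import pickle
import tempfile

REQUIRED_KEYS = {"model", "script"}


def check_package(path):
    """Return the set of required keys missing from a pickled package file."""
    with open(path, "rb") as handle:
        package = pickle.load(handle)
    if not isinstance(package, dict):
        raise TypeError(f"expected a dict, found {type(package).__name__}")
    return REQUIRED_KEYS - set(package)


# Demonstration with a throwaway file rather than the real model package.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_package.pkl")
    with open(demo_path, "wb") as handle:
        pickle.dump({"model": [1, 2, 3], "script": "print('hi')"}, handle)
    missing = check_package(demo_path)

print(missing)  # an empty set means both keys are present
```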

### Deploy the Pipeline

We will now add the uploaded model as a step for the pipeline, then deploy it.

In [92]:
pipeline.add_model_step(bike_day_model)

|  |  |
|---|---|
| name | bikedayevalpipeline2 |
| created | 2023-03-31 13:46:40.869639+00:00 |
| last_updated | 2023-03-31 14:16:28.298221+00:00 |
| deployed | False |
| tags |  |
| versions | 77ccc416-9081-4b2e-8ec7-8a01a4a3882c, e856e7fb-8177-450e-81a9-1682a7173228, 79f98d55-5b64-4484-af0e-629021a0af7b, d2b156ed-6faf-4f69-81d7-94befdc400c8, ad996a55-05fa-4868-93b9-5c97c7b0e2b3, c9283735-bce7-4bfc-89f6-fb0a7c4832e5, 2957dd52-7784-467f-967e-b4e3029ff60e |
| steps | bikedaymodel2 |


In [93]:
pipeline.deploy()

|  |  |
|---|---|
| name | bikedayevalpipeline2 |
| created | 2023-03-31 13:46:40.869639+00:00 |
| last_updated | 2023-03-31 14:19:32.701608+00:00 |
| deployed | True |
| tags |  |
| versions | c9266ee7-fb33-4f02-a97b-86962b686477, 77ccc416-9081-4b2e-8ec7-8a01a4a3882c, e856e7fb-8177-450e-81a9-1682a7173228, 79f98d55-5b64-4484-af0e-629021a0af7b, d2b156ed-6faf-4f69-81d7-94befdc400c8, ad996a55-05fa-4868-93b9-5c97c7b0e2b3, c9283735-bce7-4bfc-89f6-fb0a7c4832e5, 2957dd52-7784-467f-967e-b4e3029ff60e |
| steps | bikedaymodel2 |


In [94]:
pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.1.18',
   'name': 'engine-5bccd5699d-5gc4x',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'bikedayevalpipeline2',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'bikedaymodel2',
      'version': '183ce72a-1b88-4fb3-a939-2762ee86a1fe',
      'sha': '978f2274c384f4050f2ece7c96a5f9bb5d6f96b387c0c5cbea5d4f36e2fbf18e',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.2.13',
   'name': 'engine-lb-ddd995646-lkjbt',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}
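
When scripting a deployment, it can help to confirm that every component reports `Running` before sending inference requests. The sketch below assumes the dictionary layout shown in the output above. The function name `is_fully_running` is hypothetical, and other Wallaroo versions may return a different structure.

```python
def is_fully_running(status):
    """Return True if the pipeline, engines, models and load balancers all report 'Running'."""
    if status.get("status") != "Running":
        return False
    for engine in status.get("engines", []):
        if engine.get("status") != "Running":
            return False
        # Either entry may be missing or None, so fall back to empty containers.
        pipelines = (engine.get("pipeline_statuses") or {}).get("pipelines", [])
        models = (engine.get("model_statuses") or {}).get("models", [])
        if any(item.get("status") != "Running" for item in pipelines + models):
            return False
    return all(lb.get("status") == "Running" for lb in status.get("engine_lbs", []))


# Abbreviated copy of the status dictionary shown above.
sample_status = {
    "status": "Running",
    "engines": [{
        "status": "Running",
        "pipeline_statuses": {"pipelines": [{"id": "bikedayevalpipeline2", "status": "Running"}]},
        "model_statuses": {"models": [{"name": "bikedaymodel2", "status": "Running"}]},
    }],
    "engine_lbs": [{"status": "Running"}],
}
print(is_fully_running(sample_status))  # prints True
```

In a notebook, the same function could be applied to the result of `pipeline.status()`.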

### Run Inference

Run an inference using the evaluation data in the JSON file `bike_day_eval.df.json`, loaded into a pandas DataFrame.

In [95]:
inferData = pd.read_json('./bike_day_eval.df.json', orient="records")

results = pipeline.infer(inferData)


# results = pipeline.infer_from_file('bike_day_eval.df.json')
display(results)

InferenceError: Inference failed: 
	InternalError: Error running inference <class 'AttributeError'>, 'str' object has no attribute 'loc'
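
The message `'str' object has no attribute 'loc'` indicates that some code called `.loc` on a string, which suggests the inference script received text where it expected a pandas DataFrame. One possible guard, shown here as a sketch only, is to decode JSON text into records before building a DataFrame. The helper `to_records` and the field names in the sample payload are hypothetical; they are not taken from `infer.py`, and the source does not confirm the actual cause of the failure in this run.

```python
import json


def to_records(data):
    """Decode JSON text into Python objects; pass already-decoded data through."""
    if isinstance(data, (bytes, bytearray)):
        data = data.decode("utf-8")
    if isinstance(data, str):
        return json.loads(data)
    return data


# Hypothetical payload; the field names are placeholders, not the real columns.
raw = '[{"temp": 0.31, "windspeed": 0.16}]'
records = to_records(raw)
print(type(records).__name__, records[0]["temp"])  # prints: list 0.31
```

Inside a script, the decoded records could then be passed to `pandas.DataFrame(records)` before any `.loc` indexing.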

### Undeploy the Pipeline

Undeploy the pipeline to return its resources to the Wallaroo instance.

In [None]:
pipeline.undeploy()