## Statsmodel Forecast with Wallaroo Features: Model Creation

This tutorial series demonstrates how to use Wallaroo to create a statsmodels forecasting model based on bike rentals.  The forecast uses the statsmodels ARIMA model, which is packaged as a Python model in these steps.

This notebook demonstrates the model outside of a Wallaroo deployment.  Other notebooks in the series focus on the core pillars of using Wallaroo with that model to:

* Upload
* Deploy
* Observe
* Optimize
* Automate

## Prerequisites

* A Wallaroo instance version 2023.2.1 or greater.

## References

* [Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Python Models](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-upload-python/)
* [Wallaroo SDK Essentials Guide: Pipeline Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline/)
* [Wallaroo SDK Essentials: Inference Guide: Parallel Inferences](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-inferences/#parallel-inferences)

## Tutorial Steps

### Import Libraries

First we will import the libraries we need.

In [1]:
import pandas as pd
import datetime
import os

from statsmodels.tsa.arima.model import ARIMA
from resources import simdb


import pyarrow as pa

### Test the Model

The model's main function is contained in this code from `forecast_standard.py`:

```python
from statsmodels.tsa.arima.model import ARIMA

def _fit_model(dataframe):
    model = ARIMA(dataframe['count'], 
                    order=(1, 0, 1)
                    ).fit()
    return model
```

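To create other versions of this model, change the `order` argument.  It sets the ARIMA `(p, d, q)` parameters: the autoregressive order, the degree of differencing, and the moving-average order.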

Python models uploaded to Wallaroo must define the function `wallaroo_json` as the entry point for the Wallaroo engine.  In this example, it takes data retrieved from the CSV file `day.csv`, shapes it into a DataFrame, passes it to `_fit_model` above, and returns the forecast.

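```python
import json

import pandas as pd

def wallaroo_json(data):
    # decode the JSON payload into a DataFrame with a `count` column
    obj = json.loads(data)
    evaluation_frame = pd.DataFrame.from_dict(obj)

    # fit the ARIMA model and forecast the next 7 days
    nforecast = 7
    model = _fit_model(evaluation_frame)

    forecast = model.forecast(steps=nforecast).round().to_numpy()
    forecast = forecast.astype(int)

    return {"forecast": forecast.tolist()}
```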

The final result is a prediction of the next 7 days of bike rentals based on the previous month.
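The entry point above expects a JSON string that decodes to an object with a `count` list. As a minimal sketch of that input contract, the hypothetical helper `parse_counts` below (not part of `forecast_standard.py`) decodes a payload and rejects malformed input before any model fitting would happen.

```python
import json


def parse_counts(data: str) -> list:
    """Hypothetical helper: decode a JSON payload and return its list of counts."""
    obj = json.loads(data)
    counts = obj.get("count") if isinstance(obj, dict) else None
    if not isinstance(counts, list) or not counts:
        raise ValueError("payload must be a JSON object with a non-empty 'count' list")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not all(isinstance(c, (int, float)) and not isinstance(c, bool) for c in counts):
        raise ValueError("'count' must contain only numbers")
    return counts


# Same shape that json.dumps(training_frame.to_dict(orient='list')) would produce.
payload = json.dumps({"count": [1526, 1550, 1708, 1005]})
counts = parse_counts(payload)
print(counts)
```

Whether validation of this kind belongs inside the model file or in the calling code is a design choice; the tutorial's own `wallaroo_json` performs none.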

The code below uses a simulated database to retrieve one month of bike rental data ending on March 1, 2011, then uses that data to evaluate the model and check its forecasts.

In [2]:
def mk_dt_range_query(*, tablename: str, seed_day: str) -> str:
    assert isinstance(tablename, str)
    assert isinstance(seed_day, str)
    query = f"select count from {tablename} where date > DATE(DATE('{seed_day}'), '-1 month') AND date <= DATE('{seed_day}')"
    return query

conn = simdb.get_db_connection()

# create the query
query = mk_dt_range_query(tablename=simdb.tablename, seed_day='2011-03-01')
print(query)

# read in the data
training_frame = pd.read_sql_query(query, conn)
training_frame

select count from bikerentals where date > DATE(DATE('2011-03-01'), '-1 month') AND date <= DATE('2011-03-01')


|   | count |
|---|---|
| 0 | 1526 |
| 1 | 1550 |
| 2 | 1708 |
| 3 | 1005 |
| 4 | 1623 |
| 5 | 1712 |
| 6 | 1530 |
| 7 | 1605 |
| 8 | 1538 |
| 9 | 1746 |
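The date window selected by this query can be checked in isolation. The sketch below assumes the simulated database behaves like SQLite (the `DATE(..., '-1 month')` modifier syntax is SQLite's) and uses a small made-up table, so the rows and values are purely illustrative. It also binds `seed_day` as a query parameter instead of formatting it into the SQL string.

```python
import sqlite3

# In-memory stand-in for the simulated database; rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bikerentals (date TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO bikerentals VALUES (?, ?)",
    [
        ("2011-01-31", 10),
        ("2011-02-01", 20),  # exactly one month before the seed day: excluded by '>'
        ("2011-02-02", 30),
        ("2011-03-01", 40),  # the seed day itself: included by '<='
        ("2011-03-02", 50),
    ],
)

seed_day = "2011-03-01"
query = (
    "SELECT date, count FROM bikerentals "
    "WHERE date > DATE(DATE(?), '-1 month') AND date <= DATE(?) "
    "ORDER BY date"
)
window = conn.execute(query, (seed_day, seed_day)).fetchall()
print(window)  # [('2011-02-02', 30), ('2011-03-01', 40)]
```

The lower bound is exclusive and the upper bound inclusive, which is why a seed day of `2011-03-01` yields the 28 days from February 2 through March 1.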


Next, convert the training data into a DataFrame with a single row whose `count` entry holds the entire list of values.

In [16]:
inference_frame = pd.DataFrame({'count': [training_frame['count'].values.tolist()]})

display(inference_frame)

|   | count |
|---|---|
| 0 | [1526, 1550, 1708, 1005, 1623, 1712, 1530, 160... |


In [18]:
inference_frame.loc[0, 'count']

[1526,
 1550,
 1708,
 1005,
 1623,
 1712,
 1530,
 1605,
 1538,
 1746,
 1472,
 1589,
 1913,
 1815,
 2115,
 2475,
 2927,
 1635,
 1812,
 1107,
 1450,
 1917,
 1807,
 1461,
 1969,
 2402,
 1446,
 1851]
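A model that receives this single-row frame has to unpack the list before it can fit a time series. The following is a minimal sketch of that round trip with a shortened list of counts; the variable name `evaluation_frame` is borrowed from `wallaroo_json` above, and the unpacking step itself is an assumption about how such a frame might be consumed.

```python
import pandas as pd

counts = [1526, 1550, 1708, 1005, 1623]

# One row, one column: the cell holds the whole list.
inference_frame = pd.DataFrame({"count": [counts]})

# Unpack the list back into one row per observation.
evaluation_frame = pd.DataFrame({"count": inference_frame.loc[0, "count"]})

print(inference_frame.shape)   # (1, 1)
print(evaluation_frame.shape)  # (5, 1)
```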

### Test the Forecast

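The cell below imports `forecast_standard` and calls its `wallaroo_json` entry point with a small sample inference frame.  The commented-out lines show the alternative of serializing `training_frame` to a JSON string first.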

In [35]:
# test
import forecast_standard
import json

# reload if the model was changed since last run
import importlib
importlib.reload(forecast_standard)

# create the appropriate json
# jsonstr = json.dumps(training_frame.to_dict(orient='list'))
# print(jsonstr)

# result = forecast_standard.wallaroo_json(jsonstr)
# print(result)

inference_frame = pd.DataFrame({"count": [15]})
display(inference_frame)
display(pa.Schema.from_pandas(inference_frame))

result = forecast_standard.wallaroo_json(inference_frame)
print(result)

|   | count |
|---|---|
| 0 | 15 |


count: int64
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 365

[{'forecast': 15}]


In [31]:
test = pd.DataFrame(result)

input_schema = pa.Schema.from_pandas(inference_frame)
output_schema = pa.Schema.from_pandas(test)

display(test)
display(input_schema)
display(output_schema)

|   | forecast |
|---|---|
| 0 | 15 |


count: list<item: int64>
  child 0, item: int64
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 372

forecast: int64
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 371

In [23]:
# test
import step

# reload if the model was changed since last run
import importlib
importlib.reload(step)

step.wallaroo_json(result)

[array([1764, 1749, 1743, 1741, 1740, 1740, 1740]), {'average_rentals': 1500}]

We'll also test other versions of this model.  `forecast_alternate01` has the ARIMA `order` set to `order=(1, 1, 0)`, and `forecast_alternate02` has the ARIMA `order` set to `order=(0, 1, 1)`.

In [8]:
# test
import forecast_alternate01
import json

# reload if the model was changed since last run
import importlib
importlib.reload(forecast_alternate01)

# create the appropriate json
jsonstr = json.dumps(training_frame.to_dict(orient='list'))

forecast_alternate01.wallaroo_json(jsonstr)

{'forecast': [1703, 1757, 1737, 1744, 1742, 1743, 1742]}

In [9]:
# test
import forecast_alternate02
import json

# reload if the model was changed since last run
import importlib
importlib.reload(forecast_alternate02)

# create the appropriate json
jsonstr = json.dumps(training_frame.to_dict(orient='list'))

forecast_alternate02.wallaroo_json(jsonstr)

{'forecast': [1814, 1814, 1814, 1814, 1814, 1814, 1814]}