This tutorial and the assets can be downloaded as part of the [Wallaroo Tutorials repository](https://github.com/WallarooLabs/Wallaroo_Tutorials/tree/main/wallaroo-features/pipeline_multiple_replicas_forecast_tutorial).

## Statsmodel Forecast with Wallaroo Features: Model Creation

This tutorial series demonstrates how to use Wallaroo to create a statsmodels forecasting model based on bike rentals. The series is broken down into the following parts:

* Create and Train the Model:  This first notebook shows how the model is trained from existing data.
* Deploy and Sample Inference:  With the model developed, we will deploy it into Wallaroo and perform a sample inference.
* Sample Inferences from Database Records:  Simulate pulling inference input data from a database, performing inferences, and uploading the results to the database.


## Prerequisites

* A Wallaroo instance version 2024.1 or greater.

## References

* [Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Python Models](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-upload-python/)
* [Wallaroo SDK Essentials Guide: Pipeline Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline/)
* [Wallaroo SDK Essentials: Inference Guide: Parallel Inferences](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-inferences/#parallel-inferences)

```python
import pandas as pd
import datetime
import os

import numpy as np

from statsmodels.tsa.arima.model import ARIMA
from resources import simdb

import wallaroo
```

### Train the Model

The training data starts with the local file `day.csv`, which is loaded and prepared for training the model.

For this example, the simulated database is controlled by the `simdb` module in `resources`.
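The `resources/simdb` module is not shown in this tutorial. The sketch below is one possible shape for a simulated database and is not the tutorial's implementation. It assumes SQLite, since the `DATE(..., '-1 month')` syntax in the query suggests it. The hypothetical `get_db_connection` loads CSV rows into an in-memory table named `bikerentals`, with the `date` and `count` columns that the query relies on. The CSV column names `dteday` and `cnt` are an assumption borrowed from the public UCI bike-sharing `day.csv` layout, and the row values are synthetic placeholders.

```python
import csv
import io
import sqlite3

# Placeholder rows in an ASSUMED day.csv layout: 'dteday' is the date and
# 'cnt' the daily rental count. These values are synthetic, not real data.
CSV_TEXT = """dteday,cnt
2011-02-27,100
2011-02-28,200
2011-03-01,300
"""


def get_db_connection(csv_text: str, tablename: str = "bikerentals") -> sqlite3.Connection:
    """Hypothetical stand-in for simdb.get_db_connection(): load CSV rows
    into an in-memory SQLite table with 'date' and 'count' columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {tablename} (date TEXT, count INTEGER)")
    rows = [
        (record["dteday"], int(record["cnt"]))
        for record in csv.DictReader(io.StringIO(csv_text))
    ]
    # bound parameters for the row values; only the table name is interpolated
    conn.executemany(f"INSERT INTO {tablename} (date, count) VALUES (?, ?)", rows)
    conn.commit()
    return conn


conn = get_db_connection(CSV_TEXT)
print(conn.execute("SELECT COUNT(*) FROM bikerentals").fetchone()[0])  # 3
```

Storing dates as ISO `YYYY-MM-DD` text keeps them compatible with SQLite's `DATE()` function and with plain string comparison.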

```python
def mk_dt_range_query(*, tablename: str, seed_day: str) -> str:
    """Build a query for the counts in the one-month window ending on seed_day (inclusive)."""
    assert isinstance(tablename, str)
    assert isinstance(seed_day, str)
    query = f"select count from {tablename} where date > DATE(DATE('{seed_day}'), '-1 month') AND date <= DATE('{seed_day}')"
    return query

# connect to the simulated database
conn = simdb.get_db_connection()

# create the query
query = mk_dt_range_query(tablename=simdb.tablename, seed_day='2011-03-01')
print(query)

# read in the data
training_frame = pd.read_sql_query(query, conn)
training_frame
```

```
select count from bikerentals where date > DATE(DATE('2011-03-01'), '-1 month') AND date <= DATE('2011-03-01')
```

|   | count |
|---|------:|
| 0 | 1526 |
| 1 | 1550 |
| 2 | 1708 |
| 3 | 1005 |
| 4 | 1623 |
| 5 | 1712 |
| 6 | 1530 |
| 7 | 1605 |
| 8 | 1538 |
| 9 | 1746 |
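The query above interpolates `seed_day` directly into the SQL string. The sketch below shows a variant that binds `seed_day` as a `?` parameter. The table name cannot be bound, so it is still interpolated. The sketch also checks the window against a throwaway in-memory SQLite table that holds one synthetic row per day. The helper name `mk_dt_range_query_params` is hypothetical. With daily rows, a seed day of `2011-03-01` selects 2011-02-02 through 2011-03-01, which is 28 rows.

```python
import sqlite3
from datetime import date, timedelta


def mk_dt_range_query_params(*, tablename: str) -> str:
    """Hypothetical parameterized variant: seed_day is bound twice via '?'."""
    return (
        f"SELECT count FROM {tablename} "
        "WHERE date > DATE(DATE(?), '-1 month') AND date <= DATE(?)"
    )


# Throwaway table: one synthetic row per day for 90 days; count = day index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bikerentals (date TEXT, count INTEGER)")
start = date(2011, 1, 1)
conn.executemany(
    "INSERT INTO bikerentals VALUES (?, ?)",
    [((start + timedelta(days=i)).isoformat(), i) for i in range(90)],
)

seed_day = "2011-03-01"
query = mk_dt_range_query_params(tablename="bikerentals")
rows = conn.execute(query, (seed_day, seed_day)).fetchall()

# lower (exclusive) bound of the window computed by SQLite
lower = conn.execute("SELECT DATE(DATE(?), '-1 month')", (seed_day,)).fetchone()[0]
print(lower, len(rows))  # 2011-02-01 28
```

With `pd.read_sql_query`, the same binding can be passed through its `params` argument.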


## Test the Forecast

The training frame is then passed to the `forecast` model to test it.

```python
import importlib

from models import forecast_standard as forecast

# reload the module so that any edits to models/forecast_standard.py are picked up
importlib.reload(forecast)

# package the training counts as a dictionary of arrays keyed by 'count'
data = {
    'count': np.asarray(training_frame['count'])
}
display(data)

# run the forecast against the training data
forecast.process_data(data)
```

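The contents of `models/forecast_standard.py` are not shown in this section. The sketch below is a hypothetical stand-in that only illustrates the call shape: a dictionary with a `count` sequence goes in, and forecasts come out. It does not use the statsmodels `ARIMA` model imported at the top. Instead, it fits an AR(1) model, y[t] = c + phi * y[t-1], by ordinary least squares so that it runs with the standard library alone. The function body, the `horizon` parameter, and the output keys are all assumptions.

```python
from statistics import fmean


def process_data(data: dict, horizon: int = 7) -> dict:
    """Hypothetical stand-in for forecast.process_data(): OLS AR(1) fit
    y[t] = c + phi * y[t-1], then a recursive `horizon`-step forecast."""
    y = [float(v) for v in data["count"]]  # works for lists or NumPy arrays
    if len(y) < 3:
        raise ValueError("need at least 3 observations")
    x_prev, x_next = y[:-1], y[1:]
    mx, my = fmean(x_prev), fmean(x_next)
    sxx = sum((a - mx) ** 2 for a in x_prev)
    if sxx == 0:
        raise ValueError("constant series; AR(1) slope is undefined")
    phi = sum((a - mx) * (b - my) for a, b in zip(x_prev, x_next)) / sxx
    c = my - phi * mx
    # feed each prediction back in as the next lagged value
    forecast, last = [], y[-1]
    for _ in range(horizon):
        last = c + phi * last
        forecast.append(last)
    return {"forecast": forecast, "phi": phi, "const": c}


# Usage: a series generated exactly by y[t] = 2 + 0.5 * y[t-1]
series = [10.0, 7.0, 5.5, 4.75, 4.375, 4.1875]
result = process_data({"count": series}, horizon=2)
print(round(result["phi"], 6), round(result["const"], 6))  # 0.5 2.0
```

A real ARIMA fit estimates its parameters differently (statsmodels uses maximum likelihood), so this sketch should be read as an illustration of the interface rather than of the model.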

### Reload New Model

The `forecast` model is reloaded in preparation for creating the evaluation data.

```python
import importlib
importlib.reload(forecast)
```

```
<module 'models.forecast_standard' from '/home/jovyan/pipeline_multiple_replicas_forecast_tutorial/models/forecast_standard.py'>
```
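`importlib.reload` re-executes a module's source in place, so names bound to the module object see the new definitions. The self-contained sketch below writes a throwaway module to a temporary directory, imports it, edits the file, and reloads it. The module name `demo_forecast` and its `HORIZON` constant are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # avoid stale .pyc files in this demo

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_forecast.py"
    module_path.write_text("HORIZON = 7\n")
    sys.path.insert(0, tmp)
    try:
        import demo_forecast

        first = demo_forecast.HORIZON

        # edit the source file, then reload the already-imported module
        module_path.write_text("HORIZON = 14  # edited\n")
        importlib.invalidate_caches()
        importlib.reload(demo_forecast)
        second = demo_forecast.HORIZON
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_forecast", None)

print(first, second)  # 7 14
```

Objects created from the old module version, such as instances of its classes, are not updated by a reload. Re-creating them after reloading avoids mixing old and new definitions.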

### Prepare Evaluation Data

To simplify the inferences that follow, the evaluation data is saved to a separate JSON file.

```python
# save off the evaluation frame
import json

with open("./data/testdata.json", "w") as outfile:
    json.dump({
        'count': training_frame['count'].tolist()
    }, outfile)
```
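The `.tolist()` call matters because `json` cannot serialize a NumPy array directly. The sketch below round-trips the same `{'count': [...]}` structure through a file in a temporary directory and reads it back as an inference step would. It creates the `data` directory first in case it is missing. The counts are synthetic placeholders standing in for `training_frame['count'].tolist()`.

```python
import json
import tempfile
from pathlib import Path

# Synthetic placeholder counts; in the notebook these come from
# training_frame['count'].tolist(), which yields plain Python ints.
counts = [100, 200, 300]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data" / "testdata.json"
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/ if missing
    with path.open("w") as outfile:
        json.dump({"count": counts}, outfile)
    # read the file back, as a later inference step would
    with path.open() as infile:
        loaded = json.load(infile)

print(loaded == {"count": counts})  # True
```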