## Demo deserialization and prediction with a pipeline stored on the file system
The pipeline used in this demo comes from the notebook "BC5_pipeline_training_github.ipynb".
It generates calendar and holiday features from the DatetimeIndex of the input DataFrame, learns the occupancy patterns of a zone from movement sensor data, and forecasts future occupancy.
The pipeline must be serialized and deserialized with cloudpickle.
The only features used here are the ones extracted from the DatetimeIndex.
We therefore skip the harmonization step.
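
To make the idea of "features extracted from the DatetimeIndex" concrete, here is a minimal sketch using only the Python standard library. The feature names (`hour`, `day_of_week`, `month`, `is_weekend`) and both helper functions are assumptions chosen for illustration; the features the trained pipeline actually computes are defined in the training notebook and may differ.

```python
from datetime import datetime, timedelta, timezone


def hourly_range(start, periods):
    """Return `periods` consecutive hourly timestamps starting at `start`."""
    return [start + timedelta(hours=i) for i in range(periods)]


def calendar_features(ts):
    """Derive a few calendar features from one timestamp (illustrative set)."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday=0 ... Sunday=6
        "month": ts.month,
        "is_weekend": int(ts.weekday() >= 5),
    }


# Two days of hourly timestamps starting at the beginning of the forecast window.
start = datetime(2022, 7, 1, tzinfo=timezone.utc)
features = [calendar_features(ts) for ts in hourly_range(start, 48)]

print(features[0])   # Friday 2022-07-01 00:00 UTC
print(features[24])  # Saturday 2022-07-02 00:00 UTC
```

Every feature is a pure function of the timestamp, which is why no sensor readings are needed at prediction time.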

#### - Generate a DatetimeIndex covering the time range of the predictions
Because the pipeline derives all of its features from the index, an empty DataFrame with a DatetimeIndex is the only input we need.

In [15]:
import pytz
import pandas as pd
# Empty DataFrame with an hourly, UTC-aware index spanning the forecast window
new_X_data = pd.DataFrame(index=pd.date_range(start='2022/07/01', end='2022/12/31', tz=pytz.utc, freq='H'))
new_X_data.index.name = "timestamp"
new_X_data

2022-07-01 00:00:00+00:00
2022-07-01 01:00:00+00:00
2022-07-01 02:00:00+00:00
2022-07-01 03:00:00+00:00
2022-07-01 04:00:00+00:00
...
2022-12-30 20:00:00+00:00
2022-12-30 21:00:00+00:00
2022-12-30 22:00:00+00:00
2022-12-30 23:00:00+00:00
2022-12-31 00:00:00+00:00


#### - Generate column of predictions using the cloudpickled pipeline

In [16]:
from ai_toolbox.data_modelling import deserialize_and_predict
from os.path import join
from os import getcwd

dataset_dir = join(getcwd(), "static_data")
model_filename = "best_pipeline_bc5"
extension = ".cloudpickle"
model_full_path = join(dataset_dir, "{}{}".format(model_filename, extension))
new_X_data["y_pred"] = deserialize_and_predict(model_full_path, new_X_data)
new_X_data

timestamp,y_pred
2022-07-01 00:00:00+00:00,0
2022-07-01 01:00:00+00:00,0
2022-07-01 02:00:00+00:00,0
2022-07-01 03:00:00+00:00,0
2022-07-01 04:00:00+00:00,0
...,...
2022-12-30 20:00:00+00:00,0
2022-12-30 21:00:00+00:00,0
2022-12-30 22:00:00+00:00,0
2022-12-30 23:00:00+00:00,0
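
As a rough picture of what a "load, then predict" helper involves, here is a minimal sketch. It is not the implementation of `deserialize_and_predict` from `ai_toolbox`: the function `load_and_predict`, the class `AlwaysZeroModel`, and the file name `toy_pipeline.cloudpickle` are all hypothetical. If cloudpickle is not installed, the sketch falls back to the standard `pickle` module as a stand-in for writing the file.

```python
import os
import pickle
import tempfile

try:
    import cloudpickle as serializer  # the format the notebook requires
except ImportError:
    serializer = pickle  # stand-in so the sketch still runs without cloudpickle


class AlwaysZeroModel:
    """Toy stand-in for a fitted pipeline: predicts 0 for every input row."""

    def predict(self, X):
        return [0 for _ in X]


def load_and_predict(path, X):
    """Hypothetical helper: load a pickled model from `path` and call predict."""
    with open(path, "rb") as fh:
        # Files written by cloudpickle can be read with the standard pickle.load
        model = pickle.load(fh)
    return model.predict(X)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_pipeline.cloudpickle")
    with open(path, "wb") as fh:
        serializer.dump(AlwaysZeroModel(), fh)
    predictions = load_and_predict(path, ["2022-07-01 00:00", "2022-07-01 01:00"])

print(predictions)
```

Only load pickled files from sources you trust, since unpickling can execute arbitrary code.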


#### - Generate holidays column only for visualization purposes

In [17]:
from ai_toolbox.data_transformation import HolidayTransformer

new_X_data = HolidayTransformer(country='GR').fit_transform(new_X_data)

new_X_data

timestamp,y_pred,holiday
2022-07-01 00:00:00+00:00,0,0
2022-07-01 01:00:00+00:00,0,0
2022-07-01 02:00:00+00:00,0,0
2022-07-01 03:00:00+00:00,0,0
2022-07-01 04:00:00+00:00,0,0
...,...,...
2022-12-30 20:00:00+00:00,0,0
2022-12-30 21:00:00+00:00,0,0
2022-12-30 22:00:00+00:00,0,0
2022-12-30 23:00:00+00:00,0,0
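
To illustrate what a binary holiday column encodes, here is a minimal sketch built on a small hand-written set of fixed-date Greek public holidays that fall inside the forecast window (15 August, 28 October, 25 and 26 December). The set `FIXED_HOLIDAYS` and the function `holiday_flag` are assumptions for illustration only; how `HolidayTransformer` works internally is not shown in this notebook, and a real holiday calendar also covers movable feasts.

```python
from datetime import datetime, timedelta, timezone

# Illustrative subset of fixed-date Greek public holidays, as (month, day) pairs.
FIXED_HOLIDAYS = {(8, 15), (10, 28), (12, 25), (12, 26)}


def holiday_flag(ts):
    """Return 1 if the timestamp's calendar date is in the illustrative set, else 0."""
    return int((ts.month, ts.day) in FIXED_HOLIDAYS)


# Four hourly timestamps that cross midnight into 15 August.
start = datetime(2022, 8, 14, 22, tzinfo=timezone.utc)
timestamps = [start + timedelta(hours=i) for i in range(4)]
flags = [holiday_flag(ts) for ts in timestamps]

for ts, flag in zip(timestamps, flags):
    print(ts.isoformat(), flag)
```

The flag switches from 0 to 1 at midnight UTC in this sketch; a production transformer may evaluate the date in local time instead.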


#### - Visualize the predictions

In [18]:
import pandas_bokeh
pandas_bokeh.output_notebook()

new_X_data.plot_bokeh.area(
    title = "Occupancy Forecast",
    x=new_X_data.index, 
    y=["y_pred", "holiday"],
    figsize=(1200, 600),
    ylim=(0, 1.1),
    zooming=True,
    panning=True
)

#### - Let's make a multistep prediction for the next 9 hours, starting from now
We just need to generate a DatetimeIndex covering the range we want.

In [19]:
from datetime import datetime

hours_ahead = 9
# datetime.utcnow() returns a naive timestamp; tz=pytz.utc marks the resulting index as UTC
one_step = pd.DataFrame(index=pd.date_range(start=datetime.utcnow(), periods=hours_ahead, tz=pytz.utc, freq='H'))
one_step

2023-05-24 16:18:15.521874+00:00
2023-05-24 17:18:15.521874+00:00
2023-05-24 18:18:15.521874+00:00
2023-05-24 19:18:15.521874+00:00
2023-05-24 20:18:15.521874+00:00
2023-05-24 21:18:15.521874+00:00
2023-05-24 22:18:15.521874+00:00
2023-05-24 23:18:15.521874+00:00
2023-05-25 00:18:15.521874+00:00


In the result, 1 means occupied and 0 means unoccupied.

In [20]:
deserialize_and_predict(model_full_path, one_step)

array([1, 1, 1, 0, 0, 0, 0, 0, 0])
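
The returned array carries no timestamps, so it can help to pair each prediction with its forecast hour and a readable label. The sketch below does this with the standard library, reusing the start time and the predictions shown in the outputs above; the `LABELS` mapping and the `forecast` variable are names introduced here for illustration.

```python
from datetime import datetime, timedelta, timezone

# 1 means occupied, 0 means unoccupied
LABELS = {0: "unoccupied", 1: "occupied"}

# Values copied from the notebook outputs above.
start = datetime(2023, 5, 24, 16, 18, 15, 521874, tzinfo=timezone.utc)
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0]

# One hourly timestamp per prediction, in the same order as the array.
timestamps = [start + timedelta(hours=i) for i in range(len(predictions))]
forecast = [(ts.isoformat(), LABELS[p]) for ts, p in zip(timestamps, predictions)]

for ts, label in forecast:
    print(ts, label)
```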