
# DEV NOTEBOOK

**NOTE**
This tip has been incorporated into the ARIMA_PLUS_XREG notebook along with expanded examples.  Please see: [BQML Multivariate Forecasting with ARIMA+ XREG](../../Applied%20Forecasting/BQML%20Multivariate%20Forecasting%20with%20ARIMA+%20XREG.ipynb)

# Using BQML ARIMA_PLUS_XREG With Multiple Time Series

This repository contains a [series on forecasting methods in GCP](../../Applied%20Forecasting/readme.md).

One of the methods covered is [BQML Multivariate Forecasting with ARIMA+ XREG](../../Applied%20Forecasting/BQML%20Multivariate%20Forecasting%20with%20ARIMA+%20XREG.ipynb).  While in preview (as of April 6, 2023), this model type (`MODEL_TYPE = 'ARIMA_PLUS_XREG'`) fits one time series at a time. In contrast, `MODEL_TYPE = 'ARIMA_PLUS'`, used for univariate ARIMA-based forecasting, has the parameter `time_series_id_col`, which specifies a column that identifies which time series each row belongs to.

This short notebook presents a temporary workaround.  The method is to create a separate forecasting model for each time series.  This is done by using parameterized SQL queries launched from the Python Client for BigQuery.

---

**Prerequisites:**
- [BigQuery Time Series Forecasting Data Review and Preparation](./BigQuery%20Time%20Series%20Forecasting%20Data%20Review%20and%20Preparation.ipynb)
    - prepare data for this notebook

---
## Colab Setup

To run this notebook in Colab click [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20Forecasting/BQML%20Multivariate%20Forecasting%20with%20ARIMA+%20XREG.ipynb) and run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [473]:
try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Setup

inputs:

In [13]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [19]:
REGION = 'us-central1'
EXPERIMENT = 'bqml-arimaplusxreg'
SERIES = 'applied-forecasting'

BQ_PROJECT = PROJECT_ID
BQ_DATASET = SERIES.replace('-','_')
BQ_TABLE = 'forecasting-data_prepped'

viz_limit = 12

packages:

In [44]:
from google.cloud import bigquery

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

from time import sleep

clients:

In [21]:
bq = bigquery.Client(project = PROJECT_ID)

---
## Workaround

Define forecasting parameters:

In [38]:
# CUSTOMIZE
TARGET_COLUMN = 'num_trips'
TIME_COLUMN = 'starttime'
SERIES_COLUMN = 'start_station_name'
SPLIT_COLUMN = 'splits'
COVARIATE_COLUMNS = ['avg_tripduration', 'pct_subscriber', 'ratio_gender'] # could be empty

# CUSTOMIZE
FORECAST_GRANULARITY = 'DAILY' # the data preparation included preparing the data at this level
FORECAST_HORIZON_LENGTH = 14
FORECAST_TEST_LENGTH = 14 # the data preparation included setting this value for splits = TEST
FORECAST_VALIDATE_LENGTH = 14 # the data preparation included setting this value for splits = VALIDATE

Retrieve a list of the time series IDs:

In [39]:
query = f"""
SELECT DISTINCT {SERIES_COLUMN}
FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
"""
time_series_id_col = bq.query(query).to_dataframe()
time_series_id_col = time_series_id_col[SERIES_COLUMN].tolist()
time_series_id_col

['Central Park S & 6 Ave',
 'Central Park West & W 72 St',
 'Grand Army Plaza & Central Park S',
 'W 82 St & Central Park West',
 'Central Park West & W 100 St',
 'Central Park West & W 85 St',
 'Central Park North & Adam Clayton Powell Blvd',
 'Central Park West & W 76 St',
 'Central Park West & W 68 St',
 'Central Park West & W 102 St',
 'Central Park W & W 96 St',
 'W 106 St & Central Park West']

Make a function that creates queries:

In [40]:
def make_model(ts_number, ts):
    # Build the CREATE MODEL statement for one series; ts_number keeps each model name unique.
    query = f"""
        # create a model for {SERIES_COLUMN} = '{ts}'
        CREATE OR REPLACE MODEL `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_arimaplusxreg_{ts_number}`
        OPTIONS
          (model_type = 'ARIMA_PLUS_XREG',
           time_series_timestamp_col = '{TIME_COLUMN}',
           time_series_data_col = '{TARGET_COLUMN}',
           #time_series_id_col = '{SERIES_COLUMN}',
           data_frequency = '{FORECAST_GRANULARITY}',
           auto_arima_max_order = 5,
           holiday_region = ['GLOBAL', 'US'],
           horizon = {FORECAST_HORIZON_LENGTH} + {FORECAST_TEST_LENGTH}
          ) AS
        SELECT {TIME_COLUMN}, {TARGET_COLUMN},
            {', '.join(COVARIATE_COLUMNS)}
        FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
        WHERE {SPLIT_COLUMN} in ('TRAIN','VALIDATE')
            AND {SERIES_COLUMN} = '{ts}'
    """
    return query

In [41]:
print(make_model(0, time_series_id_col[0]))


        # create a model for start_station_name = 'Central Park S & 6 Ave'
        CREATE OR REPLACE MODEL `statmike-mlops-349915.applied_forecasting.forecasting-data_prepped_arimaplusxreg_0`
        OPTIONS
          (model_type = 'ARIMA_PLUS_XREG',
           time_series_timestamp_col = 'starttime',
           time_series_data_col = 'num_trips',
           #time_series_id_col = 'start_station_name',
           data_frequency = 'DAILY',
           auto_arima_max_order = 5,
           holiday_region = ['GLOBAL', 'US'],
           horizon = 14 + 14
          ) AS
        SELECT starttime, num_trips,
            avg_tripduration, pct_subscriber, ratio_gender
        FROM `statmike-mlops-349915.applied_forecasting.forecasting-data_prepped`
        WHERE splits in ('TRAIN','VALIDATE')
            AND start_station_name = 'Central Park S & 6 Ave'
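
Because the station name is interpolated directly into the SQL text, a name containing a single quote or a backslash would break the string literal in the `WHERE` clause. The following is a minimal sketch of one way to guard against that. `sql_string_literal` is a hypothetical helper, not part of the original notebook, and it only handles backslashes and single quotes.

```python
def sql_string_literal(value: str) -> str:
    """Return value as a single-quoted SQL string literal (hypothetical helper)."""
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


# A name without special characters is only wrapped in quotes.
print(sql_string_literal("Central Park S & 6 Ave"))

# A made-up name with an apostrophe gets the quote escaped.
print(sql_string_literal("St Mark's Pl"))
```

With this helper, the filter in `make_model` could read `AND {SERIES_COLUMN} = {sql_string_literal(ts)}` instead of quoting `ts` by hand.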
    


In [46]:
bqml_jobs = [bq.query(query = make_model(tsi, ts)) for tsi, ts in enumerate(time_series_id_col)]

In [47]:
# poll until every model training job has finished
while not all(job.done() for job in bqml_jobs):
    print('waiting on all jobs to finish ... sleeping for 5s')
    sleep(5)
# report the error status of each job (None means no error)
for job in bqml_jobs:
    print('Completed with Errors = ', job.error_result)

waiting on all jobs to finish ... sleeping for 5s
waiting on all jobs to finish ... sleeping for 5s
waiting on all jobs to finish ... sleeping for 5s
waiting on all jobs to finish ... sleeping for 5s
Completed with Errors =  None
Completed with Errors =  None
Completed with Errors =  None
Completed with Errors =  None
Completed with Errors =  None
Completed with Errors =  None
Completed with Errors =  None
Completed with Errors =  None
Completed with Errors =  None
Completed with Errors =  None
Completed with Errors =  None
Completed with Errors =  None
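
The output above shows one line per job but not which series each line belongs to. As a sketch, the jobs can be paired with their series names. `summarize_jobs` is a hypothetical helper that relies only on the `error_result` attribute already used above, and the stand-in objects exist only so the example runs without BigQuery.

```python
from types import SimpleNamespace


def summarize_jobs(series_ids, jobs):
    """Map each series name to its job's error_result (None means no error)."""
    return {ts: job.error_result for ts, job in zip(series_ids, jobs)}


# Stand-in objects for illustration only; in the notebook, pass
# time_series_id_col and bqml_jobs instead.
demo_ids = ['series a', 'series b']
demo_jobs = [
    SimpleNamespace(error_result=None),
    SimpleNamespace(error_result={'reason': 'invalidQuery', 'message': 'example'}),
]

status = summarize_jobs(demo_ids, demo_jobs)
failed = [ts for ts, err in status.items() if err is not None]
print(failed)
```

Because the jobs were created in the same order as `time_series_id_col`, `zip` lines each job up with its series, and the `failed` list gives the series whose models would need to be retrained.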


In [48]:
print(f'Direct link to the models in BigQuery:\nhttps://console.cloud.google.com/bigquery?project={PROJECT_ID}&ws=!1m5!1m4!5m3!1s{PROJECT_ID}!2s{BQ_DATASET}!3s{BQ_TABLE}_arimaplusxreg_0')


Direct link to the models in BigQuery:
https://console.cloud.google.com/bigquery?project=statmike-mlops-349915&ws=!1m5!1m4!5m3!1sstatmike-mlops-349915!2sapplied_forecasting!3sforecasting-data_prepped_arimaplusxreg_0
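
Since each series now has its own model, forecasting also has to be done one model at a time. The sketch below builds an `ML.FORECAST` query for a single model, assuming the covariate values for the forecast period come from the rows marked `TEST` in the same table. `make_forecast_query` and the project, dataset, and table names in the example call are hypothetical placeholders, not part of the original notebook.

```python
def make_forecast_query(model, table, ts, time_col, series_col, split_col,
                        covariates, horizon, confidence_level=0.95):
    """Build an ML.FORECAST query for one series' model (hypothetical helper)."""
    # ARIMA_PLUS_XREG needs the timestamp and covariates for the forecast period.
    columns = ', '.join([time_col] + list(covariates))
    return f"""
        SELECT '{ts}' AS {series_col}, *
        FROM ML.FORECAST(
            MODEL `{model}`,
            STRUCT({horizon} AS horizon, {confidence_level} AS confidence_level),
            (
                SELECT {columns}
                FROM `{table}`
                WHERE {split_col} = 'TEST'
                    AND {series_col} = '{ts}'
            )
        )
    """


# Placeholder names for illustration; substitute the notebook's own values.
query = make_forecast_query(
    model='my-project.my_dataset.my_table_arimaplusxreg_0',
    table='my-project.my_dataset.my_table',
    ts='Central Park S & 6 Ave',
    time_col='starttime',
    series_col='start_station_name',
    split_col='splits',
    covariates=['avg_tripduration', 'pct_subscriber', 'ratio_gender'],
    horizon=14,
)
print(query)
```

In the notebook, one such query per series could be run with `bq.query(query).to_dataframe()` and the resulting frames combined with `pd.concat`; the added series column keeps the rows distinguishable after they are combined.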
