In [None]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<table align="left">
  <td><a href="https://colab.research.google.com/github/GoogleCloudPlatform/ai-ml-recipes/blob/main/notebooks/forecast/arima_timesfm_bigquery.ipynb"><img src="https://avatars.githubusercontent.com/u/33467679?s=200&v=4" width="32px" alt="Colab logo"> Run in Colab</a></td>
  <td><a href="https://github.com/GoogleCloudPlatform/ai-ml-recipes/blob/main/notebooks/forecast/arima_timesfm_bigquery.ipynb"><img src="https://github.githubassets.com/assets/GitHub-Mark-ea2971cee799.png" width="32px" alt="GitHub logo"> View on GitHub</a></td>
  <td><a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/ai-ml-recipes/main/notebooks/forecast/arima_timesfm_bigquery.ipynb"><img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"> Open in Vertex AI Workbench</a></td>
  <td><a href="https://console.cloud.google.com/bigquery/import?url=https://github.com/GoogleCloudPlatform/ai-ml-recipes/blob/main/notebooks/forecast/arima_timesfm_bigquery.ipynb"><img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTW1gvOovVlbZAIZylUtf5Iu8-693qS1w5NJw&s" alt="BQ logo" width="35"> Open in BQ Studio</a></td>
  <td><a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fai-ml-recipes%2Fmain%2Fnotebooks/forecast/arima_timesfm_bigquery.ipynb"><img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"> Open in Colab Enterprise</a></td>
</table>

# Time Series Analysis with TimesFM and ARIMA in BigQuery

| Author |
| --- |
| [Jeff Nelson](https://github.com/jeffonelson) |

## Overview

Accurate and granular demand forecasting is critical for managing retail inventory, but forecasting for every individual item and location is a major operational challenge. This tutorial demonstrates how to perform scalable time series forecasting directly in [BigQuery](https://cloud.google.com/bigquery/docs/introduction), using the public [Iowa Liquor Sales](https://console.cloud.google.com/marketplace/product/iowa-department-of-commerce/iowa-liquor-sales) dataset as a practical example.

In this notebook, you will compare two distinct approaches to generate forecasts:
* **A Trained Model:** follow a traditional time series workflow by training an [**ARIMA**](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-time-series) model and generating predictions with [BigQuery Machine Learning](https://cloud.google.com/bigquery/docs/bqml-introduction).
* **A Zero-Shot Model:** use [**TimesFM**](https://github.com/google-research/timesfm), a foundation model, to generate predictions directly from the data, with no explicit model training required.

You will apply both methods to a single aggregated time series and then scale to multiple series to see how each performs at a granular level.

### Objectives

You will learn to:

* Prepare raw data for time series forecasting scenarios in BigQuery
* Train a traditional forecasting model using the [`CREATE MODEL`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-time-series) statement and the [`ML.FORECAST`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-forecast) function (ARIMA)
* Generate zero-shot forecasts directly from data using the [`AI.FORECAST`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-forecast) function (TimesFM)
* Scale your analysis from a single time series to multiple time series
* Visualize and compare the outputs to understand the tradeoffs between the two methods

### Services and Costs

This tutorial uses the following billable components of Google Cloud:

* **BigQuery**: [Pricing](https://cloud.google.com/bigquery/pricing)

* **BigQuery ML**: [Pricing](https://cloud.google.com/bigquery/pricing#bqml)


You can use the [Pricing Calculator](https://cloud.google.com/products/calculator) to generate a cost estimate based on your projected usage.

---

## Before you begin

### Set up your Google Cloud project
**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. [Enable the BigQuery, BigQuery Connection, and Vertex AI APIs](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,aiplatform.googleapis.com).

4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

### Set your project ID

Input your Google Cloud project_id into the box below:

#### Initialize your project ID
Set your Google Cloud project ID and configure the `gcloud` command-line tool to use it.

In [None]:
PROJECT_ID = "data-demo-n25"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

### Authenticate to your Google Cloud account

Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.

**1. Colab Enterprise in BigQuery Studio or Vertex AI**
* Do nothing as you are already authenticated.

**2. Colab Consumer - uncomment and run the following:**

#### Authenticate in Colab Consumer
Authenticate your user in Colab consumer environments to access Google Cloud services.

In [None]:
# from google.colab import auth
# auth.authenticate_user()

**3. Local JupyterLab instance, uncomment and run the following:**



#### Authenticate in Local JupyterLab
Authenticate your user in local JupyterLab environments to access Google Cloud services.

In [None]:
# ! gcloud auth login

#### Create a BigQuery Dataset

Running the following query creates a [BigQuery dataset](https://cloud.google.com/bigquery/docs/datasets-intro) called **`bq_forecasting`** to house any tables or models created in this tutorial:

Create a BigQuery dataset named `bq_forecasting` to store all tables and models created in this notebook.

In [None]:
%%bigquery --project {PROJECT_ID}

CREATE SCHEMA IF NOT EXISTS `bq_forecasting` OPTIONS (location = 'US');

### Set Colab display options

Colab includes the `google.colab.data_table` package that can be used to display large pandas dataframes as an interactive data table. It can be enabled with:

Load the `google.colab.data_table` extension to enable interactive data tables for pandas DataFrames in Colab.

In [None]:
%load_ext google.colab.data_table

## Time Series Forecasting with BigQuery

This tutorial uses the [Iowa Liquor Sales data](https://console.cloud.google.com/marketplace/product/iowa-department-of-commerce/iowa-liquor-sales), which contains every wholesale purchase of liquor in the state of Iowa from January 1, 2012 to today. We'll use this data for retail demand forecasting.

Let's take a quick look at a few rows from the table.

Inspect the first few rows of the raw `iowa_liquor_sales` public dataset to understand its structure.

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT
    invoice_and_item_number,
    date,
    county,
    store_number,
    category_name,
    item_description,
    bottles_sold,
    sale_dollars
FROM
  `bigquery-public-data.iowa_liquor_sales.sales`
WHERE sale_dollars > 0
LIMIT
  5;

### Data Preparation

The `sales` table can contain multiple rows for each date / county / store / item / other field combination. For this tutorial, we'll aggregate the table to generate total sales for each field we're interested in forecasting (e.g. `date`, `item_description`).

We'll create two tables for two intended use cases:

| Table | Granularity | Usage |
|---|---|---|
| `liquor_sales_training_single` | `total_sales` by `date` | Single time series |
| `liquor_sales_training_multiple` | `total_sales` by `date`, `county`, and `item_name` | Multiple time series |

Aggregate the raw sales data to create two new tables:

*   `bq_forecasting.liquor_sales_training_multiple`: Aggregated by `date`, `county`, and `item_name` for multiple time series forecasting.
*   `bq_forecasting.liquor_sales_training_single`: Aggregated by `date` for single time series forecasting.
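
The relationship between the two tables can be illustrated in plain Python. The sketch below uses a few made-up rows (not real values from the dataset) and standard-library containers to show that the single-series table is a roll-up of the multiple-series table. The SQL in the next cell does the same thing at scale.

```python
from collections import defaultdict

# Made-up rows in the shape (date, county, item_name, sale_dollars).
# These values are illustrative only, not taken from the dataset.
raw_rows = [
    ("2024-01-02", "POLK", "BLACK VELVET", 120.0),
    ("2024-01-02", "POLK", "BLACK VELVET", 80.0),
    ("2024-01-02", "LINN", "BLACK VELVET", 50.0),
    ("2024-01-03", "POLK", "BLACK VELVET", 200.0),
]

# Step 1: total sales per (date, county, item_name), like the "multiple" table.
sales_multiple = defaultdict(float)
for date, county, item_name, sale_dollars in raw_rows:
    sales_multiple[(date, county, item_name)] += sale_dollars

# Step 2: roll the result up to one total per date, like the "single" table.
sales_single = defaultdict(float)
for (date, _county, _item_name), total_sales in sales_multiple.items():
    sales_single[date] += total_sales

print(dict(sales_single))
```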

In [None]:
%%bigquery --project {PROJECT_ID}

-- SALES AGGREGATED BY DATE, COUNTY, ITEM_NAME
CREATE OR REPLACE TABLE bq_forecasting.liquor_sales_training_multiple AS (
WITH top_sellers AS(
      SELECT
        item_description,
        SUM( sale_dollars ) AS total_sales
    FROM
        `bigquery-public-data.iowa_liquor_sales.sales`
    GROUP BY
        item_description
    ORDER BY total_sales DESC
    LIMIT 5
)
SELECT
    date,
    county,
    item_description AS item_name,
    SUM( sale_dollars ) AS total_sales
FROM
    `bigquery-public-data.iowa_liquor_sales.sales`
WHERE
    date BETWEEN '2018-01-01' AND '2024-12-31'
    AND item_description IN (SELECT item_description FROM top_sellers)
GROUP BY date, county, item_name
);

-- SALES AGGREGATED BY DATE
CREATE OR REPLACE TABLE bq_forecasting.liquor_sales_training_single AS (
SELECT
    date,
    SUM( total_sales ) AS total_sales
FROM
    `bq_forecasting.liquor_sales_training_multiple`
GROUP BY date
);

## 1. Forecasting a Single Time Series

In time series forecasting, a simple scenario is a **single time series**. This means we are looking to predict a single variable over time. Think of it as having one column for your dates and one column for the value you want to predict.

**Scenario: Total Sales Over Time**

For this example, we'll forecast `total_sales` over time. We aren't breaking down sales by any other category. We will take two approaches using BigQuery ML and see how they compare:

1. **ARIMA**: A widely used statistical model that requires you to explicitly train a model on historical data.
2. **TimesFM**: A pre-trained foundation model that allows you to generate a forecast with a simple function call, no training required.

### Method A: ARIMA

#### Create a Model

Before we can get a forecast, we need to train a dedicated model on our historical data. In BigQuery ML, we accomplish this using the `CREATE MODEL` statement, where we specify our timestamp column (`date`) and the data column we want to forecast (`total_sales`).

Train an `ARIMA_PLUS` model using `BigQuery ML`. This model will learn patterns from the `total_sales` data over time in the `bq_forecasting.liquor_sales_training_single` table.

In [None]:
%%bigquery --project {PROJECT_ID}

CREATE OR REPLACE MODEL bq_forecasting.arima_model_single
OPTIONS(
  MODEL_TYPE='ARIMA_PLUS',
  TIME_SERIES_TIMESTAMP_COL='date',
  TIME_SERIES_DATA_COL='total_sales'
) AS
SELECT
    date,
    total_sales
FROM
  bq_forecasting.liquor_sales_training_single;

#### Generate a Forecast

With our `arima_model_single` model now trained, we can use it to generate the forecast. We do this with the `ML.FORECAST` function, which takes the trained model as input along with a `horizon` (the number of future time steps to predict) and a `confidence_level`. It returns the forecasted values together with the bounds of their prediction interval.
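
To make `horizon` concrete: assuming the series is daily and the last training observation falls on 2024-12-31 (both are assumptions for illustration, since BigQuery infers the actual frequency from the data), a horizon of 30 would cover the dates computed below. The helper name `forecast_dates` is hypothetical.

```python
from datetime import date, timedelta


def forecast_dates(last_observed: date, horizon: int) -> list[date]:
    """Return the `horizon` daily time steps that follow `last_observed`."""
    return [last_observed + timedelta(days=step) for step in range(1, horizon + 1)]


# Assumed last training date; the real one depends on the data in the table.
steps = forecast_dates(date(2024, 12, 31), horizon=30)
print(steps[0], steps[-1], len(steps))
```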

Generate a 30-day sales forecast using the trained `bq_forecasting.arima_model_single`. The `ML.FORECAST` function applies the trained model to predict future `total_sales` values and their confidence intervals.

In [None]:
%%bigquery df_arima --project {PROJECT_ID}

SELECT *
FROM ML.FORECAST(MODEL bq_forecasting.arima_model_single,
       STRUCT(30 AS horizon,
              0.95 AS confidence_level)
      )
ORDER BY forecast_timestamp;

#### Fetch Historical Data for Single Series Plotting

This query retrieves historical sales data for the single time series, which will be used for visualizing the ARIMA forecast.    

Retrieve the most recent historical `total_sales` data from the `bq_forecasting.liquor_sales_training_single` table to provide context for visualizing the ARIMA forecast.

In [None]:
%%bigquery df_history --project {PROJECT_ID}

SELECT
    date,
    total_sales
FROM
  bq_forecasting.liquor_sales_training_single
WHERE
  date BETWEEN '2024-10-01' AND '2024-12-31'
ORDER BY date

#### Visualize the Forecast

To check out our model, we can plot the historical sales data against the forecasted values and their 95% confidence interval.

Use `matplotlib` to plot the historical sales data alongside the ARIMA forecast, including the 95% confidence interval, to visually assess the model's predictions.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(df_history['date'], df_history['total_sales'], label='2024 Historical Data')
plt.plot(df_arima['forecast_timestamp'], df_arima['forecast_value'], label='ARIMA Forecast')
plt.fill_between(df_arima['forecast_timestamp'], df_arima['prediction_interval_lower_bound'], df_arima['prediction_interval_upper_bound'], color='blue', alpha=0.2, label='ARIMA 95% Confidence Interval')
plt.xlabel('Timestamp')
plt.ylabel('Total Sales')
plt.title('ARIMA Forecast vs 2024 Historical Data with 95% Confidence Interval')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

### Method B: TimesFM

Our second method uses TimesFM, which operates differently from "traditional" models like ARIMA. TimesFM is a pre-trained foundation model, meaning the complex model training has already been done. As a result, we can skip the `CREATE MODEL` step entirely and apply it directly to our data for a zero-shot forecast.


#### Generate a Forecast

To get our forecast, we use the [`AI.FORECAST`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-forecast) function. Unlike the two-step process for ARIMA (create, then forecast), this single function takes our historical data table directly as its input. We just configure the forecast by passing arguments for the `horizon`, `timestamp_col`, and `data_col`. Then, BigQuery returns the predicted values in one go.

Below is a quick look at 5 forecasted days.

Generate a 5-day zero-shot forecast using the pre-trained TimesFM model. The `AI.FORECAST` function directly predicts future `total_sales` without explicit model training, displaying a preview of the results.

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT *
FROM
  AI.FORECAST(
    (
      SELECT *
      FROM bq_forecasting.liquor_sales_training_single
    ),
    horizon => 5,
    confidence_level => 0.95,
    timestamp_col => 'date',
    data_col => 'total_sales')
  ;

### Visualize the Results

Let's expand the horizon to 30 days of forecasted data and save it to a DataFrame called `df_timesfm` so we can plot it alongside the ARIMA predictions.

Extend the TimesFM zero-shot forecast to a 30-day horizon, similar to the ARIMA forecast, to enable a direct visual comparison between the two methods.

In [None]:
%%bigquery df_timesfm --project {PROJECT_ID}

SELECT *
FROM
  AI.FORECAST(
    (
      SELECT *
      FROM bq_forecasting.liquor_sales_training_single
    ),
    horizon => 30,
    confidence_level => 0.95,
    timestamp_col => 'date',
    data_col => 'total_sales')
  ORDER BY forecast_timestamp;

Plotting the results together provides a clear, side-by-side comparison. We can see that both models successfully identified the strong weekly seasonality in the sales data. However, the ARIMA forecast consistently underestimates the peaks of the sales cycles, while the TimesFM forecast appears to capture the magnitude of the recent historical data more accurately.
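
One way to put a number on the peak observation is to compare the average weekly peak of the recent history with the average weekly peak of each forecast. The sketch below is a minimal, standard-library illustration on made-up values. `mean_weekly_peak` is a hypothetical helper, and the 7-step window assumes one value per day. With the DataFrames from this notebook you could pass, for example, `df_arima['forecast_value'].tolist()`.

```python
def mean_weekly_peak(values, period=7):
    """Average of the maximum value in each consecutive full window of `period` steps."""
    peaks = [
        max(values[start:start + period])
        for start in range(0, len(values) - period + 1, period)
    ]
    if not peaks:
        raise ValueError("need at least one full window of values")
    return sum(peaks) / len(peaks)


# Made-up numbers for illustration only: two "weeks" of history and two forecasts.
history = [10, 40, 60, 90, 50, 20, 5, 12, 45, 65, 100, 55, 25, 6]
forecast_a = [11, 38, 55, 70, 48, 22, 6, 11, 39, 56, 72, 47, 21, 6]
forecast_b = [10, 41, 62, 92, 52, 21, 5, 12, 44, 63, 96, 54, 24, 6]

history_peak = mean_weekly_peak(history)
for name, forecast in [("forecast_a", forecast_a), ("forecast_b", forecast_b)]:
    ratio = mean_weekly_peak(forecast) / history_peak
    print(f"{name}: peak ratio vs history = {ratio:.2f}")
```

A ratio well below 1 would suggest that a forecast undershoots the recent peaks, while a ratio near 1 would suggest it matches their magnitude.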

Visualize both the ARIMA and TimesFM forecasts against the recent historical data to visually compare their respective predictions and identify any notable differences.

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(df_history['date'], df_history['total_sales'], label='2024 Historical Data')
plt.plot(df_arima['forecast_timestamp'], df_arima['forecast_value'], label='ARIMA Forecast')
plt.plot(df_timesfm['forecast_timestamp'], df_timesfm['forecast_value'], label='TimesFM Forecast')
plt.xlabel('Timestamp')
plt.ylabel('Total Sales')
plt.title('ARIMA vs TimesFM Forecast')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

### Recap

In this scenario we:
* Addressed a single time series problem by forecasting `total_sales` based on `date`.
* Used a two-step process to first train an `ARIMA_PLUS` model with the `CREATE MODEL` statement, and then generate predictions using `ML.FORECAST`.
* Used a one-step, zero-shot approach to generate a forecast directly from the data using `AI.FORECAST` and the TimesFM model.
* Plotted both forecasts alongside the historical data to visually compare the outputs from each method.


---


## 2. Forecasting Multiple Time Series

Forecasting a single series is useful, but most real-world scenarios require more granularity. With **multiple time series forecasting**, we can predict many individual time series at the same time. Instead of one forecast for `total_sales` by date, we will generate a unique forecast for every combination of `county` and `item_name`. This is powerful for tasks like inventory management and revenue predictions.

Handling this complexity in BigQuery is straightforward. We simply specify which columns uniquely identify each individual time series. Both the ARIMA and TimesFM methods support a parameter that tells BigQuery to partition the data, treating each unique ID as a separate series to be forecasted. With this addition, we can scale forecasting efforts from a _single_ series to thousands or more, without a significant change to our workflow.
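
Conceptually, the ID columns split one table into many independent series. The sketch below mimics that partitioning in plain Python on a few made-up rows (`OTHER ITEM` is a placeholder name). It illustrates the idea only and is not how BigQuery implements it.

```python
from collections import defaultdict

# Made-up rows in the shape (date, county, item_name, total_sales).
rows = [
    ("2024-01-02", "POLK", "BLACK VELVET", 200.0),
    ("2024-01-02", "LINN", "BLACK VELVET", 50.0),
    ("2024-01-03", "POLK", "BLACK VELVET", 180.0),
    ("2024-01-03", "POLK", "OTHER ITEM", 310.0),
]

# Each unique (county, item_name) pair identifies one time series.
series_by_id = defaultdict(list)
for date, county, item_name, total_sales in rows:
    series_by_id[(county, item_name)].append((date, total_sales))

# Every series would then be forecasted on its own.
for series_id, points in sorted(series_by_id.items()):
    print(series_id, points)
```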

### Method A: ARIMA

Once again, we will start with the ARIMA model. This process is very similar to the single series example, but with an additional parameter to tell BigQuery how to handle different series within our dataset.

#### Create a Model

To train the model on multiple time series simultaneously, we introduce the `TIME_SERIES_ID_COL` option. This parameter accepts an array of column names that together identify each series. In our case, each series is identified by the combination of `county` and `item_name`, and each row within a series by its `date`.

BigQuery now partitions the data by these IDs and trains a distinct ARIMA model for each individual time series behind the scenes.



Train an `ARIMA_PLUS` model designed for multiple time series. The `TIME_SERIES_ID_COL` option ensures that a separate ARIMA model is implicitly trained for each unique combination of `county` and `item_name`.

In [None]:
%%bigquery --project {PROJECT_ID}

CREATE OR REPLACE MODEL bq_forecasting.arima_model_multiple
OPTIONS(
  MODEL_TYPE='ARIMA_PLUS',
  TIME_SERIES_TIMESTAMP_COL='date',
  TIME_SERIES_DATA_COL='total_sales',
  TIME_SERIES_ID_COL=['county','item_name']
) AS
SELECT
    date,
    total_sales,
    county,
    item_name
FROM
  bq_forecasting.liquor_sales_training_multiple;

#### Generate a Forecast

After the model is trained, the process to generate a forecast is identical to what we did before. We use the `ML.FORECAST` function to call our new model (`arima_model_multiple`), which inherently understands it needs to produce a separate forecast for each unique `county` and `item_name` combination.

The result contains forecasts for all series. To make sense of the output, we will filter the results to visualize the forecast for a single, specific series.

Generate a 50-day forecast for all the multiple time series using the trained `bq_forecasting.arima_model_multiple`. The results are then filtered to display the forecast for 'BLACK VELVET' in 'POLK' county.

In [None]:
%%bigquery df_arima_multiple --project {PROJECT_ID}

SELECT *
FROM ML.FORECAST(MODEL bq_forecasting.arima_model_multiple,
       STRUCT(50 AS horizon,
              0.95 AS confidence_level)
      )
WHERE county = 'POLK'
AND item_name = 'BLACK VELVET'
ORDER BY forecast_timestamp;

### Method B: TimesFM

Adapting the TimesFM approach for multiple time series is just as straightforward. Since we don't have a model to create, we only need to add one parameter to the existing `AI.FORECAST` query to make it aware of the different series within our data.

#### Generate a Forecast

The process remains a single-step query, with the key addition of the `id_cols` parameter. We simply pass an array of column names (`['county', 'item_name']`) to tell the function that each unique combination defines a separate time series.

The `AI.FORECAST` function computes a forecast for every series in the dataset. As shown in the code, you can add a standard `WHERE` clause to filter the output down to a specific series of interest. In this case, we'll look at Black Velvet liquor in Polk County.
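
Because the single-series and multiple-series queries differ only in `id_cols`, both can be generated from one template. The helper below, `build_ai_forecast_sql`, is a hypothetical convenience function and not part of any BigQuery library. It only assembles a query string with the same arguments used in this notebook, which you could then run with the `%%bigquery` magic or a BigQuery client.

```python
def build_ai_forecast_sql(table, horizon, timestamp_col, data_col,
                          id_cols=None, confidence_level=0.95):
    """Assemble an AI.FORECAST query string; `id_cols` is optional.

    Values are interpolated directly into the SQL text, so this sketch is
    intended for trusted, hard-coded identifiers only.
    """
    args = [
        f"horizon => {int(horizon)}",
        f"confidence_level => {float(confidence_level)}",
        f"timestamp_col => '{timestamp_col}'",
        f"data_col => '{data_col}'",
    ]
    if id_cols:
        # One extra argument turns a single-series query into a multi-series one.
        quoted = ", ".join(f"'{col}'" for col in id_cols)
        args.append(f"id_cols => [{quoted}]")
    arg_block = ",\n    ".join(args)
    return (
        "SELECT *\n"
        "FROM\n"
        "  AI.FORECAST(\n"
        f"    (SELECT * FROM {table}),\n"
        f"    {arg_block})"
    )


single_sql = build_ai_forecast_sql(
    "bq_forecasting.liquor_sales_training_single", 30, "date", "total_sales")
multiple_sql = build_ai_forecast_sql(
    "bq_forecasting.liquor_sales_training_multiple", 50, "date", "total_sales",
    id_cols=["county", "item_name"])
print(multiple_sql)
```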

Generate a 50-day zero-shot forecast for multiple time series using TimesFM. The `id_cols` parameter specifies `county` and `item_name` as identifiers for individual series, and the output is filtered for 'BLACK VELVET' in 'POLK' county.

In [None]:
%%bigquery df_timesfm_multiple --project {PROJECT_ID}

SELECT *
FROM
  AI.FORECAST(
    (
      SELECT *
      FROM bq_forecasting.liquor_sales_training_multiple
    ),
    horizon => 50,
    confidence_level => 0.95,
    timestamp_col => 'date',
    data_col => 'total_sales',
    id_cols => ['county', 'item_name']
    )
  WHERE county = 'POLK'
  AND item_name = 'BLACK VELVET'
  ORDER BY forecast_timestamp;

### Visualize the Results

Retrieve historical sales data for a specific county and item, which will be used for visualizing the multiple time series forecasts.  

Retrieve recent historical sales data for the selected `county` and `item_name` to be used for visualizing the multiple time series forecasts.

In [None]:
%%bigquery df_history_multiple --project {PROJECT_ID}

SELECT
    date,
    county,
    item_name,
    total_sales
FROM
  bq_forecasting.liquor_sales_training_multiple
WHERE
  date BETWEEN '2024-10-01' AND '2025-03-20'
ORDER BY date

Filter the historical data and plot it alongside both the ARIMA and TimesFM forecasts for 'BLACK VELVET' in 'POLK' county, providing a visual comparison for the multiple time series scenario.

In [None]:
df_history_filtered = df_history_multiple[(df_history_multiple['county'] == 'POLK') & (df_history_multiple['item_name'] == 'BLACK VELVET')]
plt.figure(figsize=(12, 6))
plt.plot(df_history_filtered['date'], df_history_filtered['total_sales'], label='Historical Data')
plt.plot(df_arima_multiple['forecast_timestamp'], df_arima_multiple['forecast_value'], label='ARIMA Forecast')
plt.plot(df_timesfm_multiple['forecast_timestamp'], df_timesfm_multiple['forecast_value'], label='TimesFM Forecast')
plt.xlabel('Timestamp')
plt.ylabel('Total Sales')
plt.title('ARIMA vs TimesFM Forecast for BLACK VELVET in POLK County')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

### Recap

In this section, we:
* Moved to multiple time series, forecasting `total_sales` for each unique combination of `county` and `item_name`.
* Adapted our ARIMA model by adding the `TIME_SERIES_ID_COL` parameter to the `CREATE MODEL` statement.
* Adapted our TimesFM query by adding the `id_cols` parameter to the `AI.FORECAST` function.
* Demonstrated how a single parameter change in BigQuery allows us to easily scale our forecasting efforts from one to many individual series.



---



## Cleaning Up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

Execute `bq rm` commands to delete the BigQuery tables, models, and the dataset created during this tutorial, ensuring a clean environment.

In [None]:
# Delete the BigQuery tables
! bq rm --table -f bq_forecasting.liquor_sales_training_single
! bq rm --table -f bq_forecasting.liquor_sales_training_multiple

# Delete the ARIMA models
! bq rm --model -f bq_forecasting.arima_model_single
! bq rm --model -f bq_forecasting.arima_model_multiple

# Delete the BigQuery dataset
! bq rm -r -f $PROJECT_ID:bq_forecasting