In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Online prediction with BigQuery ML

<table align="left">

  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/bigquery_ml/bqml-online-prediction.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/bigquery_ml/bqml-online-prediction.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/bigquery_ml/bqml-online-prediction.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>                                                                                               
</table>

**_NOTE_**: This notebook has been tested in the following environment:

* Python version = 3.9

## Overview

This notebook demonstrates how to train a churn prediction model using BigQuery ML on Google Cloud. The model is trained on Google Analytics 4 app event data and estimates the likelihood that a user of the app churns.

Learn more about the use case and the data from the blog: [Churn prediction for game developers using Google Analytics 4 (GA4) and BigQuery ML](https://cloud.google.com/blog/topics/developers-practitioners/churn-prediction-game-developers-using-google-analytics-4-ga4-and-bigquery-ml).

After training the BigQuery ML model, this notebook also explores registering the trained model in Vertex AI and deploying to an endpoint for online predictions.

### Objective

In this tutorial, you fetch the required data from a public BigQuery dataset and prepare it for training. Then you train a churn prediction model on the data using BigQuery ML. Finally, you deploy the model in Vertex AI and get predictions from the endpoint.

This tutorial uses the following Google Cloud data analytics and ML services:

- BigQuery
- BigQuery ML
- Vertex AI Model Registry
- Vertex AI Endpoints


The steps performed include:

- Query and fetch the data from the public BigQuery dataset.
- Prepare the data for training.
- Train a churn classification model using BigQuery ML.
- Save the trained model to Vertex AI Model Registry.
- Deploy the model to a Vertex AI Endpoint.
- Make online prediction requests to the endpoint.


### Dataset

This notebook uses the public BigQuery dataset [`firebase-public-project.analytics_153293282`](https://console.cloud.google.com/bigquery?p=firebase-public-project&d=analytics_153293282&t=events_20181003&page=table), which contains raw event data from a real mobile gaming app called Flood It! (Android app, iOS app). The data schema originates from Google Analytics for Firebase, but has the same schema as Google Analytics 4. Therefore, the steps in this notebook can be applied to either Google Analytics for Firebase or Google Analytics 4 data.

Google Analytics 4 (GA4) uses an event-based measurement model. Events provide insight on what is happening in an app or on a website, such as user actions, system events, or errors. Every row in the data is an event, with various characteristics relevant to that event stored in a nested format within the row.
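
To make the nested format concrete, here is a minimal sketch of how a single event row might look once it is loaded into Python. The field layout follows the GA4 BigQuery export (an `event_params` array of key/value structs), but the parameter names and all values below are made up for illustration, and `get_param` is a hypothetical helper, not part of any Google library.

```python
from typing import Any, Optional

# Hypothetical event row shaped like the GA4 BigQuery export (values are made up).
event = {
    "event_name": "level_complete_quickplay",
    "event_timestamp": 1538524800000000,  # microseconds since the Unix epoch
    "user_pseudo_id": "user-0001",
    "event_params": [
        {"key": "board", "value": {"string_value": "S", "int_value": None}},
        {"key": "value", "value": {"string_value": None, "int_value": 12}},
    ],
}


def get_param(row: dict, key: str) -> Optional[Any]:
    """Return the first non-null typed value stored under `key`, or None."""
    for param in row["event_params"]:
        if param["key"] == key:
            for typed_value in param["value"].values():
                if typed_value is not None:
                    return typed_value
    return None


print(get_param(event, "board"))
print(get_param(event, "value"))
```

This tutorial only needs the top-level `event_name`, `event_timestamp`, and `user_pseudo_id` fields, so the nested parameters are never unpacked in the queries that follow.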

Learn more about the [dataset](https://cloud.google.com/blog/topics/developers-practitioners/churn-prediction-game-developers-using-google-analytics-4-ga4-and-bigquery-ml).

### Costs 

This tutorial uses billable components of Google Cloud:

* BigQuery
* BigQuery ML
* Vertex AI


Learn about <a href="https://cloud.google.com/bigquery/pricing" target="_blank">BigQuery Pricing</a>, <a href="https://cloud.google.com/bigquery-ml/pricing" target="_blank">BigQuery ML pricing</a>, <a href="https://cloud.google.com/vertex-ai/pricing" target="_blank">Vertex AI
pricing</a>, and use the <a href="https://cloud.google.com/products/calculator/" target="_blank">Pricing
Calculator</a>
to generate a cost estimate based on your projected usage.

## Installation

Install the following packages required to execute this notebook. 

In [None]:
! pip3 install --upgrade google-cloud-aiplatform \
                                 google-cloud-bigquery \
                                 db-dtypes

### Colab only: Uncomment the following cell to restart the kernel

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

#### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

#### Region

You can also change the `REGION` variable used by Vertex AI. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "us-central1"  # @param {type: "string"}

### Authenticate your Google Cloud account

Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.

**1. Vertex AI Workbench**
* Do nothing as you are already authenticated.

**2. Local JupyterLab instance, uncomment and run:**

In [None]:
# ! gcloud auth login

**3. Colab, uncomment and run:**

In [None]:
# from google.colab import auth
# auth.authenticate_user()

**4. Service account or other**
* See how to grant Cloud Storage permissions to your service account at https://cloud.google.com/storage/docs/gsutil/commands/iam#ch-examples.

### Create a Cloud Storage bucket

Create a storage bucket to serve as a staging bucket for Vertex AI.

In [None]:
BUCKET_URI = f"gs://your-bucket-name-{PROJECT_ID}-unique"  # @param {type:"string"}

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}

### Import libraries

In [None]:
import json
import os
from typing import Union

import google.cloud.aiplatform as vertex_ai
import pandas as pd
from google.cloud import bigquery

### Initialize Vertex AI and BigQuery SDKs for Python

Initialize the Vertex AI SDK for your project and create a BigQuery client.

In [None]:
# initialize the vertex ai sdk
vertex_ai.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

# create the bigquery client object
bq_client = bigquery.Client(project=PROJECT_ID)

## Create a dataset in BigQuery

Before you create a dataset, define a helper function for running queries in BigQuery.

In [None]:
# Wrapper to use BigQuery client to run query/job, return job ID or result as DF


def run_bq_query(sql: str) -> Union[str, pd.DataFrame]:
    """
    Run a SQL query in BigQuery and return the results as a pandas DataFrame.

    Input: SQL query, as a string, to execute in BigQuery.
    An invalid query raises an exception during the dry run, before any data is processed.
    """
    # Try dry run before executing query to catch any errors
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    bq_client.query(sql, job_config=job_config)

    # If dry run succeeds without errors, proceed to run query
    job_config = bigquery.QueryJobConfig()
    client_result = bq_client.query(sql, job_config=job_config)

    job_id = client_result.job_id

    # Wait for the query job to finish, then fetch the results as a DataFrame
    df = client_result.result().to_arrow().to_pandas()
    print(f"Finished job_id: {job_id}")
    # return the dataframe (if any tabular data is returned)
    return df

Create a BigQuery dataset using the `CREATE SCHEMA` statement in SQL.

Learn more about [SQL dialects in BigQuery](https://cloud.google.com/bigquery/docs/introduction-sql#bigquery-sql-dialects).

In [None]:
# provide the dataset name
BQ_DATASET_NAME = "churnprediction_unique"  # @param {type:"string"}

# create a SQL query for creating the dataset
create_dataset_query = (
    f"""CREATE SCHEMA IF NOT EXISTS `{PROJECT_ID}.{BQ_DATASET_NAME}`"""
)

# run the query
run_bq_query(create_dataset_query)

## Prepare the training data

Before you can train a machine learning model, you need to process the data and extract suitable features from it.

The event data includes behavioral data about the users, which can be used to generate features for training. The event timestamps are processed to determine whether a user churned or not, and that result is used as the ground truth for training.

In this section, you perform the following steps:

1. Identify whether a user has churned or not based on the event timestamp information.
2. Extract behavioral data for each user.
3. Join the behavioral data with the churn label data.


Note: The original [blog](https://cloud.google.com/blog/topics/developers-practitioners/churn-prediction-game-developers-using-google-analytics-4-ga4-and-bigquery-ml) about training on this dataset also considers user demographic data which is omitted in this tutorial.

### Identify the label for each user

The event dataset doesn't have a feature that tells you whether a user has "churned" or "returned". So, in this section, you create this label based on the following criteria:

```
If a user shows no event data after 24 hrs from their first engagement with the app, the user is considered churned.
```

There are many ways user churn can be defined. For this notebook, you define a 1-day churn. 

Additionally, you extract calendar information from the timestamps to use as features and flag the "bouncing" cases so that they can be removed from the training data. A user is treated as bounced when their last engagement falls within 10 minutes of their first engagement.
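
The labeling rules can be sketched in plain Python before you express them in SQL. This is only an illustration of the two comparisons used in this notebook; `label_user` is a hypothetical helper, and timestamps are assumed to be integers in microseconds, as in the GA4 export.

```python
DAY_MICROS = 24 * 60 * 60 * 1_000_000  # 86,400,000,000 microseconds
TEN_MIN_MICROS = 10 * 60 * 1_000_000   # 600,000,000 microseconds


def label_user(first_ts: int, last_ts: int) -> dict:
    """Derive the churned and bounced flags from first and last engagement."""
    return {
        # churned: no engagement at or after 24 hours from the first engagement
        "churned": int(last_ts < first_ts + DAY_MICROS),
        # bounced: the last engagement is within 10 minutes of the first one
        "bounced": int(last_ts <= first_ts + TEN_MIN_MICROS),
    }


print(label_user(first_ts=0, last_ts=5 * 60 * 1_000_000))  # active for 5 minutes
print(label_user(first_ts=0, last_ts=3 * DAY_MICROS))      # came back on day 3
```

Every bounced user is also labeled as churned under this definition, which is why bounced users are filtered out separately before training.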

Run the following cell to create a view named **`returningusers`** with the below columns:

- `user_pseudo_id`: A pseudonymous ID for the user.
- `user_first_engagement`: Earliest event timestamp of the user.
- `user_last_engagement`: Latest event timestamp of the user.
- `month`: Month of the year for the first engagement of the user.
- `julianday`: Day of the year for the first engagement of the user.
- `dayofweek`: Day of the week for the first engagement of the user.
- `ts_24hr_after_first_engagement`: Timestamp 24 hours after the user's first engagement.
- `churned`: Integer flag with **1** representing churned and **0** representing not churned.
- `bounced`: Integer flag with **1** representing bounced and **0** representing not bounced.
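
As a rough cross-check of the calendar columns, the same three values can be derived in Python from a microsecond timestamp. This sketch assumes UTC and assumes BigQuery's `DAYOFWEEK` numbering, where Sunday is 1 and Saturday is 7; `calendar_features` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def calendar_features(ts_micros: int) -> dict:
    """Mirror EXTRACT(MONTH / DAYOFYEAR / DAYOFWEEK) for a microsecond timestamp."""
    dt = EPOCH + timedelta(microseconds=ts_micros)
    return {
        "month": dt.month,
        "julianday": dt.timetuple().tm_yday,
        # isoweekday() is Monday=1..Sunday=7; shift to Sunday=1..Saturday=7
        "dayofweek": dt.isoweekday() % 7 + 1,
    }


# 1538524800000000 microseconds is 2018-10-03 00:00:00 UTC, a Wednesday.
print(calendar_features(1538524800000000))
```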

In [None]:
# define the sql query
create_label_data_query = f"""
    CREATE OR REPLACE VIEW `{PROJECT_ID}.{BQ_DATASET_NAME}.returningusers` AS (
      WITH firstlasttouch AS (
        SELECT
          user_pseudo_id,
          MIN(event_timestamp) AS user_first_engagement,
          MAX(event_timestamp) AS user_last_engagement
        FROM
          `firebase-public-project.analytics_153293282.events_*`
        WHERE event_name="user_engagement"
        GROUP BY
          user_pseudo_id

      )
      SELECT
        user_pseudo_id,
        user_first_engagement,
        user_last_engagement,
        EXTRACT(MONTH from TIMESTAMP_MICROS(user_first_engagement)) as month,
        EXTRACT(DAYOFYEAR from TIMESTAMP_MICROS(user_first_engagement)) as julianday,
        EXTRACT(DAYOFWEEK from TIMESTAMP_MICROS(user_first_engagement)) as dayofweek,

        # add 24 hr to the user's first engagement
        (user_first_engagement + 86400000000) AS ts_24hr_after_first_engagement,

        # churned = 1 if the last engagement is within 24 hr of the first engagement, else 0
        IF (user_last_engagement < (user_first_engagement + 86400000000),
            1,
            0 ) AS churned,

        # bounced = 1 if the last engagement is within 10 min of the first engagement, else 0
        IF (user_last_engagement <= (user_first_engagement + 600000000),
            1,
            0 ) AS bounced,
      FROM
        firstlasttouch
      GROUP BY
        1,2,3
        );

    SELECT 
      * 
    FROM 
      `{PROJECT_ID}.{BQ_DATASET_NAME}.returningusers`
    LIMIT 10;
"""

# run the query
run_bq_query(create_label_data_query)

### Extract behavioral data for each user

Behavioral data in the raw event data is spread across multiple events. Hence, you aggregate the behavioral data for each user.

Since the model needs to predict based on user activity within the first 24 hrs, you compute the aggregates only from events that occur within 24 hrs of the user's first engagement.

For aggregation, you count the total number of the following event types in the data per user:

- user_engagement
- level_start_quickplay
- level_end_quickplay
- level_complete_quickplay
- level_reset_quickplay
- post_score
- spend_virtual_currency
- ad_reward
- challenge_a_friend
- completed_5_levels
- use_extra_steps
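
The aggregation can be sketched in plain Python as follows. This is an illustration of the counting logic only, assuming events arrive as `(user_pseudo_id, event_name, event_timestamp)` tuples and that each user's first engagement timestamp is already known; `count_first_day_events` is a hypothetical helper.

```python
DAY_MICROS = 86_400_000_000  # 24 hours in microseconds

TRACKED_EVENTS = (
    "user_engagement", "level_start_quickplay", "level_end_quickplay",
    "level_complete_quickplay", "level_reset_quickplay", "post_score",
    "spend_virtual_currency", "ad_reward", "challenge_a_friend",
    "completed_5_levels", "use_extra_steps",
)


def count_first_day_events(events, first_engagement):
    """Count tracked events per user within 24 hours of first engagement."""
    counts = {}
    for user_id, event_name, ts in events:
        first_ts = first_engagement.get(user_id)
        # Skip users without a label row and events outside the 24-hour window.
        if first_ts is None or ts > first_ts + DAY_MICROS:
            continue
        user_counts = counts.setdefault(
            user_id, {f"cnt_{name}": 0 for name in TRACKED_EVENTS}
        )
        if event_name in TRACKED_EVENTS:
            user_counts[f"cnt_{event_name}"] += 1
    return counts


sample_events = [
    ("u1", "user_engagement", 0),
    ("u1", "post_score", 10),
    ("u1", "post_score", DAY_MICROS + 1),  # outside the window, ignored
    ("u2", "user_engagement", 5),          # no label row for u2, ignored
]
print(count_first_day_events(sample_events, {"u1": 0}))
```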

Run the following cell to create a view named `user_aggregate_behavior` that holds the aggregate behavioral data.

In [None]:
create_behavioral_data_query = f"""
    CREATE OR REPLACE VIEW `{PROJECT_ID}.{BQ_DATASET_NAME}.user_aggregate_behavior` AS (
    WITH
      events_first24hr AS (
        SELECT
          e.*
        FROM
          `firebase-public-project.analytics_153293282.events_*` e
        JOIN
          `{PROJECT_ID}.{BQ_DATASET_NAME}.returningusers` r
        ON
          e.user_pseudo_id = r.user_pseudo_id
        WHERE
          e.event_timestamp <= r.ts_24hr_after_first_engagement
        )
    SELECT
      user_pseudo_id,
      SUM(IF(event_name = 'user_engagement', 1, 0)) AS cnt_user_engagement,
      SUM(IF(event_name = 'level_start_quickplay', 1, 0)) AS cnt_level_start_quickplay,
      SUM(IF(event_name = 'level_end_quickplay', 1, 0)) AS cnt_level_end_quickplay,
      SUM(IF(event_name = 'level_complete_quickplay', 1, 0)) AS cnt_level_complete_quickplay,
      SUM(IF(event_name = 'level_reset_quickplay', 1, 0)) AS cnt_level_reset_quickplay,
      SUM(IF(event_name = 'post_score', 1, 0)) AS cnt_post_score,
      SUM(IF(event_name = 'spend_virtual_currency', 1, 0)) AS cnt_spend_virtual_currency,
      SUM(IF(event_name = 'ad_reward', 1, 0)) AS cnt_ad_reward,
      SUM(IF(event_name = 'challenge_a_friend', 1, 0)) AS cnt_challenge_a_friend,
      SUM(IF(event_name = 'completed_5_levels', 1, 0)) AS cnt_completed_5_levels,
      SUM(IF(event_name = 'use_extra_steps', 1, 0)) AS cnt_use_extra_steps,
    FROM
      events_first24hr
    GROUP BY
      1
      );

    SELECT
      *
    FROM
      `{PROJECT_ID}.{BQ_DATASET_NAME}.user_aggregate_behavior`
    LIMIT 10
"""
# run the query
run_bq_query(create_behavioral_data_query)

### Combine the label data and behavioral data

Now, join the churn label data and the behavioral data on the ID field, `user_pseudo_id`. Bounced users are excluded, and missing counts are replaced with 0.

This operation creates a view named `train`, which is used for training in the next section.
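
A minimal Python sketch of the same join is shown here for intuition. It assumes the label data and the behavioral counts are dictionaries keyed by `user_pseudo_id`; `build_training_rows` is a hypothetical helper and only mimics the left outer join, the null-to-zero replacement, and the bounce filter.

```python
def build_training_rows(labels: dict, behavior: dict, count_columns: list) -> list:
    """Left-join behavior counts onto label rows, dropping bounced users."""
    rows = []
    for user_id, label in labels.items():
        if label["bounced"] != 0:
            continue  # same effect as WHERE ret.bounced = 0
        counts = behavior.get(user_id, {})  # left join: the user may have no counts
        row = {"user_pseudo_id": user_id}
        for column in count_columns:
            value = counts.get(column)
            row[column] = 0 if value is None else value  # same effect as IFNULL(..., 0)
        row["churned"] = label["churned"]
        rows.append(row)
    return rows


labels = {
    "u1": {"churned": 0, "bounced": 0},
    "u2": {"churned": 1, "bounced": 1},  # bounced, filtered out
    "u3": {"churned": 1, "bounced": 0},  # no behavior row, counts default to 0
}
behavior = {"u1": {"cnt_post_score": 3}}
print(build_training_rows(labels, behavior, ["cnt_post_score", "cnt_ad_reward"]))
```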

In [None]:
combine_data_query = f"""
    CREATE OR REPLACE VIEW `{PROJECT_ID}.{BQ_DATASET_NAME}.train` AS (

      SELECT
        ret.user_pseudo_id,
        IFNULL(beh.cnt_user_engagement, 0) AS cnt_user_engagement,
        IFNULL(beh.cnt_level_start_quickplay, 0) AS cnt_level_start_quickplay,
        IFNULL(beh.cnt_level_end_quickplay, 0) AS cnt_level_end_quickplay,
        IFNULL(beh.cnt_level_complete_quickplay, 0) AS cnt_level_complete_quickplay,
        IFNULL(beh.cnt_level_reset_quickplay, 0) AS cnt_level_reset_quickplay,
        IFNULL(beh.cnt_post_score, 0) AS cnt_post_score,
        IFNULL(beh.cnt_spend_virtual_currency, 0) AS cnt_spend_virtual_currency,
        IFNULL(beh.cnt_ad_reward, 0) AS cnt_ad_reward,
        IFNULL(beh.cnt_challenge_a_friend, 0) AS cnt_challenge_a_friend,
        IFNULL(beh.cnt_completed_5_levels, 0) AS cnt_completed_5_levels,
        IFNULL(beh.cnt_use_extra_steps, 0) AS cnt_use_extra_steps,
        ret.user_first_engagement,
        ret.month,
        ret.julianday,
        ret.dayofweek,
        ret.churned
      FROM
        `{PROJECT_ID}.{BQ_DATASET_NAME}.returningusers` ret
      LEFT OUTER JOIN 
        `{PROJECT_ID}.{BQ_DATASET_NAME}.user_aggregate_behavior` beh
      ON
        ret.user_pseudo_id = beh.user_pseudo_id
      WHERE ret.bounced = 0
      );

    SELECT
      *
    FROM
      `{PROJECT_ID}.{BQ_DATASET_NAME}.train`
    LIMIT 10
"""
# run the query
run_bq_query(combine_data_query)

## Train a logistic regression model with BQML


BQML lets you train machine learning models for tasks such as classification, regression, forecasting, and matrix factorization on tabular data in BigQuery using SQL syntax. BQML uses the scalable infrastructure of BigQuery, so you don't need to set up additional infrastructure for training or batch serving.


Now, train a logistic regression model on the created training data using BQML. For this, you use the `CREATE OR REPLACE MODEL` statement from BigQuery's SQL dialect with the following options:

* `MODEL_TYPE`: Specify the machine learning algorithm to train the model. Learn about other [available model types](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create#model_type).

* `INPUT_LABEL_COLS`: The label column names in the training data.

* `MODEL_REGISTRY`: Specifies the model registry destination. For now, 'VERTEX_AI' is the only supported model registry destination. To learn more, see [MLOps with BigQuery ML and Vertex AI](https://cloud.google.com/bigquery/docs/create_vertex#register_a_model_to_the).


* `VERTEX_AI_MODEL_VERSION_ALIASES`: The Vertex AI model alias to register the model with. It can only be set when `MODEL_REGISTRY` is set to 'VERTEX_AI'. To learn more, see [adding a Vertex AI model alias](https://cloud.google.com/bigquery/docs/create_vertex#add_a_model_alias).

Learn more about training models in [BQML](https://cloud.google.com/bigquery/docs/bqml-introduction).

Note that the model names also follow the [same rules](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create#model_name) as tables in BigQuery. 

In [None]:
# set the model name
BQ_MODEL_NAME = "churn_model_unique"  # @param {type:"string"}

# define the query for creating and training the BQML model
# exclude the ID columns user_first_engagement and user_pseudo_id
# from the training features
train_model_query = f"""
    CREATE OR REPLACE MODEL `{PROJECT_ID}.{BQ_DATASET_NAME}.{BQ_MODEL_NAME}`

    OPTIONS(
      MODEL_TYPE="LOGISTIC_REG",
      INPUT_LABEL_COLS=["churned"],
      MODEL_REGISTRY="VERTEX_AI",
      VERTEX_AI_MODEL_VERSION_ALIASES=['logistic_reg', 'churn_prediction']
    ) AS
  
    SELECT
      * EXCEPT(user_first_engagement, user_pseudo_id)
    FROM
      `{PROJECT_ID}.{BQ_DATASET_NAME}.train`
"""
# run the query
run_bq_query(train_model_query)

### Fetch the evaluation scores

Once the training is done, fetch the evaluation metric scores for the trained model.

The following cell should give results with the below fields:

| precision | recall | accuracy | f1_score | log_loss | roc_auc |
| -------- | ------- | ------- | ------- | ------- | ------- |


Learn more about the [metrics returned by the **ML.EVALUATE** function](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate#mlevaluate_output).
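
For reference, four of these metrics follow directly from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; it does not reproduce how BigQuery ML selects its evaluation data or classification threshold, and the counts in the example are made up.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1_score": f1_score,
    }


# Made-up counts: 40 true positives, 10 false positives,
# 40 true negatives, 10 false negatives.
print(classification_metrics(tp=40, fp=10, tn=40, fn=10))
```

`log_loss` and `roc_auc` depend on the predicted probabilities rather than on the thresholded labels, so they are left out of this sketch.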

In [None]:
# define the query
evaluate_model_query = f"""
    SELECT
      *
    FROM
      ML.EVALUATE(MODEL `{PROJECT_ID}.{BQ_DATASET_NAME}.{BQ_MODEL_NAME}`)
"""

# run the query
run_bq_query(evaluate_model_query)

## Generate batch predictions with explanations


Make a batch prediction in BQML on some samples from the training data. The `probability` column holds the probability of the predicted class, and the `predicted_churned` column holds the predicted label.

You can also generate explanations in the form of feature attributions using the `ML.EXPLAIN_PREDICT()` function. This allows you to interpret the top contributing features for each prediction.

The output returned consists of the following additional columns:

- `predicted_churned`: Column with the predicted value of the label (churned).

- `probability`: The probability of the predicted label class.

- `top_feature_attributions`:  An ARRAY of STRUCTs containing the attributions of the top k features to the final prediction.

    - `top_feature_attributions.feature`: The feature name.
    
    - `top_feature_attributions.attribution`: Attribution of the feature to the final prediction.
    
- `baseline_prediction_value`: For linear models, the baseline_prediction_value is the intercept of the model.

- `prediction_value`: The [logit](https://en.wikipedia.org/wiki/Logit) value (i.e., log-odds) for the predicted class.

- `approximation_error`: The approximation error for the algorithm used for explanations. In case of the current logistic regression model, there is no approximation error and this field is always 0. Learn more about the [approximation error](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-explain-predict#mlexplain_predict_output).
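
A log-odds value is easier to read once it is mapped back to a probability with the logistic (sigmoid) function. The helper below is a generic sketch of that conversion, not a BigQuery ML API; `logit_to_probability` is a hypothetical name.

```python
import math


def logit_to_probability(logit: float) -> float:
    """Map a log-odds value to a probability with the logistic function."""
    # Branch on the sign so that math.exp never receives a large positive argument.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


print(logit_to_probability(0.0))  # log-odds of 0 correspond to even odds
print(logit_to_probability(2.0))
```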

Learn more about [**ML.EXPLAIN_PREDICT**](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-xai-overview). 

Note: To get predictions without explanations, use the [**ML.PREDICT**](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-predict#mlpredict_function) function instead.

In [None]:
# write sql query for predictions with explanations
explain_predict_query = f"""
    SELECT
      *
    FROM
      ML.EXPLAIN_PREDICT(
        MODEL `{PROJECT_ID}.{BQ_DATASET_NAME}.{BQ_MODEL_NAME}`,
        (
            SELECT * FROM `{PROJECT_ID}.{BQ_DATASET_NAME}.train` LIMIT 10
        )
    )
"""
# run the sql query
run_bq_query(explain_predict_query)

## Get model from Model Registry

The `CREATE MODEL` statement with `MODEL_REGISTRY` set to `VERTEX_AI` from earlier creates a model in the Vertex AI Model Registry. 

Fetch the model resource using the model name.

In [None]:
model = vertex_ai.Model(model_name=BQ_MODEL_NAME)

print(model.gca_resource)

## Create an endpoint for deployment

While BQML supports batch prediction, it isn't suited to real-time use cases that need low-latency predictions, potentially at a high request rate. Deploying the BQML model to an endpoint enables online predictions.

Create a Vertex AI Endpoint to deploy the BQML model. 

In [None]:
ENDPOINT_NAME = "churn-model-endpoint-unique"  # @param {type:"string"}

endpoint = vertex_ai.Endpoint.create(
    display_name=ENDPOINT_NAME,
    project=PROJECT_ID,
    location=REGION,
)

print("Endpoint display name:", endpoint.display_name)
print("Endpoint resource name:", endpoint.resource_name)

## Deploy the model to the endpoint

Deploy the model resource to the created endpoint resource using the [**deploy**](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model#google_cloud_aiplatform_Model_deploy) method.

When your BQML model is registered in Vertex AI Model Registry, and if it is an Explainable AI supported model type, you can enable Explainable AI on the model when deploying to an endpoint. For this tutorial, you deploy the model without Explainable AI.

Learn more about [deploying BQML model to Vertex AI Endpoint with Explainable AI enabled](https://cloud.google.com/bigquery/docs/vertex-xai#enable_explainable_ai_in).

In [None]:
# deploying the model to the endpoint may take 10-15 minutes
model.deploy(endpoint=endpoint)

## Create payload for online prediction

Query the training data for some samples (say 2 records) to send as an online request to the endpoint. The instances payload needs to be formatted as a list of key-value pairs, where each key represents a feature.
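
The expected payload shape can be sketched without BigQuery. The example below assumes rows are available as Python dictionaries and drops the ID and label columns, mirroring the `EXCEPT` clause used in this notebook; `to_instances` is a hypothetical helper and the feature values are made up.

```python
import json

EXCLUDED_COLUMNS = ("user_first_engagement", "user_pseudo_id", "churned")


def to_instances(rows: list, excluded: tuple = EXCLUDED_COLUMNS) -> list:
    """Keep only feature columns so that each instance maps feature name to value."""
    return [
        {key: value for key, value in row.items() if key not in excluded}
        for row in rows
    ]


# Made-up rows with only two of the feature columns, for brevity.
sample_rows = [
    {"user_pseudo_id": "u1", "cnt_post_score": 3, "month": 10, "churned": 0},
    {"user_pseudo_id": "u3", "cnt_post_score": 0, "month": 9, "churned": 1},
]
instances = to_instances(sample_rows)
print(json.dumps(instances, indent=2))
```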

In [None]:
# query to select two instances for testing
test_df = run_bq_query(
    f"""
    SELECT 
    * except(user_first_engagement, user_pseudo_id, churned) 
    FROM `{PROJECT_ID}.{BQ_DATASET_NAME}.train` 
    LIMIT 2
    """
)
# convert the dataframe to json instances to send
df_sample_requests_list = json.loads(test_df.to_json(orient="records"))

# display the data
test_df

## Send online request to the endpoint

Use the endpoint's **predict** method to send the list of instances for prediction.

Predictions from the returned response consist of the following:

- `churned_probs`: probabilities for each of the classes.
- `predicted_churned`: class predicted.
- `churned_values`: possible values for the classes (1 for churned and 0 for not churned).
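
To read the churn probability out of a single prediction, you can pair each class value with its probability. The sketch below assumes each prediction is a dictionary with the three fields listed above; the exact value types (strings versus integers) and the numbers in `sample_prediction` are assumptions for illustration, and `churn_probability` is a hypothetical helper.

```python
def churn_probability(prediction: dict, churn_value: str = "1") -> float:
    """Return the probability attached to the churned class in one prediction."""
    # Normalize class values to strings so "1" and 1 are treated the same way.
    probability_by_class = dict(
        zip(map(str, prediction["churned_values"]), prediction["churned_probs"])
    )
    return probability_by_class[str(churn_value)]


# Hypothetical prediction with made-up probabilities.
sample_prediction = {
    "churned_values": ["1", "0"],
    "churned_probs": [0.25, 0.75],
    "predicted_churned": ["0"],
}
print(churn_probability(sample_prediction))
```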

In [None]:
# send prediction request to the endpoint
prediction = endpoint.predict(df_sample_requests_list)
# print the predictions
print(prediction.predictions)

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

- Vertex AI Endpoint
- Vertex AI Model (a BQML model cannot be deleted from Vertex AI and needs to be deleted from BigQuery)
- BigQuery Dataset
- Cloud Storage bucket

In [None]:
# set True to delete the dataset
delete_dataset = False
# set True to delete the bucket
delete_bucket = False

# undeploy model from the endpoint
endpoint.undeploy_all()
# delete the endpoint
endpoint.delete()

# delete BigQuery dataset
if delete_dataset or os.getenv("IS_TESTING"):
    ! bq rm -r -f $PROJECT_ID:$BQ_DATASET_NAME
# delete the Cloud Storage bucket
if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil -m rm -r $BUCKET_URI