In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<table align="left">

  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/vertex-ai-samples/notebooks/community/feature_store/mobile_gaming_feature_store.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/inardini/vertex-ai-samples/blob/main/vertex-ai-samples/notebooks/community/feature_store/mobile_gaming_feature_store.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
</table>


## Overview

Imagine you are a member of the Data Science team working on the same Mobile Gaming application reported in the [Churn prediction for game developers using Google Analytics 4 (GA4) and BigQuery ML](https://cloud.google.com/blog/topics/developers-practitioners/churn-prediction-game-developers-using-google-analytics-4-ga4-and-bigquery-ml) blog post. 

Your team successfully implemented a model that determines the likelihood of specific users returning to your app and uses that insight to drive marketing incentives. As a result, the company has consolidated its user base.

Now, the business wants to use that information in real time to monetize it by implementing a conditional ads system. In particular, each time a user plays the game, they want to display ads depending on the customer's demographic and behavioral information and the resulting propensity to return. Of course, the new application should have minimal impact on the user experience.

Given the business challenge, the team needs to design and build a serving system that minimizes real-time prediction latency.

The assumptions are:

1.   Predictions are delivered synchronously.
2.   Scalability, support for multiple ML frameworks, and security are essential.
3.   Only demographic features (country, operating system, and language) are passed in real time.
4.   The system must handle behavioral features as static reference features, recalculated every 24 hours by an offline batch feature engineering job.
5.   It has to mitigate training-serving skew through a timestamped data model, point-in-time lookups to avoid data leakage, and a feature distribution monitoring service.

Based on those assumptions, you need a low read-latency lookup data store and a high-performance serving engine. Regarding the data store, even though you can implement governance on BigQuery, BigQuery is not optimized for singleton lookup operations. The solution also needs a low-overhead serving system that can seamlessly scale up and down based on requests.

In 2021, Google Cloud announced Vertex AI, a managed machine learning (ML) platform that allows data science teams to accelerate the deployment and maintenance of ML models. The platform is composed of several building blocks, two of which are Vertex AI Feature Store and Vertex AI Prediction.

With Vertex AI Feature Store, you get a managed service for low-latency, scalable feature serving. It also provides a centralized feature repository with easy APIs to search and discover features, and feature monitoring capabilities to track drift and other quality issues. With Vertex AI Prediction, you can deploy models into production more easily, with online serving via HTTP or batch prediction for bulk scoring. It offers a unified, scalable framework to deploy custom models trained in TensorFlow, scikit-learn, or XGBoost, as well as BigQuery ML and AutoML models, on a broad range of machine types and GPUs.

The following diagram shows the high-level architecture the team puts together after deciding to go with Google Cloud:

<img src="./assets/solution_overview_final.png"/>

In order:

1.   Once you create historical features, they are ingested into Vertex AI Feature Store.

2.   Then you can train and deploy the model using BigQuery ML (or AutoML).

3.   Once the model is deployed, the ML serving engine receives a prediction request passing the entity ID and demographic attributes.

4.   Features related to that specific entity are retrieved from Vertex AI Feature Store and passed as inputs to the model for online prediction.

5.   The predictions are returned to the activation layer.
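The request flow in steps 3 to 5 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how Vertex AI is implemented: a dictionary (`ONLINE_STORE`, hypothetical) stands in for the Vertex AI Feature Store online store, and `toy_model` is a hypothetical scoring rule standing in for the deployed model.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for the online store: entity ID -> latest static
# behavioral features written by the daily batch feature engineering job.
ONLINE_STORE: Dict[str, Dict[str, float]] = {
    "user_1": {"cnt_user_engagement": 14.0, "cnt_ad_reward": 0.0},
}


def serve_prediction(
    request: Dict[str, Any],
    online_store: Dict[str, Dict[str, float]],
    model: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Sketch of steps 3-5: enrich a request with stored features, then score it."""
    # Step 3: the request carries the entity ID plus real-time demographic attributes.
    entity_id = request["entity_id"]
    demographics = {k: v for k, v in request.items() if k != "entity_id"}
    # Step 4: singleton lookup of the static features for this entity.
    stored_features = online_store.get(entity_id, {})
    instance = {**stored_features, **demographics}
    # Step 5: the model output is returned to the activation layer.
    return model(instance)


def toy_model(instance: Dict[str, Any]) -> float:
    """Hypothetical scoring rule standing in for the deployed model."""
    return 0.9 if instance.get("cnt_user_engagement", 0.0) < 5 else 0.1


request = {"entity_id": "user_1", "country": "United States", "operating_system": "ANDROID"}
print(serve_prediction(request, ONLINE_STORE, toy_model))  # 0.1
```

The design point is that the request carries only the entity ID and the demographic attributes, while the heavier behavioral aggregates are fetched with a single keyed lookup at serving time.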


### Dataset

The dataset is the public sample export data from an actual mobile game app called "Flood It!" (Android, iOS)

### Objective

In this notebook, you learn the role of Vertex AI Feature Store in a scenario where users' activities within the first 24 hours after their first engagement are turned into features that the gaming platform consumes to offer conditional ads.

**Note that we assume you already know how to set up a Vertex AI Feature Store. If you don't, check out [this detailed notebook](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/feature_store/gapic-feature-store.ipynb).**

By the end, you will be more confident about how Vertex AI Feature Store can:

1.   Provide a centralized feature repository with easy APIs to search & discover features and fetch them for training/serving. 

2.   Simplify deployments of models for Online Prediction, via low latency scalable feature serving.

3.   Mitigate training serving skew and data leakage by performing point in time lookups to fetch historical data for training.
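A point-in-time lookup, as item 3 describes, returns for each training example the latest feature value whose timestamp is not later than the example's timestamp. The sketch below illustrates the idea with a hypothetical in-memory history and the standard `bisect` module; it is a conceptual model only, not the Feature Store implementation.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory feature history: entity_id -> list of
# (feature_timestamp, value) pairs, sorted by timestamp.
FeatureHistory = Dict[str, List[Tuple[int, float]]]


def point_in_time_lookup(
    history: FeatureHistory, entity_id: str, label_ts: int
) -> Optional[float]:
    """Return the latest feature value observed at or before label_ts.

    Values written after label_ts are ignored, which is what keeps
    future information from leaking into a training example.
    """
    rows = history.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, label_ts)
    if idx == 0:
        return None  # no value existed yet at label_ts
    return rows[idx - 1][1]


history: FeatureHistory = {"user_1": [(100, 3.0), (200, 7.0), (300, 12.0)]}

print(point_in_time_lookup(history, "user_1", 250))  # 7.0
print(point_in_time_lookup(history, "user_1", 50))  # None
```

A naive join on the latest value would return `12.0` for the example at time `250`, leaking a value from the future into training.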

### Costs 

This tutorial uses billable components of Google Cloud:

* Vertex AI
* BigQuery
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing), [BigQuery
pricing](https://cloud.google.com/bigquery/pricing), and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

### Set up your local development environment

**If you are using Colab or Google Cloud Notebooks**, your environment already meets
all the requirements to run this notebook. You can skip this step.

**Otherwise**, make sure your environment meets this notebook's requirements.
You need the following:

* The Google Cloud SDK
* Git
* Python 3
* virtualenv
* Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to [Setting up a Python development
environment](https://cloud.google.com/python/setup) and the [Jupyter
installation guide](https://jupyter.org/install) provide detailed instructions
for meeting these requirements. The following steps provide a condensed set of
instructions:

1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)

1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)

1. [Install
   virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)
   and create a virtual environment that uses Python 3. Activate the virtual environment.

1. To install Jupyter, run `pip3 install jupyter` on the
command-line in a terminal shell.

1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.

1. Open this notebook in the Jupyter Notebook Dashboard.

### Install additional packages

Install additional package dependencies not installed in your notebook environment: the Vertex AI SDK, pandas, the BigQuery client library, and TensorFlow. The following cell pins specific versions of pandas, google-cloud-bigquery, and tensorflow.

In [1]:
import os

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# Google Cloud Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"

In [2]:
! pip3 install --upgrade pip
! pip3 install {USER_FLAG} --upgrade git+https://github.com/googleapis/python-aiplatform.git@main -q --no-warn-conflicts
! pip3 install {USER_FLAG} --upgrade pandas==1.3.5 -q --no-warn-conflicts
! pip3 install {USER_FLAG} --upgrade google-cloud-bigquery==2.24.0 -q --no-warn-conflicts
! pip3 install {USER_FLAG} --upgrade tensorflow==2.8.0 -q --no-warn-conflicts



### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

1. [Enable the Vertex AI API and Compute Engine API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component). 

1. If you are running this notebook locally, you will need to install the [Cloud SDK](https://cloud.google.com/sdk).

1. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [3]:
import os

PROJECT_ID = ""

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)

Project ID:  inardini-playground


Otherwise, set your project ID here.

In [4]:
if PROJECT_ID == "" or PROJECT_ID is None:
    PROJECT_ID = ""  # @param {type:"string"}

In [5]:
!gcloud config set project $PROJECT_ID


#### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append it onto the name of resources you create in this tutorial.

In [6]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

### Authenticate your Google Cloud account

**If you are using Google Cloud Notebooks**, your environment is already
authenticated. Skip this step.

**If you are using Colab**, run the cell below and follow the instructions
when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

1. In the Cloud Console, go to the [**Create service account key**
   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).

2. Click **Create service account**.

3. In the **Service account name** field, enter a name, and
   click **Create**.

4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type "Vertex AI"
into the filter box, and select
   **Vertex AI Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.

5. Click **Create**. A JSON file that contains your key downloads to your
local environment.

6. Enter the path to your service account key as the
`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell.

In [7]:
import os
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# If on Google Cloud Notebooks, then don't execute this code
if not IS_GOOGLE_CLOUD_NOTEBOOK:
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

Set the name of your Cloud Storage bucket below. It must be unique across all
Cloud Storage buckets.

You may also change the `REGION` variable, which is used for operations
throughout the rest of this notebook. Make sure to [choose a region where Vertex AI services are
available](https://cloud.google.com/vertex-ai/docs/general/locations#available_regions). You may
not use a Multi-Regional Storage bucket for training with Vertex AI.

In [8]:
BUCKET_URI = ""  # @param {type:"string"}
REGION = "[your-region]"  # @param {type:"string"}

In [9]:
if BUCKET_URI == "" or BUCKET_URI is None or BUCKET_URI == "gs://[your-bucket-name]":
    BUCKET_URI = "gs://" + PROJECT_ID + "-aip-" + TIMESTAMP

if REGION == "[your-region]":
    REGION = "us-central1"

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [10]:
! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI

Creating gs://inardini-playground-aip-20220227164900/...


Run the following cell to enable uniform bucket-level access, so that access to the bucket, including access from Vertex AI Feature Store, is managed through IAM permissions only.

In [11]:
! gsutil uniformbucketlevelaccess set on $BUCKET_URI

Enabling Uniform bucket-level access for gs://inardini-playground-aip-20220227164900...


Finally, validate access to your Cloud Storage bucket by examining its contents:

In [12]:
! gsutil ls -al $BUCKET_URI

### Create a BigQuery dataset

In [13]:
BQ_DATASET = "Mobile_Gaming"  # @param {type:"string"}
LOCATION = "US"

!bq mk --location=$LOCATION --dataset $PROJECT_ID:$BQ_DATASET

BigQuery error in mk operation: Dataset 'inardini-playground:Mobile_Gaming'
already exists.


### Import libraries

In [14]:
# General
import os
import sys
import time

# Data Engineering
import pandas as pd
from google.cloud import bigquery

# Vertex AI and its Feature Store resources
from google.cloud import aiplatform as vertex_ai
from google.cloud.aiplatform import Feature, Featurestore

### Define constants

In [15]:
# Data Engineering and Feature Engineering
FEATURES_TABLE = "wide_features_table"  # @param {type:"string"}
MIN_DATE = "2018-10-03"
MAX_DATE = "2018-10-04"
FEATURES_TABLE_DAY_ONE = f"wide_features_table_{MIN_DATE}"
FEATURES_TABLE_DAY_TWO = f"wide_features_table_{MAX_DATE}"
FEATURESTORE_ID = "mobile_gaming"  # @param {type:"string"}
ENTITY_TYPE_ID = "user"

# BQ Model Training and Deployment
MODEL_NAME = f"churn_logit_classifier_{TIMESTAMP}"
MODEL_TYPE = "LOGISTIC_REG"
AUTO_CLASS_WEIGHTS = "TRUE"
MAX_ITERATIONS = "50"
INPUT_LABEL_COLS = "churned"
JOB_ID = f"extract_{MODEL_NAME}_{TIMESTAMP}"
MODEL_SOURCE = bigquery.model.ModelReference.from_api_repr(
    {"projectId": PROJECT_ID, "datasetId": BQ_DATASET, "modelId": MODEL_NAME}
)
SERVING_DIR = "serving_dir"
DESTINATION_URI = f"{BUCKET_URI}/model"
EXTRACT_JOB_CONFIG = bigquery.ExtractJobConfig(destination_format="ML_TF_SAVED_MODEL")
VERSION = "v1"
SERVING_CONTAINER_IMAGE_URI = (
    "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest"
)
ENDPOINT_NAME = "mobile_gaming_churn"
DEPLOYED_MODEL_NAME = f"churn_logistic_classifier_{VERSION}"

# Vertex AI Feature store
ONLINE_STORE_NODES_COUNT = 3
ENTITY_ID = "user"
API_ENDPOINT = f"{REGION}-aiplatform.googleapis.com"
FEATURE_TIME = "user_first_engagement"
ENTITY_ID_FIELD = "user_pseudo_id"
BQ_SOURCE_URI_DAY_ONE = f"bq://{PROJECT_ID}.{BQ_DATASET}.{FEATURES_TABLE_DAY_ONE}"
BQ_SOURCE_URI_DAY_TWO = f"bq://{PROJECT_ID}.{BQ_DATASET}.{FEATURES_TABLE_DAY_TWO}"
BQ_DESTINATION_OUTPUT_URI = f"bq://{PROJECT_ID}.{BQ_DATASET}.train_snapshot_{TIMESTAMP}"
SERVING_FEATURE_IDS = {"customer": ["*"]}
READ_INSTANCES_TABLE = f"ground_truth_{TIMESTAMP}"
READ_INSTANCES_URI = f"bq://{PROJECT_ID}.{BQ_DATASET}.{READ_INSTANCES_TABLE}"

# Vertex AI AutoML model
DATASET_NAME = f"churn_mobile_gaming_{TIMESTAMP}"
AUTOML_TRAIN_JOB_NAME = f"automl_classifier_training_{TIMESTAMP}"
AUTOML_MODEL_NAME = f"churn_automl_classifier_{TIMESTAMP}"
MODEL_DEPLOYED_NAME = "churn_automl_classifier_v1"
SERVING_MACHINE_TYPE = "n1-highcpu-4"
MIN_NODES = 1
MAX_NODES = 1

### Helpers

In [16]:
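from google.api_core.exceptions import GoogleAPICallError


def run_bq_query(query: str) -> None:
    """
    A helper function to run a BigQuery job
    Args:
        query: a formatted SQL query
    Returns:
        None
    """
    try:
        job = bq_client.query(query)
        # Block until the query finishes; a failed job raises an API error
        _ = job.result()
    except GoogleAPICallError as error:
        print(error)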


def upload_model(
    display_name: str,
    serving_container_image_uri: str,
    artifact_uri: str,
    sync: bool = True,
) -> vertex_ai.Model:
    """

    Args:
        display_name: The name of Vertex AI Model artefact
        serving_container_image_uri: The uri of the serving image
        artifact_uri: The uri of artefact to import
        sync:

    Returns: Vertex AI Model

    """
    model = vertex_ai.Model.upload(
        display_name=display_name,
        artifact_uri=artifact_uri,
        serving_container_image_uri=serving_container_image_uri,
        sync=sync,
    )
    model.wait()
    print(model.display_name)
    print(model.resource_name)
    return model


def create_endpoint(display_name: str) -> vertex_ai.Endpoint:
    """
    A utility to create a Vertex AI Endpoint
    Args:
        display_name: The name of Endpoint

    Returns: Vertex AI Endpoint

    """
    endpoint = vertex_ai.Endpoint.create(display_name=display_name)

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint


def deploy_model(
    model: vertex_ai.Model,
    machine_type: str,
    endpoint: vertex_ai.Endpoint = None,
    deployed_model_display_name: str = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    sync: bool = True,
) -> vertex_ai.Endpoint:
    """
    A helper function to deploy a Vertex AI Model to an Endpoint
    Args:
        model: A Vertex AI Model
        machine_type: The type of machine to serve the model
        endpoint: A Vertex AI Endpoint
        deployed_model_display_name: The name of the deployed model
        min_replica_count: Minimum number of serving replicas
        max_replica_count: Max number of serving replicas
        sync: Whether to execute method synchronously

    Returns: The vertex_ai.Endpoint the model is deployed to

    """
    model_deployed = model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        sync=sync,
    )

    model_deployed.wait()

    print(model_deployed.display_name)
    print(model_deployed.resource_name)
    return model_deployed


def endpoint_predict_sample(
    instances: list, endpoint: vertex_ai.Endpoint
) -> vertex_ai.models.Prediction:
    """
    A helper function to get predictions from a Vertex AI Endpoint
    Args:
        instances: The list of instances to score
        endpoint: A Vertex AI Endpoint

    Returns:
        vertex_ai.models.Prediction

    """
    prediction = endpoint.predict(instances=instances)
    print(prediction)
    return prediction


def simulate_prediction(
    endpoint: vertex_ai.Endpoint, online_sample: dict
) -> vertex_ai.models.Prediction:
    """
    A helper function to simulate online prediction with the customer entity type
        - format entities for prediction
        - retrieve static features with singleton lookup operations from Vertex AI Feature Store
        - run the prediction request and get back the result
    Args:
        endpoint: The Vertex AI Endpoint serving the model
        online_sample: The real-time (demographic) features keyed by column name, including an entity_id column

    Returns:
        vertex_ai.models.Prediction
    """
    online_features = pd.DataFrame.from_dict(online_sample)
    entity_ids = online_features["entity_id"].tolist()

    customer_aggregated_features = customer_entity_type.read(
        entity_ids=entity_ids,
        feature_ids=[
            "cnt_user_engagement",
            "cnt_level_start_quickplay",
            "cnt_level_end_quickplay",
            "cnt_level_complete_quickplay",
            "cnt_level_reset_quickplay",
            "cnt_post_score",
            "cnt_spend_virtual_currency",
            "cnt_ad_reward",
            "cnt_challenge_a_friend",
            "cnt_completed_5_levels",
            "cnt_use_extra_steps",
        ],
    )

    prediction_sample_df = pd.merge(
        customer_aggregated_features.set_index("entity_id"),
        online_features.set_index("entity_id"),
        left_index=True,
        right_index=True,
    ).reset_index(drop=True)

    prediction_sample = prediction_sample_df.to_dict("records")
    prediction = endpoint.predict(prediction_sample)
    return prediction

# Setting up the online (real-time) prediction scenario

As mentioned at the beginning, this section simulates the original scenario, but this time we introduce Vertex AI for online (real-time) serving. In particular, we will:

1.   Create static features, including demographic and behavioral attributes
2.   Train a simple BQML model
3.   Export and deploy the model to a Vertex AI endpoint


<img src="./assets/data_processing.png"/>



## Initialize clients

In [17]:
bq_client = bigquery.Client(project=PROJECT_ID, location=LOCATION)
vertex_ai.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

## Data and Feature Engineering 

The original dataset contains raw event data that cannot be ingested into the feature store as is.
In this section, we pre-process the raw data into an appropriate format.

**Note that we simulate those transformations at different points in time (day one and day two).**

### Label, Demographic and Behavioral Transformations

This section is based on the [Churn prediction for game developers using Google Analytics 4 (GA4) and BigQuery ML](https://cloud.google.com/blog/topics/developers-practitioners/churn-prediction-game-developers-using-google-analytics-4-ga4-and-bigquery-ml) blog article by Minhaz Kazi and Polong Lin.

In [18]:
preprocess_sql_query = f"""
CREATE OR REPLACE TABLE
  `{PROJECT_ID}.{BQ_DATASET}.{FEATURES_TABLE}` AS
WITH
  # query to create label --------------------------------------------------------------------------------
  get_label AS (
  SELECT
    user_pseudo_id,
    user_first_engagement,
    user_last_engagement,
    # EXTRACT(MONTH from TIMESTAMP_MICROS(user_first_engagement)) as month,
    # EXTRACT(DAYOFYEAR from TIMESTAMP_MICROS(user_first_engagement)) as julianday,
    # EXTRACT(DAYOFWEEK from TIMESTAMP_MICROS(user_first_engagement)) as dayofweek,

    #add 24 hr to user's first touch
    (user_first_engagement + 86400000000) AS ts_24hr_after_first_engagement,

    #churned = 1 if last_touch within 24 hr of app installation, else 0
    IF (user_last_engagement < (user_first_engagement + 86400000000),
        1,
        0 ) AS churned,

    #bounced = 1 if last_touch within 10 min, else 0
    IF (user_last_engagement <= (user_first_engagement + 600000000),
        1,
        0 ) AS bounced,
  FROM
    (
      SELECT
      user_pseudo_id,
      MIN(event_timestamp) AS user_first_engagement,
      MAX(event_timestamp) AS user_last_engagement
      FROM
        `firebase-public-project.analytics_153293282.events_*`
      WHERE event_name="user_engagement"
      GROUP BY
        user_pseudo_id
    )
  GROUP BY 1,2,3),

  # query to create class weights --------------------------------------------------------------------------------
  get_class_weights AS (
  SELECT
    CAST(COUNT(*) / (2*(COUNT(*) - SUM(churned))) AS STRING) AS class_weight_zero,
    CAST(COUNT(*) / (2*SUM(churned)) AS STRING) AS class_weight_one,
  FROM
    get_label
    ),

  # query to extract demographic data for each user ---------------------------------------------------------
  get_demographic_data AS (
  SELECT * EXCEPT (row_num)
  FROM (
    SELECT
      user_pseudo_id,
      geo.country as country,
      device.operating_system as operating_system,
      device.language as language,
      ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp DESC) AS row_num
    FROM `firebase-public-project.analytics_153293282.events_*`
    WHERE event_name="user_engagement")
  WHERE row_num = 1),

  # query to extract behavioral data for each user ----------------------------------------------------------
  get_behavioral_data AS (
  SELECT
    user_pseudo_id,
    SUM(IF(event_name = 'user_engagement', 1, 0)) AS cnt_user_engagement,
    SUM(IF(event_name = 'level_start_quickplay', 1, 0)) AS cnt_level_start_quickplay,
    SUM(IF(event_name = 'level_end_quickplay', 1, 0)) AS cnt_level_end_quickplay,
    SUM(IF(event_name = 'level_complete_quickplay', 1, 0)) AS cnt_level_complete_quickplay,
    SUM(IF(event_name = 'level_reset_quickplay', 1, 0)) AS cnt_level_reset_quickplay,
    SUM(IF(event_name = 'post_score', 1, 0)) AS cnt_post_score,
    SUM(IF(event_name = 'spend_virtual_currency', 1, 0)) AS cnt_spend_virtual_currency,
    SUM(IF(event_name = 'ad_reward', 1, 0)) AS cnt_ad_reward,
    SUM(IF(event_name = 'challenge_a_friend', 1, 0)) AS cnt_challenge_a_friend,
    SUM(IF(event_name = 'completed_5_levels', 1, 0)) AS cnt_completed_5_levels,
    SUM(IF(event_name = 'use_extra_steps', 1, 0)) AS cnt_use_extra_steps,
  FROM (
    SELECT
      e.*
    FROM
      `firebase-public-project.analytics_153293282.events_*` e
    JOIN
      get_label r
    ON
      e.user_pseudo_id = r.user_pseudo_id
    WHERE
      e.event_timestamp <= r.ts_24hr_after_first_engagement
    )
  GROUP BY 1)

SELECT
    PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S', FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MICROS(ret.user_first_engagement))) AS user_first_engagement,
    # ret.month,
    # ret.julianday,
    # ret.dayofweek,
    dem.*,
    CAST(IFNULL(beh.cnt_user_engagement, 0) AS FLOAT64)  AS cnt_user_engagement,
    CAST(IFNULL(beh.cnt_level_start_quickplay, 0) AS FLOAT64) AS cnt_level_start_quickplay,
    CAST(IFNULL(beh.cnt_level_end_quickplay, 0) AS FLOAT64) AS cnt_level_end_quickplay,
    CAST(IFNULL(beh.cnt_level_complete_quickplay, 0) AS FLOAT64) AS cnt_level_complete_quickplay,
    CAST(IFNULL(beh.cnt_level_reset_quickplay, 0) AS FLOAT64) AS cnt_level_reset_quickplay,
    CAST(IFNULL(beh.cnt_post_score, 0) AS FLOAT64) AS cnt_post_score,
    CAST(IFNULL(beh.cnt_spend_virtual_currency, 0) AS FLOAT64) AS cnt_spend_virtual_currency,
    CAST(IFNULL(beh.cnt_ad_reward, 0) AS FLOAT64) AS cnt_ad_reward,
    CAST(IFNULL(beh.cnt_challenge_a_friend, 0) AS FLOAT64) AS cnt_challenge_a_friend,
    CAST(IFNULL(beh.cnt_completed_5_levels, 0) AS FLOAT64) AS cnt_completed_5_levels,
    CAST(IFNULL(beh.cnt_use_extra_steps, 0) AS FLOAT64) AS cnt_use_extra_steps,
    ret.churned as churned,
    CASE
      WHEN churned = 0 THEN ( SELECT class_weight_zero FROM get_class_weights)
      ELSE ( SELECT class_weight_one
       FROM get_class_weights)
    END AS class_weights
FROM
  get_label ret
LEFT OUTER JOIN
  get_demographic_data dem
ON 
  ret.user_pseudo_id = dem.user_pseudo_id
LEFT OUTER JOIN 
  get_behavioral_data beh
ON
  ret.user_pseudo_id = beh.user_pseudo_id
WHERE ret.bounced = 0
"""

In [19]:
run_bq_query(preprocess_sql_query)
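The label logic in the query works on GA4 event timestamps expressed in microseconds: `86400000000` is 24 hours and `600000000` is 10 minutes. As a sanity check, here is a minimal Python sketch (the helper names are hypothetical) that mirrors the `churned` and `bounced` rules and the `get_class_weights` formula, where each class `c` gets the weight N / (2 · N_c). Note that in the query the class weights are computed over all users in `get_label`, before bounced users are filtered out.

```python
from typing import List, Tuple

MICROS_PER_SECOND = 1_000_000
ONE_DAY_MICROS = 24 * 60 * 60 * MICROS_PER_SECOND  # 86400000000, as in the query
TEN_MINUTES_MICROS = 10 * 60 * MICROS_PER_SECOND  # 600000000, as in the query


def label_user(first_engagement: int, last_engagement: int) -> Tuple[int, int]:
    """Return (churned, bounced) for engagement timestamps in microseconds."""
    # churned = 1 if the last engagement happens within 24 hours of the first one
    churned = 1 if last_engagement < first_engagement + ONE_DAY_MICROS else 0
    # bounced = 1 if the last engagement happens within 10 minutes of the first one
    bounced = 1 if last_engagement <= first_engagement + TEN_MINUTES_MICROS else 0
    return churned, bounced


def class_weights(churned_labels: List[int]) -> Tuple[float, float]:
    """Mirror get_class_weights: weight_c = N / (2 * N_c) for c in {0, 1}."""
    n = len(churned_labels)
    n_one = sum(churned_labels)
    n_zero = n - n_one
    return n / (2 * n_zero), n / (2 * n_one)


print(label_user(0, 5 * 60 * MICROS_PER_SECOND))  # (1, 1): bounced, hence also churned
print(label_user(0, 2 * ONE_DAY_MICROS))  # (0, 0): still engaged after 24 hours
print(class_weights([1, 0, 0, 0]))  # (0.6666666666666666, 2.0)
```

The minority class receives the larger weight, so the two classes contribute equally to the total weight.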

### Create tables to simulate daily entity updates

In [20]:
processed_sql_query_day_one = f"""
CREATE OR REPLACE TABLE 
  `{PROJECT_ID}.{BQ_DATASET}.{FEATURES_TABLE_DAY_ONE}` AS
SELECT
  *
FROM
  `{PROJECT_ID}.{BQ_DATASET}.{FEATURES_TABLE}`
WHERE
    user_first_engagement < '{MAX_DATE}'
"""

processed_sql_query_day_two = f"""
CREATE OR REPLACE TABLE 
  `{PROJECT_ID}.{BQ_DATASET}.{FEATURES_TABLE_DAY_TWO}` AS
SELECT
  *
FROM
  `{PROJECT_ID}.{BQ_DATASET}.{FEATURES_TABLE}`
WHERE
  user_first_engagement >= '{MAX_DATE}'
"""

In [21]:
queries = processed_sql_query_day_one, processed_sql_query_day_two
for query in queries:
    run_bq_query(query)

## Model Training

We have created demographic and aggregated behavioral features. Now it is time to train our BQML model.


#### Train a logistic regression classifier

In [None]:
train_model_query = f"""
CREATE OR REPLACE MODEL `{PROJECT_ID}.{BQ_DATASET}.{MODEL_NAME}`
OPTIONS(MODEL_TYPE='{MODEL_TYPE}',
        AUTO_CLASS_WEIGHTS={AUTO_CLASS_WEIGHTS},
        MAX_ITERATIONS={MAX_ITERATIONS},
        INPUT_LABEL_COLS=['{INPUT_LABEL_COLS}'])
AS SELECT * EXCEPT(user_first_engagement, user_pseudo_id, class_weights)
   FROM `{PROJECT_ID}.{BQ_DATASET}.{FEATURES_TABLE_DAY_ONE}`;
"""

In [None]:
run_bq_query(train_model_query)

## Model Deployment

Once you have the model, you can export it and deploy it to a Vertex AI Endpoint.

This is just one of the five ways to use BigQuery and Vertex AI together. [Check this article](https://cloud.google.com/blog/products/ai-machine-learning/five-integrations-between-vertex-ai-and-bigquery) to learn more about them.


#### Export the model

In [None]:
model_extract_job = bigquery.ExtractJob(
    client=bq_client,
    job_id=JOB_ID,
    source=MODEL_SOURCE,
    destination_uris=[DESTINATION_URI],
    job_config=EXTRACT_JOB_CONFIG,
)

In [None]:
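from google.api_core.exceptions import GoogleAPICallError

try:
    # Wait for the extract job to finish; a failed job raises an API error
    job = model_extract_job.result()
except GoogleAPICallError as error:
    print(error)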

#### (Locally) Check the SavedModel format

In [None]:
%%bash -s "$SERVING_DIR" "$DESTINATION_URI" 
mkdir -p -m 777 $1
gsutil cp -r $2 $1

In [None]:
%%bash -s "$SERVING_DIR"
saved_model_cli show --dir $1/model/ --all

#### Upload and Deploy Model on Vertex AI Endpoint

In [None]:
bq_model = upload_model(
    display_name=MODEL_NAME,
    serving_container_image_uri=SERVING_CONTAINER_IMAGE_URI,
    artifact_uri=DESTINATION_URI,
)

In [None]:
endpoint = create_endpoint(display_name=ENDPOINT_NAME)

In [None]:
deployed_model = deploy_model(
    model=bq_model,
    machine_type="n1-highcpu-4",
    endpoint=endpoint,
    deployed_model_display_name=DEPLOYED_MODEL_NAME,
    min_replica_count=1,
    max_replica_count=1,
    sync=True,
)

#### Test predictions

In [None]:
instance = {
    "cnt_ad_reward": 0,
    "cnt_challenge_a_friend": 0,
    "cnt_completed_5_levels": 0,
    "cnt_level_complete_quickplay": 0,
    "cnt_level_end_quickplay": 0,
    "cnt_level_reset_quickplay": 0,
    "cnt_level_start_quickplay": 0,
    "cnt_post_score": 0,
    "cnt_spend_virtual_currency": 0,
    "cnt_use_extra_steps": 0,
    "cnt_user_engagement": 14,
    "country": "United States",
    "language": "en-us",
    "operating_system": "ANDROID",
}

In [None]:
bqml_predictions = endpoint_predict_sample(instances=[instance], endpoint=endpoint)
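The SDK call `endpoint.predict` has an equivalent REST form, the `projects.locations.endpoints.predict` method, whose request body wraps the instances in an `"instances"` list. The following sketch only builds that request without sending it; the project, region, and endpoint ID values are placeholders (the endpoint ID is the last segment of `endpoint.resource_name`). Sending it would additionally require an OAuth 2.0 access token in an `Authorization: Bearer` header, for example one printed by `gcloud auth print-access-token`.

```python
import json
from typing import Any, Dict, List, Tuple


def build_predict_request(
    project_id: str, region: str, endpoint_id: str, instances: List[Dict[str, Any]]
) -> Tuple[str, str]:
    """Return the (URL, JSON body) of a Vertex AI online prediction REST call."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{region}/endpoints/{endpoint_id}:predict"
    )
    # The body wraps the feature dictionaries in an "instances" list.
    body = json.dumps({"instances": instances})
    return url, body


# Placeholder values for illustration only.
url, body = build_predict_request(
    "my-project",
    "us-central1",
    "1234567890",
    [{"cnt_user_engagement": 14, "country": "United States"}],
)
print(url)
print(body)
```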

# Serve ML features at scale with low latency

At this point, **we have deployed a simple model that requires fetching aggregated attributes as input features in real time**.

That's why **we need a datastore optimized for singleton lookup operations** that can scale and serve those aggregated features online with low latency.

In other words, we need to introduce Vertex AI Feature Store. Again, we assume you already know how to set up and work with a Vertex AI Feature Store.


## Feature store for features management

In this section, we walk through all the Feature Store management activities, from creating a Featurestore resource all the way down to reading feature values online.

Below you can see the feature store data model and a plain representation of how the data will be organized.
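
The data model is a strict hierarchy (featurestore, then entity type, then feature), and that hierarchy shows up directly in the resource names logged by the cells in this section. The sketch below reproduces that naming scheme as inferred from those logged names; the helper functions are hypothetical and only illustrate how the pieces nest.

```python
def featurestore_resource_name(project, location, featurestore_id):
    """Top of the hierarchy: a featurestore within a project and region."""
    return f"projects/{project}/locations/{location}/featurestores/{featurestore_id}"


def entity_type_resource_name(project, location, featurestore_id, entity_type_id):
    """An entity type (e.g. `user`) lives inside a featurestore."""
    return (
        featurestore_resource_name(project, location, featurestore_id)
        + f"/entityTypes/{entity_type_id}"
    )


def feature_resource_name(project, location, featurestore_id, entity_type_id, feature_id):
    """A feature (e.g. `country`) lives inside an entity type."""
    return (
        entity_type_resource_name(project, location, featurestore_id, entity_type_id)
        + f"/features/{feature_id}"
    )


name = feature_resource_name("309823771116", "us-central1", "mobile_gaming", "user", "country")
print(name)
# projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/features/country
```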

<img src="./assets/data_model_3.png"/>


### Create featurestore, ```mobile_gaming```

In [22]:
print(f"Listing all featurestores in {PROJECT_ID}")
feature_store_list = Featurestore.list()
if len(list(feature_store_list)) == 0:
    print(f"No featurestores found in {PROJECT_ID}.")
else:
    for fs in feature_store_list:
        print("Found featurestore: {}".format(fs.resource_name))

Listing all featurestores in inardini-playground
Found featurestore: projects/309823771116/locations/us-central1/featurestores/loan_classifier


In [23]:
try:
    mobile_gaming_feature_store = Featurestore.create(
        featurestore_id=FEATURESTORE_ID,
        online_store_fixed_node_count=ONLINE_STORE_NODES_COUNT,
        labels={"team": "dataoffice", "app": "mobile_gaming"},
        sync=True,
    )
except RuntimeError as error:
    print(error)
else:
    FEATURESTORE_RESOURCE_NAME = mobile_gaming_feature_store.resource_name
    print(f"Feature store created: {FEATURESTORE_RESOURCE_NAME}")

INFO:google.cloud.aiplatform.featurestore.featurestore:Creating Featurestore
INFO:google.cloud.aiplatform.featurestore.featurestore:Create Featurestore backing LRO: projects/309823771116/locations/us-central1/featurestores/mobile_gaming/operations/8350276859992211456
INFO:google.cloud.aiplatform.featurestore.featurestore:Featurestore created. Resource name: projects/309823771116/locations/us-central1/featurestores/mobile_gaming
INFO:google.cloud.aiplatform.featurestore.featurestore:To use this Featurestore in another session:
INFO:google.cloud.aiplatform.featurestore.featurestore:featurestore = aiplatform.Featurestore('projects/309823771116/locations/us-central1/featurestores/mobile_gaming')
Feature store created: projects/309823771116/locations/us-central1/featurestores/mobile_gaming


### Create the ```User``` entity type and its features

In [24]:
try:
    user_entity_type = mobile_gaming_feature_store.create_entity_type(
        entity_type_id=ENTITY_ID, description="User Entity", sync=True
    )
except RuntimeError as error:
    print(error)
else:
    USER_ENTITY_RESOURCE_NAME = user_entity_type.resource_name
    print("Entity type name is", USER_ENTITY_RESOURCE_NAME)

INFO:google.cloud.aiplatform.featurestore.entity_type:Creating EntityType
INFO:google.cloud.aiplatform.featurestore.entity_type:Create EntityType backing LRO: projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/operations/2479834745714769920
INFO:google.cloud.aiplatform.featurestore.entity_type:EntityType created. Resource name: projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user
INFO:google.cloud.aiplatform.featurestore.entity_type:To use this EntityType in another session:
INFO:google.cloud.aiplatform.featurestore.entity_type:entity_type = aiplatform.EntityType('projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user')
Entity type name is projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user


### Set Feature Monitoring

Feature [monitoring](https://cloud.google.com/vertex-ai/docs/featurestore/monitoring) is in preview, so you need to use the v1beta1 Python client, which is a lower-level API than the one we've used so far in this notebook.

The easiest way to set it up for now is the [console UI](https://console.cloud.google.com/vertex-ai/features). For completeness, below is an example of how to do it with the v1beta1 SDK.

In [25]:
from google.cloud.aiplatform_v1beta1 import \
    FeaturestoreServiceClient as v1beta1_FeaturestoreServiceClient
from google.cloud.aiplatform_v1beta1.types import \
    entity_type as v1beta1_entity_type_pb2
from google.cloud.aiplatform_v1beta1.types import \
    featurestore_monitoring as v1beta1_featurestore_monitoring_pb2
from google.cloud.aiplatform_v1beta1.types import \
    featurestore_service as v1beta1_featurestore_service_pb2
from google.protobuf.duration_pb2 import Duration

v1beta1_admin_client = v1beta1_FeaturestoreServiceClient(
    client_options={"api_endpoint": API_ENDPOINT}
)

In [26]:
v1beta1_admin_client.update_entity_type(
    v1beta1_featurestore_service_pb2.UpdateEntityTypeRequest(
        entity_type=v1beta1_entity_type_pb2.EntityType(
            name=v1beta1_admin_client.entity_type_path(
                PROJECT_ID, REGION, FEATURESTORE_ID, ENTITY_ID
            ),
            monitoring_config=v1beta1_featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                snapshot_analysis=v1beta1_featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                    monitoring_interval=Duration(seconds=86400),  # 1 day
                ),
            ),
        ),
    )
)

name: "projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user"
description: "User Entity"
create_time {
  seconds: 1645980601
  nanos: 333247000
}
update_time {
  seconds: 1645980604
  nanos: 513308000
}
etag: "AMEw9yM5ZLlNfcd27-m0WAydruznrVkBgytFnFl1ILDjx4Yhc4MtvPTElP3TNEqqc1NV"
monitoring_config {
  snapshot_analysis {
    monitoring_interval {
      seconds: 86400
    }
    monitoring_interval_days: 1
  }
}

### Create features

#### Create Feature configuration

For simplicity, the configuration is defined declaratively. Alternatively, you could write a helper function that builds it from the BigQuery table schema.
Also notice that we want to pass some features on the fly at prediction time: country, operating system, and language are good candidates for that.
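
A helper that derives the configuration from a BigQuery schema could look like the sketch below. It takes `(column_name, bigquery_type)` pairs; in the notebook these could come from `[(f.name, f.field_type) for f in bq_client.get_table(table_id).schema]`. The function name, the default description, and the BigQuery-to-Feature Store type mapping are assumptions for illustration, so check the mapping against your own tables before relying on it.

```python
# Assumed mapping from BigQuery column types (legacy and standard SQL names)
# to Vertex AI Feature Store value types.
BQ_TO_FEATURESTORE_TYPE = {
    "STRING": "STRING",
    "FLOAT": "DOUBLE",
    "FLOAT64": "DOUBLE",
    "INTEGER": "INT64",
    "INT64": "INT64",
    "BOOLEAN": "BOOL",
    "BOOL": "BOOL",
}


def build_feature_configs(schema, exclude=(), labels=None):
    """Build a `feature_configs` dict from (column_name, bigquery_type) pairs."""
    labels = labels or {"status": "passed"}
    configs = {}
    for name, bq_type in schema:
        if name in exclude:
            continue  # e.g. the entity ID column or the feature timestamp column
        value_type = BQ_TO_FEATURESTORE_TYPE.get(bq_type.upper())
        if value_type is None:
            raise ValueError(f"No Feature Store type mapped for {name}: {bq_type}")
        configs[name] = {
            "value_type": value_type,
            "description": f"Feature {name}",  # placeholder description
            "labels": dict(labels),  # copy so features don't share one dict
        }
    return configs


# Hypothetical subset of the features table schema.
schema = [
    ("user_pseudo_id", "STRING"),
    ("country", "STRING"),
    ("cnt_user_engagement", "FLOAT"),
    ("churned", "INTEGER"),
]
configs = build_feature_configs(schema, exclude={"user_pseudo_id"})
print(configs["cnt_user_engagement"]["value_type"])  # DOUBLE
```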

In [27]:
feature_configs = {
    "country": {
        "value_type": "STRING",
        "description": "The country of the customer",
        "labels": {"status": "passed"},
    },
    "operating_system": {
        "value_type": "STRING",
        "description": "The operating system of the device",
        "labels": {"status": "passed"},
    },
    "language": {
        "value_type": "STRING",
        "description": "The language of the device",
        "labels": {"status": "passed"},
    },
    "cnt_user_engagement": {
        "value_type": "DOUBLE",
        "description": "A variable of user engagement level",
        "labels": {"status": "passed"},
    },
    "cnt_level_start_quickplay": {
        "value_type": "DOUBLE",
        "description": "A variable of user engagement with start level",
        "labels": {"status": "passed"},
    },
    "cnt_level_end_quickplay": {
        "value_type": "DOUBLE",
        "description": "A variable of user engagement with end level",
        "labels": {"status": "passed"},
    },
    "cnt_level_complete_quickplay": {
        "value_type": "DOUBLE",
        "description": "A variable of user engagement with complete status",
        "labels": {"status": "passed"},
    },
    "cnt_level_reset_quickplay": {
        "value_type": "DOUBLE",
        "description": "A variable of user engagement with reset status",
        "labels": {"status": "passed"},
    },
    "cnt_post_score": {
        "value_type": "DOUBLE",
        "description": "A variable of user score",
        "labels": {"status": "passed"},
    },
    "cnt_spend_virtual_currency": {
        "value_type": "DOUBLE",
        "description": "A variable of user virtual amount",
        "labels": {"status": "passed"},
    },
    "cnt_ad_reward": {
        "value_type": "DOUBLE",
        "description": "A variable of user reward",
        "labels": {"status": "passed"},
    },
    "cnt_challenge_a_friend": {
        "value_type": "DOUBLE",
        "description": "A variable of user challenges with friends",
        "labels": {"status": "passed"},
    },
    "cnt_completed_5_levels": {
        "value_type": "DOUBLE",
        "description": "A variable of user level 5 completed",
        "labels": {"status": "passed"},
    },
    "cnt_use_extra_steps": {
        "value_type": "DOUBLE",
        "description": "A variable of user extra steps",
        "labels": {"status": "passed"},
    },
    "churned": {
        "value_type": "INT64",
        "description": "Whether the user churned (the model label)",
        "labels": {"status": "passed"},
    },
    "class_weights": {
        "value_type": "STRING",
        "description": "A variable of class weights",
        "labels": {"status": "passed"},
    },
}

#### Create features using `batch_create_features` method

In [28]:
try:
    user_entity_type.batch_create_features(
        feature_configs=feature_configs, sync=True
    )
except RuntimeError as error:
    print(error)
else:
    for feature in user_entity_type.list_features():
        print("")
        print(f"The resource name of {feature.name} feature is", feature.resource_name)

INFO:google.cloud.aiplatform.featurestore.entity_type:Batch creating features EntityType entityType: projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user
INFO:google.cloud.aiplatform.featurestore.entity_type:Batch create Features EntityType entityType backing LRO: projects/309823771116/locations/us-central1/featurestores/mobile_gaming/operations/856287080047706112
INFO:google.cloud.aiplatform.featurestore.entity_type:EntityType entityType Batch created features. Resource name: projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user

The resource name of cnt_level_start_quickplay feature is projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/features/cnt_level_start_quickplay

The resource name of cnt_completed_5_levels feature is projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/features/cnt_completed_5_levels

The resource name of cnt_post_sc

### Search features

In [29]:
feature_query = "feature_id:cnt_user_engagement"
searched_features = Feature.search(query=feature_query)
searched_features

[<google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f79bb11e310> 
 resource name: projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/features/cnt_user_engagement]

### Import ```User``` feature values using ```ingest_from_bq``` method

You need to import feature values before you can use them for online/offline serving.
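
Concretely, `ingest_from_bq` reads rows that carry a column with the entity ID (`ENTITY_ID_FIELD`), the feature timestamp (`FEATURE_TIME`, when it names a column rather than a single fixed time), and one column per feature ID (unless `feature_source_fields` maps feature IDs to differently named columns). A minimal pre-flight check over such a row might look like the following; `missing_columns`, the sample row, and the column names `user_pseudo_id` and `timestamp` are hypothetical.

```python
def missing_columns(row, entity_id_field, feature_time_field, feature_ids):
    """Return the required columns absent from a source row (empty if complete)."""
    required = [entity_id_field, feature_time_field, *feature_ids]
    return [column for column in required if column not in row]


# Hypothetical source row; the column names mirror the features created above.
row = {
    "user_pseudo_id": "DE346CDD4A6F13969F749EA8047F282A",
    "timestamp": "2022-02-27T00:00:00Z",
    "country": "United States",
    "cnt_user_engagement": 14.0,
}
print(missing_columns(row, "user_pseudo_id", "timestamp", ["country", "cnt_user_engagement", "churned"]))
# ['churned']
```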

In [32]:
FEATURES_IDS = [feature.name for feature in user_entity_type.list_features()]

In [33]:
try:
    user_entity_type.ingest_from_bq(
        feature_ids=FEATURES_IDS,
        feature_time=FEATURE_TIME,
        bq_source_uri=BQ_SOURCE_URI_DAY_ONE,
        entity_id_field=ENTITY_ID_FIELD,
        disable_online_serving=False,
        worker_count=20,
        sync=True,
    )
except RuntimeError as error:
    print(error)

INFO:google.cloud.aiplatform.featurestore.entity_type:Importing EntityType feature values: projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user
INFO:google.cloud.aiplatform.featurestore.entity_type:Import EntityType feature values backing LRO: projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/operations/7481082131909705728
INFO:google.cloud.aiplatform.featurestore.entity_type:EntityType feature values imported. Resource name: projects/309823771116/locations/us-central1/featurestores/mobile_gaming/entityTypes/user


**Comment: How does Vertex AI Feature Store mitigate training-serving skew?**

Let's think about what is happening for a second.

We have just ingested the customer behavioral features we engineered earlier to train the model, and we are now going to serve the same features for online prediction.

But what if the attributes in incoming prediction requests differed from the ones calculated during model training? In particular, what if the current attributes have different characteristics from the data the model was trained on? At that point, you start to see the idea of **skew** between training and serving data. Why does it matter? Imagine that the mobile gaming app starts trending and users challenge friends more frequently. This would change the distribution of `cnt_challenge_a_friend`, while the model that estimates churn probability was trained on a different distribution. If the type and frequency of ads depend on those predictions, you would end up targeting the wrong users with the wrong ads, at an unexpected frequency, because of this offline-online feature inconsistency.

**Vertex AI Feature Store** addresses this skew with an ingest-once, reuse-many approach: once a feature is computed and ingested, the same values are available for both training and serving.
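
The distribution shift described above can be quantified. The sketch below computes the Jensen-Shannon divergence between a training sample and a serving sample of a count feature such as `cnt_challenge_a_friend`. It is one common drift statistic, chosen here for illustration; the statistic that Vertex AI feature monitoring reports may differ, and the sample values are hypothetical. With base-2 logarithms the result lies between 0 (identical distributions) and 1.

```python
import math
from collections import Counter


def _probabilities(values, support):
    """Empirical probability of each value in `support`."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in support]


def js_divergence(train_values, serving_values):
    """Jensen-Shannon divergence (base 2) between two discrete samples."""
    support = sorted(set(train_values) | set(serving_values))
    p = _probabilities(train_values, support)
    q = _probabilities(serving_values, support)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0 because b = m.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical per-user counts of cnt_challenge_a_friend.
train = [0, 0, 0, 0, 1, 0, 0, 2, 0, 0]
serving = [1, 2, 0, 3, 1, 2, 4, 1, 0, 2]  # users challenge friends more often
print(round(js_divergence(train, train), 3))  # 0.0
print(js_divergence(train, serving) > 0.1)  # True
```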

## Simulate online prediction requests
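
`simulate_prediction` is a helper defined earlier in the notebook. A plausible shape for it (an assumption, not its actual code) is: read the user's stored features from the online store, for example with `user_entity_type.read(entity_ids=[...], feature_ids=[...])`, which returns a pandas DataFrame; overlay the features passed on the fly in the request (country, operating system, language); and send the merged instance to the endpoint. The hypothetical sketch below covers only the merge step, with a plain dict standing in for the online-store row.

```python
def build_instance(stored_features, request_sample, feature_order):
    """Merge online-store features with request-time features into one model instance."""
    instance = dict(stored_features)
    for name, values in request_sample.items():
        if name == "entity_id":
            continue  # lookup key, not a model input
        instance[name] = values[0]  # request-time values take precedence
    missing = [name for name in feature_order if name not in instance]
    if missing:
        raise KeyError(f"Missing model inputs: {missing}")
    return {name: instance[name] for name in feature_order}


# Hypothetical values read from the online store for this user.
stored = {"cnt_user_engagement": 14.0, "cnt_ad_reward": 0.0}
request_sample = {
    "entity_id": ["DE346CDD4A6F13969F749EA8047F282A"],
    "country": ["United States"],
    "operating_system": ["IOS"],
    "language": ["en"],
}
order = ["cnt_ad_reward", "cnt_user_engagement", "country", "operating_system", "language"]
print(build_instance(stored, request_sample, order))
# {'cnt_ad_reward': 0.0, 'cnt_user_engagement': 14.0, 'country': 'United States', 'operating_system': 'IOS', 'language': 'en'}
```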

In [None]:
online_sample = {
    "entity_id": ["DE346CDD4A6F13969F749EA8047F282A"],
    "country": ["United States"],
    "operating_system": ["IOS"],
    "language": ["en"],
}

In [None]:
prediction = simulate_prediction(endpoint=endpoint, online_sample=online_sample)
print(prediction)

# Train a new churn ML model using Vertex AI AutoML

Now assume that you have a meeting with the team and you decide to use Vertex AI AutoML to train a new version of the model.

But while you were discussing it, new data was ingested into the feature store.


## Ingest new data in the feature store

In [None]:
try:
    user_entity_type.ingest_from_bq(
        feature_ids=FEATURES_IDS,
        feature_time=FEATURE_TIME,
        bq_source_uri=BQ_SOURCE_URI_DAY_TWO,
        entity_id_field=ENTITY_ID_FIELD,
        disable_online_serving=False,
        worker_count=1,
        sync=True,
    )
except RuntimeError as error:
    print(error)

## Avoid data leakage with point-in-time lookup to fetch training data

Now, without a datastore that uses a timestamped data model, data leakage would occur: the new model would be trained on a different dataset (one that includes the newly ingested data), and as a consequence you could not compare the two models. To avoid that, **you need to be able to train the new model on the same data, at the same point in time, that we used for the previous version of the model**.

<center><img src="./assets/point_in_time_2.png"/></center>

**With Vertex AI Feature Store, you can fetch the feature values corresponding to a particular timestamp thanks to its point-in-time lookup capability.** In terms of the SDK, you define a `read instances` table, which is a list of entity ID / timestamp pairs: the entity ID is the `user_pseudo_id`, and the timestamp tells the feature store at which point in time to read that user's feature values (for each pair, it returns the latest values available as of that timestamp). In this way, we can reproduce the exact training sample we need for the new model.
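
The point-in-time semantics can be sketched with in-memory data. This is a hypothetical illustration, not how Vertex AI implements batch serving: `history` stands in for one feature's ingested values per user, sorted by feature timestamp, and timestamps are plain integers (1 = day one, 2 = day two). Reading at the day-one timestamp still returns the day-one value after the day-two data has been ingested.

```python
from bisect import bisect_right


def point_in_time_lookup(history, read_instances):
    """For each (entity_id, timestamp) pair, return the latest value at or before timestamp.

    `history` maps entity_id -> list of (feature_time, value), sorted by feature_time.
    Entities with no value at or before the timestamp yield None.
    """
    results = []
    for entity_id, ts in read_instances:
        rows = history.get(entity_id, [])
        times = [t for t, _ in rows]
        idx = bisect_right(times, ts)  # number of rows with feature_time <= ts
        results.append(rows[idx - 1][1] if idx else None)
    return results


# Hypothetical history of cnt_challenge_a_friend: day one, then day two.
history = {"user_a": [(1, 0.0), (2, 5.0)], "user_b": [(1, 3.0)]}
reads = [("user_a", 1), ("user_a", 2), ("user_b", 2), ("user_c", 2)]
print(point_in_time_lookup(history, reads))  # [0.0, 5.0, 3.0, None]
```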

Let's see how to do that. 


### Define query for reading instances at a specific point in time

In [None]:
# WHERE ABS(MOD(FARM_FINGERPRINT(STRING(user_first_engagement, 'UTC')), 10)) < 8

read_instances_query = f"""
CREATE OR REPLACE TABLE
  `{PROJECT_ID}.{BQ_DATASET}.{READ_INSTANCES_TABLE}` AS
    SELECT
      user_pseudo_id  as customer,
      TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), SECOND, "UTC") as timestamp
    FROM
      `{BQ_DATASET}.{FEATURES_TABLE_DAY_ONE}` AS e
    ORDER BY
      user_first_engagement
"""

### Create the BigQuery instances table

In [None]:
run_bq_query(read_instances_query)

### Serve features for batch training


In [None]:
mobile_gaming_feature_store.batch_serve_to_bq(
    bq_destination_output_uri=BQ_DESTINATION_OUTPUT_URI,
    serving_feature_ids=SERVING_FEATURE_IDS,
    read_instances_uri=READ_INSTANCES_URI,
)

## Train and Deploy AutoML model on Vertex AI

Now that we have reproduced the training sample, we use the Vertex AI SDK to train a new version of the model with Vertex AI AutoML.


### Create the Managed Tabular Dataset from a BigQuery table

In [None]:
dataset = vertex_ai.TabularDataset.create(
    display_name=DATASET_NAME,
    bq_source=BQ_DESTINATION_OUTPUT_URI,
)

dataset.resource_name

### Create and Launch the Training Job to build the Model

In [None]:
automl_training_job = vertex_ai.AutoMLTabularTrainingJob(
    display_name=AUTOML_TRAIN_JOB_NAME,
    optimization_prediction_type="classification",
    optimization_objective="maximize-au-roc",
    column_transformations=[
        {"categorical": {"column_name": "country"}},
        {"categorical": {"column_name": "operating_system"}},
        {"categorical": {"column_name": "language"}},
        {"numeric": {"column_name": "cnt_user_engagement"}},
        {"numeric": {"column_name": "cnt_level_start_quickplay"}},
        {"numeric": {"column_name": "cnt_level_end_quickplay"}},
        {"numeric": {"column_name": "cnt_level_complete_quickplay"}},
        {"numeric": {"column_name": "cnt_level_reset_quickplay"}},
        {"numeric": {"column_name": "cnt_post_score"}},
        {"numeric": {"column_name": "cnt_spend_virtual_currency"}},
        {"numeric": {"column_name": "cnt_ad_reward"}},
        {"numeric": {"column_name": "cnt_challenge_a_friend"}},
        {"numeric": {"column_name": "cnt_completed_5_levels"}},
        {"numeric": {"column_name": "cnt_use_extra_steps"}},
    ],
)

# This takes around 2 hours to run
automl_model = automl_training_job.run(
    dataset=dataset,
    target_column=INPUT_LABEL_COLS,
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    weight_column="class_weights",
    model_display_name=AUTOML_MODEL_NAME,
    disable_early_stopping=False,
)

### Deploy Model to the same Endpoint with Traffic Splitting

A Vertex AI Endpoint provides managed traffic splitting. All you need to do is define the splitting policy, and the service handles the routing for you.

Make sure that both models expose the same serving interface. In our case, both the BQML logistic regression classifier and the Vertex AI AutoML model support the same prediction format.
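
`RETRAIN_TRAFFIC_SPLIT` follows the convention of the Vertex AI SDK's `Endpoint.deploy`: keys are deployed model IDs, the special key `"0"` refers to the model being deployed, and the integer percentages must add up to 100. The sketch below builds such a split and adds a toy weighted-routing simulation to visualize what a 50/50 split means over many requests; it is not how Vertex AI routes traffic internally, and both functions and the model ID are hypothetical.

```python
import random
from collections import Counter


def make_traffic_split(current_deployed_model_id, new_model_share):
    """Build a traffic_split dict where "0" is the model being deployed.

    The remaining share goes to the model that is already on the endpoint.
    """
    if not isinstance(new_model_share, int) or not 0 <= new_model_share <= 100:
        raise ValueError("new_model_share must be an integer between 0 and 100")
    return {"0": new_model_share, current_deployed_model_id: 100 - new_model_share}


def simulate_routing(split, n_requests, seed=0):
    """Toy weighted routing: count how many requests each model would receive."""
    rng = random.Random(seed)
    ids = list(split)
    return Counter(rng.choices(ids, weights=[split[i] for i in ids], k=n_requests))


split = make_traffic_split("1234567890", 50)  # hypothetical deployed model ID
print(split)  # {'0': 50, '1234567890': 50}
print(simulate_routing(split, 2000))  # roughly 1000 requests per model
```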

In [None]:
model_deployed_id = endpoint.list_models()[0].id
RETRAIN_TRAFFIC_SPLIT = {"0": 50, model_deployed_id: 50}

In [None]:
endpoint.deploy(
    automl_model,
    deployed_model_display_name=MODEL_DEPLOYED_NAME,
    traffic_split=RETRAIN_TRAFFIC_SPLIT,
    machine_type=SERVING_MACHINE_TYPE,
    accelerator_count=0,
    min_replica_count=MIN_NODES,
    max_replica_count=MAX_NODES,
)

## Time to simulate online predictions

In [None]:
import time

# Send 2,000 simulated requests, one per second (at least ~33 minutes in total).
for _ in range(2000):
    simulate_prediction(endpoint=endpoint, online_sample=online_sample)
    time.sleep(1)

Below is the result you should see in the Vertex AI Endpoint UI after the online prediction simulation ends:

<img src="./assets/prediction_results.jpg"/>

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# delete feature store
mobile_gaming_feature_store.delete(sync=True, force=True)

In [None]:
# delete Vertex AI resources
endpoint.undeploy_all()
endpoint.delete()
bq_model.delete()
automl_model.delete()

In [None]:
%%bash -s "$SERVING_DIR"
rm -Rf $1

In [None]:
# Warning: Setting this to true will delete everything in your bucket
delete_bucket = False

if delete_bucket and "BUCKET_URI" in globals():
    ! gsutil -m rm -r $BUCKET_URI

In [None]:
# Delete the BigQuery Dataset
!bq rm -r -f -d $PROJECT_ID:$BQ_DATASET