# Using Vertex AI Feature Store to Explore Mobile Gaming

## Learning objectives

In this notebook, you learn how to:

1. Provide a centralized feature repository with easy APIs to search, discover and fetch features for training or serving. 
2. Simplify deployments of models for online prediction through low latency scalable feature serving.
3. Mitigate training serving skew and data leakage by performing point-in-time lookups to fetch historical data for training.

## Overview

Imagine you are a member of the Data Science team working on the same Mobile Gaming application reported in the [Churn prediction for game developers using Google Analytics 4 (GA4) and BigQuery ML](https://cloud.google.com/blog/topics/developers-practitioners/churn-prediction-game-developers-using-google-analytics-4-ga4-and-bigquery-ml) blog post. 

The business wants to use that information in real time to take immediate in-game action to prevent churn. In particular, for each player, they want to offer gaming incentives such as new items or bonus packs, depending on the player's demographic and behavioral information and the resulting propensity to return.

Last year, Google Cloud announced Vertex AI, a managed machine learning (ML) platform that allows data science teams to accelerate the deployment and maintenance of ML models. One of the platform's building blocks is Vertex AI Feature Store, a managed service for low-latency, scalable feature serving. It is also a centralized feature repository with easy APIs to search and discover features, plus feature monitoring capabilities to track drift and other quality issues.

In this notebook, you learn the role of Vertex AI Feature Store in a production-ready scenario: features built from a user's activities in the 24 hours up to their last engagement are served so that the gaming platform can consume them to improve the user experience. Below is a high-level picture of the system.

<img src="./assets/mobile_gaming_architecture_1.png">


### Dataset

The dataset is the public sample export data from an actual mobile game app called "Flood It!" (Android, iOS).

**Notice that we assume you already know how to set up a Vertex AI Feature Store. If you do not, please check out [this detailed notebook](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/feature_store/gapic-feature-store.ipynb).**

Each learning objective will correspond to a _#TODO_ in this student lab notebook -- try to complete this notebook first and then review the [solution notebook](../solutions/mobile_gaming_feature_store.ipynb)

### Install additional packages

Install the additional package dependencies that are not already installed in your notebook environment: the Vertex AI SDK for Python (`google-cloud-aiplatform`), the BigQuery client library (`google-cloud-bigquery`), and XGBoost. The versions are pinned in the cell below.

In [1]:
import os

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# Google Cloud Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"

In [2]:
# Install additional packages
! pip3 install {USER_FLAG} --upgrade pip
! pip3 install {USER_FLAG} --upgrade google-cloud-aiplatform==1.11.0 -q --no-warn-conflicts
! pip3 install {USER_FLAG} git+https://github.com/googleapis/python-aiplatform.git@main # For features monitoring
! pip3 install {USER_FLAG} --upgrade google-cloud-bigquery==2.24.0 -q --no-warn-conflicts
! pip3 install {USER_FLAG} --upgrade xgboost==1.1.1 -q --no-warn-conflicts

Collecting pip
  Downloading pip-22.1.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
Successfully installed pip-22.1.1
Collecting git+https://github.com/googleapis/python-aiplatform.git@main
  Cloning https://github.com/googleapis/python-aiplatform.git (to revision main) to /tmp/pip-req-build-6e4mrryo
  Running command git clone --filter=blob:none --quiet https://github.com/googleapis/python-aiplatform.git /tmp/pip-req-build-6e4mrryo
  Resolved https://github.com/googleapis/python-aiplatform.git to commit 095717c8b77dc5d66e677413a437ea6ed92e0b1a
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: google-cloud-aiplatform
  Building wheel for google-cloud-aiplatform (setup.py) ... done
  Created wheel for google-cloud-aiplatform: filename=google_cloud_aiplatform-1.13.0-py2.py3-

### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [3]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Enable the Vertex AI API and Compute Engine API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component). 

1. If you are running this notebook locally, you will need to install the [Cloud SDK](https://cloud.google.com/sdk).

1. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [1]:
import os

PROJECT_ID = "qwiklabs-gcp-01-17ee7907a406" # Replace your project id here 

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)

Project ID:  qwiklabs-gcp-01-17ee7907a406


Otherwise, set your project ID here.

In [2]:
if PROJECT_ID == "" or PROJECT_ID is None:
    PROJECT_ID = "qwiklabs-gcp-01-17ee7907a406"  # Replace your project id here 

In [3]:
!gcloud config set project $PROJECT_ID #change it

Updated property [core/project].


#### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append it onto the name of resources you create in this tutorial.

In [4]:
# Import necessary library and define Timestamp
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
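
As a rough illustration of how the timestamp keeps resource names unique, the sketch below appends it to a base name. The helper `timestamped_name` and the bucket prefix `gs://my-project-aip` are hypothetical names used only for this example; a fixed datetime is passed in so the result is reproducible.

```python
from datetime import datetime


def timestamped_name(base: str, now: datetime) -> str:
    """Append a second-resolution timestamp to a resource name (hypothetical helper)."""
    return f"{base}-{now.strftime('%Y%m%d%H%M%S')}"


# A fixed datetime is used here only so that the example is reproducible;
# in the notebook the value comes from datetime.now().
example_name = timestamped_name("gs://my-project-aip", datetime(2022, 5, 24, 8, 3, 43))
print(example_name)  # gs://my-project-aip-20220524080343
```

Two sessions started in different seconds get different suffixes, which is what avoids collisions in a shared project.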

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

Set the name of your Cloud Storage bucket below. It must be unique across all
Cloud Storage buckets.

You may also change the `REGION` variable, which is used for operations
throughout the rest of this notebook. Make sure to [choose a region where Vertex AI services are
available](https://cloud.google.com/vertex-ai/docs/general/locations#available_regions). You may
not use a Multi-Regional Storage bucket for training with Vertex AI.

In [5]:
BUCKET_URI = "gs://qwiklabs-gcp-01-17ee7907a406"  # Replace your bucket name here 
REGION = "us-central1"  # @param {type:"string"}

In [6]:
if BUCKET_URI == "" or BUCKET_URI is None or BUCKET_URI == "gs://qwiklabs-gcp-01-17ee7907a406": # Replace your bucket name here 
    BUCKET_URI = "gs://" + PROJECT_ID + "-aip-" + TIMESTAMP

if REGION == "[your-region]":
    REGION = "us-central1"

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [7]:
! gcloud storage buckets create --location=$REGION --project=$PROJECT_ID $BUCKET_URI

Creating gs://qwiklabs-gcp-01-17ee7907a406-aip-20220524080343/...


Run the following cell to enable uniform bucket-level access on the bucket, so that access to your Cloud Storage resources from Vertex AI Feature Store is governed by IAM alone.

In [8]:
! gcloud storage buckets update --uniform-bucket-level-access $BUCKET_URI

Enabling Uniform bucket-level access for gs://qwiklabs-gcp-01-17ee7907a406-aip-20220524080343...


Finally, validate access to your Cloud Storage bucket by examining its contents:

In [9]:
! gcloud storage ls --all-versions --long $BUCKET_URI

### Create a BigQuery dataset

Create a BigQuery dataset to store the data used throughout the demo.

In [10]:
BQ_DATASET = "Mobile_Gaming"  # @param {type:"string"}
LOCATION = "US"

!bq mk --location=$LOCATION --dataset $PROJECT_ID:$BQ_DATASET

Dataset 'qwiklabs-gcp-01-17ee7907a406:Mobile_Gaming' successfully created.


### Import libraries

In [11]:
# General
import os
import random
import sys
import time

# Data Science
import pandas as pd
# Vertex AI and its Feature Store
from google.cloud import aiplatform as vertex_ai
from google.cloud import bigquery
from google.cloud.aiplatform import Feature, Featurestore

### Define constants

In [12]:
# Data Engineering and Feature Engineering
TODAY = "2018-10-03"
TOMORROW = "2018-10-04"
LABEL_TABLE = f"label_table_{TODAY}".replace("-", "")
FEATURES_TABLE = "wide_features_table"  # @param {type:"string"}
FEATURES_TABLE_TODAY = f"wide_features_table_{TODAY}".replace("-", "")
FEATURES_TABLE_TOMORROW = f"wide_features_table_{TOMORROW}".replace("-", "")
FEATURESTORE_ID = "mobile_gaming"  # @param {type:"string"}
ENTITY_TYPE_ID = "user"

# Vertex AI Feature store
ONLINE_STORE_NODES_COUNT = 5
ENTITY_ID = "user"
API_ENDPOINT = f"{REGION}-aiplatform.googleapis.com"
FEATURE_TIME = "timestamp"
ENTITY_ID_FIELD = "user_pseudo_id"
BQ_SOURCE_URI = f"bq://{PROJECT_ID}.{BQ_DATASET}.{FEATURES_TABLE}"
GCS_DESTINATION_PATH = f"data/features/train_features_{TODAY}".replace("-", "")
GCS_DESTINATION_OUTPUT_URI = f"{BUCKET_URI}/{GCS_DESTINATION_PATH}"
SERVING_FEATURE_IDS = {"user": ["*"]}
READ_INSTANCES_TABLE = f"ground_truth_{TODAY}".replace("-", "")
READ_INSTANCES_URI = f"bq://{PROJECT_ID}.{BQ_DATASET}.{READ_INSTANCES_TABLE}"

# Vertex AI Training
BASE_CPU_IMAGE = "us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest"
DATASET_NAME = f"churn_mobile_gaming_{TODAY}".replace("-", "")
TRAIN_JOB_NAME = f"xgb_classifier_training_{TODAY}".replace("-", "")
MODEL_NAME = f"churn_xgb_classifier_{TODAY}".replace("-", "")
MODEL_PACKAGE_PATH = "train_package"
TRAINING_MACHINE_TYPE = "n1-standard-4"
TRAINING_REPLICA_COUNT = 1
DATA_PATH = f"{GCS_DESTINATION_OUTPUT_URI}/000000000000.csv".replace("gs://", "/gcs/")
MODEL_PATH = f"model/{TODAY}".replace("-", "")
MODEL_DIR = f"{BUCKET_URI}/{MODEL_PATH}".replace("gs://", "/gcs/")

# Vertex AI Prediction
DESTINATION_URI = f"{BUCKET_URI}/{MODEL_PATH}"
VERSION = "v1"
SERVING_CONTAINER_IMAGE_URI = (
    "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.0-23:latest"
)
ENDPOINT_NAME = "mobile_gaming_churn"
DEPLOYED_MODEL_NAME = f"churn_xgb_classifier_{VERSION}"
MODEL_DEPLOYED_NAME = "churn_xgb_classifier_v1"
SERVING_MACHINE_TYPE = "n1-highcpu-4"
MIN_NODES = 1
MAX_NODES = 1
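
To make the derived constants above easier to read, here is a small sketch of what a few of them resolve to. It assumes a hypothetical bucket named `gs://example-bucket`; the `gs://` to `/gcs/` rewrite reflects the path under which Vertex AI training jobs typically see Cloud Storage buckets mounted, which is an assumption of this sketch rather than something the notebook states.

```python
TODAY = "2018-10-03"
BUCKET_URI = "gs://example-bucket"  # hypothetical bucket name, for illustration only

# Dashes are stripped only from the date-bearing part of each name
label_table = f"label_table_{TODAY}".replace("-", "")
gcs_destination_path = f"data/features/train_features_{TODAY}".replace("-", "")
gcs_destination_output_uri = f"{BUCKET_URI}/{gcs_destination_path}"

# Same location, expressed as a file-system style path instead of a gs:// URI
data_path = f"{gcs_destination_output_uri}/000000000000.csv".replace("gs://", "/gcs/")

print(label_table)
print(gcs_destination_output_uri)
print(data_path)
```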

In [13]:
# Sampling distributions for categorical features implemented in
# https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/model_monitoring/model_monitoring.ipynb

LANGUAGE = [
    "en-us",
    "en-gb",
    "ja-jp",
    "en-au",
    "en-ca",
    "de-de",
    "en-in",
    "en",
    "fr-fr",
    "pt-br",
    "es-us",
    "zh-tw",
    "zh-hans-cn",
    "es-mx",
    "nl-nl",
    "fr-ca",
    "en-za",
    "vi-vn",
    "en-nz",
    "es-es",
]

OS = ["IOS", "ANDROID", "null"]
COUNTRY = [
    "United States",
    "India",
    "Japan",
    "Canada",
    "Australia",
    "United Kingdom",
    "Germany",
    "Mexico",
    "France",
    "Brazil",
    "Taiwan",
    "China",
    "Saudi Arabia",
    "Pakistan",
    "Egypt",
    "Netherlands",
    "Vietnam",
    "Philippines",
    "South Africa",
]

USER_IDS = [
    "C8685B0DFA2C4B4E6E6EA72894C30F6F",
    "A976A39B8E08829A5BC5CD3827C942A2",
    "DD2269BCB7F8532CD51CB6854667AF51",
    "A8F327F313C9448DFD5DE108DAE66100",
    "8BE7BF90C971453A34C1FF6FF2A0ACAE",
    "8375B114AFAD8A31DE54283525108F75",
    "4AD259771898207D5869B39490B9DD8C",
    "51E859FD9D682533C094B37DC85EAF87",
    "8C33815E0A269B776AAB4B60A4F7BC63",
    "D7EA8E3645EFFBD6443946179ED704A6",
    "58F3D672BBC613680624015D5BC3ADDB",
    "FF955E4CA27C75CE0BEE9FC89AD275A3",
    "22DC6A6AE86C0AA33EBB8C3164A26925",
    "BC10D76D02351BD4C6F6F5437EE5D274",
    "19DEEA6B15B314DB0ED2A4936959D8F9",
    "C2D17D9066EE1EB9FAE1C8A521BFD4E5",
    "EFBDEC168A2BF8C727B060B2E231724E",
    "E43D3AB2F9B9055C29373523FAF9DB9B",
    "BBDCBE2491658165B7F20540DE652E3A",
    "6895EEFC23B59DB13A9B9A7EED6A766F",
]

### Helpers

In [14]:
def run_bq_query(query: str):
    """
    A helper function to run a BigQuery job
    Args:
        query: a formatted SQL query
    Returns:
        None
    """
    try:
        job = bq_client.query(query)
        _ = job.result()
    except RuntimeError as error:
        print(error)


def upload_model(
    display_name: str,
    serving_container_image_uri: str,
    artifact_uri: str,
    sync: bool = True,
) -> vertex_ai.Model:
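    """
    A helper function to upload a model to Vertex AI
    Args:
        display_name: The name of the Vertex AI Model artifact
        serving_container_image_uri: The URI of the serving image
        artifact_uri: The URI of the artifact to import
        sync: Whether to execute the method synchronously

    Returns: Vertex AI Model

    """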
    model = vertex_ai.Model.upload(
        display_name=display_name,
        artifact_uri=artifact_uri,
        serving_container_image_uri=serving_container_image_uri,
        sync=sync,
    )
    model.wait()
    print(model.display_name)
    print(model.resource_name)
    return model


def create_endpoint(display_name: str) -> vertex_ai.Endpoint:
    """
    A utility to create a Vertex AI Endpoint
    Args:
        display_name: The name of Endpoint

    Returns: Vertex AI Endpoint

    """
    endpoint = vertex_ai.Endpoint.create(display_name=display_name)

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint


def deploy_model(
    model: vertex_ai.Model,
    machine_type: str,
    endpoint: vertex_ai.Endpoint = None,
    deployed_model_display_name: str = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    sync: bool = True,
) -> vertex_ai.Endpoint:
    """
    A helper function to deploy a model to a Vertex AI Endpoint
    Args:
        model: A Vertex AI Model
        machine_type: The type of machine to serve the model
        endpoint: A Vertex AI Endpoint
        deployed_model_display_name: The name of the model
        min_replica_count: Minimum number of serving replicas
        max_replica_count: Maximum number of serving replicas
        sync: Whether to execute the method synchronously

    Returns: the vertex_ai.Endpoint that serves the model

    """
    model_deployed = model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        sync=sync,
    )

    model_deployed.wait()

    print(model_deployed.display_name)
    print(model_deployed.resource_name)
    return model_deployed


def endpoint_predict_sample(
    instances: list, endpoint: vertex_ai.Endpoint
) -> vertex_ai.models.Prediction:
    """
    A helper function to get a prediction from a Vertex AI Endpoint
    Args:
        instances: The list of instances to score
        endpoint: A Vertex AI Endpoint

    Returns:
        vertex_ai.models.Prediction

    """
    prediction = endpoint.predict(instances=instances)
    print(prediction)
    return prediction


def generate_online_sample() -> dict:
    """
    A helper function to generate a sample of online features
    Returns:
        online_sample: dict of online features
    """
    online_sample = {}
    online_sample["entity_id"] = random.choices(USER_IDS)
    online_sample["country"] = random.choices(COUNTRY)
    online_sample["operating_system"] = random.choices(OS)
    online_sample["language"] = random.choices(LANGUAGE)
    return online_sample


def simulate_prediction(endpoint: vertex_ai.Endpoint, n_requests: int, latency: int):
    """
    A helper function to simulate online prediction for the user entity type
        - format entities for prediction
        - retrieve static features with a singleton lookup operation from Vertex AI Feature Store
        - run the prediction request and print the result
    Args:
        endpoint: Vertex AI Endpoint object
        n_requests: number of requests to run
        latency: pause between requests, in seconds
    Returns:
        None
    """
    for i in range(n_requests):
        online_sample = generate_online_sample()
        online_features = pd.DataFrame.from_dict(online_sample)
        entity_ids = online_features["entity_id"].tolist()

        customer_aggregated_features = user_entity_type.read(
            entity_ids=entity_ids,
            feature_ids=[
                "cnt_user_engagement",
                "cnt_level_start_quickplay",
                "cnt_level_end_quickplay",
                "cnt_level_complete_quickplay",
                "cnt_level_reset_quickplay",
                "cnt_post_score",
                "cnt_spend_virtual_currency",
                "cnt_ad_reward",
                "cnt_challenge_a_friend",
                "cnt_completed_5_levels",
                "cnt_use_extra_steps",
            ],
        )

        prediction_sample_df = pd.merge(
            customer_aggregated_features.set_index("entity_id"),
            online_features.set_index("entity_id"),
            left_index=True,
            right_index=True,
        ).reset_index(drop=True)

        # prediction_sample = prediction_sample_df.to_dict("records")
        prediction_instance = prediction_sample_df.values.tolist()
        prediction = endpoint.predict(prediction_instance)
        print(
            f"Prediction request: user_id - {entity_ids} - values - {prediction_instance} - prediction - {prediction[0]}"
        )
        time.sleep(latency)
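
The merge step inside `simulate_prediction` can be tried without a feature store or an endpoint. The sketch below is a simplified stand-in: `fake_feature_store_read` is a hypothetical replacement for `user_entity_type.read(...)` that returns made-up values, and the lists are shortened. It only shows how the features passed on the fly and the stored features are joined on `entity_id` into one prediction instance.

```python
import random

import pandas as pd

# Shortened versions of the sampling lists defined earlier in the notebook
USER_IDS = ["C8685B0DFA2C4B4E6E6EA72894C30F6F", "A976A39B8E08829A5BC5CD3827C942A2"]
COUNTRY = ["United States", "India"]
OS = ["IOS", "ANDROID", "null"]
LANGUAGE = ["en-us", "en-gb"]


def generate_online_sample() -> dict:
    # random.choices returns a one-element list, so each value is already a column
    return {
        "entity_id": random.choices(USER_IDS),
        "country": random.choices(COUNTRY),
        "operating_system": random.choices(OS),
        "language": random.choices(LANGUAGE),
    }


def fake_feature_store_read(entity_ids: list) -> pd.DataFrame:
    """Hypothetical stand-in for user_entity_type.read(); the values are made up."""
    return pd.DataFrame(
        {
            "entity_id": entity_ids,
            "cnt_user_engagement": [3.0] * len(entity_ids),
            "cnt_post_score": [1.0] * len(entity_ids),
        }
    )


online_features = pd.DataFrame.from_dict(generate_online_sample())
stored_features = fake_feature_store_read(online_features["entity_id"].tolist())

# Join on entity_id: stored feature columns first, on-the-fly columns after
prediction_sample_df = pd.merge(
    stored_features.set_index("entity_id"),
    online_features.set_index("entity_id"),
    left_index=True,
    right_index=True,
).reset_index(drop=True)

prediction_instance = prediction_sample_df.values.tolist()
print(list(prediction_sample_df.columns))
print(prediction_instance)
```

The column order of the merged frame is what the model receives, so it has to match the order used at training time.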

# Setting the realtime scenario

In order to make real-time churn prediction, you need to

1. Collect the historical data about users' events and behaviors
2. Design your data model, build your features and ingest them into the Feature Store to serve both offline for training and online for serving.
3. Define churn and get the data to train a churn model
4. Train the model at scale
5. Deploy the model to an endpoint and return the prediction score in real time

You will cover those steps in detail below.

## Initiate clients

In [15]:
# Initiate the clients
bq_client = # TODO 1: Your code goes here(project=PROJECT_ID, location=LOCATION)
vertex_ai.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

## Identify users and build your features

In this section, we build the static features that we want to fetch from Vertex AI Feature Store. In particular, we will cover the following steps:

1. Identify users, process demographic features and process behavioral features within the last 24 hours using **BigQuery**

2. Set up the feature store

3. Register features using **Vertex AI Feature Store** and the SDK.

Below you have a picture that shows the process. 

<img src="./assets/feature_store_ingestion_2.png">



The original dataset contains raw event data that we cannot ingest into the feature store as is. We need to pre-process the raw data in order to get user features.

**Notice that we simulate those transformations at different points in time (today and tomorrow).**


### Label, Demographic and Behavioral Transformations

This section is based on the [Churn prediction for game developers using Google Analytics 4 (GA4) and BigQuery ML](https://cloud.google.com/blog/topics/developers-practitioners/churn-prediction-game-developers-using-google-analytics-4-ga4-and-bigquery-ml?utm_source=linkedin&utm_medium=unpaidsoc&utm_campaign=FY21-Q2-Google-Cloud-Tech-Blog&utm_content=google-analytics-4&utm_term=-) blog article by Minhaz Kazi and Polong Lin. 

You will adapt it to turn batch churn prediction (using features from the first 24 hours after a user's first engagement) into real-time churn prediction (using features from the 24 hours up to the user's last engagement).

In [16]:
features_sql_query = f"""
CREATE OR REPLACE TABLE
  `{PROJECT_ID}.{BQ_DATASET}.{FEATURES_TABLE}` AS
WITH

  # query to extract demographic data for each user ---------------------------------------------------------
  get_demographic_data AS (
  SELECT * EXCEPT (row_num)
  FROM (
    SELECT
      user_pseudo_id,
      geo.country as country,
      device.operating_system as operating_system,
      device.language as language,
      ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp DESC) AS row_num
    FROM `firebase-public-project.analytics_153293282.events_*`)
  WHERE row_num = 1),

  # query to extract behavioral data for each user ----------------------------------------------------------
  get_behavioral_data AS (
  SELECT
    event_timestamp,
    user_pseudo_id,
    SUM(IF(event_name = 'user_engagement', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_user_engagement,
    SUM(IF(event_name = 'level_start_quickplay', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_level_start_quickplay,
    SUM(IF(event_name = 'level_end_quickplay', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_level_end_quickplay,
    SUM(IF(event_name = 'level_complete_quickplay', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_level_complete_quickplay,
    SUM(IF(event_name = 'level_reset_quickplay', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_level_reset_quickplay,
    SUM(IF(event_name = 'post_score', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_post_score,
    SUM(IF(event_name = 'spend_virtual_currency', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_spend_virtual_currency,
    SUM(IF(event_name = 'ad_reward', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_ad_reward,
    SUM(IF(event_name = 'challenge_a_friend', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_challenge_a_friend,
    SUM(IF(event_name = 'completed_5_levels', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_completed_5_levels,
    SUM(IF(event_name = 'use_extra_steps', 1, 0)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC RANGE BETWEEN 86400000000 PRECEDING
      AND CURRENT ROW ) AS cnt_use_extra_steps,
  FROM (
    SELECT
      e.*
    FROM
      `firebase-public-project.analytics_153293282.events_*` AS e
    )
)

SELECT
    -- PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S', CONCAT('{TODAY}', ' ', STRING(TIME_TRUNC(CURRENT_TIME(), SECOND))), 'UTC') as timestamp,
    PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S', FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MICROS(beh.event_timestamp))) AS timestamp,
    dem.*,
    CAST(IFNULL(beh.cnt_user_engagement, 0) AS FLOAT64)  AS cnt_user_engagement,
    CAST(IFNULL(beh.cnt_level_start_quickplay, 0) AS FLOAT64) AS cnt_level_start_quickplay,
    CAST(IFNULL(beh.cnt_level_end_quickplay, 0) AS FLOAT64) AS cnt_level_end_quickplay,
    CAST(IFNULL(beh.cnt_level_complete_quickplay, 0) AS FLOAT64) AS cnt_level_complete_quickplay,
    CAST(IFNULL(beh.cnt_level_reset_quickplay, 0) AS FLOAT64) AS cnt_level_reset_quickplay,
    CAST(IFNULL(beh.cnt_post_score, 0) AS FLOAT64) AS cnt_post_score,
    CAST(IFNULL(beh.cnt_spend_virtual_currency, 0) AS FLOAT64) AS cnt_spend_virtual_currency,
    CAST(IFNULL(beh.cnt_ad_reward, 0) AS FLOAT64) AS cnt_ad_reward,
    CAST(IFNULL(beh.cnt_challenge_a_friend, 0) AS FLOAT64) AS cnt_challenge_a_friend,
    CAST(IFNULL(beh.cnt_completed_5_levels, 0) AS FLOAT64) AS cnt_completed_5_levels,
    CAST(IFNULL(beh.cnt_use_extra_steps, 0) AS FLOAT64) AS cnt_use_extra_steps,
FROM
  get_demographic_data dem
LEFT OUTER JOIN 
  get_behavioral_data beh
ON
  dem.user_pseudo_id = beh.user_pseudo_id
"""

In [17]:
run_bq_query(features_sql_query)
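
The behavioral features in the query rely on the window `RANGE BETWEEN 86400000000 PRECEDING AND CURRENT ROW`: for every event, it counts the matching events of the same user in the 24 hours (expressed in microseconds) up to and including that event. The plain-Python sketch below mimics that logic for a single user. The function name `rolling_24h_counts` and the sample events are made up for illustration; this is not how BigQuery evaluates the window internally.

```python
DAY_US = 86_400_000_000  # 24 hours in microseconds, as in the SQL window
HOUR_US = 3_600_000_000


def rolling_24h_counts(events, event_name):
    """For each event of one user, count `event_name` occurrences in the
    inclusive window [event_timestamp - 24h, event_timestamp]."""
    ordered = sorted(events)
    counts = []
    for current_ts, _ in ordered:
        names_in_window = [
            name for ts, name in ordered if current_ts - DAY_US <= ts <= current_ts
        ]
        counts.append((current_ts, names_in_window.count(event_name)))
    return counts


# (event_timestamp in microseconds, event_name) for a single hypothetical user
events = [
    (0, "user_engagement"),
    (1 * HOUR_US, "post_score"),
    (24 * HOUR_US, "user_engagement"),
    (25 * HOUR_US, "user_engagement"),
]

result = rolling_24h_counts(events, "user_engagement")
print(result)
```

At hour 24 the first event is still inside the window, so the count is 2. At hour 25 the first event has dropped out but the hour-24 event remains, so the count is still 2.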

## Create a Vertex AI Feature store and ingest your features

Now you have the wide table of features. It is time to ingest them into the feature store.

Before moving on, you may have a question: **Why do I need a feature store in this scenario at this point?**

One reason is to make those features accessible across teams by calculating them once and reusing them many times. To make that possible, you also need to monitor those features over time to guarantee freshness and, when needed, trigger a new feature engineering run to refresh them.

If that is not your case, the following sections give more reasons to consider a feature store. Just keep following along for now.

One of the most important aspects is the data model. As you can see in the picture below, Vertex AI Feature Store organizes resources hierarchically in the following order: `Featurestore -> EntityType -> Feature`. You must create these resources before you can ingest data into Vertex AI Feature Store.

<img src="./assets/feature_store_data_model_3.png">

In our case we are going to create **mobile_gaming** featurestore resource containing **user** entity type and all its associated **features** such as country or the number of times a user challenged a friend (cnt_challenge_a_friend).
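
The `Featurestore -> EntityType -> Feature` hierarchy also shows up in the resource names, where each level nests under its parent. The sketch below builds those names as plain strings. The helper functions and the project ID `my-project` are hypothetical and only illustrate the naming pattern; they do not call the Vertex AI API.

```python
def featurestore_path(project: str, location: str, featurestore_id: str) -> str:
    """Resource name of a featurestore (top level of the hierarchy)."""
    return f"projects/{project}/locations/{location}/featurestores/{featurestore_id}"


def entity_type_path(project: str, location: str, featurestore_id: str, entity_type_id: str) -> str:
    """Resource name of an entity type, nested under its featurestore."""
    parent = featurestore_path(project, location, featurestore_id)
    return f"{parent}/entityTypes/{entity_type_id}"


def feature_path(project: str, location: str, featurestore_id: str, entity_type_id: str, feature_id: str) -> str:
    """Resource name of a feature, nested under its entity type."""
    parent = entity_type_path(project, location, featurestore_id, entity_type_id)
    return f"{parent}/features/{feature_id}"


# "my-project" is a placeholder project ID used only for this example
feature_name = feature_path("my-project", "us-central1", "mobile_gaming", "user", "cnt_challenge_a_friend")
print(feature_name)
```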

### Create featurestore, ```mobile_gaming```

You need to create a `featurestore` resource to contain entity types, features, and feature values. In your case, you would call it `mobile_gaming`.

In [18]:
try:
    mobile_gaming_feature_store = Featurestore.create(
        featurestore_id=FEATURESTORE_ID,
        online_store_fixed_node_count=ONLINE_STORE_NODES_COUNT,
        labels={"team": "dataoffice", "app": "mobile_gaming"},
        sync=True,
    )
except RuntimeError as error:
    print(error)
else:
    FEATURESTORE_RESOURCE_NAME = mobile_gaming_feature_store.resource_name
    print(f"Feature store created: {FEATURESTORE_RESOURCE_NAME}")

Creating Featurestore
Create Featurestore backing LRO: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/operations/9138517845456453632
Featurestore created. Resource name: projects/292484118381/locations/us-central1/featurestores/mobile_gaming
To use this Featurestore in another session:
featurestore = aiplatform.Featurestore('projects/292484118381/locations/us-central1/featurestores/mobile_gaming')
Feature store created: projects/292484118381/locations/us-central1/featurestores/mobile_gaming


### Create the ```User``` entity type and its features

You define your own entity types, which represent the levels at which your features are defined. In your case, there is a single `user` entity type.

In [19]:
try:
    user_entity_type = mobile_gaming_feature_store.create_entity_type(
        entity_type_id=ENTITY_ID, description="User Entity", sync=True
    )
except RuntimeError as error:
    print(error)
else:
    USER_ENTITY_RESOURCE_NAME = user_entity_type.resource_name
    print("Entity type name is", USER_ENTITY_RESOURCE_NAME)

Creating EntityType
Create EntityType backing LRO: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/operations/7156934009413435392
EntityType created. Resource name: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user
To use this EntityType in another session:
entity_type = aiplatform.EntityType('projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user')
Entity type name is projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user


### Set Feature Monitoring

Notice that Vertex AI Feature Store has a [feature monitoring capability](https://cloud.google.com/vertex-ai/docs/featurestore/monitoring). It is in preview, so you need to use the v1beta1 Python client, which is a lower-level API than the one we've used so far in this notebook.

The easiest way to set this up for now is to use the [console UI](https://console.cloud.google.com/vertex-ai/features). For completeness, below is an example of how to do this with the v1beta1 SDK.

In [20]:
# Import required libraries
from google.cloud.aiplatform_v1beta1 import \
    FeaturestoreServiceClient as v1beta1_FeaturestoreServiceClient
from google.cloud.aiplatform_v1beta1.types import \
    entity_type as v1beta1_entity_type_pb2
from google.cloud.aiplatform_v1beta1.types import \
    featurestore_monitoring as v1beta1_featurestore_monitoring_pb2
from google.cloud.aiplatform_v1beta1.types import \
    featurestore_service as v1beta1_featurestore_service_pb2
from google.protobuf.duration_pb2 import Duration

v1beta1_admin_client = v1beta1_FeaturestoreServiceClient(
    client_options={"api_endpoint": API_ENDPOINT}
)

In [21]:
v1beta1_admin_client.update_entity_type(
    v1beta1_featurestore_service_pb2.UpdateEntityTypeRequest(
        entity_type=v1beta1_entity_type_pb2.EntityType(
            name=v1beta1_admin_client.entity_type_path(
                PROJECT_ID, REGION, FEATURESTORE_ID, ENTITY_ID
            ),
            monitoring_config=v1beta1_featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                snapshot_analysis=v1beta1_featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                    monitoring_interval=Duration(seconds=86400),  # 1 day
                ),
            ),
        ),
    )
)

name: "projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user"
description: "User Entity"
create_time {
  seconds: 1653379832
  nanos: 358513000
}
update_time {
  seconds: 1653379925
  nanos: 191733000
}
etag: "AMEw9yPAckg9Y6WKA_fLTek2pFw0IzM0X5mrjNNSfe3hJ-2cPMz3T10ngKK-FLW2QF4="
monitoring_config {
  snapshot_analysis {
    monitoring_interval {
      seconds: 86400
    }
    monitoring_interval_days: 1
    staleness_days: 21
  }
  numerical_threshold_config {
    value: 0.3
  }
  categorical_threshold_config {
    value: 0.3
  }
}

### Create features

In order to ingest features, you need to provide feature configuration and create them as featurestore resources.


#### Create Feature configuration

For simplicity, the configuration is written declaratively. Of course, you could write a helper function to build it from the BigQuery schema.
Also notice that we want to pass some features on the fly at prediction time. Country, operating system and language are good candidates for that.

In [22]:
feature_configs = {
    "country": {
        "value_type": "STRING",
        "description": "The country of customer",
        "labels": {"status": "passed"},
    },
    "operating_system": {
        "value_type": "STRING",
        "description": "The operating system of device",
        "labels": {"status": "passed"},
    },
    "language": {
        "value_type": "STRING",
        "description": "The language of device",
        "labels": {"status": "passed"},
    },
    "cnt_user_engagement": {
        "value_type": "DOUBLE",
        "description": "A variable of user engagement level",
        "labels": {"status": "passed"},
    },
    "cnt_level_start_quickplay": {
        "value_type": "DOUBLE",
        "description": "A variable of user engagement with start level",
        "labels": {"status": "passed"},
    },
    "cnt_level_end_quickplay": {
        "value_type": "DOUBLE",
        "description": "A variable of user engagement with end level",
        "labels": {"status": "passed"},
    },
    "cnt_level_complete_quickplay": {
        "value_type": "DOUBLE",
        "description": "A variable of user engagement with complete status",
        "labels": {"status": "passed"},
    },
    "cnt_level_reset_quickplay": {
        "value_type": "DOUBLE",
        "description": "A variable of user engagement with reset status",
        "labels": {"status": "passed"},
    },
    "cnt_post_score": {
        "value_type": "DOUBLE",
        "description": "A variable of user score",
        "labels": {"status": "passed"},
    },
    "cnt_spend_virtual_currency": {
        "value_type": "DOUBLE",
        "description": "A variable of user virtual amount",
        "labels": {"status": "passed"},
    },
    "cnt_ad_reward": {
        "value_type": "DOUBLE",
        "description": "A variable of user reward",
        "labels": {"status": "passed"},
    },
    "cnt_challenge_a_friend": {
        "value_type": "DOUBLE",
        "description": "A variable of user challenges with friends",
        "labels": {"status": "passed"},
    },
    "cnt_completed_5_levels": {
        "value_type": "DOUBLE",
        "description": "A variable of user level 5 completed",
        "labels": {"status": "passed"},
    },
    "cnt_use_extra_steps": {
        "value_type": "DOUBLE",
        "description": "A variable of user extra steps",
        "labels": {"status": "passed"},
    },
}
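
A helper of the kind mentioned above could look roughly like the following sketch. It assumes the BigQuery schema is already available as `(column_name, column_type)` pairs. The mapping table, the function name `build_feature_configs`, and the generated descriptions are illustrative assumptions and not part of the original notebook.

```python
# Hypothetical mapping from BigQuery column types to Feature Store value types.
BQ_TO_FEATURE_VALUE_TYPE = {
    "STRING": "STRING",
    "INT64": "INT64",
    "INTEGER": "INT64",
    "FLOAT64": "DOUBLE",
    "FLOAT": "DOUBLE",
    "BOOL": "BOOL",
    "BOOLEAN": "BOOL",
    "BYTES": "BYTES",
}


def build_feature_configs(schema, skip_columns=(), labels=None):
    """Build a feature_configs-style dict from (column_name, bq_type) pairs."""
    labels = labels or {"status": "passed"}
    configs = {}
    for name, bq_type in schema:
        if name in skip_columns:
            continue  # e.g. the entity ID or timestamp column
        value_type = BQ_TO_FEATURE_VALUE_TYPE.get(bq_type.upper())
        if value_type is None:
            raise ValueError(f"Unsupported BigQuery type {bq_type!r} for column {name!r}")
        configs[name] = {
            "value_type": value_type,
            "description": f"Feature generated from column {name}",
            "labels": dict(labels),
        }
    return configs


# Made-up schema used only to exercise the helper.
example_schema = [
    ("user_pseudo_id", "STRING"),  # entity ID column, not a feature
    ("country", "STRING"),
    ("cnt_user_engagement", "FLOAT64"),
]
example_configs = build_feature_configs(example_schema, skip_columns={"user_pseudo_id"})
print(example_configs)
```

Hand-written descriptions, as in the declarative configuration, are usually more informative than generated ones, so a helper like this is mostly useful for wide tables.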

#### Create features using `batch_create_features` method

Once you have the feature configuration, you can create the feature resources using the `batch_create_features` method.

In [23]:
try:
    user_entity_type.batch_create_features(feature_configs=feature_configs, sync=True)
except RuntimeError as error:
    print(error)
else:
    for feature in user_entity_type.list_features():
        print("")
        print(f"The resource name of {feature.name} feature is", feature.resource_name)

Batch creating features EntityType entityType: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user
Batch create Features EntityType entityType backing LRO: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/operations/4981695389393485824
EntityType entityType Batch created features. Resource name: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user

The resource name of cnt_challenge_a_friend feature is projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/features/cnt_challenge_a_friend

The resource name of cnt_user_engagement feature is projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/features/cnt_user_engagement

The resource name of cnt_level_reset_quickplay feature is projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/features/cnt_level_reset_quickplay

The resource name of c

### Search features

Vertex AI Feature Store supports search capabilities. Below is a simple example that shows how to filter features by name. 

In [24]:
feature_query = "feature_id:cnt_user_engagement"
searched_features = Feature.search(query=feature_query)
searched_features

[<google.cloud.aiplatform.featurestore.feature.Feature object at 0x7ffa46f2ab50> 
 resource name: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/features/cnt_user_engagement]

## Ingest features 

At this point, you have created all the resources associated with the feature store. You just need to import feature values before you can use them for online or offline serving.

In [25]:
FEATURES_IDS = [feature.name for feature in user_entity_type.list_features()]

In [26]:
try:
    user_entity_type.ingest_from_bq(
        feature_ids=FEATURES_IDS,
        feature_time=FEATURE_TIME,
        bq_source_uri=BQ_SOURCE_URI,
        entity_id_field=ENTITY_ID_FIELD,
        disable_online_serving=False,
        worker_count=10,
        sync=True,
    )
except RuntimeError as error:
    print(error)

Importing EntityType feature values: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user
Import EntityType feature values backing LRO: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user/operations/2896528761920946176
EntityType feature values imported. Resource name: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/entityTypes/user


# Train and deploy a real-time churn ML model using Vertex AI Training and Endpoints

Now that you have your features, you are almost ready to train your churn model.

Below is a high-level picture of the process.

<img src="./assets/train_model_4.png">

Let's dive into each step of this process.


## Fetch training data with point-in-time query using BigQuery and Vertex AI Feature store  

As mentioned above, in real-time churn prediction it is important to define the label you want to predict with your model. 

Let's assume that you decide to predict the churn probability over the last 24 hours. So now you have your label. The next step is to define your training sample. But let's think about that for a second. 

In a real-time churn system, a high volume of transactions is collected constantly over time, and you can use them to calculate those features. This means you always get fresh data to reconstruct features. And depending on when you calculate one feature or another, you can end up with a set of features that are not aligned in time. 

When labels become available, it is very difficult to say which set of features contains the most up-to-date historical information associated with the label you want to predict. When you cannot guarantee that, model performance suffers, because the features you train on are not representative of the data and labels the model sees when it goes live. So you need a way to get the most recent features you calculated before the label became available, in order to avoid this informational skew.

**With Vertex AI Feature Store, you can fetch feature values corresponding to a particular timestamp thanks to its point-in-time lookup capability.** In our case, that is the timestamp associated with the label you want to predict with your model. In this way, you avoid data leakage and you get the most up-to-date features to train your model. 
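
To make the idea concrete, here is a minimal pure-Python sketch of a point-in-time lookup. It only illustrates the concept and is not how the managed service is implemented. The `history` structure, the function name `point_in_time_lookup`, and the sample values are assumptions made for the example.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(history, entity_id, as_of):
    """Return the latest feature value recorded at or before `as_of`, or None."""
    records = sorted(history.get(entity_id, []), key=lambda record: record[0])
    timestamps = [timestamp for timestamp, _ in records]
    # Index of the first record strictly after `as_of`.
    idx = bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # no value was known yet at that time
    return records[idx - 1][1]


# Made-up history of one feature for one user: (feature timestamp, value).
history = {
    "user_a": [
        (datetime(2018, 10, 1, 8), 3.0),
        (datetime(2018, 10, 2, 8), 7.0),
        (datetime(2018, 10, 3, 8), 12.0),
    ]
}

label_time = datetime(2018, 10, 2, 12)
value = point_in_time_lookup(history, "user_a", label_time)
print(value)
```

The value written on October 3 is ignored because it did not exist yet at `label_time`. Using it for training would leak information from the future into the training sample.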

Let's see how to do that. 


### Define query for reading instances at a specific point in time

First, you need to define the set of read instances, each at a specific point in time, that you want to use to generate your training sample.

In [27]:
read_instances_query = f"""
CREATE OR REPLACE TABLE
  `{PROJECT_ID}.{BQ_DATASET}.{READ_INSTANCES_TABLE}` AS
WITH

  # get training threshold ----------------------------------------------------------------------------------
  get_training_threshold AS (
  SELECT
    (MAX(event_timestamp) - 86400000000) AS training_thrs
  FROM
    `firebase-public-project.analytics_153293282.events_*`
  WHERE
    event_name="user_engagement"
    AND
    PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S', FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MICROS(event_timestamp))) < '{TODAY}'),

  # query to create label -----------------------------------------------------------------------------------
  get_label AS (
  SELECT
    user_pseudo_id,
    user_last_engagement,
    # churned = 1 if the last engagement of the user is older than the training threshold (24 hours before the latest event), else 0
  IF
    (user_last_engagement < (
      SELECT
        training_thrs
      FROM
        get_training_threshold),
      1,
      0 ) AS churned
  FROM (
    SELECT
      user_pseudo_id,
      MAX(event_timestamp) AS user_last_engagement
    FROM
      `firebase-public-project.analytics_153293282.events_*`
    WHERE
      event_name="user_engagement"
    AND
    PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S', FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MICROS(event_timestamp))) < '{TODAY}'
    GROUP BY
      user_pseudo_id )
  GROUP BY
    1,
    2),

  # query to create class weights --------------------------------------------------------------------------------
  get_class_weights AS (
  SELECT
    CAST(COUNT(*) / (2*(COUNT(*) - SUM(churned))) AS STRING) AS class_weight_zero,
    CAST(COUNT(*) / (2*SUM(churned)) AS STRING) AS class_weight_one,
  FROM
    get_label )

SELECT
  user_pseudo_id as user,
  PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S', CONCAT('{TODAY}', ' ', STRING(TIME_TRUNC(CURRENT_TIME(), SECOND))), 'UTC') as timestamp,
  churned AS churned,
  CASE
      WHEN churned = 0 THEN ( SELECT class_weight_zero FROM get_class_weights)
      ELSE ( SELECT class_weight_one
       FROM get_class_weights)
    END AS class_weights
FROM
  get_label 
"""

### Create the BigQuery instances tables

You store those instances in a BigQuery table.

In [28]:
run_bq_query(read_instances_query)
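
The labeling and class-weight logic of the read-instances query can be mirrored in plain Python, which makes it easier to check on a handful of users. This is a sketch under simplifying assumptions: timestamps are microseconds as in the GA4 export, and the function names and sample data are made up for illustration.

```python
MICROS_PER_DAY = 86_400_000_000  # same constant as in the query: 24 hours in microseconds


def label_churn(last_engagement_by_user):
    """Label a user 1 when their last engagement is older than the threshold."""
    threshold = max(last_engagement_by_user.values()) - MICROS_PER_DAY
    return {
        user: int(last_engagement < threshold)
        for user, last_engagement in last_engagement_by_user.items()
    }


def class_weights(labels):
    """Balanced class weights: n / (2 * n_class), as in get_class_weights."""
    total = len(labels)
    positives = sum(labels)
    negatives = total - positives
    if positives == 0 or negatives == 0:
        raise ValueError("Both classes must be present to compute class weights")
    return {0: total / (2 * negatives), 1: total / (2 * positives)}


# Made-up last-engagement timestamps, expressed in days for readability.
last_engagement = {
    "u1": 10 * MICROS_PER_DAY,
    "u2": 10 * MICROS_PER_DAY - 3_600_000_000,  # one hour earlier than u1
    "u3": 8 * MICROS_PER_DAY,
    "u4": 9 * MICROS_PER_DAY + MICROS_PER_DAY // 2,
}

labels = label_churn(last_engagement)
weights = class_weights(list(labels.values()))
print(labels)
print(weights)
```

Only `u3` falls before the threshold, so the minority class receives the larger weight. The query additionally casts the weights to `STRING` before storing them.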

### Serve features for batch training

Then you use the `batch_serve_to_gcs` method to generate your training sample and store it as a CSV file in a target Cloud Storage bucket.


In [29]:
# Serve features for batch training
# TODO 2: Your code goes here(
    gcs_destination_output_uri_prefix=GCS_DESTINATION_OUTPUT_URI,
    gcs_destination_type="csv",
    serving_feature_ids=SERVING_FEATURE_IDS,
    read_instances_uri=READ_INSTANCES_URI,
    pass_through_fields=["churned", "class_weights"],
)

Serving Featurestore feature values: projects/292484118381/locations/us-central1/featurestores/mobile_gaming
Serve Featurestore feature values backing LRO: projects/292484118381/locations/us-central1/featurestores/mobile_gaming/operations/3274831130620067840
Featurestore feature values served. Resource name: projects/292484118381/locations/us-central1/featurestores/mobile_gaming


<google.cloud.aiplatform.featurestore.featurestore.Featurestore object at 0x7ffa46ecee10> 
resource name: projects/292484118381/locations/us-central1/featurestores/mobile_gaming

## Train a custom model on Vertex AI with Training Pipelines

Now that you have produced the training sample, you use the Vertex AI SDK to train a new version of the model using Vertex AI Training.


### Create training package and training sample

In [30]:
!rm -Rf train_package #if train_package already exist

In [32]:
!mkdir -m 777 -p trainer data/ingest data/raw model config
!gcloud storage cp --recursive $GCS_DESTINATION_OUTPUT_URI/*.csv data/ingest
!head -n 1000 data/ingest/*.csv > data/raw/sample.csv

Copying gs://qwiklabs-gcp-01-17ee7907a406-aip-20220524080343/data/features/train_features_20181003/000000000000.csv...
/ [1/1 files][  1.6 MiB/  1.6 MiB] 100% Done                                    
Operation completed over 1 objects/1.6 MiB.                                      


### Create training script

You create the training script to train an XGBoost model.

In [33]:
!touch trainer/__init__.py

In [34]:
%%writefile trainer/task.py
import os
from pathlib import Path
import argparse
import yaml

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
import xgboost as xgb
import joblib
import warnings
warnings.filterwarnings("ignore")

def get_args():
    """
    Get arguments from command line.
    Returns:
        args: parsed arguments
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--data_path',
        required=False,
        default=os.getenv('AIP_TRAINING_DATA_URI'),
        type=str,
        help='path to read data')
    parser.add_argument(
        '--learning_rate',
        required=False,
        default=0.01,
        type=float,
        help='learning rate of the XGBoost model')
    parser.add_argument(
        '--model_dir',
        required=False,
        default=os.getenv('AIP_MODEL_DIR'),
        type=str,
        help='dir to store saved model')
    parser.add_argument(
        '--config_path',
        required=False,
        default='../config.yaml',
        type=str,
        help='path to read config file')
    args = parser.parse_args()
    return args


def ingest_data(data_path, data_model_params):
    """
    Ingest data
    Args:
        data_path: path to read data
        data_model_params: data model parameters
    Returns:
        df: dataframe
    """
    # read training data
    df = pd.read_csv(data_path, sep=',',
                     dtype={col: 'string' for col in data_model_params['categorical_features']})
    return df


def preprocess_data(df, data_model_params):
    """
    Preprocess data
    Args:
        df: dataframe
        data_model_params: data model parameters
    Returns:
        df: dataframe
    """

    # convert nan values because pd.NA is not supported by SimpleImputer
    # bug in sklearn 0.23.1 version: https://github.com/scikit-learn/scikit-learn/pull/17526
    # decided to skip NAN values for now
    df.replace({pd.NA: np.nan}, inplace=True)
    df.dropna(inplace=True)

    # get features and labels
    x = df[data_model_params['numerical_features'] + data_model_params['categorical_features'] + [
        data_model_params['weight_feature']]]
    y = df[data_model_params['target']]

    # train-test split
    x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                        test_size=data_model_params['train_test_split']['test_size'],
                                                        random_state=data_model_params['train_test_split'][
                                                            'random_state'])
    return x_train, x_test, y_train, y_test


def build_pipeline(learning_rate, model_params):
    """
    Build pipeline
    Args:
        learning_rate: learning rate
        model_params: model parameters
    Returns:
        pipeline: pipeline
    """
    # build pipeline
    pipeline = Pipeline([
        # ('imputer', SimpleImputer(strategy='most_frequent')),
        ('encoder', OneHotEncoder(handle_unknown='ignore')),
        ('model', xgb.XGBClassifier(learning_rate=learning_rate,
                                    use_label_encoder=False, #deprecated and breaks Vertex AI predictions
                                    **model_params))
    ])
    return pipeline


def main():
    print('Starting training...')
    args = get_args()
    data_path = args.data_path
    learning_rate = args.learning_rate
    model_dir = args.model_dir
    config_path = args.config_path

    # read config file
    with open(config_path, 'r') as f:
        config = yaml.load(f, Loader=yaml.FullLoader)
    data_model_params = config['data_model_params']
    model_params = config['model_params']

    # ingest data
    print('Reading data...')
    data_df = ingest_data(data_path, data_model_params)

    # preprocess data
    print('Preprocessing data...')
    x_train, x_test, y_train, y_test = preprocess_data(data_df, data_model_params)
    sample_weight = x_train.pop(data_model_params['weight_feature'])
    sample_weight_eval_set = x_test.pop(data_model_params['weight_feature'])

    # train xgb model
    print('Training model...')
    xgb_pipeline = build_pipeline(learning_rate, model_params)
    # need to use fit_transform to get the encoded eval data
    x_train_transformed = xgb_pipeline[:-1].fit_transform(x_train)
    x_test_transformed = xgb_pipeline[:-1].transform(x_test)
    xgb_pipeline[-1].fit(x_train_transformed, y_train,
                         sample_weight=sample_weight,
                         eval_set=[(x_test_transformed, y_test)],
                         sample_weight_eval_set=[sample_weight_eval_set],
                         eval_metric='error',
                         early_stopping_rounds=50,
                         verbose=True)
    # save model
    print('Saving model...')
    model_path = Path(model_dir)
    model_path.mkdir(parents=True, exist_ok=True)
    joblib.dump(xgb_pipeline, f'{model_dir}/model.joblib')


if __name__ == "__main__":
    main()

Writing trainer/task.py


### Create requirements.txt

You write the requirements file used to build the training container.

In [35]:
%%writefile requirements.txt
pip==22.0.4
PyYAML==5.3.1
joblib==0.15.1
numpy==1.18.5
pandas==1.0.4
scipy==1.4.1
scikit-learn==0.23.1
xgboost==1.1.1

Writing requirements.txt


### Create training configuration

You create a training configuration with data and model params. 

In [36]:
%%writefile config/config.yaml
data_model_params:
  target: churned
  categorical_features:
    - country
    - operating_system
    - language
  numerical_features:
    - cnt_user_engagement
    - cnt_level_start_quickplay
    - cnt_level_end_quickplay
    - cnt_level_complete_quickplay
    - cnt_level_reset_quickplay
    - cnt_post_score
    - cnt_spend_virtual_currency
    - cnt_ad_reward
    - cnt_challenge_a_friend
    - cnt_completed_5_levels
    - cnt_use_extra_steps
  weight_feature: class_weights
  train_test_split:
    test_size: 0.2
    random_state: 8
model_params:
  booster: gbtree
  objective: binary:logistic
  max_depth: 80
  n_estimators: 100
  random_state: 8

Writing config/config.yaml
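
Before launching a job, it can help to confirm that every column named in the configuration is present in the training sample. The sketch below assumes the configuration has already been parsed into a dict. The function name `missing_columns`, the trimmed-down configuration, and the CSV content are made up for illustration, and the header of the real exported file may differ.

```python
import csv
import io


def missing_columns(csv_text, data_model_params):
    """Return the configured columns that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = (
        data_model_params["numerical_features"]
        + data_model_params["categorical_features"]
        + [data_model_params["weight_feature"], data_model_params["target"]]
    )
    return [column for column in expected if column not in header]


# Trimmed-down stand-in for config/config.yaml, already parsed into a dict.
sample_params = {
    "target": "churned",
    "categorical_features": ["country", "language"],
    "numerical_features": ["cnt_user_engagement"],
    "weight_feature": "class_weights",
}

# Made-up CSV content; the real header written by the export may differ.
sample_csv = (
    "user,country,cnt_user_engagement,churned,class_weights\n"
    "abc,Japan,3.0,0,0.7\n"
)

print(missing_columns(sample_csv, sample_params))
```

Here the check reports `language`, which the sample header lacks. An empty list means the training script can select all the columns it expects.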


### Test the model locally with `local-run`

You use the `gcloud ai custom-jobs local-run` command to test the script locally.

In [37]:
test_job_script = f"""
gcloud ai custom-jobs local-run \
--executor-image-uri={BASE_CPU_IMAGE} \
--python-module=trainer.task \
--extra-dirs=config,data,model \
-- \
--data_path data/raw/sample.csv \
--model_dir model \
--config_path config/config.yaml
"""

with open("local_train_job_run.sh", "w+") as s:
    s.write(test_job_script)
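
Note that inside a regular (non-raw) Python string, a backslash followed by a newline is dropped, so the script above ends up with the whole command on a single line. An alternative sketch builds the command from a list of arguments with `shlex.join`, which also quotes any value that needs it. The function name `build_local_run_command` is hypothetical, and the image URI is the one shown in the cell output of this notebook.

```python
import shlex


def build_local_run_command(executor_image_uri, data_path, model_dir, config_path):
    """Assemble the local-run command as one shell-safe line."""
    args = [
        "gcloud", "ai", "custom-jobs", "local-run",
        f"--executor-image-uri={executor_image_uri}",
        "--python-module=trainer.task",
        "--extra-dirs=config,data,model",
        "--",  # everything after this is passed to trainer.task
        "--data_path", data_path,
        "--model_dir", model_dir,
        "--config_path", config_path,
    ]
    return shlex.join(args)


command = build_local_run_command(
    executor_image_uri="us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest",
    data_path="data/raw/sample.csv",
    model_dir="model",
    config_path="config/config.yaml",
)
print(command)
```

The resulting string can be written to the shell script in the same way as `test_job_script`.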

In [38]:
# Launch the job locally
!chmod +x ./local_train_job_run.sh && ./local_train_job_run.sh

Package is set to /home/jupyter/vertex-ai-samples/notebooks/community/feature_store/mobile_gaming.
  self.stdin = io.open(p2cwrite, 'wb', bufsize)
  self.stdout = io.open(c2pread, 'rb', bufsize)
Sending build context to Docker daemon  2.747MB
Step 1/13 : FROM us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest
latest: Pulling from vertex-ai/training/scikit-learn-cpu.0-23
2c11b7cecaa5: Pulling fs layer
04637fa56252: Pulling fs layer
d6e6af23a0f3: Pulling fs layer
b4a424de92ad: Pulling fs layer
59fdb7d45b6c: Pulling fs layer
7759500808dd: Pulling fs layer
89fa8d1cd3c8: Pulling fs layer
41c3544d11de: Pulling fs layer
32634d002d01: Pulling fs layer
3d8caa4d24d5: Pulling fs layer
0b38311ee46d: Pulling fs layer
36e73f53893e: Pulling fs layer
d27528dc2d9f: Pulling fs layer
2ca38bff7b4b: Pulling fs layer
29731ba486cc: Pulling fs layer
266a99828791: Pulling fs layer
ab02fbc06098: Pulling fs layer
090a178c469c: Pulling fs layer
34c760d06da9: Pulling fs layer
07e648356d7b: Pulling f

### Create and Launch the Custom training pipeline to train the model with `autopackaging`.

You use the `autopackaging` feature of `gcloud ai custom-jobs create` to:

1. Build a custom Docker training image.
2. Push the image to Container Registry.
3. Start a Vertex AI CustomJob.


In [39]:
!mkdir -m 777 -p {MODEL_PACKAGE_PATH} && mv -t {MODEL_PACKAGE_PATH} trainer requirements.txt config

In [40]:
train_job_script = f"""
gcloud ai custom-jobs create \
--region={REGION} \
--display-name={TRAIN_JOB_NAME} \
--worker-pool-spec=machine-type={TRAINING_MACHINE_TYPE},replica-count={TRAINING_REPLICA_COUNT},executor-image-uri={BASE_CPU_IMAGE},local-package-path={MODEL_PACKAGE_PATH},python-module=trainer.task,extra-dirs=config \
--args=--data_path={DATA_PATH},--model_dir={MODEL_DIR},--config_path=config/config.yaml \
--verbosity='info'
"""

with open("train_job_run.sh", "w+") as s:
    s.write(train_job_script)

In [41]:
# Launch the Custom training Job using chmod command
# TODO 3: Your code goes here

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
INFO: Running command: docker build --no-cache -t gcr.io/qwiklabs-gcp-01-17ee7907a406/cloudai-autogenerated/xgb_classifier_training_20181003:20220524.08.53.59.472712 --rm -f- train_package
  self.stdin = io.open(p2cwrite, 'wb', bufsize)
  self.stdout = io.open(c2pread, 'rb', bufsize)
Sending build context to Docker daemon  12.34kB
Step 1/11 : FROM us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest
 ---> 6da3b9be283f
Step 2/11 : RUN mkdir -m 777 -p /usr/app /home
 ---> Running in f6a57abe6cf6
Removing intermediate container f6a57abe6cf6
 ---> abdada5981b8
Step 3/11 : WORKDIR /usr/app
 ---> Running in fa0b28452deb
Removing intermediate container fa0b28452deb
 ---> fa0d996e5602
Step 4/11 : ENV HOME=/home
 ---> Running in e4eb649c6687
Removing intermediate container e4eb649c6687
 ---> a0639c9e6da3
Step 5/11 : ENV PYTHONDONTWRITEBYTECODE=1
 ---> Running in 67039216c453
Removing intermediate container 67039216c453

### Check the status of training job and the result. 

You can use the following commands to monitor the status of your job and check for the artefact in the bucket once the training has run successfully. 

In [44]:
TRAIN_JOB_RESOURCE_NAME = "projects/292484118381/locations/us-central1/customJobs/7374149059830874112"  # Replace this with your job path

In [47]:
# Check the status of training job
!gcloud ai custom-jobs describe $TRAIN_JOB_RESOURCE_NAME

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
createTime: '2022-05-24T08:56:01.872482Z'
displayName: xgb_classifier_training_20181003
endTime: '2022-05-24T08:59:16Z'
jobSpec:
  workerPoolSpecs:
  - containerSpec:
      args:
      - --data_path=/gcs/qwiklabs-gcp-01-17ee7907a406-aip-20220524080343/data/features/train_features_20181003/000000000000.csv
      - --model_dir=/gcs/qwiklabs-gcp-01-17ee7907a406-aip-20220524080343/model/20181003
      - --config_path=config/config.yaml
      imageUri: gcr.io/qwiklabs-gcp-01-17ee7907a406/cloudai-autogenerated/xgb_classifier_training_20181003:20220524.08.53.59.472712
    diskSpec:
      bootDiskSizeGb: 100
      bootDiskType: pd-ssd
    machineSpec:
      machineType: n1-standard-4
    replicaCount: '1'
name: projects/292484118381/locations/us-central1/customJobs/7374149059830874112
startTime: '2022-05-24T08:59:15Z'
state: JOB_STATE_SUCCEEDED
updateTime: '2022-05-24T08:59:21.153159Z'
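
If you want to check the job state programmatically, one option is to capture the text printed by the `describe` command and read its `state` field. The sketch below assumes the output contains a top-level `state:` line, as in the output above. The helper names are hypothetical, and the set of terminal states is a non-exhaustive assumption.

```python
# Assumed, non-exhaustive set of states after which the job no longer changes.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
}


def parse_job_state(describe_output):
    """Return the value of the top-level `state:` line, or None if absent."""
    for line in describe_output.splitlines():
        if line.startswith("state:"):
            return line.split(":", 1)[1].strip()
    return None


def is_finished(describe_output):
    """Tell whether the job has reached one of the assumed terminal states."""
    return parse_job_state(describe_output) in TERMINAL_STATES


# Excerpt of the describe output shown above.
sample_output = """\
displayName: xgb_classifier_training_20181003
startTime: '2022-05-24T08:59:15Z'
state: JOB_STATE_SUCCEEDED
updateTime: '2022-05-24T08:59:21.153159Z'
"""

print(parse_job_state(sample_output), is_finished(sample_output))
```

A polling loop could call `is_finished` on fresh output every few seconds until it returns `True`.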


In [48]:
!gcloud storage ls $DESTINATION_URI

gs://qwiklabs-gcp-01-17ee7907a406-aip-20220524080343/model/20181003/
gs://qwiklabs-gcp-01-17ee7907a406-aip-20220524080343/model/20181003/model.joblib


### Upload and Deploy Model on Vertex AI Endpoint

You use a custom function to upload your model to the Vertex AI Model Registry.

In [49]:
# Upload the model
xgb_model = upload_model(
    display_name=MODEL_NAME,
    serving_container_image_uri=SERVING_CONTAINER_IMAGE_URI,
    artifact_uri=DESTINATION_URI,
)

Creating Model
Create Model backing LRO: projects/292484118381/locations/us-central1/models/5521048105295806464/operations/8075668333397016576
Model created. Resource name: projects/292484118381/locations/us-central1/models/5521048105295806464
To use this Model in another session:
model = aiplatform.Model('projects/292484118381/locations/us-central1/models/5521048105295806464')
churn_xgb_classifier_20181003
projects/292484118381/locations/us-central1/models/5521048105295806464


### Deploy Model to the same Endpoint with Traffic Splitting

Now that you have registered the model in the model registry, you can deploy it to an endpoint. You first create the endpoint and then deploy your model to it.

In [50]:
# Create endpoint
endpoint = create_endpoint(display_name=ENDPOINT_NAME)

Creating Endpoint
Create Endpoint backing LRO: projects/292484118381/locations/us-central1/endpoints/241250443320098816/operations/2351593207009116160
Endpoint created. Resource name: projects/292484118381/locations/us-central1/endpoints/241250443320098816
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/292484118381/locations/us-central1/endpoints/241250443320098816')
mobile_gaming_churn
projects/292484118381/locations/us-central1/endpoints/241250443320098816


In [51]:
# Deploy the model
deployed_model = # TODO 4: Your code goes here(
    model=xgb_model,
    machine_type=SERVING_MACHINE_TYPE,
    endpoint=endpoint,
    deployed_model_display_name=DEPLOYED_MODEL_NAME,
    min_replica_count=1,
    max_replica_count=1,
    sync=False,
)

Deploying model to Endpoint : projects/292484118381/locations/us-central1/endpoints/241250443320098816
Deploy Endpoint model backing LRO: projects/292484118381/locations/us-central1/endpoints/241250443320098816/operations/7884265349233770496
Endpoint model deployed. Resource name: projects/292484118381/locations/us-central1/endpoints/241250443320098816
mobile_gaming_churn
projects/292484118381/locations/us-central1/endpoints/241250443320098816
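
The deployment above sends all traffic to a single model. When several models share an endpoint, a traffic split maps each deployed model ID to a percentage of the traffic, where the key `"0"` stands for the model being deployed and the percentages must add up to 100. The following sketch validates such a mapping before you pass it to a deployment call. The function name `validate_traffic_split` and the deployed model ID are made up for illustration.

```python
def validate_traffic_split(traffic_split):
    """Check that a traffic split is empty or that its percentages sum to 100."""
    for model_id, percentage in traffic_split.items():
        if not isinstance(percentage, int) or not 0 <= percentage <= 100:
            raise ValueError(f"Invalid percentage for {model_id!r}: {percentage!r}")
    total = sum(traffic_split.values())
    if traffic_split and total != 100:
        raise ValueError(f"Traffic percentages must sum to 100, got {total}")
    return traffic_split


# "0" refers to the model being deployed; "1234567890" is a hypothetical
# ID of a model that is already deployed on the endpoint.
canary_split = validate_traffic_split({"0": 20, "1234567890": 80})
print(canary_split)
```

A split like this sends 20% of requests to the new model and keeps 80% on the existing one, which is a common way to roll out a new model version gradually.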


# Serve ML features at scale with low latency

At this point, you are ready **to deploy your simple model, which requires fetching preprocessed attributes as input features in real time**. 

Below you can see how it works

<img src="./assets/online_serving_5.png" width="600">

But think about those features for a second. 

The behavioral features used to train your model cannot be computed at the moment you serve the model online. 

How could you compute, on the fly, the number of times a user challenged a friend within the last 24 hours?

You simply can't. This feature needs to be computed on the server side and served with low latency. And because BigQuery is not optimized for those read operations, you need a different service that supports singleton lookups, where the result is a single row with many columns.

Also, even if that were not the case, when you deploy a model that requires preprocessing your data, you need to be sure to reproduce the same preprocessing steps you applied when you trained it. If you cannot do that, a skew between training and serving data arises, and it hurts your model performance (and in the worst case breaks your serving system). 

You need a way to mitigate that, so that you don't have to implement those preprocessing steps online but can simply serve the same aggregated features you already have for training to generate online predictions. 

These are other valuable reasons to introduce Vertex AI Feature Store. It gives you a service that serves features at scale with low latency, exactly as they were available at training time, which mitigates possible training-serving skew.

Now that you know **why you need a feature store**, let's close this journey by deploying your model and using the feature store to retrieve features online, pass them to the endpoint and generate predictions.


## Time to simulate online predictions

Once the model is ready to receive prediction requests, you can use the `simulate_prediction` function to generate them. 

In particular, that function

- formats entities for prediction
- retrieves static features with a singleton lookup operation from Vertex AI Feature Store
- runs the prediction request and gets back the result

for a number of requests and a latency that you define. **It takes about 17 minutes to run this cell.** 
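
The three steps above can be sketched with stand-ins for the feature store and the endpoint. This is not the notebook's `simulate_prediction` implementation. The function `simulate_requests`, the in-memory `ONLINE_STORE`, the fake lookup and predict callables, and the toy prediction rule are all assumptions made so that the example runs without any cloud resources.

```python
import random
import time


def simulate_requests(entity_ids, lookup_features, predict, n_requests, latency, seed=0):
    """Look up features for random entities and send them for prediction."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_requests):
        entity_id = rng.choice(entity_ids)       # pick the entity to score
        features = lookup_features(entity_id)    # singleton lookup: one row, many columns
        prediction = predict([features])         # one prediction request
        results.append((entity_id, features, prediction))
        time.sleep(latency)                      # pause between requests
    return results


# In-memory stand-in for the online store: entity ID -> feature row.
ONLINE_STORE = {
    "user_a": [13.0, 3.0, "Germany", "ANDROID", "ja-jp"],
    "user_b": [0.0, 0.0, "Taiwan", "IOS", "es-mx"],
}


def fake_lookup(entity_id):
    """Stand-in for reading one entity from the online store."""
    return ONLINE_STORE[entity_id]


def fake_predict(instances):
    """Toy rule standing in for the deployed model: low engagement means churn."""
    return [1.0 if row[0] < 5 else 0.0 for row in instances]


results = simulate_requests(
    list(ONLINE_STORE), fake_lookup, fake_predict, n_requests=3, latency=0
)
for entity_id, features, prediction in results:
    print(f"user_id - {entity_id} - values - {features} - prediction - {prediction}")
```

With 1,000 requests and a pause of one second between them, the pauses alone add up to 1,000 seconds, which is where the roughly 17 minutes come from.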


In [52]:
# Simulate online predictions
# TODO 5: Your code goes here(endpoint=endpoint, n_requests=1000, latency=1)

Prediction request: user_id - ['DD2269BCB7F8532CD51CB6854667AF51'] - values - [[13.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 'Germany', 'ANDROID', 'ja-jp']] - prediction - [1.0]
Prediction request: user_id - ['BBDCBE2491658165B7F20540DE652E3A'] - values - [[6.0, 6.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 'Brazil', 'ANDROID', 'en-us']] - prediction - [1.0]
Prediction request: user_id - ['E43D3AB2F9B9055C29373523FAF9DB9B'] - values - [[145.0, 57.0, 55.0, 52.0, 1.0, 52.0, 0.0, 0.0, 0.0, 0.0, 0.0, 'Taiwan', 'null', 'de-de']] - prediction - [1.0]
Prediction request: user_id - ['8C33815E0A269B776AAB4B60A4F7BC63'] - values - [[71.0, 35.0, 34.0, 34.0, 0.0, 34.0, 0.0, 0.0, 0.0, 0.0, 0.0, 'Japan', 'IOS', 'fr-fr']] - prediction - [0.0]
Prediction request: user_id - ['22DC6A6AE86C0AA33EBB8C3164A26925'] - values - [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 'Taiwan', 'IOS', 'es-mx']] - prediction - [1.0]
Prediction request: user_id - ['19DEEA6B15B314DB0ED2A4936959D8F9