In [1]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<table align="left">

  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/official/feature_store/featurestore-tutorials-demo.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/official/feature_store/featurestore-tutorials-demo.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
</table>

# Featurestore Tutorial Demonstrating the ImportFeatureValues and ExportFeatureValues APIs and the batch_serve_to_df Method

## Overview

This Colab introduces Vertex AI Feature Store, a managed cloud service for machine learning engineers and data scientists to store, serve, manage and share machine learning features at a large scale.

This Colab assumes that you understand basic Google Cloud concepts such as [Project](https://cloud.google.com/storage/docs/projects), [Storage](https://cloud.google.com/storage), and [Vertex AI](https://cloud.google.com/vertex-ai/docs). Some machine learning knowledge is also helpful but not required.


### Dataset

This Colab uses an ecommerce dataset as its running example. The task is to train a model that recommends products to a user based on ratings.

The dataset used in this notebook consists of order-item data, starting in 2018, for an online ecommerce store. It is publicly available in the BigQuery table `bigquery-public-data.thelook_ecommerce.order_items`, which you can access by pinning the `bigquery-public-data` project in BigQuery. The table has fields describing each order item, such as `order_id`, `product_id`, `user_id`, `status`, `price`, and when the item was created and shipped. This notebook uses the following fields, assuming their purpose is as described below:

* `user_id`: the ID of the user.
* `product_id`: the ID of the product.
* `created_at`: when the user placed the order.
* `status`: the status of the order (`Shipped`, `Processing`, `Cancelled`, `Returned`, or `Complete`).


### Objective

The objectives of this notebook include: 

* Load the dataset from BigQuery.
* Preprocess the features in the dataset (feature engineering).
* Import your features into Vertex AI Feature Store using the ImportFeatureValues API.
* Read feature values into a DataFrame using `batch_serve_to_df`.
* Use a custom-trained Keras model to build a recommender system that recommends products for a user.
* Export feature values into a BigQuery table using the ExportFeatureValues API.
* Use the exported data to recommend products to users.

### Costs 

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage
* BigQuery

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and [BigQuery
pricing](https://cloud.google.com/bigquery/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

### Set up your local development environment

**If you are using Colab or Google Cloud Notebooks**, your environment already meets
all the requirements to run this notebook. You can skip this step.

**Otherwise**, make sure your environment meets this notebook's requirements.
You need the following:

* The Google Cloud SDK
* Git
* Python 3
* virtualenv
* Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to [Setting up a Python development
environment](https://cloud.google.com/python/setup) and the [Jupyter
installation guide](https://jupyter.org/install) provide detailed instructions
for meeting these requirements. The following steps provide a condensed set of
instructions:

1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)

1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)

1. [Install
   virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)
   and create a virtual environment that uses Python 3. Activate the virtual environment.

1. To install Jupyter, run `pip install jupyter` on the
command-line in a terminal shell.

1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.

1. Open this notebook in the Jupyter Notebook Dashboard.

### Install additional packages

For this Colab, you need the Vertex SDK for Python.

In [2]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME") and not os.getenv("VIRTUAL_ENV")
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# Google Cloud Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"

In [3]:
! pip install {USER_FLAG} --upgrade google-cloud-aiplatform



### Restart the kernel

After you install the SDK, you need to restart the notebook kernel so it can find the packages. You can restart the kernel from *Kernel -> Restart Kernel*, or by running the following:

In [4]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### Select a GPU runtime

**Make sure you're running this notebook in a GPU runtime if you have the option. In Colab, select "Runtime > Change runtime type > GPU".**

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

1. [Enable the Vertex AI API and Compute Engine API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component).

1. If you are running this notebook locally, you will need to install the [Cloud SDK](https://cloud.google.com/sdk).

1. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [1]:
import os

PROJECT_ID = ""

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)

Project ID:  christian-cool-project


Otherwise, set your project ID here.

In [2]:
if PROJECT_ID == "" or PROJECT_ID is None:
    PROJECT_ID = "python-docs-samples-tests"  # @param {type:"string"}

### Authenticate your Google Cloud account

**If you are using Google Cloud Notebooks**, your environment is already
authenticated. Skip this step.

**If you are using Colab**, run the cell below and follow the instructions
when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

1. In the Cloud Console, go to the [**Create service account key**
   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).

2. Click **Create service account**.

3. In the **Service account name** field, enter a name, and
   click **Create**.

4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type "Vertex AI" into the filter box, and select **Vertex AI Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.

5. Click *Create*. A JSON file that contains your key downloads to your
local environment.

6. Enter the path to your service account key as the
`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell.

In [3]:
import os
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# If on Google Cloud Notebooks, then don't execute this code
if not IS_GOOGLE_CLOUD_NOTEBOOK:
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Feature Engineering on the BigQuery Table

In this section, we perform feature engineering on the `bigquery-public-data.thelook_ecommerce.order_items` table using pandas:
* Load the data from BigQuery into a pandas DataFrame.
* Use the columns listed in the [Dataset](#Dataset) section.
* Create a `rating` column from the order `status`.

In [4]:
import pandas as pd
from datetime import datetime, timedelta

In [5]:
def ecomm_bq_tables():
    from google.cloud import bigquery

    bqclient_master = bigquery.Client()

    # Download query results.
    query_string = """
      SELECT
        CAST(user_id AS STRING) AS user_id,
        product_id,
        created_at,
        status
      FROM
        `bigquery-public-data.thelook_ecommerce.order_items`
    """

    bq_table = (
        bqclient_master.query(query_string)
        .result()
        .to_dataframe(
            create_bqstorage_client=True,
        )
    )
    return bq_table

In [6]:
df_bq = ecomm_bq_tables()

In [7]:
df_bq.shape

(181235, 4)

In [8]:
df_bq.head()

Unnamed: 0,user_id,product_id,created_at,status
0,29089,13606,2021-10-07 13:09:00+00:00,Shipped
1,57515,13606,2022-07-09 11:08:26+00:00,Complete
2,98027,13606,2022-07-03 08:39:17+00:00,Returned
3,21718,13606,2021-11-07 15:21:48+00:00,Cancelled
4,58187,13606,2021-09-24 03:16:25+00:00,Cancelled


In [9]:
# calculate the rating
score_mapping = {
    "Cancelled": 0,
    "Returned": 1,
    "Processing": 2,
    "Shipped": 3,
    "Complete": 4,
}
df_bq["rating"] = df_bq["status"].map(score_mapping)

In [10]:
# Min-max scale the status scores into the range [0, 1]
min_rating = df_bq["rating"].min()
max_rating = df_bq["rating"].max()
df_bq["rating"] = (df_bq["rating"] - min_rating) / (max_rating - min_rating)
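As a quick check of the min-max scaling above, the sketch below applies the same formula, (x - min) / (max - min), to one toy row per status. The statuses and scores come from `score_mapping`; the toy DataFrame itself is illustrative only.

```python
import pandas as pd

# Same status-to-score mapping as in the notebook
score_mapping = {
    "Cancelled": 0,
    "Returned": 1,
    "Processing": 2,
    "Shipped": 3,
    "Complete": 4,
}

# Illustrative rows, one per status
toy = pd.DataFrame({"status": ["Cancelled", "Returned", "Processing", "Shipped", "Complete"]})
toy["rating"] = toy["status"].map(score_mapping)

# Min-max scaling maps the scores 0..4 onto [0, 1]
lo, hi = toy["rating"].min(), toy["rating"].max()
toy["rating"] = (toy["rating"] - lo) / (hi - lo)

print(toy.set_index("status")["rating"].to_dict())
# {'Cancelled': 0.0, 'Returned': 0.25, 'Processing': 0.5, 'Shipped': 0.75, 'Complete': 1.0}
```

These values match the `rating` column shown for `Shipped`, `Complete`, `Returned`, and `Cancelled` in the output below.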

In [11]:
df_bq.head()

Unnamed: 0,user_id,product_id,created_at,status,rating
0,29089,13606,2021-10-07 13:09:00+00:00,Shipped,0.75
1,57515,13606,2022-07-09 11:08:26+00:00,Complete,1.0
2,98027,13606,2022-07-03 08:39:17+00:00,Returned,0.25
3,21718,13606,2021-11-07 15:21:48+00:00,Cancelled,0.0
4,58187,13606,2021-09-24 03:16:25+00:00,Cancelled,0.0


##### Prepare, for each product, the list of users who ordered it up to one week ago. The resulting DataFrame has a `product_id` column and a `user_id` column holding the list of users.

In [12]:
# Orders placed more than a week ago (created_at is a UTC timestamp column)
past_week_date = pd.Timestamp.now(tz="UTC") - pd.Timedelta(days=7)
df_filtered = df_bq[df_bq["created_at"] < past_week_date]

# One row per product, holding the list of user IDs who ordered it
df_prod_user_list = df_filtered.groupby("product_id")["user_id"].apply(list).reset_index()

# Entity IDs are stored as strings, so convert product_id
df_prod_user_list["product_id"] = df_prod_user_list["product_id"].astype("string")

In [13]:
df_prod_user_list.head()

Unnamed: 0,product_id,user_id
0,1,"[10044, 17057, 62183, 62770, 48508, 63949, 713..."
1,2,"[4066, 16761, 34709]"
2,3,"[19563, 67295, 76971, 11057, 53165]"
3,4,"[44504, 19432, 6299, 86842, 50585, 75086]"
4,5,"[22681, 38697]"


In [14]:
# status is no longer needed once it has been converted into rating
df_bq.drop("status", axis=1, inplace=True)

### Loading the Feature-Engineered Data into BigQuery

In this section, we load the feature-engineered pandas DataFrames from the previous section into BigQuery tables. These tables serve as the source tables for `ImportFeatureValues` ingestion into the `users` and `products` entity types.

#### Create a dataset for output

You need a BigQuery dataset in `us-central1` to host the output data. Enter the name of the dataset you want to create (the next cell appends a timestamp to make it unique) and specify the names of the tables to store the output in. These are used later in the notebook.

**Make sure that the table name does NOT already exist**.

In [15]:
from google.cloud import bigquery

In [16]:
# DESTINATION dataset
DESTINATION_DATA_SET = "product_recommendation"  # @param {type:"string"}
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
DESTINATION_DATA_SET = "{prefix}_{timestamp}".format(
    prefix=DESTINATION_DATA_SET, timestamp=TIMESTAMP
)

DESTINATION_PATTERN = "bq://{project}.{dataset}.{table}"

In [17]:
USERS_SOURCE_TABLE_NAME = "user_prod_rating_data"
USERS_SOURCE_TABLE_URI = DESTINATION_PATTERN.format(
    project=PROJECT_ID, dataset=DESTINATION_DATA_SET, table=USERS_SOURCE_TABLE_NAME
)

PRODUCTS_SOURCE_TABLE_NAME = "prod_users_list_data"
PRODUCTS_SOURCE_TABLE_URI = DESTINATION_PATTERN.format(
    project=PROJECT_ID, dataset=DESTINATION_DATA_SET, table=PRODUCTS_SOURCE_TABLE_NAME
)

In [18]:
# Create dataset
REGION = "us-central1"  # @param {type:"string"}
client = bigquery.Client(project=PROJECT_ID)
dataset_id = "{}.{}".format(client.project, DESTINATION_DATA_SET)
dataset = bigquery.Dataset(dataset_id)
dataset.location = REGION
dataset = client.create_dataset(dataset)
print("Created dataset {}.{}".format(client.project, dataset.dataset_id))

Created dataset christian-cool-project.product_recommendation_20220812230922


Load `df_bq` into BigQuery.

In [19]:
from time import sleep

# Load df_bq into a new BigQuery table and poll until the load job finishes
job = client.load_table_from_dataframe(df_bq, f"{DESTINATION_DATA_SET}.{USERS_SOURCE_TABLE_NAME}")
print(job.errors, job.state)

while job.running():
    sleep(30)
    print("Running ...")

print(job.errors, job.state)

None RUNNING
Running ...
None DONE


Load `df_prod_user_list` into BigQuery.

In [20]:
# Create a table
schema = [
    bigquery.SchemaField("product_id", "STRING"),
    bigquery.SchemaField("user_id", "STRING", "REPEATED"),
]
table_id = f"{PROJECT_ID}.{DESTINATION_DATA_SET}.{PRODUCTS_SOURCE_TABLE_NAME}"
table = bigquery.Table(table_id, schema=schema)
client.create_table(table, exists_ok=True)
print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)

Created table christian-cool-project.product_recommendation_20220812230922.prod_users_list_data


In [21]:
from time import sleep

# Load df_prod_user_list into the table created above and poll until the load job finishes
job = client.load_table_from_dataframe(df_prod_user_list, table_id)
print(job.errors, job.state)

while job.running():
    sleep(30)
    print("Running ...")

print(job.errors, job.state)

None RUNNING
Running ...
None DONE


## Import libraries and define constants

In [22]:
# In addition to the project ID, set the Featurestore ID and the regional API endpoint
API_ENDPOINT = "us-central1-aiplatform.googleapis.com"  # @param {type:"string"}
FEATURESTORE_ID = "ecomm_recommendation"

In [23]:
import google.cloud.aiplatform as aiplatform
from google.cloud.aiplatform_v1beta1 import (FeaturestoreOnlineServingServiceClient,
                                        FeaturestoreServiceClient)
from google.cloud.aiplatform_v1beta1.types import FeatureSelector, IdMatcher, DestinationFeatureSetting
from google.cloud.aiplatform_v1beta1.types import entity_type as entity_type_pb2
from google.cloud.aiplatform_v1beta1.types import feature as feature_pb2
from google.cloud.aiplatform_v1beta1.types import featurestore as featurestore_pb2
from google.cloud.aiplatform_v1beta1.types import \
    FeaturestoreMonitoringConfig as featurestore_monitoring_config_pb2
from google.cloud.aiplatform_v1beta1.types import \
    featurestore_online_service as featurestore_online_service_pb2
from google.cloud.aiplatform_v1beta1.types import \
    featurestore_service as featurestore_service_pb2
from google.cloud.aiplatform_v1beta1.types import io as io_pb2
from google.protobuf.duration_pb2 import Duration
from google.protobuf.timestamp_pb2 import Timestamp

# Create admin_client for CRUD and data_client for reading feature values.
admin_client = FeaturestoreServiceClient(client_options={"api_endpoint": API_ENDPOINT})
data_client = FeaturestoreOnlineServingServiceClient(
    client_options={"api_endpoint": API_ENDPOINT}
)

# Represents featurestore resource path.
BASE_RESOURCE_PATH = admin_client.common_location_path(PROJECT_ID, REGION)

## Terminology and Concepts

### Featurestore Data model

Vertex AI Feature Store organizes data with the following 3 important hierarchical concepts:
```
Featurestore -> EntityType -> Feature
```
* **Featurestore**: the place to store your features
* **EntityType**: under a Featurestore, an *EntityType* describes an object to be modeled, whether real or virtual.
* **Feature**: under an EntityType, a *feature* describes an attribute of the EntityType.

In this ecommerce example, you will create a featurestore called *ecomm_recommendation*. This store has 2 entity types: *users* and *products*. The `users` entity type has the `product_id` and `rating` features. The `products` entity type has the `users_list` feature.
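Each level of this hierarchy shows up in the resource names the API returns (see the `name:` fields printed by the creation cells below). The sketch below builds such names with plain string formatting; `feature_resource_name` is a hypothetical helper for illustration, not part of the SDK, whose `FeaturestoreServiceClient` already provides `featurestore_path` and `entity_type_path`.

```python
def feature_resource_name(project, location, featurestore, entity_type, feature=None):
    """Hypothetical helper: build a Featurestore -> EntityType -> Feature resource name."""
    name = (
        f"projects/{project}/locations/{location}"
        f"/featurestores/{featurestore}/entityTypes/{entity_type}"
    )
    # Omitting the feature returns the EntityType's own resource name
    if feature is not None:
        name += f"/features/{feature}"
    return name


print(feature_resource_name("my-project", "us-central1", "ecomm_recommendation", "users", "rating"))
# projects/my-project/locations/us-central1/featurestores/ecomm_recommendation/entityTypes/users/features/rating
```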

## Create Featurestore and Define Schemas

### Create Featurestore

The method to create a featurestore returns a
[long-running operation](https://google.aip.dev/151) (LRO). An LRO starts an asynchronous job. LROs are returned by other API
methods too. Calling
`create_fs_lro.result()` waits for the LRO to complete.
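An LRO handle behaves much like a future: the create call returns immediately, and `result()` blocks until the server-side job finishes. The sketch below illustrates only that blocking behavior with Python's standard `concurrent.futures`; `slow_create` is a hypothetical stand-in for a server-side job and is not a Vertex AI API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_create(name):
    """Hypothetical stand-in for a long-running server-side job."""
    time.sleep(0.5)
    return f"created {name}"


with ThreadPoolExecutor(max_workers=1) as pool:
    # Returns a handle immediately, like create_featurestore() returning an LRO
    op = pool.submit(slow_create, "ecomm_recommendation")
    print("done right after submit?", op.done())  # almost certainly False
    # Blocks until the job completes, like create_fs_lro.result()
    print(op.result(timeout=10))
```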

In [24]:
try:
    create_fs_lro = admin_client.create_featurestore(
        featurestore_service_pb2.CreateFeaturestoreRequest(
            parent=BASE_RESOURCE_PATH,
            featurestore_id=FEATURESTORE_ID,
            featurestore=featurestore_pb2.Featurestore(
                online_serving_config=featurestore_pb2.Featurestore.OnlineServingConfig(
                    scaling=featurestore_pb2.Featurestore.OnlineServingConfig.Scaling(
                        min_node_count=1,
                        max_node_count=10
                    ),
                ),
            ),
        )
    )
    # Wait for LRO to finish and get the LRO result.
    print(create_fs_lro.result())
except Exception as e:
    print(e)

name: "projects/813969514060/locations/us-central1/featurestores/ecomm_recommendation"



### Create Users Entity Type

The method to create an EntityType returns a
[long-running operation](https://google.aip.dev/151) (LRO). Calling
`users_entity_type_lro.result()` waits for the LRO to complete.

In [25]:
try:
    users_entity_type_lro = admin_client.create_entity_type(
        featurestore_service_pb2.CreateEntityTypeRequest(
            parent=admin_client.featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID),
            entity_type_id="users",
            entity_type=entity_type_pb2.EntityType(
                description="Details of the users in ecommerce website",
                monitoring_config=featurestore_monitoring_config_pb2(
                    import_features_analysis=featurestore_monitoring_config_pb2.ImportFeaturesAnalysis(
                        state=featurestore_monitoring_config_pb2.ImportFeaturesAnalysis.State.ENABLED
                    ),
                    snapshot_analysis=featurestore_monitoring_config_pb2.SnapshotAnalysis(
                        monitoring_interval=Duration(seconds=86400), #1 day
                    )
                )
            ),
        )
    )
    # Similarly, wait for EntityType creation operation.
    print(users_entity_type_lro.result())
except Exception as e:
    print(e)

name: "projects/813969514060/locations/us-central1/featurestores/ecomm_recommendation/entityTypes/users"



### Create Products Entity Type

As with the `users` entity type, the method returns an LRO. Calling
`products_entity_type_lro.result()` waits for the LRO to complete.

In [26]:
try:
    products_entity_type_lro = admin_client.create_entity_type(
        featurestore_service_pb2.CreateEntityTypeRequest(
            parent=admin_client.featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID),
            entity_type_id="products",
            entity_type=entity_type_pb2.EntityType(
                description="Details of the products in ecommerce website",
                monitoring_config=featurestore_monitoring_config_pb2(
                    import_features_analysis=featurestore_monitoring_config_pb2.ImportFeaturesAnalysis(
                        state=featurestore_monitoring_config_pb2.ImportFeaturesAnalysis.State.ENABLED
                    ),
                    snapshot_analysis=featurestore_monitoring_config_pb2.SnapshotAnalysis(
                        monitoring_interval=Duration(seconds=86400), #1 day
                    )
                )
            ),
        )
    )
    # Similarly, wait for EntityType creation operation.
    print(products_entity_type_lro.result())
except Exception as e:
    print(e)

name: "projects/813969514060/locations/us-central1/featurestores/ecomm_recommendation/entityTypes/products"



### Create Batch Features

The method to create features in a batch returns a
[long-running operation](https://google.aip.dev/151) (LRO). Calling
`users_features_lro.result()` waits for the LRO to complete.
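The `value_type` of each feature in the dictionaries below must match the imported data: `product_id` is an integer column (`INT64`), `rating` a float column (`DOUBLE`), and `users_list` a list of strings (`STRING_ARRAY`). The sketch below shows one way to guess such a `Feature.ValueType` name from a pandas column; `infer_value_type` is a hypothetical helper that covers only a few simple cases and assumes list columns hold strings.

```python
import pandas as pd
from pandas.api import types as ptypes


def infer_value_type(series):
    """Hypothetical helper: guess a Feature.ValueType name for a pandas column."""
    if ptypes.is_bool_dtype(series):
        return "BOOL"
    if ptypes.is_integer_dtype(series):
        return "INT64"
    if ptypes.is_float_dtype(series):
        return "DOUBLE"
    non_null = series.dropna()
    # Object columns whose values are all lists are treated as string arrays
    if len(non_null) and all(isinstance(v, list) for v in non_null):
        return "STRING_ARRAY"
    return "STRING"


print(infer_value_type(pd.Series([13606, 768])))            # INT64
print(infer_value_type(pd.Series([0.75, 1.0])))             # DOUBLE
print(infer_value_type(pd.Series([["4066", "16761"], []])))  # STRING_ARRAY
```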

In [27]:
# Features for users Entity Type
users_features_info = {
    "product_id": ["INT64", "Id of the product"],
    "rating": ["DOUBLE", "Rating of the product"]
}

In [28]:
# Create features for the 'users' entity.
try:
    users_features_lro = admin_client.batch_create_features(
        parent=admin_client.entity_type_path(
            PROJECT_ID, REGION, FEATURESTORE_ID, "users"
        ),
        requests=[
            featurestore_service_pb2.CreateFeatureRequest(
                feature=feature_pb2.Feature(
                    value_type=feature_pb2.Feature.ValueType[info[0]],
                    description=info[1]
                ),
                feature_id=column_name,
            ) for column_name, info in users_features_info.items()  
        ],
    )
    print(users_features_lro.result())
except Exception as e:
    print(e)
    

features {
  name: "projects/813969514060/locations/us-central1/featurestores/ecomm_recommendation/entityTypes/users/features/product_id"
}
features {
  name: "projects/813969514060/locations/us-central1/featurestores/ecomm_recommendation/entityTypes/users/features/rating"
}



Calling `products_features_lro.result()` waits for the LRO to complete.

In [29]:
# Features for products Entity Type
products_features_info = {
    "users_list": ["STRING_ARRAY", "List of user ids who bought product"]
}

In [30]:
# Create features for the 'products' entity.
try:
    products_features_lro = admin_client.batch_create_features(
        parent=admin_client.entity_type_path(
            PROJECT_ID, REGION, FEATURESTORE_ID, "products"
        ),
        requests=[
            featurestore_service_pb2.CreateFeatureRequest(
                feature=feature_pb2.Feature(
                    value_type=feature_pb2.Feature.ValueType[info[0]],
                    description=info[1]
                ),
                feature_id=column_name,
            ) for column_name, info in products_features_info.items()  
        ],
    )
    print(products_features_lro.result())
except Exception as e:
    print(e)
    

features {
  name: "projects/813969514060/locations/us-central1/featurestores/ecomm_recommendation/entityTypes/products/features/users_list"
}



## Import Feature Values

You need to import feature values before you can use them for online/offline serving. In this step, you will learn how to import feature values by calling the ImportFeatureValues API.

### Source Data Format and Layout

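ImportFeatureValues supports BigQuery tables, Avro files, and CSV files as sources. No matter which format you use, each imported entity *must* have an ID, and each row is associated with a timestamp specifying when the feature values were generated: this notebook reads it from the `created_at` column for `users` (`feature_time_field`) and uses a single import-wide timestamp for `products` (`feature_time`). This Colab uses the feature-engineered BigQuery tables as input. The features for the two entity types are as follows (for `products`, the `users_list` feature is read from the source table's `user_id` column via `source_field`):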

**For the Users entity**:
```
schema = {
  "name": "users",
  "fields": [
      {
       "name":"product_id",
       "type":["null","integer"]
      },
      {
       "name":"rating",
       "type":["null","double"]
      },
  ]
 }
```

**For the Products entity**:
```
schema = {
  "name": "products",
  "fields": [
      {
       "name":"users_list",
       "type":["null","string_array"]
      }
  ]
 }
```
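Before calling ImportFeatureValues, it can help to confirm that a source table has every column the request refers to: the entity ID field, the timestamp field (when one is used), and each feature's source column. The sketch below is a hypothetical pre-flight check on pandas DataFrames; `missing_import_columns` is not part of the SDK, and the toy rows only mirror the shape of this notebook's source tables. It relies on the fact that a `FeatureSpec` without `source_field` reads from the column named after the feature ID.

```python
import pandas as pd


def missing_import_columns(df, entity_id_field, feature_source_fields, feature_time_field=None):
    """Hypothetical pre-flight check: return the columns an import needs but df lacks."""
    required = [entity_id_field, *feature_source_fields]
    if feature_time_field is not None:
        required.append(feature_time_field)
    return [col for col in required if col not in df.columns]


# Toy rows shaped like the users source table (string entity IDs)
users_df = pd.DataFrame(
    {
        "user_id": ["29089", "57515"],
        "product_id": [13606, 13606],
        "created_at": pd.to_datetime(["2021-10-07 13:09:00", "2022-07-09 11:08:26"], utc=True),
        "rating": [0.75, 1.0],
    }
)
print(missing_import_columns(users_df, "user_id", ["product_id", "rating"], "created_at"))  # []

# Toy rows shaped like the products source table
products_df = pd.DataFrame({"product_id": ["1", "2"], "user_id": [["10044", "17057"], ["4066"]]})
# users_list is read from the user_id column (source_field="user_id")
print(missing_import_columns(products_df, "product_id", ["user_id"]))  # []
# Without source_field, the feature ID itself would be looked up as a column
print(missing_import_columns(products_df, "product_id", ["users_list"]))  # ['users_list']
```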

### Import Feature Values for `users` Entity Type

In [31]:
import_users_request = featurestore_service_pb2.ImportFeatureValuesRequest(
    entity_type=admin_client.entity_type_path(
        PROJECT_ID, REGION, FEATURESTORE_ID, "users"
    ),
    bigquery_source=io_pb2.BigQuerySource(
        input_uri = USERS_SOURCE_TABLE_URI           
    ),
    entity_id_field="user_id",
    feature_specs=[
        # Features  
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(id="product_id"),
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(id="rating")
    ],
    feature_time_field="created_at",
    worker_count=10,
)

In [32]:
# Start the import; this takes a couple of minutes
ingestion_lro = admin_client.import_feature_values(import_users_request)

In [33]:
# Polls for the LRO status and prints when the LRO has completed
ingestion_lro.result()

imported_entity_count: 181235
imported_feature_value_count: 362470

### Import Feature Values for `products` Entity Type

In [34]:
def current_time():
    return int(datetime.now().timestamp())

def past_6days():
    return int((datetime.now()-timedelta(days=6)).timestamp())

In [35]:
import_products_request = featurestore_service_pb2.ImportFeatureValuesRequest(
    entity_type=admin_client.entity_type_path(
        PROJECT_ID, REGION, FEATURESTORE_ID, "products"
    ),
    bigquery_source=io_pb2.BigQuerySource(
        input_uri = PRODUCTS_SOURCE_TABLE_URI
    ),
    entity_id_field="product_id",
    feature_specs=[
        # Features  
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(id="users_list", source_field="user_id")
    ],
    feature_time=Timestamp(seconds=past_6days()),
    worker_count=10,
)

In [36]:
# Start the import; this takes a couple of minutes
ingestion_lro = admin_client.import_feature_values(import_products_request)

In [37]:
# Polls for the LRO status and prints when the LRO has completed
ingestion_lro.result()

imported_entity_count: 29042
imported_feature_value_count: 29042

## Batch Serving

Batch Serving is used to fetch a large batch of feature values with high throughput, typically for training a model or running batch prediction. In this section, you will learn how to prepare training data for a model by calling the `batch_serve_to_df` method.

### Use case

**The task** is to prepare a dataset to train a model that recommends products for a given user. To achieve this, you need 2 sets of input:

*   Features: the feature values you already imported into the feature store.
*   Labels: the recorded ground-truth data, that is, the rating.

To be more specific, the ground-truth observations are described in Table 1 and the desired dataset is described in Table 2. Each row in Table 2 is the result of joining the imported feature values from Vertex AI Feature Store according to the entity IDs and timestamps in Table 1. In this example, the `product_id` and `rating` features of `users` are selected for batch serving.

The `batch_serve_to_df` method takes Table 1 as its `read_instances_df` argument, joins all required feature values from the feature store, and returns Table 2 for training.

<h4 align="center">Table 1. Ground-truth Data</h4>

users | timestamp
----- | --------------------
87228 | 2022-07-01T00:00:00Z
16173 | 2022-07-01T18:09:43Z
...   | ...


<h4 align="center">Table 2. Expected Training Data Generated by batch_serve_to_df (Positive Samples)</h4>

timestamp            | entity_type_users | product_id | rating
-------------------- | ----------------- | ---------- | ------
2022-07-01T00:00:00Z | 87228             | 4567       | 0.5
2022-07-01T18:09:43Z | 16173             | 5490       | 0.75
...                  | ...               | ...        | ...

#### Why timestamp?

Note that there is a `timestamp` column in Table 2. It indicates the time when the ground truth was observed, so that each training row is joined only with feature values that were available at that time. This prevents feature values from the future from leaking into the training data.

For example, the 1st row of Table 2 indicates that user `87228` bought a product at `2022-07-01T00:00:00Z`. The feature store keeps feature values for all timestamps, but during batch serving it returns, for each row, the latest feature values as of the given timestamp.
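This point-in-time lookup can be sketched locally with pandas `merge_asof`, which, for each read instance, picks the latest feature row at or before the instance's timestamp. This only illustrates the semantics on made-up rows; it is not how Vertex AI implements batch serving.

```python
import pandas as pd

# Read instances (Table 1): which user, and as of when
instances = pd.DataFrame(
    {
        "users": ["87228", "87228"],
        "timestamp": pd.to_datetime(["2022-07-01T00:00:00Z", "2022-07-10T00:00:00Z"]),
    }
)

# Toy feature values, each stamped with the time it was generated
features = pd.DataFrame(
    {
        "users": ["87228", "87228"],
        "timestamp": pd.to_datetime(["2022-06-15T00:00:00Z", "2022-07-05T00:00:00Z"]),
        "rating": [0.5, 1.0],
    }
)

# For each instance, take the latest feature value at or before its timestamp
served = pd.merge_asof(
    instances.sort_values("timestamp"),
    features.sort_values("timestamp"),
    on="timestamp",
    by="users",
    direction="backward",
)
print(served)
```

The 2022-07-01 instance gets rating 0.5 because the 2022-07-05 value lies in its future; the 2022-07-10 instance gets 1.0.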
             
### Batch Serve To DataFrame

Assemble the request, which specifies the following:

*   Where the label data is, i.e., Table 1.
*   Which features to read, i.e., the feature columns in Table 2.

In this section, we get a DataFrame from the feature store using `batch_serve_to_df` and store it in a CSV file that is used to train the recommender model in Vertex AI.

* Create the Cloud Storage bucket.
* Enable uniform bucket-level access for the created bucket.
* Build the read instances (the entity type ID column `users` and the `timestamp` column), then write the served training data as a CSV file to the created bucket.

**Create the Cloud Storage bucket and the `df_batch` DataFrame used as input to `batch_serve_to_df`**

In [38]:
BUCKET_NAME = "featurestore-bucket"
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET_URI = f"gs://{BUCKET_NAME}-{TIMESTAMP}"

In [39]:
if BUCKET_NAME == "" or BUCKET_NAME is None or BUCKET_NAME == "[your-bucket-name]":
    TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
    BUCKET_NAME = PROJECT_ID + "-fs-" + TIMESTAMP
    BUCKET_URI = "gs://" + BUCKET_NAME

##### Only if your bucket doesn’t already exist: Run the following cell to create your Cloud Storage bucket.

In [40]:
! gsutil mb -l $REGION $BUCKET_URI

Creating gs://featurestore-bucket-20220812231952/...


Enable uniform bucket-level access on the bucket, then validate access to your Cloud Storage bucket by examining its contents:

In [41]:
! gsutil uniformbucketlevelaccess set on $BUCKET_URI

Enabling Uniform bucket-level access for gs://featurestore-bucket-20220812231952...


In [42]:
! gsutil ls -al $BUCKET_URI

In [43]:
# Read instances (Table 1): one row per order placed at least a week ago
past_week_date = pd.Timestamp.now(tz="UTC") - pd.Timedelta(days=7)
df_sorted = df_bq.sort_values("created_at", ascending=False, ignore_index=True)
df_sorted = df_sorted.rename(columns={"user_id": "users"})
df_sorted = df_sorted[df_sorted["created_at"] <= past_week_date].reset_index(drop=True)
# The entity ID column is named after the entity type ("users"), and the
# read time goes in a UTC "timestamp" column
df_sorted["timestamp"] = df_sorted["created_at"].dt.tz_convert("UTC")
df_batch = df_sorted[["users", "timestamp"]]

In [44]:
df_batch.head()

Unnamed: 0,users,timestamp
0,56876,2022-08-05 23:18:31+00:00
1,94801,2022-08-05 23:18:08+00:00
2,11215,2022-08-05 23:14:47+00:00
3,58474,2022-08-05 23:14:22+00:00
4,65028,2022-08-05 23:12:57+00:00


In [45]:
my_featurestore = aiplatform.Featurestore(featurestore_name=FEATURESTORE_ID)
# Join the product_id and rating features of `users` onto each read instance
batch_serve = my_featurestore.batch_serve_to_df(
    serving_feature_ids={"users": ["product_id", "rating"]},
    read_instances_df=df_batch,
)

Serving Featurestore feature values: projects/813969514060/locations/us-central1/featurestores/ecomm_recommendation
Serve Featurestore feature values backing LRO: projects/813969514060/locations/us-central1/featurestores/ecomm_recommendation/operations/7600792285017538560
Featurestore feature values served. Resource name: projects/813969514060/locations/us-central1/featurestores/ecomm_recommendation
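`batch_serve_to_df` performs a point-in-time lookup: for each row of `df_batch`, it returns that entity's feature values as of the row's `timestamp`. The following pandas sketch illustrates the idea locally with `pd.merge_asof` on a small, hypothetical feature history. It is a conceptual model only, not how Vertex AI Feature Store implements batch serving.

```python
import pandas as pd

# Hypothetical feature history: each row is a value ingested at `feature_time`.
features = pd.DataFrame({
    "users": [1, 1, 2],
    "feature_time": pd.to_datetime(
        ["2022-08-01", "2022-08-04", "2022-08-02"], utc=True),
    "rating": [3.0, 4.0, 1.0],
})

# Read instances: which entity, and "as of" which time.
read_instances = pd.DataFrame({
    "users": [1, 1, 2],
    "timestamp": pd.to_datetime(
        ["2022-08-03", "2022-08-05", "2022-08-01"], utc=True),
})

# merge_asof requires both frames to be sorted by their time keys.
served = pd.merge_asof(
    read_instances.sort_values("timestamp"),
    features.sort_values("feature_time"),
    left_on="timestamp",
    right_on="feature_time",
    by="users",
    direction="backward",  # latest value at or before the read timestamp
)
print(served[["users", "timestamp", "rating"]])
```

User 2 has no value at or before 2022-08-01, so its rating comes back as null, while the two rows for user 1 pick up different historical values.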


In [46]:
batch_serve.head()

|   | timestamp | entity_type_users | product_id | rating |
|---|---|---|---|---|
| 0 | 2019-11-25 01:19:43+00:00 | 12994 | 768 | 0.0 |
| 1 | 2022-03-20 04:06:31+00:00 | 70428 | 768 | 0.0 |
| 2 | 2020-06-01 06:34:33+00:00 | 40034 | 1024 | 0.0 |
| 3 | 2021-08-06 15:43:55+00:00 | 60029 | 1280 | 0.0 |
| 4 | 2021-03-04 02:51:01+00:00 | 46254 | 1536 | 0.0 |


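Write the served feature values to a CSV file in Cloud Storage to use as the training input. Because of the cutoff applied when building `df_batch`, this file only contains data older than one week.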

In [47]:
INPUT_FILE_NAME = "training_data"
INPUT_CSV_URI = f"{BUCKET_URI}/{INPUT_FILE_NAME}_{TIMESTAMP}.csv"

In [48]:
batch_serve.to_csv(INPUT_CSV_URI, index=False)

### Training a Keras Model in Vertex AI

In this section, you train a custom Keras model that recommends products for a given user, using the data returned by the `batch_serve_to_df` method in the previous section.

You use the Vertex SDK for Python to create a custom-trained Keras model from a Python script that runs in a Docker container, deploy the model, and then get predictions from the deployed model by sending it data.

The steps performed include:

- Create a Vertex AI custom `TrainingPipeline` for training a model.
- Train a TensorFlow model.
- Deploy the `Model` resource to a serving `Endpoint` resource.
- Make a prediction.
- Undeploy the `Model` resource.


#### Import Vertex AI SDK for Python

Import the Vertex AI SDK for Python into your Python environment and initialize it.

In [49]:
from google.cloud.aiplatform import gapic as aip

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

#### Set hardware accelerators

You can set hardware accelerators for both training and prediction.

Set the variables `TRAIN_GPU/TRAIN_NGPU` and `DEPLOY_GPU/DEPLOY_NGPU` to use a container image supporting a GPU and the number of GPUs allocated to the virtual machine (VM) instance. For example, to use a GPU container image with 4 NVIDIA Tesla K80 GPUs allocated to each VM, you would specify:

    (aip.AcceleratorType.NVIDIA_TESLA_K80, 4)

Otherwise, specify `(None, None)` to use a container image that runs on a CPU.

Learn [which accelerators are available in your region](https://cloud.google.com/vertex-ai/docs/general/locations#accelerators).

In [50]:
TRAIN_GPU, TRAIN_NGPU = (aip.AcceleratorType.NVIDIA_TESLA_K80, 1)

DEPLOY_GPU, DEPLOY_NGPU = (aip.AcceleratorType.NVIDIA_TESLA_K80, 1)

In [51]:
TRAIN_VERSION = "tf-gpu.2-8"
DEPLOY_VERSION = "tf2-gpu.2-8"

TRAIN_IMAGE = "us-docker.pkg.dev/vertex-ai/training/{}:latest".format(TRAIN_VERSION)
DEPLOY_IMAGE = "us-docker.pkg.dev/vertex-ai/prediction/{}:latest".format(DEPLOY_VERSION)

print("Training:", TRAIN_IMAGE, TRAIN_GPU, TRAIN_NGPU)
print("Deployment:", DEPLOY_IMAGE, DEPLOY_GPU, DEPLOY_NGPU)

Training: us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-8:latest AcceleratorType.NVIDIA_TESLA_K80 1
Deployment: us-docker.pkg.dev/vertex-ai/prediction/tf2-gpu.2-8:latest AcceleratorType.NVIDIA_TESLA_K80 1
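The cell above hard-codes the GPU image names. If you switch to `(None, None)` for a CPU-only run, the image names need to change as well. The hypothetical helper below derives both URIs from one flag. It assumes the CPU images follow the same naming pattern as the GPU ones used here (`tf-cpu.X-Y` for training, `tf2-cpu.X-Y` for prediction); check the list of pre-built containers before relying on it.

```python
def prebuilt_tf_image_uris(tf_version="2-8", use_gpu=True,
                           registry="us-docker.pkg.dev/vertex-ai"):
    """Return (training_image, serving_image) URIs for a TensorFlow version.

    Assumes CPU images use the same naming pattern as the GPU images above.
    """
    device = "gpu" if use_gpu else "cpu"
    train_image = f"{registry}/training/tf-{device}.{tf_version}:latest"
    deploy_image = f"{registry}/prediction/tf2-{device}.{tf_version}:latest"
    return train_image, deploy_image


# Example: images to pair with a (None, None) CPU-only accelerator choice.
print(prebuilt_tf_image_uris(use_gpu=False))
```

For example, `TRAIN_IMAGE, DEPLOY_IMAGE = prebuilt_tf_image_uris(use_gpu=TRAIN_GPU is not None)` keeps the images consistent with the accelerator setting.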


In [52]:
MACHINE_TYPE = "n1-standard"

VCPU = "4"
TRAIN_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Train machine type", TRAIN_COMPUTE)

MACHINE_TYPE = "n1-standard"

VCPU = "4"
DEPLOY_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Deploy machine type", DEPLOY_COMPUTE)

Train machine type n1-standard-4
Deploy machine type n1-standard-4


#### Create a managed tabular dataset from the CSV file

Your first step in training a model is to create a managed dataset instance.

In [53]:
dataset = aiplatform.TabularDataset.create(
    display_name="sample-product-recommend",
    gcs_source=[INPUT_CSV_URI],
)

Creating TabularDataset
Create TabularDataset backing LRO: projects/813969514060/locations/us-central1/datasets/9136152177469816832/operations/3322372639015567360
TabularDataset created. Resource name: projects/813969514060/locations/us-central1/datasets/9136152177469816832
To use this TabularDataset in another session:
ds = aiplatform.TabularDataset('projects/813969514060/locations/us-central1/datasets/9136152177469816832')


#### Train a model

There are two ways you can train a model using a container image:

- **Use a Vertex AI pre-built container**. If you use a pre-built training container, you must additionally specify a Python package to install into the container image. This Python package contains your training code.

- **Use your own custom container image**. If you use your own container, the container image must contain your training code.

#### Define the command args for the training script

Prepare the command-line arguments to pass to your training script.
- `args`: The command line arguments to pass to the corresponding Python module. In this example, they are:
  - `"--epochs=" + EPOCHS`: The number of epochs for training.
  - `"--batch_size=" + BATCH_SIZE`: The batch size for training.
  - `"--distribute=" + TRAIN_STRATEGY` : The training distribution strategy to use for single or distributed training.
     - `"single"`: single device.
     - `"mirror"`: all GPU devices on a single compute instance.
     - `"multi"`: all GPU devices on all compute instances.
  - `"--training_data=" + INPUT_CSV_URI`: The Cloud Storage URI of the CSV file containing the training data served from Feature Store.
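To see how the training script receives these values, the sketch below parses an argument list of the same `--flag=value` form with `argparse`, using the flags that the training script `task.py` defines later in this notebook. The bucket path is a placeholder, and the `choices` list on `--distribute` is an addition that is not in `task.py`; it makes a typo in the strategy name fail immediately.

```python
import argparse

# Same flags as task.py; `choices` on --distribute is an extra guard.
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", dest="epochs", default=10, type=int)
parser.add_argument("--batch_size", dest="batch_size", default=10, type=int)
parser.add_argument("--distribute", dest="distribute", type=str, default="single",
                    choices=["single", "mirror", "multi"])
parser.add_argument("--training_data", dest="training_data", type=str)

# Hypothetical values in the same "--flag=value" form that CMDARGS uses.
example_cmdargs = [
    "--epochs=20",
    "--batch_size=10",
    "--distribute=single",
    "--training_data=gs://my-bucket/training_data.csv",
]
parsed = parser.parse_args(example_cmdargs)
print(parsed.epochs, parsed.batch_size, parsed.distribute, parsed.training_data)
```

Because each flag declares `type=int`, the string `"20"` arrives in the script as the integer `20`.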

In [54]:
JOB_NAME = "custom_job_" + TIMESTAMP

if not TRAIN_NGPU or TRAIN_NGPU < 2:
    TRAIN_STRATEGY = "single"
else:
    TRAIN_STRATEGY = "mirror"

EPOCHS = 20
BATCH_SIZE = 10

CMDARGS = [
    "--epochs=" + str(EPOCHS),
    "--batch_size=" + str(BATCH_SIZE),
    "--distribute=" + TRAIN_STRATEGY,
    "--training_data=" + INPUT_CSV_URI
]
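The cell above only chooses between `single` and `mirror`. The sketch below, with hypothetical helper names, extends the same rule to `multi` for multi-machine jobs and shows how the training script derives the global batch size from the number of replicas in sync. Mapping `replica_count > 1` to `multi` is an assumption; this tutorial always trains with `replica_count=1`.

```python
def choose_distribute(gpus_per_replica, replica_count=1):
    """Pick the --distribute value for task.py.

    'multi' when training on several machines, 'mirror' for 2+ GPUs on one
    machine, otherwise 'single'.
    """
    if replica_count > 1:
        return "multi"
    if gpus_per_replica and gpus_per_replica >= 2:
        return "mirror"
    return "single"


def global_batch_size(per_replica_batch_size, num_replicas_in_sync):
    """tf.data's batch() receives the global batch, so scale by the replica count."""
    return per_replica_batch_size * num_replicas_in_sync


print(choose_distribute(1), choose_distribute(4), choose_distribute(4, replica_count=2))
print(global_batch_size(10, 4))
```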

#### Training script

In the next cell, write the contents of the training script, `task.py`. In summary, the script does the following:

- Sets a training distribution strategy according to the argument `args.distribute`.
- Loads the training data from the CSV file at `args.training_data` (the Cloud Storage file written from `batch_serve_to_df`) into a pandas DataFrame.
- Builds a recommendation model by subclassing `tf.keras.Model`.
- Compiles the model (`compile()`).
- Trains the model (`fit()`) with epochs and batch size according to the arguments `args.epochs` and `args.batch_size`.
- Gets the directory where to save the model artifacts from the environment variable `AIP_MODEL_DIR`. This variable is [set by the training service](https://cloud.google.com/vertex-ai/docs/training/code-requirements#environment-variables).
- Saves the trained model to the model directory.
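The model that `task.py` builds is intended to score each (user, product) pair as sigmoid(u · p + b_u + b_p), where u and p are learned embedding vectors and b_u, b_p are learned per-user and per-product biases. The NumPy sketch below computes that score for a small batch using random, hypothetical embeddings, to make the shapes concrete; it is not the trained model. Note that each ID is used directly as a row index, so an embedding table needs more rows than the largest ID it will look up.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_products, embedding_size = 5, 4, 3  # hypothetical sizes

user_emb = rng.normal(size=(num_users, embedding_size))
product_emb = rng.normal(size=(num_products, embedding_size))
user_bias = rng.normal(size=(num_users, 1))
product_bias = rng.normal(size=(num_products, 1))


def score(pairs):
    """pairs: (batch, 2) array of [user_id, product_id] -> (batch, 1) scores in (0, 1)."""
    users = pairs[:, 0].astype(int)      # IDs are used directly as row indices
    products = pairs[:, 1].astype(int)
    # Per-example dot product between each user vector and its product vector.
    dot = np.sum(user_emb[users] * product_emb[products], axis=1, keepdims=True)
    logits = dot + user_bias[users] + product_bias[products]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid


batch = np.array([[0.0, 1.0], [3.0, 2.0], [4.0, 0.0]], dtype=np.float32)
print(score(batch))
```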

In [55]:
%%writefile task.py

import argparse
import os

import numpy as np
import pandas as pd
import tensorflow as tf

# Read args
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', dest='epochs',
                    default=10, type=int,
                    help='Number of epochs.')
parser.add_argument('--batch_size', dest='batch_size',
                    default=10, type=int,
                    help='Batch size.')
parser.add_argument('--distribute', dest='distribute', type=str, default='single',
                    help='Distributed training strategy.')
parser.add_argument('--training_data', dest='training_data', type=str,
                    help='Cloud Storage URI of the training data CSV file.')

args = parser.parse_args()


# Collect the arguments
training_data_uri = args.training_data

# Single machine, single compute device
if args.distribute == 'single':
    if tf.config.list_physical_devices('GPU'):
        strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")
    else:
        strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")
# Single machine, multiple compute devices
elif args.distribute == 'mirror':
    strategy = tf.distribute.MirroredStrategy()
# Multiple machines, multiple compute devices
elif args.distribute == 'multi':
    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

# Set up training variables
LABEL_COLUMN = "rating"
UNUSED_COLUMNS = ["timestamp"]
NUMERIC_COLUMNS = ["entity_type_users", "product_id", "rating"]
NA_VALUES = ["NA", ".", " ", "", "null"]

# Possible rating values
RATING = [0, 1, 2, 3, 4]

# Load the training CSV (written from batch_serve_to_df) from Cloud Storage
df_train = pd.read_csv(training_data_uri)

# Remove NA values
def clean_dataframe(df):
    return df.replace(to_replace=NA_VALUES, value=np.nan).dropna()

df_train = clean_dataframe(df_train)
df_train = df_train.astype({column: "float32" for column in NUMERIC_COLUMNS})

# Embedding tables are indexed by ID, so they need one row per possible ID
# (largest ID + 1), not one row per distinct ID.
NUM_USERS = int(df_train['entity_type_users'].max()) + 1
NUM_PRODUCTS = int(df_train['product_id'].max()) + 1

def convert_dataframe_to_dataset(df_train):
    df_train = df_train.drop(columns=UNUSED_COLUMNS)

    # Scale ratings (assumed to lie within RATING) to [0, 1] so they match the
    # sigmoid output and the binary cross-entropy loss
    df_train_y = df_train.pop(LABEL_COLUMN)
    y_train = ((df_train_y - min(RATING)) / (max(RATING) - min(RATING))).to_numpy(dtype="float32")

    # Remaining columns, in order: entity_type_users, product_id
    x_train = df_train.to_numpy(dtype="float32")

    dataset_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    return dataset_train

# Create datasets
dataset_train = convert_dataframe_to_dataset(df_train)

# Shuffle train set
dataset_train = dataset_train.shuffle(len(df_train))

EMBEDDING_SIZE = 50


class RecommenderNet(tf.keras.Model):
    def __init__(self, num_users, num_products, embedding_size, **kwargs):
        super().__init__(**kwargs)
        self.num_users = num_users
        self.num_products = num_products
        self.embedding_size = embedding_size
        self.user_embedding = tf.keras.layers.Embedding(
            num_users,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=tf.keras.regularizers.l2(1e-6),
        )
        self.user_bias = tf.keras.layers.Embedding(num_users, 1)
        self.product_embedding = tf.keras.layers.Embedding(
            num_products,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=tf.keras.regularizers.l2(1e-6),
        )
        self.product_bias = tf.keras.layers.Embedding(num_products, 1)

    def call(self, inputs):
        # inputs: (batch, 2) tensor of [user_id, product_id]
        user_vector = self.user_embedding(inputs[:, 0])
        user_bias = self.user_bias(inputs[:, 0])
        product_vector = self.product_embedding(inputs[:, 1])
        product_bias = self.product_bias(inputs[:, 1])
        # Per-example dot product, shape (batch, 1). tf.tensordot(..., 2) would
        # contract both axes and return a single scalar for the whole batch.
        dot_user_product = tf.reduce_sum(user_vector * product_vector, axis=1, keepdims=True)
        # Add all the components (including bias)
        x = dot_user_product + user_bias + product_bias
        # The sigmoid activation forces the rating to between 0 and 1
        return tf.nn.sigmoid(x)


def create_model(num_users, num_products):
    # Create and compile the model
    model = RecommenderNet(num_users, num_products, EMBEDDING_SIZE)
    model.compile(
        loss=tf.keras.losses.BinaryCrossentropy(),
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    )
    return model

# Create the model
with strategy.scope():
    model = create_model(num_users=NUM_USERS, num_products=NUM_PRODUCTS)

# Set up datasets
NUM_WORKERS = strategy.num_replicas_in_sync
# Here the batch size scales up by number of workers since
# `tf.data.Dataset.batch` expects the global batch size.
GLOBAL_BATCH_SIZE = args.batch_size * NUM_WORKERS
dataset_train = dataset_train.batch(GLOBAL_BATCH_SIZE)

# Train the model
model.fit(dataset_train, epochs=args.epochs)

# Save the model to the directory provided by Vertex AI training
tf.saved_model.save(model, os.getenv("AIP_MODEL_DIR"))

Writing task.py


#### Train the model

Define your custom `TrainingPipeline` on Vertex AI.

Use the `CustomTrainingJob` class to define the `TrainingPipeline`. The class takes the following parameters:

- `display_name`: The user-defined name of this training pipeline.
- `script_path`: The local path to the training script.
- `container_uri`: The URI of the training container image.
- `requirements`: The list of Python package dependencies of the script.
- `model_serving_container_image_uri`: The URI of a container that can serve predictions for your model — either a pre-built container or a custom container.

Use the `run` function to start training. The function takes the following parameters:

- `dataset`: The managed `TabularDataset` created earlier.
- `args`: The command line arguments to be passed to the Python script.
- `replica_count`: The number of worker replicas.
- `model_display_name`: The display name of the `Model` if the script produces a managed `Model`.
- `machine_type`: The type of machine to use for training.
- `accelerator_type`: The hardware accelerator type.
- `accelerator_count`: The number of accelerators to attach to a worker replica.

The `run` function creates a training pipeline that trains and creates a `Model` object. After the training pipeline completes, the `run` function returns the `Model` object.
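The two `job.run` branches in the next cell differ only in their accelerator arguments. One way to avoid the duplication, sketched here with a hypothetical helper `accelerator_kwargs`, is to build those arguments once and unpack them into a single call:

```python
def accelerator_kwargs(accelerator_type_name, accelerator_count):
    """Keyword arguments for CustomTrainingJob.run or Model.deploy.

    GPU: pass the accelerator type name and count; CPU only: count 0 and no type.
    """
    if accelerator_type_name and accelerator_count:
        return {"accelerator_type": accelerator_type_name,
                "accelerator_count": accelerator_count}
    return {"accelerator_count": 0}


print(accelerator_kwargs("NVIDIA_TESLA_K80", 1))
print(accelerator_kwargs(None, None))
```

For example: `model = job.run(dataset=dataset, model_display_name=MODEL_DISPLAY_NAME, args=CMDARGS, replica_count=1, machine_type=TRAIN_COMPUTE, **accelerator_kwargs(TRAIN_GPU.name if TRAIN_GPU else None, TRAIN_NGPU))`. The same helper fits `model.deploy`, which uses the same two parameter names.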

In [None]:
job = aiplatform.CustomTrainingJob(
    display_name=JOB_NAME,
    script_path="task.py",
    container_uri=TRAIN_IMAGE,
    requirements=["google-cloud-bigquery>=2.20.0", "db-dtypes"],
    model_serving_container_image_uri=DEPLOY_IMAGE,
)

MODEL_DISPLAY_NAME = "product-recommender-" + TIMESTAMP

# Start the training
if TRAIN_GPU:
    model = job.run(
        dataset=dataset,
        model_display_name=MODEL_DISPLAY_NAME,
        args=CMDARGS,
        replica_count=1,
        machine_type=TRAIN_COMPUTE,
        accelerator_type=TRAIN_GPU.name,
        accelerator_count=TRAIN_NGPU,
    )
else:
    model = job.run(
        dataset=dataset,
        model_display_name=MODEL_DISPLAY_NAME,
        args=CMDARGS,
        replica_count=1,
        machine_type=TRAIN_COMPUTE,
        accelerator_count=0,
    )

Training script copied to:
gs://featurestore-bucket-20220812231952/aiplatform-2022-08-12-23:20:47.464-aiplatform_custom_trainer_script-0.1.tar.gz.
Training Output directory:
gs://featurestore-bucket-20220812231952/aiplatform-custom-training-2022-08-12-23:20:47.566 
No dataset split provided. The service will use a default split.
View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/175281120043073536?project=813969514060
CustomTrainingJob projects/813969514060/locations/us-central1/trainingPipelines/175281120043073536 current state:
PipelineState.PIPELINE_STATE_RUNNING
CustomTrainingJob projects/813969514060/locations/us-central1/trainingPipelines/175281120043073536 current state:
PipelineState.PIPELINE_STATE_RUNNING
CustomTrainingJob projects/813969514060/locations/us-central1/trainingPipelines/175281120043073536 current state:
PipelineState.PIPELINE_STATE_RUNNING
CustomTrainingJob projects/813969514060/locations/us-central1/trainingPipelines/17528

#### Deploy the model

Before you use your model to make predictions, you must deploy it to an `Endpoint`. You can do this by calling the `deploy` function on the `Model` resource. This will do two things:

1. Create an `Endpoint` resource for deploying the `Model` resource.
2. Deploy the `Model` resource to the `Endpoint` resource.


The function takes the following parameters:

- `deployed_model_display_name`: A human-readable name for the deployed model.
- `traffic_split`: Percent of traffic at the endpoint that goes to this model, which is specified as a dictionary of one or more key/value pairs.
   - If only one model, then specify `{ "0": 100 }`, where "0" refers to this model being uploaded and 100 means 100% of the traffic.
   - If there are existing models on the endpoint, for which the traffic will be split, then use `model_id` to specify `{ "0": percent, model_id: percent, ... }`, where `model_id` is the ID of an existing `DeployedModel` on the endpoint. The percentages must add up to 100.
- `machine_type`: The type of machine to use for serving predictions.
- `accelerator_type`: The hardware accelerator type.
- `accelerator_count`: The number of accelerators to attach to each serving replica.
- `min_replica_count`: The minimum number of compute instances to provision.
- `max_replica_count`: The maximum number of compute instances to scale to. In this tutorial, only one instance is provisioned.

#### Traffic split

The `traffic_split` parameter is specified as a Python dictionary. You can deploy more than one instance of your model to an endpoint, and then set the percentage of traffic that goes to each instance.

You can use a traffic split to introduce a new model gradually into production. For example, if you had one existing model in production with 100% of the traffic, you could deploy a new model to the same endpoint, direct 10% of traffic to it, and reduce the original model's traffic to 90%. This allows you to monitor the new model's performance while minimizing the disruption to the majority of users.
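The 90/10 rollout described above can be expressed as a `traffic_split` dictionary in which the key `"0"` refers to the model being deployed. The hypothetical helper below builds such a split for one existing `DeployedModel`; the ID `"1234567890"` is a placeholder, not a real deployment.

```python
def canary_split(existing_deployed_model_id, new_model_percent):
    """traffic_split giving `new_model_percent` to the model being deployed ("0")
    and the remainder to one existing DeployedModel on the endpoint."""
    if not 0 <= new_model_percent <= 100:
        raise ValueError("new_model_percent must be between 0 and 100")
    return {
        "0": new_model_percent,
        existing_deployed_model_id: 100 - new_model_percent,
    }


split = canary_split("1234567890", 10)  # "1234567890" is a placeholder ID
assert sum(split.values()) == 100  # the percentages must add up to 100
print(split)
```

You would pass the result as `traffic_split` when deploying the new model to the existing endpoint.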

#### Compute instance scaling

You can specify a single instance (or node) to serve your online prediction requests. This tutorial uses a single node, so the variables `MIN_NODES` and `MAX_NODES` are both set to `1`.

If you want to use multiple nodes to serve your online prediction requests, set `MAX_NODES` to the maximum number of nodes you want to use. Vertex AI auto-scales the number of nodes used to serve your predictions, up to the maximum number you set. Refer to the [pricing page](https://cloud.google.com/vertex-ai/pricing#prediction-prices) to understand the costs of autoscaling with multiple nodes.

#### Endpoint

The method will block until the model is deployed and eventually return an `Endpoint` object. If this is the first time a model is deployed to the endpoint, it may take a few additional minutes to complete the provisioning of resources.

In [None]:
DEPLOYED_NAME = "product_recommendation_deployed-" + TIMESTAMP

TRAFFIC_SPLIT = {"0": 100}

MIN_NODES = 1
MAX_NODES = 1

if DEPLOY_GPU:
    endpoint = model.deploy(
        deployed_model_display_name=DEPLOYED_NAME,
        traffic_split=TRAFFIC_SPLIT,
        machine_type=DEPLOY_COMPUTE,
        accelerator_type=DEPLOY_GPU.name,
        accelerator_count=DEPLOY_NGPU,
        min_replica_count=MIN_NODES,
        max_replica_count=MAX_NODES,
    )
else:
    endpoint = model.deploy(
        deployed_model_display_name=DEPLOYED_NAME,
        traffic_split=TRAFFIC_SPLIT,
        machine_type=DEPLOY_COMPUTE,
        accelerator_count=0,
        min_replica_count=MIN_NODES,
        max_replica_count=MAX_NODES,
    )

### Exporting Feature Values from the `users` and `products` Entity Types

#### Export Feature Values

Export feature values for all entities of a single entity type to a BigQuery table or a Cloud Storage bucket. You can choose to get a snapshot or to fully export feature values. A snapshot returns a single value per feature compared to a full export, which can return multiple values per feature. You cannot select particular entity IDs or include multiple entity types when exporting feature values.

In this use case, you use `ExportFeatureValues` to get the `users` feature values from the past 6 days and the `products` feature values up to 6 days ago, and then use them for online prediction to recommend products for a user.

Both the snapshot and full export options let you query data by specifying a single timestamp (either the start time or end time) or both timestamps. 

For snapshots, Vertex AI Feature Store returns the latest feature value within a given time range. In the output, the associated timestamp with each feature value is the snapshot timestamp (not the feature value timestamp).

For full exports, Vertex AI Feature Store returns all feature values within a given time range. In the output, the associated timestamp with each feature value is the feature timestamp (the specified timestamp when the feature value was ingested).

When you export feature values, you choose which features to query and whether it is a snapshot or a full export. The following sections show an example of full export.
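The difference between the two export modes can be modeled with pandas on a small, hypothetical feature history. This sketch only illustrates the semantics described above; it does not call Feature Store, and it assumes the snapshot timestamp is the end of the requested range.

```python
import pandas as pd

# Hypothetical history of one feature (`rating`) for two entities.
history = pd.DataFrame({
    "entity_id": ["u1", "u1", "u1", "u2"],
    "feature_timestamp": pd.to_datetime(
        ["2022-08-01", "2022-08-03", "2022-08-06", "2022-08-02"], utc=True),
    "rating": [2.0, 3.0, 5.0, 4.0],
})
range_start = pd.Timestamp("2022-08-01", tz="UTC")
range_end = pd.Timestamp("2022-08-05", tz="UTC")
in_range = history[history["feature_timestamp"].between(range_start, range_end)]

# Full export: every value in the range, each with its own feature timestamp.
full_export = in_range.reset_index(drop=True)

# Snapshot: only the latest value per entity, stamped with the snapshot time
# (assumed here to be the end of the range).
snapshot = (
    in_range.sort_values("feature_timestamp")
    .groupby("entity_id", as_index=False)
    .last()
    .assign(feature_timestamp=range_end)
)
print(full_export, snapshot, sep="\n\n")
```

The full export returns three rows (two for `u1`), while the snapshot returns one row per entity.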

#### Null values

For snapshots, if the latest feature value is null at a given timestamp, Vertex AI Feature Store returns the previous non-null feature value. If there are no previous non-null values, Vertex AI Feature Store returns null.

For full exports, if a feature value is null at a given timestamp, Vertex AI Feature Store returns null for that timestamp.
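The null handling rules above can be mimicked in pandas, because `GroupBy.last()` returns the last non-null value in each group. This is a local illustration on hypothetical data, not Feature Store code.

```python
import numpy as np
import pandas as pd

# Hypothetical values: u1's latest rating is null; u2 only has a null.
values = pd.DataFrame({
    "entity_id": ["u1", "u1", "u1", "u2"],
    "feature_timestamp": pd.to_datetime(
        ["2022-08-01", "2022-08-02", "2022-08-03", "2022-08-02"], utc=True),
    "rating": [2.0, 4.0, np.nan, np.nan],
})

# Snapshot-style: GroupBy.last() skips nulls, so u1 falls back to its previous
# non-null value (4.0) and u2, with no non-null value, stays null.
snapshot_rating = (
    values.sort_values("feature_timestamp").groupby("entity_id")["rating"].last()
)

# Full-export-style: each row keeps its own value, including the nulls.
full_export_rating = values["rating"]
print(snapshot_rating, full_export_rating, sep="\n\n")
```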

In [None]:
USERS_EXPORT_DESTINATION_TABLE_NAME = "product_recommendation"  # @param {type:"string"}

USERS_EXPORT_DESTINATION_TABLE_URI = DESTINATION_PATTERN.format(
    project=PROJECT_ID, dataset=DESTINATION_DATA_SET, table=USERS_EXPORT_DESTINATION_TABLE_NAME
)

PRODUCTS_EXPORT_DESTINATION_TABLE_NAME = "prod_users_list_lookup"  # @param {type:"string"}

PRODUCTS_EXPORT_DESTINATION_TABLE_URI = DESTINATION_PATTERN.format(
    project=PROJECT_ID, dataset=DESTINATION_DATA_SET, table=PRODUCTS_EXPORT_DESTINATION_TABLE_NAME
)
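The export requests below build their time window with `past_6days()` and `current_time()`, helpers that are presumably defined earlier in the notebook and are not shown in this section. Assuming they return Unix epoch seconds (the integer that `Timestamp(seconds=...)` expects), equivalent logic could look like this hypothetical sketch; the notebook's own helpers may differ.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def current_time_seconds(now=None):
    """Current Unix time as whole seconds (protobuf Timestamp.seconds is an integer)."""
    return int(time.time() if now is None else now)


def days_ago_seconds(days, now=None):
    """Unix time `days` days before `now`, in whole seconds."""
    return current_time_seconds(now) - days * SECONDS_PER_DAY


# With a fixed `now`, a 6-day window spans exactly 6 * 86400 seconds.
window_start = days_ago_seconds(6, now=1_660_000_000)
window_end = current_time_seconds(now=1_660_000_000)
print(window_start, window_end, window_end - window_start)
```

In the requests, such values would be wrapped as `Timestamp(seconds=days_ago_seconds(6))` using `google.protobuf.timestamp_pb2.Timestamp`.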

Export Feature Values from `users` Entity Type

In [None]:
export_users_request = featurestore_service_pb2.ExportFeatureValuesRequest(
    entity_type=admin_client.entity_type_path(
        PROJECT_ID, REGION, FEATURESTORE_ID, "users"
    ),
    destination=featurestore_service_pb2.FeatureValueDestination(
        bigquery_destination=io_pb2.BigQueryDestination(
            # Output to BigQuery table created earlier
            output_uri=USERS_EXPORT_DESTINATION_TABLE_URI
        )
    ),
    feature_selector=FeatureSelector(id_matcher=IdMatcher(ids=["*"])),
    full_export=featurestore_service_pb2.ExportFeatureValuesRequest.FullExport(
        start_time=Timestamp(seconds=past_6days()),
        end_time=Timestamp(seconds=current_time())
    ),
)

In [None]:
# Execute the Export Feature Values
export_serving_lro = admin_client.export_feature_values(export_users_request)

In [None]:
# Polls for the LRO status and prints when the LRO has completed
export_serving_lro.result()

Export Feature Values from `products` Entity Type

In [None]:
export_products_request = featurestore_service_pb2.ExportFeatureValuesRequest(
    entity_type=admin_client.entity_type_path(
        PROJECT_ID, REGION, FEATURESTORE_ID, "products"
    ),
    destination=featurestore_service_pb2.FeatureValueDestination(
        bigquery_destination=io_pb2.BigQueryDestination(
            # Output to BigQuery table created earlier
            output_uri=PRODUCTS_EXPORT_DESTINATION_TABLE_URI
        )
    ),
    feature_selector=FeatureSelector(id_matcher=IdMatcher(ids=["*"])),
    full_export=featurestore_service_pb2.ExportFeatureValuesRequest.FullExport(
        end_time=Timestamp(seconds=past_6days())
    ),
)

In [None]:
# Execute the Export Feature Values
export_serving_lro = admin_client.export_feature_values(export_products_request)

In [None]:
# Polls for the LRO status and prints when the LRO has completed
export_serving_lro.result()

After the LRO finishes, you can see the exported tables in the dataset created earlier, using the [BigQuery console](https://console.cloud.google.com/bigquery).

### Product Recommendation for a user

In this section, you generate online recommendations using the data exported with the `ExportFeatureValues` API.

* Load the exported data into DataFrames.
* Get recommendations for a user.

In [None]:
def export_table(table_name):
    from google.cloud import bigquery

    bqclient_master = bigquery.Client()

    # Download query results.
    query_string = f"""
    SELECT
         *
    FROM
      `{PROJECT_ID}`.{DESTINATION_DATA_SET}.{table_name}
    """

    bq_export = (
        bqclient_master.query(query_string)
        .result()
        .to_dataframe(
            create_bqstorage_client=True,
        )
    )
    return bq_export

In [None]:
products_export = export_table(PRODUCTS_EXPORT_DESTINATION_TABLE_NAME)
users_export = export_table(USERS_EXPORT_DESTINATION_TABLE_NAME)

Select one `user_id` at random to get recommendations for.

In [None]:
user_id = users_export.entity_type_users.sample(1).iloc[0]

In [None]:
products_export.shape

#### Make an online prediction request

Send an online prediction request to your deployed model.

#### Send the prediction request

To send a prediction request, use the `Endpoint` object's `predict` function, which takes the following parameter:

- `instances`: A list of instances to score. For this custom model, each instance is a `[user_id, product_id]` pair of numbers, matching the two input columns the model was trained on. You build this list in the following cells from the products the selected user hasn't bought.

The `predict` function returns a `Prediction` object whose `predictions` attribute is a list; each element corresponds to the instance at the same position in the request. In the output for each prediction, you will see the following:

- The predicted rating score (`predictions`), between 0 and 1, produced by the model's sigmoid output.

##### Get the lists of products the user has and hasn't bought, from the exported `products` feature values

In [None]:
# True for each product whose `users_list` contains the selected user
filtered = products_export.apply(lambda row: user_id in row["users_list"], axis=1)
user_not_bought = products_export[~filtered]['entity_type_products'].astype("float32").to_list()
user_bought = products_export[filtered]['entity_type_products'].astype("float32").to_list()

In [None]:
Instance_input = [[float(user_id), k] for k in user_not_bought]

##### Online prediction using keras custom model for a user

In [None]:
predictions = endpoint.predict(instances=Instance_input)

#### Getting Top 10 products recommendation

Based on the ratings predicted by the recommendation model, select the top 10 products for the selected `user_id`.

In [None]:
import numpy as np

# Each prediction is a one-element list holding the predicted score
predictions_array = np.array([p[0] for p in predictions.predictions])
# Indices of the 10 highest scores, highest first
top_rating_indices = predictions_array.argsort()[-10:][::-1]
top_predictions = predictions_array[top_rating_indices]
top_10_products = [int(Instance_input[k][1]) for k in top_rating_indices]

In [None]:
top_10_products

The output lists the top 10 products recommended for the selected user.
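The top-10 cell sorts every predicted score with `argsort`. When the candidate list is large, `np.argpartition` can find the k highest scores without sorting the whole array, and only those k are then ordered. The sketch below wraps that in a hypothetical `top_k` helper; the scores and product IDs in the example are made up.

```python
import numpy as np


def top_k(scores, candidate_ids, k=10):
    """Return the k candidate IDs with the highest scores (highest first) and their scores."""
    scores = np.asarray(scores, dtype=float)
    k = min(k, scores.size)
    if k == 0:
        return [], scores[:0]
    # argpartition moves the k largest scores into the first k positions
    # without fully sorting the array...
    idx = np.argpartition(-scores, k - 1)[:k]
    # ...then only those k positions are sorted, highest score first.
    idx = idx[np.argsort(-scores[idx])]
    return [candidate_ids[i] for i in idx], scores[idx]


ids, best = top_k([0.1, 0.9, 0.5, 0.7], candidate_ids=[768, 1024, 1280, 1536], k=2)
print(ids, best)
```

Applied to this notebook, the call would be `top_k(predictions_array, [int(row[1]) for row in Instance_input])`.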

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

- Training Job
- Model
- Endpoint
- Cloud Storage Bucket
- Featurestore
- BigQuery dataset

#### Undeploy the model

To undeploy your `Model` resource from the serving `Endpoint` resource, use the endpoint's `undeploy` method with the following parameter:

- `deployed_model_id`: The model deployment identifier returned by the endpoint service when the `Model` resource was deployed. You can retrieve the deployed models using the endpoint's `list_models()` method, as the following cell does.

Since this is the only deployed model on the `Endpoint` resource, you can omit `traffic_split`.

In [None]:
deployed_model_id = endpoint.list_models()[0].id
endpoint.undeploy(deployed_model_id=deployed_model_id)

In [None]:
# Delete the training job
job.delete()

# Delete the model
model.delete()

# Delete the endpoint
endpoint.delete()

# Delete Bucket
! gsutil -m rm -r $BUCKET_URI

##### Deleting Featurestore and BigQuery dataset

In [None]:
delete_fs_lro = admin_client.delete_featurestore(
    request=featurestore_service_pb2.DeleteFeaturestoreRequest(
        name=admin_client.featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID),
        force=True,
    )
)

print("Deleted Featurestore", delete_fs_lro.result())

client.delete_dataset(
    DESTINATION_DATA_SET, delete_contents=True, not_found_ok=True
)  # Make an API request.

print("Deleted dataset '{}'.".format(DESTINATION_DATA_SET))