In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Vertex AI Feature Store FeatureView Service Agents Tutorial

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/feature_store/vertex_ai_feature_store_feature_view_service_agents.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/feature_store/vertex_ai_feature_store_feature_view_service_agents.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"> View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/feature_store/vertex_ai_feature_store_feature_view_service_agents.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"> Open in Vertex AI Workbench
    </a>
  </td>
</table>

## Overview
In this tutorial, you learn how to enable FeatureView Service Agents and grant each FeatureView access only to the source data that it uses.

Learn more about [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore/overview).

## Objective
In this tutorial, you learn how to use FeatureView Service Agents to enable fine-grained data access in an end-to-end workflow that extracts data from `BigQuery` and serves features from `Vertex AI Feature Store`.

This tutorial uses the following Google Cloud ML services and resources:
* `Vertex AI Feature Store`

The steps performed include:
* When you create a FeatureView, set `service_agent_type` to `SERVICE_AGENT_TYPE_FEATURE_VIEW`. The default is `SERVICE_AGENT_TYPE_PROJECT`.
* A dedicated service account is created for each such FeatureView. This service account is used to sync data from BigQuery.
* The Get/List FeatureView APIs return the auto-created service account in the `service_account_email` field. You must grant `roles/bigquery.dataViewer` on the source data to that service account yourself, for example with the `bq add-iam-policy-binding` command.
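The access model above comes down to one grant per FeatureView service account and source table. The sketch below is a hypothetical helper, not part of any SDK, that turns a `bq://project.dataset.table` URI (the format this notebook uses for `DATA_SOURCE`) and a FeatureView service account email into the matching `bq add-iam-policy-binding` command. The email in the usage line is a made-up placeholder; the real value comes from the FeatureView's `service_account_email` field.

```python
import shlex


def bq_uri_to_table_ref(uri: str) -> str:
    """Convert 'bq://project.dataset.table' into the 'project:dataset.table' form used by bq."""
    prefix = "bq://"
    if not uri.startswith(prefix):
        raise ValueError(f"Expected a URI starting with {prefix!r}, got {uri!r}")
    # Split from the right because dataset and table IDs cannot contain dots.
    parts = uri[len(prefix):].rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected bq://project.dataset.table, got {uri!r}")
    project, dataset, table = parts
    return f"{project}:{dataset}.{table}"


def grant_command(service_account_email: str, source_uri: str,
                  role: str = "roles/bigquery.dataViewer") -> str:
    """Build the bq command that grants `role` on one table to one FeatureView service account."""
    return shlex.join([
        "bq", "add-iam-policy-binding",
        f"--member=serviceAccount:{service_account_email}",
        f"--role={role}",
        bq_uri_to_table_ref(source_uri),
    ])


# Hypothetical service account email; use the value returned by GetFeatureView.
print(grant_command("fv-example@example.iam.gserviceaccount.com",
                    "bq://my-project.test_data.tableA"))
```

Generating one command per (FeatureView, table) pair keeps each service agent scoped to the tables its FeatureView actually reads, rather than granting a project-wide role.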

## Note
This is a Preview release. By using the feature, you acknowledge that you're aware of the open issues and that this preview is provided “as is” under the pre-GA terms of service.

## Costs
This tutorial uses billable components of Google Cloud:
* `Vertex AI`
* `BigQuery`
* `Cloud Storage`

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and [BigQuery pricing](https://cloud.google.com/bigquery/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Installation

Install the following packages required to execute this notebook.

In [None]:
# Install the packages
! pip3 install --upgrade --quiet google-cloud-aiplatform \
                                 google-cloud-bigquery \
                                 db-dtypes

Install the Python SDK for the Feature Store 2.0 experimental release.

In [None]:
# Download and install the private SDK
!pip uninstall google-cloud-aiplatform -y
!gsutil cp gs://caip-featurestore-sdk/20240215/aiplatform-v1beta1-py.tar.gz .
!pip install --user aiplatform-v1beta1-py.tar.gz
!rm aiplatform-v1beta1-py.tar.gz

## Colab only: Uncomment the following cell to restart the kernel.

In [None]:
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

#### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

# Set the project ID
! gcloud config set project {PROJECT_ID}

### Set up environment variables

In [None]:
REGION = "us-central1"  # @param {type: "string"}
VERTEX_AI_SERVICE = "aiplatform.googleapis.com"
# Regional API endpoint, for example us-central1-aiplatform.googleapis.com
API_ENDPOINT = f"{REGION}-{VERTEX_AI_SERVICE}"

### Authenticate your Google Cloud account

Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.

**1. Vertex AI Workbench**
* Do nothing as you are already authenticated.

**2. Local JupyterLab instance, uncomment and run:**

In [None]:
# ! gcloud auth login

**3. Colab, uncomment and run:**

In [None]:
# from google.colab import auth
# auth.authenticate_user()

**4. Service account or other**
* See how to grant Cloud Storage permissions to your service account at https://cloud.google.com/storage/docs/gsutil/commands/iam#ch-examples.

### Import libraries

In [None]:
from google.cloud.aiplatform_v1beta1 import FeatureOnlineStoreAdminServiceClient
from google.cloud.aiplatform_v1beta1 import FeatureRegistryServiceClient
from google.cloud.aiplatform_v1beta1 import FeatureOnlineStoreServiceClient
from google.cloud.aiplatform_v1beta1.types import feature_online_store_admin_service as feature_online_store_admin_service_pb2
from google.cloud.aiplatform_v1beta1.types import feature_registry_service as feature_registry_service_pb2
from google.cloud.aiplatform_v1beta1.types import featurestore_service as featurestore_service_pb2
from google.cloud.aiplatform_v1beta1.types import feature_online_store_service as feature_online_store_service_pb2
from google.cloud.aiplatform_v1beta1.types import feature_group as feature_group_pb2
from google.cloud.aiplatform_v1beta1.types import feature as feature_pb2
from google.cloud.aiplatform_v1beta1.types import feature_online_store as feature_online_store_pb2
from google.cloud.aiplatform_v1beta1.types import feature_view as feature_view_pb2
from google.cloud.aiplatform_v1beta1.types import io as io_pb2

admin_client = FeatureOnlineStoreAdminServiceClient(client_options={"api_endpoint": API_ENDPOINT})
registry_client = FeatureRegistryServiceClient(client_options={"api_endpoint": API_ENDPOINT})
data_client = FeatureOnlineStoreServiceClient(client_options={"api_endpoint": API_ENDPOINT})

## Create a Feature Group
First, let's create a FeatureGroup.

In [None]:
DATASET_ID = "test_data"
TABLE_ID = "tableA"
DATA_SOURCE = f"bq://{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"  # @param {type:"string"}

FEATURE_GROUP_ID = "test_fg"  # @param {type: "string"}

FEATURE_IDS = ["feature1", "feature2"]  # @param
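Every API call in this notebook addresses resources by a hierarchical name rooted at `projects/{PROJECT_ID}/locations/{REGION}`. A small helper such as the following (hypothetical, not part of the SDK) can keep those strings consistent; it only reproduces the name patterns already used in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureStoreNames:
    """Hypothetical helper that builds the Feature Store resource names used in this notebook."""
    project: str
    region: str

    @property
    def parent(self) -> str:
        return f"projects/{self.project}/locations/{self.region}"

    def feature_group(self, feature_group_id: str) -> str:
        return f"{self.parent}/featureGroups/{feature_group_id}"

    def feature(self, feature_group_id: str, feature_id: str) -> str:
        return f"{self.feature_group(feature_group_id)}/features/{feature_id}"

    def online_store(self, online_store_id: str) -> str:
        return f"{self.parent}/featureOnlineStores/{online_store_id}"

    def feature_view(self, online_store_id: str, feature_view_id: str) -> str:
        return f"{self.online_store(online_store_id)}/featureViews/{feature_view_id}"


names = FeatureStoreNames(project="my-project", region="us-central1")
print(names.feature_view("test_fos", "test_fv"))
# projects/my-project/locations/us-central1/featureOnlineStores/test_fos/featureViews/test_fv
```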

In [None]:
!bq mk --dataset {PROJECT_ID}:{DATASET_ID}

In [None]:
!bq query --nouse_legacy_sql \
"CREATE TABLE {DATASET_ID}.{TABLE_ID} AS (" \
"SELECT * FROM UNNEST(ARRAY<STRUCT<entity_id STRING, feature_timestamp TIMESTAMP, feature1 INT64, feature2 INT64>>[" \
"('test', TIMESTAMP('2024-02-26 08:00:00 UTC'), 10, 20)," \
"('test', TIMESTAMP('2024-02-27 08:00:00 UTC'), 30, 40)," \
"('test', TIMESTAMP('2024-02-28 08:00:00 UTC'), 50, 60)]))"

In [None]:
# Create a FeatureGroup
feature_group_config = feature_group_pb2.FeatureGroup(
  big_query=feature_group_pb2.FeatureGroup.BigQuery(
    big_query_source=io_pb2.BigQuerySource(input_uri=DATA_SOURCE),
    entity_id_columns=["entity_id"]),
  description="This is a FeatureGroup for testing")

create_group_lro = registry_client.create_feature_group(
    feature_registry_service_pb2.CreateFeatureGroupRequest(
        parent=f"projects/{PROJECT_ID}/locations/{REGION}",
        feature_group_id=FEATURE_GROUP_ID,
        feature_group = feature_group_config))
print(create_group_lro.result())

# Create features under the FeatureGroup
create_feature_lros = []
for feature_id in FEATURE_IDS:
  create_feature_lros.append(registry_client.create_feature(
      featurestore_service_pb2.CreateFeatureRequest(
          parent=f"projects/{PROJECT_ID}/locations/{REGION}/featureGroups/{FEATURE_GROUP_ID}",
          feature_id=feature_id,
          feature=feature_pb2.Feature())))
for lro in create_feature_lros:
  print(lro.result())

Verify the created FeatureGroup

In [None]:
# Verify FeatureGroup is created.
registry_client.get_feature_group(name=f"projects/{PROJECT_ID}/locations/{REGION}/featureGroups/{FEATURE_GROUP_ID}")

Verify the created Features

In [None]:
# Use list to verify the features are created.
registry_client.list_features(
    parent=f"projects/{PROJECT_ID}/locations/{REGION}/featureGroups/{FEATURE_GROUP_ID}")

### Create Feature Online Store

Next, let's create a standard online store.

In [None]:
FEATURE_ONLINE_STORE_ID = "test_fos" #@param {type:"string"}

In [None]:
online_store_config = feature_online_store_pb2.FeatureOnlineStore(
  bigtable=feature_online_store_pb2.FeatureOnlineStore.Bigtable(
    auto_scaling=feature_online_store_pb2.FeatureOnlineStore.Bigtable.AutoScaling(
      min_node_count=1,
      max_node_count=1,
      cpu_utilization_target=50)))

create_store_lro = admin_client.create_feature_online_store(
    feature_online_store_admin_service_pb2.CreateFeatureOnlineStoreRequest(
        parent=f"projects/{PROJECT_ID}/locations/{REGION}",
        feature_online_store_id=FEATURE_ONLINE_STORE_ID,
        feature_online_store = online_store_config))

# Wait for the LRO to finish and get the LRO result.
# This operation might take up to 10 minutes to complete.
print(create_store_lro.result())

Verify the created FeatureOnlineStore

In [None]:
# Use get to verify the store is created.
admin_client.get_feature_online_store(
    name=f"projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FEATURE_ONLINE_STORE_ID}")

### Create FeatureView

In [None]:
FEATURE_VIEW_ID = "test_fv"  # @param {type: "string"}

# A schedule will be created based on this cron setting.
CRON_SCHEDULE = "TZ=America/Los_Angeles 0 12 * * *"  # @param {type: "string"}

# Select the features to serve from the FeatureGroup.
feature_registry_source = feature_view_pb2.FeatureView.FeatureRegistrySource(
    feature_groups=[
        feature_view_pb2.FeatureView.FeatureRegistrySource.FeatureGroup(
            feature_group_id=FEATURE_GROUP_ID,
            feature_ids=FEATURE_IDS)
    ])

# Set cron schedule.
sync_config = feature_view_pb2.FeatureView.SyncConfig(cron=CRON_SCHEDULE)

# Create the FeatureView with its own service agent instead of the project-level default.
create_view_lro = admin_client.create_feature_view(
    parent=f"projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FEATURE_ONLINE_STORE_ID}",
    feature_view_id=FEATURE_VIEW_ID,
    feature_view=feature_view_pb2.FeatureView(
        feature_registry_source=feature_registry_source,
        sync_config=sync_config,
        service_agent_type=feature_view_pb2.FeatureView.ServiceAgentType.SERVICE_AGENT_TYPE_FEATURE_VIEW,
    ))
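The `CRON_SCHEDULE` string combines a `TZ=` prefix with a standard five-field cron expression, so `TZ=America/Los_Angeles 0 12 * * *` means every day at 12:00 Los Angeles time. As a sanity check, the sketch below computes the next run in UTC for that fixed daily form only. It assumes Python 3.9+ (`zoneinfo`) with time zone data available, and it is not how Vertex AI evaluates schedules.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def next_daily_run(cron: str, now: datetime) -> datetime:
    """Return the next run time, in UTC, of a daily 'TZ=<zone> M H * * *' schedule.

    `now` must be timezone-aware. Only the fixed daily form used in this notebook
    is handled; this is not a general cron parser.
    """
    tz_field, minute, hour, day_of_month, month, day_of_week = cron.split()
    if not tz_field.startswith("TZ=") or (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError(f"Unsupported schedule: {cron!r}")
    zone = ZoneInfo(tz_field[len("TZ="):])
    local_now = now.astimezone(zone)
    # Today's run at the scheduled wall-clock time in the schedule's time zone.
    candidate = local_now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if candidate <= local_now:
        # Today's run has passed: same wall-clock time tomorrow (UTC offset recomputed for DST).
        candidate += timedelta(days=1)
    return candidate.astimezone(timezone.utc)


print(next_daily_run("TZ=America/Los_Angeles 0 12 * * *", datetime.now(timezone.utc)))
```

Because the schedule is pinned to a local time zone, the UTC time of the daily sync shifts by one hour when daylight saving time starts or ends.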

In [None]:
print(create_view_lro.result())

Verify the created FeatureView

In [None]:
# Use get to verify the FeatureView is created.
admin_client.get_feature_view(
    name=f"projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FEATURE_ONLINE_STORE_ID}/featureViews/{FEATURE_VIEW_ID}")

### Grant BigQuery access to the FeatureView Service Agent

Next, let's grant the BigQuery Data Viewer role to the FeatureView Service Agent that was just created. This takes two steps:
1. Find the FeatureView `service_account_email`.
2. Update the IAM policy on the BigQuery source table.

In [None]:
# Step 1: Find the FeatureView service_account_email.

# call GetFeatureView
feature_view = admin_client.get_feature_view(
    name=f"projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FEATURE_ONLINE_STORE_ID}/featureViews/{FEATURE_VIEW_ID}")
SERVICE_ACCOUNT = feature_view.service_account_email
print(SERVICE_ACCOUNT)

In [None]:
# Step 2: Update the IAM policy on the BigQuery Source.

!bq add-iam-policy-binding --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/bigquery.dataViewer {DATASET_ID}.{TABLE_ID}
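If you prefer to stay in Python, the same grant can be expressed with the `google-cloud-bigquery` client's `get_iam_policy` and `set_iam_policy` methods. In this sketch the binding update is a pure function so it can be checked offline, and the client is passed in rather than created; treat it as an outline to verify in your environment, not as a replacement the tutorial was tested with.

```python
def add_member_to_role(bindings, role, member):
    """Return a copy of `bindings` in which `member` holds `role` unconditionally.

    `bindings` has the list-of-dicts shape used by google.api_core.iam.Policy.bindings:
    [{"role": "...", "members": {...}}, ...]. Other bindings are copied unchanged.
    """
    updated = []
    granted = False
    for binding in bindings:
        binding = dict(binding)
        binding["members"] = set(binding.get("members", ()))
        # Only extend an unconditional binding for the same role.
        if binding["role"] == role and not binding.get("condition"):
            binding["members"].add(member)
            granted = True
        updated.append(binding)
    if not granted:
        updated.append({"role": role, "members": {member}})
    return updated


def grant_table_read(bq_client, table_id, service_account_email):
    """Grant roles/bigquery.dataViewer on one table to one service account.

    Assumes `bq_client` is a google.cloud.bigquery.Client and `table_id` is
    'project.dataset.table'.
    """
    policy = bq_client.get_iam_policy(table_id)
    policy.bindings = add_member_to_role(
        policy.bindings, "roles/bigquery.dataViewer", f"serviceAccount:{service_account_email}")
    return bq_client.set_iam_policy(table_id, policy)


# Usage in this notebook (google-cloud-bigquery is installed at the top):
# from google.cloud import bigquery
# grant_table_read(bigquery.Client(project=PROJECT_ID),
#                  f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}", SERVICE_ACCOUNT)
```

Reading the current policy and adding to it, rather than writing a fresh policy, preserves any bindings other principals already hold on the table.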

If you skip this step, the sync fails because the FeatureView Service Agent has no read access to the source table.

Next, let's run an on-demand batch sync.

In [None]:
sync_response=admin_client.sync_feature_view(
    feature_view=f"projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FEATURE_ONLINE_STORE_ID}/featureViews/{FEATURE_VIEW_ID}")

Confirm the status of batch sync.

In [None]:
admin_client.get_feature_view_sync(name = sync_response.feature_view_sync)
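A sync runs asynchronously, so a single `get_feature_view_sync` call may show a sync that is still in progress. A generic polling loop such as this sketch re-checks until a caller-supplied predicate reports completion. The predicate is deliberately left to you, because which `FeatureViewSync` field marks a finished sync is an assumption to confirm in the API reference.

```python
import time


def wait_until(check, is_done, timeout_s=1800.0, interval_s=30.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call `check()` every `interval_s` seconds until `is_done(result)` is true.

    Returns the first result that satisfies `is_done`; raises TimeoutError once
    `timeout_s` seconds have elapsed without success.
    """
    deadline = clock() + timeout_s
    while True:
        result = check()
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout_s} seconds")
        sleep(interval_s)


# Usage in this notebook. `sync_finished` is a hypothetical predicate you write;
# which FeatureViewSync field signals completion is an assumption to verify first.
# sync = wait_until(
#     lambda: admin_client.get_feature_view_sync(name=sync_response.feature_view_sync),
#     is_done=sync_finished)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.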

### Start online serving

After the data sync is complete, use the `FetchFeatureValues` API to retrieve the data.

In [None]:
data_client = FeatureOnlineStoreServiceClient(
    client_options={"api_endpoint": API_ENDPOINT})

Read the synced data from the feature online store.

In [None]:
data_client.fetch_feature_values(
    request=feature_online_store_service_pb2.FetchFeatureValuesRequest(
        feature_view=f"projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FEATURE_ONLINE_STORE_ID}/featureViews/{FEATURE_VIEW_ID}",
        data_key=feature_online_store_service_pb2.FeatureViewDataKey(key="test")
    ))
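The source table holds three rows for the entity `test`. Assuming the sync keeps only the most recent row per entity according to `feature_timestamp`, the fetch above should return `feature1 = 50` and `feature2 = 60`. The local sketch below reproduces that selection in plain Python so you can compare it with the served values; it models the assumption, not the service's implementation.

```python
from datetime import datetime, timezone

# The three rows inserted into the sample BigQuery table earlier in this notebook.
rows = [
    {"entity_id": "test", "feature_timestamp": datetime(2024, 2, 26, 8, tzinfo=timezone.utc),
     "feature1": 10, "feature2": 20},
    {"entity_id": "test", "feature_timestamp": datetime(2024, 2, 27, 8, tzinfo=timezone.utc),
     "feature1": 30, "feature2": 40},
    {"entity_id": "test", "feature_timestamp": datetime(2024, 2, 28, 8, tzinfo=timezone.utc),
     "feature1": 50, "feature2": 60},
]


def latest_per_entity(rows, features):
    """For each entity_id, keep the feature values from the row with the newest feature_timestamp."""
    latest = {}
    for row in rows:
        current = latest.get(row["entity_id"])
        if current is None or row["feature_timestamp"] > current["feature_timestamp"]:
            latest[row["entity_id"]] = row
    return {entity: {name: row[name] for name in features} for entity, row in latest.items()}


print(latest_per_entity(rows, ["feature1", "feature2"]))
# {'test': {'feature1': 50, 'feature2': 60}}
```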

## Clean up

To clean up all the Google Cloud resources used in this project, delete the individual resources you created in this tutorial.

In [None]:
# Delete FeatureView and wait for the operation to finish
admin_client.delete_feature_view(
  name=f"projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FEATURE_ONLINE_STORE_ID}/featureViews/{FEATURE_VIEW_ID}").result()

# Delete OnlineStore
admin_client.delete_feature_online_store(
  name=f"projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FEATURE_ONLINE_STORE_ID}").result()

# Delete Features
for feature_id in FEATURE_IDS:
  registry_client.delete_feature(name=f"projects/{PROJECT_ID}/locations/{REGION}/featureGroups/{FEATURE_GROUP_ID}/features/{feature_id}").result()

# Delete FeatureGroup
registry_client.delete_feature_group(name=f"projects/{PROJECT_ID}/locations/{REGION}/featureGroups/{FEATURE_GROUP_ID}").result()

# Delete test data: the dataset and the table it contains
!bq rm -r -f -d {DATASET_ID}

After deleting the resources, search for them in Dataplex search and confirm that they are no longer discoverable.