In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Feature Store: Streaming ingestion SDK

<table align="left">

  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/feature_store/feature_store_streaming_ingestion_sdk.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/feature_store/feature_store_streaming_ingestion_sdk.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/feature_store/feature_store_streaming_ingestion_sdk.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>

**_NOTE_**: This notebook has been tested in the following environment:

* Python version = 3.9

## Overview

This notebook demonstrates how to use Vertex AI Feature Store's streaming ingestion at the SDK layer.

Learn more about [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore).

### Objective

In this tutorial, you learn how to ingest features from a pandas `DataFrame` into your Vertex AI Feature Store using the `write_feature_values` method from the Vertex AI SDK.

This tutorial uses the following Google Cloud ML services and resources:

- Vertex AI Feature Store


The steps performed include:

- Create a `Feature Store`.
- Create a new `Entity Type` for your `Feature Store`.
- Create `Features` for the `Entity Type`.
- Ingest feature values from a pandas `DataFrame` into the `Feature Store`'s `Entity Type`.

### Dataset

The dataset used for this notebook is the penguins dataset from [BigQuery public datasets](https://cloud.google.com/bigquery/public-data). This dataset has the following features: `species`, `island`, `culmen_length_mm`, `culmen_depth_mm`, `flipper_length_mm`, `body_mass_g`, and `sex`.

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.


## Installation

Install the following packages required to execute this notebook.

In [None]:
# Install the packages
! pip3 install --upgrade google-cloud-aiplatform\
                         google-cloud-bigquery\
                         numpy\
                         pandas\
                         db-dtypes\
                         pyarrow -q

### Colab only: Uncomment the following cell to restart the kernel.

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

#### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [2]:
PROJECT_ID = "ds-training-380514"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

Updated property [core/project].


#### Region

You can also change the `REGION` variable used by Vertex AI. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [3]:
REGION = "us-central1"  # @param {type: "string"}

### Authenticate your Google Cloud account

Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.

**1. Vertex AI Workbench**
* Do nothing as you are already authenticated.

**2. Local JupyterLab instance, uncomment and run:**

In [None]:
# ! gcloud auth login

**3. Colab, uncomment and run:**

In [None]:
# from google.colab import auth
# auth.authenticate_user()

**4. Service account or other**
* See how to grant Cloud Storage permissions to your service account at https://cloud.google.com/storage/docs/gsutil/commands/iam#ch-examples.

### UUID

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users, you create a UUID for each session and append it to the names of the resources you create in this tutorial.

In [4]:
import random
import string


# Generate a uuid of a specified length (default=8)
def generate_uuid(length: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


UUID = generate_uuid()

### Import libraries

In [5]:
import numpy as np
import pandas as pd
from google.cloud import aiplatform, bigquery

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project.

In [6]:
aiplatform.init(project=PROJECT_ID, location=REGION)

## Download and prepare the data

In [7]:
def download_bq_table(bq_table_uri: str) -> pd.DataFrame:
    # Remove bq:// prefix if present
    prefix = "bq://"
    if bq_table_uri.startswith(prefix):
        bq_table_uri = bq_table_uri[len(prefix) :]

    table = bigquery.TableReference.from_string(bq_table_uri)

    # Create a BigQuery client
    bqclient = bigquery.Client(project=PROJECT_ID)

    # Download the table rows
    rows = bqclient.list_rows(
        table,
    )
    return rows.to_dataframe()

In [8]:
BQ_SOURCE = "bq://bigquery-public-data.ml_datasets.penguins"

# Download penguins BigQuery table
penguins_df = download_bq_table(BQ_SOURCE)

### Prepare the data

Feature values to be written to the Feature Store can take the form of a list of `WriteFeatureValuesPayload` objects, a Python `dict` of the form

`{entity_id : {feature_id : feature_value}, ...},`

or a pandas `DataFrame`, where the index holds the unique entity ID strings and each column represents a feature. Because this notebook ingests features from a pandas `DataFrame`, you convert the index to the `string` data type so that it can be used as the `Entity ID`.

In [9]:
# Prepare the data
penguins_df.index = penguins_df.index.map(str)
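
To make the two layouts described above concrete, the following minimal sketch builds a tiny DataFrame and converts it to the equivalent dict form with pandas' `to_dict(orient="index")`. The rows are made-up values, not rows from the penguins table, and the names `sample_df` and `sample_instances` are illustrative only.

```python
import pandas as pd

# Toy rows with made-up values, used only to illustrate the layout.
sample_df = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo"],
        "body_mass_g": [3400.0, 5000.0],
    }
)

# Entity IDs must be strings, so convert the default integer index.
sample_df.index = sample_df.index.map(str)

# orient="index" yields {entity_id: {feature_id: feature_value}}.
sample_instances = sample_df.to_dict(orient="index")
print(sample_instances)
```

Either `sample_df` or `sample_instances` describes the same two entities, so the choice between them is mostly a matter of what your upstream code already produces.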

In [10]:
# Remove null values
NA_VALUES = ["NA", "."]
# Use np.nan: the np.NaN alias was removed in NumPy 2.0
penguins_df = penguins_df.replace(to_replace=NA_VALUES, value=np.nan).dropna()
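
As a rough illustration of what the cleaning step does, the sketch below applies the same replace-then-drop pattern to a toy frame. The values and the names `toy_df`, `placeholders`, and `cleaned_df` are invented for this example.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values: two placeholder strings and one real null.
toy_df = pd.DataFrame(
    {
        "sex": ["MALE", ".", "FEMALE", "NA"],
        "body_mass_g": [3900.0, 3400.0, None, 4100.0],
    },
    index=["0", "1", "2", "3"],
)

# Turn placeholder strings into real nulls, then drop every row with a null.
placeholders = ["NA", "."]
cleaned_df = toy_df.replace(to_replace=placeholders, value=np.nan).dropna()
print(cleaned_df)
```

Only the row with entity ID `"0"` survives, which is why some entity IDs can be absent after cleaning.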

## Create Feature Store and define schemas

Vertex AI Feature Store organizes resources hierarchically in the following order:

`Featurestore -> EntityType -> Feature`

You must create these resources before you can ingest data into Vertex AI Feature Store.
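
The hierarchy is also visible in the resource names that the service returns. The sketch below composes such a name from its parts. The helper `feature_resource_name` is hypothetical (it is not part of the SDK), and the trailing `features/` segment is an assumption based on the REST resource layout rather than on output shown in this notebook.

```python
def feature_resource_name(
    project: str,
    location: str,
    featurestore_id: str,
    entity_type_id: str,
    feature_id: str,
) -> str:
    """Compose a fully qualified Feature resource name (illustrative helper)."""
    return (
        f"projects/{project}"
        f"/locations/{location}"
        f"/featurestores/{featurestore_id}"
        f"/entityTypes/{entity_type_id}"
        f"/features/{feature_id}"  # assumed segment name for features
    )


# Placeholder values, not real resources.
example_name = feature_resource_name(
    "my-project", "us-central1", "penguins_demo", "penguin_entity_type_demo", "body_mass_g"
)
print(example_name)
```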

Learn more about [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore)

### Create a Feature Store

You create a Feature Store using `aiplatform.Featurestore.create` with the following parameters:

* `featurestore_id (str)`: The ID to use for this Featurestore, which will become the final component of the Featurestore's resource name. The value must be unique within the project and location.
* `online_store_fixed_node_count`: The number of nodes to provision for online serving.
* `project`: Project to create the Featurestore in. If not set, the project set in `aiplatform.init` is used.
* `location`: Location to create the Featurestore in. If not set, the location set in `aiplatform.init` is used.
* `sync`:  Whether to execute this creation synchronously.
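
Because the ID becomes part of the resource name, it can help to check it locally before calling the API. The sketch below assumes the constraints as I understand them from the API reference (up to 60 characters from `[a-z0-9_]`, not starting with a digit). Treat these rules and the helper name `is_valid_featurestore_id` as assumptions to verify against the current documentation.

```python
import re

# Assumed rule: 1-60 characters from [a-z0-9_], first character not a digit.
_FEATURESTORE_ID_PATTERN = re.compile(r"[a-z_][a-z0-9_]{0,59}")


def is_valid_featurestore_id(candidate: str) -> bool:
    """Return True if the candidate matches the assumed ID rules."""
    return _FEATURESTORE_ID_PATTERN.fullmatch(candidate) is not None


print(is_valid_featurestore_id("penguins_aeo1w1v5"))
print(is_valid_featurestore_id("1penguins"))
```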

In [11]:
FEATURESTORE_ID = f"penguins_{UUID}"

penguins_feature_store = aiplatform.Featurestore.create(
    featurestore_id=FEATURESTORE_ID,
    online_store_fixed_node_count=1,
    project=PROJECT_ID,
    location=REGION,
    sync=True,
)

Creating Featurestore
Create Featurestore backing LRO: projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/operations/482463297086423040
Featurestore created. Resource name: projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5
To use this Featurestore in another session:
featurestore = aiplatform.Featurestore('projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5')


##### Verify that the Feature Store is created
Check if the Feature Store was successfully created by running the following code block.

In [12]:
fs = aiplatform.Featurestore(
    featurestore_name=FEATURESTORE_ID,
    project=PROJECT_ID,
    location=REGION,
)
print(fs.gca_resource)

name: "projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5"
create_time {
  seconds: 1680094696
  nanos: 605578000
}
update_time {
  seconds: 1680094697
  nanos: 166034000
}
etag: "AMEw9yNkjcd49a5-0ufkRQ-rl2Ld6cYuYJpjwoA_65_9UVT0DeD_J5pxEOAAvNkJnfE="
online_serving_config {
  fixed_node_count: 1
}
state: STABLE



### Create an EntityType

An entity type is a collection of semantically related features. You define your own entity types, based on the concepts that are relevant to your use case. For example, a movie service might have the entity types `movie` and `user`, which group related features that correspond to movies or users.

Here, you create an entity type whose ID is `penguin_entity_type_` followed by your UUID, using `create_entity_type` with the following parameters:
* `entity_type_id (str)`: The ID to use for the EntityType, which will become the final component of the EntityType's resource name. The value must be unique within a Feature Store.
* `description`: Description of the EntityType.

In [13]:
ENTITY_TYPE_ID = f"penguin_entity_type_{UUID}"

# Create penguin entity type
penguins_entity_type = penguins_feature_store.create_entity_type(
    entity_type_id=ENTITY_TYPE_ID,
    description="Penguins entity type",
)

Creating EntityType
Create EntityType backing LRO: projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/entityTypes/penguin_entity_type_aeo1w1v5/operations/4517688563210387456
EntityType created. Resource name: projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/entityTypes/penguin_entity_type_aeo1w1v5
To use this EntityType in another session:
entity_type = aiplatform.EntityType('projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/entityTypes/penguin_entity_type_aeo1w1v5')


##### Verify that the EntityType is created
Check if the Entity Type was successfully created by running the following code block.

In [14]:
entity_type = penguins_feature_store.get_entity_type(entity_type_id=ENTITY_TYPE_ID)

print(entity_type.gca_resource)

name: "projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/entityTypes/penguin_entity_type_aeo1w1v5"
description: "Penguins entity type"
create_time {
  seconds: 1680095563
  nanos: 521661000
}
update_time {
  seconds: 1680095563
  nanos: 521661000
}
etag: "AMEw9yO9WTwDvWDvmVMgdEXojk8Oh7LpDWDD6ae-gPwB3QA6VpkLLv_vqv-H_5COZkw5"
monitoring_config {
}



### Create Features
A feature is a measurable property or attribute of an entity type. For example, the `penguin` entity type has features such as `flipper_length_mm` and `body_mass_g`. Features are created within an entity type.

When you create a feature, you specify its value type, such as `DOUBLE` or `STRING`. The value type determines which values you can ingest for that feature.

Learn more about [Feature Value Types](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.featurestores.entityTypes.features)

In [15]:
penguins_feature_configs = {
    "species": {
        "value_type": "STRING",
    },
    "island": {
        "value_type": "STRING",
    },
    "culmen_length_mm": {
        "value_type": "DOUBLE",
    },
    "culmen_depth_mm": {
        "value_type": "DOUBLE",
    },
    "flipper_length_mm": {
        "value_type": "DOUBLE",
    },
    "body_mass_g": {"value_type": "DOUBLE"},
    "sex": {"value_type": "STRING"},
}
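
If you would rather not write the configuration by hand, a similar dictionary can be derived from the DataFrame's column dtypes. The following is a minimal sketch under a simplifying assumption: booleans map to `BOOL`, integers to `INT64`, floats to `DOUBLE`, and everything else to `STRING`. The helper `infer_feature_configs` is hypothetical and does not handle array or bytes value types.

```python
import pandas as pd
from pandas.api import types as pdt


def infer_feature_configs(df: pd.DataFrame) -> dict:
    """Map each column to a feature config based on its pandas dtype."""
    configs = {}
    for column, dtype in df.dtypes.items():
        if pdt.is_bool_dtype(dtype):
            value_type = "BOOL"
        elif pdt.is_integer_dtype(dtype):
            value_type = "INT64"
        elif pdt.is_float_dtype(dtype):
            value_type = "DOUBLE"
        else:
            # Fallback for object/string columns in this sketch.
            value_type = "STRING"
        configs[column] = {"value_type": value_type}
    return configs


# Toy frame with made-up values.
dtype_demo_df = pd.DataFrame(
    {"species": ["Adelie"], "body_mass_g": [3400.0], "tag_count": [2], "is_adult": [True]}
)
inferred_configs = infer_feature_configs(dtype_demo_df)
print(inferred_configs)
```

Review the inferred types before using them, since a column's pandas dtype does not always reflect the type you want to serve.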

You can create features using either `create_feature` or `batch_create_features`. Here, for convenience, all feature configs are collected in one variable, so you use `batch_create_features`.
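
For comparison, the one-at-a-time alternative could look roughly like the sketch below. It is an alternative to the batch call, not something to run in addition to it, since the features would then already exist. The helper `create_features_individually` is hypothetical. It only assumes that the entity type object exposes `create_feature(feature_id=..., value_type=...)`.

```python
def create_features_individually(entity_type, feature_configs: dict) -> list:
    """Create each feature with a separate create_feature call."""
    created = []
    for feature_id, config in feature_configs.items():
        feature = entity_type.create_feature(
            feature_id=feature_id,
            value_type=config["value_type"],
        )
        created.append(feature)
    return created
```

In the notebook, the call would be `create_features_individually(penguins_entity_type, penguins_feature_configs)`. The batch method is usually the simpler choice when all configs are known up front.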

In [16]:
penguin_features = penguins_entity_type.batch_create_features(
    feature_configs=penguins_feature_configs,
)

Batch creating features EntityType entityType: projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/entityTypes/penguin_entity_type_aeo1w1v5
Batch create Features EntityType entityType backing LRO: projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/entityTypes/penguin_entity_type_aeo1w1v5/operations/5947581444900519936
EntityType entityType Batch created features. Resource name: projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/entityTypes/penguin_entity_type_aeo1w1v5


### Write features to the Feature Store
Use the `write_feature_values` method to write feature values to the Feature Store. It takes the following parameter:

* `instances`: Feature values to be written to the Feature Store, given as a list of `WriteFeatureValuesPayload` objects, a Python `dict`, or a pandas `DataFrame`.

Streaming ingestion is available in the Vertex AI SDK under the **preview** namespace. Here, you pass the pandas `DataFrame` you created from the penguins dataset as the `instances` parameter.

Learn more about [Streaming ingestion API](https://github.com/googleapis/python-aiplatform/blob/e6933503d2d3a0f8a8f7ef8c178ed50a69ac2268/google/cloud/aiplatform/preview/featurestore/entity_type.py#L36)

In [17]:
penguins_entity_type.preview.write_feature_values(instances=penguins_df)

Writing EntityType feature values: projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/entityTypes/penguin_entity_type_aeo1w1v5
EntityType feature values written. Resource name: projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/entityTypes/penguin_entity_type_aeo1w1v5


<google.cloud.aiplatform.preview.featurestore.entity_type.EntityType object at 0x7f92f1df6f90> 
resource name: projects/354621994428/locations/us-central1/featurestores/penguins_aeo1w1v5/entityTypes/penguin_entity_type_aeo1w1v5
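
Since `instances` also accepts a dict, a single entity can be written without building a DataFrame. The sketch below wraps that pattern in a hypothetical helper, `write_single_entity`. It assumes only the `preview.write_feature_values(instances=...)` call used above, and any entity ID or values you pass are your own.

```python
def write_single_entity(entity_type, entity_id, feature_values: dict) -> dict:
    """Write one entity's feature values using the dict form of `instances`."""
    # Entity IDs are strings; copy the values so the caller's dict is untouched.
    payload = {str(entity_id): dict(feature_values)}
    entity_type.preview.write_feature_values(instances=payload)
    return payload
```

In the notebook, a call might look like `write_single_entity(penguins_entity_type, "demo_penguin", {"body_mass_g": 3400.0})`, where `"demo_penguin"` is a made-up entity ID.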

## Read back written features

Wait a few seconds for the write to propagate, then do an online read to confirm the write was successful.

In [18]:
ENTITY_IDS = [str(x) for x in range(100)]
penguins_entity_type.read(entity_ids=ENTITY_IDS)

Unnamed: 0,entity_id,culmen_depth_mm,culmen_length_mm,sex,body_mass_g,species,flipper_length_mm,island
0,15,16.5,37.0,FEMALE,3400.0,Adelie Penguin (Pygoscelis adeliae),185.0,Dream
1,22,17.1,40.2,FEMALE,3400.0,Adelie Penguin (Pygoscelis adeliae),193.0,Dream
2,13,17.3,47.0,FEMALE,3700.0,Chinstrap penguin (Pygoscelis antarctica),185.0,Dream
3,14,17.1,34.0,FEMALE,3400.0,Adelie Penguin (Pygoscelis adeliae),185.0,Dream
4,2,18.9,40.9,MALE,3900.0,Adelie Penguin (Pygoscelis adeliae),184.0,Dream
...,...,...,...,...,...,...,...,...
95,99,19.5,36.3,MALE,3800.0,Adelie Penguin (Pygoscelis adeliae),190.0,Dream
96,80,19.6,49.0,MALE,4300.0,Chinstrap penguin (Pygoscelis antarctica),212.0,Dream
97,61,18.9,46.0,FEMALE,4150.0,Chinstrap penguin (Pygoscelis antarctica),195.0,Dream
98,11,18.7,39.0,MALE,3650.0,Adelie Penguin (Pygoscelis adeliae),185.0,Dream
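
Rather than waiting a fixed amount of time, you could retry the read until it returns rows. The following is a minimal sketch of that idea. The helper `read_with_retry` and its default attempt count and delay are assumptions chosen for illustration, not SDK behavior.

```python
import time


def read_with_retry(read_fn, attempts: int = 5, delay_seconds: float = 2.0):
    """Call read_fn until it returns a non-empty result or attempts run out."""
    result = None
    for attempt in range(attempts):
        result = read_fn()
        if len(result) > 0:
            return result
        # Sleep between attempts, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return result
```

In the notebook, this could be called as `read_with_retry(lambda: penguins_entity_type.read(entity_ids=ENTITY_IDS))`. The helper returns the last result even if it is still empty, so check its length before relying on it.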


## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
penguins_feature_store.delete(force=True)