## Overview

This notebook demonstrates how to use Vertex AI Feature Store's streaming ingestion with the Vertex AI SDK for Python.

Learn more about [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore).

### Objective

In this tutorial, you learn how to ingest features from a pandas `DataFrame` into Vertex AI Feature Store using the `write_feature_values` method of the Vertex AI SDK.

This tutorial uses the following Google Cloud ML services and resources:

- Vertex AI Feature Store


The steps performed include:

- Create a `Feature Store`
- Create a new `Entity Type` in the `Feature Store`
- Create `Features` for the `Entity Type`
- Ingest feature values from a pandas `DataFrame` into the `Entity Type`
- Read the ingested feature values back with an online read

In [2]:
PROJECT_ID = "ds-training-380514"
REGION = "us-central1"

### Import libraries

In [3]:
import numpy as np
import pandas as pd
from google.cloud import aiplatform

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project.

In [4]:
aiplatform.init(project=PROJECT_ID, location=REGION)

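### Load the census data

Read the `adult.test` file of the census income dataset from Cloud Storage into a pandas `DataFrame`.

In [60]:
# pandas reads gs:// paths through the gcsfs package, which must be installed.
census_df = pd.read_csv("gs://aaa-aca-ml-workshop/census-income/adult.test")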

In [61]:
census_df

| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \|1x3 Cross validator | | | | | | | | | | | | | | |
| 1 | 25 | Private | 226802.0 | 11th | 7.0 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0.0 | 0.0 | 40.0 | United-States | <=50K. |
| 2 | 38 | Private | 89814.0 | HS-grad | 9.0 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0.0 | 0.0 | 50.0 | United-States | <=50K. |
| 3 | 28 | Local-gov | 336951.0 | Assoc-acdm | 12.0 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0.0 | 0.0 | 40.0 | United-States | >50K. |
| 4 | 44 | Private | 160323.0 | Some-college | 10.0 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688.0 | 0.0 | 40.0 | United-States | >50K. |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16277 | 39 | Private | 215419.0 | Bachelors | 13.0 | Divorced | Prof-specialty | Not-in-family | White | Female | 0.0 | 0.0 | 36.0 | United-States | <=50K. |
| 16278 | 64 | ? | 321403.0 | HS-grad | 9.0 | Widowed | ? | Other-relative | Black | Male | 0.0 | 0.0 | 40.0 | United-States | <=50K. |
| 16279 | 38 | Private | 374983.0 | Bachelors | 13.0 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.0 | 0.0 | 50.0 | United-States | <=50K. |
| 16280 | 44 | Private | 83891.0 | Bachelors | 13.0 | Divorced | Adm-clerical | Own-child | Asian-Pac-Islander | Male | 5455.0 | 0.0 | 40.0 | United-States | <=50K. |


In [62]:
census_df.shape

(16282, 15)

In [63]:
census_df.dtypes

age                object
workclass          object
fnlwgt            float64
education          object
education-num     float64
marital-status     object
occupation         object
relationship       object
race               object
sex                object
capital-gain      float64
capital-loss      float64
hours-per-week    float64
native-country     object
income             object
dtype: object

## Prepare the data

Feature values to be written to the Feature Store can take one of three forms: a list of `WriteFeatureValuesPayload` objects, a Python `dict` of the form

`{entity_id: {feature_id: feature_value}, ...}`

or a pandas `DataFrame` whose index holds the unique entity ID strings and whose columns each represent a feature. Because this notebook ingests a pandas `DataFrame`, you convert the index to `str` so that it can serve as the entity ID.
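To illustrate how the `DataFrame` and `dict` forms relate, the sketch below builds a hypothetical two-row stand-in for `census_df` (values copied from rows 1 and 2 of the data) and converts it with pandas' `DataFrame.to_dict(orient="index")`, which produces the nested `{entity_id: {feature_id: feature_value}}` shape. This is only for comparison; the notebook itself passes the `DataFrame` directly.

```python
import pandas as pd

# Hypothetical two-row stand-in for census_df (values copied from rows 1 and 2).
sample_df = pd.DataFrame(
    {
        "education_num": [7.0, 9.0],
        "occupation": ["Machine-op-inspct", "Farming-fishing"],
    },
    index=[1, 2],
)

# Entity IDs must be strings, so convert the integer index labels first.
sample_df.index = sample_df.index.map(str)

# orient="index" yields {index_label: {column_name: value}},
# which is the {entity_id: {feature_id: feature_value}} dict form.
instances = sample_df.to_dict(orient="index")
print(instances)
```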

In [65]:
# Feature IDs cannot contain hyphens, so replace them with underscores.
census_df.columns = census_df.columns.str.replace('-', '_')

In [66]:
census_df.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education_num',
       'marital_status', 'occupation', 'relationship', 'race', 'sex',
       'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
       'income'],
      dtype='object')

In [67]:
# Use the row index, converted to strings, as the entity IDs
census_df.index = census_df.index.map(str)

In [68]:
# Treat "NA" and "." as missing values, then drop every row that has a missing value
NA_VALUES = ["NA", "."]
census_df = census_df.replace(to_replace=NA_VALUES, value=np.nan).dropna()
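Because the index is converted to strings before cleaning, `dropna()` removes rows but leaves the surviving rows with their original labels, so the entity IDs are no longer contiguous. The sketch below shows this on a hypothetical three-row toy frame: the row containing `"NA"` is dropped and the remaining IDs are `"0"` and `"2"`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with string entity IDs, as census_df has after index.map(str).
toy_df = pd.DataFrame(
    {"age": ["25", "NA", "38"], "workclass": ["Private", "Private", "Local-gov"]},
    index=["0", "1", "2"],
)

NA_VALUES = ["NA", "."]
# Replace the placeholder strings with NaN, then drop any row that has a NaN.
cleaned = toy_df.replace(to_replace=NA_VALUES, value=np.nan).dropna()

# Surviving rows keep their original labels, leaving a gap where "1" was.
print(cleaned.index.tolist())
```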

## Create Feature Store and define schemas

Vertex AI Feature Store organizes resources hierarchically in the following order:

`Featurestore -> EntityType -> Feature`

You must create these resources before you can ingest data into Vertex AI Feature Store.

Learn more about [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore)
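The same hierarchy shows up in the resource names that the SDK logs later in this notebook, for example `projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3/features/age`. The two functions below are hypothetical helpers (not part of the Vertex AI SDK) that build and split such names in pure Python, assuming the `projects/{project}/locations/{location}/featurestores/{id}/entityTypes/{id}/features/{id}` layout seen in those logs.

```python
def feature_resource_name(
    project: str,
    location: str,
    featurestore_id: str,
    entity_type_id: str,
    feature_id: str,
) -> str:
    """Build a Featurestore -> EntityType -> Feature resource name (hypothetical helper)."""
    return (
        f"projects/{project}/locations/{location}"
        f"/featurestores/{featurestore_id}"
        f"/entityTypes/{entity_type_id}"
        f"/features/{feature_id}"
    )


def parse_resource_name(name: str) -> dict:
    """Split 'collection/id/collection/id/...' into {collection: id} (hypothetical helper)."""
    parts = name.split("/")
    if len(parts) % 2:
        raise ValueError(f"Malformed resource name: {name!r}")
    # Even positions are collection names, odd positions are their IDs.
    return dict(zip(parts[0::2], parts[1::2]))


name = feature_resource_name(
    "354621994428", "us-central1", "census_income_3", "census_entity_type_3", "age"
)
print(name)
print(parse_resource_name(name)["entityTypes"])
```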

### Create a Feature Store

You create a Feature Store using `aiplatform.Featurestore.create` with the following parameters:

* `featurestore_id (str)`: The ID to use for this Featurestore, which becomes the final component of the Featurestore's resource name. The value must be unique within the project and location.
* `online_store_fixed_node_count`: The fixed number of nodes allocated for online serving.
* `project`: Project to create the Featurestore in. If not set, the project set in `aiplatform.init` is used.
* `location`: Location to create the Featurestore in. If not set, the location set in `aiplatform.init` is used.
* `sync`: Whether to execute this creation synchronously.

In [78]:
FEATURESTORE_ID = "census_income_3"

census_feature_store = aiplatform.Featurestore.create(
    featurestore_id=FEATURESTORE_ID,
    online_store_fixed_node_count=1,
    project=PROJECT_ID,
    location=REGION,
    sync=True,
)

Creating Featurestore
Create Featurestore backing LRO: projects/354621994428/locations/us-central1/featurestores/census_income_3/operations/2726159710074961920
Featurestore created. Resource name: projects/354621994428/locations/us-central1/featurestores/census_income_3
To use this Featurestore in another session:
featurestore = aiplatform.Featurestore('projects/354621994428/locations/us-central1/featurestores/census_income_3')


##### Verify that the Feature Store is created
Check if the Feature Store was successfully created by running the following code block.

In [79]:
fs = aiplatform.Featurestore(
    featurestore_name=FEATURESTORE_ID,
    project=PROJECT_ID,
    location=REGION,
)
print(fs.gca_resource)

name: "projects/354621994428/locations/us-central1/featurestores/census_income_3"
create_time {
  seconds: 1680781145
  nanos: 691867000
}
update_time {
  seconds: 1680781145
  nanos: 833884000
}
etag: "AMEw9yMmnVhuWsll6DGDWr51vqDPceon7wC52KV0s6Kmar9JIgh2LBUe9OzgHzCo04LI"
online_serving_config {
  fixed_node_count: 1
}
state: STABLE



### Create an EntityType

An entity type is a collection of semantically related features. You define your own entity types, based on the concepts that are relevant to your use case.

Here, you create an entity type named `census_entity_type_3` using `create_entity_type` with the following parameters:
* `entity_type_id (str)`: The ID to use for the EntityType, which becomes the final component of the EntityType's resource name. The value must be unique within a Feature Store.
* `description`: Description of the EntityType.

In [80]:
ENTITY_TYPE_ID = "census_entity_type_3"

# Create the census entity type
census_entity_type = census_feature_store.create_entity_type(
    entity_type_id=ENTITY_TYPE_ID,
    description="Census entity type",
)

Creating EntityType
Create EntityType backing LRO: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3/operations/634237683161366528
EntityType created. Resource name: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3
To use this EntityType in another session:
entity_type = aiplatform.EntityType('projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3')


##### Verify that the EntityType is created
Check if the Entity Type was successfully created by running the following code block.

In [81]:
entity_type = census_feature_store.get_entity_type(entity_type_id=ENTITY_TYPE_ID)

print(entity_type.gca_resource)

name: "projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3"
description: "Census entity type"
create_time {
  seconds: 1680781231
  nanos: 214476000
}
update_time {
  seconds: 1680781231
  nanos: 214476000
}
etag: "AMEw9yP2cpSW1Wtd-WRikLHK3tzThgB5uG_g8rnBmSRCKnPEmdR69J1BgcN9MFUegJg="
monitoring_config {
}



In [82]:
# Map pandas dtypes to Vertex AI Feature Store value types.
DTYPE_TO_VALUE_TYPE = {
    "object": "STRING",
    "float64": "DOUBLE",
    "int64": "INT64",
    "bool": "BOOL",
}

# Build {feature_id: {"value_type": ...}} for batch_create_features.
# Columns whose dtype is not in the mapping are skipped.
column_json = {}
for col, dtype in census_df.dtypes.items():
    value_type = DTYPE_TO_VALUE_TYPE.get(str(dtype))
    if value_type is not None:
        column_json[col] = {"value_type": value_type}

In [83]:
column_json

{'age': {'value_type': 'STRING'},
 'workclass': {'value_type': 'STRING'},
 'fnlwgt': {'value_type': 'DOUBLE'},
 'education': {'value_type': 'STRING'},
 'education_num': {'value_type': 'DOUBLE'},
 'marital_status': {'value_type': 'STRING'},
 'occupation': {'value_type': 'STRING'},
 'relationship': {'value_type': 'STRING'},
 'race': {'value_type': 'STRING'},
 'sex': {'value_type': 'STRING'},
 'capital_gain': {'value_type': 'DOUBLE'},
 'capital_loss': {'value_type': 'DOUBLE'},
 'hours_per_week': {'value_type': 'DOUBLE'},
 'native_country': {'value_type': 'STRING'},
 'income': {'value_type': 'STRING'}}

In [84]:
census_feature_configs = column_json

### Create Features
A feature is a measurable property or attribute of an entity type. You create features within an entity type.

When you create a feature, you specify its value type, such as `DOUBLE` or `STRING`. The value type determines which values you can ingest for that feature.

Learn more about [Feature Value Types](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.featurestores.entityTypes.features)
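The `census_df.dtypes` output earlier lists `age` as `object` because the stray `|1x3 Cross validator` row was parsed into that column, so the dtype-based feature configuration registers `age` as a `STRING` feature even after that row is dropped. If you would rather ingest `age` as a number, one option (an assumption about your needs, not a step this notebook performs) is to convert the column with `pd.to_numeric` after cleaning and before building the feature configuration. The sketch below shows the conversion on a hypothetical stand-in frame.

```python
import pandas as pd

# Hypothetical stand-in for census_df after dropna(): the stray row is gone,
# but "age" still holds strings rather than numbers.
df = pd.DataFrame(
    {"age": ["25", "38", "28"], "fnlwgt": [226802.0, 89814.0, 336951.0]}
)
print(df["age"].dtype)

# Parse the strings as numbers; raises ValueError if any value is not numeric.
df["age"] = pd.to_numeric(df["age"])
print(df["age"].dtype)
```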

You can create features either one at a time with `create_feature` or all at once with `batch_create_features`. Because all the feature configurations are already collected in one variable, this notebook uses `batch_create_features`.

In [85]:
census_features = census_entity_type.batch_create_features(
    feature_configs=census_feature_configs,
)

Batch creating features EntityType entityType: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3
Batch create Features EntityType entityType backing LRO: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3/operations/8629815871604260864
EntityType entityType Batch created features. Resource name: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3


In [86]:
census_entity_type.list_features()

[<google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f97fd124050> 
 resource name: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3/features/age,
 <google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f97fd124610> 
 resource name: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3/features/race,
 <google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f97fd1241d0> 
 resource name: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3/features/education_num,
 <google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f97fd124190> 
 resource name: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3/features/hours_per_week,
 <google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f97fd135290> 
 resource name: proj

### Write features to the Feature Store
Use the `write_feature_values` API to write feature values to the Feature Store. It takes the following parameter:

* `instances`: The feature values to write, as a list of `WriteFeatureValuesPayload` objects, a Python `dict`, or a pandas `DataFrame`.

This streaming ingestion feature has been introduced to the Vertex AI SDK under the **preview** namespace. Here, you pass slices of the census `DataFrame` you prepared earlier as the `instances` parameter.

Learn more about [Streaming ingestion API](https://github.com/googleapis/python-aiplatform/blob/e6933503d2d3a0f8a8f7ef8c178ed50a69ac2268/google/cloud/aiplatform/preview/featurestore/entity_type.py#L36)
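The next cell splits `census_df` into slices of at most 100 rows using explicit `divmod` arithmetic and calls `write_feature_values` once per slice. A more compact way to produce the same slices is a generator that steps through row positions with `range(0, len(df), batch_size)`. The helper name `iter_batches` is hypothetical, and the batch size of 100 simply mirrors the notebook's `max_instances` rather than a documented API limit.

```python
from typing import Iterator

import pandas as pd


def iter_batches(df: pd.DataFrame, batch_size: int) -> Iterator[pd.DataFrame]:
    """Yield consecutive row slices of df, each with at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(df), batch_size):
        # iloc clips the end position, so the last slice may be shorter.
        yield df.iloc[start:start + batch_size]


# Example: 250 rows split into batches of 100.
demo_df = pd.DataFrame({"x": range(250)})
sizes = [len(batch) for batch in iter_batches(demo_df, 100)]
print(sizes)
```

With this helper, the write loop would become `for batch in iter_batches(census_df, 100): census_entity_type.preview.write_feature_values(instances=batch)`, and no intermediate list of batches is held in memory.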

In [87]:

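# Write census_df in batches of at most max_instances rows,
# making one write_feature_values call per batch.
max_instances = 100

# Number of batches, rounded up so leftover rows get their own batch.
num_instances = len(census_df)
num_batches, remainder = divmod(num_instances, max_instances)
if remainder:
    num_batches += 1

# Slice the DataFrame into consecutive row ranges.
instance_batches = []
for i in range(num_batches):
    start_index = i * max_instances
    end_index = min((i + 1) * max_instances, num_instances)
    instance_batches.append(census_df.iloc[start_index:end_index])

# Stream each batch into the entity type.
for batch in instance_batches:
    census_entity_type.preview.write_feature_values(instances=batch)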


Writing EntityType feature values: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3
EntityType feature values written. Resource name: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3
Writing EntityType feature values: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3
EntityType feature values written. Resource name: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3
Writing EntityType feature values: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3
EntityType feature values written. Resource name: projects/354621994428/locations/us-central1/featurestores/census_income_3/entityTypes/census_entity_type_3
Writing EntityType feature values: projects/354621994428/locations/us-central1/featurestores/census_inc

In [88]:
# Alternative (not run): write the whole DataFrame in a single call instead of in batches.
# census_entity_type.preview.write_feature_values(instances=census_df)

## Read back written features

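Wait a few seconds for the writes to propagate, then perform an online read to confirm that they succeeded. Entity `0` is expected to return empty values: row 0 held the stray `|1x3 Cross validator` line and was removed during cleaning, so nothing was written for that ID.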

In [89]:
ENTITY_IDS = [str(x) for x in range(100)]
census_entity_type.read(entity_ids=ENTITY_IDS)

| | entity_id | education_num | sex | education | fnlwgt | capital_loss | income | occupation | age | workclass | capital_gain | hours_per_week | native_country | race | relationship | marital_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | | | | | | | | | | | | | | | |
| 1 | 1 | 7.0 | Male | 11th | 226802.0 | 0.0 | <=50K. | Machine-op-inspct | 25 | Private | 0.0 | 40.0 | United-States | Black | Own-child | Never-married |
| 2 | 10 | 4.0 | Male | 7th-8th | 104996.0 | 0.0 | <=50K. | Craft-repair | 55 | Private | 0.0 | 10.0 | United-States | White | Husband | Married-civ-spouse |
| 3 | 11 | 9.0 | Male | HS-grad | 184454.0 | 0.0 | >50K. | Machine-op-inspct | 65 | Private | 6418.0 | 40.0 | United-States | White | Husband | Married-civ-spouse |
| 4 | 12 | 13.0 | Male | Bachelors | 212465.0 | 0.0 | <=50K. | Adm-clerical | 36 | Federal-gov | 0.0 | 40.0 | United-States | White | Husband | Married-civ-spouse |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 95 | 14.0 | Male | Masters | 198751.0 | 0.0 | <=50K. | Other-service | 34 | Private | 0.0 | 40.0 | United-States | Amer-Indian-Eskimo | Not-in-family | Never-married |
| 96 | 96 | 9.0 | Male | HS-grad | 479296.0 | 0.0 | <=50K. | Handlers-cleaners | 20 | Private | 0.0 | 40.0 | United-States | White | Own-child | Never-married |
| 97 | 97 | 13.0 | Female | Bachelors | 235218.0 | 0.0 | <=50K. | Exec-managerial | 25 | Private | 0.0 | 40.0 | United-States | White | Own-child | Never-married |
| 98 | 98 | 6.0 | Male | 10th | 164877.0 | 0.0 | <=50K. | Farming-fishing | 49 | Private | 0.0 | 40.0 | United-States | White | Husband | Married-civ-spouse |
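To list every entity ID that came back with all features empty (such as `0` above), you can filter the returned `DataFrame` for rows whose feature columns are all missing. The sketch below does this on a hypothetical stand-in for the result of `census_entity_type.read`, assuming it has an `entity_id` column plus one column per feature, as in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by census_entity_type.read().
read_df = pd.DataFrame(
    {
        "entity_id": ["0", "1", "10"],
        "age": [np.nan, "25", "55"],
        "sex": [np.nan, "Male", "Male"],
        "education_num": [np.nan, 7.0, 4.0],
    }
)

# Every column except entity_id is a feature.
feature_cols = [col for col in read_df.columns if col != "entity_id"]

# Keep the IDs whose feature values are all missing.
all_missing = read_df[feature_cols].isna().all(axis=1)
unwritten_ids = read_df.loc[all_missing, "entity_id"].tolist()
print(unwritten_ids)
```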


## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# force=True also deletes the entity types and features inside the Feature Store.
census_feature_store.delete(force=True)