## Overview

This notebook introduces Vertex AI Feature Store, a managed cloud service that lets machine learning engineers and data scientists store, serve, manage, and share machine learning features at scale.

This notebook assumes that you understand basic Google Cloud concepts such as [Project](https://cloud.google.com/storage/docs/projects), [Storage](https://cloud.google.com/storage) and [Vertex AI](https://cloud.google.com/vertex-ai/docs). Some machine learning knowledge is also helpful but not required.

Learn more about [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore).

### Objective

In this tutorial, you learn how to ingest features from a `Pandas DataFrame` into your Vertex AI Feature Store using the `write_feature_values` method from the Vertex AI SDK.

This tutorial uses the following Google Cloud ML services and resources:

- Vertex AI Feature Store


The steps performed include:

- Create a `Feature Store`
- Create a new `Entity Type` in your `Feature Store`
- Create `Features` for the `Entity Type`
- Ingest feature values from a `Pandas DataFrame` into the `Entity Type`

## Installation

Install the following packages required to execute this notebook.

In [None]:
# Install the packages (uncomment the next line to run; pyarrow is required by pd.read_feather)
# ! pip3 install --upgrade google-cloud-aiplatform google-cloud-bigquery numpy pandas pyarrow -q

In [1]:
# Replace with your own project ID and region
PROJECT_ID = "ds-training-380514"
REGION = "us-central1"

### Import libraries

In [2]:
import numpy as np
import pandas as pd
from google.cloud import aiplatform, bigquery

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project.

In [3]:
aiplatform.init(project=PROJECT_ID, location=REGION)

## Load and prepare the data

In [105]:
beatles_df = pd.read_feather('test_data/inference_sample.feather')

In [106]:
beatles_df

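|  | user_name | 30_Seconds_to_Mars | 65daysofstatic | A_Perfect_Circle | A_Tribe_Called_Quest | ABBA | ACDC | Adele | Aerosmith | Air | ... | tag_shoegazer | tag_hair_metal | tag_rapcore | tag_underground_hip_hop | tag_symphonic_black_metal | tag_darkwave | tag_world | tag_latin | tag_spanish | Like_The_Beatles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | thegiant | 1.0 |  |  |  |  |  | 11.0 | 1.0 |  | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | True |
| 1 | nezter |  |  |  |  |  |  |  |  | 3.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | False |
| 2 | augustohp |  | 52.0 | 502.0 |  | 1.0 | 452.0 | 1.0 | 215.0 | 14.0 | ... | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | True |
| 3 | stalphonzo |  |  |  |  |  | 6.0 |  |  |  | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | True |
| 4 | davenall |  |  |  |  |  |  |  |  |  | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 5 | Andy_Greenwell |  |  |  |  |  |  |  |  |  | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | True |
| 6 | lilyean |  |  |  |  |  |  |  |  |  | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 7 | absentbebnim |  |  |  |  |  |  |  |  |  | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 8 | adherr |  |  |  |  |  |  |  |  |  | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 9 | auserzz |  |  |  |  |  |  | 25.0 |  |  | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | False |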


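### Rename columns to valid feature IDs

Feature IDs must begin with a lowercase letter or underscore and can contain only lowercase letters, numbers, and underscores. Rename the DataFrame columns to meet these requirements: lowercase every name, and prefix names that start with a digit with an underscore.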

In [107]:
# Lowercase all column names, then prefix names that start with a digit with "_"
beatles_df.columns = [
    "_" + col if col[0].isdigit() else col
    for col in beatles_df.columns.str.lower()
]
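
The renaming cell only lowercases names and prefixes a leading digit, which is enough for these column names. A slightly more defensive sketch follows. It assumes the rule stated above is the complete rule (it does not check any length limit), and `to_feature_id` and `invalid_feature_ids` are hypothetical helper names, not part of the Vertex AI SDK.

```python
import re

# Assumed rule: start with a lowercase letter or underscore, then only
# lowercase letters, digits, and underscores. Length limits are not checked.
VALID_FEATURE_ID = re.compile(r"[a-z_][a-z0-9_]*")


def to_feature_id(name: str) -> str:
    """Convert a column name into a string that fully matches VALID_FEATURE_ID."""
    # Lowercase, then replace every disallowed character with "_"
    candidate = re.sub(r"[^a-z0-9_]", "_", name.lower())
    # IDs may not start with a digit (or be empty), so prefix with "_"
    if not candidate or candidate[0].isdigit():
        candidate = "_" + candidate
    return candidate


def invalid_feature_ids(names):
    """Return the names that do not fully match VALID_FEATURE_ID."""
    return [name for name in names if not VALID_FEATURE_ID.fullmatch(name)]


print(to_feature_id("30_Seconds_to_Mars"))  # _30_seconds_to_mars
print(to_feature_id("AC/DC"))               # ac_dc
print(invalid_feature_ids(["abba", "ACDC", "_65daysofstatic", "9x"]))  # ['ACDC', '9x']
```

One caveat of any such mapping: two different columns (for example `AC/DC` and `AC_DC`) can collapse to the same ID, so it is worth checking the result for duplicates before creating features.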

### Prepare the data

Feature values to be written to the Feature Store can take the form of a list of `WriteFeatureValuesPayload` objects, a Python `dict` of the form

`{entity_id: {feature_id: feature_value}, ...}`,

or a pandas `DataFrame` whose index holds the unique entity ID strings and whose columns each represent a feature. Because this notebook ingests a pandas `DataFrame`, convert the index to strings so that it can be used as the entity ID.
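To see how the `DataFrame` and `dict` forms relate, the sketch below converts a small DataFrame into the `{entity_id: {feature_id: feature_value}}` shape with `DataFrame.to_dict(orient="index")`. The two-row DataFrame is made up for illustration and only mimics the layout of `beatles_df` (string index as entity ID, one column per feature).

```python
import pandas as pd

# Toy stand-in for beatles_df: the string index holds entity IDs and each
# column is a feature (values are illustrative).
df = pd.DataFrame(
    {"user_name": ["thegiant", "nezter"], "adele": [11.0, 0.0]},
    index=["0", "1"],
)

# orient="index" yields {index_label: {column_name: value}}, which is the
# {entity_id: {feature_id: feature_value}} dict form.
payload = df.to_dict(orient="index")
print(payload)
```

Passing the DataFrame directly, as this notebook does, skips this conversion step.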

In [108]:
# Prepare the data
beatles_df.index = beatles_df.index.map(str)

In [109]:
# Replace the null markers listed in NA_VALUES with 0
NA_VALUES = ["NA", ".", None]
beatles_df = beatles_df.replace(to_replace=NA_VALUES, value=0)

In [110]:
beatles_df.shape

(10, 514)

## Create Feature Store and define schemas

Vertex AI Feature Store organizes resources hierarchically in the following order:

`Featurestore -> EntityType -> Feature`

You must create these resources before you can ingest data into Vertex AI Feature Store.
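Each level of this hierarchy also appears in the resource names that the SDK logs later in this notebook, for example `projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4`. The hypothetical helper below assembles such names from their parts; the format is copied from those logs, and `resource_name` is an illustrative name, not an SDK function.

```python
from typing import Optional


def resource_name(
    project: str,
    location: str,
    featurestore_id: str,
    entity_type_id: Optional[str] = None,
    feature_id: Optional[str] = None,
) -> str:
    """Build a Featurestore -> EntityType -> Feature resource name."""
    # A feature only exists inside an entity type
    if feature_id is not None and entity_type_id is None:
        raise ValueError("a feature_id requires an entity_type_id")
    name = f"projects/{project}/locations/{location}/featurestores/{featurestore_id}"
    if entity_type_id is not None:
        name += f"/entityTypes/{entity_type_id}"
    if feature_id is not None:
        name += f"/features/{feature_id}"
    return name


print(resource_name("354621994428", "us-central1", "beatles_1",
                    "beatles_entity_type_4", "tag_nintendo"))
```

A full name like this is what the logs pass to `aiplatform.Featurestore(...)` and `aiplatform.EntityType(...)` to reuse a resource in another session.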

Learn more about [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore)

### Create a Feature Store

You create a Feature Store using `aiplatform.Featurestore.create` with the following parameters:

* `featurestore_id (str)`: The ID to use for this Featurestore, which becomes the final component of the Featurestore's resource name. The value must be unique within the project and location.
* `online_store_fixed_node_count`: The fixed number of nodes to provision for online serving.
* `project`: Project to create the Featurestore in. If not set, the project set in `aiplatform.init` is used.
* `location`: Location to create the Featurestore in. If not set, the location set in `aiplatform.init` is used.
* `sync`: Whether to execute this creation synchronously.

In [13]:
FEATURESTORE_ID = "beatles_1"

beatles_feature_store = aiplatform.Featurestore.create(
    featurestore_id=FEATURESTORE_ID,
    online_store_fixed_node_count=1,
    project=PROJECT_ID,
    location=REGION,
    sync=True,
)

Creating Featurestore
Create Featurestore backing LRO: projects/354621994428/locations/us-central1/featurestores/beatles_1/operations/4823429761546059776
Featurestore created. Resource name: projects/354621994428/locations/us-central1/featurestores/beatles_1
To use this Featurestore in another session:
featurestore = aiplatform.Featurestore('projects/354621994428/locations/us-central1/featurestores/beatles_1')
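Re-running the creation cell with the same `FEATURESTORE_ID` is expected to fail, because the ID must be unique within the project and location. One way to make the step re-runnable is to look the resource up first and create it only if the lookup fails. The sketch below is generic; it assumes that looking up a missing Featurestore raises `google.api_core.exceptions.NotFound`, which is worth confirming for your SDK version, and `get_or_create` is a hypothetical helper.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def get_or_create(
    get_fn: Callable[[], T],
    create_fn: Callable[[], T],
    not_found: Tuple[Type[BaseException], ...],
) -> T:
    """Return the existing resource, or create it if the lookup raises `not_found`."""
    try:
        return get_fn()
    except not_found:
        return create_fn()


# Possible usage in this notebook (not executed here):
# from google.api_core.exceptions import NotFound
# beatles_feature_store = get_or_create(
#     lambda: aiplatform.Featurestore(featurestore_name=FEATURESTORE_ID),
#     lambda: aiplatform.Featurestore.create(
#         featurestore_id=FEATURESTORE_ID,
#         online_store_fixed_node_count=1,
#         sync=True,
#     ),
#     (NotFound,),
# )
```

Only the listed exception types trigger creation, so other errors (for example permission errors) still surface instead of being hidden by a create attempt.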


##### Verify that the Feature Store is created
Check if the Feature Store was successfully created by running the following code block.

In [129]:
fs = aiplatform.Featurestore(
    featurestore_name=FEATURESTORE_ID,
    project=PROJECT_ID,
    location=REGION,
)
print(fs.gca_resource)

name: "projects/354621994428/locations/us-central1/featurestores/beatles_1"
create_time {
  seconds: 1680761518
  nanos: 486890000
}
update_time {
  seconds: 1680761518
  nanos: 595467000
}
etag: "AMEw9yP2tFokDVHZdEvY5GkzbtoNhdjZTsUb0iCtdTWRnpje9LT7up1ShBQvhyBn3rQz"
online_serving_config {
  fixed_node_count: 1
}
state: STABLE



### Create an EntityType

An entity type is a collection of semantically related features. You define your own entity types, based on the concepts that are relevant to your use case. For example, a movie service might have the entity types `movie` and `user`, which group related features that correspond to movies or users.

Here, you create an entity type with the ID stored in `ENTITY_TYPE_ID` (`beatles_entity_type_4`) using `create_entity_type` with the following parameters:
* `entity_type_id (str)`: The ID to use for the EntityType, which becomes the final component of the EntityType's resource name. The value must be unique within a Feature Store.
* `description`: Description of the EntityType.

In [131]:
ENTITY_TYPE_ID = "beatles_entity_type_4"

# Create beatles entity type
beatles_entity_type = beatles_feature_store.create_entity_type(
    entity_type_id=ENTITY_TYPE_ID,
    description="Beatles entity type",
)

Creating EntityType
Create EntityType backing LRO: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4/operations/5061909435564163072
EntityType created. Resource name: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4
To use this EntityType in another session:
entity_type = aiplatform.EntityType('projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4')


##### Verify that the EntityType is created
Check if the Entity Type was successfully created by running the following code block.

In [132]:
entity_type = beatles_feature_store.get_entity_type(entity_type_id=ENTITY_TYPE_ID)

print(entity_type.gca_resource)

name: "projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4"
description: "Beatles entity type"
create_time {
  seconds: 1680767531
  nanos: 604546000
}
update_time {
  seconds: 1680767531
  nanos: 604546000
}
etag: "AMEw9yN9el4pEIwtMECduAe2j6rwLJU8ro2xqcih4Lg8KWcOO_BLLSZDU8yuzpo1pboU"
monitoring_config {
}



### Create Features
A feature is a measurable property or attribute of an entity type. Features can be created within each entity type.

When you create a feature, you specify its value type, such as `DOUBLE` or `STRING`. The value type determines which kinds of values you can ingest for that feature.

Learn more about [Feature Value Types](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.featurestores.entityTypes.features)

In [133]:
# Map pandas dtype names to Vertex AI Feature Store value types
DTYPE_TO_VALUE_TYPE = {
    "object": "STRING",
    "float64": "DOUBLE",
    "int64": "INT64",
    "bool": "BOOL",
}

# Build {feature_id: {"value_type": ...}}; columns with other dtypes are skipped
column_json = {}
for col in beatles_df.columns:
    value_type = DTYPE_TO_VALUE_TYPE.get(str(beatles_df[col].dtype))
    if value_type is not None:
        column_json[col] = {"value_type": value_type}

In [135]:
len(column_json)

514
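`len(column_json)` equals the 514 columns reported by `beatles_df.shape`, so every column was mapped here. The dtype mapping, however, silently skips any column whose dtype is not exactly `object`, `float64`, `int64`, or `bool` (for example `int32`, `float32`, or datetimes). A stricter sketch uses pandas' dtype predicates and raises on anything it cannot map. Mapping any integer width to `INT64` and any float width to `DOUBLE` is an assumption of this sketch, and `infer_value_type` and `build_feature_configs` are hypothetical names.

```python
import pandas as pd
from pandas.api import types as ptypes


def infer_value_type(series: pd.Series) -> str:
    """Map a pandas Series to a Feature Store value type name (sketch)."""
    # pandas does not treat bool as an integer dtype, but checking it first
    # keeps the intent explicit
    if ptypes.is_bool_dtype(series):
        return "BOOL"
    if ptypes.is_integer_dtype(series):
        return "INT64"
    if ptypes.is_float_dtype(series):
        return "DOUBLE"
    if ptypes.is_object_dtype(series) or ptypes.is_string_dtype(series):
        return "STRING"
    raise TypeError(f"No value type mapping for column {series.name!r} ({series.dtype})")


def build_feature_configs(df: pd.DataFrame) -> dict:
    """Build the {feature_id: {"value_type": ...}} dict used by batch_create_features."""
    return {col: {"value_type": infer_value_type(df[col])} for col in df.columns}
```

Usage in this notebook would be `beatles_feature_configs = build_feature_configs(beatles_df)`; an unsupported column then fails loudly instead of silently missing from the Feature Store.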

In [136]:
column_json

{'user_name': {'value_type': 'STRING'},
 '_30_seconds_to_mars': {'value_type': 'DOUBLE'},
 '_65daysofstatic': {'value_type': 'DOUBLE'},
 'a_perfect_circle': {'value_type': 'DOUBLE'},
 'a_tribe_called_quest': {'value_type': 'INT64'},
 'abba': {'value_type': 'DOUBLE'},
 'acdc': {'value_type': 'DOUBLE'},
 'adele': {'value_type': 'DOUBLE'},
 'aerosmith': {'value_type': 'DOUBLE'},
 'air': {'value_type': 'DOUBLE'},
 'alanis_morissette': {'value_type': 'DOUBLE'},
 'alice_in_chains': {'value_type': 'DOUBLE'},
 'amon_amarth': {'value_type': 'DOUBLE'},
 'amon_tobin': {'value_type': 'DOUBLE'},
 'amorphis': {'value_type': 'DOUBLE'},
 'anal_cunt': {'value_type': 'INT64'},
 'anathema': {'value_type': 'DOUBLE'},
 'animal_collective': {'value_type': 'DOUBLE'},
 'aphex_twin': {'value_type': 'DOUBLE'},
 'apocalyptica': {'value_type': 'DOUBLE'},
 'arcade_fire': {'value_type': 'DOUBLE'},
 'arctic_monkeys': {'value_type': 'DOUBLE'},
 'audioslave': {'value_type': 'DOUBLE'},
 'autechre': {'value_type': 'DOUB

In [137]:
beatles_feature_configs = column_json

You can create features with either `create_feature` or `batch_create_features`. Because all the feature configs are already collected in one variable, this notebook uses `batch_create_features`.

In [138]:
beatles_features = beatles_entity_type.batch_create_features(
    feature_configs=beatles_feature_configs,
)

Batch creating features EntityType entityType: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4
Batch create Features EntityType entityType backing LRO: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4/operations/8176359684123394048
EntityType entityType Batch created features. Resource name: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4
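For comparison, a sketch of the one-feature-at-a-time route with `create_feature` follows. It assumes `create_feature` accepts `feature_id` and `value_type` keyword arguments (check the SDK reference for your version), and `create_features_individually` is a hypothetical helper. Because each feature becomes its own request, this route is likely much slower than a single `batch_create_features` call when there are hundreds of features.

```python
def create_features_individually(entity_type, feature_configs):
    """Create each feature with its own EntityType.create_feature call.

    feature_configs uses the same shape as column_json:
    {feature_id: {"value_type": "DOUBLE"}, ...}
    """
    created = []
    for feature_id, config in feature_configs.items():
        feature = entity_type.create_feature(
            feature_id=feature_id,
            value_type=config["value_type"],
        )
        created.append(feature)
    return created


# Possible usage (not executed here), on an entity type that does not
# have these features yet:
# subset = {k: column_json[k] for k in ["abba", "adele"]}
# create_features_individually(some_new_entity_type, subset)
```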


In [142]:
beatles_entity_type.list_features()

[<google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f8dac520a50> 
 resource name: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4/features/belle_and_sebastian,
 <google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f8dac2bbe50> 
 resource name: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4/features/tag_nintendo,
 <google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f8dac2bbed0> 
 resource name: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4/features/ratatat,
 <google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f8dac5413d0> 
 resource name: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4/features/tag_alt_country,
 <google.cloud.aiplatform.featurestore.feature.Feature object at 0x7f8dac549b10> 
 resource name: proje

### Write features to the Feature Store
Use the `write_feature_values` API to write feature values to the Feature Store. It takes the following parameter:

* `instances`: Feature values to be written to the Feature Store, as a list of `WriteFeatureValuesPayload` objects, a Python `dict`, or a pandas `DataFrame`.

This streaming ingestion feature has been introduced to the Vertex AI SDK under the **preview** namespace. Here, you pass the `beatles_df` DataFrame you prepared earlier as the `instances` parameter.

Learn more about [Streaming ingestion API](https://github.com/googleapis/python-aiplatform/blob/e6933503d2d3a0f8a8f7ef8c178ed50a69ac2268/google/cloud/aiplatform/preview/featurestore/entity_type.py#L36)

The write call below can take a few minutes to complete.

In [144]:
beatles_entity_type.preview.write_feature_values(instances=beatles_df)

Writing EntityType feature values: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4
EntityType feature values written. Resource name: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4


<google.cloud.aiplatform.preview.featurestore.entity_type.EntityType object at 0x7f8dac53b9d0> 
resource name: projects/354621994428/locations/us-central1/featurestores/beatles_1/entityTypes/beatles_entity_type_4
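Because `instances` also accepts the dict form, you can update a single entity without building a DataFrame. The sketch below assumes the same `preview.write_feature_values` call used above. Dropping `NaN` values before writing is a choice made for this sketch, not something the SDK is known to require, and `write_entity_values` is a hypothetical helper.

```python
import math


def write_entity_values(entity_type, entity_id, feature_values):
    """Write one entity's features using the {entity_id: {feature_id: value}} form."""
    # Skip float NaN values so only observed values are sent (sketch choice);
    # numpy.float64 is a float subclass, so it is covered too
    cleaned = {
        feature_id: value
        for feature_id, value in feature_values.items()
        if not (isinstance(value, float) and math.isnan(value))
    }
    # Entity IDs are strings, matching the index conversion earlier in the notebook
    return entity_type.preview.write_feature_values(instances={str(entity_id): cleaned})


# Possible usage (not executed here):
# write_entity_values(beatles_entity_type, 0, {"adele": 12.0, "abba": float("nan")})
```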

## Read back written features

Wait a few seconds for the write to propagate, then do an online read to confirm the write was successful.

In [147]:
ENTITY_IDS = [str(x) for x in range(10)]
beatles_entity_type.read(entity_ids=ENTITY_IDS)

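|  | entity_id | tag_house | tag_geek_rock | tag_trip_hop | rem | flying_lotus | evanescence | tag_pop | tag_metalcore | korn | ... | _30_seconds_to_mars | howard_shore | tag_american | ludwig_van_beethoven | tag_british_invasion | tag_sludge | tag_drum_and_bass | tag_gothic | tag_east_coast_rap | king_crimson |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.0 | 1.0 | 3.0 |  |  | 1.0 | 9.0 | 0.0 |  | ... | 1.0 |  | 0.0 |  | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |  |
| 1 | 1 | 2.0 | 0.0 | 5.0 |  | 2.0 |  | 5.0 | 0.0 |  | ... |  |  | 1.0 |  | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |  |
| 2 | 2 | 2.0 | 0.0 | 5.0 | 1.0 | 3.0 | 1.0 | 20.0 | 1.0 | 3.0 | ... |  | 5.0 | 1.0 | 28.0 | 0.0 | 1.0 | 1.0 | 2.0 | 1.0 |  |
| 3 | 3 | 0.0 | 1.0 | 0.0 | 11.0 |  |  | 3.0 | 0.0 | 1.0 | ... |  |  | 0.0 |  | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| 4 | 4 | 0.0 | 0.0 | 0.0 |  |  |  | 0.0 | 0.0 |  | ... |  |  | 0.0 |  | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 5 | 5 | 0.0 | 0.0 | 0.0 |  |  |  | 1.0 | 0.0 |  | ... |  |  | 0.0 |  | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |  |
| 6 | 6 | 1.0 | 0.0 | 0.0 |  |  |  | 0.0 | 0.0 |  | ... |  |  | 0.0 |  | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 7 | 7 | 0.0 | 0.0 | 0.0 |  |  |  | 1.0 | 0.0 |  | ... |  |  | 0.0 |  | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 8 | 8 | 0.0 | 0.0 | 0.0 |  |  |  | 1.0 | 0.0 |  | ... |  |  | 0.0 |  | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 9 | 9 | 0.0 | 0.0 | 1.0 |  |  |  | 13.0 | 0.0 |  | ... |  |  | 0.0 |  | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |  |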
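
Inspecting the table by eye works for 10 rows; for larger writes, a programmatic comparison is more reliable. The sketch below assumes `EntityType.read` accepts a `feature_ids` list and returns a DataFrame with an `entity_id` column, as in the output above. `find_mismatched_entities` is a hypothetical helper, and it treats `NaN` on both sides as a match.

```python
import pandas as pd


def find_mismatched_entities(entity_type, source_df: pd.DataFrame, feature_ids):
    """Read features back and return entity IDs whose values differ from source_df."""
    read_df = entity_type.read(entity_ids=list(source_df.index), feature_ids=feature_ids)
    # Align the read result with the source on entity ID and feature columns
    actual = read_df.set_index("entity_id")[feature_ids]
    expected = source_df.loc[actual.index, feature_ids]
    # A cell differs if the values are unequal and not both missing
    differs = (actual != expected) & ~(actual.isna() & expected.isna())
    return sorted(actual.index[differs.any(axis=1).to_numpy()])


# Possible usage (not executed here):
# find_mismatched_entities(beatles_entity_type, beatles_df, ["abba", "adele"])
```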


## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# force=True also deletes the entity types and features inside the Feature Store
beatles_feature_store.delete(force=True)