# Vertex AI Feature Store

In this exercise, we create a feature store for the iris dataset and import data into it. The steps are:
1. Create a new feature store
2. Create a new entity in the feature store and some features for that entity
3. Upload data to the feature store

In [None]:
! pip3 install --upgrade google-cloud-aiplatform --user -q

### Restart the kernel

After you install the SDK, you need to restart the notebook kernel so that it can find the new packages. You can restart the kernel from *Kernel -> Restart Kernel*, or by running the following:

In [None]:
# Automatically restart the kernel after installs (skipped when IS_TESTING is set)
import os

if not os.getenv("IS_TESTING"):
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

#### Setting the project ID and region

In [None]:
shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
PROJECT_ID = shell_output[0]
print("Project ID:", PROJECT_ID)

REGION = "us-central1"

In [None]:
! gcloud config set project $PROJECT_ID

#### UUID

Some resources, such as a Cloud Storage bucket or the featurestore created below, need a unique name. An easy way to achieve that is to append a short random suffix, referred to here as a UUID.

In [None]:
import random
import string


# Generate a UUID of a specified length (default=8)
def generate_uuid(length: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


UUID = generate_uuid()
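
The helper above returns a random alphanumeric suffix rather than a standards-based UUID. If a real UUID is preferred, a minimal sketch using the standard library `uuid` module could look like the following. The name `generate_uuid_suffix` is hypothetical and not part of the original notebook.

```python
import uuid


def generate_uuid_suffix(length: int = 8) -> str:
    """Return the first `length` hex characters of a random (version 4) UUID."""
    # A UUID rendered as hex has exactly 32 characters.
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32")
    return uuid.uuid4().hex[:length]


suffix = generate_uuid_suffix()
print(suffix)
```

Truncating a UUID reduces its uniqueness guarantees, so a short suffix like this is only suitable for avoiding casual name collisions.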

### Import libraries and define constants

In [None]:
from google.cloud.aiplatform import Feature, Featurestore

FEATURESTORE_ID = "iris" + UUID
INPUT_CSV_FILE = "bq://bigquery-public-data.ml_datasets.iris"
ONLINE_STORE_FIXED_NODE_COUNT = 1
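
Because the featurestore ID is built by string concatenation, it can be worth checking it before calling the API. The sketch below assumes the naming rule given in the Featurestore documentation (up to 60 characters, only `[a-z0-9_]`, first character not a digit); the helper name `is_valid_featurestore_id` is hypothetical.

```python
import re

# Assumed rule: at most 60 characters, only lowercase letters, digits and
# underscores, and the first character must not be a digit.
_FEATURESTORE_ID_PATTERN = re.compile(r"[a-z_][a-z0-9_]{0,59}")


def is_valid_featurestore_id(featurestore_id: str) -> bool:
    """Return True if the ID matches the assumed featurestore naming rule."""
    return _FEATURESTORE_ID_PATTERN.fullmatch(featurestore_id) is not None


print(is_valid_featurestore_id("iris1a2b3c4d"))
print(is_valid_featurestore_id("iris-1a2b3c4d"))
```

In the notebook, the check could be applied as `assert is_valid_featurestore_id(FEATURESTORE_ID)` before creating the featurestore.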

### Create featurestore

In [None]:
fs = Featurestore.create(
    featurestore_id=FEATURESTORE_ID,
    online_store_fixed_node_count=ONLINE_STORE_FIXED_NODE_COUNT,
    project=PROJECT_ID,
    location=REGION,
    sync=True,
)

Use the following function call to retrieve a featurestore and check that it has been created.


In [None]:
fs = Featurestore(
    featurestore_name=FEATURESTORE_ID,
    project=PROJECT_ID,
    location=REGION,
)
print(fs.gca_resource)

### Create entity type

In [None]:
# Create the `species` entity type
species_entity_type = fs.create_entity_type(
    entity_type_id="species",
    description="Species entity",
)

To retrieve an entity type or check that it has been created, use the [get_entity_type](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/featurestore/featurestore.py#L106) or [list_entity_types](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/featurestore/featurestore.py#L278) methods of the `Featurestore` object.


In [None]:
species_entity_type = fs.get_entity_type(entity_type_id="species")

print(species_entity_type)

In [None]:
fs.list_entity_types()

### Create feature


In [None]:
# To create one feature at a time, use:
species_feature_sepal_length = species_entity_type.create_feature(
    feature_id="sepal_length",
    value_type="DOUBLE",
    description="Sepal Length",
)

species_feature_sepal_width = species_entity_type.create_feature(
    feature_id="sepal_width",
    value_type="DOUBLE",
    description="Sepal Width",
)
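
Features can also be created in one call instead of one at a time. The sketch below only builds the configuration dictionary, assuming the `{feature_id: {"value_type": ..., "description": ...}}` shape accepted by `EntityType.batch_create_features`. The helper `build_feature_configs` is hypothetical, and the petal columns are used purely for illustration.

```python
from typing import Dict, Iterable


def build_feature_configs(
    feature_ids: Iterable[str], value_type: str = "DOUBLE"
) -> Dict[str, Dict[str, str]]:
    """Build one config entry per feature ID, deriving a readable description."""
    return {
        feature_id: {
            "value_type": value_type,
            # "petal_length" becomes "Petal Length"
            "description": feature_id.replace("_", " ").title(),
        }
        for feature_id in feature_ids
    }


feature_configs = build_feature_configs(["petal_length", "petal_width"])
print(feature_configs)

# With a live entity type, the configs could then be submitted together:
# species_entity_type.batch_create_features(feature_configs=feature_configs)
```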

Use the [`list_features`](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/featurestore/entity_type.py#L349) method to list all the features of a given entity type.

In [None]:
species_entity_type.list_features()

### Search created features

In [None]:
my_features = Feature.search(query="featurestore_id={}".format(FEATURESTORE_ID))
my_features

Now, narrow down the search to features that are of type `DOUBLE`.

In [None]:
double_features = Feature.search(
    query="value_type=DOUBLE AND featurestore_id={}".format(FEATURESTORE_ID)
)
double_features[0].gca_resource

Or, limit the search results to features with specific keywords in their ID and type.
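
A keyword search of this kind can be sketched as follows. It assumes the `FIELD:QUERY` syntax of the feature search API, where `feature_id:sepal` matches features whose ID contains the token `sepal`. The helper `build_feature_query` and the featurestore ID `iris_example` are hypothetical.

```python
def build_feature_query(featurestore_id: str, keyword: str, value_type: str) -> str:
    """Combine a keyword match on the feature ID with exact-match filters."""
    return " AND ".join(
        [
            f"feature_id:{keyword}",  # keyword match on the feature ID
            f"value_type={value_type}",  # exact match on the value type
            f"featurestore_id={featurestore_id}",  # restrict to one featurestore
        ]
    )


query = build_feature_query("iris_example", "sepal", "DOUBLE")
print(query)
# → feature_id:sepal AND value_type=DOUBLE AND featurestore_id=iris_example

# With the SDK imported, the query could be passed to the search call:
# Feature.search(query=query)
```

In the notebook, `FEATURESTORE_ID` would take the place of the hypothetical `iris_example`.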

### Import feature values

In [None]:
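# IDs of the features created above on the `species` entity type
USERS_FEATURES_IDS = [feature.name for feature in species_entity_type.list_features()]
# Placeholder column name; overridden in the next cell because the iris table has no timestamp column
USERS_FEATURE_TIME = "update_time"
# Column of the source table that holds the entity ID
USERS_ENTITY_ID_FIELD = "species"
# Despite the name, this is a BigQuery table URI, not a Cloud Storage URI
USERS_GCS_SOURCE_URI = "bq://bigquery-public-data.ml_datasets.iris"
GCS_SOURCE_TYPE = "table"
WORKER_COUNT = 1
print(USERS_FEATURES_IDS)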

In [None]:
import datetime

# Use a single timestamp for all imported feature values
USERS_FEATURE_TIME = datetime.datetime.now()
USERS_FEATURE_TIME
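
`datetime.datetime.now()` carries microsecond precision. Assuming, per the API reference, that a feature timestamp must not have higher than millisecond precision, the value could be truncated before use. The SDK may already normalize this itself, so treat the following as an optional sketch; `truncate_to_milliseconds` is a hypothetical helper.

```python
import datetime


def truncate_to_milliseconds(timestamp: datetime.datetime) -> datetime.datetime:
    """Drop the sub-millisecond part of a datetime, leaving everything else unchanged."""
    return timestamp.replace(microsecond=(timestamp.microsecond // 1000) * 1000)


example = datetime.datetime(2024, 1, 2, 3, 4, 5, 123456)
print(truncate_to_milliseconds(example))
# → 2024-01-02 03:04:05.123000
```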

In [None]:
species_entity_type.ingest_from_bq(
    feature_ids=USERS_FEATURES_IDS,
    feature_time=USERS_FEATURE_TIME,
    entity_id_field=USERS_ENTITY_ID_FIELD,
    bq_source_uri=USERS_GCS_SOURCE_URI
)

### Read one entity per request

In [None]:
species_entity_type.read(entity_ids="versicolor")
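
Several entities and a subset of features can also be requested together. The sketch below assumes that `EntityType.read` accepts lists for `entity_ids` and `feature_ids`. The wrapper `read_species_features` is hypothetical, and a stand-in object replaces the real entity type so that the example runs without a Google Cloud project.

```python
from types import SimpleNamespace
from typing import Any, Sequence


def read_species_features(
    entity_type: Any,
    species: Sequence[str],
    feature_ids: Sequence[str] = ("sepal_length", "sepal_width"),
) -> Any:
    """Read the given features for several entities in one request."""
    return entity_type.read(entity_ids=list(species), feature_ids=list(feature_ids))


# Offline stand-in that simply echoes its arguments;
# in the notebook, pass `species_entity_type` instead.
fake_entity_type = SimpleNamespace(
    read=lambda entity_ids, feature_ids: {
        "entity_ids": entity_ids,
        "feature_ids": feature_ids,
    }
)

result = read_species_features(fake_entity_type, ["versicolor", "setosa"])
print(result)
```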

### Remember to delete all the resources you created to save costs

In [None]:
# Delete the featurestore; force=True also deletes its entity types and features
fs.delete(force=True)

# This notebook only reads from a public BigQuery table and creates no dataset
# of its own, so there is no BigQuery resource to delete.
print("Deleted featurestore '{}'.".format(FEATURESTORE_ID))