Terms of Service:

This is a [Preview](https://cloud.google.com/products#section-22) release.

This colab demonstrates how to use the Experimental release of the AI Platform
Feature Store. This release delivers a small subset of the core functionality of
the Feature Store. In particular it enables:

1.  Batch ingestion of feature values,
2.  Batch serving of feature values (for model training), and
3.  Online serving of feature values (for models deployed for online
    prediction).

This colab assumes that the reader has overall familiarity with Google Cloud AI
Platform's planned Feature Store. Please contact
caip-featurestore-feedback@google.com if you want access to slides providing an
overview.

Note that there is a *Troubleshooting* section at the end.

# Terminology

Here's a quick primer of terminology used in this colab and in the
[Feature Store API Documentation](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference).

## Feature Store Data Model

```
Featurestore -> EntityType -> Feature
```

### Featurestore

A featurestore is an instance of Feature Store. It contains all the resources
(storage, compute, and so on) needed to support that functionality. Its API
resource name is Featurestore.

### Entity Type

An entity type defines a collection of semantically-related features. For
example, in an application, `user` might be an entity type. A specific user
(like `user_foo`) is an instance of that entity type and is called an entity.
The API resource name for an entity type is EntityType.

### Feature

A feature defines a measurable property of an entity type. For example, `user`
is an entity type, `user_foo` is an entity, `age` is a feature of entity type
`user`, whereas `25` is a feature value. Types of feature values can range from
primitives, to strings, to arrays. A complete list of feature value types
supported by Feature Store can be found
[here](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#valuetype).
Its API resource name is Feature.

## Important terms

### Entity

A Featurestore is a three-dimensional store: Each Feature value is indexed by an
entity ID, Feature ID, and feature generation timestamp.

An entity is a collection of Feature values materialized in the Featurestore;
more specifically, it is an instance of a defined EntityType. Thus, an entity ID
refers to a physical location in the Featurestore.
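
The three-dimensional indexing described above can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: the `store` dictionary and the `write_value` and `latest_value` helpers are hypothetical names, not part of the Feature Store SDK.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for a featurestore. Each feature value is
# keyed by (entity ID, feature ID, feature generation timestamp).
store = {}


def write_value(entity_id, feature_id, generated_at, value):
    """Record one feature value under its three-part key."""
    store[(entity_id, feature_id, generated_at)] = value


def latest_value(entity_id, feature_id):
    """Return the value with the most recent timestamp, or None if absent."""
    candidates = [
        (timestamp, value)
        for (eid, fid, timestamp), value in store.items()
        if eid == entity_id and fid == feature_id
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


write_value("user_foo", "age", datetime(2020, 1, 1, tzinfo=timezone.utc), 24)
write_value("user_foo", "age", datetime(2021, 1, 1, tzinfo=timezone.utc), 25)
print(latest_value("user_foo", "age"))  # 25
```

Keeping the timestamp in the key is what lets the same entity and feature hold several values over time.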

# Prerequisite

**Note:** This colab is tested on
[AI Platform Notebooks](https://cloud.google.com/ai-platform-notebooks). Please
[create a new notebook instance](https://cloud.google.com/ai-platform/notebooks/docs/create-new)
with "Python 3" and upload this colab to the notebook instance.

## Set up a Google Cloud project

Please fill out the
[Feature Store EAP Registration](https://docs.google.com/forms/d/e/1FAIpQLSd8aocomwk9nV_S3iEvOto4PGxjprUbG223_F1NlPl9-NPUlA/viewform)
to allowlist your project and get access to the
[Feature Store API documentation](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference).
After that, enable the **"Cloud AI Platform API"**:

```
gcloud services enable aiplatform.googleapis.com --project ${your-project-id}
```

## Set up credentials

You can only use the private Feature Store SDK in the allowlisted project, with
credentials that can access the project. To obtain credentials in the notebook,
open a new terminal session (File > New > Terminal). In the terminal, run
`gcloud auth login` and follow the instructions to authenticate.

## Install Feature Store Python SDK

> **Note**: If you have installed a previous release of the
> google-cloud-aiplatform SDK, you should uninstall it and install the SDK
> accompanying this release.

In [None]:
# Uninstall previous version of google-cloud-aiplatform SDK
!pip uninstall google-cloud-aiplatform -y

In [None]:
!gsutil cp gs://cloud-aiplatform-featurestore/sdk/v1beta1/210413/aiplatform-v1beta1-py.tar.gz ./

In [None]:
!pip install aiplatform-v1beta1-py.tar.gz
!rm aiplatform-v1beta1-py.tar.gz

Now, **restart the kernel** and verify the SDK installation by running
`pip freeze | grep aiplatform`.

In [1]:
!pip freeze | grep aiplatform

aiplatform-pipelines-client @ file:///home/jupyter/aiplatform_pipelines_client-0.1.0.caip20210415-py3-none-any.whl
google-cloud-aiplatform @ file:///home/jupyter/aiplatform-v1beta1-py.tar.gz


# Environment setup

In [1]:
# Set up project, location, and API endpoint.
PROJECT_ID = "kubeflow-on-gcp-123"  #@param {type:"string"}
LOCATION = "us-central1"  #@param {type:"string"}
API_ENDPOINT = "us-central1-aiplatform.googleapis.com"  #@param {type:"string"}

In [2]:
from google.api_core import operations_v1
from google.cloud.aiplatform_v1beta1 import FeaturestoreOnlineServingServiceClient
from google.cloud.aiplatform_v1beta1 import FeaturestoreServiceClient
from google.cloud.aiplatform_v1beta1.types import featurestore_online_service as featurestore_online_service_pb2
from google.cloud.aiplatform_v1beta1.types import entity_type as entity_type_pb2
from google.cloud.aiplatform_v1beta1.types import feature as feature_pb2
from google.cloud.aiplatform_v1beta1.types import feature_selector as feature_selector_pb2
from google.cloud.aiplatform_v1beta1.types import featurestore as featurestore_pb2
from google.cloud.aiplatform_v1beta1.types import featurestore_service as featurestore_service_pb2
from google.cloud.aiplatform_v1beta1.types import io as io_pb2

In [3]:
# Create admin_client for CRUD and data_client for reading feature values.
admin_client = FeaturestoreServiceClient(
    client_options={"api_endpoint": API_ENDPOINT})
data_client = FeaturestoreOnlineServingServiceClient(
    client_options={"api_endpoint": API_ENDPOINT})

# Represents featurestore resource path.
BASE_RESOURCE_PATH = admin_client.common_location_path(PROJECT_ID, LOCATION)
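
For orientation, the resource names used throughout this colab follow a nested path layout, which is visible in the outputs further down. The sketch below rebuilds that layout with plain string formatting. The three functions are hypothetical stand-ins written for illustration; they are not the SDK's own path helpers.

```python
def location_path(project, location):
    """Parent path shared by every featurestore in one location."""
    return f"projects/{project}/locations/{location}"


def featurestore_path(project, location, featurestore):
    """Path of a single featurestore under its location."""
    return f"{location_path(project, location)}/featurestores/{featurestore}"


def entity_type_path(project, location, featurestore, entity_type):
    """Path of an entity type under its featurestore."""
    parent = featurestore_path(project, location, featurestore)
    return f"{parent}/entityTypes/{entity_type}"


print(entity_type_path(
    "kubeflow-on-gcp-123", "us-central1", "featurestore_demo", "bikes"))
```

The outputs in this colab show a project number in place of the project ID, but the layout is the same.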

In [4]:
# Create operation client to poll LRO status.
lro_client = operations_v1.OperationsClient(admin_client.transport.grpc_channel)

# Featurestore Admin API

An administrator can create/manage a featurestore using the
[Feature Store Admin API](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#featurestoreservice).
The administrator can configure some of the featurestore's underlying resources, such as the number of Bigtable nodes.

## Create featurestore

The method to create a featurestore returns a
[long-running operation](https://google.aip.dev/151) (LRO). An LRO starts an
asynchronous job. Other API methods, such as updating or deleting a
featurestore, return LROs too. Calling `create_lro.result()` waits for the LRO
to complete.

In [5]:
FEATURESTORE_ID = "featurestore_demo"
create_lro = admin_client.create_featurestore(
    featurestore_service_pb2.CreateFeaturestoreRequest(
        parent=BASE_RESOURCE_PATH,
        featurestore_id=FEATURESTORE_ID,
        featurestore=featurestore_pb2.Featurestore(
            display_name="My demo featurestore",
            online_serving_config=featurestore_pb2.Featurestore
            .OnlineServingConfig(fixed_node_count=3))))

In [6]:
# Wait for LRO to finish and get the LRO result.
print(create_lro.result())

name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo"
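
If blocking indefinitely on `result()` is not desirable, one option is to poll with a deadline. The following is a minimal sketch, assuming the operation object exposes `done()` and `result()` the way future-like objects generally do. `wait_for_operation` is a hypothetical helper, and a standard-library `Future` stands in for the real LRO so that the example runs without any cloud access.

```python
import time
from concurrent.futures import Future


def wait_for_operation(operation, poll_interval_s=1.0, timeout_s=600.0):
    """Poll a future-like operation until it is done or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while not operation.done():
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish before the deadline")
        time.sleep(poll_interval_s)
    return operation.result()


# Stand-in for an LRO that has already completed.
demo_operation = Future()
demo_operation.set_result("featurestore created")
print(wait_for_operation(demo_operation))  # featurestore created
```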



You can use [GetFeaturestore](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.FeaturestoreService.GetFeaturestore) or [ListFeaturestores](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.FeaturestoreService.ListFeaturestores) to check if the featurestore was successfully created. The following example gets the details of the featurestore.


In [7]:
admin_client.get_featurestore(name = admin_client.featurestore_path(PROJECT_ID, LOCATION, FEATURESTORE_ID))

name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo"
create_time {
  seconds: 1629225762
  nanos: 770855000
}
update_time {
  seconds: 1629225762
  nanos: 856510000
}
etag: "AMEw9yNeT8OElvbCjA_0ZiZWHTrwbzp1vX-xxOzY18Ufsq9V5y0NSZZODPJF-ivbQ_c1"
online_serving_config {
  fixed_node_count: 3
}
state: STABLE

# Feature Management API

The
[Feature Management API](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#featurestoreservice)
allows you to make CRUD calls for entity types and features. You must create the relevant entity types and features before you run batch ingestion
to import feature values.

## Create entity type

In [8]:
entity_type_lro = admin_client.create_entity_type(
    featurestore_service_pb2.CreateEntityTypeRequest(
        parent=admin_client.featurestore_path(PROJECT_ID, LOCATION, FEATURESTORE_ID),
        entity_type_id="bikes", 
        entity_type=entity_type_pb2.EntityType(description="Bike features")))

# Similarly, wait for EntityType creation operation.
print(entity_type_lro.result())

name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/bikes"
etag: "AMEw9yPss-TPI1qvq3XX9rozSpjhvipj952PWddpO0UBI5G3fxMy"



## Create features

Next, register 3 features (`duration_minutes`, `subscriber_type` and `station_array` of type `INT64`, `STRING` and `INT64_ARRAY` respectively) by using the [BatchCreateFeatures](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.FeaturestoreService.BatchCreateFeatures) method. You can also use the [CreateFeature](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.FeaturestoreService.CreateFeature) method to add one feature at a time. 

In [9]:
admin_client.batch_create_features(
    parent=admin_client.entity_type_path(PROJECT_ID, LOCATION, FEATURESTORE_ID, "bikes"),
    requests=[
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.INT64,
                description="Bike duration minutes"),
            feature_id="duration_minutes"),
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.STRING,
                description="Subscriber type"),
            feature_id="subscriber_type"),
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.INT64_ARRAY,
                description="Station location array"),
            feature_id="station_array")
    ]).result()

features {
  name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/bikes/features/duration_minutes"
  etag: "AMEw9yOfztkSE3fcAS88dPSxUxIBtZRVMvZKGUYMh0CQR3ZmHFuy"
}
features {
  name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/bikes/features/subscriber_type"
  etag: "AMEw9yOn6mTqkXQ1afm-X5ZzNFz3L7_qeD5z1jGCsBGJE8yELjI0"
}
features {
  name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/bikes/features/station_array"
  etag: "AMEw9yN2mbbGr4QhpaeA5jzyng3OvGYBp4SwIFRuVpP97t8adTA2"
}

Create the `stations` entity type and its features.

In [10]:
print(
    admin_client.create_entity_type(
        featurestore_service_pb2.CreateEntityTypeRequest(
            parent=admin_client.featurestore_path(PROJECT_ID, LOCATION,
                                                  FEATURESTORE_ID),
            entity_type_id="stations",
            entity_type=entity_type_pb2.EntityType(
                description="Station features"))).result())

print(
    admin_client.batch_create_features(
        parent=admin_client.entity_type_path(PROJECT_ID, LOCATION,
                                             FEATURESTORE_ID, "stations"),
        requests=[
            featurestore_service_pb2.CreateFeatureRequest(
                feature=feature_pb2.Feature(
                    value_type=feature_pb2.Feature.ValueType.DOUBLE,
                    description="Station latitude"),
                feature_id="latitude"),
            featurestore_service_pb2.CreateFeatureRequest(
                feature=feature_pb2.Feature(
                    value_type=feature_pb2.Feature.ValueType.DOUBLE,
                    description="Station longitude"),
                feature_id="longitude")
        ]).result())

name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/stations"
etag: "AMEw9yPC-vJLDzF-uUkfwATcgnCd_uOT-MF5Kg4KnRFK3VPr5QH6"

features {
  name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/stations/features/latitude"
  etag: "AMEw9yOzJNMYVAFMts2vKC8HBSY9wEebkGl09K9BFkr4lN3J-eCX"
}
features {
  name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/stations/features/longitude"
  etag: "AMEw9yNn6iOmWGwvElPFGr--7LcpHWsZdUmE-b58VZ0sZKjoHIOs"
}



## Search features

While the [ListFeatures](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.FeaturestoreService.ListFeatures) method allows you to easily view all features of a single
entity type, the [SearchFeatures](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.FeaturestoreService.SearchFeatures) method searches across all featurestores
and entity types in a given location (such as `us-central1`). This can help you discover features that were created by someone else.

You can query based on feature properties including feature ID, entity type ID,
and feature description. You can also limit results by filtering on a specific
featurestore, feature value type, and/or labels.

In [11]:
# Search for all features across all featurestores.
list(admin_client.search_features(location=BASE_RESOURCE_PATH))

[name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/bikes/features/duration_minutes"
 description: "Bike duration minutes"
 create_time {
   seconds: 1629225887
   nanos: 584465000
 }
 update_time {
   seconds: 1629225887
   nanos: 584465000
 },
 name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/bikes/features/station_array"
 description: "Station location array"
 create_time {
   seconds: 1629225887
   nanos: 588310000
 }
 update_time {
   seconds: 1629225887
   nanos: 588310000
 },
 name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/bikes/features/subscriber_type"
 description: "Subscriber type"
 create_time {
   seconds: 1629225887
   nanos: 586231000
 }
 update_time {
   seconds: 1629225887
   nanos: 586231000
 },
 name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/stations/features/latitude"
 description: "

Now, narrow down the search to features that are of type `DOUBLE`.

In [12]:
# Search for all features with value type `DOUBLE`
list(
    admin_client.search_features(
        featurestore_service_pb2.SearchFeaturesRequest(
            location=BASE_RESOURCE_PATH, query="value_type=DOUBLE")))

[name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/stations/features/latitude"
 description: "Station latitude"
 create_time {
   seconds: 1629225963
   nanos: 133388000
 }
 update_time {
   seconds: 1629225963
   nanos: 133388000
 },
 name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/stations/features/longitude"
 description: "Station longitude"
 create_time {
   seconds: 1629225963
   nanos: 135026000
 }
 update_time {
   seconds: 1629225963
   nanos: 135026000
 }]

Further limit the search results to features with specific keywords in their ID.

In [13]:
# Filter on feature value type and keywords.
list(
    admin_client.search_features(
        featurestore_service_pb2.SearchFeaturesRequest(
            location=BASE_RESOURCE_PATH, query="feature_id:latitude AND value_type=DOUBLE")))

[name: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/stations/features/latitude"
 description: "Station latitude"
 create_time {
   seconds: 1629225963
   nanos: 133388000
 }
 update_time {
   seconds: 1629225963
   nanos: 133388000
 }]
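
When several filters are combined, it can help to assemble the query string in one place. The sketch below uses only the two operators that appear in the queries above (`:` for a keyword match and `=` for an exact match, joined by `AND`). `build_search_query` is a hypothetical helper, and any other query syntax the service may accept is outside what this sketch covers.

```python
def build_search_query(contains=None, equals=None):
    """Join keyword (field:text) and exact (field=value) clauses with AND."""
    parts = [f"{field}:{text}" for field, text in (contains or {}).items()]
    parts += [f"{field}={value}" for field, value in (equals or {}).items()]
    return " AND ".join(parts)


query = build_search_query(
    contains={"feature_id": "latitude"},
    equals={"value_type": "DOUBLE"},
)
print(query)  # feature_id:latitude AND value_type=DOUBLE
```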

# Batch Ingestion (ImportFeatureValues API)

Batch ingestion involves importing feature values for existing features into a
featurestore. Hence, it uses the custom HTTP verb `importFeatureValues`. You can ingest values for multiple features at once if all features are of the same entity type. 

Before ingestion, feature values can live in various locations, such as BigQuery tables or Cloud Storage files (in standard formats like Avro or CSV). After ingestion, a featurestore serves feature values in a uniform format. For example, you might use batch serving for model training and online serving for prediction.

  > Note: A featurestore and its source data can be in different projects.
  > However, you must grant permissions in the source data project to the
  > Feature Store service account. For more information, see the
  > "Cross-project IAM permissions" section.

## Ingestion configuration

During batch ingestion of feature values, you must specify the following:

*   data source format
*   data source location
*   destination features that the feature values are instances of

Each row in the data source corresponds to one entity, and you must specify how
the data source columns correspond to entity IDs, feature generation timestamps,
and feature values.

You will learn how to set these configurations using [`ImportFeatureValuesRequest`](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.ImportFeatureValuesRequest) in
the code below.

You can **import values for up to 100 features of exactly one entity type**
in a single `importFeatureValues` operation.

### Data source layout

The data source must adhere to the following layout for a given row:

*   (Required) Column for entity IDs: holds the IDs of the specific entities the
    ingested feature values are for, must be of type **String**.
*   (Required) Columns for feature values: hold the feature values, and each
    column must be of a type compatible with the destination feature.
*   (Optional) Column for feature generation timestamps: holds the feature
    generation timestamps of the ingested feature values. See details in the
    "Timestamps for ingested feature values" section. If provided, timestamps
    must follow the corresponding requirements, depending on the data source
    format:
    *   BigQuery table: must be a **TIMESTAMP column**.
    *   Avro: must be of **type long and logical type timestamp-micros**.
    *   CSV: must be in the **RFC 3339 format**.
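
To make the two file-based timestamp encodings concrete, here is a small sketch that renders one instant both as an RFC 3339 string (for CSV) and as microseconds since the Unix epoch (the value an Avro `timestamp-micros` long holds). The helper names are hypothetical, and only the standard library is used.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_rfc3339(moment):
    """Format a timezone-aware datetime as an RFC 3339 string in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_timestamp_micros(moment):
    """Convert a timezone-aware datetime to microseconds since the epoch."""
    delta = moment - EPOCH
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 1_000_000 + delta.microseconds


start_time = datetime(2019, 10, 28, 15, 38, 10, tzinfo=timezone.utc)
print(to_rfc3339(start_time))           # 2019-10-28T15:38:10Z
print(to_timestamp_micros(start_time))  # 1572277090000000
```

Integer arithmetic is used for the microsecond conversion to avoid floating-point rounding.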

**Example data source**

Here is a schema for Avro data. For a given row, `bike_id` is the column name for the entity ID, and `start_time` is the column name for the feature generation timestamp. The remaining fields (`subscriber_type`, `duration_minutes`, and `station_array`) are column names for feature values.

**Avro schema**

```
{
  "type":"record",
  "name":"Root",
  "fields":[
    {
      "name":"bike_id",
      "type":["null","string"]},
    {
      "name":"subscriber_type",
      "type":["null","string"]},
    {
      "name":"start_time",
      "type":["null",{"type":"long","logicalType":"timestamp-micros"}]},
    {
      "name":"duration_minutes",
      "type":["null","long"]},
    {
      "name":"station_array",
      "type":{"type":"array","items":"long"}
      }
  ]
}
```

### Timestamps for ingested feature values

Batch ingestion expects user-provided timestamps for the ingested feature
values. There are two ways to specify timestamps in the [`ImportFeatureValuesRequest`](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.ImportFeatureValuesRequest):

1.  If the timestamp for all feature values being ingested is the same, you can
    specify it just once by using the `feature_time` field as part of the
    request.

2.  If the timestamps for feature values are different, the timestamps have to
    be specified in a column in the data source. In the batch ingestion request,
    you can specify the name of that column by using the `feature_time_field`
    field.

### Data source location considerations

*   If the data source is in Cloud Storage, the data must be in the same
    [*region*](https://cloud.google.com/storage/docs/locations#key-concepts)
    as the featurestore. For example, a featurestore in
    `us-central1` can only ingest data from files in Cloud Storage buckets in
    `us-central1`. Ingesting data from dual-region and multi-region buckets is
    not supported.

*   If the data source is in BigQuery, the data must be in the same
    [*region*](https://cloud.google.com/bigquery/docs/locations#key-concepts)
    as the featurestore. For example, a
    featurestore in `us-central1` can only ingest data from BigQuery tables in
    region `us-central1`.

## Import feature values for entities of type bikes

In [14]:
# Specify source data location and its format.
import_request = featurestore_service_pb2.ImportFeatureValuesRequest(
    entity_type=admin_client.entity_type_path(PROJECT_ID, LOCATION,
                                              FEATURESTORE_ID, "bikes"),
    bigquery_source=io_pb2.BigQuerySource(
        input_uri="bq://cloud-aiplatform-assets.cloud_aiplatform_featurestore_us_central1.bike_data_2019_10"
    ),
    feature_specs=[
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(
            id="subscriber_type"),
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(
            id="station_array"),
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(
            id="duration_minutes"),
    ],
    entity_id_field="bike_id",
    feature_time_field="start_time",
    worker_count=10)
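
The `worker_count` value relates to the featurestore's node count: the tips in this section recommend a worker:node ratio of 2:1 or 3:1 and state a maximum of 100 workers. A small helper can encode that rule of thumb. `suggest_worker_count` is a hypothetical function, not part of the SDK, and the default ratio of 3 is just one of the two recommended values.

```python
MAX_WORKERS = 100  # Upper bound on workers stated in this section's tips.


def suggest_worker_count(fixed_node_count, workers_per_node=3):
    """Scale workers with the node count, capped at the documented maximum."""
    if fixed_node_count < 1:
        raise ValueError("fixed_node_count must be at least 1")
    return min(fixed_node_count * workers_per_node, MAX_WORKERS)


print(suggest_worker_count(3))     # 9
print(suggest_worker_count(3, 2))  # 6
print(suggest_worker_count(50))    # 100
```

The request above uses 10 workers for 3 nodes, slightly above 3:1, which the tips allow when online serving load is low.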

> Tips:
>
> *   Choose the number of workers based on the value of
>     [`fixed_node_count`](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#featurestore),
>     which was set when you provisioned your featurestore. The recommended
>     worker:node ratio is 2:1 or 3:1, though you can go higher if the online
>     serving load is low. The maximum number of workers is `100`.
>
> *   The default value for the `entity_id_field` parameter is `entity_id`. If
>     your source data already has a column with that same name, you can skip
>     setting the `entity_id_field` parameter.
>
> *   If the source column name is not the same as the target feature ID,
>     use the
>     [`source_field`](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.ImportFeatureValuesRequest.FeatureSpec)
>     parameter to specify the source column name. For example:
>
> ```
> featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(
>         id="subscriber_type", source_field = 'actual_subscriber'),
> ```

In [15]:
# Expect the import to take a few minutes.
ingestion_lro = admin_client.import_feature_values(import_request)

In [16]:
# Polls for the LRO status and prints when the LRO has completed
ingestion_lro.result()

imported_entity_count: 8687
imported_feature_value_count: 26055

## Import feature values for entities of type stations

In [17]:
import_request = featurestore_service_pb2.ImportFeatureValuesRequest(
    entity_type=admin_client.entity_type_path(PROJECT_ID, LOCATION,
                                              FEATURESTORE_ID, "stations"),
    bigquery_source=io_pb2.BigQuerySource(
        input_uri="bq://cloud-aiplatform-assets.cloud_aiplatform_featurestore_us_central1.station_data"
    ),
    feature_specs=[
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(
            id="latitude"),
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(
            id="longitude"),
    ],
    entity_id_field="station_id",
    feature_time_field="update_time",
    worker_count=10)

# Expect the import to take a few minutes.
admin_client.import_feature_values(import_request).result()

imported_entity_count: 96
imported_feature_value_count: 192

# Online serving

The
[Online Serving APIs](https://cloud.google.com/ai-platform-unified/featurestore/docs/reference/rpc/google.cloud.aiplatform.v1beta1#featurestoreonlineservingservice)
let you serve feature values for small batches of entities. Because of its low
latency, online serving is often used to serve feature values to models
deployed for online prediction.

### Basic read (ReadFeatureValues API)

The ReadFeatureValues API is used to read feature values of one entity; hence
its custom HTTP verb is `readFeatureValues`. By default, the API returns the
latest value of each feature, that is, the feature value with the most recent
timestamp.

To read feature values, specify the entity ID and features to read. The response
contains a `header` and an `entity_view`. Each row of data in the `entity_view`
contains one feature value, in the same order of features as listed in the response header.

In [18]:
# Specify the features for which to fetch the latest value.
feature_selector = feature_selector_pb2.FeatureSelector(
    id_matcher=feature_selector_pb2.IdMatcher(
        ids=["subscriber_type", "station_array", "duration_minutes"]))

data_client.read_feature_values(
    featurestore_online_service_pb2.ReadFeatureValuesRequest(
        entity_type=admin_client.entity_type_path(PROJECT_ID, LOCATION,
                                                  FEATURESTORE_ID, "bikes"),
        entity_id="004G",
        feature_selector=feature_selector))

header {
  entity_type: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/bikes"
  feature_descriptors {
    id: "subscriber_type"
  }
  feature_descriptors {
    id: "station_array"
  }
  feature_descriptors {
    id: "duration_minutes"
  }
}
entity_view {
  entity_id: "004G"
  data {
    value {
      string_value: "Local365"
      metadata {
        generate_time {
          seconds: 1572277090
        }
      }
    }
  }
  data {
    value {
      int64_array_value {
        values: 2567
        values: 2566
      }
      metadata {
        generate_time {
          seconds: 1572277090
        }
      }
    }
  }
  data {
    value {
      int64_value: 6
      metadata {
        generate_time {
          seconds: 1572277090
        }
      }
    }
  }
}
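
Because the data rows follow the header order, the two can be zipped into a mapping from feature ID to value. The sketch below relies only on the field names visible in the output above (`header.feature_descriptors[].id` and `entity_view.data[].value`). `entity_view_to_dict` is a hypothetical helper, and `SimpleNamespace` objects stand in for the real response so that the example runs offline.

```python
from types import SimpleNamespace


def entity_view_to_dict(response):
    """Pair each feature ID in the header with the value at the same index."""
    feature_ids = [d.id for d in response.header.feature_descriptors]
    values = [item.value for item in response.entity_view.data]
    return dict(zip(feature_ids, values))


# Stand-in response. In a real response, each value is a message that carries
# a typed field (such as string_value) plus metadata, not a bare Python value.
demo_response = SimpleNamespace(
    header=SimpleNamespace(feature_descriptors=[
        SimpleNamespace(id="subscriber_type"),
        SimpleNamespace(id="duration_minutes"),
    ]),
    entity_view=SimpleNamespace(entity_id="004G", data=[
        SimpleNamespace(value="Local365"),
        SimpleNamespace(value=6),
    ]),
)
print(entity_view_to_dict(demo_response))
```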

### Read from multiple entities (StreamingReadFeatureValues API)

To read feature values from multiple entities, use the
StreamingReadFeatureValues API, which is almost identical to the previous
ReadFeatureValues API.

Specify the desired entities and feature selectors in one request and then
consume from a response stream: Feature Store will first send a `header`-only
response, and then send `entity_view`-only responses, one-by-one. Again, each
row of data in the `entity_view` contains one feature value, in the same order as listed in the response header.
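
The header-first ordering described above suggests a simple way to consume the stream: read the first message for the feature IDs, then treat every later message as one entity. This is a sketch under that ordering assumption only. `collect_stream` is a hypothetical helper, and the list of `SimpleNamespace` objects is a stand-in for the real response stream.

```python
from types import SimpleNamespace


def collect_stream(response_stream):
    """Return {entity_id: {feature_id: value}} from a header-first stream."""
    iterator = iter(response_stream)
    first = next(iterator)  # The first message carries only the header.
    feature_ids = [d.id for d in first.header.feature_descriptors]
    rows = {}
    for response in iterator:  # Each later message carries one entity view.
        view = response.entity_view
        values = [item.value for item in view.data]
        rows[view.entity_id] = dict(zip(feature_ids, values))
    return rows


# Stand-in stream: one header message followed by two entity views.
demo_stream = [
    SimpleNamespace(header=SimpleNamespace(
        feature_descriptors=[SimpleNamespace(id="subscriber_type")])),
    SimpleNamespace(entity_view=SimpleNamespace(
        entity_id="004G", data=[SimpleNamespace(value="Local365")])),
    SimpleNamespace(entity_view=SimpleNamespace(
        entity_id="190G", data=[SimpleNamespace(value="ACL 2019 Pass")])),
]
print(collect_stream(demo_stream))
```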

In [19]:
# Read the same set of features as above, but for multiple entities.
response_stream = data_client.streaming_read_feature_values(
    featurestore_online_service_pb2.StreamingReadFeatureValuesRequest(
        entity_type=admin_client.entity_type_path(PROJECT_ID, LOCATION,
                                                  FEATURESTORE_ID, "bikes"),
        entity_ids=["004G", "190G"],
        feature_selector=feature_selector))

In [20]:
# Iterate and process response. Note the first one is always the header only.
for response in response_stream:
  print(response)

header {
  entity_type: "projects/306016756844/locations/us-central1/featurestores/featurestore_demo/entityTypes/bikes"
  feature_descriptors {
    id: "subscriber_type"
  }
  feature_descriptors {
    id: "station_array"
  }
  feature_descriptors {
    id: "duration_minutes"
  }
}

entity_view {
  entity_id: "004G"
  data {
    value {
      string_value: "Local365"
      metadata {
        generate_time {
          seconds: 1572277090
        }
      }
    }
  }
  data {
    value {
      int64_array_value {
        values: 2567
        values: 2566
      }
      metadata {
        generate_time {
          seconds: 1572277090
        }
      }
    }
  }
  data {
    value {
      int64_value: 6
      metadata {
        generate_time {
          seconds: 1572277090
        }
      }
    }
  }
}

entity_view {
  entity_id: "190G"
  data {
    value {
      string_value: "ACL 2019 Pass"
      metadata {
        generate_time {
          seconds: 1572297697
        }
      }
    }
  }
 

# Batch Serving (BatchReadFeatureValues API)

Batch Serving is used to fetch feature values for training a model or for batch
prediction. In this user guide we will focus on the scenario of fetching feature
values for training a model.

## Fetching feature values for model training

Feature Store enables feature sharing: a feature in a featurestore can be
used for multiple use cases. Each use case has its own training data, also
known as labelled data (the labels that the model will be trained to predict).

Thus, when fetching data from a featurestore to train a model, you have to fetch
the feature values that correspond to each label that will be used to train the
model.

Each label is essentially an observation or measurement, made at a specific
point in time, about a specific entity (or entities). For example: Did the user
click on the ad, did the user purchase the product, etc. Thus for each label,
one has to fetch the feature values for the corresponding entity (or entities),
as of the point in time when the label was observed or measured.

That is exactly what the BatchReadFeatureValues API does.
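
The point-in-time lookup described above can be sketched for a single feature with a sorted history and a binary search. This is an illustration, not the service's implementation: the `history` data and the `value_as_of` helper are hypothetical, timestamps are plain integers for brevity, and treating a value generated exactly at the label timestamp as visible is an assumption.

```python
from bisect import bisect_right

# Hypothetical feature history per entity: (generation time, value) pairs,
# sorted by generation time.
history = {"004G": [(100, 5), (200, 6), (300, 9)]}


def value_as_of(entity_id, as_of):
    """Return the newest value generated at or before `as_of`, else None."""
    versions = history.get(entity_id, [])
    timestamps = [timestamp for timestamp, _ in versions]
    index = bisect_right(timestamps, as_of)
    return versions[index - 1][1] if index else None


print(value_as_of("004G", 250))  # 6
print(value_as_of("004G", 50))   # None
```

Looking up values as of the label timestamp keeps later feature values from leaking into the training row.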

## Inputs and outputs of BatchReadFeatureValues

**Inputs**:

Users need to provide the following info in the request:

*   Features to read (i.e. the features for which values are requested). If an
    EntityType were a table, each Feature ID would select a column.
*   An input file containing the following information for each training
    instance:

> *   **Timestamp**: This allows Feature Store to fetch feature values as of the
>     point in time when the label was observed/measured.

> *   **Entity ID(s)**: The ID(s) of the entity/entities corresponding to the
>     labelled instance. You can think of these IDs as the keys that will be
>     used to concatenate feature values. If an EntityType were a table, each
>     entity ID would select a row.

*   Destination URI and format.

**Output**:

A file, at the user-specified destination URI, containing the requested feature
values for each training instance.

**Example**:

Let's continue with the bike/station dataset and suppose we are given a machine
learning task to predict whether a bike is available at a given station. For
this model, the training data we need would be a join of some bikes and stations
features, whose label may come from ground-truth that describes whether a bike X
was available at station Y.

To be more specific, the desired training dataset is described in Table 1. Here,
we use the `subscriber_type` and `duration_minutes` features from `bikes` and
the `latitude` and `longitude` features from `stations`. Our ground-truth
observation is described in Table 2. BatchReadFeatureValues API takes Table 2 as
input and returns Table 1 for training.

<h4 align="center">Table 1. Expected Training Data Generated by Batch Read API</h4>

timestamp            | entity_type_bikes | subscriber_type | duration_minutes | entity_type_stations | latitude | longitude | label
-------------------- | ----------------- | --------------- | ---------------- | -------------------- | -------- | --------- | -----
2019-11-01T00:00:00Z | 004G              | Local365        | 6                | 3841                 | 30.28728 | -97.74495 | True
2019-11-15T18:09:43Z | 190G              | ACL 2019 Pass   | 14               | 2567                 | 30.25971 | -97.75346 | True
...                  | ...               | ...             | ...              | ...                  | ...      | ...       | ...

<h4 align="center">Table 2. Example of Ground-truth Data</h4>

bikes | stations | timestamp            | label
----- | -------- | -------------------- | -----
004G  | 3841     | 2019-11-01T00:00:00Z | True
190G  | 2567     | 2019-11-15T18:09:43Z | True
...   | ...      | ...                  | ...
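
If you want to supply your own ground-truth data, a file shaped like Table 2 could be produced along the following lines. This is a sketch that assumes the column layout shown in Table 2 (one column per entity type holding the entity ID, plus `timestamp` and `label`); the `to_read_instances_csv` helper is hypothetical, and the exact format the API accepts should be checked against the API reference.

```python
import csv
import io

# Rows mirroring Table 2: each entity-type column holds an entity ID.
ground_truth = [
    {"bikes": "004G", "stations": "3841",
     "timestamp": "2019-11-01T00:00:00Z", "label": "True"},
    {"bikes": "190G", "stations": "2567",
     "timestamp": "2019-11-15T18:09:43Z", "label": "True"},
]


def to_read_instances_csv(rows, columns=("bikes", "stations", "timestamp", "label")):
    """Serialize ground-truth rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_read_instances_csv(ground_truth), end="")
```

The resulting text would then need to be uploaded to a Cloud Storage bucket that the featurestore can read.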

Next, let's figure out how to call the API and generate such training data using
the ground-truth data.

## API Details

Step 1. In the project, create a dataset to store the output, using the [BigQuery console](https://console.cloud.google.com/bigquery). For the input ground-truth data, use the public data located at `gs://cloud-aiplatform-featurestore-us-central1/read_entity_instance.csv`. The batch read uses this ground-truth data to select which entities to read from each requested entity type, along with the requested features.

Step 2. Assemble the query following the steps below.

In [21]:
# The input file containing the ground-truth data (Table 2) is a CSV file in a Cloud Storage bucket.
# This demo uses the following public data; nothing needs to be changed here.
INPUT_CSV_FILE = "gs://cloud-aiplatform-featurestore-us-central1/read_entity_instance.csv"

# Output dataset, as created in Step 1.
DESTINATION_DATA_SET = "sink"  #@param {type:"string"}
# Output table. Make sure that the table does NOT already exist; the BatchReadFeatureValues API cannot overwrite an existing table.
DESTINATION_TABLE_NAME = "demo_output" #@param {type:"string"}

DESTINATION_PATTERN = "bq://{project}.{dataset}.{table}"
DESTINATION_TABLE_URI = DESTINATION_PATTERN.format(project=PROJECT_ID,
    dataset=DESTINATION_DATA_SET, table=DESTINATION_TABLE_NAME)

In [22]:
batch_serving_request = featurestore_service_pb2.BatchReadFeatureValuesRequest(
    # featurestore info
    featurestore=admin_client.featurestore_path(PROJECT_ID, LOCATION,
                                                FEATURESTORE_ID),
    # Input file specifying the entities to be read
    csv_read_instances=io_pb2.CsvSource(
        gcs_source=io_pb2.GcsSource(uris=[INPUT_CSV_FILE])),
    # Output info
    destination=featurestore_service_pb2.FeatureValueDestination(
        bigquery_destination=io_pb2.BigQueryDestination(
            # output to BigQuery table
            output_uri=DESTINATION_TABLE_URI)),
    # Select features to read
    entity_type_specs=[
        featurestore_service_pb2.BatchReadFeatureValuesRequest.EntityTypeSpec(
            # read feature values of features subscriber_type and duration_minutes from "bikes"
            entity_type_id="bikes", 
            feature_selector=feature_selector_pb2.FeatureSelector(
                id_matcher=feature_selector_pb2.IdMatcher(ids=[
                    # features, use "*" if you want to select all features within this entity type
                    "subscriber_type",  "duration_minutes"
                ]))),
        featurestore_service_pb2.BatchReadFeatureValuesRequest.EntityTypeSpec(
            # read feature values of features latitude and longitude from "stations"
            entity_type_id="stations",
            feature_selector=feature_selector_pb2.FeatureSelector(
                id_matcher=feature_selector_pb2.IdMatcher(
                    ids=["latitude", "longitude"])))
    ])

Step 3. Send the batch read request and wait for the result.

In [23]:
# Execute the batch read operation
serving_lro = admin_client.batch_read_feature_values(batch_serving_request)

In [24]:
# result() blocks, polling the long-running operation until the batch read finishes.
serving_lro.result()



After the LRO finishes, you should see the result in the [BigQuery console](https://console.cloud.google.com/bigquery), in the destination table named in Step 2 (inside the dataset created in Step 1). For more details, see [this document](https://docs.google.com/document/d/10RMIfHHcQFTf3jmyck00g1YKbvZDBAHjSBbtF4k6Nv0/edit?usp=sharing).
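
Once the batch read has finished, the output table can also be inspected from the notebook instead of the console. The sketch below is one possible way to do that, assuming the `google-cloud-bigquery` package is installed and your credentials can read the destination dataset. The helper names (`bq_uri_to_table_ref`, `preview_query`, `preview_output`) are hypothetical and not part of the Feature Store SDK.

```python
def bq_uri_to_table_ref(uri):
    """Convert 'bq://project.dataset.table' to a backtick-quoted SQL table reference."""
    prefix = "bq://"
    if not uri.startswith(prefix):
        raise ValueError("expected a URI starting with 'bq://': {!r}".format(uri))
    # Split from the right so only the last two dots separate dataset and table.
    parts = uri[len(prefix):].rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected bq://project.dataset.table: {!r}".format(uri))
    return "`{}.{}.{}`".format(*parts)


def preview_query(uri, limit=5):
    """Build a query that returns the first few rows of the output table."""
    return "SELECT * FROM {} LIMIT {}".format(bq_uri_to_table_ref(uri), int(limit))


def preview_output(uri, project_id, limit=5):
    """Run the preview query and return the rows as a list of dicts."""
    # Imported lazily so the helpers above work without the BigQuery client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    rows = client.query(preview_query(uri, limit)).result()
    return [dict(row.items()) for row in rows]


print(preview_query("bq://my-project.sink.demo_output"))
```

In this notebook, the call would be `preview_output(DESTINATION_TABLE_URI, PROJECT_ID)`.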

## Cleaning Up
Delete the featurestore. Setting `force=True` also deletes any entity types and features that the featurestore contains.

In [25]:
admin_client.delete_featurestore(
    request=featurestore_service_pb2.DeleteFeaturestoreRequest(
        name=admin_client.featurestore_path(PROJECT_ID, LOCATION, FEATURESTORE_ID),
        force=True)).result()



# Cross-project IAM permissions

By default, Feature Store has the IAM permissions to access source data in the
same project as the featurestore. It is recommended that you keep the default
IAM permissions established for your project. However, if the **source data is
in a different project from the featurestore**, you must manually grant
permission to the Feature Store service account in the source data project.

Specify the
[project number](https://cloud.google.com/resource-manager/docs/creating-managing-projects#identifying_projects)
of the Google Cloud project where your featurestore is located:

In [None]:
# Specify your project number where the featurestore is located.
PROJECT_NUMBER = 0

# Specify the project ID where the source data is located.
SOURCE_DATA_PROJECT_ID = "SOURCE-DATA-PROJECT-ID"
FEATURE_STORE_SERVICE_ACCOUNT = "service-{}@gcp-sa-aiplatform.iam.gserviceaccount.com".format(PROJECT_NUMBER)

# Grant featurestore service account roles/aiplatform.serviceAgent in the source data project.
!gcloud projects add-iam-policy-binding {SOURCE_DATA_PROJECT_ID} --member=serviceAccount:{FEATURE_STORE_SERVICE_ACCOUNT} --role="roles/aiplatform.serviceAgent" --condition=None

Use the following command to verify that the service account has been
successfully granted the role `roles/aiplatform.serviceAgent`.

In [None]:
!gcloud projects get-iam-policy {SOURCE_DATA_PROJECT_ID} --flatten="bindings[].members" --format='table(bindings.role)' --filter="bindings.members:serviceAccount:{FEATURE_STORE_SERVICE_ACCOUNT}"

# Troubleshooting

*   After installing the Python SDK, you might see an error message containing
    "fail to load proto".

This is due to a mismatch between the protobuf compiler version and the Python
protobuf package version. Run the following commands:

```bash
pip uninstall protobuf python3-protobuf
pip install --upgrade pip

# find your protobuf compiler's version by running "protoc --version" in the terminal
pip install --upgrade protobuf==${YOUR_PROTOC_VERSION}
```

*   If you are unable to access Cloud Storage buckets, run `gcloud auth login`.
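
For the protobuf mismatch described above, it can help to print both versions side by side before reinstalling anything. The following is a small diagnostic sketch with hypothetical helper names. It assumes `protoc --version` prints a line such as `libprotoc 3.14.0`, and it reports `None` for anything it cannot find.

```python
import re
import subprocess


def parse_protoc_version(output):
    """Extract the version from `protoc --version` output, e.g. 'libprotoc 3.14.0'."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def installed_versions():
    """Return (protoc_version, python_protobuf_version); either may be None."""
    try:
        completed = subprocess.run(
            ["protoc", "--version"], capture_output=True, text=True, check=True)
        protoc_version = parse_protoc_version(completed.stdout)
    except (OSError, subprocess.CalledProcessError):
        protoc_version = None  # protoc is not installed or failed to run
    try:
        import google.protobuf
        python_version = google.protobuf.__version__
    except ImportError:
        python_version = None  # the Python protobuf package is not installed
    return protoc_version, python_version


print(installed_versions())
```

If the two values differ, the `pip install --upgrade protobuf==...` command above is the place to supply the protoc version.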