# 07 - Vertex AI > Features - Feature Store

This is a demonstration of [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore/overview). A feature store is a central repository for organizing, storing, and retrieving features. Vertex AI Feature Store is a fully managed service that scales the underlying compute and storage resources. It becomes the central location for serving features to both training and prediction with low latency. Because it stores feature values together with the time they were recorded, it supports:

-  Point-in-time lookups for retrieving features for model training. Retrieving feature values as they were before the time of a prediction prevents data leakage.
-  Manage training-serving skew
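
The point-in-time idea can be sketched in plain Python, without the Feature Store API. This is only an illustration of the concept: the `value_as_of` helper and the sample `history` are hypothetical and are not part of Vertex AI.

```python
from bisect import bisect_right
from datetime import datetime


def value_as_of(history, as_of):
    """Return the latest value whose timestamp is <= as_of, or None."""
    # history is a list of (timestamp, value) tuples sorted by timestamp
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None


# hypothetical history of one feature for one entity
history = [
    (datetime(2021, 9, 1), 3.0),
    (datetime(2021, 9, 5), 7.0),
    (datetime(2021, 9, 9), 12.0),
]

print(value_as_of(history, datetime(2021, 9, 6)))   # value written on 2021-09-05
print(value_as_of(history, datetime(2021, 8, 30)))  # nothing was known yet
```

A training row built for 2021-09-06 only sees the value written on 2021-09-05, never the later one, which is what keeps future information out of the training data.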

**Prerequisites:**

-  01 - BigQuery - Table Data Source
-  Any of 02-05 That Deploy A Model To An Endpoint
   -  Used to demonstrate online predictions with feature store serving features

**Overview:**

-  Create a Feature Store
-  Define an entity type
-  Define features for an entity type
   -  For this demonstration I use metadata from a BigQuery table to define features
      -  project.dataset.INFORMATION_SCHEMA.COLUMNS
-  Search Features
   -  Using FeaturestoreServiceClient.search_features()
-  Import feature values
   -  Using a BigQuery table as the source data
-  Serve feature values
   -  For one entity_id
   -  For multiple entity_id values
   -  Batch Feature request
-  Use online feature serving as input to online prediction with Vertex AI Endpoint

**Resources:**

-  [Python Client for Vertex AI](https://googleapis.dev/python/aiplatform/latest/aiplatform.html)
   -  Currently using the [v1beta1 services](https://googleapis.dev/python/aiplatform/latest/aiplatform_v1beta1/services.html)
-  [Feature Store Overview](https://cloud.google.com/vertex-ai/docs/featurestore/overview)
-  [Data Model and Concepts](https://cloud.google.com/vertex-ai/docs/featurestore/concepts)
-  [Best Practices](https://cloud.google.com/vertex-ai/docs/featurestore/best-practices) including info on composite entity types

**Related Training:**

-  todo

---
## Vertex AI - Conceptual Flow

<img src="architectures/slides/slide_35.png">

---
## Vertex AI - Workflow

<img src="architectures/slides/slide_36.png">

---
## Setup

inputs:

In [299]:
REGION = 'us-central1'
PROJECT_ID='statmike-mlops'
DATANAME = 'digits'
NOTEBOOK = '07'

ENTITYTYPE_ID = 'drawing'

packages:

In [173]:
from google.cloud.aiplatform_v1beta1 import (FeaturestoreOnlineServingServiceClient, FeaturestoreServiceClient, types)
from google.cloud import aiplatform

from google.protobuf.duration_pb2 import Duration
from google.protobuf.timestamp_pb2 import Timestamp
from google.protobuf.field_mask_pb2 import FieldMask

from google.cloud import bigquery
from google.cloud.aiplatform_v1beta1 import (PredictionServiceClient, EndpointServiceClient)
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

clients:

In [174]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

clients = {}
clients['fs'] = FeaturestoreServiceClient(client_options = client_options)
clients['fs_olserve'] = FeaturestoreOnlineServingServiceClient(client_options = client_options)

clients['bq'] = bigquery.Client()

aiplatform.init(project=PROJECT_ID, location=REGION)

parameters:

In [175]:
PARENT = f"projects/{PROJECT_ID}/locations/{REGION}"
DIR = f"temp/{NOTEBOOK}"

environment:

In [176]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Feature Store Data model
Feature Store organizes data with the following 3 important hierarchical concepts:

Featurestore -> EntityType -> Feature

- **Featurestore**: the place to store your features
    - **EntityType**: under a Featurestore, an EntityType describes an object to be modeled, which can be real or virtual.
        - **Feature**: under an EntityType, a feature describes an attribute of the EntityType

For the digits data used in these examples, the feature store is named `digits` (the value of `DATANAME`).  The store has 1 entity type: `drawing`.  The features are the pixel columns and the target values.
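
The hierarchy shows up directly in the resource names returned by the service. As a minimal sketch, the helper below (a hypothetical function, not part of the client library) assembles a feature name in the same shape as the names printed in the outputs of this notebook, using the IDs `digits`, `drawing`, and `p0` that appear there.

```python
def feature_resource_name(project, location, featurestore_id, entity_type_id, feature_id):
    """Build a name following Featurestore -> EntityType -> Feature."""
    return (
        f"projects/{project}/locations/{location}"
        f"/featurestores/{featurestore_id}"
        f"/entityTypes/{entity_type_id}"
        f"/features/{feature_id}"
    )


name = feature_resource_name("statmike-mlops", "us-central1", "digits", "drawing", "p0")
print(name)
```

The service responses in this notebook show the numeric project number in place of the project ID, so a name built this way may differ from a returned name in that one segment.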

---
## Create Feature Store

In [10]:
FEATURESTORE_ID = DATANAME

In [11]:
featurestore_lro = clients['fs'].create_featurestore(
    types.featurestore_service.CreateFeaturestoreRequest(
        parent = PARENT,
        featurestore_id = FEATURESTORE_ID,
        featurestore=types.featurestore.Featurestore(
            display_name=f"Notebook {NOTEBOOK} demonstration of Vertex AI Features (feature store) using {DATANAME} data",
            online_serving_config=types.featurestore.Featurestore.OnlineServingConfig(
                fixed_node_count=2
            ),
        ),
    )
)

In [12]:
featurestore_lro.result()

name: "projects/691911073727/locations/us-central1/featurestores/digits"

Use `get_featurestore` to see details of specified feature store:

In [13]:
clients['fs'].get_featurestore(name=clients['fs'].featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID))

name: "projects/691911073727/locations/us-central1/featurestores/digits"
create_time {
  seconds: 1631808488
  nanos: 193234000
}
update_time {
  seconds: 1631808488
  nanos: 502681000
}
etag: "AMEw9yNDxF_abALl9D9yDmEc9TOeOoFaCwKQEoEh-Zlm_2UNiiZXa2FG65VrSk8aDEqh"
online_serving_config {
  fixed_node_count: 2
}
state: STABLE

Use `list_featurestores` to see details of all feature stores:

In [14]:
clients['fs'].list_featurestores(parent=PARENT)

ListFeaturestoresPager<featurestores {
  name: "projects/691911073727/locations/us-central1/featurestores/digits"
  create_time {
    seconds: 1631808488
    nanos: 193234000
  }
  update_time {
    seconds: 1631808488
    nanos: 502681000
  }
  etag: "AMEw9yNz4KutuBpX5iothDdrViHBUReWh-ud3YLPn5TIz4VdW-nzDTex7NhAM_JDnmKW"
  online_serving_config {
    fixed_node_count: 2
  }
  state: STABLE
}
>

---
## Create Entity Type

In [17]:
entitytype_lro = clients['fs'].create_entity_type(
    types.featurestore_service.CreateEntityTypeRequest(
        parent=clients['fs'].featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID),
        entity_type_id = ENTITYTYPE_ID,
        entity_type=types.entity_type.EntityType(
            description=f"Entity: {ENTITYTYPE_ID}, for data: {DATANAME}",
            monitoring_config=types.featurestore_monitoring.FeaturestoreMonitoringConfig(
                snapshot_analysis=types.featurestore_monitoring.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                    monitoring_interval=Duration(seconds=900),  # 15 minutes
                ),
            ),
        ),
    )
)

In [18]:
entitytype_lro.result()

name: "projects/691911073727/locations/us-central1/featurestores/digits/entityTypes/drawing"
etag: "AMEw9yOEUHP5x4BBKSDXBEpYikMafo8LHHBzMT0Gnm8EnDQCFqyX"

Use `list_entity_types` to see details of all entity types:

In [19]:
clients['fs'].list_entity_types(parent = f"{PARENT}/featurestores/{FEATURESTORE_ID}")

ListEntityTypesPager<entity_types {
  name: "projects/691911073727/locations/us-central1/featurestores/digits/entityTypes/drawing"
  description: "Entity drawing for  data digits"
  create_time {
    seconds: 1631812110
    nanos: 831664000
  }
  update_time {
    seconds: 1631812110
    nanos: 831664000
  }
  etag: "AMEw9yNKJa4P5Pkfgf-N5epF4pLTpAL61bICdz8w1PXmG5D7PhmJfsdmmwdUH-9o_Rip"
  monitoring_config {
    snapshot_analysis {
      monitoring_interval {
        seconds: 86400
      }
    }
  }
}
>

---
## Create Features

Get the schema of the data source for new features:

In [177]:
schema = clients['bq'].query(query = f"SELECT * FROM {DATANAME}.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{DATANAME}'").to_dataframe()

In [178]:
schema

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,is_nullable,data_type,is_generated,generation_expression,is_stored,is_hidden,is_updatable,is_system_defined,is_partitioning_column,clustering_ordinal_position
0,statmike-mlops,digits,digits,p0,1,YES,FLOAT64,NEVER,,,NO,,NO,NO,
1,statmike-mlops,digits,digits,p1,2,YES,FLOAT64,NEVER,,,NO,,NO,NO,
2,statmike-mlops,digits,digits,p2,3,YES,FLOAT64,NEVER,,,NO,,NO,NO,
3,statmike-mlops,digits,digits,p3,4,YES,FLOAT64,NEVER,,,NO,,NO,NO,
4,statmike-mlops,digits,digits,p4,5,YES,FLOAT64,NEVER,,,NO,,NO,NO,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61,statmike-mlops,digits,digits,p61,62,YES,FLOAT64,NEVER,,,NO,,NO,NO,
62,statmike-mlops,digits,digits,p62,63,YES,FLOAT64,NEVER,,,NO,,NO,NO,
63,statmike-mlops,digits,digits,p63,64,YES,FLOAT64,NEVER,,,NO,,NO,NO,
64,statmike-mlops,digits,digits,target,65,YES,INT64,NEVER,,,NO,,NO,NO,


Prepare a request for `batch_create_features`:
- one `CreateFeatureRequest` per column, giving the feature ID, value type, and description

In [23]:
# map BigQuery column types to Feature Store value types
VALUE_TYPES = {
    'STRING': types.feature.Feature.ValueType.STRING,
    'INT64': types.feature.Feature.ValueType.INT64,
    'FLOAT64': types.feature.Feature.ValueType.DOUBLE,
}

REQUESTS = []
for i in range(schema.shape[0]):
    
    # a KeyError here means the column has a type this notebook does not handle
    value_type = VALUE_TYPES[schema['data_type'][i]]
    
    description = f"Column named {schema['column_name'][i]} from BQ Table {PROJECT_ID}.{DATANAME}.{DATANAME}"
    
    REQUESTS.append(
        types.featurestore_service.CreateFeatureRequest(
            feature=types.feature.Feature(
                value_type = value_type,
                description = description,
                # optional, monitoring_config here as override, otherwise it inherits from entity_type
            ),
            feature_id = schema['column_name'][i].lower(),
        )    
    )

In [24]:
batchfeatures = clients['fs'].batch_create_features(
    parent = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
    requests = REQUESTS,
)

In [26]:
#list(item.name for item in batchfeatures.result().features)

---
## Search Features
Search runs across all feature stores and entity types in the location.

To list every feature of a single entity type, use `list_features` instead.

In [27]:
# return the first feature:
list(clients['fs'].search_features(location = PARENT))[0]

name: "projects/691911073727/locations/us-central1/featurestores/digits/entityTypes/drawing/features/p0"
description: "Column named p0 from BQ Table statmike-mlops.digits.digits"
create_time {
  seconds: 1631812646
  nanos: 231977000
}
update_time {
  seconds: 1631812646
  nanos: 231977000
}

In [28]:
# find all features with INT64 value type
list(clients['fs'].search_features(types.featurestore_service.SearchFeaturesRequest(location = PARENT, query = "value_type=INT64")))

[name: "projects/691911073727/locations/us-central1/featurestores/digits/entityTypes/drawing/features/target"
 description: "Column named target from BQ Table statmike-mlops.digits.digits"
 create_time {
   seconds: 1631812646
   nanos: 328113000
 }
 update_time {
   seconds: 1631812646
   nanos: 328113000
 }]

In [30]:
# find all features of the form p*9 with DOUBLE value type
list(clients['fs'].search_features(types.featurestore_service.SearchFeaturesRequest(location = PARENT, query = "feature_id:p*9 AND value_type=DOUBLE")))

[name: "projects/691911073727/locations/us-central1/featurestores/digits/entityTypes/drawing/features/p19"
 description: "Column named p19 from BQ Table statmike-mlops.digits.digits"
 create_time {
   seconds: 1631812646
   nanos: 256765000
 }
 update_time {
   seconds: 1631812646
   nanos: 256765000
 },
 name: "projects/691911073727/locations/us-central1/featurestores/digits/entityTypes/drawing/features/p29"
 description: "Column named p29 from BQ Table statmike-mlops.digits.digits"
 create_time {
   seconds: 1631812646
   nanos: 269066000
 }
 update_time {
   seconds: 1631812646
   nanos: 269066000
 },
 name: "projects/691911073727/locations/us-central1/featurestores/digits/entityTypes/drawing/features/p39"
 description: "Column named p39 from BQ Table statmike-mlops.digits.digits"
 create_time {
   seconds: 1631812646
   nanos: 284866000
 }
 update_time {
   seconds: 1631812646
   nanos: 284866000
 },
 name: "projects/691911073727/locations/us-central1/featurestores/digits/entityTyp

---
## Import Feature Values
- BigQuery (THIS DEMO)
- Avro
- CSV

Prepare a source table with a timestamp column (`update_time`) and a unique ID (`unique_id`) for each entity:

In [170]:
query = f"""
CREATE OR REPLACE TABLE {PROJECT_ID}.{DATANAME}.{DATANAME}_featurestore_import AS
WITH A AS 
    (SELECT GENERATE_UUID() unique_id, 
            target_OE as target_oe, 
            CAST(FLOOR(10*RAND()) AS INT64) daytrick,
            * EXCEPT(target_OE)
    FROM {PROJECT_ID}.{DATANAME}.{DATANAME})
SELECT * EXCEPT(daytrick),
        TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL daytrick DAY) AS update_time
FROM A
"""
bqjob = clients['bq'].query(query = query)

In [171]:
bqjob.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7f5fde797590>

Create Feature specification for each feature in the input source:

In [312]:
FEATURE_SPECS = []
for i in range(schema.shape[0]):
    FEATURE_SPECS.append(types.featurestore_service.ImportFeatureValuesRequest.FeatureSpec(id = schema['column_name'][i].lower()))

In [313]:
import_request = types.featurestore_service.ImportFeatureValuesRequest(
    entity_type = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
    bigquery_source = types.BigQuerySource(input_uri = f'bq://{PROJECT_ID}.{DATANAME}.{DATANAME}_featurestore_import'),
    # feature_time (one timestamp for all rows) is the alternative to feature_time_field, so only one is set
    feature_time_field = "update_time",
    entity_id_field = "unique_id",
    feature_specs = FEATURE_SPECS,
    worker_count = 4,
)

In [314]:
importjob = clients['fs'].import_feature_values(import_request)

In [315]:
importjob.result()

imported_entity_count: 1797
imported_feature_value_count: 118602
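
As a quick sanity check on these counts, and assuming every entity received a value for every feature, the feature value count should be a whole multiple of the entity count. A minimal sketch using the two numbers printed above:

```python
imported_entity_count = 1797
imported_feature_value_count = 118602

# if every entity has a value for every feature, the division is exact
features_per_entity, remainder = divmod(imported_feature_value_count, imported_entity_count)
print(features_per_entity, remainder)
```

The quotient is 66 with no remainder, which matches 64 pixel features plus `target` and `target_oe`.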

---
## Prediction with Feature Store for Serving Features

### Entity IDs
Retrieve a list of entity IDs from the source BigQuery table.  These are in the column `unique_id`.

In [316]:
unique_id = clients['bq'].query(query = f"SELECT unique_id FROM {PROJECT_ID}.{DATANAME}.{DATANAME}_featurestore_import ORDER BY update_time DESC LIMIT 5").to_dataframe()

In [317]:
unique_id['unique_id'][0]

'65c77e9a-79e0-4468-b2aa-19ae55e23b24'

### Data For Prediction: Single Entity Served by Vertex AI > Features (Feature Store)

In [318]:
feature_values = clients['fs_olserve'].read_feature_values(
    types.featurestore_online_service.ReadFeatureValuesRequest(
        entity_type = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
        entity_id = unique_id['unique_id'][0],
        feature_selector = types.FeatureSelector(id_matcher=types.IdMatcher(ids=['*'])),
    )
)

In [319]:
print(list(item.id for item in feature_values.header.feature_descriptors))

['p15', 'p44', 'p42', 'p24', 'p14', 'p49', 'p60', 'p57', 'p62', 'p59', 'p33', 'p16', 'p19', 'p25', 'p8', 'p52', 'p58', 'p13', 'p36', 'p9', 'p63', 'p34', 'p17', 'p35', 'p61', 'p6', 'p46', 'p10', 'target_oe', 'p32', 'p37', 'p40', 'p12', 'p43', 'p41', 'p50', 'p11', 'p53', 'p27', 'p23', 'p56', 'p54', 'p51', 'p2', 'p1', 'p20', 'p7', 'p5', 'p3', 'p31', 'p30', 'p48', 'p18', 'p21', 'p39', 'p22', 'p55', 'p4', 'target', 'p26', 'p0', 'p38', 'p29', 'p28', 'p47', 'p45']


In [320]:
print(list(item.value.double_value for item in feature_values.entity_view.data))

[0.0, 3.0, 13.0, 0.0, 0.0, 0.0, 14.0, 0.0, 0.0, 10.0, 7.0, 0.0, 3.0, 4.0, 0.0, 14.0, 0.0, 15.0, 0.0, 0.0, 0.0, 10.0, 2.0, 0.0, 4.0, 0.0, 2.0, 15.0, 0.0, 0.0, 9.0, 0.0, 13.0, 5.0, 1.0, 7.0, 15.0, 15.0, 0.0, 0.0, 0.0, 0.0, 16.0, 2.0, 0.0, 1.0, 0.0, 3.0, 12.0, 0.0, 8.0, 0.0, 14.0, 12.0, 0.0, 3.0, 0.0, 15.0, 0.0, 8.0, 0.0, 5.0, 8.0, 0.0, 0.0, 15.0]


### Prepare Prediction Request

In [321]:
newob = {}
features = list(item.id for item in feature_values.header.feature_descriptors)
for e, f in enumerate(features):
    if f.startswith('p'):
        newob[f] = feature_values.entity_view.data[e].value.double_value

In [322]:
instances = [json_format.ParseDict(newob, Value())]
parameters = json_format.ParseDict({}, Value())
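
The feature IDs in the response header come back in no particular order, so values have to be paired with IDs by position. The sketch below shows that pairing on plain lists, keeping only the pixel features and sorting them by pixel number. The `pixels_in_order` helper and the sample data are hypothetical, and whether key order matters to the deployed model depends on its serving signature, which is not shown here.

```python
def pixels_in_order(feature_ids, values):
    """Pair ids with values by position, keep p<N> features, sort by N."""
    paired = dict(zip(feature_ids, values))
    pixel_ids = [f for f in paired if f.startswith("p") and f[1:].isdigit()]
    return {f: paired[f] for f in sorted(pixel_ids, key=lambda f: int(f[1:]))}


# hypothetical, shortened stand-ins for the header ids and the entity view values
feature_ids = ["p15", "target_oe", "p2", "target", "p0"]
values = [0.0, 0.0, 13.0, 0.0, 3.0]

ordered = pixels_in_order(feature_ids, values)
print(ordered)
```

Filtering on the `p` prefix works for this dataset because the only other features, `target` and `target_oe`, start with a different letter.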

### Pick An Endpoint
Index `[0]` selects the first endpoint returned by `aiplatform.Endpoint.list()` for this project and region:

In [323]:
aiplatform.Endpoint.list()[0].display_name

'04a_digits_20210916111550'

In [324]:
endpoint = aiplatform.Endpoint(endpoint_name = aiplatform.Endpoint.list()[0].name)

### Get Predictions: Python Client

In [325]:
prediction = endpoint.predict(instances = instances, parameters = parameters)

In [326]:
prediction

Prediction(predictions=[[0.999894142, 2.71968884e-12, 9.62739359e-05, 1.48131505e-07, 3.60981254e-08, 1.13780447e-06, 6.86076929e-09, 7.84659619e-07, 7.17200373e-06, 2.8574371e-07]], deployed_model_id='1563391185488183296', explanations=None)

In [327]:
prediction.predictions[0]

[0.999894142,
 2.71968884e-12,
 9.62739359e-05,
 1.48131505e-07,
 3.60981254e-08,
 1.13780447e-06,
 6.86076929e-09,
 7.84659619e-07,
 7.17200373e-06,
 2.8574371e-07]

In [328]:
np.argmax(prediction.predictions[0])

0

### Get Predictions: REST

In [329]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": [newob]}))

In [330]:
!curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

{
  "predictions": [
    [
      0.999894142,
      2.71968884e-12,
      9.62739359e-05,
      1.48131505e-07,
      3.60981254e-08,
      1.13780447e-06,
      6.86076929e-09,
      7.84659619e-07,
      7.17200373e-06,
      2.8574371e-07
    ]
  ],
  "deployedModelId": "1563391185488183296"
}


### Get Predictions: gcloud (CLI)

In [331]:
!gcloud beta ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
[[0.999894142, 2.71968884e-12, 9.62739359e-05, 1.48131505e-07, 3.60981254e-08, 1.13780447e-06, 6.86076929e-09, 7.84659619e-07, 7.17200373e-06, 2.8574371e-07]]


### Data For Prediction: Multiple Entities Served by Vertex AI > Features (Feature Store)

In [332]:
unique_id['unique_id']

0    65c77e9a-79e0-4468-b2aa-19ae55e23b24
1    5f605984-3aa9-4bc7-884c-75a24336c627
2    2ea71d24-e815-4089-9276-409f3f5ee7e9
3    f9ddabe0-700a-4160-bf73-87fc79763234
4    873fb1f6-3c0c-4d00-8d91-6255854e67dd
Name: unique_id, dtype: object

In [333]:
multi_feature_values = clients['fs_olserve'].streaming_read_feature_values(
    types.featurestore_online_service.StreamingReadFeatureValuesRequest(
        entity_type = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
        entity_ids = unique_id['unique_id'].tolist(),  # pass a plain list of strings rather than a pandas Series
        feature_selector = types.FeatureSelector(id_matcher=types.IdMatcher(ids=['*'])),
    )
)

In [334]:
for i in multi_feature_values:
    print(i.entity_view.entity_id)
    print(list(item.value.double_value for item in i.entity_view.data))


[]
2ea71d24-e815-4089-9276-409f3f5ee7e9
[16.0, 0.0, 0.0, 6.0, 16.0, 0.0, 0.0, 1.0, 0.0, 0.0, 11.0, 1.0, 1.0, 15.0, 15.0, 12.0, 11.0, 12.0, 7.0, 0.0, 0.0, 0.0, 6.0, 6.0, 8.0, 0.0, 0.0, 15.0, 0.0, 15.0, 16.0, 0.0, 4.0, 14.0, 0.0, 16.0, 0.0, 0.0, 0.0, 5.0, 0.0, 5.0, 0.0, 9.0, 0.0, 8.0, 16.0, 16.0, 4.0, 0.0, 0.0, 0.0, 0.0, 16.0, 0.0, 0.0, 5.0, 0.0, 0.0, 8.0, 15.0, 0.0, 0.0, 0.0, 9.0, 1.0]
5f605984-3aa9-4bc7-884c-75a24336c627
[12.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 16.0, 5.0, 15.0, 14.0, 12.0, 15.0, 11.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.0, 4.0, 0.0, 16.0, 0.0, 16.0, 11.0, 0.0, 15.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 16.0, 0.0, 0.0, 15.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 0.0, 0.0, 12.0, 0.0, 0.0, 0.0, 0.0, 0.0, 16.0, 0.0, 0.0, 0.0, 0.0, 0.0, 16.0]
65c77e9a-79e0-4468-b2aa-19ae55e23b24
[10.0, 0.0, 0.0, 4.0, 16.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 1.0, 3.0, 12.0, 15.0, 8.0, 9.0, 0.0, 10.0, 0.0, 0.0, 0.0, 2.0, 5.0, 15.0, 0.0, 0.0, 13.0, 0.0, 15.0, 13.0, 0.0, 3.0, 14.0, 0.0, 8.
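
In the output above, the first streamed message prints an empty entity ID and an empty list, which suggests it carries only the header. Under that assumption, the responses can be folded into one dictionary per entity. This is a sketch with stand-in objects built from `SimpleNamespace` rather than real API responses, and `collect_entities` and `fake` are hypothetical helpers.

```python
from types import SimpleNamespace as NS


def collect_entities(responses):
    """Return {entity_id: {feature_id: value}} from a stream of responses."""
    feature_ids, rows = [], {}
    for r in responses:
        ids = [d.id for d in r.header.feature_descriptors]
        if ids:
            feature_ids = ids  # header message: remember the column order
        if r.entity_view.entity_id:  # skip messages that carry no entity
            values = [d.value.double_value for d in r.entity_view.data]
            rows[r.entity_view.entity_id] = dict(zip(feature_ids, values))
    return rows


def fake(ids=(), entity_id="", values=()):
    """Build a stand-in with the same attribute layout used in the loop above."""
    return NS(
        header=NS(feature_descriptors=[NS(id=i) for i in ids]),
        entity_view=NS(
            entity_id=entity_id,
            data=[NS(value=NS(double_value=v)) for v in values],
        ),
    )


responses = [
    fake(ids=["p0", "p1"]),                   # header only
    fake(entity_id="a", values=[16.0, 0.0]),
    fake(entity_id="b", values=[12.0, 0.0]),
]
rows = collect_entities(responses)
print(rows)
```

Keying by entity ID also avoids relying on response order, which in the output above differs from the order of the requested IDs.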

### Data For Training: Batch (For training or large scale prediction)

In [155]:
# get current timestamp (a protobuf Timestamp holds seconds since the Unix epoch, 1970-01-01 UTC)
timestamp = Timestamp()
timestamp.GetCurrentTime()

# adjust timestamp to 2 days ago: subtract 60*60*24*2 seconds
newtimestamp = Timestamp(seconds = timestamp.seconds - 60*60*24*2, nanos = timestamp.nanos)

batch_request = types.featurestore_service.ExportFeatureValuesRequest(
    entity_type = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
    snapshot_export = types.ExportFeatureValuesRequest.SnapshotExport(snapshot_time = Timestamp(seconds=newtimestamp.seconds)),
    destination = types.FeatureValueDestination(bigquery_destination = types.BigQueryDestination(output_uri = f'bq://{PROJECT_ID}.{DATANAME}.{DATANAME}_fs_training')),
    feature_selector = types.FeatureSelector(id_matcher=types.IdMatcher(ids = ['*']))
)

In [156]:
batchjob = clients['fs'].export_feature_values(batch_request)

In [157]:
batchjob.result()



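By adjusting the `snapshot_time` to 2 days ago, the batch request creates a BigQuery table that has all the original rows, 1 per entity, but the features are null for about 20% of the rows.  This is because the feature values were imported with `feature_time_field = "update_time"`, and `update_time` was set to a random whole number of days, from 0 to 9, before the time the import table was created.  Rows dated 0 or 1 days back are newer than the snapshot, so their values do not exist yet at `snapshot_time`.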
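
The 20% figure can be checked with a little arithmetic. The sketch assumes the export ran less than one day after the import table was created, so that only day offsets 0 and 1 fall after the snapshot. The observed counts are the ones returned by the query that follows.

```python
offsets = range(10)      # possible values of CAST(FLOOR(10*RAND()) AS INT64)
snapshot_days_ago = 2

# offsets strictly below the snapshot age are newer than the snapshot
newer_than_snapshot = [d for d in offsets if d < snapshot_days_ago]
expected_null_fraction = len(newer_than_snapshot) / len(offsets)

# observed counts from the query result: 364 null rows out of 364 + 1433
observed_null_fraction = 364 / (364 + 1433)

print(expected_null_fraction, round(observed_null_fraction, 3))
```

The observed share is close to the expected 2 in 10, with the small difference explained by the random assignment of offsets.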

In [335]:
query = f"""
SELECT CASE WHEN {list(newob.keys())[0]} is not null then False ELSE True END as Null_Rows, count(*) as counts
FROM {PROJECT_ID}.{DATANAME}.{DATANAME}_fs_training
GROUP BY Null_Rows
"""
clients['bq'].query(query = query).to_dataframe()

Unnamed: 0,Null_Rows,counts
0,True,364
1,False,1433


---
## Remove Resources
see notebook "XX - Cleanup"