In [None]:
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# 01. Feature Management: Vertex AI Feature Store and BigQuery

This is the first notebook in a series that builds an end-to-end ML workflow using Vertex AI. In this series we show you how you can start building an MLOps setup on Google Cloud.

### Dataset
1. Download sample dataset from [Kaggle](https://www.kaggle.com/ashishkumarsingh123/telecom-churn-dataset).
2. [Create a new dataset](https://cloud.google.com/bigquery/docs/samples/bigquery-load-table-gcs-csv) in your BigQuery project and load the downloaded data into it as a new table, e.g. `bq://sample-project.ml_sample.telecom_churn`.

### Objective
This tutorial uses the following Vertex AI and Data Analytics services and resources:

- Vertex AI SDK
- Vertex AI Feature Store
- Vertex AI Workbench
- BigQuery

The steps performed include:

- Create a feature store.
- Define a schema for your features.
- Ingest data from BigQuery into your feature store.
- Search for features.
- Export feature values from the feature store to a BigQuery table for training.

### Set up your local development environment

**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets
all the requirements to run this notebook. You can skip this step.

**Otherwise**, make sure your environment meets this notebook's requirements.
You need the following:

* The Google Cloud SDK
* Git
* Python 3
* virtualenv
* Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to [Setting up a Python development
environment](https://cloud.google.com/python/setup) and the [Jupyter
installation guide](https://jupyter.org/install) provide detailed instructions
for meeting these requirements. The following steps provide a condensed set of
instructions:

1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)

1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)

1. [Install
   virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)
   and create a virtual environment that uses Python 3. Activate the virtual environment.

1. To install Jupyter, run `pip3 install jupyter` on the
command-line in a terminal shell.

1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.

1. Open this notebook in the Jupyter Notebook Dashboard.

### Install additional packages

Install the following packages required to execute this notebook. 

In [15]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME") and not os.getenv("VIRTUAL_ENV")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"
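
# BigQuery, pandas, db-dtypes and gcsfs are used later to query data and to write a CSV to Cloud Storage
! pip3 install {USER_FLAG} --upgrade google-cloud-aiplatform google-cloud-bigquery google-cloud-bigquery-storage db-dtypes pandas gcsfs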

---
### Restart the kernel

**Only after** you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [1]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

Check the versions of the packages you installed.
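
One way to check the installed versions is a small standard-library helper. This is a sketch: `installed_versions` is a hypothetical helper name, and the package list simply mirrors the libraries imported later in this notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version}, with None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for pkg, version in installed_versions(
    ["google-cloud-aiplatform", "google-cloud-bigquery", "pandas"]
).items():
    print(f"{pkg}: {version or 'not installed'}")
```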

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

1. [Enable the Vertex AI, Cloud Storage, and Compute Engine APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,storage-component.googleapis.com). 

1. If you are running this notebook locally, you will need to install the [Cloud SDK](https://cloud.google.com/sdk).

1. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

#### Set your project ID

Set your project ID here.

In [16]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

**If you don't know your project ID**, leave the placeholder as is: the next cell tries to get your project ID using `gcloud`.

In [17]:
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

Project ID: erwinh-demo-joonix


In [None]:
! gcloud config set project $PROJECT_ID

#### Region

You can also change the `REGION` variable, which is used for operations
throughout the rest of this notebook. Below are some of the regions that support Vertex AI. It is recommended that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.

Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [19]:
REGION = "[your-region]"  # @param {type: "string"}

if REGION == "[your-region]":
    REGION = "us-central1"

#### UUID

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources you create, generate a UUID for each session and append it to the names of the resources you create in this tutorial. The cell below also records a timestamp, which this notebook appends to the feature store ID.

In [20]:
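import random
import string
from datetime import datetime

UUID = "".join(random.choices(string.ascii_lowercase + string.digits, k=8))  # per-session suffix
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")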
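
To see what appending a suffix looks like, and to catch invalid IDs before calling the API, here is a minimal sketch. The helper names are hypothetical, and the rule encoded in the regex (up to 60 characters from `[a-z0-9_]`, not starting with a digit) is our reading of the `CreateFeaturestoreRequest.featurestore_id` documentation, so double-check it against the current API reference.

```python
import random
import re
import string

# Assumed featurestore/entity type ID rule: 1-60 chars of [a-z0-9_], first char not a digit
FEATURESTORE_ID_PATTERN = re.compile(r"[a-z_][a-z0-9_]{0,59}")


def make_unique_id(prefix: str, suffix_len: int = 8) -> str:
    """Append a random lowercase/digit suffix to `prefix` (hypothetical helper)."""
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=suffix_len))
    return f"{prefix}_{suffix}"


def is_valid_featurestore_id(resource_id: str) -> bool:
    """True if the whole string matches the assumed ID rule."""
    return FEATURESTORE_ID_PATTERN.fullmatch(resource_id) is not None


candidate = make_unique_id("telecom_churn")
print(candidate, is_valid_featurestore_id(candidate))
```

A timestamp suffix, as used for `FEATURESTORE_ID_TO_CREATE` later on, passes the same check because it only adds digits after a letter prefix.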

### Authenticate your Google Cloud account

**If you are using Vertex AI Workbench Notebooks**, your environment is already
authenticated. Skip this step.

**If you are using Colab**, run the cell below and follow the instructions
when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

1. In the Cloud Console, go to the [**Create service account key**
   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).

2. Click **Create service account**.

3. In the **Service account name** field, enter a name, and
   click **Create**.

4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type "Vertex AI" into the filter box, and select **Vertex AI Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.

5. Click **Create**. A JSON file that contains your key downloads to your local environment.

6. Enter the path to your service account key as the
`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell.

In [13]:
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

import os
import sys

# If on Vertex AI Workbench, then don't execute this code
IS_COLAB = "google.colab" in sys.modules
if not os.path.exists("/opt/deeplearning/metadata/env_version") and not os.getenv(
    "DL_ANACONDA_HOME"
):
    if IS_COLAB:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

When we fetch a batch of training data from the feature store, we need a list of the entity IDs (each with a timestamp) whose feature values we want to extract. This list is stored as a CSV file in Cloud Storage.

Set the name of your Cloud Storage bucket below. It must be unique across all
Cloud Storage buckets.
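
Bucket names also have to follow the Cloud Storage naming rules. Below is a partial local check, assuming a name without dots: it covers length (3 to 63 characters), allowed characters, first and last characters, and the `goog` prefix, but not global uniqueness or the rule against names containing "google". The helper name is hypothetical.

```python
import re

# Dot-free names: lowercase letters, digits, '-' and '_'; 3-63 chars;
# must start and end with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Partial check of the Cloud Storage naming rules (dot-free names only)."""
    return _BUCKET_NAME.fullmatch(name) is not None and not name.startswith("goog")


print(looks_like_valid_bucket_name("my-project-aip-ab12cd34"))  # True
print(looks_like_valid_bucket_name("My_Bucket"))  # False (uppercase)
```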

In [None]:
BUCKET_NAME = "[your-bucket-name]"  # @param {type:"string"}
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [None]:
if BUCKET_NAME == "" or BUCKET_NAME is None or BUCKET_NAME == "[your-bucket-name]":
    BUCKET_NAME = PROJECT_ID + "-aip-" + UUID
    BUCKET_URI = f"gs://{BUCKET_NAME}"

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_URI

### Import libraries

In [22]:
import uuid
import os
import pandas as pd
import google.auth

import google.cloud.aiplatform as vertex
from google.cloud import bigquery
from google.cloud import bigquery_storage

from google.cloud.aiplatform_v1beta1 import (
    FeaturestoreOnlineServingServiceClient, FeaturestoreServiceClient)
from google.cloud.aiplatform_v1beta1.types import FeatureSelector, IdMatcher
from google.cloud.aiplatform_v1beta1.types import \
    entity_type as entity_type_pb2
from google.cloud.aiplatform_v1beta1.types import feature as feature_pb2
from google.cloud.aiplatform_v1beta1.types import \
    featurestore as featurestore_pb2
from google.cloud.aiplatform_v1beta1.types import \
    featurestore_monitoring as featurestore_monitoring_pb2
from google.cloud.aiplatform_v1beta1.types import \
    featurestore_online_service as featurestore_online_service_pb2
from google.cloud.aiplatform_v1beta1.types import \
    featurestore_service as featurestore_service_pb2
from google.cloud.aiplatform_v1beta1.types import io as io_pb2
from google.protobuf.duration_pb2 import Duration

In [6]:
# Regional Vertex AI endpoint; it must match REGION
API_ENDPOINT = f"{REGION}-aiplatform.googleapis.com"

### Create Feature Store

Now it's time to create a [feature store](https://cloud.google.com/vertex-ai/docs/featurestore). The method to create a feature store returns a long-running operation (LRO). An LRO starts an asynchronous job. LROs are also returned by other API methods, such as updating or deleting a feature store. Calling `create_lro.result()` waits for the LRO to complete.
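
An LRO handle behaves much like a Python future: the call returns immediately, `done()` reports progress without blocking, and `result()` blocks until the job finishes (the operation objects returned by the Google client libraries expose `done()` and `result(timeout=...)` in the same way). As an analogy only, and without calling Vertex AI, the standard library's `concurrent.futures` shows the pattern; `slow_create` is a made-up stand-in for server-side work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_create(resource_id: str) -> str:
    """Stand-in for slow server-side work such as provisioning a feature store."""
    time.sleep(0.1)
    return f"created {resource_id}"


with ThreadPoolExecutor(max_workers=1) as executor:
    # Starting the job returns a handle right away, like create_featurestore()
    lro_like = executor.submit(slow_create, "telecom_churn_demo")
    print("done yet?", lro_like.done())  # non-blocking status check
    # result() blocks until the job finishes; the timeout guards against hanging
    print(lro_like.result(timeout=30))
```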

In [23]:
# Create admin_client for CRUD and data_client for reading feature values.
admin_client = FeaturestoreServiceClient(client_options={"api_endpoint": API_ENDPOINT})
data_client = FeaturestoreOnlineServingServiceClient(
    client_options={"api_endpoint": API_ENDPOINT}
)

In [24]:
# Represents featurestore resource path.
BASE_RESOURCE_PATH = admin_client.common_location_path(PROJECT_ID, REGION)
FEATURESTORE_ID_TO_CREATE = "telecom_churn_{timestamp}".format(timestamp=TIMESTAMP)

create_lro = admin_client.create_featurestore(
    featurestore_service_pb2.CreateFeaturestoreRequest(
        parent=BASE_RESOURCE_PATH,
        featurestore_id=FEATURESTORE_ID_TO_CREATE,
        featurestore=featurestore_pb2.Featurestore(
            #display_name="Featurestore for telco churn prediction",
            online_serving_config=featurestore_pb2.Featurestore.OnlineServingConfig(
                fixed_node_count=1,  # fixed number of online serving nodes; autoscaling is also an option
            ),
        ),
    )
)
# Wait for LRO to finish and get the LRO result.
print(f"your feature store: {create_lro.result()}")

your feature store: name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551"



You can use `get_featurestore` or `list_featurestores` (the GetFeaturestore and ListFeaturestores methods) to check whether the feature store was created successfully. The following example gets the details of the feature store. You can also check in the [Google Cloud Console](https://console.cloud.google.com/vertex-ai/features) that the feature store was created.

In [25]:
admin_client.get_featurestore(
    name=admin_client.featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID_TO_CREATE)
)

name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551"
create_time {
  seconds: 1661406610
  nanos: 841574000
}
update_time {
  seconds: 1661406639
  nanos: 9126000
}
etag: "AMEw9yO0DLnWc66ACqoSh_DbVsCazBo817pjgj2Ew2dhAAG94XTv957KVmBcLoKmohc="
online_serving_config {
  fixed_node_count: 1
}
state: STABLE
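
The `name` fields in these outputs follow a fixed hierarchy: project, location, feature store, entity type, feature. The client's `featurestore_path` and `entity_type_path` helpers build these strings for you; the plain-Python functions below are local re-implementations for illustration only. Note that the API reports the project *number* in `name`, as the output above shows, even though we passed the project ID.

```python
def featurestore_name(project: str, location: str, featurestore: str) -> str:
    return f"projects/{project}/locations/{location}/featurestores/{featurestore}"


def entity_type_name(project: str, location: str, featurestore: str, entity_type: str) -> str:
    return f"{featurestore_name(project, location, featurestore)}/entityTypes/{entity_type}"


def feature_name(project, location, featurestore, entity_type, feature) -> str:
    return f"{entity_type_name(project, location, featurestore, entity_type)}/features/{feature}"


print(feature_name("my-project", "us-central1", "telecom_churn_demo", "users", "arpu_m1"))
```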

#### Create Entity Type
Next, we create an entity type `users`. You can also specify a monitoring config, which by default is inherited by all features under this entity type. Here we enable monitoring and set the snapshot interval to once a day.
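
The monitoring interval is passed as a protobuf `Duration` in whole seconds, so one day is 86,400 seconds. Deriving the number from a `timedelta` avoids hard-coding it; `Duration` also provides a `FromTimedelta` method if you prefer to convert directly. A small sketch (the constant names are our own):

```python
from datetime import timedelta

# Whole seconds for Duration(seconds=...), derived instead of hard-coded
DAILY_SECONDS = int(timedelta(days=1).total_seconds())
WEEKLY_SECONDS = int(timedelta(weeks=1).total_seconds())
print(DAILY_SECONDS, WEEKLY_SECONDS)  # 86400 604800
```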

In [26]:
# Create users entity type with monitoring enabled.
# All Features belonging to this EntityType will by default inherit the monitoring config.
users_entity_type_lro = admin_client.create_entity_type(
    featurestore_service_pb2.CreateEntityTypeRequest(
        parent=admin_client.featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID_TO_CREATE),
        entity_type_id="users",
        entity_type=entity_type_pb2.EntityType(
            description="Users entity",
            monitoring_config=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                snapshot_analysis=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                    monitoring_interval=Duration(seconds=86400),  # 1 day
                ),
            ),
        ),
    )
)

# Similarly, wait for EntityType creation operation to finish.
print(users_entity_type_lro.result())

name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users"



#### Create features for the 'users' entity
Now we create the features for the `users` entity type. They are placeholders: they define the schema, and values are ingested later.
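
The cell below repeats the same monitoring config for every feature. A sketch of a more compact alternative keeps the feature definitions in a plain table and maps them to request objects with a factory function. `FeatureDef` and `build_requests` are hypothetical names, and the offline demo builds dicts instead of real requests; the commented factory at the end shows how it might look with the notebook's own `featurestore_service_pb2` and `feature_pb2` imports, but it is not run here.

```python
from typing import Callable, List, NamedTuple


class FeatureDef(NamedTuple):
    # Plain description of one feature (hypothetical helper type)
    feature_id: str
    value_type: str  # name of a Feature.ValueType member, e.g. "DOUBLE"
    description: str


USERS_FEATURES = [
    FeatureDef("mobile_number", "INT64", "mobile_number"),
    FeatureDef("arpu_m1", "DOUBLE", "average revenue per user on first month"),
    FeatureDef("arpu_m2", "DOUBLE", "average revenue per user on second month"),
    FeatureDef("arpu_m3", "DOUBLE", "average revenue per user on third month"),
    FeatureDef("arpu_m4", "DOUBLE", "average revenue per user on fourth month"),
    FeatureDef("mou_m1", "DOUBLE", "Minutes of usage - voice calls on first month"),
    FeatureDef("mou_m2", "DOUBLE", "Minutes of usage - voice calls on second month"),
    FeatureDef("mou_m3", "DOUBLE", "Minutes of usage - voice calls on third month"),
    FeatureDef("is_churn", "BOOL", "whether the user churned in the fourth month"),
]


def build_requests(specs: List[FeatureDef], make_request: Callable[[FeatureDef], object]) -> list:
    """Turn each FeatureDef into a request via `make_request`; reject duplicate IDs."""
    ids = [spec.feature_id for spec in specs]
    if len(ids) != len(set(ids)):
        raise ValueError(f"duplicate feature_id in {ids}")
    return [make_request(spec) for spec in specs]


# Offline demo: plain dicts stand in for CreateFeatureRequest messages
demo = build_requests(
    USERS_FEATURES, lambda s: {"feature_id": s.feature_id, "value_type": s.value_type}
)
print([r["feature_id"] for r in demo])

# In the notebook, make_request could return something like (not run here):
# featurestore_service_pb2.CreateFeatureRequest(
#     feature_id=s.feature_id,
#     feature=feature_pb2.Feature(
#         value_type=feature_pb2.Feature.ValueType[s.value_type],
#         description=s.description,
#         monitoring_config=...,  # same snapshot-analysis config as in the cell below
#     ),
# )
```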

In [27]:
# Create features for the 'users' entity.
admin_client.batch_create_features(
    parent=admin_client.entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID_TO_CREATE, "users"),
    requests=[
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.INT64,
                description="mobile_number",
                monitoring_config=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                    snapshot_analysis=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                        disabled=False,
                    ),
                ),
            ),
            feature_id="mobile_number",
        ),
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.DOUBLE,
                description="average revenue per user on first month",
                monitoring_config=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                    snapshot_analysis=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                        disabled=False,
                    ),
                ),
            ),
            feature_id="arpu_m1",
        ),
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.DOUBLE,
                description="average revenue per user on second month",
                monitoring_config=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                    snapshot_analysis=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                        disabled=False,
                    ),
                ),
            ),
            feature_id="arpu_m2",
        ),
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.DOUBLE,
                description="average revenue per user on third month",
                monitoring_config=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                    snapshot_analysis=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                        disabled=False,
                    ),
                ),
            ),
            feature_id="arpu_m3",
        ),
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.DOUBLE,
                description="average revenue per user on fourth month",
                monitoring_config=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                    snapshot_analysis=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                        disabled=False,
                    ),
                ),
            ),
            feature_id="arpu_m4",
        ),
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.DOUBLE,
                description="Minutes of usage - voice calls on first month",
                monitoring_config=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                    snapshot_analysis=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                        disabled=False,
                    ),
                ),
            ),
            feature_id="mou_m1",
        ),
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.DOUBLE,
                description="Minutes of usage - voice calls on second month",
                monitoring_config=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                    snapshot_analysis=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                        disabled=False,
                    ),
                ),
            ),
            feature_id="mou_m2",
        ),
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.DOUBLE,
                description="Minutes of usage - voice calls on third month",
                monitoring_config=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                    snapshot_analysis=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                        disabled=False,
                    ),
                ),
            ),
            feature_id="mou_m3",
        ),
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.BOOL,
                description="Whether the user churned in the fourth month, i.e. average revenue per user in the fourth month <= 0",
                monitoring_config=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig(
                    snapshot_analysis=featurestore_monitoring_pb2.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                        disabled=False,
                    ),
                ),
            ),
            feature_id="is_churn",
        ),
    ],
).result()

features {
  name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/features/mobile_number"
}
features {
  name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/features/arpu_m1"
}
features {
  name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/features/arpu_m2"
}
features {
  name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/features/arpu_m3"
}
features {
  name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/features/arpu_m4"
}
features {
  name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/features/mou_m1"
}
features {
  name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/

#### Search created features
While the [ListFeatures](https://cloud.google.com/vertex-ai/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.FeaturestoreService.ListFeatures) method allows you to easily view all features of a single entity type, the [SearchFeatures](https://cloud.google.com/vertex-ai/docs/reference/rpc/google.cloud.aiplatform.v1beta1#google.cloud.aiplatform.v1beta1.FeaturestoreService.SearchFeatures) method searches across all featurestores and entity types in a given location (such as us-central1). This can help you discover features that were created by someone else.

You can query based on feature properties including feature ID, entity type ID, and feature description. You can also limit results by filtering on a specific featurestore, feature value type, and/or labels.
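
The `query` string combines field-restricted queries (`FIELD: TEXT`, for fields such as `feature_id` and `description`) and filters (`FIELD=VALUE`, for example `value_type`) with `AND`. A hypothetical helper that composes such a string is sketched below; the syntax is our reading of the `SearchFeaturesRequest.query` reference, so verify it there, and note that multi-word text would need quoting, which this sketch does not handle.

```python
def build_search_query(value_type=None, feature_id_contains=None, description_contains=None):
    """Compose a SearchFeatures query string from optional single-word parts."""
    clauses = []
    if feature_id_contains:
        clauses.append(f"feature_id: {feature_id_contains}")
    if description_contains:
        clauses.append(f"description: {description_contains}")
    if value_type:
        clauses.append(f"value_type={value_type}")
    return " AND ".join(clauses)


print(build_search_query(value_type="DOUBLE", feature_id_contains="arpu"))
# feature_id: arpu AND value_type=DOUBLE
```

The result can be passed as `query=` to `SearchFeaturesRequest`, exactly like the literal `"value_type=INT64"` in the cell below.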

In [33]:
# Filter on feature value type.
list(
    admin_client.search_features(
        featurestore_service_pb2.SearchFeaturesRequest(
            location=BASE_RESOURCE_PATH, query="value_type=INT64"
        )
    )
)

[name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/features/mobile_number"
 description: "mobile_number"
 value_type: INT64
 create_time {
   seconds: 1661406930
   nanos: 322020000
 }
 update_time {
   seconds: 1661406930
   nanos: 322020000
 }]

### Data ingestion
Now it's time for us to ingest the data into the feature store that we just created. 

In [42]:
BQ_RAW_DATA = "[your-bq-table]"  # @param {type: "string"} The BigQuery table you created earlier, e.g. bq://<your-project>.ml_sample.telecom_churn
if BQ_RAW_DATA == "[your-bq-table]":
    BQ_RAW_DATA = f"bq://{PROJECT_ID}.ml_sample.telecom_churn"

# BigQuery table that will hold the prepared feature values for ingestion
FEATURE_DATASET = f"bq://{PROJECT_ID}.ml_sample.features"

FEATURE_DATASET

'bq://erwinh-demo-joonix.ml_sample.features'

Since we are not using all of the data from our initial dataset, we first run a query that selects the columns we need, derives the `is_churn` label, and writes the result to a new table. We then ingest this table into our feature store.
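
The query in the next cell keeps three months of revenue and usage, derives the churn label from the fourth month's revenue (`arpu_9 <= 0`), and casts the phone number to a string so it can serve as the entity ID. The same transformation for a single row, written in plain Python as an illustration; the function name and sample values are made up, not taken from the dataset.

```python
from typing import Optional


def to_feature_row(raw: dict, update_time: str) -> dict:
    """Python mirror of the SELECT statement for one raw row (illustration only)."""
    arpu_9: Optional[float] = raw["arpu_9"]
    return {
        "mobile_number": str(raw["mobile_number"]),  # entity IDs as strings
        "arpu_6": raw["arpu_6"],
        "arpu_7": raw["arpu_7"],
        "arpu_8": raw["arpu_8"],
        # Churn label: no revenue in the fourth month. In SQL, NULL <= 0 is NULL,
        # so a missing arpu_9 yields None here instead of raising a TypeError.
        "is_churn": None if arpu_9 is None else arpu_9 <= 0,
        "onnet_mou_6": raw["onnet_mou_6"],
        "onnet_mou_7": raw["onnet_mou_7"],
        "onnet_mou_8": raw["onnet_mou_8"],
        "update_time": update_time,  # CURRENT_TIMESTAMP() in the SQL
    }


example_raw = {  # made-up values
    "mobile_number": 1000000001, "arpu_6": 120.5, "arpu_7": 98.0, "arpu_8": 40.2,
    "arpu_9": 0.0, "onnet_mou_6": 30.1, "onnet_mou_7": 12.0, "onnet_mou_8": 0.0,
}
print(to_feature_row(example_raw, "2022-08-25T05:35:51Z")["is_churn"])  # True
```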

In [39]:
client = bigquery.Client(PROJECT_ID)
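# Write the query result to FEATURE_DATASET (without the bq:// prefix),
# overwriting the table if it already exists so the cell can be re-run.
job_config = bigquery.QueryJobConfig(
    destination=FEATURE_DATASET.split("/")[-1],
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

sql = """
    SELECT CAST(mobile_number AS STRING) AS mobile_number,  -- entity IDs as strings
        arpu_6, arpu_7, arpu_8, arpu_9 <= 0 AS is_churn,  -- churned: no revenue in month 4
        onnet_mou_6, onnet_mou_7, onnet_mou_8,
        CURRENT_TIMESTAMP() AS update_time  -- feature timestamp used at ingestion
    FROM `{}`
""".format(BQ_RAW_DATA.split("/")[-1])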

query_job = client.query(sql, job_config=job_config)  # Make an API request.
query_job.result()  

<google.cloud.bigquery.table.RowIterator at 0x7f2d6506bd50>

#### Ingest data into the feature store
First we re-create the `admin_client` for CRUD operations and the `data_client` for reading feature values. After that we ingest the data, using `mobile_number` as the entity ID.

In [40]:
# Create admin_client for CRUD and data_client for reading feature values.
admin_client = FeaturestoreServiceClient(client_options={"api_endpoint": API_ENDPOINT})
data_client = FeaturestoreOnlineServingServiceClient(
    client_options={"api_endpoint": API_ENDPOINT}
)

BASE_RESOURCE_PATH = admin_client.common_location_path(PROJECT_ID, REGION)

Now it's time to ingest the features. This might take some time. Please do not continue until this job finishes. 
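
Because ingestion can take a while, a cheap pre-flight check can catch typos in the column-to-feature mapping before the job starts. A sketch with a hypothetical helper: the column set is copied from the query above and the feature IDs from the `batch_create_features` call. In the notebook you could instead read the columns from `client.get_table(...).schema`, where each `SchemaField` has a `.name`.

```python
def check_feature_mapping(mapping: dict, source_columns, created_feature_ids) -> list:
    """Return a list of problems with a {feature_id: source_column} mapping."""
    problems = []
    for feature_id, column in mapping.items():
        if column not in source_columns:
            problems.append(f"source column {column!r} not in table")
        if feature_id not in created_feature_ids:
            problems.append(f"feature {feature_id!r} was not created")
    return problems


MAPPING = {
    "arpu_m1": "arpu_6", "arpu_m2": "arpu_7", "arpu_m3": "arpu_8",
    "is_churn": "is_churn",
    "mou_m1": "onnet_mou_6", "mou_m2": "onnet_mou_7", "mou_m3": "onnet_mou_8",
}
SOURCE_COLUMNS = {
    "mobile_number", "arpu_6", "arpu_7", "arpu_8", "is_churn",
    "onnet_mou_6", "onnet_mou_7", "onnet_mou_8", "update_time",
}
CREATED_FEATURES = {
    "mobile_number", "arpu_m1", "arpu_m2", "arpu_m3", "arpu_m4",
    "mou_m1", "mou_m2", "mou_m3", "is_churn",
}
print(check_feature_mapping(MAPPING, SOURCE_COLUMNS, CREATED_FEATURES))  # []
```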

In [43]:
# Import request: BigQuery source table, entity ID column, and source-column-to-feature mapping.
import_users_request = featurestore_service_pb2.ImportFeatureValuesRequest(
    entity_type=admin_client.entity_type_path(
        PROJECT_ID, REGION, FEATURESTORE_ID_TO_CREATE, "users"
    ),
    bigquery_source=io_pb2.BigQuerySource(
        # Source
        input_uri=FEATURE_DATASET
    ),
    entity_id_field="mobile_number",
    feature_specs=[
        # Features
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(id="arpu_m1", source_field="arpu_6"),
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(id="arpu_m2", source_field="arpu_7"),
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(id="arpu_m3", source_field="arpu_8"),
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(id="is_churn", source_field="is_churn"),
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(id="mou_m1", source_field="onnet_mou_6"),
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(id="mou_m2", source_field="onnet_mou_7"),
        featurestore_service_pb2.ImportFeatureValuesRequest.FeatureSpec(id="mou_m3", source_field="onnet_mou_8"),
    ],
    feature_time_field="update_time",
    worker_count=10,
)
ingestion_lro = admin_client.import_feature_values(import_users_request)
ingestion_lro.result()

imported_entity_count: 99999
imported_feature_value_count: 686819

In [44]:
# Search for all features across all featurestores in this location.
list(admin_client.search_features(location=BASE_RESOURCE_PATH))

[name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/features/arpu_m1"
 description: "average revenue per user on first month"
 value_type: DOUBLE
 create_time {
   seconds: 1661406930
   nanos: 323306000
 }
 update_time {
   seconds: 1661406930
   nanos: 323306000
 },
 name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/features/arpu_m2"
 description: "average revenue per user on second month"
 value_type: DOUBLE
 create_time {
   seconds: 1661406930
   nanos: 324542000
 }
 update_time {
   seconds: 1661406930
   nanos: 324542000
 },
 name: "projects/429963084013/locations/us-central1/featurestores/telecom_churn_20220825053551/entityTypes/users/features/arpu_m3"
 description: "average revenue per user on third month"
 value_type: DOUBLE
 create_time {
   seconds: 1661406930
   nanos: 325645000
 }
 update_time {
   seconds: 1661406930
   nanos: 325645000
 },
 name: "p

### Extract data from the Feature Store
The next step is for us to create a dataset in BigQuery that we can use for training. To read data from the feature store we need a `read-instance list` that contains information for each training example. It lists observations at a particular point in time. This can be either a CSV file or a BigQuery table. The list must include the following information:
* Timestamps: the times at which labels were observed or measured. The timestamps are required so that Vertex AI Feature Store can perform a point-in-time lookup.
* Entity IDs: one or more IDs of the entities that correspond to the label.
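
A point-in-time lookup returns, for each read instance, the latest feature value whose timestamp is at or before the read timestamp, so training rows never see values from the future. A minimal pure-Python sketch of that rule for one entity and one feature, using binary search over a sorted history; the function name and sample values are invented for illustration, and this is not how Vertex AI implements it internally.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_value(history, read_time):
    """Return the latest value with timestamp <= read_time, or None.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    times = [t for t, _ in history]
    idx = bisect_right(times, read_time)  # number of entries at or before read_time
    return history[idx - 1][1] if idx else None


arpu_history = [  # made-up values
    (datetime(2022, 6, 1), 10.0),
    (datetime(2022, 7, 1), 12.5),
    (datetime(2022, 8, 1), 0.0),
]
print(point_in_time_value(arpu_history, datetime(2022, 7, 15)))  # 12.5
print(point_in_time_value(arpu_history, datetime(2022, 5, 1)))  # None
```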

In [45]:
bqclient = bigquery.Client()
bqstorageclient = bigquery_storage.BigQueryReadClient()
query_string = """
SELECT
    mobile_number
FROM `{}`
""".format(BQ_RAW_DATA.split('/')[-1])

user_df = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
)

X_train = user_df["mobile_number"]

In [50]:
TRAINING_DATA_TABLE = f"{PROJECT_ID}.ml_sample.test" # @param {type:"string"}
FEATURESTORE_ID = "" # @param {type:"string"} fill in or leave empty if you just created the feature store
if FEATURESTORE_ID == "":
    FEATURESTORE_ID = FEATURESTORE_ID_TO_CREATE
# Must be a gs:// URI; writing it with pandas requires the gcsfs package
TRAINING_DATA_SELECTOR_LOC = BUCKET_URI + "/dataset/query_instance_2.csv"
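
Before running the batch read, it helps to see the shape of the read-instance file. As we understand the `BatchReadFeatureValuesRequest` reference, the CSV header lists the entity type IDs followed by `timestamp`, and timestamps must be millisecond-aligned RFC 3339 values. A sketch that builds such a CSV in memory with the standard library; `read_instances_csv` is a hypothetical helper and the IDs in the usage example are made up.

```python
import csv
import io
from datetime import datetime, timezone


def read_instances_csv(entity_ids, entity_type_id: str, read_time: datetime) -> str:
    """Build read-instance CSV text with header `<entity_type_id>,timestamp`.

    `read_time` should be timezone-aware; it is converted to UTC and
    truncated to milliseconds.
    """
    utc = read_time.astimezone(timezone.utc)
    ts = utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([entity_type_id, "timestamp"])
    for entity_id in entity_ids:
        writer.writerow([str(entity_id), ts])
    return buf.getvalue()


print(read_instances_csv(
    ["1000000001", 1000000002], "users",
    datetime(2022, 8, 25, 6, 0, 0, 123456, tzinfo=timezone.utc),
))
```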

In [51]:
from datetime import datetime, timezone

now = datetime.now(timezone.utc)  # the trailing "Z" below means UTC
current_time = now.strftime("%Y-%m-%dT%H:%M:%SZ")
res = pd.DataFrame()
res['users']  = X_train
res['timestamp'] = current_time
res.to_csv(TRAINING_DATA_SELECTOR_LOC, index=False)
admin_client = FeaturestoreServiceClient(client_options={"api_endpoint": API_ENDPOINT})
batch_serving_request = featurestore_service_pb2.BatchReadFeatureValuesRequest(
    # featurestore info
    featurestore=admin_client.featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID),
    # Read instances (entity IDs and read timestamps) stored in Cloud Storage.
    csv_read_instances=io_pb2.CsvSource(
        gcs_source=io_pb2.GcsSource(uris=[TRAINING_DATA_SELECTOR_LOC])
    ),
    destination=featurestore_service_pb2.FeatureValueDestination(
        bigquery_destination=io_pb2.BigQueryDestination(
            # Output table: per the API reference, the dataset must exist and the table must not
            output_uri='bq://'+TRAINING_DATA_TABLE
        )
    ),
    entity_type_specs=[
        featurestore_service_pb2.BatchReadFeatureValuesRequest.EntityTypeSpec(
            entity_type_id="users",
            feature_selector=FeatureSelector(
                id_matcher=IdMatcher(
                    ids=[
                        # features, use "*" if you want to select all features within this entity type
                        "mou_m1",
                        "mou_m2",
                        "mou_m3",
                        "arpu_m1",
                        "arpu_m2",
                        "arpu_m3",
                        "is_churn"
                    ]
                )
            ),
        ),
    ],
)
batch_serving_lro = admin_client.batch_read_feature_values(batch_serving_request)
batch_serving_lro.result()



## Cleaning up

**Please do not clean up the resources if you want to run the next notebook**

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

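Otherwise, delete the individual resources you created:

* [Delete the Cloud Storage bucket](https://cloud.google.com/storage/docs/deleting-buckets).
* Delete the BigQuery tables created in this notebook (`ml_sample.features` and the training data table).
* Delete your feature store by running the following cell: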

In [14]:
admin_client.delete_featurestore(
    request=featurestore_service_pb2.DeleteFeaturestoreRequest(
        name=admin_client.featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID),
        force=True,
    )
).result()

