# Tutorial #6: Leverage Online-On-The-Fly Feature Retrieval

1. Run this notebook in a Spark environment
1. Configure feature set for online on-the-fly calculation 
1. Test offline-scenario


# Prerequisites
1. Ensure you have completed the tutorial `5. Enable online store and run online inference`
1. Ensure your feature store is configured with a Redis store

# Note

In this private preview, a Redis configuration is required even though it is not used.

Currently, all requested features must come from the same path: either all materialized or all calculated on-the-fly. A future release will allow mixing Redis-backed and on-the-fly features at the feature set level.

## Configure the Azure Machine Learning spark notebook
In the "Compute" dropdown in the top nav, select "Configure session".

Configure session:

1. Select "configure session" in the bottom nav
1. Select "Upload conda file" <BR>
  a. Select file `featurestore-sample/project/env/conda.yml` from your local device <BR>
  b. (Optional) Increase the session time-out (idle time) to avoid frequent prerequisite reruns

## Set up the root directory for the samples

In [1]:
import os
# Update the path to ./Users/{your-alias} (or any custom directory you uploaded the samples to).
# You can find the name in the directory structure in the left nav.
root_dir = "./Users/ruizh/featurestore_sample"

if os.path.isdir(root_dir):
    print("The folder exists.")
else:
    print("The folder does not exist. Please create or fix the path")


The folder exists.


## Enable Private Preview Flag for using Online Store

In [2]:
# Turn on the private preview flag

import os
os.environ["AZURE_ML_CLI_PRIVATE_FEATURES_ENABLED"] = "True"


## Initialize the project workspace CRUD client
This is the workspace from which you run this tutorial notebook.

In [3]:
import os
from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

project_ws_sub_id = "1aefdc5e-3a7c-4d71-a9f9-f5d3b03be19a"
project_ws_rg = "ruizhMiscRG"
project_ws_name = "ruizhTest"

# Connect to the project workspace
ws_client = MLClient(AzureMLOnBehalfOfCredential(), project_ws_sub_id, project_ws_rg, project_ws_name)


## Initialize the CRUD client of the feature store 

In [4]:
from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

# feature store
featurestore_name = "ruizh-fs-test-1-8-release-3" # use the same name from part #1 of the tutorial
featurestore_subscription_id = "1aefdc5e-3a7c-4d71-a9f9-f5d3b03be19a"
featurestore_resource_group_name = "ruizhmiscrg"

# feature store ml client
fs_client = MLClient(AzureMLOnBehalfOfCredential(), featurestore_subscription_id, featurestore_resource_group_name, featurestore_name)


## Initialize the core FeatureStore client

In [5]:
from azureml.featurestore import FeatureStoreClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

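# Reuse the feature store settings defined in the previous cell instead of repeating the literals
featurestore = FeatureStoreClient(
    credential = AzureMLOnBehalfOfCredential(),
    subscription_id = featurestore_subscription_id,
    resource_group_name = featurestore_resource_group_name,
    name = featurestore_name
)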


Class FeatureStoreClient: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Method feature_stores: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
_AzureMLSparkOnBehalfOfCredential.get_token succeeded
Class MaterializationStore: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


# Step 2: Register the `transactions` feature set with an on-the-fly transformer

In the previous notebooks, we materialized data of the `transactions` feature set to the offline materialization store. In this step, we will:

1. Create a new version of the `transactions` feature-set that leverages a pandas-based transformer
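
The transformer code itself lives in the feature set specification folder and is not shown in this notebook. As a rough illustration of what a pandas-based 7-day and 3-day rolling aggregation could look like, here is a minimal sketch. The function name, the `transactionAmount` column name, and the output feature names are assumptions made for illustration; the actual transformer class and the interface the feature store expects may differ.

```python
import pandas as pd


def rolling_transaction_features(
    df: pd.DataFrame,
    key_col: str = "accountID",             # index key of the `account` entity
    ts_col: str = "timestamp",              # timestamp column named in the spec
    amount_col: str = "transactionAmount",  # assumed source column name
) -> pd.DataFrame:
    """Add 7-day and 3-day rolling count and sum of amounts per account.

    Hypothetical helper: assumes a non-empty frame with no null keys.
    """
    out = df.copy()
    out[ts_col] = pd.to_datetime(out[ts_col])

    parts = []
    for _, group in out.groupby(key_col, sort=False):
        # Time-based windows need the timestamps in ascending order.
        group = group.sort_values(ts_col).copy()
        amounts = group.set_index(ts_col)[amount_col]
        for days in (7, 3):
            # Window covers (t - N days, t], i.e. it includes the current row.
            window = amounts.rolling(f"{days}D")
            group[f"transaction_{days}d_count"] = window.count().to_numpy()
            group[f"transaction_{days}d_sum"] = window.sum().to_numpy()
        parts.append(group)

    return pd.concat(parts, ignore_index=True)


# Small usage example with made-up data
sample = pd.DataFrame(
    {
        "accountID": ["A", "A", "A", "A", "B"],
        "timestamp": [
            "2023-01-01 00:00:00",
            "2023-01-03 00:00:00",
            "2023-01-05 00:00:00",
            "2023-01-09 00:00:00",
            "2023-01-02 00:00:00",
        ],
        "transactionAmount": [10.0, 20.0, 30.0, 40.0, 5.0],
    }
)
features = rolling_transaction_features(sample)
print(features)
```

Because the same pandas logic can run on a single request payload or on a batch, a transformer written this way is a natural fit for on-the-fly calculation, although how the feature store invokes it is outside the scope of this sketch.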

In [42]:
from azure.ai.ml.entities import FeatureSet, FeatureSetSpecification

transactions_featureset_spec_folder = root_dir + "/featurestore/featuresets/transactions_custom_source"

transaction_fset_config = FeatureSet(
    name = "transactions",
    version = "ondemand",
    description = "7-day and 3-day rolling aggregation of transactions featureset",
    entities = ["azureml:account:1"],
    stage = "Development",
    specification = FeatureSetSpecification(path = transactions_featureset_spec_folder),
    tags = {"data_type": "nonPII"}
)

poller = fs_client.feature_sets.begin_create_or_update(transaction_fset_config)
print(poller.result())


Uploading feature_set_spec_custom_source (0.01 MBs): 100%|██████████| 5047/5047 [00:01<00:00, 3968.37it/s]




Readonly attribute name will be ignored in class <class 'azure.ai.ml._restclient.v2023_02_01_preview.models._models_py3.FeaturesetVersion'>
description: 7-day and 3-day rolling aggregation of transactions featureset
entities:
- azureml:account:1
name: transactions
specification:
  path: azureml://subscriptions/1aefdc5e-3a7c-4d71-a9f9-f5d3b03be19a/resourcegroups/ruizhmiscrg/workspaces/ruizh-fs-test-1-8-release-3/datastores/workspaceblobstore/paths/LocalUpload/2fed474364f3f4f3dc9de3de0ed056c9/feature_set_spec_custom_source
stage: Development
tags:
  data_type: nonPII
version: ondemand_8



In [43]:
transactions_fset_config = featurestore.feature_sets.get(name="transactions", version="ondemand")

print(transactions_fset_config)


_AzureMLSparkOnBehalfOfCredential.get_token succeeded


FeatureSet
{
  "name": "'transactions'",
  "version": "'ondemand_8'",
  "specification": "<azure.ai.ml.entities._feature_set.feature_set_specification.FeatureSetSpecification object at 0x7fc8c86c7280>",
  "source": "CustomFeatureSource(timestamp_column: TimestampColumn(Name=timestamp,Format=%Y-%m-%d %H:%M:%S), kwargs: {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}, source_delay: None, source_process_code: SourceProcessCode(Path=./source_process_code,ProcessClass=bar.CustomerTransactionsTransformer))",
  "entities": [
    "FeatureStoreEntity({'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'account', 'description': 'This entity represents user account index key accountID.', 'tags': {'data_typ': 'nonPII'}, 'properties': {}, 'print_as_yaml': True, 'id': None, 'Resource__source_path': None, 'base_path': '/synfs/notebook/119/aml_notebook_mount', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x7fc8c8665be0>, 'version': '1

# Step 3: Test offline scenario locally
In this step, we verify that the offline (Spark) path of the feature set works as expected.

In [44]:
df = transactions_fset_config.to_spark_dataframe()


FeatureSet: transactions, version: ondemand_8, does not have materialization settings..
FeatureSet: transactions, version: ondemand_8 load data from source..




In [45]:
display(df)


SynapseWidget(Synapse.DataFrame, 9ff35f3c-f980-4b35-997d-a351e2a44c4e)

You can see we successfully fetched the training dataframe using the offline path!
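
Beyond inspecting the displayed rows, you may want a few automated sanity checks on a sample of the result. The sketch below is one possible approach, not part of the feature store SDK: `find_feature_frame_issues` is a hypothetical helper that works on a pandas dataframe, and it assumes that each (`accountID`, `timestamp`) pair should appear at most once, which may not hold for every source.

```python
import pandas as pd


def find_feature_frame_issues(pdf, key_cols, ts_col):
    """Return a list of human-readable problems found in a feature sample."""
    required = list(key_cols) + [ts_col]
    missing = [c for c in required if c not in pdf.columns]
    if missing:
        # Without the key and timestamp columns the other checks cannot run.
        return [f"missing columns: {missing}"]

    issues = []
    if pdf.empty:
        issues.append("dataframe is empty")

    null_rows = int(pdf[required].isna().any(axis=1).sum())
    if null_rows:
        issues.append(f"{null_rows} row(s) with null key or timestamp")

    # Assumption: one row per key and timestamp is the expected shape.
    dupes = int(pdf.duplicated(subset=required).sum())
    if dupes:
        issues.append(f"{dupes} duplicate row(s) for the same key and timestamp")

    return issues


# In the notebook, a sample could be pulled from the Spark dataframe with
# standard PySpark calls, for example:
#   sample_pdf = df.limit(1000).toPandas()
#   print(find_feature_frame_issues(sample_pdf, ["accountID"], "timestamp"))
```

An empty list means none of these checks flagged a problem in the sample; it does not prove that the feature values themselves are correct.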