# Feature Definition and Registration
This notebook contains all the **feature definitions and feature registrations** for the Home Credit Risk Default use case.

In [19]:
%pip install feast-azure-provider
%pip install azure-cli
%pip install snowflake-connector-python==2.7.4
%pip install pyarrow==6.0.1
%pip install lightgbm
%pip install mlflow


Note: you may need to restart the kernel to use updated packages.


#### Set Environment Variables

Fetch the application service principal secret from Azure Key Vault and set it as an environment variable.

Feast uses this secret during feature registration to write the registry to the Azure Blob container.

In [1]:
import os

from azureml.core import Workspace

# Load the workspace from the local config.json and get its default Key Vault.
ws = Workspace.from_config()
keyvault = ws.get_default_keyvault()

In [2]:
import os
os.environ["REGISTRY_BLOB_KEY"] = keyvault.get_secret("registrytoken")
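
Because the registry credential is passed through the environment, it can help to fail early when the variable is missing or empty rather than hit an authentication error later. A minimal sketch, assuming the variable name `REGISTRY_BLOB_KEY` used in this notebook; the helper `require_env` is hypothetical and is not part of Feast or Azure ML:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

Calling `require_env("REGISTRY_BLOB_KEY")` right after the cell above would surface a missing secret before any Feast call is made.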

#### Entity, Features, Feature View and Feature Service Definition

For the "Home Credit Risk Default" modeling use case:
- use Snowflake tables as the source of feature values
- register the entity, feature views and feature services

The Feast infrastructure configuration file (feature_store.yaml) is written to the /tmp folder, which serves as the Feast feature repo. After a successful registration, the registry.db file can be found in the blob storage container specified in the YAML file.

In [3]:
# Configuration
repo_path = "/tmp"  # Feast feature repo path
os.makedirs(repo_path, exist_ok=True)

In [4]:
path = "<REGISTRY_BLOB_STORAGE>"  # e.g. https://<STORAGE_NAME>.blob.core.windows.net/featurestore/registry.db
account = "feast-dev"
sf_feature_store_config="""
project: eh_credit_01
registry: 
    registry_store_type: feast_azure_provider.registry_store.AzBlobRegistryStore
    path: {}   
provider: feast_azure_provider.azure_provider.AzureProvider
offline_store:
    type: snowflake.offline
    account: {}
    user: evan_hou
    password: {}
    role: DSA_USER_ROLE
    warehouse: COMMON_WH
    database: TEST
online_store:
    type: sqlite
    path: data/online.db
""".format(path, account, keyvault.get_secret("sfaccesskey1"))

In [5]:
# Write the rendered configuration into the Feast repo as feature_store.yaml.
with open(os.path.join(repo_path, "feature_store.yaml"), "w") as f:
    f.write(sf_feature_store_config)
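
The positional `{}` placeholders in the configuration string depend on argument order, so swapping the account and the password would still produce a file that parses. One possible alternative is named placeholders via the standard library's `string.Template`. The sketch below uses a trimmed-down template and dummy values purely for illustration; `render_offline_store` is a hypothetical helper, not part of Feast:

```python
from string import Template

# Trimmed-down version of the offline_store section, with named placeholders.
OFFLINE_STORE_TEMPLATE = Template(
    "offline_store:\n"
    "    type: snowflake.offline\n"
    "    account: $account\n"
    "    password: $password\n"
)


def render_offline_store(account: str, password: str) -> str:
    """Fill the template; substitute() raises KeyError if a placeholder is left unfilled."""
    return OFFLINE_STORE_TEMPLATE.substitute(account=account, password=password)


# Dummy values only; in the notebook the password would come from Key Vault.
rendered = render_offline_store(account="feast-dev", password="dummy-secret")
```

With named placeholders, each value is tied to its key at the call site, which makes an accidental swap easier to spot in review.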

#### Apply the Feature Views and Services to the Feast Registry

After registration, the registry.db file can be found in the blob storage container.

In [6]:
from datetime import timedelta
from feast import Entity, Feature, FeatureStore, FeatureService, FeatureView, SnowflakeSource, ValueType
from google.protobuf.json_format import MessageToDict
import yaml

Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage


In [7]:
### Configuration
repo_path = "/tmp" #Feast Feature Repo Path
fs = FeatureStore(repo_path)
config_path = repo_path + "/feature_store.yaml"
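# Read the Snowflake database name back from the config, closing the file afterwards.
with open(config_path) as config_file:
    database_name = yaml.safe_load(config_file)["offline_store"]["database"]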

print("Database Source: ", database_name)

###Source Data
customer_info_table = SnowflakeSource(
    database=database_name,
    schema="PUBLIC",
    table="STATIC_FEATURE_TABLE", #SNOWFLAKE TABLE NAME
    event_timestamp_column="EVENT_TIMESTAMP",
    created_timestamp_column="CREATED_TIMESTAMP"
)

bureau_feature_table = SnowflakeSource(
    database=database_name,
    schema="PUBLIC",
    table="BUREAU_FEATURE_TABLE", #SNOWFLAKE TABLE NAME
    event_timestamp_column="EVENT_TIMESTAMP",
    created_timestamp_column="CREATED_TIMESTAMP"
)

previous_loan_feature_table = SnowflakeSource(
    database=database_name,
    schema="PUBLIC",
    table="PREVIOUS_LOAN_FEATURES_TABLE", #SNOWFLAKE TABLE NAME
    event_timestamp_column="EVENT_TIMESTAMP",
    created_timestamp_column="CREATED_TIMESTAMP"
)

### Entity
customer = Entity(name="SK_ID_CURR", value_type=ValueType.INT64, description="customer id")

### Feature Views
customer_stats_view = FeatureView(
    name="static_feature_view",
    entities=["SK_ID_CURR"],
    ttl=timedelta(days=90),
    features=[
        Feature(name="OCCUPATION_TYPE", dtype=ValueType.STRING),
        Feature(name="AMT_INCOME_TOTAL", dtype=ValueType.FLOAT),
        Feature(name="NAME_INCOME_TYPE", dtype=ValueType.STRING),
        Feature(name="DAYS_LAST_PHONE_CHANGE", dtype=ValueType.FLOAT),
        Feature(name="ORGANIZATION_TYPE", dtype=ValueType.STRING),
        Feature(name="AMT_CREDIT", dtype=ValueType.FLOAT),
        Feature(name="AMT_GOODS_PRICE", dtype=ValueType.FLOAT),
        Feature(name="DAYS_REGISTRATION", dtype=ValueType.FLOAT),
        Feature(name="AMT_ANNUITY", dtype=ValueType.FLOAT),
        Feature(name="CODE_GENDER", dtype=ValueType.STRING),
        Feature(name="DAYS_ID_PUBLISH", dtype=ValueType.INT64),
        Feature(name="NAME_EDUCATION_TYPE", dtype=ValueType.STRING),
        Feature(name="DAYS_EMPLOYED", dtype=ValueType.INT64),
        Feature(name="DAYS_BIRTH", dtype=ValueType.INT64),
        Feature(name="EXT_SOURCE_1", dtype=ValueType.FLOAT),
        Feature(name="EXT_SOURCE_2", dtype=ValueType.FLOAT),
        Feature(name="EXT_SOURCE_3", dtype=ValueType.FLOAT),
    ],
    online=False,
    batch_source=customer_info_table,
    tags={},
)


bureau_view = FeatureView(
    name="bureau_feature_view",
    entities=["SK_ID_CURR"],
    ttl=timedelta(days=90),
    online=False,
    batch_source=bureau_feature_table,
    tags={},
)

previous_loan_view = FeatureView(
    name="previous_loan_feature_view",
    entities=["SK_ID_CURR"],
    ttl=timedelta(days=90),
    online=False,
    batch_source=previous_loan_feature_table,
    tags={},
)

### Feature Services
cust_fs_1 = FeatureService(
    name="eh_dbr_credit_model",
    features=[customer_stats_view, bureau_view, previous_loan_view ]
)

risk_model_bureau_fs = FeatureService(
    name="risk_model_bureau_fs",
    features=[bureau_view, previous_loan_view]
)



Database Source:  TEST


#### Feature Registration via FEAST

- Register the entity, feature views and feature services using Feast.
- Registration stores the definitions and associated metadata in the Azure Blob container.
- List the feature views and feature services from the registry to confirm that registration succeeded.
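
Beyond printing the registry contents, the check in the last step could be made explicit by comparing the expected names against what the registry returns. A minimal sketch, assuming only that the registered objects expose a `.name` attribute (as the feature views returned by `fs.list_feature_views()` do); `missing_names` is a hypothetical helper, and the stand-in objects exist so the example runs without a Feast registry:

```python
from types import SimpleNamespace


def missing_names(expected, registered):
    """Return the expected names that are absent from `registered`, preserving order.

    `registered` is any iterable of objects with a `.name` attribute.
    """
    found = {obj.name for obj in registered}
    return [name for name in expected if name not in found]


# Stand-in objects in place of a real registry listing.
registered = [
    SimpleNamespace(name="static_feature_view"),
    SimpleNamespace(name="bureau_feature_view"),
]
expected = ["static_feature_view", "bureau_feature_view", "previous_loan_feature_view"]
print(missing_names(expected, registered))  # ['previous_loan_feature_view']
```

In the notebook, `registered` would be replaced by the result of `fs.list_feature_views()`, and an empty return value would indicate that every expected view is present.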

In [8]:

fs = FeatureStore(repo_path)
fs.apply([customer, customer_stats_view, bureau_view, previous_loan_view, cust_fs_1, risk_model_bureau_fs])
# List features from registry
print("====FEATURE VIEWS====")
fv_list = fs.list_feature_views()
for fv in fv_list:
    d=MessageToDict(fv.to_proto())
    print("Feature View Name:", d['spec']['name'])
    print("Entities:", d['spec']['entities'])
    print("Features:", d['spec']['features'])
    print("Source Type:", d['spec']['batchSource']['dataSourceClassType'])
    print("\n")

print("====FEATURE SERVICE====")
fs_list = fs.list_feature_services()
for fserv in fs_list:
    d=MessageToDict(fserv.to_proto())
    print("Feature Service Name:", d['spec']['name'])
    print("Feature Views:", d['spec']['features'])
    print("\n")




====FEATURE VIEWS====
Feature View Name: static_feature_view
Entities: ['SK_ID_CURR']
Features: [{'name': 'OCCUPATION_TYPE', 'valueType': 'STRING'}, {'name': 'AMT_INCOME_TOTAL', 'valueType': 'FLOAT'}, {'name': 'NAME_INCOME_TYPE', 'valueType': 'STRING'}, {'name': 'DAYS_LAST_PHONE_CHANGE', 'valueType': 'FLOAT'}, {'name': 'ORGANIZATION_TYPE', 'valueType': 'STRING'}, {'name': 'AMT_CREDIT', 'valueType': 'FLOAT'}, {'name': 'AMT_GOODS_PRICE', 'valueType': 'FLOAT'}, {'name': 'DAYS_REGISTRATION', 'valueType': 'FLOAT'}, {'name': 'AMT_ANNUITY', 'valueType': 'FLOAT'}, {'name': 'CODE_GENDER', 'valueType': 'STRING'}, {'name': 'DAYS_ID_PUBLISH', 'valueType': 'INT64'}, {'name': 'NAME_EDUCATION_TYPE', 'valueType': 'STRING'}, {'name': 'DAYS_EMPLOYED', 'valueType': 'INT64'}, {'name': 'DAYS_BIRTH', 'valueType': 'INT64'}, {'name': 'EXT_SOURCE_1', 'valueType': 'FLOAT'}, {'name': 'EXT_SOURCE_2', 'valueType': 'FLOAT'}, {'name': 'EXT_SOURCE_3', 'valueType': 'FLOAT'}]
Source Type: feast.infra.offline_stores.sno

#### Test Pulling Training Data [Optional]

If all the features have been registered, the get_historical_features call should return a pandas dataframe; the cell below prints its first 5 rows.
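
Conceptually, a historical retrieval matches each row of the entity dataframe to the most recent feature row at or before that row's event_timestamp, and ignores feature rows older than the feature view's ttl. The pure-Python sketch below illustrates that idea only. It is not Feast's implementation; `point_in_time_lookup` is a hypothetical function and the sample rows are made up:

```python
from datetime import datetime, timedelta


def point_in_time_lookup(feature_rows, entity_id, as_of, ttl):
    """Return the latest feature value for entity_id at or before as_of, within ttl.

    feature_rows: iterable of (entity_id, event_timestamp, value) tuples.
    Returns None if no row falls in the window [as_of - ttl, as_of].
    """
    candidates = [
        (ts, value)
        for eid, ts, value in feature_rows
        if eid == entity_id and as_of - ttl <= ts <= as_of
    ]
    if not candidates:
        return None
    # Pick the candidate with the most recent timestamp.
    return max(candidates, key=lambda pair: pair[0])[1]


# Made-up feature rows for a single customer.
rows = [
    (100002, datetime(2021, 6, 1), 180000.0),   # older than the 90-day ttl: ignored
    (100002, datetime(2022, 1, 15), 202500.0),  # inside the window: selected
    (100002, datetime(2022, 3, 1), 210000.0),   # after as_of: ignored
]
value = point_in_time_lookup(rows, 100002, datetime(2022, 2, 24), timedelta(days=90))
```

This is why the entity dataframe in the next cell carries an event_timestamp column: it fixes the point in time at which each customer's features are looked up.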

In [9]:
from datetime import datetime
from feast import FeatureStore
import pandas as pd

repo_path = "/tmp/" #Feast Feature Repo Path
fs = FeatureStore(repo_path)
feature_service = fs.get_feature_service("eh_dbr_credit_model")
entity_df = pd.DataFrame.from_dict(
    {
        "SK_ID_CURR": [100002, 100003, 100004],
        "label": [1, 0, 1],
        "event_timestamp": [
            datetime(2022,2,24),
            datetime(2022,2,24),
            datetime(2022,2,24),
        ],
    }
)

# Retrieve feature values for every feature view in the feature service.
training_df = fs.get_historical_features(
    entity_df=entity_df,
    features=feature_service
).to_df()

print(training_df.head(5))




   SK_ID_CURR  label event_timestamp OCCUPATION_TYPE  AMT_INCOME_TOTAL  \
0      100004      1      2022-02-24        Laborers           67500.0   
1      100002      1      2022-02-24        Laborers          202500.0   
2      100003      0      2022-02-24      Core staff          270000.0   

  NAME_INCOME_TYPE  DAYS_LAST_PHONE_CHANGE       ORGANIZATION_TYPE  \
0          Working                  -815.0              Government   
1          Working                 -1134.0  Business Entity Type 3   
2    State servant                  -828.0                  School   

   AMT_CREDIT  AMT_GOODS_PRICE  ...  CREDIT_TYPE_MOBILE_OPERATOR_LOAN  \
0    135000.0         135000.0  ...                               0.0   
1    406597.5         351000.0  ...                               0.0   
2   1293502.5        1129500.0  ...                               0.0   

   CREDIT_TYPE_MORTGAGE CREDIT_TYPE_REAL_ESTATE_LOAN  \
0                   0.0                          0.0   
1                