# Feathr Feature Store on Home Credit

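This notebook illustrates how to use Feathr Feature Store to prepare features for a Home Credit model. It covers installing Feathr, configuring the environment, defining and building features, and creating training data with a point-in-time correct feature join.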



## Prerequisite: Install Feathr

Install Feathr using pip:

`pip install -U feathr pandavro scikit-learn`

Or if you want to use the latest Feathr code from GitHub:

`pip install -I git+https://github.com/linkedin/feathr.git#subdirectory=feathr_project pandavro scikit-learn`

In [1]:
%pip install -U feathr pandavro scikit-learn

Note: you may need to restart the kernel to use updated packages.


## Prerequisite: Configure the required environment

In the first step (Provision cloud resources), you should have provisioned all the required cloud resources. If you use Feathr CLI to create a workspace, you should have a folder with a file called `feathr_config.yaml` in it with all the required configurations. Otherwise, update the configuration below.

The code below writes this configuration string to a temporary file, which is later loaded into Feathr. Still, refer to [feathr_config.yaml](https://github.com/linkedin/feathr/blob/main/feathr_project/feathrcli/data/feathr_user_workspace/feathr_config.yaml) as the source of truth; it also explains the meaning of each variable in more detail.

In [2]:
import tempfile
yaml_config = """
# Please refer to https://github.com/linkedin/feathr/blob/main/feathr_project/feathrcli/data/feathr_user_workspace/feathr_config.yaml for explanations on the meaning of each field.
api_version: 1
project_config:
  project_name: 'feathr_home_credit'
  required_environment_variables:
    - 'REDIS_PASSWORD'
    - 'AZURE_CLIENT_ID'
    - 'AZURE_TENANT_ID'
    - 'AZURE_CLIENT_SECRET'
offline_store:
  adls:
    adls_enabled: true
  wasb:
    wasb_enabled: true
  s3:
    s3_enabled: false
    s3_endpoint: 's3.amazonaws.com'
  jdbc:
    jdbc_enabled: false
    jdbc_database: 'feathrtestdb'
    jdbc_table: 'feathrtesttable'
  snowflake:
    url: "dqllago-ol19457.snowflakecomputing.com"
    user: "feathrintegration"
    role: "ACCOUNTADMIN"
spark_config:
  spark_cluster: 'azure_synapse'
  spark_result_output_parts: '1'
  azure_synapse:
    dev_url: "https://feathrhomecreditcaspark.dev.azuresynapse.net"
    pool_name: "spark31"
    # workspace dir for storing all the required configuration files and the jar resources
    workspace_dir: "abfss://feathrhomecreditcafs@feathrhomecreditcasto.dfs.core.windows.net/"
    executor_size: "Small"
    executor_num: 4
    feathr_runtime_location: wasbs://public@azurefeathrstorage.blob.core.windows.net/feathr-assembly-LATEST.jar
  databricks:
    workspace_instance_url: 'https://adb-6885802458123232.12.azuredatabricks.net/'
    workspace_token_value: ''
    config_template: {'run_name':'','new_cluster':{'spark_version':'9.1.x-scala2.12','node_type_id':'Standard_D3_v2','num_workers':2,'spark_conf':{}},'libraries':[{'jar':''}],'spark_jar_task':{'main_class_name':'','parameters':['']}}
    work_dir: 'dbfs:/feathr_getting_started'
    feathr_runtime_location: wasbs://public@azurefeathrstorage.blob.core.windows.net/feathr-assembly-LATEST.jar
online_store:
  redis:
    host: 'feathrhomecreditcaredis.redis.cache.windows.net'
    port: 6380
    ssl_enabled: True
feature_registry:
  purview:
    type_system_initialization: true
    purview_name: 'feathrhomecreditcapurview'
    delimiter: '__'
"""
# delete=False keeps the file on disk after it is closed, so it can be read later via tmp.name.
tmp = tempfile.NamedTemporaryFile(mode='w', delete=False)
with tmp as text_file:
    text_file.write(yaml_config)


## View the data

In this tutorial, we use Feathr Feature Store to build features for a model on the Home Credit data. The features are read from `application_train.csv`, which is defined as a source later in this notebook. The cell below imports the libraries used in the rest of the notebook.

In [3]:
import glob
import os
import tempfile
from datetime import datetime, timedelta
from math import sqrt

import pandas as pd
import pandavro as pdx
from feathr import FeathrClient
from feathr import BOOLEAN, FLOAT, INT32, ValueType, STRING
from feathr import Feature, DerivedFeature, FeatureAnchor
from feathr import BackfillTime, MaterializationSettings
from feathr import FeatureQuery, ObservationSettings
from feathr import RedisSink
from feathr import INPUT_CONTEXT, HdfsSource
from feathr import WindowAggTransformation
from feathr import TypedKey
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import lit

## Set up necessary environment variables

You have to set up the environment variables below in order to run this sample. More environment variables are described in [feathr_config.yaml](https://github.com/linkedin/feathr/blob/main/feathr_project/feathrcli/data/feathr_user_workspace/feathr_config.yaml); use that file as the source of truth. It also explains the meaning of each variable in more detail.

In [4]:
os.environ['REDIS_PASSWORD'] = ''
os.environ['AZURE_CLIENT_ID'] = ''
os.environ['AZURE_TENANT_ID'] = '' 
os.environ['AZURE_CLIENT_SECRET'] = ''
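
The configuration lists these four names under `required_environment_variables`, so a quick check before creating the client can catch a value that was left empty. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical and not part of Feathr.

```python
import os

# Names taken from `required_environment_variables` in the config above.
REQUIRED_ENV_VARS = [
    "REDIS_PASSWORD",
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_SECRET",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or set to an empty string."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Report anything that still needs a value in the current process.
missing = missing_env_vars(REQUIRED_ENV_VARS)
print("Missing or empty:", missing)
```

The optional `environ` argument is there only so that the check can be tried against a plain dictionary instead of the real process environment.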

Then we will initialize a Feathr client:


In [5]:
client = FeathrClient(config_path=tmp.name)

## Misc pre-processing methods

## Static features pre-processing (Application train)

In [6]:
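def application_train_preprocessing(df: DataFrame) -> DataFrame:
    # Imported inside the function so that the function is self-contained.
    import datetime

    # Stamp every row with a constant timestamp. An explicit %H:%M:%S is used
    # instead of the locale-dependent %X so it always matches "yyyy-MM-dd HH:mm:ss".
    df = df.withColumn("TRAN_DATE", lit(datetime.datetime(2021,1,1,11,34,44).strftime('%Y-%m-%d %H:%M:%S')))

    return df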
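
The preprocessing function stamps every row with the same constant `TRAN_DATE`, and the source definition parses that column with the pattern `yyyy-MM-dd HH:mm:ss`. As a small standard-library sketch (no Spark involved), the string produced for that constant can be checked against the Python pattern `%Y-%m-%d %H:%M:%S`. Treating the two patterns as equivalent is an assumption made here for illustration.

```python
from datetime import datetime

# The constant event time used in application_train_preprocessing.
event_time = datetime(2021, 1, 1, 11, 34, 44)

# Assumed Python equivalent of the Spark-style pattern "yyyy-MM-dd HH:mm:ss".
PY_FORMAT = "%Y-%m-%d %H:%M:%S"

tran_date = event_time.strftime(PY_FORMAT)
print(tran_date)  # 2021-01-01 11:34:44

# Round trip: parsing the string gives back the same timestamp.
assert datetime.strptime(tran_date, PY_FORMAT) == event_time
```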


## Feature definition for Application train

In [7]:

# source for pass-through features
# The "TRAN_DATE" column is created in the "application_train_preprocessing" function.
application_train_source_core = HdfsSource(name="applicationTrainSourceCore",
                          path="abfss://feathrhomecreditcafs@feathrhomecreditcasto.dfs.core.windows.net/home_credit_data/application_train.csv",
                          preprocessing=application_train_preprocessing,
                          event_timestamp_column="TRAN_DATE",
                          timestamp_format="yyyy-MM-dd HH:mm:ss"
                          )

# key definition for application train
key_SK_ID_CURR = TypedKey(key_column="SK_ID_CURR",
                       key_column_type=ValueType.INT32,
                       description="SK ID CURR",
                       full_name="application_train.SK_ID_CURR")

# pass through columns of Application train CSV
# columns Application train
f_SK_ID_CURR = Feature(name="f_SK_ID_CURR",
                  key=key_SK_ID_CURR,
                  feature_type=INT32,
                  transform="SK_ID_CURR")

f_TARGET = Feature(name="f_TARGET",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="TARGET")
f_NAME_CONTRACT_TYPE = Feature(name="f_NAME_CONTRACT_TYPE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NAME_CONTRACT_TYPE")
f_CODE_GENDER = Feature(name="f_CODE_GENDER",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="CODE_GENDER")
f_FLAG_OWN_CAR = Feature(name="f_FLAG_OWN_CAR",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_OWN_CAR")
f_FLAG_OWN_REALTY = Feature(name="f_FLAG_OWN_REALTY",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_OWN_REALTY")
f_CNT_CHILDREN = Feature(name="f_CNT_CHILDREN",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="CNT_CHILDREN")
f_AMT_INCOME_TOTAL = Feature(name="f_AMT_INCOME_TOTAL",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="AMT_INCOME_TOTAL")
f_AMT_CREDIT = Feature(name="f_AMT_CREDIT",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="AMT_CREDIT")
f_AMT_ANNUITY = Feature(name="f_AMT_ANNUITY",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="AMT_ANNUITY")
f_AMT_GOODS_PRICE = Feature(name="f_AMT_GOODS_PRICE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="AMT_GOODS_PRICE")
f_NAME_TYPE_SUITE = Feature(name="f_NAME_TYPE_SUITE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NAME_TYPE_SUITE")
f_NAME_INCOME_TYPE = Feature(name="f_NAME_INCOME_TYPE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NAME_INCOME_TYPE")
f_NAME_EDUCATION_TYPE = Feature(name="f_NAME_EDUCATION_TYPE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NAME_EDUCATION_TYPE")
f_NAME_FAMILY_STATUS = Feature(name="f_NAME_FAMILY_STATUS",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NAME_FAMILY_STATUS")
f_NAME_HOUSING_TYPE = Feature(name="f_NAME_HOUSING_TYPE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NAME_HOUSING_TYPE")
f_REGION_POPULATION_RELATIVE = Feature(name="f_REGION_POPULATION_RELATIVE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="REGION_POPULATION_RELATIVE")
f_DAYS_BIRTH = Feature(name="f_DAYS_BIRTH",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="DAYS_BIRTH")
f_DAYS_EMPLOYED = Feature(name="f_DAYS_EMPLOYED",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="DAYS_EMPLOYED")
f_DAYS_REGISTRATION = Feature(name="f_DAYS_REGISTRATION",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="DAYS_REGISTRATION")
f_DAYS_ID_PUBLISH = Feature(name="f_DAYS_ID_PUBLISH",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="DAYS_ID_PUBLISH")
f_OWN_CAR_AGE = Feature(name="f_OWN_CAR_AGE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="OWN_CAR_AGE")
f_FLAG_MOBIL = Feature(name="f_FLAG_MOBIL",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_MOBIL")
f_FLAG_EMP_PHONE = Feature(name="f_FLAG_EMP_PHONE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_EMP_PHONE")
f_FLAG_WORK_PHONE = Feature(name="f_FLAG_WORK_PHONE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_WORK_PHONE")
f_FLAG_CONT_MOBILE = Feature(name="f_FLAG_CONT_MOBILE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_CONT_MOBILE")
f_FLAG_PHONE = Feature(name="f_FLAG_PHONE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_PHONE")
f_FLAG_EMAIL = Feature(name="f_FLAG_EMAIL",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_EMAIL")
f_OCCUPATION_TYPE = Feature(name="f_OCCUPATION_TYPE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="OCCUPATION_TYPE")
f_CNT_FAM_MEMBERS = Feature(name="f_CNT_FAM_MEMBERS",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="CNT_FAM_MEMBERS")
f_REGION_RATING_CLIENT = Feature(name="f_REGION_RATING_CLIENT",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="REGION_RATING_CLIENT")
f_REGION_RATING_CLIENT_W_CITY = Feature(name="f_REGION_RATING_CLIENT_W_CITY",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="REGION_RATING_CLIENT_W_CITY")
f_WEEKDAY_APPR_PROCESS_START = Feature(name="f_WEEKDAY_APPR_PROCESS_START",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="WEEKDAY_APPR_PROCESS_START")
f_HOUR_APPR_PROCESS_START = Feature(name="f_HOUR_APPR_PROCESS_START",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="HOUR_APPR_PROCESS_START")
f_REG_REGION_NOT_LIVE_REGION = Feature(name="f_REG_REGION_NOT_LIVE_REGION",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="REG_REGION_NOT_LIVE_REGION")
f_REG_REGION_NOT_WORK_REGION = Feature(name="f_REG_REGION_NOT_WORK_REGION",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="REG_REGION_NOT_WORK_REGION")
f_LIVE_REGION_NOT_WORK_REGION = Feature(name="f_LIVE_REGION_NOT_WORK_REGION",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LIVE_REGION_NOT_WORK_REGION")
f_REG_CITY_NOT_LIVE_CITY = Feature(name="f_REG_CITY_NOT_LIVE_CITY",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="REG_CITY_NOT_LIVE_CITY")
f_REG_CITY_NOT_WORK_CITY = Feature(name="f_REG_CITY_NOT_WORK_CITY",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="REG_CITY_NOT_WORK_CITY")
f_LIVE_CITY_NOT_WORK_CITY = Feature(name="f_LIVE_CITY_NOT_WORK_CITY",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LIVE_CITY_NOT_WORK_CITY")
f_ORGANIZATION_TYPE = Feature(name="f_ORGANIZATION_TYPE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="ORGANIZATION_TYPE")
f_EXT_SOURCE_1 = Feature(name="f_EXT_SOURCE_1",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="EXT_SOURCE_1")
f_EXT_SOURCE_2 = Feature(name="f_EXT_SOURCE_2",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="EXT_SOURCE_2")
f_EXT_SOURCE_3 = Feature(name="f_EXT_SOURCE_3",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="EXT_SOURCE_3")
f_APARTMENTS_AVG = Feature(name="f_APARTMENTS_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="APARTMENTS_AVG")
f_BASEMENTAREA_AVG = Feature(name="f_BASEMENTAREA_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="BASEMENTAREA_AVG")
f_YEARS_BEGINEXPLUATATION_AVG = Feature(name="f_YEARS_BEGINEXPLUATATION_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="YEARS_BEGINEXPLUATATION_AVG")
f_YEARS_BUILD_AVG = Feature(name="f_YEARS_BUILD_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="YEARS_BUILD_AVG")
f_COMMONAREA_AVG = Feature(name="f_COMMONAREA_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="COMMONAREA_AVG")
f_ELEVATORS_AVG = Feature(name="f_ELEVATORS_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="ELEVATORS_AVG")
f_ENTRANCES_AVG = Feature(name="f_ENTRANCES_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="ENTRANCES_AVG")
f_FLOORSMAX_AVG = Feature(name="f_FLOORSMAX_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLOORSMAX_AVG")
f_FLOORSMIN_AVG = Feature(name="f_FLOORSMIN_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLOORSMIN_AVG")
f_LANDAREA_AVG = Feature(name="f_LANDAREA_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LANDAREA_AVG")
f_LIVINGAPARTMENTS_AVG = Feature(name="f_LIVINGAPARTMENTS_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LIVINGAPARTMENTS_AVG")
f_LIVINGAREA_AVG = Feature(name="f_LIVINGAREA_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LIVINGAREA_AVG")
f_NONLIVINGAPARTMENTS_AVG = Feature(name="f_NONLIVINGAPARTMENTS_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NONLIVINGAPARTMENTS_AVG")
f_NONLIVINGAREA_AVG = Feature(name="f_NONLIVINGAREA_AVG",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NONLIVINGAREA_AVG")
f_APARTMENTS_MODE = Feature(name="f_APARTMENTS_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="APARTMENTS_MODE")
f_BASEMENTAREA_MODE = Feature(name="f_BASEMENTAREA_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="BASEMENTAREA_MODE")
f_YEARS_BEGINEXPLUATATION_MODE = Feature(name="f_YEARS_BEGINEXPLUATATION_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="YEARS_BEGINEXPLUATATION_MODE")
f_YEARS_BUILD_MODE = Feature(name="f_YEARS_BUILD_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="YEARS_BUILD_MODE")
f_COMMONAREA_MODE = Feature(name="f_COMMONAREA_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="COMMONAREA_MODE")
f_ELEVATORS_MODE = Feature(name="f_ELEVATORS_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="ELEVATORS_MODE")
f_ENTRANCES_MODE = Feature(name="f_ENTRANCES_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="ENTRANCES_MODE")
f_FLOORSMAX_MODE = Feature(name="f_FLOORSMAX_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLOORSMAX_MODE")
f_FLOORSMIN_MODE = Feature(name="f_FLOORSMIN_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLOORSMIN_MODE")
f_LANDAREA_MODE = Feature(name="f_LANDAREA_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LANDAREA_MODE")
f_LIVINGAPARTMENTS_MODE = Feature(name="f_LIVINGAPARTMENTS_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LIVINGAPARTMENTS_MODE")
f_LIVINGAREA_MODE = Feature(name="f_LIVINGAREA_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LIVINGAREA_MODE")
f_NONLIVINGAPARTMENTS_MODE = Feature(name="f_NONLIVINGAPARTMENTS_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NONLIVINGAPARTMENTS_MODE")
f_NONLIVINGAREA_MODE = Feature(name="f_NONLIVINGAREA_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NONLIVINGAREA_MODE")
f_APARTMENTS_MEDI = Feature(name="f_APARTMENTS_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="APARTMENTS_MEDI")
f_BASEMENTAREA_MEDI = Feature(name="f_BASEMENTAREA_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="BASEMENTAREA_MEDI")
f_YEARS_BEGINEXPLUATATION_MEDI = Feature(name="f_YEARS_BEGINEXPLUATATION_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="YEARS_BEGINEXPLUATATION_MEDI")
f_YEARS_BUILD_MEDI = Feature(name="f_YEARS_BUILD_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="YEARS_BUILD_MEDI")
f_COMMONAREA_MEDI = Feature(name="f_COMMONAREA_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="COMMONAREA_MEDI")
f_ELEVATORS_MEDI = Feature(name="f_ELEVATORS_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="ELEVATORS_MEDI")
f_ENTRANCES_MEDI = Feature(name="f_ENTRANCES_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="ENTRANCES_MEDI")
f_FLOORSMAX_MEDI = Feature(name="f_FLOORSMAX_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLOORSMAX_MEDI")
f_FLOORSMIN_MEDI = Feature(name="f_FLOORSMIN_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLOORSMIN_MEDI")
f_LANDAREA_MEDI = Feature(name="f_LANDAREA_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LANDAREA_MEDI")
f_LIVINGAPARTMENTS_MEDI = Feature(name="f_LIVINGAPARTMENTS_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LIVINGAPARTMENTS_MEDI")
f_LIVINGAREA_MEDI = Feature(name="f_LIVINGAREA_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="LIVINGAREA_MEDI")
f_NONLIVINGAPARTMENTS_MEDI = Feature(name="f_NONLIVINGAPARTMENTS_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NONLIVINGAPARTMENTS_MEDI")
f_NONLIVINGAREA_MEDI = Feature(name="f_NONLIVINGAREA_MEDI",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="NONLIVINGAREA_MEDI")
f_FONDKAPREMONT_MODE = Feature(name="f_FONDKAPREMONT_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FONDKAPREMONT_MODE")
f_HOUSETYPE_MODE = Feature(name="f_HOUSETYPE_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="HOUSETYPE_MODE")
f_TOTALAREA_MODE = Feature(name="f_TOTALAREA_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="TOTALAREA_MODE")
f_WALLSMATERIAL_MODE = Feature(name="f_WALLSMATERIAL_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="WALLSMATERIAL_MODE")
f_EMERGENCYSTATE_MODE = Feature(name="f_EMERGENCYSTATE_MODE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="EMERGENCYSTATE_MODE")
f_OBS_30_CNT_SOCIAL_CIRCLE = Feature(name="f_OBS_30_CNT_SOCIAL_CIRCLE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="OBS_30_CNT_SOCIAL_CIRCLE")
f_DEF_30_CNT_SOCIAL_CIRCLE = Feature(name="f_DEF_30_CNT_SOCIAL_CIRCLE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="DEF_30_CNT_SOCIAL_CIRCLE")
f_OBS_60_CNT_SOCIAL_CIRCLE = Feature(name="f_OBS_60_CNT_SOCIAL_CIRCLE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="OBS_60_CNT_SOCIAL_CIRCLE")
f_DEF_60_CNT_SOCIAL_CIRCLE = Feature(name="f_DEF_60_CNT_SOCIAL_CIRCLE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="DEF_60_CNT_SOCIAL_CIRCLE")
f_DAYS_LAST_PHONE_CHANGE = Feature(name="f_DAYS_LAST_PHONE_CHANGE",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="DAYS_LAST_PHONE_CHANGE")
f_FLAG_DOCUMENT_2 = Feature(name="f_FLAG_DOCUMENT_2",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_2")
f_FLAG_DOCUMENT_3 = Feature(name="f_FLAG_DOCUMENT_3",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_3")
f_FLAG_DOCUMENT_4 = Feature(name="f_FLAG_DOCUMENT_4",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_4")
f_FLAG_DOCUMENT_5 = Feature(name="f_FLAG_DOCUMENT_5",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_5")
f_FLAG_DOCUMENT_6 = Feature(name="f_FLAG_DOCUMENT_6",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_6")
f_FLAG_DOCUMENT_7 = Feature(name="f_FLAG_DOCUMENT_7",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_7")
f_FLAG_DOCUMENT_8 = Feature(name="f_FLAG_DOCUMENT_8",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_8")
f_FLAG_DOCUMENT_9 = Feature(name="f_FLAG_DOCUMENT_9",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_9")
f_FLAG_DOCUMENT_10 = Feature(name="f_FLAG_DOCUMENT_10",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_10")
f_FLAG_DOCUMENT_11 = Feature(name="f_FLAG_DOCUMENT_11",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_11")
f_FLAG_DOCUMENT_12 = Feature(name="f_FLAG_DOCUMENT_12",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_12")
f_FLAG_DOCUMENT_13 = Feature(name="f_FLAG_DOCUMENT_13",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_13")
f_FLAG_DOCUMENT_14 = Feature(name="f_FLAG_DOCUMENT_14",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_14")
f_FLAG_DOCUMENT_15 = Feature(name="f_FLAG_DOCUMENT_15",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_15")
f_FLAG_DOCUMENT_16 = Feature(name="f_FLAG_DOCUMENT_16",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_16")
f_FLAG_DOCUMENT_17 = Feature(name="f_FLAG_DOCUMENT_17",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_17")
f_FLAG_DOCUMENT_18 = Feature(name="f_FLAG_DOCUMENT_18",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_18")
f_FLAG_DOCUMENT_19 = Feature(name="f_FLAG_DOCUMENT_19",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_19")
f_FLAG_DOCUMENT_20 = Feature(name="f_FLAG_DOCUMENT_20",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_20")
f_FLAG_DOCUMENT_21 = Feature(name="f_FLAG_DOCUMENT_21",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="FLAG_DOCUMENT_21")
f_AMT_REQ_CREDIT_BUREAU_HOUR = Feature(name="f_AMT_REQ_CREDIT_BUREAU_HOUR",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="AMT_REQ_CREDIT_BUREAU_HOUR")
f_AMT_REQ_CREDIT_BUREAU_DAY = Feature(name="f_AMT_REQ_CREDIT_BUREAU_DAY",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="AMT_REQ_CREDIT_BUREAU_DAY")
f_AMT_REQ_CREDIT_BUREAU_WEEK = Feature(name="f_AMT_REQ_CREDIT_BUREAU_WEEK",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="AMT_REQ_CREDIT_BUREAU_WEEK")
f_AMT_REQ_CREDIT_BUREAU_MON = Feature(name="f_AMT_REQ_CREDIT_BUREAU_MON",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="AMT_REQ_CREDIT_BUREAU_MON")
f_AMT_REQ_CREDIT_BUREAU_QRT = Feature(name="f_AMT_REQ_CREDIT_BUREAU_QRT",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="AMT_REQ_CREDIT_BUREAU_QRT")
f_AMT_REQ_CREDIT_BUREAU_YEAR = Feature(name="f_AMT_REQ_CREDIT_BUREAU_YEAR",
                  key=key_SK_ID_CURR,
                  feature_type=STRING,
                  transform="AMT_REQ_CREDIT_BUREAU_YEAR")

features_application_train_core=[
  f_SK_ID_CURR,
  f_TARGET,
  f_NAME_CONTRACT_TYPE,
  f_CODE_GENDER,
  f_FLAG_OWN_CAR,
  f_FLAG_OWN_REALTY,
  f_CNT_CHILDREN,
  f_AMT_INCOME_TOTAL,
  f_AMT_CREDIT,
  f_AMT_ANNUITY,
  f_AMT_GOODS_PRICE,
  f_NAME_TYPE_SUITE,
  f_NAME_INCOME_TYPE,
  f_NAME_EDUCATION_TYPE,
  f_NAME_FAMILY_STATUS,
  f_NAME_HOUSING_TYPE,
  f_REGION_POPULATION_RELATIVE,
  f_DAYS_BIRTH,
  f_DAYS_EMPLOYED,
  f_DAYS_REGISTRATION,
  f_DAYS_ID_PUBLISH,
  f_OWN_CAR_AGE,
  f_FLAG_MOBIL,
  f_FLAG_EMP_PHONE,
  f_FLAG_WORK_PHONE,
  f_FLAG_CONT_MOBILE,
  f_FLAG_PHONE,
  f_FLAG_EMAIL,
  f_OCCUPATION_TYPE,
  f_CNT_FAM_MEMBERS,
  f_REGION_RATING_CLIENT,
  f_REGION_RATING_CLIENT_W_CITY,
  f_WEEKDAY_APPR_PROCESS_START,
  f_HOUR_APPR_PROCESS_START,
  f_REG_REGION_NOT_LIVE_REGION,
  f_REG_REGION_NOT_WORK_REGION,
  f_LIVE_REGION_NOT_WORK_REGION,
  f_REG_CITY_NOT_LIVE_CITY,
  f_REG_CITY_NOT_WORK_CITY,
  f_LIVE_CITY_NOT_WORK_CITY,
  f_ORGANIZATION_TYPE,
  f_EXT_SOURCE_1,
  f_EXT_SOURCE_2,
  f_EXT_SOURCE_3,
  f_APARTMENTS_AVG,
  f_BASEMENTAREA_AVG,
  f_YEARS_BEGINEXPLUATATION_AVG,
  f_YEARS_BUILD_AVG,
  f_COMMONAREA_AVG,
  f_ELEVATORS_AVG,
  f_ENTRANCES_AVG,
  f_FLOORSMAX_AVG,
  f_FLOORSMIN_AVG,
  f_LANDAREA_AVG,
  f_LIVINGAPARTMENTS_AVG,
  f_LIVINGAREA_AVG,
  f_NONLIVINGAPARTMENTS_AVG,
  f_NONLIVINGAREA_AVG,
  f_APARTMENTS_MODE,
  f_BASEMENTAREA_MODE,
  f_YEARS_BEGINEXPLUATATION_MODE,
  f_YEARS_BUILD_MODE,
  f_COMMONAREA_MODE,
  f_ELEVATORS_MODE,
  f_ENTRANCES_MODE,
  f_FLOORSMAX_MODE,
  f_FLOORSMIN_MODE,
  f_LANDAREA_MODE,
  f_LIVINGAPARTMENTS_MODE,
  f_LIVINGAREA_MODE,
  f_NONLIVINGAPARTMENTS_MODE,
  f_NONLIVINGAREA_MODE,
  f_APARTMENTS_MEDI,
  f_BASEMENTAREA_MEDI,
  f_YEARS_BEGINEXPLUATATION_MEDI,
  f_YEARS_BUILD_MEDI,
  f_COMMONAREA_MEDI,
  f_ELEVATORS_MEDI,
  f_ENTRANCES_MEDI,
  f_FLOORSMAX_MEDI,
  f_FLOORSMIN_MEDI,
  f_LANDAREA_MEDI,
  f_LIVINGAPARTMENTS_MEDI,
  f_LIVINGAREA_MEDI,
  f_NONLIVINGAPARTMENTS_MEDI,
  f_NONLIVINGAREA_MEDI,
  f_FONDKAPREMONT_MODE,
  f_HOUSETYPE_MODE,
  f_TOTALAREA_MODE,
  f_WALLSMATERIAL_MODE,
  f_EMERGENCYSTATE_MODE,
  f_OBS_30_CNT_SOCIAL_CIRCLE,
  f_DEF_30_CNT_SOCIAL_CIRCLE,
  f_OBS_60_CNT_SOCIAL_CIRCLE,
  f_DEF_60_CNT_SOCIAL_CIRCLE,
  f_DAYS_LAST_PHONE_CHANGE,
  f_FLAG_DOCUMENT_2,
  f_FLAG_DOCUMENT_3,
  f_FLAG_DOCUMENT_4,
  f_FLAG_DOCUMENT_5,
  f_FLAG_DOCUMENT_6,
  f_FLAG_DOCUMENT_7,
  f_FLAG_DOCUMENT_8,
  f_FLAG_DOCUMENT_9,
  f_FLAG_DOCUMENT_10,
  f_FLAG_DOCUMENT_11,
  f_FLAG_DOCUMENT_12,
  f_FLAG_DOCUMENT_13,
  f_FLAG_DOCUMENT_14,
  f_FLAG_DOCUMENT_15,
  f_FLAG_DOCUMENT_16,
  f_FLAG_DOCUMENT_17,
  f_FLAG_DOCUMENT_18,
  f_FLAG_DOCUMENT_19,
  f_FLAG_DOCUMENT_20,
  f_FLAG_DOCUMENT_21,
  f_AMT_REQ_CREDIT_BUREAU_HOUR,
  f_AMT_REQ_CREDIT_BUREAU_DAY,
  f_AMT_REQ_CREDIT_BUREAU_WEEK,
  f_AMT_REQ_CREDIT_BUREAU_MON,
  f_AMT_REQ_CREDIT_BUREAU_QRT,
  f_AMT_REQ_CREDIT_BUREAU_YEAR,
  ]

anchor_application_train_core = FeatureAnchor(name="anchor_application_train_core",
                                source=application_train_source_core, #INPUT_CONTEXT,
                                features=features_application_train_core)
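
Every pass-through feature above follows the same pattern: the feature name is the column name prefixed with `f_`, and the transform is the column itself. The definitions could therefore also be generated from a list of column names. The sketch below is one possible way to do that, not the method this notebook uses; `passthrough_feature_kwargs` is a hypothetical helper, and only three columns are listed for brevity. It builds plain keyword-argument dictionaries, so it runs without Feathr installed, and the commented lines show how the dictionaries might be passed to `Feature`.

```python
def passthrough_feature_kwargs(columns):
    """Build keyword arguments for pass-through features.

    Each column COL maps to a feature named "f_COL" whose transform is
    the column name itself, mirroring the definitions above.
    """
    return [{"name": f"f_{col}", "transform": col} for col in columns]


# A few of the application_train columns, for illustration only.
columns = ["NAME_CONTRACT_TYPE", "CODE_GENDER", "FLAG_OWN_CAR"]
specs = passthrough_feature_kwargs(columns)
for spec in specs:
    print(spec)

# With Feathr imported and key_SK_ID_CURR defined as above, the features
# could then be created along these lines:
# features = [
#     Feature(key=key_SK_ID_CURR, feature_type=STRING, **spec)
#     for spec in specs
# ]
```

Generating the list this way keeps the feature names and the anchor's feature list in sync, at the cost of no longer having one named variable per feature.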


Next, we need to build those features so that they can be consumed later. Note that we have to build both the "anchor" features and the "derived" features (which are not anchored to a source).

In [8]:
client.build_features(
    anchor_list=[
        anchor_application_train_core
        ], 
    derived_feature_list=[])

## Create training data using point-in-time correct feature join

A training dataset usually contains entity id columns, multiple feature columns, an event timestamp column, and a label/target column.

To create a training dataset using Feathr, one needs to provide a feature join configuration file that specifies which features to join and how they should be joined to the observation data. The feature join config file mainly contains:

1. The path of a dataset that serves as the 'spine' of the training dataset to be created. We call this input 'spine' dataset the 'observation'
   dataset. Typically, each row of the observation data contains:
   a) Column(s) representing entity id(s), which are used as the join key to look up (join) feature values.
   b) A column representing the event time of the row. By default, Feathr makes sure the joined feature values have
   a timestamp earlier than it, ensuring no data leakage in the resulting training dataset.
   c) Other columns, which are simply passed through to the output training dataset.
2. The key fields from the observation data, which are used to join with the feature data.
3. The list of feature names to be joined with the observation data. The features must be defined in the feature
   definition configs.
4. The time information of the observation data, which is compared with the feature's timestamp during the join.
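
To make the point-in-time rule in item 1b concrete, here is a minimal sketch that uses pandas `merge_asof` rather than Feathr itself. The two small tables, the `event_time` and `feature_time` column names, and the values are made up for illustration; only the `SK_ID_CURR`, `TARGET` and `f_AMT_CREDIT` names come from this notebook. It is not how Feathr implements the join internally, only an illustration of the same idea: each observation row picks up the latest feature value recorded strictly before its event time.

```python
import pandas as pd

# Hypothetical observation ("spine") data: entity id, event time, label.
observations = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2],
    "event_time": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-01-10"]),
    "TARGET": [0, 1, 0],
})

# Hypothetical feature data: the value of a feature as recorded at a given time.
features = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 2],
    "feature_time": pd.to_datetime(
        ["2021-01-01", "2021-01-10", "2021-01-01", "2021-01-15"]
    ),
    "f_AMT_CREDIT": [100.0, 150.0, 300.0, 350.0],
})

# For each observation row, take the most recent feature row for the same
# entity whose timestamp is strictly earlier than the observation's event time.
training = pd.merge_asof(
    observations.sort_values("event_time"),
    features.sort_values("feature_time"),
    left_on="event_time",
    right_on="feature_time",
    by="SK_ID_CURR",
    direction="backward",
    allow_exact_matches=False,
)

print(training[["SK_ID_CURR", "event_time", "feature_time", "f_AMT_CREDIT"]])
```

In this toy data, the feature value recorded on 2021-01-15 for entity 2 is never joined, because it is later than that entity's only observation on 2021-01-10. That is the data leakage the timestamp comparison is meant to prevent.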

Create the training dataset via:



In [9]:
feature_queries = [
    FeatureQuery(
        feature_list=[
            "f_SK_ID_CURR",
            "f_TARGET",
            "f_NAME_CONTRACT_TYPE",
            "f_CODE_GENDER",
            "f_FLAG_OWN_CAR",
            "f_FLAG_OWN_REALTY",
            "f_CNT_CHILDREN",
            "f_AMT_INCOME_TOTAL",
            "f_AMT_CREDIT",
            "f_AMT_ANNUITY",
            "f_AMT_GOODS_PRICE",
            "f_NAME_TYPE_SUITE",
            "f_NAME_INCOME_TYPE",
            "f_NAME_EDUCATION_TYPE",
            "f_NAME_FAMILY_STATUS",
            "f_NAME_HOUSING_TYPE",
            "f_REGION_POPULATION_RELATIVE",
            "f_DAYS_BIRTH",
            "f_DAYS_EMPLOYED",
            "f_DAYS_REGISTRATION",
            "f_DAYS_ID_PUBLISH",
            "f_OWN_CAR_AGE",
            "f_FLAG_MOBIL",
            "f_FLAG_EMP_PHONE",
            "f_FLAG_WORK_PHONE",
            "f_FLAG_CONT_MOBILE",
            "f_FLAG_PHONE",
            "f_FLAG_EMAIL",
            "f_OCCUPATION_TYPE",
            "f_CNT_FAM_MEMBERS",
            "f_REGION_RATING_CLIENT",
            "f_REGION_RATING_CLIENT_W_CITY",
            "f_WEEKDAY_APPR_PROCESS_START",
            "f_HOUR_APPR_PROCESS_START",
            "f_REG_REGION_NOT_LIVE_REGION",
            "f_REG_REGION_NOT_WORK_REGION",
            "f_LIVE_REGION_NOT_WORK_REGION",
            "f_REG_CITY_NOT_LIVE_CITY",
            "f_REG_CITY_NOT_WORK_CITY",
            "f_LIVE_CITY_NOT_WORK_CITY",
            "f_ORGANIZATION_TYPE",
            "f_EXT_SOURCE_1",
            "f_EXT_SOURCE_2",
            "f_EXT_SOURCE_3",
            "f_APARTMENTS_AVG",
            "f_BASEMENTAREA_AVG",
            "f_YEARS_BEGINEXPLUATATION_AVG",
            "f_YEARS_BUILD_AVG",
            "f_COMMONAREA_AVG",
            "f_ELEVATORS_AVG",
            "f_ENTRANCES_AVG",
            "f_FLOORSMAX_AVG",
            "f_FLOORSMIN_AVG",
            "f_LANDAREA_AVG",
            "f_LIVINGAPARTMENTS_AVG",
            "f_LIVINGAREA_AVG",
            "f_NONLIVINGAPARTMENTS_AVG",
            "f_NONLIVINGAREA_AVG",
            "f_APARTMENTS_MODE",
            "f_BASEMENTAREA_MODE",
            "f_YEARS_BEGINEXPLUATATION_MODE",
            "f_YEARS_BUILD_MODE",
            "f_COMMONAREA_MODE",
            "f_ELEVATORS_MODE",
            "f_ENTRANCES_MODE",
            "f_FLOORSMAX_MODE",
            "f_FLOORSMIN_MODE",
            "f_LANDAREA_MODE",
            "f_LIVINGAPARTMENTS_MODE",
            "f_LIVINGAREA_MODE",
            "f_NONLIVINGAPARTMENTS_MODE",
            "f_NONLIVINGAREA_MODE",
            "f_APARTMENTS_MEDI",
            "f_BASEMENTAREA_MEDI",
            "f_YEARS_BEGINEXPLUATATION_MEDI",
            "f_YEARS_BUILD_MEDI",
            "f_COMMONAREA_MEDI",
            "f_ELEVATORS_MEDI",
            "f_ENTRANCES_MEDI",
            "f_FLOORSMAX_MEDI",
            "f_FLOORSMIN_MEDI",
            "f_LANDAREA_MEDI",
            "f_LIVINGAPARTMENTS_MEDI",
            "f_LIVINGAREA_MEDI",
            "f_NONLIVINGAPARTMENTS_MEDI",
            "f_NONLIVINGAREA_MEDI",
            "f_FONDKAPREMONT_MODE",
            "f_HOUSETYPE_MODE",
            "f_TOTALAREA_MODE",
            "f_WALLSMATERIAL_MODE",
            "f_EMERGENCYSTATE_MODE",
            "f_OBS_30_CNT_SOCIAL_CIRCLE",
            "f_DEF_30_CNT_SOCIAL_CIRCLE",
            "f_OBS_60_CNT_SOCIAL_CIRCLE",
            "f_DEF_60_CNT_SOCIAL_CIRCLE",
            "f_DAYS_LAST_PHONE_CHANGE",
            "f_FLAG_DOCUMENT_2",
            "f_FLAG_DOCUMENT_3",
            "f_FLAG_DOCUMENT_4",
            "f_FLAG_DOCUMENT_5",
            "f_FLAG_DOCUMENT_6",
            "f_FLAG_DOCUMENT_7",
            "f_FLAG_DOCUMENT_8",
            "f_FLAG_DOCUMENT_9",
            "f_FLAG_DOCUMENT_10",
            "f_FLAG_DOCUMENT_11",
            "f_FLAG_DOCUMENT_12",
            "f_FLAG_DOCUMENT_13",
            "f_FLAG_DOCUMENT_14",
            "f_FLAG_DOCUMENT_15",
            "f_FLAG_DOCUMENT_16",
            "f_FLAG_DOCUMENT_17",
            "f_FLAG_DOCUMENT_18",
            "f_FLAG_DOCUMENT_19",
            "f_FLAG_DOCUMENT_20",
            "f_FLAG_DOCUMENT_21",
            "f_AMT_REQ_CREDIT_BUREAU_HOUR",
            "f_AMT_REQ_CREDIT_BUREAU_DAY",
            "f_AMT_REQ_CREDIT_BUREAU_WEEK",
            "f_AMT_REQ_CREDIT_BUREAU_MON",
            "f_AMT_REQ_CREDIT_BUREAU_QRT",
            "f_AMT_REQ_CREDIT_BUREAU_YEAR",
        ], key=key_SK_ID_CURR),
]


settings = ObservationSettings(
    observation_path="abfss://feathrhomecreditcafs@feathrhomecreditcasto.dfs.core.windows.net/home_credit_data/application_train.csv",
    event_timestamp_column="1609472084",
    timestamp_format="epoch"
)

client.get_offline_features(observation_settings=settings,
                            feature_query=feature_queries,
                            output_path="abfss://feathrhomecreditcafs@feathrhomecreditcasto.dfs.core.windows.net/home_credit_data/output_static_features.avro")
client.wait_job_to_finish(timeout_sec=7200)

2022-07-07 15:02:40.417 | INFO     | feathr._synapse_submission:upload_or_get_cloud_path:62 - Uploading /var/folders/gs/dbrzk90d0m3849n982_q27w40000gn/T/tmptmcr4je1/feathr_pyspark_driver.py to cloud..
2022-07-07 15:02:40.417 | INFO     | feathr._synapse_submission:upload_file:360 - Uploading file feathr_pyspark_driver.py
2022-07-07 15:02:42.467 | INFO     | feathr._synapse_submission:upload_file:366 - /var/folders/gs/dbrzk90d0m3849n982_q27w40000gn/T/tmptmcr4je1/feathr_pyspark_driver.py is uploaded to location: abfss://feathrhomecreditcafs@feathrhomecreditcasto.dfs.core.windows.net/feathr_pyspark_driver.py
2022-07-07 15:02:42.468 | INFO     | feathr._synapse_submission:upload_or_get_cloud_path:65 - /var/folders/gs/dbrzk90d0m3849n982_q27w40000gn/T/tmptmcr4je1/feathr_pyspark_driver.py is uploaded to location: abfss://feathrhomecreditcafs@feathrhomecreditcasto.dfs.core.windows.net/feathr_pyspark_driver.py
2022-07-07 15:02:42.508 | INFO     | feathr._synapse_submission:upload_or_get_cloud_p
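
The `event_timestamp_column` value passed to `ObservationSettings` above is a number rather than a column name. Assuming it is meant as a fixed Unix epoch in seconds applied to every observation row (the notebook does not say so explicitly), it can be decoded with the standard library to see which point in time the join is evaluated at:

```python
from datetime import datetime, timezone

# Value copied from the ObservationSettings call above. Treating it as
# epoch seconds is an assumption based on timestamp_format="epoch".
epoch_seconds = 1609472084

# Convert to a timezone-aware UTC datetime for readability.
event_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(event_time.isoformat())
```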

## Download and show the result

Let's use the helper function `get_result_df` to download the result and view it:

In [11]:
import shutil
def get_result_df(client: FeathrClient) -> pd.DataFrame:
    """Download the job result dataset from cloud as a Pandas dataframe."""
    res_url = client.get_job_result_uri(block=True, timeout_sec=600)
    tmp_dir = "../output_static_features.avro"
    shutil.rmtree(tmp_dir, ignore_errors=True)
    client.feathr_spark_laucher.download_result(result_path=res_url, local_folder=tmp_dir)
    dataframe_list = []
    # Assumes the result files are in Avro format
    for file in glob.glob(os.path.join(tmp_dir, '*.avro')):
        dataframe_list.append(pdx.read_avro(file))
    vertical_concat_df = pd.concat(dataframe_list, axis=0)
    return vertical_concat_df

df_res = get_result_df(client)

2022-07-07 15:10:00.886 | INFO     | feathr._synapse_submission:wait_for_completion:134 - Current Spark job status: success
2022-07-07 15:10:01.252 | INFO     | feathr._synapse_submission:download_file:378 - Beginning reading of results from abfss://feathrhomecreditcafs@feathrhomecreditcasto.dfs.core.windows.net/home_credit_data/output_static_features.avro
Downloading result files: 100%|██████████| 201/201 [03:31<00:00,  1.05s/it]
2022-07-07 15:13:33.670 | INFO     | feathr._synapse_submission:download_file:407 - Finish downloading files from abfss://feathrhomecreditcafs@feathrhomecreditcasto.dfs.core.windows.net/home_credit_data/output_static_features.avro to ../../results/output_static_features.avro.
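
The helper above stacks every part file in the download folder into one dataframe. Note that `pd.concat` raises a `ValueError` when given an empty list, which happens if the folder contains no matching files. The sketch below shows the same read-and-concatenate pattern with an explicit check for that case. The function name `concat_part_files` is hypothetical, and it reads small CSV files written to a temporary folder instead of Avro so that it runs without a Feathr job or `pandavro`.

```python
import glob
import os
import tempfile

import pandas as pd


def concat_part_files(folder: str, pattern: str = "*.csv") -> pd.DataFrame:
    """Read every file in `folder` matching `pattern` and stack the rows."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    if not paths:
        # pd.concat raises ValueError on an empty list; fail with a clearer message.
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder!r}")
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, axis=0, ignore_index=True)


# Demo with two small made-up part files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"SK_ID_CURR": [1, 2]}).to_csv(
        os.path.join(tmp, "part-0.csv"), index=False
    )
    pd.DataFrame({"SK_ID_CURR": [3]}).to_csv(
        os.path.join(tmp, "part-1.csv"), index=False
    )
    combined = concat_part_files(tmp)

print(combined)
```

Passing `ignore_index=True` gives the combined dataframe a fresh 0..n-1 index instead of repeating each part file's own row numbers.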


In [12]:

with pd.option_context('display.max_columns', 50, 'display.max_rows', 1000):
   print(df_res.columns.values.tolist())
   print(df_res[[
      "f_SK_ID_CURR",
      "f_TARGET",
      "f_NAME_CONTRACT_TYPE",
      "f_CODE_GENDER",
      "f_FLAG_OWN_CAR",
      "f_FLAG_OWN_REALTY",
      "f_CNT_CHILDREN",
      "f_AMT_INCOME_TOTAL",
      "f_AMT_CREDIT",
      "f_AMT_ANNUITY",
      "f_AMT_GOODS_PRICE",
      "f_NAME_TYPE_SUITE",
      "f_NAME_INCOME_TYPE",
      "f_NAME_EDUCATION_TYPE",
      "f_NAME_FAMILY_STATUS",
      "f_NAME_HOUSING_TYPE",
      "f_REGION_POPULATION_RELATIVE",
      "f_DAYS_BIRTH",
      "f_DAYS_EMPLOYED",
      "f_DAYS_REGISTRATION",
      "f_DAYS_ID_PUBLISH",
      "f_OWN_CAR_AGE",
      "f_FLAG_MOBIL",
      "f_FLAG_EMP_PHONE",
      "f_FLAG_WORK_PHONE",
      "f_FLAG_CONT_MOBILE",
      "f_FLAG_PHONE",
      "f_FLAG_EMAIL",
      "f_OCCUPATION_TYPE",
      "f_CNT_FAM_MEMBERS",
      "f_REGION_RATING_CLIENT",
      "f_REGION_RATING_CLIENT_W_CITY",
      "f_WEEKDAY_APPR_PROCESS_START",
      "f_HOUR_APPR_PROCESS_START",
      "f_REG_REGION_NOT_LIVE_REGION",
      "f_REG_REGION_NOT_WORK_REGION",
      "f_LIVE_REGION_NOT_WORK_REGION",
      "f_REG_CITY_NOT_LIVE_CITY",
      "f_REG_CITY_NOT_WORK_CITY",
      "f_LIVE_CITY_NOT_WORK_CITY",
      "f_ORGANIZATION_TYPE",
      "f_EXT_SOURCE_1",
      "f_EXT_SOURCE_2",
      "f_EXT_SOURCE_3",
      "f_APARTMENTS_AVG",
      "f_BASEMENTAREA_AVG",
      "f_YEARS_BEGINEXPLUATATION_AVG",
      "f_YEARS_BUILD_AVG",
      "f_COMMONAREA_AVG",
      "f_ELEVATORS_AVG",
      "f_ENTRANCES_AVG",
      "f_FLOORSMAX_AVG",
      "f_FLOORSMIN_AVG",
      "f_LANDAREA_AVG",
      "f_LIVINGAPARTMENTS_AVG",
      "f_LIVINGAREA_AVG",
      "f_NONLIVINGAPARTMENTS_AVG",
      "f_NONLIVINGAREA_AVG",
      "f_APARTMENTS_MODE",
      "f_BASEMENTAREA_MODE",
      "f_YEARS_BEGINEXPLUATATION_MODE",
      "f_YEARS_BUILD_MODE",
      "f_COMMONAREA_MODE",
      "f_ELEVATORS_MODE",
      "f_ENTRANCES_MODE",
      "f_FLOORSMAX_MODE",
      "f_FLOORSMIN_MODE",
      "f_LANDAREA_MODE",
      "f_LIVINGAPARTMENTS_MODE",
      "f_LIVINGAREA_MODE",
      "f_NONLIVINGAPARTMENTS_MODE",
      "f_NONLIVINGAREA_MODE",
      "f_APARTMENTS_MEDI",
      "f_BASEMENTAREA_MEDI",
      "f_YEARS_BEGINEXPLUATATION_MEDI",
      "f_YEARS_BUILD_MEDI",
      "f_COMMONAREA_MEDI",
      "f_ELEVATORS_MEDI",
      "f_ENTRANCES_MEDI",
      "f_FLOORSMAX_MEDI",
      "f_FLOORSMIN_MEDI",
      "f_LANDAREA_MEDI",
      "f_LIVINGAPARTMENTS_MEDI",
      "f_LIVINGAREA_MEDI",
      "f_NONLIVINGAPARTMENTS_MEDI",
      "f_NONLIVINGAREA_MEDI",
      "f_FONDKAPREMONT_MODE",
      "f_HOUSETYPE_MODE",
      "f_TOTALAREA_MODE",
      "f_WALLSMATERIAL_MODE",
      "f_EMERGENCYSTATE_MODE",
      "f_OBS_30_CNT_SOCIAL_CIRCLE",
      "f_DEF_30_CNT_SOCIAL_CIRCLE",
      "f_OBS_60_CNT_SOCIAL_CIRCLE",
      "f_DEF_60_CNT_SOCIAL_CIRCLE",
      "f_DAYS_LAST_PHONE_CHANGE",
      "f_FLAG_DOCUMENT_2",
      "f_FLAG_DOCUMENT_3",
      "f_FLAG_DOCUMENT_4",
      "f_FLAG_DOCUMENT_5",
      "f_FLAG_DOCUMENT_6",
      "f_FLAG_DOCUMENT_7",
      "f_FLAG_DOCUMENT_8",
      "f_FLAG_DOCUMENT_9",
      "f_FLAG_DOCUMENT_10",
      "f_FLAG_DOCUMENT_11",
      "f_FLAG_DOCUMENT_12",
      "f_FLAG_DOCUMENT_13",
      "f_FLAG_DOCUMENT_14",
      "f_FLAG_DOCUMENT_15",
      "f_FLAG_DOCUMENT_16",
      "f_FLAG_DOCUMENT_17",
      "f_FLAG_DOCUMENT_18",
      "f_FLAG_DOCUMENT_19",
      "f_FLAG_DOCUMENT_20",
      "f_FLAG_DOCUMENT_21",
      "f_AMT_REQ_CREDIT_BUREAU_HOUR",
      "f_AMT_REQ_CREDIT_BUREAU_DAY",
      "f_AMT_REQ_CREDIT_BUREAU_WEEK",
      "f_AMT_REQ_CREDIT_BUREAU_MON",
      "f_AMT_REQ_CREDIT_BUREAU_QRT",
      "f_AMT_REQ_CREDIT_BUREAU_YEAR",
   ]])

['SK_ID_CURR', 'TARGET', 'NAME_CONTRACT_TYPE', 'CODE_GENDER', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'CNT_CHILDREN', 'AMT_INCOME_TOTAL', 'AMT_CREDIT', 'AMT_ANNUITY', 'AMT_GOODS_PRICE', 'NAME_TYPE_SUITE', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE', 'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'REGION_POPULATION_RELATIVE', 'DAYS_BIRTH', 'DAYS_EMPLOYED', 'DAYS_REGISTRATION', 'DAYS_ID_PUBLISH', 'OWN_CAR_AGE', 'FLAG_MOBIL', 'FLAG_EMP_PHONE', 'FLAG_WORK_PHONE', 'FLAG_CONT_MOBILE', 'FLAG_PHONE', 'FLAG_EMAIL', 'OCCUPATION_TYPE', 'CNT_FAM_MEMBERS', 'REGION_RATING_CLIENT', 'REGION_RATING_CLIENT_W_CITY', 'WEEKDAY_APPR_PROCESS_START', 'HOUR_APPR_PROCESS_START', 'REG_REGION_NOT_LIVE_REGION', 'REG_REGION_NOT_WORK_REGION', 'LIVE_REGION_NOT_WORK_REGION', 'REG_CITY_NOT_LIVE_CITY', 'REG_CITY_NOT_WORK_CITY', 'LIVE_CITY_NOT_WORK_CITY', 'ORGANIZATION_TYPE', 'EXT_SOURCE_1', 'EXT_SOURCE_2', 'EXT_SOURCE_3', 'APARTMENTS_AVG', 'BASEMENTAREA_AVG', 'YEARS_BEGINEXPLUATATION_AVG', 'YEARS_BUILD_AVG', 'COMMONAREA_AVG', 'ELE