# ANOVOS - Feast Integration
This notebook shows the Feast integration supported by the ANOVOS package and how to invoke it.
It also contains the code needed for a minimal data flow.
* [Read Dataset](#Read-Dataset)
* [Write Datasets and export feature definitions](#Write-Datasets-and-export-feature-definitions)

**Setting Spark Session**

In [2]:
from anovos.shared.spark import *

sc.setLogLevel("ERROR")
import warnings
warnings.filterwarnings('ignore')

**Input/Output Path**

In [3]:
inputPath = "../data/income_dataset/csv"
inputPath_parq = "../data/income_dataset/parquet"
inputPath_join = "../data/income_dataset/join"
outputPath = "../output/income_dataset/"

# Read Dataset

- The API specification of the **read_dataset** function can be found <a href="https://docs.anovos.ai/api/data_ingest/data_ingest.html">here</a>
- Currently supported file types: csv, parquet, avro
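
The other supported formats are read the same way as the csv file. As a minimal sketch, the hypothetical helper `read_args` below (not part of ANOVOS) checks the file type against the three formats listed above and builds the keyword arguments for `read_dataset`. It assumes that `file_configs` can be left empty for formats such as parquet.

```python
# Hypothetical helper, not part of ANOVOS: builds keyword arguments for read_dataset.
SUPPORTED_FILE_TYPES = ("csv", "parquet", "avro")  # formats listed above


def read_args(file_path, file_type, file_configs=None):
    """Return keyword arguments for read_dataset after checking the file type."""
    if file_type not in SUPPORTED_FILE_TYPES:
        raise ValueError(
            f"Unsupported file_type {file_type!r}; expected one of {SUPPORTED_FILE_TYPES}"
        )
    return {
        "file_path": file_path,
        "file_type": file_type,
        "file_configs": dict(file_configs or {}),
    }


parquet_args = read_args("../data/income_dataset/parquet", "parquet")
print(parquet_args)
```

In the notebook, the result could then be passed on as `read_dataset(spark, **parquet_args)`, assuming the Spark session from the first cell is available.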

In [4]:
from anovos.data_ingest.data_ingest import read_dataset

In [5]:
df = read_dataset(
    spark,
    file_path=inputPath,
    file_type="csv",
    file_configs={"header": "True", "delimiter": ",", "inferSchema": "True"},
)
df.toPandas().head(5)

                                                                                

Unnamed: 0,ifa,age,workclass,fnlwgt,logfnl,empty,education,education-num,marital-status,occupation,...,capital-gain,capital-loss,hours-per-week,native-country,income,dt_1,dt_2,latitude,longitude,geohash
0,1a,,State-gov,77516.0,4.889391,,Bachelors,13.0,Never-married,Adm-clerical,...,2174.0,0.0,40.0,UnitedStates,<=50K,1/8/16 5:59,1/16/16 5:59,-38.624096,177.982468,rb68np99
1,2a,,Self-emp-not-inc,83311.0,4.920702,,Bachelors,13.0,Married-civ-spouse,Exec-managerial,...,0.0,0.0,13.0,UnitedStates,<=50K,1/8/16 21:09,1/12/16 21:09,-40.880497,174.992142,rckjypw0
2,3a,38.0,Private,215646.0,5.333741,,HS-grad,9.0,Divorced,Handlers-cleaners,...,0.0,0.0,40.0,UnitedStates,<=50K,3/8/16 2:21,3/20/16 2:21,-37.73563,176.164047,rckm712q
3,4a,53.0,Private,234721.0,5.370552,,11th,7.0,Married-civ-spouse,Handlers-cleaners,...,0.0,0.0,40.0,UnitedStates,<=50K,3/8/16 6:31,3/14/16 6:31,-39.536491,176.832321,rckndgte
4,5a,,Private,338409.0,5.529442,,Bachelors,13.0,Married-civ-spouse,Prof-specialty,...,0.0,0.0,40.0,Cuba,<=50K,3/8/16 9:45,3/10/16 9:45,-41.128094,175.033722,rckq4596


# Write Datasets and export feature definitions

- A description of the feature-store-related configuration can be found <a href="https://docs.anovos.ai/using-anovos/feature_store.html">here</a>
- The API specification of the **generate_feature_description** function can be found <a href="https://docs.anovos.ai/api/feature_store/feast_exporter.html">here</a>
- Limitations:
    - The repartition for file output must be set to 1
    - Incremental updates are not possible
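
The repartition limitation presumably exists because the exported Feast file source points at a single data file. As a sketch under that assumption, the hypothetical helper `find_single_part_file` (not part of ANOVOS) locates the part file in an output directory and fails loudly if there is not exactly one. The demonstration uses a temporary directory with an empty placeholder file in place of a real Spark output.

```python
import glob
import os
import tempfile


def find_single_part_file(output_dir):
    """Return the only Spark part file in output_dir, or raise if there is not exactly one."""
    matches = sorted(glob.glob(os.path.join(output_dir, "part*")))
    if len(matches) != 1:
        raise RuntimeError(
            f"Expected exactly one part file in {output_dir!r}, found {len(matches)}; "
            "was the dataset written with repartition set to 1?"
        )
    return matches[0]


# Stand-in for a Spark output directory written with repartition=1.
with tempfile.TemporaryDirectory() as tmp_dir:
    placeholder = os.path.join(tmp_dir, "part-00000.snappy.parquet")
    open(placeholder, "w").close()
    found_name = os.path.basename(find_single_part_file(tmp_dir))
    print(found_name)
```

Checking the count explicitly, rather than taking the first match, surfaces a wrong repartition setting before the feature definitions are generated.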
       

In [6]:
from anovos.feature_store import feast_exporter

In [7]:
# Example 1 - add timestamp columns to df
entity_config = {
    "name": "income",
    "id_col": "ifa",
    "description": "write_feast_features",
}

file_source_config = {
    "owner": "test@owner.com",
    "description": "data source description",
    "timestamp_col": "event_time",
    "create_timestamp_col": "create_time_col",
}

feature_view_config = {
    "name": "income_view",
    "ttl_in_seconds": 3600000,
    "owner": "view@owner.com",
    "create_timestamps": True,
}

write_feast_features = {
    "entity": entity_config,
    "file_source": file_source_config,
    "feature_view": feature_view_config,
    "file_path": "../data/feast_repo",
    "service_name": "income_feature_service"
}
# In a real project, read this configuration from a YAML file


file_source_config = write_feast_features["file_source"]
df = feast_exporter.add_timestamp_columns(df, file_source_config)

Adding timestamp columns
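
Since the configuration would normally live in a YAML file, here is a minimal sketch of loading it. It assumes that the PyYAML package is installed and that the file has a top-level `write_feast_features` key mirroring the dictionary above. The YAML is embedded as a string so that the example is self-contained, and the result is stored under the illustrative name `feast_config`. The sketch also converts `ttl_in_seconds` to a `timedelta` to make the value easier to read.

```python
from datetime import timedelta

import yaml  # PyYAML; assumed to be installed

# Assumed layout: mirrors the write_feast_features dictionary defined above.
CONFIG_YAML = """
write_feast_features:
  file_path: "../data/feast_repo"
  service_name: "income_feature_service"
  entity:
    name: "income"
    id_col: "ifa"
    description: "write_feast_features"
  file_source:
    owner: "test@owner.com"
    description: "data source description"
    timestamp_col: "event_time"
    create_timestamp_col: "create_time_col"
  feature_view:
    name: "income_view"
    ttl_in_seconds: 3600000
    owner: "view@owner.com"
    create_timestamps: True
"""

config = yaml.safe_load(CONFIG_YAML)
feast_config = config["write_feast_features"]

# 3,600,000 seconds is 1,000 hours; timedelta makes that explicit.
ttl = timedelta(seconds=feast_config["feature_view"]["ttl_in_seconds"])
print(feast_config["file_source"]["timestamp_col"])
print(ttl)
```

With a real file, the string would be replaced by `yaml.safe_load(open(path))` on a path of your choosing, and `feast_config["file_source"]` could be passed to `add_timestamp_columns` in the same way as the dictionary above.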


In [8]:
from anovos.data_ingest.data_ingest import write_dataset

In [18]:
# repartition is set to 1, as required by the limitations listed above
write_dataset(df, outputPath, "parquet", {"repartition": 1, "mode": "overwrite"})

In [17]:
outputPath

'../output/income_dataset/'

In [16]:
df.toPandas()

                                                                                

Unnamed: 0,ifa,age,workclass,fnlwgt,logfnl,empty,education,education-num,marital-status,occupation,...,hours-per-week,native-country,income,dt_1,dt_2,latitude,longitude,geohash,event_time,create_time_col
0,1a,,State-gov,77516.0,4.889391,,Bachelors,13.0,Never-married,Adm-clerical,...,40.0,UnitedStates,<=50K,1/8/16 5:59,1/16/16 5:59,-38.624096,177.982468,rb68np99,2022-11-21 06:44:56.972974,2022-11-21 06:44:57.243829
1,2a,,Self-emp-not-inc,83311.0,4.920702,,Bachelors,13.0,Married-civ-spouse,Exec-managerial,...,13.0,UnitedStates,<=50K,1/8/16 21:09,1/12/16 21:09,-40.880497,174.992142,rckjypw0,2022-11-21 06:44:56.972974,2022-11-21 06:44:57.243829
2,3a,38.0,Private,215646.0,5.333741,,HS-grad,9.0,Divorced,Handlers-cleaners,...,40.0,UnitedStates,<=50K,3/8/16 2:21,3/20/16 2:21,-37.735630,176.164047,rckm712q,2022-11-21 06:44:56.972974,2022-11-21 06:44:57.243829
3,4a,53.0,Private,234721.0,5.370552,,11th,7.0,Married-civ-spouse,Handlers-cleaners,...,40.0,UnitedStates,<=50K,3/8/16 6:31,3/14/16 6:31,-39.536491,176.832321,rckndgte,2022-11-21 06:44:56.972974,2022-11-21 06:44:57.243829
4,5a,,Private,338409.0,5.529442,,Bachelors,13.0,Married-civ-spouse,Prof-specialty,...,40.0,Cuba,<=50K,3/8/16 9:45,3/10/16 9:45,-41.128094,175.033722,rckq4596,2022-11-21 06:44:56.972974,2022-11-21 06:44:57.243829
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32556,32557a,27.0,Private,257302.0,,,Assoc-acdm,12.0,Married-civ-spouse,Tech-support,...,38.0,United-States,<=50K,4/14/19 22:59,4/26/19 22:59,-41.293278,174.783737,rcm32hdg,2022-11-21 06:44:56.972974,2022-11-21 06:44:57.243829
32557,32558a,40.0,Private,154374.0,,,HS-grad,9.0,Married-civ-spouse,Machine-op-inspct,...,40.0,United-States,>50K,4/15/19 7:29,4/17/19 7:29,-45.855858,170.513382,rb6b82me,2022-11-21 06:44:56.972974,2022-11-21 06:44:57.243829
32558,32559a,58.0,Private,151910.0,,,HS-grad,9.0,Widowed,Adm-clerical,...,40.0,United-States,<=50K,4/15/19 8:54,4/24/19 8:54,-37.743980,175.225586,rckqh5tv,2022-11-21 06:44:56.972974,2022-11-21 06:44:57.243829
32559,32560a,22.0,Private,201490.0,,,HS-grad,9.0,Never-married,Adm-clerical,...,20.0,United-States,<=50K,4/15/19 10:25,4/25/19 10:25,-37.750027,175.278122,rckkughm,2022-11-21 06:44:56.972974,2022-11-21 06:44:57.243829


In [10]:
import os 
import glob

In [11]:
# Example 1 - write feast feature configuration into feast repository
# The dataset was written to outputPath with repartition set to 1,
# so a single part file is expected there
path = os.path.join(outputPath, "part*")
filename = glob.glob(path)[0]
feast_exporter.generate_feature_description(df.dtypes, write_feast_features, filename)