# Initialize a Feast Repository in Teradata

From the command line, run: **feast-td init-repo**

In [10]:
!ls -l ~/EFS

total 16
drwxr-xr-x  6 my-username  staff   192 Feb  8 13:24 feature_repo
-rw-r--r--  1 my-username  staff  6742 Feb  8 12:22 test_workflow.py


In [19]:
%cd ~/EFS

/Users/my-username/EFS


In [20]:
!ls -l feature_repo

total 24
drwxr-xr-x  3 my-username  staff    96 Feb  8 13:24 __pycache__
drwxr-xr-x  3 my-username  staff    96 Feb  8 13:01 data
-rw-r--r--  1 my-username  staff  5996 Feb  8 12:22 driver_repo.py
-rw-r--r--  1 my-username  staff   593 Feb  8 13:25 feature_store.yaml


#### feature_store.yaml

#### driver_repo.py

This file defines the entity, the data source, and the feature views for the demo entity "driver". Below is an example of a simple definition.

The components are:
* **TeradataSource**: The data source for features stored in Teradata (Enterprise or Lake) or accessible from Teradata through a foreign table (NOS, QueryGrid).
* **Entity**: A collection of semantically related features.
* **Feature View**: A group of feature data from a specific data source. Feature views let you define features and their data sources consistently, so feature groups can be reused across a project.

## Run the test workflow to generate test data

The **test_workflow.py** file contains a sample end-to-end (E2E) process.

In [21]:
!cat test_workflow.py

import random
import subprocess
import pandas as pd
import yaml

from datetime import datetime, timedelta
from pytz import utc

from feast import FeatureStore
from feast.data_source import PushMode


def run_demo():
    store = FeatureStore(repo_path="feature_repo")
    print("\n--- Run feast apply to setup feature store on Teradata ---")
    command = "cd feature_repo; feast apply"
    subprocess.run(command, shell=True)

    print("\n--- Historical features for training ---")
    fetch_historical_features_entity_df(store, for_batch_scoring=False)

    print("\n--- Historical features for batch scoring ---")
    fetch_historical_features_entity_df(store, for_batch_scoring=True)

    print(
        "\n--- Historical features for training (all entities in a window using SQL entity dataframe) ---"
    )
    fetch_historical_features_entity_sql(store, for_batch_scoring=False)

    print(
        "\n--- Historical features for batch scoring (all entities in a window using SQL entity datafr

In [18]:
%run test_workflow.py


--- Run feast apply to setup feature store on Teradata ---
Created entity driver
Created feature view driver_hourly_stats
Created feature view driver_hourly_stats_fresh
Created on demand feature view transformed_conv_rate_fresh
Created on demand feature view transformed_conv_rate
Created feature service driver_activity_v3
Created feature service driver_activity_v2
Created feature service driver_activity_v1

Deploying infrastructure for driver_hourly_stats
Deploying infrastructure for driver_hourly_stats_fresh

--- Historical features for training ---
   driver_id     event_timestamp  label_driver_reported_satisfaction  \
0       1002 2021-04-12 08:12:10                                   5   
1       1003 2021-04-12 16:40:26                                   3   
2       1001 2021-04-12 10:59:42                                   1   

   val_to_add  val_to_add_2  conv_rate  acc_rate  avg_daily_trips  \
0           2            20   0.408351  0.996706              330   
1           3            30   0.269635  0.668824               22   
2           1            10   0.494244  0.646646              265   

   conv_rate_plus_val1  conv_rate_plus_val2  
0             2.408351            20.408351  
1             3.269635            30.269635  
2             1.494244            10.494244  

--- Historical features for batch scoring ---
   driver_id            event_timestamp  label_driver_reported_satisfaction  \
0       1001 2023-02-08 14:31:45.376439                                   1   
1       1002 2023-02-08 14:31:45.376439      

driver_hourly_stats from 2013-02-20 14:33:06+00:00 to 2023-02-08 14:32:55+00:00:



--- Online features ---
acc_rate  :  [0.3568315804004669, 0.9236539602279663]
conv_rate_plus_val1  :  [1000.5405999422073, 1001.7620654702187]
conv_rate_plus_val2  :  [2000.5405999422073, 2002.7620654702187]
driver_id  :  [1001, 1002]

--- Online features retrieved (instead) through a feature service---
conv_rate  :  [0.5405999422073364, 0.7620654702186584]
conv_rate_plus_val1  :  [1000.5405999422073, 1001.7620654702187]
conv_rate_plus_val2  :  [2000.5405999422073, 2002.7620654702187]
driver_id  :  [1001, 1002]

--- Online features retrieved (using feature service v3, which uses a feature view with a push source---
acc_rate  :  [0.3568315804004669, 0.9236539602279663]
avg_daily_trips  :  [55, 824]
conv_rate  :  [0.5405999422073364, 0.7620654702186584]
conv_rate_plus_val1  :  [1000.5405999422073, 1001.7620654702187]
conv_rate_plus_val2  :  [2000.5405999422073, 2002.7620654702187]
driver_id  :  [1001, 1002]

--- Simulate a stream event ingestion of the hourly stats df ---
   driver_id  



A detailed description of the different parts of the demo script is available at: https://medium.com/teradata/enabling-highly-scalable-feature-store-with-teradata-vantage-and-feast-e01008fa8fdb
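The `conv_rate_plus_val1` and `conv_rate_plus_val2` columns in the training output above belong to the on-demand feature view `transformed_conv_rate`. The printed values are consistent with adding the request-time inputs `val_to_add` and `val_to_add_2` to the stored `conv_rate`. The sketch below reproduces that arithmetic in plain Python for the three training rows shown above. It is only an illustration: it does not use the Feast API, and the function name `transform_conv_rate` is hypothetical.

```python
def transform_conv_rate(row: dict) -> dict:
    """Derive the two on-demand features from one joined row (illustrative only)."""
    return {
        "conv_rate_plus_val1": row["conv_rate"] + row["val_to_add"],
        "conv_rate_plus_val2": row["conv_rate"] + row["val_to_add_2"],
    }


# The three rows printed under "Historical features for training".
training_rows = [
    {"driver_id": 1002, "conv_rate": 0.408351, "val_to_add": 2, "val_to_add_2": 20},
    {"driver_id": 1003, "conv_rate": 0.269635, "val_to_add": 3, "val_to_add_2": 30},
    {"driver_id": 1001, "conv_rate": 0.494244, "val_to_add": 1, "val_to_add_2": 10},
]

for row in training_rows:
    derived = transform_conv_rate(row)
    print(
        row["driver_id"],
        round(derived["conv_rate_plus_val1"], 6),
        round(derived["conv_rate_plus_val2"], 6),
    )
```

The rounded results can be compared with the last two columns of the training output.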

## Verify that the data are really in EFS

In [23]:
!pip install teradataml



In [67]:
from teradataml import *

In [68]:
# Close any previously opened connection to Vantage
try:
    remove_context()
except Exception:
    pass

In [69]:
Param = {
    'host'     : 'my-server',
    'user'     : '*********',
    'password' : '**********',
    'logmech'  : 'LDAP',
    'database' : '*********'
}

create_context(**Param)



Engine(teradatasql://:***@my-server/?DATABASE=my-username&LOGDATA=%2A%2A%2A&LOGMECH=%2A%2A%2A&USER=my-username)

In [70]:
# Tools for manipulating in-database (remote) DataFrames
from sqlalchemy import *

import pandas as pd
import numpy as np

import getpass as gp

In [71]:
# HASHAMP() returns the highest AMP number (zero-based), so adding 1 gives the number of AMPs
amp = DataFrame.from_query("SELECT HASHAMP()+1 as number_amps")

In [72]:
amp

number_amps
216


### Source table

In [63]:
df = DataFrame.from_query("select * from EFS_feast_driver_hourly_stats")

In [64]:
df

event_timestamp,driver_id,conv_rate,acc_rate,avg_daily_trips,created
2023-01-29 17:00:00.000000+00:,1005,0.7775729894638062,0.8009993433952332,197,2023-02-08 13:24:39.269000
2023-01-29 12:00:00.000000+00:,1005,0.8104477524757385,0.1386490017175674,625,2023-02-08 13:24:39.269000
2023-01-30 23:00:00.000000+00:,1005,0.5619951486587524,0.454708069562912,825,2023-02-08 13:24:39.269000
2023-02-04 19:00:00.000000+00:,1005,0.914663791656494,0.0755914002656936,227,2023-02-08 13:24:39.269000
2023-01-27 19:00:00.000000+00:,1005,0.6499730944633484,0.6637897491455078,423,2023-02-08 13:24:39.269000
2023-02-07 14:00:00.000000+00:,1005,0.9088509678840636,0.551448404788971,896,2023-02-08 13:24:39.269000
2023-02-06 08:00:00.000000+00:,1005,0.9945278167724608,0.1370145827531814,469,2023-02-08 13:24:39.269000
2023-02-05 21:00:00.000000+00:,1005,0.3578195571899414,0.5817363858222961,947,2023-02-08 13:24:39.269000
2023-01-29 05:00:00.000000+00:,1005,0.3570314347743988,0.095795176923275,155,2023-02-08 13:24:39.269000
2023-01-27 22:00:00.000000+00:,1005,0.191768392920494,0.2689937651157379,54,2023-02-08 13:24:39.269000


### Materialized feature views

In [83]:
df = DataFrame.from_query("select * from EFS_driver_hourly_stats")
df

entity_feature_key,entity_key,feature_name,value,event_ts,created_ts
b'20000006472697665725F69640400000008000000E9030000000000006176675F6461696C795F7472697073',b'20000006472697665725F69640400000008000000E903000000000000',avg_daily_trips,b'20D303',2023-02-08 19:33:51.731730,2023-02-08 14:33:51.731734
b'20000006472697665725F69640400000008000000EB030000000000006163635F72617465',b'20000006472697665725F69640400000008000000EB03000000000000',acc_rate,b'353F6A963D',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000ED030000000000006176675F6461696C795F7472697073',b'20000006472697665725F69640400000008000000ED03000000000000',avg_daily_trips,b'208D04',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000EA03000000000000636F6E765F72617465',b'20000006472697665725F69640400000008000000EA03000000000000',conv_rate,b'35B916433F',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000EC03000000000000636F6E765F72617465',b'20000006472697665725F69640400000008000000EC03000000000000',conv_rate,b'35B6F8713F',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000ED03000000000000636F6E765F72617465',b'20000006472697665725F69640400000008000000ED03000000000000',conv_rate,b'35CA24C33E',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000E9030000000000006163635F72617465',b'20000006472697665725F69640400000008000000E903000000000000',acc_rate,b'355074803F',2023-02-08 19:33:51.731730,2023-02-08 14:33:51.731734
b'20000006472697665725F69640400000008000000ED030000000000006163635F72617465',b'20000006472697665725F69640400000008000000ED03000000000000',acc_rate,b'35BCEB293F',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000EA030000000000006163635F72617465',b'20000006472697665725F69640400000008000000EA03000000000000',acc_rate,b'3596746C3F',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000EA030000000000006176675F6461696C795F7472697073',b'20000006472697665725F69640400000008000000EA03000000000000',avg_daily_trips,b'20B806',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000


In [75]:
df = DataFrame.from_query("select * from EFS_driver_hourly_stats_fresh")
df

entity_feature_key,entity_key,feature_name,value,event_ts,created_ts
b'20000006472697665725F69640400000008000000EA030000000000006163635F72617465',b'20000006472697665725F69640400000008000000EA03000000000000',acc_rate,b'3596746C3F',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000EC030000000000006163635F72617465',b'20000006472697665725F69640400000008000000EC03000000000000',acc_rate,b'35192EE73E',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000ED030000000000006176675F6461696C795F7472697073',b'20000006472697665725F69640400000008000000ED03000000000000',avg_daily_trips,b'208D04',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000EB030000000000006163635F72617465',b'20000006472697665725F69640400000008000000EB03000000000000',acc_rate,b'353F6A963D',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000EA03000000000000636F6E765F72617465',b'20000006472697665725F69640400000008000000EA03000000000000',conv_rate,b'35B916433F',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000EC03000000000000636F6E765F72617465',b'20000006472697665725F69640400000008000000EC03000000000000',conv_rate,b'35B6F8713F',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000E9030000000000006176675F6461696C795F7472697073',b'20000006472697665725F69640400000008000000E903000000000000',avg_daily_trips,b'20D303',2023-02-08 19:33:51.731730,2023-02-08 14:33:51.731734
b'20000006472697665725F69640400000008000000E903000000000000636F6E765F72617465',b'20000006472697665725F69640400000008000000E903000000000000',conv_rate,b'350000803F',2023-02-08 19:33:51.731730,2023-02-08 14:33:51.731734
b'20000006472697665725F69640400000008000000ED03000000000000636F6E765F72617465',b'20000006472697665725F69640400000008000000ED03000000000000',conv_rate,b'35CA24C33E',2023-02-08 12:00:00.000000,2023-02-08 13:24:39.269000
b'20000006472697665725F69640400000008000000E9030000000000006163635F72617465',b'20000006472697665725F69640400000008000000E903000000000000',acc_rate,b'355074803F',2023-02-08 19:33:51.731730,2023-02-08 14:33:51.731734
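The `entity_key`, `entity_feature_key`, and `value` columns above are stored as bytes, so they are hard to check by eye. The sketch below decodes them using only the standard library. It rests on assumptions inferred from the displayed rows rather than from the feast-teradata source: the entity key is a 4-byte type code, the join key name, a 4-byte type code, a 4-byte length, and an 8-byte little-endian integer; `entity_feature_key` is the entity key followed by the feature name; `value` is a serialized Feast protobuf `Value` (field 4 for `int64_val`, field 6 for `float_val`); and the hex display dropped a leading zero wherever the string has odd length. All function names are hypothetical, and only a single integer join key and non-negative integers are handled.

```python
import struct
from typing import Tuple, Union

# Feast ValueType enum numbers (assumption: STRING = 2, INT64 = 4).
VALUE_TYPE_STRING = 2
VALUE_TYPE_INT64 = 4


def hex_to_bytes(hex_text: str) -> bytes:
    """Convert displayed hex to bytes, restoring a dropped leading zero if needed."""
    if len(hex_text) % 2:
        hex_text = "0" + hex_text
    return bytes.fromhex(hex_text)


def decode_entity_key(hex_text: str) -> Tuple[str, int]:
    """Decode an entity key with one join key and an 8-byte integer value."""
    raw = hex_to_bytes(hex_text)
    key_type = struct.unpack_from("<I", raw, 0)[0]
    val_type, val_len = struct.unpack_from("<II", raw, len(raw) - 16)
    if (key_type, val_type, val_len) != (VALUE_TYPE_STRING, VALUE_TYPE_INT64, 8):
        raise ValueError("unexpected entity key layout")
    join_key = raw[4:-16].decode("utf-8")
    entity_id = struct.unpack_from("<q", raw, len(raw) - 8)[0]
    return join_key, entity_id


def decode_feature_name(entity_feature_key_hex: str, entity_key_hex: str) -> str:
    """Return the feature name appended after the entity key bytes."""
    full = hex_to_bytes(entity_feature_key_hex)
    prefix = hex_to_bytes(entity_key_hex)
    if not full.startswith(prefix):
        raise ValueError("entity_feature_key does not start with entity_key")
    return full[len(prefix):].decode("utf-8")


def decode_value(hex_text: str) -> Union[int, float]:
    """Decode a protobuf Value holding float_val or a non-negative int64_val."""
    raw = hex_to_bytes(hex_text)
    field_number, wire_type = raw[0] >> 3, raw[0] & 0x07
    payload = raw[1:]
    if field_number == 6 and wire_type == 5:  # float_val: 4 bytes, little-endian
        return struct.unpack("<f", payload[:4])[0]
    if field_number == 4 and wire_type == 0:  # int64_val: base-128 varint
        result = 0
        for index, byte in enumerate(payload):
            result |= (byte & 0x7F) << (7 * index)
            if not byte & 0x80:
                break
        return result
    raise ValueError("unsupported value encoding")


# Values copied from the EFS_driver_hourly_stats rows shown above.
entity_key = "20000006472697665725F69640400000008000000EA03000000000000"
feature_key = entity_key + "6176675F6461696C795F7472697073"

print(decode_entity_key(entity_key))                 # ('driver_id', 1002)
print(decode_feature_name(feature_key, entity_key))  # avg_daily_trips
print(decode_value("20B806"))                        # 824
print(decode_value("35B916433F"))                    # conv_rate for driver 1002
```

Under these assumptions, the decoded `avg_daily_trips` of 824 and the decoded `conv_rate` of about 0.762 for driver 1002 agree with the online features printed by the test workflow.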
