## Demo: update the online feature store with the write_to_online_store method

To update a feature (its value or type) without downtime, we can follow these steps:

1. Update the offline feature store and registry
2. Get the offline features (date=today) that need to be replaced
3. Load the dataframe from step 2 into the online feature store with the write_to_online_store method
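
Steps 2 and 3 above can be sketched as one small helper. This is a minimal sketch, not the notebook's own code: the function name `refresh_online_features` and its parameters are hypothetical, and `store` is assumed to be a `feast.FeatureStore` (or any object exposing the same two methods). The notebook additionally adds a `created` column to the dataframe before writing; whether that is required is assumed to depend on the feature view definition.

```python
def refresh_online_features(store, feature_view_name, entity_df, feature_refs):
    """Overwrite online values for the given entities with fresh offline values.

    Hypothetical helper: `store` is assumed to behave like feast.FeatureStore.
    """
    # Step 2: look up the latest offline values for the entities in entity_df.
    fresh_df = store.get_historical_features(
        entity_df=entity_df,
        features=feature_refs,
    ).to_df()

    # Step 3: write those rows straight into the online store. No teardown is
    # involved, which is why this notebook treats it as a no-downtime update.
    store.write_to_online_store(feature_view_name, fresh_df)
    return fresh_df
```

A call would look like `refresh_online_features(store, "driver_hourly_stats", entity_df, ["driver_hourly_stats:avg_daily_trips"])`, using the feature view name from this notebook.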

In [10]:
## Check redis online store connection
import redis
import pandas as pd
port = 6379
client = redis.Redis(host = "cache", port = port)
client.ping()

True

### Initialize online feature store

In [11]:
%cd feature_repo
!feast teardown

[Errno 2] No such file or directory: 'feature_repo'
/usr/src/feature_repo


In [12]:
# Copy the initial data to materialize
!cp data/original_driver_stats.parquet data/driver_stats.parquet

In [13]:
# Checking the initial data
data = pd.read_parquet("data/driver_stats.parquet")
data[["driver_id", "avg_daily_trips"]]

Unnamed: 0,driver_id,avg_daily_trips
0,1005,682
1,1005,656
2,1005,649
3,1005,346
4,1005,878
...,...,...
1802,1001,488
1803,1001,966
1804,1001,74
1805,1003,52


In [14]:
# enable direct loading into the online store (already set up in the config)
!feast alpha enable direct_ingest_to_online_store
!feast apply

Created entity driver_id
Created feature view driver_hourly_stats

Deploying infrastructure for driver_hourly_stats


In [15]:
from datetime import datetime, date
# !feast materialize-incremental {datetime.now().isoformat()}
!feast materialize {date.fromisoformat('2019-12-04')} {datetime.now().isoformat()}

Materializing 1 feature views from 2019-12-04 00:00:00+00:00 to 2022-04-21 14:50:06+00:00 into the redis online store.

driver_hourly_stats:
100%|███████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 1518.14it/s]


In [16]:
# get current online features
from pprint import pprint
from feast import FeatureStore
store = FeatureStore(repo_path=".")

def get_online_features():

    feature_vector = store.get_online_features(
        features=[
            "driver_hourly_stats:avg_daily_trips",
        ],
        entity_rows=[
            {"driver_id": 1001},
            {"driver_id": 1002},
            {"driver_id": 1003},
            {"driver_id": 1004},
            {"driver_id": 1005},
        ],
    ).to_dict()
    return feature_vector
features = get_online_features()

In [17]:
# show the current avg_daily_trips feature 
pd.DataFrame.from_dict(features)

Unnamed: 0,driver_id,avg_daily_trips
0,1001,966
1,1002,314
2,1003,506
3,1004,256
4,1005,387


In [18]:
print(type(features["avg_daily_trips"][0]))
# value < 1000, type = int
# the next section will show how to update this feature

<class 'int'>


## Update the online store without downtime

In this section we will change avg_daily_trips in two ways:
1. Value: avg_daily_trips = 30 * avg_daily_trips
2. Type: int -> float
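
The notebook does not show how `data/updated_driver_stats.parquet` was produced. As a hedged illustration, the change could be applied to a dataframe as below. The helper name `scale_avg_daily_trips` is hypothetical, and the factor of 30 is inferred from the tables in this notebook (for example, 966 becomes 28980.0).

```python
import pandas as pd


def scale_avg_daily_trips(df: pd.DataFrame, factor: float = 30.0) -> pd.DataFrame:
    """Return a copy with avg_daily_trips cast to float and multiplied by factor."""
    out = df.copy()
    out["avg_daily_trips"] = out["avg_daily_trips"].astype("float64") * factor
    return out


# Two rows taken from the original data table shown earlier in this notebook.
original = pd.DataFrame({"driver_id": [1001, 1003], "avg_daily_trips": [966, 52]})
updated = scale_avg_daily_trips(original)
print(updated)
```

The function works on a copy, so the original dataframe keeps its integer column.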


### 1. Update offline feature store/registry


In [19]:
# remove the old offline data and replace with new data
!rm data/driver_stats.parquet
!cp data/updated_driver_stats.parquet data/driver_stats.parquet

In [20]:
# Checking the updated data
data = pd.read_parquet("data/driver_stats.parquet")
data[["driver_id", "avg_daily_trips"]]
# the new data has values multiplied by 30 and the type changed from int to float

Unnamed: 0,driver_id,avg_daily_trips
0,1005,20460.0
1,1005,19680.0
2,1005,19470.0
3,1005,10380.0
4,1005,26340.0
...,...,...
1802,1001,14640.0
1803,1001,28980.0
1804,1001,2220.0
1805,1003,1560.0


In [21]:
# verify the current online store state
features = get_online_features()
pd.DataFrame.from_dict(features)

Unnamed: 0,driver_id,avg_daily_trips
0,1001,966
1,1002,314
2,1003,506
3,1004,256
4,1005,387


In [22]:
pprint(type(features["avg_daily_trips"][0]))
# the value and type are still the same as in the old data

<class 'int'>


### 2. Get the offline features (date=today) that need to be replaced

In [16]:
from datetime import datetime
entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003, 1004, 1005],
        "event_timestamp": [
            datetime.now(),
            datetime.now(),
            datetime.now(),
            datetime.now(),
            datetime.now(),
        ],
    }
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

training_df["created"] = datetime.now()
training_df

Unnamed: 0,driver_id,event_timestamp,conv_rate,acc_rate,avg_daily_trips,created
720,1002,2022-04-21 14:35:42.282081+00:00,0.651089,0.426752,9420.0,2022-04-21 14:35:42.523725
1081,1003,2022-04-21 14:35:42.282081+00:00,0.975326,0.21654,15180.0,2022-04-21 14:35:42.523725
359,1001,2022-04-21 14:35:42.282078+00:00,0.349679,0.015979,28980.0,2022-04-21 14:35:42.523725
1444,1004,2022-04-21 14:35:42.282082+00:00,0.277619,0.206873,7680.0,2022-04-21 14:35:42.523725
1805,1005,2022-04-21 14:35:42.282082+00:00,0.598443,0.863169,11610.0,2022-04-21 14:35:42.523725
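
The entity dataframe above calls `datetime.now()` once per row, so the timestamps differ by a few microseconds, as the output shows. A minimal alternative sketch that uses a single timezone-aware timestamp for every row follows. The helper name `build_entity_df` is hypothetical, and it is an assumption that identical UTC timestamps suit this use case.

```python
from datetime import datetime, timezone

import pandas as pd


def build_entity_df(driver_ids, as_of=None):
    """Build an entity dataframe in which every row shares one UTC timestamp."""
    as_of = as_of or datetime.now(timezone.utc)
    ids = list(driver_ids)
    return pd.DataFrame(
        {
            "driver_id": ids,
            "event_timestamp": [as_of] * len(ids),
        }
    )


entity_df = build_entity_df([1001, 1002, 1003, 1004, 1005])
```

Passing an explicit `as_of` makes the lookup reproducible, which can help when comparing runs.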


### 3. Load the dataframe (step 2) into the online feature store with the write_to_online_store method

In [17]:
store.write_to_online_store("driver_hourly_stats", training_df)

In [23]:
# verify the current online store state
features = get_online_features()
pd.DataFrame.from_dict(features)

Unnamed: 0,driver_id,avg_daily_trips
0,1001,28980
1,1002,9420
2,1003,15180
3,1004,7680
4,1005,11610


In [24]:
# the values have changed to match the new offline store. This also works for type changes
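
One way to check the result programmatically is to compare the online read with the values that were written. The helper below is a sketch, not part of the notebook: the name `find_stale_entities` is hypothetical, and it assumes the online result has the dictionary shape returned by `get_online_features()` above. The sample values are copied from the tables in this notebook.

```python
def find_stale_entities(online, expected, key="driver_id", feature="avg_daily_trips"):
    """Return the entity keys whose online value differs from the expected value."""
    online_by_key = dict(zip(online[key], online[feature]))
    return [k for k, v in expected.items() if online_by_key.get(k) != v]


# Values written in step 3, taken from the training_df output above.
expected = {1001: 28980.0, 1002: 9420.0, 1003: 15180.0, 1004: 7680.0, 1005: 11610.0}

# Online reads before and after the write, taken from the outputs above.
before = {
    "driver_id": [1001, 1002, 1003, 1004, 1005],
    "avg_daily_trips": [966, 314, 506, 256, 387],
}
after = {
    "driver_id": [1001, 1002, 1003, 1004, 1005],
    "avg_daily_trips": [28980, 9420, 15180, 7680, 11610],
}

stale_before = find_stale_entities(before, expected)
stale_after = find_stale_entities(after, expected)
```

An empty list means every entity in the online store matches the new offline value. To confirm the type as well, `type(features["avg_daily_trips"][0])` can be inspected as in the earlier cell.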