## About this notebook
@author: Yingding Wang

This notebook shows how to save and load a feature data set with Feast, connected to an online PostgreSQL database.
* feast: https://feast.dev/
* feast-postgres: https://github.com/nossrannug/feast-postgres
* postgres on dockerhub: https://hub.docker.com/_/postgres


In [None]:
# %%capture 
# discard the pip output

import sys, os
!{sys.executable} -m pip install feast feast-postgres python-dotenv pandas

## (optional) create an .env file 

Uncomment the following cell to create an .env file. Change the parameter values to match your settings before running the cell.

Note:
Remove the surrounding triple quotes (the block comment) so that `%%writefile .env` is the first line of the cell.

In [None]:
'''
%%writefile .env
# environment variables for online feature_store
ON_FS_HOST="POSTGRES_HOST_NAME"
ON_FS_DB="featurestore"
ON_FS_PORT="5432"
ON_FS_USER="postgres_name"
ON_FS_PW="postgres_pw"
'''

In [None]:
from dotenv import load_dotenv
# load all key-value pairs from ./.env into environment variables
load_dotenv()

'''
print(f"\
{os.environ['ON_FS_HOST']}\n\
{os.environ['ON_FS_DB']}\n\
{os.environ['ON_FS_PORT']}\n\
{os.environ['ON_FS_USER']}\n\
{os.environ['ON_FS_PW']}\n\
")
'''
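
Before the values are used in later cells, it can help to check that every expected variable is actually set. The following is a minimal sketch, assuming the five variable names from the `.env` cell above; the helper `missing_env_vars` is a hypothetical name introduced here and is not part of python-dotenv or Feast.

```python
import os

# Variable names taken from the .env cell above
REQUIRED_VARS = ["ON_FS_HOST", "ON_FS_DB", "ON_FS_PORT", "ON_FS_USER", "ON_FS_PW"]


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All online store settings are present.")
```

Reporting all missing names at once avoids fixing them one `KeyError` at a time in the cells that read `os.environ[...]` directly.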

## Init a feature store local repo

In [None]:
feature_repo="/home/jovyan/feature_repo"
config_file=f"{feature_repo}/feature_store.yaml"
# print(f"{config_file}")

In [None]:
# init local repo
!feast init $feature_repo

## Update the feature store configuration file
* create custom "%%writetemplate" magic command: https://stackoverflow.com/questions/26385041/is-it-possible-to-write-the-value-of-a-variable-in-a-writefile-magic-command-i/63784887#63784887
* use filename variable in "%%writefile" magic command: https://github.com/ipython/ipython/issues/6701#issue-45873574

In [None]:
# import feast_postgres

In [None]:
from IPython.core.magic import register_line_cell_magic

# create a custom template magic command %%writetemplate
# https://github.com/ipython/ipython/issues/6701#issuecomment-382640776
@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
        
        
# writetemplate fills its placeholders from globals(), so the environment variables must be assigned to global variables
ON_FS_HOST=os.environ['ON_FS_HOST']
ON_FS_PORT=os.environ['ON_FS_PORT']
ON_FS_DB=os.environ['ON_FS_DB']
ON_FS_USER=os.environ['ON_FS_USER']
ON_FS_PW=os.environ['ON_FS_PW']

In [None]:
%%writetemplate $config_file
project: feature_repo
registry: data/registry.db
provider: local
online_store:
    type: feast_postgres.PostgreSQLOnlineStore # MUST be this value
    host: {ON_FS_HOST}
    port: {ON_FS_PORT}   # Optional, default is 5432
    database: {ON_FS_DB} # postgres is the default postgres db
    db_schema: feature_store      # Optional, default is None      
    user: {ON_FS_USER}
    password: {ON_FS_PW}    
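
The `%%writetemplate` magic defined above relies on Python's `str.format`, so every `{NAME}` in the cell is replaced by the global variable of that name. The sketch below shows the same substitution on a small string, using hypothetical values in a plain dictionary instead of `globals()`. It also shows one caveat: literal braces in the template have to be doubled.

```python
# Hypothetical values, for illustration only
settings = {"ON_FS_HOST": "localhost", "ON_FS_PORT": "5432"}

# Placeholders are replaced by the matching keys
template = "host: {ON_FS_HOST}\nport: {ON_FS_PORT}\n"
rendered = template.format(**settings)
print(rendered)

# Literal braces must be doubled, otherwise str.format treats them as fields
escaped = "tags: {{}}\nhost: {ON_FS_HOST}\n".format(**settings)
print(escaped)
```

A placeholder without a matching name raises a `KeyError`, which is why the environment variables are copied into global variables before the template cell runs.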

Examples of changing the working directory in Jupyter Notebook (not needed in this example):

* get current work directory: `CUR_DIR=os.getcwd()`
* Default work directory: `WORK_DIR="/home/jovyan/"`
* change the current work directory: `os.chdir(feature_repo)`
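
If a directory change is needed only for a few statements, the calls listed above can be wrapped so that the previous directory is always restored. This is a minimal sketch; `working_directory` is a hypothetical helper name, and a temporary directory stands in for the real `feature_repo` path.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the working directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


# Demonstration with a temporary directory instead of the real feature_repo
before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside_ok = os.path.samefile(os.getcwd(), tmp)
after = os.getcwd()
print(inside_ok, before == after)
```

The `finally` clause restores the original directory even if the wrapped code raises an exception.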

In [None]:
# apply the feature definitions; the notebook's working directory remains unchanged
!cd $feature_repo && feast apply

## Deploy features to the online feature store

https://www.mikulskibartosz.name/adding-datasets-to-feast-feature-store/

In [None]:
proteome_olink_data_path="/home/jovyan/data/Proteome_Olink_data.csv"

In [None]:
import pandas
from datetime import datetime, timezone
df = pandas.read_csv(proteome_olink_data_path)
# df.reset_index(level=0, inplace=True) # turn the index into a column

# datetime(year, month, day, hour, min, sec).timestamp() returns utc timestamp in secs as float, cast it to int()
# ts = datetime(2021, 11, 22, 20, 0, 0).replace(tzinfo=timezone.utc).timestamp()
# ts_rounded = int(ts)

# must be a timezone-aware datetime; an int timestamp is not accepted
df['observation_dt'] = datetime(2021, 11, 22, 20, 30, 0).replace(tzinfo=timezone.utc)
df.head(5)
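
The timestamp handling in the cell above can be checked in isolation with the standard library. The sketch below builds the same UTC datetime in two ways and prints its ISO 8601 form and its integer Unix timestamp. It is only an illustration of timezone-aware datetimes and does not use pandas or Feast.

```python
from datetime import datetime, timezone

# Two equivalent ways to build a timezone-aware UTC datetime
aware = datetime(2021, 11, 22, 20, 30, 0, tzinfo=timezone.utc)
same = datetime(2021, 11, 22, 20, 30, 0).replace(tzinfo=timezone.utc)

print(aware == same)           # both describe the same instant
print(aware.isoformat())       # ISO 8601 string including the UTC offset
print(int(aware.timestamp()))  # whole seconds since the Unix epoch
```

Attaching `timezone.utc` explicitly matters because `timestamp()` on a naive datetime interprets the value in the machine's local time zone.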

In [None]:
df.describe()

In [None]:
# save the dataframe to local parquet file
df.to_parquet("/home/jovyan/feature_repo/data/proteome_olink.parquet")

## Define new features in Feast repository

The code flow does three things:

* It defines the feature source location. In this case, a path to the local file system. Note that the FileSource also requires the column containing the event timestamp.
* The Entity object describes which column contains the entity identifier. In our example, the value is useless and has no business meaning, but we still need it.
* Finally, we define the FeatureView, which combines the available column names (and types) with the entity identifier and the data location. Because the features are later materialized to the online store, the online parameter is set to True.

Note:\
Since Feast 0.11, we can skip the features parameter in FeatureView, and the library will infer the column names and types from the data.

In [None]:
%%writefile $feature_repo/proteome_olink.py
# this file defines the features for Proteome_Olink_data.csv
from datetime import timedelta
from google.protobuf.duration_pb2 import Duration
from feast import Entity, Feature, FeatureView, ValueType, FileSource
#from feast.data_source import FileSource

proteome_olink_observations = FileSource(
    path="/home/jovyan/feature_repo/data/proteome_olink.parquet",
    event_timestamp_column="observation_dt",
)

proteome_olink = Entity(name="OlinkID", value_type=ValueType.STRING, description="olink identifier",)

proteome_olink_observations_view = FeatureView(
    name="proteome_olink_observations",
    entities=["OlinkID"],
    ttl=timedelta(days=-1),
#    features=[
#        Feature(name="UniPort", dtype=ValueType.STRING),
#        Feature(name="sepal_width", dtype=ValueType.FLOAT),
#        Feature(name="petal_length", dtype=ValueType.INT64),
#        Feature(name="petal_width", dtype=ValueType.INT64),
#        Feature(name="species", dtype=ValueType.STRING),
#    ],
    online=True,
#    input=proteome_olink_observations,
    batch_source=proteome_olink_observations,
    tags={},
)

In [None]:
# reload feature repository
!cd $feature_repo && feast apply

In [None]:
!cd $feature_repo && feast version

## Populate data to online store
https://aws.amazon.com/blogs/opensource/getting-started-with-feast-an-open-source-feature-store-running-on-aws-managed-services/

In [None]:
# populate feature values to the online store, incrementally
!cd $feature_repo && feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
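
The shell substitution `$(date -u +"%Y-%m-%dT%H:%M:%S")` passes the current UTC time as the end timestamp for the materialization. As a minimal sketch, the same string can be built in Python, which may be useful where the `date` command is not available. The variable names are illustrative, and the command is only printed, not executed.

```python
from datetime import datetime, timezone

# Same format as `date -u +"%Y-%m-%dT%H:%M:%S"`
end_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")

# Build the CLI call as a string; it is printed here, not run
command = f"feast materialize-incremental {end_date}"
print(command)
```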

## Advanced feature store with TTL

**What does TTL mean?**\
In the example below, we retrieve the value from the feature store. We must specify the event_timestamp. The ttl describes the maximal time difference between the actual event timestamp and the timestamp we want to get. Of course, it is a difference “in the past.” We can never retrieve events “in the future.”
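
The TTL rule described above can be modeled in a few lines of plain Python. This is a simplified sketch of the idea, not Feast's implementation: the function name `within_ttl` is hypothetical, and treating the boundaries as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone


def within_ttl(event_ts, request_ts, ttl):
    """True if the feature row is not in the future and not older than ttl."""
    age = request_ts - event_ts
    return timedelta(0) <= age <= ttl


event = datetime(2021, 11, 22, 20, 30, tzinfo=timezone.utc)
ttl = timedelta(days=1)

print(within_ttl(event, event + timedelta(hours=2), ttl))  # request 2 hours after the event
print(within_ttl(event, event + timedelta(days=3), ttl))   # request 3 days after the event
print(within_ttl(event, event - timedelta(hours=1), ttl))  # request before the event
```

Only the first request falls inside the window: the second is older than the TTL allows, and the third would require an event from the future.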