# Run Feast and PSQL

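This notebook deploys a PostgreSQL (PSQL) database on OpenShift, creates a sample Feast repository that uses it as registry, online store and offline store, installs the Python feature server with Helm, and validates the setup with a few integration tests.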

## Prerequisites

* `Red Hat OpenShift pipelines` operator installed
* The `admin` role granted to the ServiceAccount of the current notebook, for example:
```console
oc adm policy add-cluster-role-to-user -z jupyter-nb-dmartino-40redhat-2ecom -n rhods-notebooks admin
```
(this is needed to run `oc` commands from the notebook)

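The ServiceAccount name in the example looks like a username with every non-alphanumeric character replaced by `-` plus its hexadecimal code (`@` becomes `-40`, `.` becomes `-2e`). The sketch below applies that rule. Both the rule and the `jupyter-nb-` prefix are assumptions inferred from this single example, and `notebook_service_account` is a hypothetical helper; listing the accounts with `oc get serviceaccounts -n rhods-notebooks` remains the reliable way to find the actual name.

```python
def notebook_service_account(username: str) -> str:
    """Guess the notebook ServiceAccount name from a username (assumed rule)."""
    escaped = "".join(
        # Keep ASCII letters and digits, escape anything else as -<hex code>
        ch if ch.isascii() and ch.isalnum() else f"-{ord(ch):02x}"
        for ch in username.lower()
    )
    return f"jupyter-nb-{escaped}"


print(notebook_service_account("dmartino@redhat.com"))
```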
## Install requirements

In [9]:
!cat requirements.txt
!echo '-------------'
!pip install -r requirements.txt

feast==0.36.0
psycopg2>=2.9
-------------


## Create PSQL DB

Create the PSQL DB in the **TARGET_NS** namespace by running the following command to instantiate the application from the template:

In [2]:
# Update it to use a different namespace
%env TARGET_NS=feast

env: TARGET_NS=feast


In [3]:
from IPython.display import Markdown as md
import os

ns = os.environ.get('TARGET_NS')
md(f'''
**Note**: the namespace `{ns}` must already exist. If it does not, run the following from a CLI logged in to the OpenShift cluster:
```console
oc create ns {ns}
```
''')



**Note**: the namespace `feast` must already exist. If it does not, run the following from a CLI logged in to the OpenShift cluster:
```console
oc create ns feast
```


In [7]:
!oc process -n openshift postgresql-persistent \
DATABASE_SERVICE_NAME=postgresql POSTGRESQL_USER=feast POSTGRESQL_PASSWORD=feast \
POSTGRESQL_DATABASE=feast VOLUME_CAPACITY=20Gi MEMORY_LIMIT=1Gi | oc apply -f - -n ${TARGET_NS} 

secret/postgresql configured
service/postgresql configured
persistentvolumeclaim/postgresql created
deploymentconfig.apps.openshift.io/postgresql configured


Wait until the DB is running:

In [5]:
!oc wait pod -l deploymentconfig=postgresql -n ${TARGET_NS} --for=condition=Ready=true --timeout=5m

pod/postgresql-3-fsnbl condition met


## Create sample feast repo

In [6]:
!feast init -m sample_repo

Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage

Creating a new Feast repository in /opt/app-root/src/feast-workshop-team-share/feast_showcase_notebook/sample_repo.



Update the default repository to use the local deployment of PSQL DB:

In [7]:
!sed "s/_NAMESPACE_/$TARGET_NS/" templates/feature_store.yaml  > sample_repo/feature_repo/feature_store.yaml
!cat sample_repo/feature_repo/feature_store.yaml

project: feast_postgres
provider: local
registry:
    registry_store_type: PostgreSQLRegistryStore
    path: feast_registry
    host: postgresql.feast.svc.cluster.local
    port: 5432
    database: feast
    db_schema: feast
    user: feast
    password: feast
online_store:
    type: postgres
    host: postgresql.feast.svc.cluster.local
    port: 5432
    database: feast
    db_schema: feast
    user: feast
    password: feast
offline_store:
    type: postgres
    host: postgresql.feast.svc.cluster.local
    port: 5432
    database: feast
    db_schema: feast
    user: feast
    password: feast
entity_key_serialization_version: 2

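The `sed` call above replaces the `_NAMESPACE_` placeholder of the template with the value of `TARGET_NS`. A minimal Python sketch of the same substitution follows. The template excerpt is a hypothetical stand-in for `templates/feature_store.yaml`, whose full content is not shown in this notebook, and `render_feature_store` is an illustrative name.

```python
import os

# Hypothetical excerpt of templates/feature_store.yaml: only the placeholder
# convention (_NAMESPACE_) is taken from the sed command above.
TEMPLATE = """\
online_store:
    type: postgres
    host: postgresql._NAMESPACE_.svc.cluster.local
    port: 5432
"""


def render_feature_store(template: str, namespace: str) -> str:
    """Replace the first placeholder on each line, like sed without the g flag."""
    return "\n".join(
        line.replace("_NAMESPACE_", namespace, 1) for line in template.split("\n")
    )


rendered = render_feature_store(TEMPLATE, os.environ.get("TARGET_NS", "feast"))
print(rendered)
```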


Update the entity definition from the template

In [8]:
!cp templates/example_repo.py sample_repo/feature_repo

## Validate DB state

Define the connection parameters and the helper functions used to query the DB and read the list of tables

In [None]:
import pandas as pd
import psycopg2
from sqlalchemy import create_engine

psqlHost = 'postgresql.feast.svc.cluster.local'
psqlPort = 5432
psqlUsername = 'feast'
psqlPassword = 'feast'
psqlDb = 'feast'
psqlSchema = 'feast'

In [None]:
# Executes a generic SQL query and returns the result as a Pandas DataFrame
def fetchToDF(sql_query):
    from sqlalchemy import text

    engine = create_engine(f'postgresql+psycopg2://{psqlUsername}:{psqlPassword}@{psqlHost}:{psqlPort}/{psqlDb}')

    # Wrap the raw SQL in text() (required by SQLAlchemy 2.x) and
    # fetch the rows while the connection is still open
    with engine.connect() as conn:
        result = conn.execute(text(sql_query))
        return pd.DataFrame(result.fetchall(), columns=list(result.keys()))

In [None]:
# Executes a generic SQL command that returns no rows (e.g. DDL)
def executeSql(sql_command):
    from sqlalchemy import text

    engine = create_engine(f'postgresql+psycopg2://{psqlUsername}:{psqlPassword}@{psqlHost}:{psqlPort}/{psqlDb}')

    # engine.begin() commits the transaction when the block exits without errors
    with engine.begin() as conn:
        conn.execute(text(sql_command))

In [None]:
# Reads the names of the tables in the Feast schema
def readTables():
    return fetchToDF(f"SELECT table_name FROM information_schema.tables WHERE table_schema = '{psqlSchema}';")

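The helper functions above interpolate the credentials directly into the connection URL. That works for the `feast`/`feast` pair used here, but a password containing characters such as `@` or `/` would produce an invalid URL. A possible sketch that percent-encodes the credentials first is shown below; `build_psql_url` is a hypothetical helper and the password in the example is made up.

```python
from urllib.parse import quote_plus


def build_psql_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a SQLAlchemy URL, percent-encoding the credentials."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical password containing characters that are reserved in URLs
print(build_psql_url("feast", "p@ss/word", "postgresql.feast.svc.cluster.local", 5432, "feast"))
```

The returned string can be passed to `create_engine` in place of the f-string used in `fetchToDF` and `executeSql`.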

Invoke it and verify there are no tables

In [None]:
df = readTables()
assert len(df) == 0
print('No tables found, as expected')

### Populate offline data from sample parquet

Use `create_driver_hourly_stats_df` to create sample data and push to the data source table `feast_driver_hourly_stats`

In [None]:
from feast.file_utils import replace_str_in_file
from feast.infra.utils.postgres.connection_utils import df_to_postgres_table
from feast.infra.utils.postgres.postgres_config import PostgreSQLConfig
from feast.driver_test_data import create_driver_hourly_stats_df
from datetime import datetime, timedelta

import psycopg2
config_file = "sample_repo/feature_repo/feature_store.yaml"

end_date = datetime.now().replace(microsecond=0, second=0, minute=0)
start_date = end_date - timedelta(days=15)

driver_entities = [1001, 1002, 1003, 1004, 1005]
driver_df = create_driver_hourly_stats_df(driver_entities, start_date, end_date)

tableName = 'feast_driver_hourly_stats'
executeSql(f'DROP TABLE IF EXISTS {tableName}')

df_to_postgres_table(
    config=PostgreSQLConfig(
        host=psqlHost,
        port=psqlPort,
        database=psqlDb,
        db_schema=psqlSchema,
        user=psqlUsername,
        password=psqlPassword,
    ),
    df=driver_df,
    table_name=tableName
)
print(f'Bootstrap completed, added {len(driver_df)} rows to {tableName}')


In [None]:
df=fetchToDF('select * from feast_driver_hourly_stats')
df.head()

In [None]:
assert len(df) == 1807
print(f'Found {len(df)} items')

## Feature store deployment

Now apply the Feast repository and then validate it has the new tables

In [None]:
!feast -c sample_repo/feature_repo apply

In [None]:
df = readTables()
expected = 4
assert len(df) == expected
print(f'Found {expected} tables, as expected: {",".join(df["table_name"])}')


### Feast state

Verify Feast resources using `feast` CLI

In [None]:
!feast -c sample_repo/feature_repo entities list
!feast -c sample_repo/feature_repo feature-views list
!feast -c sample_repo/feature_repo feature-services list

## Install Feature server

Install the [Python feature server](https://docs.feast.dev/reference/feature-servers/python-feature-server)

### Install Helm

In [None]:
!curl https://get.helm.sh/helm-v3.14.3-linux-amd64.tar.gz --output helm.tar.gz
!gunzip -f helm.tar.gz 
!tar xvf helm.tar 
!mv ./linux-amd64/helm .
!./helm version

### Install from chart

In [None]:
!./helm repo add feast-charts https://feast-helm-charts.storage.googleapis.com
!./helm repo update

In [None]:
import base64
import os

file_path = 'sample_repo/feature_repo/feature_store.yaml'
with open(file_path, 'rb') as file:
    file_content = file.read()

base64_encoded = base64.b64encode(file_content)
os.environ['FEATURE_STORE_YAML_BASE64'] = base64_encoded.decode('utf-8')

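The chart receives the whole `feature_store.yaml` as a single Base64 string. As a quick sanity check, the sketch below encodes a small sample and decodes it back; the two-line YAML is a hypothetical stand-in for the real file.

```python
import base64

# Hypothetical two-line stand-in for the content of feature_store.yaml
sample_yaml = b"project: feast_postgres\nprovider: local\n"

encoded = base64.b64encode(sample_yaml).decode("utf-8")
decoded = base64.b64decode(encoded)

# b64encode emits no line breaks or spaces, so the value stays a single token
assert not any(ch.isspace() for ch in encoded)
# Decoding restores the original bytes exactly
assert decoded == sample_yaml
print(encoded)
```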
In [None]:
!echo $FEATURE_STORE_YAML_BASE64

!./helm upgrade --install -n $TARGET_NS feast-release feast-charts/feast-feature-server --set image.tag=0.36.0 --set feature_store_yaml_base64=$FEATURE_STORE_YAML_BASE64

Patch the deployment to silence the Feast usage statistics, which otherwise raise a noisy warning (`Certificate did not match expected hostname: usage.feast.dev`)

In [None]:
!oc patch deployment/feast-release-feast-feature-server -n ${TARGET_NS} --type=json --patch '[{"op": "add", "path": "/spec/template/spec/containers/0/env/-", "value": {"name": "FEAST_USAGE", "value": "False"}}]'

Wait until the server is running

In [None]:
!oc wait pod -l app.kubernetes.io/instance=feast-release -n ${TARGET_NS} --for=condition=Ready=true --timeout=5m

## Integration test

Run the use cases and validate them using either the `Feature server` or the [Python SDK](https://rtd.feast.dev/en/master/)

### Fetch offline data

#### From server

No API available on the server for offline data

#### From Python SDK

Validate historical features using the data source populated in a [previous step](#Populate-offline-data-from-sample-parquet)

In [None]:
from feast import FeatureStore
from datetime import datetime, timedelta

def fetchHistoricalDataForTest():
    end_date = datetime.now().replace(microsecond=0, second=0, minute=0)
    start_date = end_date - timedelta(days=14)
    test_ts = start_date.replace(hour=6)
    entity_df = pd.DataFrame.from_dict(
        {
            "driver_id": [1001, 1002, 1003],
            "event_timestamp": [
                test_ts,
                test_ts,
                test_ts,
            ],
            "label_driver_reported_satisfaction": [1, 5, 3],
            "val_to_add": [1, 2, 3],
            "val_to_add_2": [10, 20, 30],
        }
    )

    store = FeatureStore(repo_path="sample_repo/feature_repo")

    test_df = store.get_historical_features(
        entity_df=entity_df,
        features=[
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:acc_rate",
            "driver_hourly_stats:avg_daily_trips",
            "transformed_conv_rate:conv_rate_plus_val1",
            "transformed_conv_rate:conv_rate_plus_val2",
        ],
    ).to_df()
    return test_df

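The entity timestamps only make sense if they fall inside the range populated earlier, that is the 15 days before the current hour. The sketch below repeats the two date computations with an injectable clock and checks that the test timestamp (06:00, 14 days back) stays inside that range whatever the hour at which the notebook runs. The function names are illustrative and not part of the notebook.

```python
from datetime import datetime, timedelta


def offline_window(now: datetime, days: int = 15):
    """Range of the sample data, computed as in the populate step."""
    end_date = now.replace(microsecond=0, second=0, minute=0)
    return end_date - timedelta(days=days), end_date


def historical_test_ts(now: datetime) -> datetime:
    """Timestamp queried by fetchHistoricalDataForTest: 06:00, 14 days back."""
    end_date = now.replace(microsecond=0, second=0, minute=0)
    return (end_date - timedelta(days=14)).replace(hour=6)


# Check every hour of an arbitrary day
base = datetime(2024, 1, 15)
for hour in range(24):
    now = base + timedelta(hours=hour, minutes=37)
    start_date, end_date = offline_window(now)
    assert start_date <= historical_test_ts(now) < end_date
print("test timestamp is inside the populated window for every hour of the day")
```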

In [None]:
test_df = fetchHistoricalDataForTest()
test_df.head()

In [None]:
assert len(test_df) == 3

### Fetch online data

#### From server

In [None]:
import requests

def fetchOnlineFeaturesFromServer():
    payload = {
        "features": [
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:acc_rate",
            "driver_hourly_stats:avg_daily_trips"
        ],
        "entities": {
            "driver_id": [1001, 1002, 1003]
        }
    }

    url = "http://feast-feature-server.feast.svc.cluster.local/get-online-features"

    return requests.post(url, json=payload)

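The URL above hardcodes the `feast` namespace, while the rest of the notebook reads it from `TARGET_NS`. A small sketch that derives the URL from the environment instead is shown below. `feature_server_url` is a hypothetical helper, and the service name is simply copied from the URLs used in this notebook.

```python
import os


def feature_server_url(endpoint: str, namespace: str = "") -> str:
    """Build the in-cluster URL of a feature server endpoint."""
    # Fall back to TARGET_NS, then to the default namespace of this notebook
    ns = namespace or os.environ.get("TARGET_NS", "feast")
    # Service name copied from the URLs used in this notebook
    return f"http://feast-feature-server.{ns}.svc.cluster.local/{endpoint.lstrip('/')}"


print(feature_server_url("get-online-features", namespace="feast"))
```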
In [None]:
online = fetchOnlineFeaturesFromServer()

In [None]:
assert online.status_code == 200
try:
    import json

    json_data = json.loads(online.text)
    # print(json.dumps(json_data, indent=2))
    assert "metadata" in json_data
    # Validate metadata
    assert "feature_names" in json_data["metadata"]
    featureNames = json_data["metadata"]["feature_names"]
    assert len(featureNames) == 4
    assert "driver_id" in featureNames
    assert "conv_rate" in featureNames
    assert "acc_rate" in featureNames
    assert "avg_daily_trips" in featureNames
    # Validate data: all NOT_FOUND
    for index, feature in enumerate(featureNames[1:]):
        statuses = json_data["results"][index + 1]["statuses"]
        # print(f'Statuses of {feature}/{index} are {statuses}')
        assert "PRESENT" not in statuses
        assert "NOT_FOUND" in statuses
    print(f'No online data found for all queried features {featureNames}')
except ImportError:
    print(online.text)

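The validation above walks the response by position: `results[i]` holds the statuses of `feature_names[i]`. A possible helper that makes this pairing explicit is sketched below. `statuses_by_feature` is a hypothetical name, and the sample response is made up, reproducing only the keys asserted above. Note that the check with `all(...)` is stricter than the `in` / `not in` assertions of the cell above.

```python
def statuses_by_feature(json_data: dict) -> dict:
    """Map each feature name to its list of statuses."""
    names = json_data["metadata"]["feature_names"]
    results = json_data["results"]
    return {name: results[i]["statuses"] for i, name in enumerate(names)}


# Hypothetical response with the shape asserted above (values are made up)
sample = {
    "metadata": {"feature_names": ["driver_id", "conv_rate"]},
    "results": [
        {"statuses": ["PRESENT", "PRESENT"]},
        {"statuses": ["NOT_FOUND", "NOT_FOUND"]},
    ],
}

statuses = statuses_by_feature(sample)
assert all(s == "NOT_FOUND" for s in statuses["conv_rate"])
print(statuses)
```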
#### From DB

### Materialize

#### Materialize from server

In [None]:
import requests

end_date = datetime.now().replace(microsecond=0, second=0, minute=0)
start_date = end_date - timedelta(days=15)
start_date = start_date.replace(hour=0)

payload = {
    "start_ts": str(start_date),
    "end_ts": str(datetime.now())
}

url = "http://feast-feature-server.feast.svc.cluster.local/materialize"

response = requests.post(url, json=payload)

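The request body above is built from two separate clock readings. The sketch below builds the same payload from a single, injectable `now`, which makes the window reproducible: it starts at midnight 15 days before the current hour and ends at `now`. `materialize_payload` is a hypothetical helper, and the timestamps are formatted with `str()` exactly as in the cell above.

```python
from datetime import datetime, timedelta


def materialize_payload(now: datetime, days: int = 15) -> dict:
    """Build the /materialize request body from a single clock reading."""
    end_hour = now.replace(microsecond=0, second=0, minute=0)
    # Start at midnight of the first day so the whole populated range is covered
    start = (end_hour - timedelta(days=days)).replace(hour=0)
    return {"start_ts": str(start), "end_ts": str(now)}


print(materialize_payload(datetime(2024, 1, 16, 10, 30)))
```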
In [None]:
assert response.status_code == 200

#### Validate from server

In [None]:
online = fetchOnlineFeaturesFromServer()

In [None]:
assert online.status_code == 200
try:
    import json

    json_data = json.loads(online.text)
    # print(json.dumps(json_data, indent=2))
    assert "metadata" in json_data
    # Validate metadata
    assert "feature_names" in json_data["metadata"]
    featureNames = json_data["metadata"]["feature_names"]
    assert len(featureNames) == 4
    assert "driver_id" in featureNames
    assert "conv_rate" in featureNames
    assert "acc_rate" in featureNames
    assert "avg_daily_trips" in featureNames
    # Validate data: all PRESENT
    for index, feature in enumerate(featureNames[1:]):
        statuses = json_data["results"][index + 1]["statuses"]
        # print(f'Statuses of {feature}/{index} are {statuses}')
        assert "PRESENT" in statuses
        assert "NOT_FOUND" not in statuses
    print(f'All online data is present for all queried features {featureNames}')
except ImportError:
    print(online.text)

#### Validate from DB

**TODO** Run some queries such as `select feature_name, event_ts from feast_postgres_driver_hourly_stats;` and validate the row counts.
Do the same for the `feast_postgres_driver_hourly_stats_fresh` table.

### Push Data

Push sample data for a new `driver_id=3001` using the Feature server

In [None]:
import requests
import json

push_ts = datetime(2021, 5, 13, 10, 59, 42)
event_dict = {
    "driver_id": [3001],
    "event_timestamp": [str(push_ts)],
    "created": [str(push_ts)],
    "conv_rate": [1.0],
    "acc_rate": [1.0],
    "avg_daily_trips": [1000],
}
# "string_feature": "test2",
push_data = {
    "push_source_name":"driver_stats_push_source",
    "df":event_dict,
    "to":"online",
}

# Note: push not implemented ATM for PSQL offline store

url = "http://feast-feature-server.feast.svc.cluster.local/push"
response = requests.post(
    url,
    data=json.dumps(push_data))

In [None]:
assert response.status_code == 200

#### Validate from Feature server

In [None]:
import requests

def fetchPushedOnlineFeaturesFromServer():
    payload = {
        "features": [
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:acc_rate",
            "driver_hourly_stats:avg_daily_trips"
        ],
        "entities": {
            "driver_id": [3001]
        }
    }

    url = "http://feast-feature-server.feast.svc.cluster.local/get-online-features"

    return requests.post(url, json=payload)

In [None]:
pushed = fetchPushedOnlineFeaturesFromServer()

**TODO** Specify to use the `push_ts` for the query


In [None]:
assert pushed.status_code == 200
try:
    import json

    json_data = json.loads(pushed.text)
    # print(json.dumps(json_data, indent=2))
    assert "metadata" in json_data
    # Validate metadata
    assert "feature_names" in json_data["metadata"]
    featureNames = json_data["metadata"]["feature_names"]
    assert len(featureNames) == 4
    assert "driver_id" in featureNames
    assert "conv_rate" in featureNames
    assert "acc_rate" in featureNames
    assert "avg_daily_trips" in featureNames
    # Validate data: all PRESENT
    for index, feature in enumerate(featureNames[1:]):
        statuses = json_data["results"][index + 1]["statuses"]
        # print(f'Statuses of {feature}/{index} are {statuses}')
        assert "PRESENT" in statuses
        assert "NOT_FOUND" not in statuses
    print(f'All pushed online data is present for all queried features {featureNames}')
except ImportError:
    print(pushed.text)

## Tear down

Tear down deployed feature store infrastructure

In [None]:
!feast -c sample_repo/feature_repo teardown

Remove local project

In [None]:
!rm -rf sample_repo

Uninstall `Feature server`

In [None]:
!./helm uninstall -n $TARGET_NS feast-release

Uninstall PSQL DB

In [None]:
!oc process -n openshift postgresql-persistent \
DATABASE_SERVICE_NAME=postgresql POSTGRESQL_USER=feast POSTGRESQL_PASSWORD=feast \
POSTGRESQL_DATABASE=feast | oc delete -f - -n ${TARGET_NS} 