<a href="https://colab.research.google.com/github/dmatrix/feast_workshops/blob/master/notebooks/Feast_Tutorial_Module_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Step 1: Install Feast

Install Feast using pip:


In [None]:
%pip install feast



## Step 2: Create a feature repository

A feature repository is a directory that contains the configuration of the feature store and of the individual features. This configuration is written as code (Python/YAML), and it's highly recommended that teams track it centrally using Git. See [Feature Repository](https://docs.feast.dev/reference/feature-repository) for a detailed explanation of feature repositories.

The easiest way to create a new feature repository is to use the `feast init` command:

In [None]:
%%shell
cd /content/
feast init feature_repo


Creating a new Feast repository in /content/feature_repo.





### Inspecting the feature repository

Let's take a look at the repo itself. It breaks down into:


*   `data/` contains the raw parquet data
*   `example.py` contains demo feature definitions
*   `feature_store.yaml` contains a demo configuration that specifies where the data sources are



In [None]:
%cd /content/feature_repo
!ls

/content/feature_repo
data  example.py  feature_store.yaml


## Step 3: Applying and deploying feature definitions

`feast apply` scans Python files in the current directory for feature definitions and deploys infrastructure according to `feature_store.yaml`.

In [None]:
%%shell
feast apply

Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats




## Step 4: Materialize features

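We now materialize these features, that is, load them into the online store, covering everything from the beginning of time up to the current timestamp, to prepare for model training and serving. Note: `materialize-incremental` loads only the feature values that are new since the last `materialize` call.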

In [None]:
%%shell
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME

Materializing 1 feature views to 2021-07-21 21:03:44+00:00 into the sqlite online store.

driver_hourly_stats from 2021-07-21 21:03:30+00:00 to 2021-07-21 21:03:44+00:00:
0it [00:00, ?it/s]0it [00:00, ?it/s]
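
The shell cell above builds its end timestamp with `date -u`. If you would rather build that string in Python (for example, to pass it to the CLI from a script), a minimal sketch could look like the following. The helper name `utc_timestamp` is made up for this example and is not part of Feast.

```python
from datetime import datetime, timezone


def utc_timestamp(now=None):
    """Format a datetime as YYYY-MM-DDTHH:MM:SS in UTC.

    Mirrors the shell command `date -u +"%Y-%m-%dT%H:%M:%S"`.
    `now` should be timezone-aware; a naive datetime would be
    interpreted as local time by `astimezone`.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


# Hypothetical usage: this string plays the role of $CURRENT_TIME above.
print(utc_timestamp())
```

Accepting `now` as a parameter keeps the helper easy to test with a fixed datetime.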




### Inspect materialized features

The `data/` directory now contains `online_store.db` and `registry.db`, which store the materialized features and the schema information, respectively.

In [None]:
%%shell
cd data
ls

driver_stats.parquet  online_store.db  registry.db
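
To peek inside the online store, one option is Python's built-in `sqlite3` module. This sketch assumes `online_store.db` is a regular SQLite file, as the "sqlite online store" message in the materialization output suggests. It only lists table names and makes no assumption about how Feast names or lays out its tables. The helper `list_tables` is hypothetical, not a Feast API.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file, sorted by name."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist,
    # so point this at a database that is already there.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

From the repository root, `list_tables("data/online_store.db")` should return whatever tables the materialization step created.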




## Step 5: Fetch features for offline / online cases

### Read some historical data for model training

We pass in the entities, that is, the unique identifiers of the rows we want features for (here, `driver_id` values), along with the names of the features we want to extract.

Note that we also include a timestamp for each entity row, because a model may need the features for the same driver as of different points in time.

In [None]:
from datetime import datetime

import pandas as pd

from feast import FeatureStore

entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003, 1004],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
            datetime(2021, 4, 12, 15, 1, 12),
        ],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())

----- Feature schema -----

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 5 columns):
 #   Column                                Non-Null Count  Dtype              
---  ------                                --------------  -----              
 0   event_timestamp                       4 non-null      datetime64[ns, UTC]
 1   driver_id                             4 non-null      int64              
 2   driver_hourly_stats__conv_rate        4 non-null      float64            
 3   driver_hourly_stats__acc_rate         4 non-null      float64            
 4   driver_hourly_stats__avg_daily_trips  4 non-null      int64              
dtypes: datetime64[ns, UTC](1), float64(2), int64(2)
memory usage: 192.0 bytes
None

----- Example features -----

            event_timestamp  ...  driver_hourly_stats__avg_daily_trips
0 2021-04-12 08:12:10+00:00  ...                                   827
1 2021-04-12 10:59:42+00:00  ...                              
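
The reason each entity row carries a timestamp is that historical retrieval looks up feature values as they were known at that moment, an idea often called a point-in-time lookup. The following is a simplified, Feast-free sketch of that idea for a single driver and a single feature. The function name and the `conv_rate` values are made up for illustration, and a real feature store also has to handle details such as expiry windows, which this sketch ignores.

```python
from bisect import bisect_right
from datetime import datetime


def latest_value_as_of(history, as_of):
    """Return the most recent value whose timestamp is <= as_of, or None.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)  # number of observations at or before as_of
    return history[idx - 1][1] if idx > 0 else None


# Made-up conv_rate observations for one driver (illustrative values only).
history = [
    (datetime(2021, 4, 12, 8, 0), 0.10),
    (datetime(2021, 4, 12, 10, 0), 0.20),
    (datetime(2021, 4, 12, 16, 0), 0.30),
]

print(latest_value_as_of(history, datetime(2021, 4, 12, 10, 59, 42)))  # 0.2
print(latest_value_as_of(history, datetime(2021, 4, 12, 7, 0)))        # None
```

The second call returns `None` because no observation exists at or before 07:00, which is the behavior that keeps future data from leaking into training rows.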

### Read features at serving time


In [None]:
from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

pprint(feature_vector)

{'driver_hourly_stats__acc_rate': [0.2474302053451538],
 'driver_hourly_stats__avg_daily_trips': [899],
 'driver_hourly_stats__conv_rate': [0.7521973848342896],
 'driver_id': [1001]}
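
The dictionary above maps each column name to a list with one value per requested entity row. If your model code expects one record per entity instead, a small helper like the hypothetical `to_rows` below can reshape it. This is a plain-Python sketch that works on the printed dictionary and does not call Feast.

```python
def to_rows(feature_dict):
    """Turn {"col": [v0, v1, ...], ...} into [{"col": v0, ...}, {"col": v1, ...}]."""
    columns = list(feature_dict)
    return [dict(zip(columns, values)) for values in zip(*feature_dict.values())]


# Copied from the printed output of get_online_features(...).to_dict().
feature_vector = {
    "driver_hourly_stats__acc_rate": [0.2474302053451538],
    "driver_hourly_stats__avg_daily_trips": [899],
    "driver_hourly_stats__conv_rate": [0.7521973848342896],
    "driver_id": [1001],
}

rows = to_rows(feature_vector)
print(rows[0]["driver_id"])  # 1001
```

With several entries in `entity_rows`, `to_rows` would return one dictionary per driver, in the same order as the request.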
