# Feast

## Introduction
>[Feast](https://docs.feast.dev/) (FEAture STore) is a feature store provider that serves online (for real-time prediction) or offline (for model training) features.

It is an operational data system for managing and serving ML features to models in production. 

<p align=center><img src=images/Feast_Diagram.png width=600></p>

As a feature store, Feast provides a single point of access to features. This helps keep the features used for training consistent with those used for serving and reduces the risk of data leakage.

Additionally, the features can be accessed by any team. Thus, Feast reduces friction between teams by providing single-point access and a centralised hub for registering features. 

Furthermore, the features can be reused not only across teams, but also across projects. Therefore, once a project is finished, the features used in it can also be utilised in the next project without requiring re-engineering.

Moreover, it is important to know which features were used to build a specific model. With Feast, users can track features and their corresponding models.

__However,__ as opposed to other feature-store providers, Feast does not offer a feature-engineering service and does not generate statistical reports.


## Installation and Initialisation

Before installing Feast, we highly recommend that you create a virtual environment. Once this is done, proceed with the installation by executing the following command:
```
pip install feast
```
Note that if you intend to use Feast with AWS services or GCP to create online features, you also need to install the AWS or GCP dependencies.
```
pip install 'feast[aws]'
```
```
pip install 'feast[gcp]'
```

__In this lesson, we will explore feature stores that run on the local machine. For more information about feature stores hosted on AWS or GCP, follow the examples provided by the Feast team.__

### Creating a feature repository
Next, we create a feature repository. This repository will contain the configuration of the feature store as well as the features. 

Feast simplifies the creation process of a feature repository. Further, it generates files detailing the procedures for running Feast on your infrastructure and files containing the feature definitions.

Go to the directory where you intend to create the feature repository, and run
```
feast init
```
This creates a new folder, named after a random animal (for example, `steady_platypus`). The folder contains example feature definitions for carrying out experiments, a file named `feature_store.yaml`, and a `data` folder.

<p align=center><img src=images/Feast_Directory.png width=200></p>


In the directory shown above, `just_grouper` is an example of a feature store that stores information about the features in AWS. In our case, `steady_platypus` is an example of a feature store that keeps its information on the local machine and that can be tracked using Git.

Next, we change directories to `steady_platypus` and initialise a Git repository:
```
cd steady_platypus
git init
```

## Feature Repository

Feast is used to manage two important sets of configuration:

- Configuration for running Feast on your infrastructure (`feature_store.yaml`).
- Feature definitions (all the remaining Python files).

The above configuration can be written declaratively and stored as code in a central location. This central location is called a feature repository.

### Structure
The structure of a feature repository is as follows:

- The root of the repository should contain a `feature_store.yaml` file and, optionally, a `.feastignore` file.
- The repository should contain Python files that contain feature definitions. 
- The repository can contain other files as well, including documentation and, potentially, data files.
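
As a rough illustration of the layout above, the following sketch checks whether a directory looks like a feature repository. The helper name `describe_feature_repo` is hypothetical and is not part of Feast; it only inspects file names and does not validate their contents.

```python
from pathlib import Path


def describe_feature_repo(repo_path):
    """Summarise a directory against the layout described above.

    Hypothetical helper for illustration; it is not part of Feast.
    """
    root = Path(repo_path)
    return {
        # Required: configuration for running Feast on your infrastructure.
        "has_config": (root / "feature_store.yaml").is_file(),
        # Optional: patterns for files that Feast should ignore.
        "has_feastignore": (root / ".feastignore").is_file(),
        # Python files are where the feature definitions live.
        "python_files": sorted(p.name for p in root.rglob("*.py")),
    }
```

For instance, calling `describe_feature_repo("steady_platypus")` from the parent directory should report that the configuration file is present and list `example.py` among the Python files.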

### Applying definitions 
In our example, we can apply the definitions included in `example.py` to Feast by running
```
feast apply
```

Explore `example.py` to see how a feature is defined. It uses different classes within the `feast` module to define the different types of features. The comments provided by the Feast team are very useful for understanding how to define a feature. 

`example.py` reads `data/driver_stats.parquet`. To view its contents, read the file with pandas or install a Parquet viewer extension for your favourite editor. This is what the file looks like:

<p align=center><img src=images/Feast_parquet.png width=600></p>

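You might encounter the following syntax error when running `feast apply`:
```
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
```
This typically happens on Windows, where the file path in `example.py` contains backslashes and Python reads `\U` as the start of a Unicode escape sequence. As a solution, go to `example.py` and change the backslashes (`\`) in the path to forward slashes (`/`).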
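
The sketch below reproduces this kind of error in isolation and shows two spellings that avoid it. The paths are made up for illustration, and the raw-string variant is an alternative to the forward-slash fix rather than something the Feast examples prescribe.

```python
from pathlib import PureWindowsPath

# Hypothetical source line with a Windows path in a normal string literal.
# "\U" starts an 8-digit Unicode escape, so Python cannot parse the literal.
broken_source = 'path = "C:\\Users"'

try:
    compile(broken_source, "<example>", "exec")
    error_message = None
except SyntaxError as err:
    error_message = str(err)

print(error_message)

# Two spellings that avoid the problem (the path itself is made up).
forward_slashes = "C:/Users/me/steady_platypus/data/driver_stats.parquet"
raw_string = r"C:\Users\me\steady_platypus\data\driver_stats.parquet"

# Both spellings refer to the same Windows location.
print(PureWindowsPath(forward_slashes) == PureWindowsPath(raw_string))
```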

Afterwards, run `feast apply`, and the following message should appear:
```
Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats
```
Additionally, you will notice that there are two more files in the `data` folder: `online_store.db` and `registry.db`. These files are used by Feast to store the features and their metadata. Track these files using Git:
```
git add .
git commit -m "Add feature store and feature definitions"
```

You can change the registered definitions by modifying `example.py` and then running `feast apply` again. For example, in `example.py`, you can add a feature (here, `created`) to the list:
```
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
        Feature(name="created", dtype=ValueType.TIMESTAMP),
    ],
```

After running `feast apply`, you can verify the changes by running `git status`. You should see two files that have been modified:
```
        modified:   data/registry.db
        modified:   example.py
```

## Build a Training Dataset

Once the feature store has been deployed, a training dataset can be built. Users will need to provide a list of features and a list of entities. Thereafter, Feast will generate a training dataset that contains those features and entities.

First, define the feature references, written as `<feature_view>:<feature>`, for the features that you wish to retrieve from the offline store. These features can come from multiple feature views.

```
features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate"
    ]
```
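
To make the structure of these references concrete, the sketch below splits each one into its feature view and feature name and groups the names by view. The helper `split_feature_ref` is hypothetical; it is not how Feast parses references internally.

```python
def split_feature_ref(ref):
    """Split a "<feature_view>:<feature>" reference into its two parts.

    Hypothetical helper for illustration; it is not part of Feast.
    """
    view_name, sep, feature_name = ref.partition(":")
    if not sep or not view_name or not feature_name:
        raise ValueError(f"Expected '<feature_view>:<feature>', got {ref!r}")
    return view_name, feature_name


features = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
]

# Group the requested feature names by the view that provides them.
by_view = {}
for ref in features:
    view_name, feature_name = split_feature_ref(ref)
    by_view.setdefault(view_name, []).append(feature_name)

print(by_view)  # {'driver_hourly_stats': ['conv_rate', 'acc_rate']}
```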

Additionally, an entity dataframe is required, which is the target dataframe onto which you will join feature values. The entity dataframe must contain a timestamp column called `event_timestamp` and all entities (primary keys) necessary for joining the feature views. All entities found in feature views that are being joined to the entity dataframe must appear as columns in the entity dataframe.

It is possible to provide entity dataframes as either a Pandas dataframe or an SQL query. In this case, we will use Pandas.
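
To see why the entity dataframe needs both a key and a timestamp, consider the conceptual sketch below. For each entity row, it picks the most recent feature value recorded at or before that row's `event_timestamp`. This is a simplified illustration with made-up data, written with the standard library only; it is not Feast's implementation.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Made-up feature history: driver_id -> list of (timestamp, conv_rate), sorted by time.
feature_history = {
    1001: [
        (datetime(2021, 8, 20, 10, tzinfo=timezone.utc), 0.50),
        (datetime(2021, 8, 22, 10, tzinfo=timezone.utc), 0.75),
    ],
    1002: [
        (datetime(2021, 8, 21, 10, tzinfo=timezone.utc), 0.25),
    ],
}

# Entity rows: a primary key plus the time at which features are wanted.
entity_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 8, 21, tzinfo=timezone.utc)},
    {"driver_id": 1002, "event_timestamp": datetime(2021, 8, 20, tzinfo=timezone.utc)},
]


def latest_value_at(history, when):
    """Return the most recent value recorded at or before `when`, else None."""
    timestamps = [ts for ts, _ in history]
    index = bisect_right(timestamps, when)
    return history[index - 1][1] if index else None


joined = [
    {**row, "conv_rate": latest_value_at(feature_history[row["driver_id"]], row["event_timestamp"])}
    for row in entity_rows
]

for row in joined:
    print(row["driver_id"], row["conv_rate"])
```

In this toy data, driver 1001 receives the value recorded on 20 August, because the later value did not exist yet at the requested time. Driver 1002 receives no value, because nothing had been recorded by then.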

```
from feast import FeatureStore
import pandas as pd

# Entity dataframe: one row per driver, with the time at which features are wanted.
entity_df = pd.DataFrame(
    {
        "event_timestamp": [pd.Timestamp.now(tz="UTC"), pd.Timestamp.now(tz="UTC")],
        "driver_id": [1001, 1002],
    }
)

fs = FeatureStore(repo_path="steady_platypus")

# Join the requested features onto the entity dataframe.
training_df = fs.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_df=entity_df,
).to_df()
```

## Loading Data to an Online Store

Feast allows users to load their feature data to an online store to serve the latest features to models for online prediction.

This is accomplished using the `feast materialize` command. The syntax is as follows:
```
feast materialize <initial_date> <final_date>
```

In our case, we will materialise data from the past week, that is, from 19 August to 26 August 2021 (the day of writing). To do so, run the following:

```
feast materialize 2021-08-19T00:00:00 2021-08-26T00:00:00
```
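
If you prefer not to type the dates by hand, the window can be computed. The sketch below only builds the command string; the helper `materialize_command` is hypothetical, and the timestamp format is copied from the command above.

```python
from datetime import datetime, timedelta


def materialize_command(end, days=7):
    """Build a `feast materialize` command covering `days` days up to `end`.

    Hypothetical helper for illustration; it is not part of Feast.
    """
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%S"
    return f"feast materialize {start.strftime(fmt)} {end.strftime(fmt)}"


# Reproduces the command above for the date of writing.
print(materialize_command(datetime(2021, 8, 26)))
# feast materialize 2021-08-19T00:00:00 2021-08-26T00:00:00
```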

Now, `online_store.db` should contain some data, which can be used to make real-time predictions.

```
from feast import FeatureStore

fs = FeatureStore(repo_path="./steady_platypus")

# Read the latest feature values for each driver from the online store.
online_features = fs.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_rows=[
        {"driver_id": 1001},
        {"driver_id": 1002},
    ],
).to_dict()

print(online_features)
```

As you can observe, we did not use the `get_historical_features` method here. Instead, we used `get_online_features`, which reads the latest feature values from the online store. Furthermore, we passed the entities through the `entity_rows` argument, a list of dictionaries, instead of `entity_df`.

When you run the code above, the output should be similar to the following:
```
{'driver_id': [1001, 1002], 'conv_rate': [0.9721019268035889, 0.700430691242218], 'acc_rate': [0.1053711548447609, 0.15271347761154175]}
```
This dictionary contains the most recent feature values materialised for each driver.
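
The dictionary is column-oriented: each key maps to a list with one value per entity. If a model expects one record per entity instead, the values can be regrouped as in the sketch below, which uses the output shown above as input and plain Python only.

```python
# Output copied from the example above.
online_features = {
    "driver_id": [1001, 1002],
    "conv_rate": [0.9721019268035889, 0.700430691242218],
    "acc_rate": [0.1053711548447609, 0.15271347761154175],
}

# Turn the column-oriented dictionary into one dictionary per entity.
columns = list(online_features)
rows = [dict(zip(columns, values)) for values in zip(*online_features.values())]

for row in rows:
    print(row)
```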

## Conclusion
At this point, you should have a good understanding of:

- Feast as a library for defining features and deploying them to a feature store.
- How to use Feast to build a training dataset using historical features.
- How to use Feast to load data into an online store for real-time predictions.