# 2. Feature Stores with Feast

[Feast](https://docs.feast.dev/) (FEAture STore) is a feature store provider that can serve online (for real-time prediction) or offline (for model training) features.

> ## Feast is an operational data system for managing and serving machine learning features to models in production. 

<p align=center><img src=images/Feast_Diagram.png width=600></p>

As a feature store, Feast gives you a single point of access to your features, so there is no data leakage caused by inconsistencies in your data.

Additionally, any team can access the features. Providing that single point of access, together with a centralized hub for registering features, reduces the friction between teams.

On top of that, features can be reused, not only across teams, but also across projects, so once a project is finished, the features used in it can also be used by the next project without engineering them again.

Moreover, it's important to know which features were used to produce a specific model. With Feast, you can track over time which features generated each model.

<span style="font-size:1.4em;">
<em>However</em>,
</span>
as opposed to other feature store providers, Feast doesn't offer a feature engineering service. It also doesn't generate statistical reports (yet).


# Installation and Initialization

Before installing Feast, it is highly recommended to create a virtual environment. Installing Feast is as simple as executing:
```
pip install feast
```
Note that if you are planning on using Feast with AWS or GCP services to create online features, you will also need to install the AWS or GCP dependencies:
```
pip install 'feast[aws]'
```
Or
```
pip install 'feast[gcp]'
```
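To confirm that the install landed in the virtual environment you are actually using, you can query the installed version from Python. This is a minimal sketch that relies only on the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, not part of Feast.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the virtual environment activated.
    v = installed_version("feast")
    print(f"feast {v}" if v else "feast is not installed in this environment")
```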

__Bear in mind that here we are going to look at local feature stores. If you want to know more about feature stores deployed on cloud infrastructure (AWS or GCP), you can follow the examples provided by the Feast team. Here, we are going to show you how feature stores work, so that you can then extrapolate that knowledge to a wider use.__

Now, you can create a feature repository. This repository will contain the configuration of the feature store and the features themselves. 

Feast makes creating a feature repository quite simple: it generates one file describing how to run Feast on your infrastructure and another file containing the feature definitions.

To create the feature repository, go to the directory where you want to create the feature repository, and simply run:
```
feast init
```
It will add some examples to your directory so you can play around, as well as a file named `feature_store.yaml`. The new folder gets a random name (an adjective followed by an animal, such as `steady_platypus`) and contains a `data` folder.

<p align=center><img src=images/Feast_Directory.png width=200></p>


`just_grouper` is another example of a feature store, one whose feature information lives in AWS. In our case, `steady_platypus` is a feature store whose information lives on your local machine and that you can keep track of using Git.

Let's change directories to `steady_platypus` and initialize a Git repository:
```
cd steady_platypus
git init
```

# Feature Repository

With Feast, you manage two important sets of configuration:

- Configuration about how to run Feast on your infrastructure (`feature_store.yaml`)
- Feature definitions (All remaining Python files)

The above configuration can be written declaratively and stored as code in a central location. This central location is called a feature repository.

The structure of a feature repository is as follows:

- The root of the repository should contain a `feature_store.yaml` file and may contain a `.feastignore` file.
- The repository should contain Python files that contain feature definitions. 
- The repository can contain other files as well, including documentation and potentially data files.
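The structure rules above are simple enough to check mechanically. The sketch below is a hypothetical helper (`check_feature_repo` is not part of Feast) that encodes only the two required rules: a `feature_store.yaml` at the root and at least one Python file. It illustrates the rules and is not a substitute for running `feast apply`.

```python
import tempfile
from pathlib import Path
from typing import List


def check_feature_repo(root: str) -> List[str]:
    """Return the structural problems found in `root`; an empty list means none.

    Hypothetical helper illustrating the repository rules; not part of Feast.
    """
    repo = Path(root)
    problems = []
    # Rule: feature_store.yaml must sit at the root of the repository.
    if not (repo / "feature_store.yaml").is_file():
        problems.append("missing feature_store.yaml at the repository root")
    # Rule: feature definitions live in Python files inside the repository.
    if not any(repo.rglob("*.py")):
        problems.append("no Python files with feature definitions")
    # .feastignore, docs and data files are optional, so they are not checked here.
    return problems


if __name__ == "__main__":
    # Build a throwaway repository that satisfies both rules.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "feature_store.yaml").write_text("project: demo\n")
        Path(tmp, "example.py").write_text("# feature definitions go here\n")
        print(check_feature_repo(tmp))  # []
```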

Thus, in our example, we can apply the definitions included in `example.py` to Feast by running:
```
feast apply
```

Take a look at `example.py` to see how to define a feature. It uses different classes from the `feast` module (such as `Entity`, `FileSource`, `FeatureView` and `Feature`) to define the entity, the data source and the feature view. The comments provided by the Feast team are very useful for understanding how to define a feature.

`example.py` will open `data/driver_stats.parquet`. If you are curious about its content, you can use pandas to read that file, or add an extension to your favorite editor to open it. This is what the file looks like:

<p align=center><img src=images/Feast_parquet.png width=600></p>

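You might find an error when running `feast apply`:
```
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
```
This typically happens on Windows: the path written in `example.py` contains backslashes (for example `C:\Users\...`), and Python reads `\U` as the start of a unicode escape sequence. To solve it, open `example.py` and change the backslashes (`\`) to forward slashes (`/`), or turn the path into a raw string by prefixing it with `r`.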
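The error is easy to reproduce outside Feast. This small sketch compiles three versions of a made-up Windows path assignment and shows that only the plain backslash version fails, while forward slashes (and, alternatively, a raw string prefixed with `r`) compile fine.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is valid Python, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# In a normal string literal "\U" starts an 8-digit unicode escape, so "C:\Users\..."
# fails with the "truncated \UXXXXXXXX escape" error. The path is hypothetical.
broken = 'path = "C:\\Users\\me\\steady_platypus\\data\\driver_stats.parquet"'
# Fix 1: forward slashes, which Windows also accepts in paths.
forward = 'path = "C:/Users/me/steady_platypus/data/driver_stats.parquet"'
# Fix 2: a raw string, where backslashes are not treated as escapes.
raw = 'path = r"C:\\Users\\me\\steady_platypus\\data\\driver_stats.parquet"'

print(compiles(broken), compiles(forward), compiles(raw))  # False True True
```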

You will see that upon running `feast apply`, the following message shows up:
```
Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats
```
Additionally, you will notice two more files in the `data` folder: `online_store.db` and `registry.db`. Feast uses them to store, respectively, the feature values served online and the metadata (registry) of your feature definitions. It would be a good time to track your files using Git:
```
git add .
git commit -m "Add feature store and feature definitions"
```
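Since `online_store.db` is a SQLite database, you can peek at it with Python's built-in `sqlite3` module. This is a sketch: `list_tables` is a hypothetical helper, the path assumes you run it from the directory that contains `steady_platypus`, and the table names you see are an internal detail of Feast's local online store. It does not apply to `registry.db`, which is not a SQLite file.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_tables(con: sqlite3.Connection) -> List[str]:
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = Path("steady_platypus/data/online_store.db")
    if db.exists():
        # mode=ro opens the file read-only, so this script cannot modify the store.
        con = sqlite3.connect(f"file:{db.as_posix()}?mode=ro", uri=True)
        try:
            print(list_tables(con))
        finally:
            con.close()
    else:
        print(f"{db} not found; run this next to the steady_platypus folder")
```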

You can change the feature definitions by editing `example.py` and then running `feast apply` again. For example, in `example.py`, you can add a feature to the `features` list of the feature view:
```
features=[
    Feature(name="conv_rate", dtype=ValueType.FLOAT),
    Feature(name="acc_rate", dtype=ValueType.FLOAT),
    Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    # New feature; ValueType has no TIMESTAMP member, so UNIX_TIMESTAMP is used
    Feature(name="created", dtype=ValueType.UNIX_TIMESTAMP),
],
```

After running `feast apply`, you can check that something has changed by running `git status`. You should see two modified files:
```
        modified:   data/registry.db
        modified:   example.py
```

# Build a training dataset

Once you have deployed the feature store, you can build a training dataset. You have to provide a list of features and an entity dataframe, and Feast will generate a training dataset that joins those features onto the entities.

First, you can define the feature references for the features that you would like to retrieve from the offline store. These features can come from multiple feature tables.

```
features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate"
    ]
```
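Each reference has the form `feature_view:feature`, which is how features from several feature views can be requested in a single call. To make that format concrete, here is a hypothetical helper (not part of Feast) that splits the references and groups them by feature view; `customer_stats:orders` is a made-up reference used only to show a second view.

```python
from typing import Dict, List


def group_by_view(refs: List[str]) -> Dict[str, List[str]]:
    """Group "feature_view:feature" references by feature view, keeping order."""
    grouped: Dict[str, List[str]] = {}
    for ref in refs:
        view, sep, feature = ref.partition(":")
        if not (sep and view and feature):
            raise ValueError(f"expected 'feature_view:feature', got {ref!r}")
        grouped.setdefault(view, []).append(feature)
    return grouped


print(group_by_view([
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
    "customer_stats:orders",  # hypothetical second feature view
]))
# {'driver_hourly_stats': ['conv_rate', 'acc_rate'], 'customer_stats': ['orders']}
```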

You also have to provide an entity dataframe, which is the target dataframe onto which you would like to join feature values. The entity dataframe must contain a timestamp column called `event_timestamp` and all entities (primary keys) needed to join the feature views onto it. Every entity found in the feature views being joined must be present as a column in the entity dataframe.
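The `event_timestamp` column exists because Feast performs a point-in-time join: for each entity row it picks the most recent feature values recorded at or before that row's timestamp (optionally bounded by the feature view's TTL), so a training example never sees values from its own future. The following is a simplified, pure-Python sketch of those semantics with made-up toy values; it is not Feast's implementation and ignores details such as created-timestamp tie-breaking.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def point_in_time_join(
    entity_rows: List[Dict[str, Any]],
    feature_rows: List[Dict[str, Any]],
    key: str,
    features: List[str],
    ttl: Optional[timedelta] = None,
) -> List[Dict[str, Any]]:
    """Attach to each entity row the latest feature row for the same key whose
    event_timestamp is <= the entity row's timestamp (and within ttl, if given)."""
    joined = []
    for entity in entity_rows:
        ts = entity["event_timestamp"]
        candidates = [
            row for row in feature_rows
            if row[key] == entity[key]
            and row["event_timestamp"] <= ts  # never look into the future
            and (ttl is None or ts - row["event_timestamp"] <= ttl)
        ]
        latest = max(candidates, key=lambda row: row["event_timestamp"], default=None)
        out = dict(entity)
        for name in features:
            out[name] = latest[name] if latest is not None else None
        joined.append(out)
    return joined


t0 = datetime(2021, 8, 20, 10, tzinfo=timezone.utc)
feature_rows = [  # toy offline data
    {"driver_id": 1001, "event_timestamp": t0, "conv_rate": 0.5},
    {"driver_id": 1001, "event_timestamp": t0 + timedelta(hours=2), "conv_rate": 0.9},
]
entity_rows = [
    {"driver_id": 1001, "event_timestamp": t0 + timedelta(hours=1)},
    {"driver_id": 1001, "event_timestamp": t0 + timedelta(hours=3)},
    {"driver_id": 1002, "event_timestamp": t0 + timedelta(hours=3)},
]
for row in point_in_time_join(entity_rows, feature_rows, "driver_id", ["conv_rate"]):
    print(row["driver_id"], row["conv_rate"])
# 1001 0.5   (the 0.9 value lies in this row's future)
# 1001 0.9
# 1002 None  (no feature data for this driver)
```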

It is possible to provide entity dataframes as either a Pandas dataframe or a SQL query, but in this case, we are going to simply use Pandas.

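```python
from feast import FeatureStore
import pandas as pd

# Entity dataframe: one row per (driver_id, event_timestamp) to join features onto.
# pd.Timestamp.now(tz="UTC") gives the current time in UTC; wrapping a naive
# datetime.now() with tz="UTC" would mislabel local time as UTC.
now = pd.Timestamp.now(tz="UTC")
entity_df = pd.DataFrame(
    {
        "event_timestamp": [now, now],
        "driver_id": [1001, 1002],
    }
)

fs = FeatureStore(repo_path="steady_platypus")

# Point-in-time join of the requested features onto entity_df, read from the offline store.
training_df = fs.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_df=entity_df,
).to_df()

print(training_df.head())
```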

# Load data into the online store

Feast allows users to load their feature data into an online store in order to serve the latest features to models for online prediction.

You can send data to the online store by using the `feast materialize` command. The syntax is as follows:
```
feast materialize <initial_date> <final_date>
```
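Conceptually, materialization reads the offline data for the given time range and writes the most recent value of each feature per entity into the online store, overwriting older values. The sketch below models the online store as a plain dictionary with made-up toy rows; it only illustrates that idea, is not how Feast writes to SQLite, and assumes a half-open window `[start, end)`.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def materialize(
    offline_rows: List[Dict[str, Any]],
    online_store: Dict[Any, Dict[str, Any]],
    key: str,
    features: List[str],
    start: datetime,
    end: datetime,
) -> Dict[Any, Dict[str, Any]]:
    """Upsert the latest row per entity key within [start, end) into online_store."""
    for row in offline_rows:
        ts = row["event_timestamp"]
        if not (start <= ts < end):
            continue  # outside the materialization window
        current = online_store.get(row[key])
        # Keep only the newest value for each entity.
        if current is None or ts > current["event_timestamp"]:
            online_store[row[key]] = {"event_timestamp": ts, **{f: row[f] for f in features}}
    return online_store


start = datetime(2021, 8, 19, tzinfo=timezone.utc)
end = datetime(2021, 8, 26, tzinfo=timezone.utc)
rows = [  # toy offline data
    {"driver_id": 1001, "event_timestamp": start + timedelta(days=1), "conv_rate": 0.4},
    {"driver_id": 1001, "event_timestamp": start + timedelta(days=2), "conv_rate": 0.7},
    {"driver_id": 1002, "event_timestamp": end + timedelta(days=1), "conv_rate": 0.1},
]
store = materialize(rows, {}, "driver_id", ["conv_rate"], start, end)
print({k: v["conv_rate"] for k, v in store.items()})  # {1001: 0.7}
```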

In my case, I would like to materialize data from the last week (today is 26 August 2021), so I have to run:

```
feast materialize 2021-08-19T00:00:00 2021-08-26T00:00:00
```
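If you materialize the same one-week window regularly, the two timestamps can be computed instead of typed by hand. A minimal sketch (the helper name `last_week_window` is hypothetical) that reproduces the command above for 26 August 2021:

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def last_week_window(today: Optional[date] = None) -> Tuple[str, str]:
    """Return (start, end) midnight timestamps covering the 7 days before `today`."""
    today = today or date.today()
    start = today - timedelta(days=7)
    return f"{start.isoformat()}T00:00:00", f"{today.isoformat()}T00:00:00"


start, end = last_week_window(date(2021, 8, 26))
print(f"feast materialize {start} {end}")
# feast materialize 2021-08-19T00:00:00 2021-08-26T00:00:00
```

For recurring loads, the Feast CLI also offers `feast materialize-incremental <end_date>`, which continues from where the previous materialization stopped.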

Now, if you open `data/online_store.db`, you will see some data in it. We can use it to make real-time predictions:

```python
from feast import FeatureStore

fs = FeatureStore(repo_path="./steady_platypus")

# Read the latest materialized values for each driver from the online store.
online_features = fs.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_rows=[
        {"driver_id": 1001},
        {"driver_id": 1002},
    ],
).to_dict()

print(online_features)
```

Observe that in this case we are not using the method `get_historical_features`; instead, we are using `get_online_features`, so Feast reads from the online store rather than the offline store. Also, instead of an `entity_df` with timestamps, we pass `entity_rows`, a list of dictionaries holding the entity keys.

When you run the code above, the output should look like:
```
{'driver_id': [1001, 1002], 'conv_rate': [0.9721019268035889, 0.700430691242218], 'acc_rate': [0.1053711548447609, 0.15271347761154175]}
```
which is a dictionary holding the latest materialized feature values for each driver.
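The shape of that output follows from how online serving works: instead of a point-in-time join, each entity key is looked up directly in the online store, and the stored (latest materialized) values come back as one list per column. Below is a toy sketch of that lookup with the online store modeled as a dictionary and made-up values; `get_online` is a hypothetical stand-in, not Feast's API.

```python
from typing import Any, Dict, List


def get_online(
    online_store: Dict[Any, Dict[str, Any]],
    key: str,
    features: List[str],
    entity_rows: List[Dict[str, Any]],
) -> Dict[str, List[Any]]:
    """Look up each entity key and return column-oriented results, like .to_dict()."""
    result: Dict[str, List[Any]] = {key: [], **{f: [] for f in features}}
    for row in entity_rows:
        stored = online_store.get(row[key], {})
        result[key].append(row[key])
        for f in features:
            result[f].append(stored.get(f))  # None if this toy store has no entry
    return result


online_store = {1001: {"conv_rate": 0.7, "acc_rate": 0.2}}  # toy values
print(get_online(online_store, "driver_id", ["conv_rate", "acc_rate"],
                 [{"driver_id": 1001}, {"driver_id": 1002}]))
# {'driver_id': [1001, 1002], 'conv_rate': [0.7, None], 'acc_rate': [0.2, None]}
```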

# Summary

- Feast is a library for defining features and deploying them to a feature store.
- Feast can be used to build a training dataset using historical features.
- Feast can be used to load data into an online store, so you can do real-time predictions.