# Setup environment

Before executing the next cell, run the following command in a terminal:

`$ source /workspace/FeatureStore/install-venv.sh`

Then change the kernel to "Py3.7 (Feast)" and run the cells below in order.

In [2]:
!pygmentize /workspace/FeatureStore/install-venv.sh

#!/bin/bash
# to influence current shell execute this script with source
# $ source install-venv.sh


# yes | jupyter kernelspec remove

conda init --all
conda deactivate
conda create -n feast-conda-env python=3.7
conda env list
conda activate feast-conda-env
yes | conda install pip ipykernel
python -m ipykernel install --user --name feast-conda-env --display-name "Py3.7 (Feast)"
jupyter kernelspec list
conda env list
ls /opt/conda/envs/feast-conda-env

# pip uninstall luigi
pip install feast[gcp] Pygments -U


# Create a feature repository
A feature repository consists of:
  * A collection of Python files containing feature declarations.
  * A feature_store.yaml file containing infrastructural configuration.
  * A .feastignore file containing paths in the feature repository to ignore.

Typically, users store their feature repositories in a Git repository, especially when working in teams. However, using Git is not a requirement.

## Initialize repo

In [2]:
!./run-venv.sh feast-conda-env \
 feast init feature_repo


Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage

Creating a new Feast repository in /workspace/FeatureStore/1.getting-started/feature_repo.



In [19]:
%cd feature_repo
!tree -a

/workspace/FeatureStore/1.getting-started/feature_repo
.
├── data
│   └── driver_stats.parquet
├── example.py
└── feature_store.yaml

1 directory, 3 files


In [18]:
%cd 1.getting-started/

/workspace/FeatureStore/1.getting-started


## Generated files
### 1. The **feature_store.yaml** configuration file:
The configuration for a feature store is stored in a file named feature_store.yaml, which must be located at the root of a feature repository. An example feature_store.yaml file is shown below:


In [12]:
!pygmentize feature_store.yaml

project: feature_repo
registry: data/registry.db
provider: local
online_store:
    path: data/online_store.db


**provider** defines where the raw data lives (used to generate training data and feature values for serving) and which online store the feature values are materialized into (for serving).
* local: use file source / SQLite
* gcp: use BigQuery / Google Cloud Datastore
* aws: use Redshift / DynamoDB

### 2. The **.feastignore** file:
Feast recursively executes every Python file under the repository folder, so create this file to keep Feast from executing files that are irrelevant to it. The file lists the paths that should be ignored when running `feast apply`. An example .feastignore is shown below:

```
# .feastignore
# Ignore virtual environment
venv

# Ignore a specific Python file
scripts/foo.py

# Ignore all Python files directly under scripts directory
scripts/*.py

# Ignore all "foo.py" anywhere under scripts directory
scripts/**/foo.py
```
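
To get a feel for which files these patterns select, the sketch below applies them with `pathlib.Path.glob` to a throwaway directory tree. It only approximates the behaviour described above and is not Feast's own code; the helper names (`ignored_files`, `discovered_files`) and the sample file layout are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def ignored_files(repo_root, patterns):
    """Collect the Python files selected by glob-style ignore patterns."""
    ignored = set()
    for pattern in patterns:
        for match in repo_root.glob(pattern):
            if match.is_file():
                ignored.add(match.resolve())
            else:
                # A matched directory ignores every Python file beneath it.
                ignored |= {p.resolve() for p in match.glob("**/*.py") if p.is_file()}
    return ignored


def discovered_files(repo_root, patterns):
    """Return repo-relative Python files that are not ignored, sorted."""
    skip = ignored_files(repo_root, patterns)
    root = repo_root.resolve()
    return sorted(
        p.resolve().relative_to(root).as_posix()
        for p in repo_root.glob("**/*.py")
        if p.is_file() and p.resolve() not in skip
    )


# The four patterns from the example .feastignore above.
patterns = ["venv", "scripts/foo.py", "scripts/*.py", "scripts/**/foo.py"]

with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    # Hypothetical repository layout, created only for this demonstration.
    for name in ["example.py", "venv/lib/site.py", "scripts/foo.py",
                 "scripts/bar.py", "scripts/sub/foo.py", "scripts/sub/keep.py"]:
        path = repo / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    remaining = discovered_files(repo, patterns)

print(remaining)
```

In this layout only `example.py` and `scripts/sub/keep.py` survive: `scripts/*.py` covers files directly under `scripts`, while `scripts/**/foo.py` reaches `foo.py` at any depth.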

### 3. **Feature definitions**:
A feature repository can also contain one or more Python files with feature definitions. An example feature definition file is shown below:

Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. On their ["Concepts" page](https://docs.feast.dev/getting-started/concepts), the Feast team recommends having a single feature store and a single project per environment (dev, staging, prod).
![](https://gblobscdn.gitbook.com/assets%2F-LqPPgcuCulk4PnaI4Ob%2F-MaKKN_g2YldHEVf-XmU%2F-MaKTnYQG9kFhl8yPNTM%2Fimage.png?alt=media&token=1db54ccd-cb92-4239-b4a9-2db77d4ff626)

In [29]:
!pygmentize -O full,style=zenburn,linenos=1 example.py

0001: # This is an example feature definition file
0002: 
0003: from google.protobuf.duration_pb2 import Duration
0004: 
0005: from feast import Entity, Feature, FeatureView, FileSource, ValueType
0006: 
0007: # Read data from parquet files. Parquet is convenient for local development mode. For
0008: # production, you can use your favorite DWH, such as BigQuery. See Feast documentation
0009: # for more info.
0010: driver_hourly_stats = FileSource(
0011:     path="/workspace/FeatureStore/1.getting-started/feature_repo/data/driver_stats.parquet",
0012:     event_timestamp_column="event_timestamp",
0013:     created_timestamp_column="

#### Feature View
A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment.

Lines 23 to 35 of example.py show how to define a FeatureView.

Feature views are used during:
* The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
* Loading of feature values into an online store. Feature views determine the storage schema in the online store.
* Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.

> Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form.

#### Feature
A feature is an individual measurable property observed on an entity. For example, a feature of a customer entity could be the number of transactions they make in an average month.
Features are defined as part of feature views. Since Feast does not transform data, a feature is essentially a schema that only contains a name and a type.

The features are declared in lines 27 to 30 of example.py.

Together with [data sources](https://docs.feast.dev/getting-started/concepts/data-model-and-concepts/data-source), they indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using [feature references](https://docs.feast.dev/getting-started/concepts/data-model-and-concepts/feature-retrieval#feature-references).
Feature names must be unique within a [feature view](https://docs.feast.dev/getting-started/concepts/data-model-and-concepts/feature-view#feature-view).
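
As a rough illustration of "a name and a type" and of the uniqueness rule, the sketch below models a feature view's schema as plain `(name, type)` pairs and builds feature references in the `<feature_view>:<feature>` form. It deliberately avoids the Feast API: the view name `driver_hourly_stats`, the type labels, and both helper functions are assumptions for illustration, while the feature names come from the columns of driver_stats.parquet.

```python
from collections import Counter

# Hypothetical stand-in for a feature view's schema: (name, type) pairs.
feature_view_name = "driver_hourly_stats"
features = [
    ("conv_rate", "FLOAT"),
    ("acc_rate", "FLOAT"),
    ("avg_daily_trips", "INT64"),
]


def duplicate_names(schema):
    """Return feature names that appear more than once in a schema."""
    counts = Counter(name for name, _ in schema)
    return sorted(name for name, n in counts.items() if n > 1)


def feature_refs(view_name, schema):
    """Build '<feature_view>:<feature>' references for every feature."""
    return [f"{view_name}:{name}" for name, _ in schema]


# Names must be unique within one feature view.
assert duplicate_names(features) == []

refs = feature_refs(feature_view_name, features)
print(refs)
```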


#### Entity
An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view.

An **entity key** consists of one or more entity values that together uniquely identify a feature view record.

![](https://gblobscdn.gitbook.com/assets%2F-LqPPgcuCulk4PnaI4Ob%2F-MaKa97WKyl0myJs-uLy%2F-MaKdRWc81UMJLidewa6%2Fimage.png?alt=media&token=615cc748-1a26-4643-a92f-669a821d6141)

Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during [point-in-time joins](https://docs.feast.dev/v/v0.6-branch/user-guide/feature-retrieval#point-in-time-correct-join).

Line 0018 of example.py shows how to create an Entity.
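
The point-in-time join mentioned above can be sketched in plain Python: for each entity key and timestamp, pick the newest feature row that is not later than that timestamp and not older than a time-to-live. This is a conceptual sketch rather than Feast's implementation; the function name, the one-day `ttl`, and the tuple layout are assumptions, and the sample values are taken from driver_stats.parquet.

```python
from datetime import datetime, timedelta, timezone


def point_in_time_join(entity_rows, feature_rows, ttl):
    """Attach to each (key, timestamp) the newest feature value that is
    not later than the timestamp and not older than `ttl`."""
    joined = []
    for key, ts in entity_rows:
        best = None
        for f_key, f_ts, value in feature_rows:
            # Skip other entities, future values, and values past their ttl.
            if f_key != key or f_ts > ts or ts - f_ts > ttl:
                continue
            if best is None or f_ts > best[0]:
                best = (f_ts, value)
        joined.append((key, ts, best[1] if best else None))
    return joined


def utc(*args):
    """Shorthand for a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# (driver_id, event_timestamp, conv_rate)
feature_rows = [
    (1005, utc(2021, 8, 20, 4), 0.108550),
    (1005, utc(2021, 8, 20, 5), 0.725994),
    (1005, utc(2021, 8, 20, 6), 0.877615),
    (1001, utc(2021, 9, 4, 2), 0.706308),
    (1001, utc(2021, 9, 4, 3), 0.895251),
]

entity_rows = [
    (1005, utc(2021, 8, 20, 5, 30)),  # between two observations
    (1005, utc(2021, 8, 20, 3)),      # before any observation
    (1005, utc(2021, 8, 22, 12)),     # newest observation is older than ttl
    (1001, utc(2021, 9, 4, 3)),       # exact timestamp match
]

joined = point_in_time_join(entity_rows, feature_rows, ttl=timedelta(days=1))
for key, ts, conv_rate in joined:
    print(key, ts.isoformat(), conv_rate)
```

Rows with no admissible feature value come back as `None`, which is what prevents future data from leaking into a training dataset.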

In [21]:
import pandas as pd
table = pd.read_parquet("data/driver_stats.parquet")
table

| | event_timestamp | driver_id | conv_rate | acc_rate | avg_daily_trips | created |
|---|---|---|---|---|---|---|
| 0 | 2021-08-20 04:00:00+00:00 | 1005 | 0.108550 | 0.280793 | 906 | 2021-09-04 04:26:19.018 |
| 1 | 2021-08-20 05:00:00+00:00 | 1005 | 0.725994 | 0.095828 | 817 | 2021-09-04 04:26:19.018 |
| 2 | 2021-08-20 06:00:00+00:00 | 1005 | 0.877615 | 0.904514 | 580 | 2021-09-04 04:26:19.018 |
| 3 | 2021-08-20 07:00:00+00:00 | 1005 | 0.457302 | 0.888655 | 903 | 2021-09-04 04:26:19.018 |
| 4 | 2021-08-20 08:00:00+00:00 | 1005 | 0.953363 | 0.517090 | 856 | 2021-09-04 04:26:19.018 |
| ... | ... | ... | ... | ... | ... | ... |
| 1802 | 2021-09-04 02:00:00+00:00 | 1001 | 0.706308 | 0.748441 | 379 | 2021-09-04 04:26:19.018 |
| 1803 | 2021-09-04 03:00:00+00:00 | 1001 | 0.895251 | 0.315381 | 826 | 2021-09-04 04:26:19.018 |
| 1804 | 2021-04-12 07:00:00+00:00 | 1001 | 0.678634 | 0.704390 | 956 | 2021-09-04 04:26:19.018 |
| 1805 | 2021-08-27 16:00:00+00:00 | 1003 | 0.953687 | 0.409890 | 213 | 2021-09-04 04:26:19.018 |


#### Data Source
The data source **refers to raw underlying data (e.g. a table in BigQuery)**.
Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.

![](https://gblobscdn.gitbook.com/assets%2F-LqPPgcuCulk4PnaI4Ob%2F-MaKKN_g2YldHEVf-XmU%2F-MaKSgD8bNlCMB-D9YZ2%2Fimage.png?alt=media&token=833eebaa-c16a-42d6-8286-516057f5d540)
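
To make the time-series model more concrete, here is a small sketch of what materializing into an online store amounts to conceptually: for each entity key, keep only the newest row whose event timestamp falls inside the requested window, then serve lookups by key. It is an assumption-laden illustration, not Feast's code; the function names, the dictionary-based "online store", and the inclusive window bounds are all hypothetical, while the rows are taken from driver_stats.parquet.

```python
from datetime import datetime, timezone


def materialize(rows, start, end):
    """Keep, per entity key, the newest row with start <= event_ts <= end."""
    online = {}
    for key, event_ts, features in rows:
        if not (start <= event_ts <= end):
            continue
        current = online.get(key)
        if current is None or event_ts > current[0]:
            online[key] = (event_ts, features)
    return online


def read_features(online, key, names):
    """Look up selected feature values for one entity key."""
    _, features = online[key]
    return {name: features[name] for name in names}


def utc(*args):
    """Shorthand for a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# (driver_id, event_timestamp, feature values)
rows = [
    (1005, utc(2021, 8, 20, 7), {"conv_rate": 0.457302, "avg_daily_trips": 903}),
    (1005, utc(2021, 8, 20, 8), {"conv_rate": 0.953363, "avg_daily_trips": 856}),
    (1001, utc(2021, 9, 4, 3), {"conv_rate": 0.895251, "avg_daily_trips": 826}),
    (1001, utc(2021, 4, 12, 7), {"conv_rate": 0.678634, "avg_daily_trips": 956}),
    (1003, utc(2021, 8, 27, 16), {"conv_rate": 0.953687, "avg_daily_trips": 213}),
]

online_store = materialize(rows, start=utc(2021, 8, 1), end=utc(2021, 9, 5))
print(read_features(online_store, 1005, ["conv_rate", "avg_daily_trips"]))
```

The April row for driver 1001 falls outside the window and is skipped, so the online store ends up holding one row per driver: the latest value inside the window.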