# A. Set up the environment and install Feast

Before running the next cell, run the following command in a terminal:

`$ source /workspace/FeatureStore/install-venv.sh`

Then change the kernel to "Py3.7 (Feast)" and run the cells below in order.

In [2]:
!pygmentize /workspace/FeatureStore/install-venv.sh

#!/bin/bash
# to influence current shell execute this script with source
# $ source install-venv.sh


# yes | jupyter kernelspec remove

conda init --all
conda deactivate
conda create -n feast-conda-env python=3.7
conda env list
conda activate feast-conda-env
yes | conda install pip ipykernel
python -m ipykernel install --user --name feast-conda-env --display-name "Py3.7 (Feast)"
jupyter kernelspec list
conda env list
ls /opt/conda/envs/feast-conda-env

# pip uninstall luigi
pip install feast[gcp] Pygments -U


# B. Create a feature repository
A feature repository consists of:
  * A collection of Python files containing feature declarations.
  * A feature_store.yaml file containing infrastructural configuration.
  * A .feastignore file containing paths in the feature repository to ignore.

Typically, users store their feature repositories in a Git repository, especially when working in teams. However, using Git is not a requirement.

## Initialize repo

In [2]:
!./run-venv.sh feast-conda-env \
 feast init feature_repo


Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage

Creating a new Feast repository in /workspace/FeatureStore/1.getting-started/feature_repo.



#### feast init result ####

In [19]:
%cd feature_repo
!tree -a

/workspace/FeatureStore/1.getting-started/feature_repo
[01;34m.[00m
├── [01;34mdata[00m
│   └── driver_stats.parquet
├── example.py
└── feature_store.yaml

1 directory, 3 files


## Generated files (Most important)
### 1. The **feature_store.yaml** configuration file:
`feast init` generates example configuration files. Modify them to match your development requirements, then run `feast apply` and, finally, `feast materialize`.

The configuration for a feature store is stored in a file named feature_store.yaml, which must be located at the root of a feature repository. This tutorial follows the [Feast official site](https://docs.feast.dev), and its configuration specifies only an online store. The feature_store.yaml file in the feature_repo folder is shown below:


In [12]:
!pygmentize feature_store.yaml

[94mproject[39;49;00m: feature_repo
[94mregistry[39;49;00m: data/registry.db
[94mprovider[39;49;00m: local
[94monline_store[39;49;00m:
    [94mpath[39;49;00m: data/online_store.db


**provider** defines where the raw data exists (for generating training data & feature values for serving), and where to materialize feature values to in the online store (for serving).
* local: use file source / SQLite
* gcp: use BigQuery / Google Cloud Datastore
* aws: use Redshift / DynamoDB

Below is an example configuration for running Feast on GCP:
```yaml
project: my_feature_repo
registry: gs://my-bucket/data/registry.db
provider: gcp
offline_store:
  type: bigquery
  dataset: feast_bq_dataset
```
<br/>

On AWS, the configuration looks like this. The Tecton team appears to be actively updating the AWS provider.
```yaml
project: my_feature_repo
registry: data/registry.db
provider: aws
offline_store:
  type: redshift
  region: us-west-2
  cluster_id: feast-cluster
  database: feast-database
  user: redshift-user
  s3_staging_location: s3://feast-bucket/redshift
  iam_role: arn:aws:iam::123456789012:role/redshift_s3_access_role
```

### 2. The **.feastignore** file:
Feast executes every Python file under the repository folder recursively, so create this file to keep Feast from executing files that are irrelevant to its operation. The file lists the paths to ignore when running `feast apply`. An example .feastignore is shown below:

```
# .feastignore
# Ignore virtual environment
venv

# Ignore a specific Python file
scripts/foo.py

# Ignore all Python files directly under scripts directory
scripts/*.py

# Ignore all "foo.py" anywhere under scripts directory
scripts/**/foo.py
```
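
As a rough illustration of how such patterns select files, the sketch below expands each pattern with `pathlib.Path.glob` against a throwaway directory. The helper names `ignored_python_files` and `build_demo_repo` and the demo layout are assumptions made for this example; Feast's own matching logic may differ in detail.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def ignored_python_files(repo_root: Path, patterns: List[str]) -> Set[str]:
    """Return repo-relative paths of Python files selected by the ignore patterns."""
    ignored = set()
    for pattern in patterns:
        for match in repo_root.glob(pattern):
            if match.is_dir():
                # A matched directory ignores every Python file below it.
                candidates = list(match.rglob("*.py"))
            else:
                candidates = [match]
            for path in candidates:
                if path.suffix == ".py":
                    ignored.add(path.relative_to(repo_root).as_posix())
    return ignored


def build_demo_repo(root: Path) -> None:
    """Create a small, hypothetical repository layout for the demonstration."""
    for rel in [
        "example.py",
        "venv/lib/site.py",
        "scripts/foo.py",
        "scripts/bar.py",
        "scripts/nested/foo.py",
        "scripts/nested/keep.py",
    ]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("# placeholder\n")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_repo(root)
    result = ignored_python_files(root, ["venv", "scripts/*.py", "scripts/**/foo.py"])
    print(sorted(result))
```

In this demo layout, `example.py` and `scripts/nested/keep.py` are the only Python files that no pattern selects.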

### 3. **Feature definitions** : 
A feature repository can also contain one or more Python files that contain feature definitions. An example feature definition file is shown below:

Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. On their ["Concepts" page](https://docs.feast.dev/getting-started/concepts), the Feast team recommends having a single feature store and a single project per environment (dev, staging, prod).
![](https://gblobscdn.gitbook.com/assets%2F-LqPPgcuCulk4PnaI4Ob%2F-MaKKN_g2YldHEVf-XmU%2F-MaKTnYQG9kFhl8yPNTM%2Fimage.png?alt=media&token=1db54ccd-cb92-4239-b4a9-2db77d4ff626)

In [29]:
!pygmentize -O full,style=zenburn,linenos=1 example.py

0001: # This is an example feature definition file
0002: 
0003: from google.protobuf.duration_pb2 import Duration
0004: 
0005: from feast import Entity, Feature, FeatureView, FileSource, ValueType
0006: 
0007: # Read data from parquet files. Parquet is convenient for local development mode. For
0008: # production, you can use your favorite DWH, such as BigQuery. See Feast documentation
0009: # for more info.
0010: driver_hourly_stats = FileSource(
0011:     path="/workspace/FeatureStore/1.getting-started/feature_repo/data/driver_stats.parquet",
0012:     event_timestamp_column="event_timestamp",
0013:     created_timestamp_column="

#### Feature View ####
A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment.

Lines 0023 to 0035 of example.py show how to define a FeatureView.

Feature views are used during:
* The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views.
* Loading of feature values into an online store. Feature views determine the storage schema in the online store.
* Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.
>Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form.

#### Feature ####
A feature is an individual measurable property observed on an entity. For example, a feature of a customer entity could be the number of transactions they have made on an average month.
Features are defined as part of feature views. Since Feast does not transform data, a feature is essentially a schema that only contains a name and a type:

The features are declared in lines 0027 to 0030 of example.py above.

Together with [data sources](https://docs.feast.dev/getting-started/concepts/data-model-and-concepts/data-source), they indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using [feature references](https://docs.feast.dev/getting-started/concepts/data-model-and-concepts/feature-retrieval#feature-references).
Feature names must be unique within a [feature view](https://docs.feast.dev/getting-started/concepts/data-model-and-concepts/feature-view#feature-view).
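
The idea that a feature is only a name and a type can be sketched in plain Python. `FeatureSchema`, `check_unique_names` and `conforms` below are hypothetical stand-ins written for this illustration, not Feast classes; the feature names and the sample values come from the driver statistics used in this notebook.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class FeatureSchema:
    """Hypothetical stand-in for a feature declaration: a name and a type, nothing else."""
    name: str
    dtype: type


# Feature names from the driver_hourly_stats feature view in this notebook.
DRIVER_FEATURES = [
    FeatureSchema("conv_rate", float),
    FeatureSchema("acc_rate", float),
    FeatureSchema("avg_daily_trips", int),
]


def check_unique_names(features: List[FeatureSchema]) -> None:
    """Raise if two features in the same feature view share a name."""
    names = [f.name for f in features]
    duplicates = {n for n in names if names.count(n) > 1}
    if duplicates:
        raise ValueError(f"duplicate feature names: {sorted(duplicates)}")


def conforms(row: Dict[str, Any], features: List[FeatureSchema]) -> bool:
    """True if the row holds an already computed value of the declared type for every feature."""
    return all(isinstance(row.get(f.name), f.dtype) for f in features)


check_unique_names(DRIVER_FEATURES)
sample_row = {"conv_rate": 0.108550, "acc_rate": 0.280793, "avg_daily_trips": 906}
print(conforms(sample_row, DRIVER_FEATURES))
```

The schema carries no transformation logic: it can only check values that were computed elsewhere, which mirrors the statement that Feast does not generate feature values.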


#### Entity
An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view.

**Entity key** is one or more entity values that uniquely describe a feature view record. A key can therefore consist of multiple entity values. This still needs to be tested, because no officially released tutorial or example demonstrates an entity key with multiple entity values. The three examples that the Feast team published on their official web site, [Driver Ranking Example](https://docs.feast.dev/tutorials/driver-ranking-with-feast#driver-ranking-example), [Fraud detection on GCP](https://docs.feast.dev/tutorials/fraud-detection#fraud-detection-example), and [Real-time Credit Scoring Example](https://docs.feast.dev/tutorials/real-time-credit-scoring-on-aws#real-time-credit-scoring-example), only let you try a single-value entity key.

![](https://gblobscdn.gitbook.com/assets%2F-LqPPgcuCulk4PnaI4Ob%2F-MaKa97WKyl0myJs-uLy%2F-MaKdRWc81UMJLidewa6%2Fimage.png?alt=media&token=615cc748-1a26-4643-a92f-669a821d6141)

Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during [point-in-time joins](https://docs.feast.dev/v/v0.6-branch/user-guide/feature-retrieval#point-in-time-correct-join).

Line 0018 of example.py above shows how to create an Entity.
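
To make the notion of an entity key acting as a primary key more concrete, here is a minimal plain-Python sketch. It is not Feast code: `make_entity_key`, the `records` dictionary, the `customer_id` join key and the `trip_rating` feature are all hypothetical, and the composite case is exactly the part that this notebook has not yet verified against Feast itself.

```python
from typing import Dict, Tuple

EntityKey = Tuple[Tuple[str, int], ...]


def make_entity_key(entity_values: Dict[str, int]) -> EntityKey:
    """Build a hashable key from join-key names and values, independent of argument order."""
    return tuple(sorted(entity_values.items()))


# Hypothetical store: one record per unique entity key.
records: Dict[EntityKey, Dict[str, float]] = {
    # Single-value key, as in the driver_id examples of this notebook.
    make_entity_key({"driver_id": 1001}): {"conv_rate": 0.895251},
    # Composite key made of two entity values (customer_id is invented here).
    make_entity_key({"driver_id": 1001, "customer_id": 7}): {"trip_rating": 4.5},
}

single = records[make_entity_key({"driver_id": 1001})]
composite = records[make_entity_key({"customer_id": 7, "driver_id": 1001})]
print(single, composite)
```

Sorting the key parts means that the lookup does not depend on the order in which the entity values are supplied.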

In [21]:
import pandas as pd
table = pd.read_parquet("data/driver_stats.parquet")
table

| | event_timestamp | driver_id | conv_rate | acc_rate | avg_daily_trips | created |
|---|---|---|---|---|---|---|
| 0 | 2021-08-20 04:00:00+00:00 | 1005 | 0.108550 | 0.280793 | 906 | 2021-09-04 04:26:19.018 |
| 1 | 2021-08-20 05:00:00+00:00 | 1005 | 0.725994 | 0.095828 | 817 | 2021-09-04 04:26:19.018 |
| 2 | 2021-08-20 06:00:00+00:00 | 1005 | 0.877615 | 0.904514 | 580 | 2021-09-04 04:26:19.018 |
| 3 | 2021-08-20 07:00:00+00:00 | 1005 | 0.457302 | 0.888655 | 903 | 2021-09-04 04:26:19.018 |
| 4 | 2021-08-20 08:00:00+00:00 | 1005 | 0.953363 | 0.517090 | 856 | 2021-09-04 04:26:19.018 |
| ... | ... | ... | ... | ... | ... | ... |
| 1802 | 2021-09-04 02:00:00+00:00 | 1001 | 0.706308 | 0.748441 | 379 | 2021-09-04 04:26:19.018 |
| 1803 | 2021-09-04 03:00:00+00:00 | 1001 | 0.895251 | 0.315381 | 826 | 2021-09-04 04:26:19.018 |
| 1804 | 2021-04-12 07:00:00+00:00 | 1001 | 0.678634 | 0.704390 | 956 | 2021-09-04 04:26:19.018 |
| 1805 | 2021-08-27 16:00:00+00:00 | 1003 | 0.953687 | 0.409890 | 213 | 2021-09-04 04:26:19.018 |


#### Data Source
The data source **refers to raw underlying data (e.g. a table in BigQuery)**.
Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.

![](https://gblobscdn.gitbook.com/assets%2F-LqPPgcuCulk4PnaI4Ob%2F-MaKKN_g2YldHEVf-XmU%2F-MaKSgD8bNlCMB-D9YZ2%2Fimage.png?alt=media&token=833eebaa-c16a-42d6-8286-516057f5d540)

# C. Register feature definitions and deploy feature store

In [33]:
!pwd

/workspace/FeatureStore/1.getting-started/feature_repo


In [35]:
!../run-venv.sh feast-conda-env \
 feast apply


Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats


The **feast apply** command scans Python files in the current directory for feature view/entity definitions, registers the objects, and deploys infrastructure. In this example, it reads example.py (shown above) and sets up SQLite online store tables. Note that we specified SQLite as the default online store by using the local provider in feature_store.yaml.

#### feast apply result

In [37]:
!tree -a -c

[01;34m.[00m
├── example.py
├── feature_store.yaml
└── [01;34mdata[00m
    ├── driver_stats.parquet
    ├── online_store.db
    └── registry.db

1 directory, 5 files


Comparing the tree output after `feast init` with the output after `feast apply` shows that `feast apply` generated two *.db files: online_store.db and registry.db. online_store.db is a SQLite database file, whereas registry.db is not, as the check below shows.

#### db file check

In [40]:
import sqlite3
# conn = sqlite3.connect('data/online_store.db')
conn = sqlite3.connect('file:data/online_store.db?mode=ro', uri=True)
curs = conn.cursor()
curs.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()


[('feature_repo_driver_hourly_stats',)]

In [43]:
for row in curs.execute("PRAGMA table_info(feature_repo_driver_hourly_stats)"):
    print(row)

(0, 'entity_key', 'BLOB', 0, None, 1)
(1, 'feature_name', 'TEXT', 0, None, 2)
(2, 'value', 'BLOB', 0, None, 0)
(3, 'event_ts', 'timestamp', 0, None, 0)
(4, 'created_ts', 'timestamp', 0, None, 0)


In [44]:
curs.execute("SELECT * from feature_repo_driver_hourly_stats").fetchall()


[]

In [45]:
import sqlite3
# conn = sqlite3.connect('data/registry.db')
conn = sqlite3.connect('file:data/registry.db?mode=ro', uri=True)
curs = conn.cursor()
curs.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()


DatabaseError: file is not a database
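
The same distinction can be checked without triggering an exception. Every SQLite 3 database file begins with the 16-byte header string `SQLite format 3\0`, so reading those bytes is enough. The sketch below is a standalone illustration: `is_sqlite_file` is a helper written for this example, and the two demo files are created in a temporary directory as stand-ins for the real files.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database file


def is_sqlite_file(path: str) -> bool:
    """Return True if the file starts with the SQLite 3 header string."""
    with open(path, "rb") as fh:
        return fh.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    # A real SQLite file; creating a table forces the header to be written.
    db_path = os.path.join(tmp, "demo_online_store.db")
    demo_conn = sqlite3.connect(db_path)
    demo_conn.execute("CREATE TABLE t (x INTEGER)")
    demo_conn.commit()
    demo_conn.close()

    # An arbitrary binary file with a .db extension (stand-in for a non-SQLite file).
    other_path = os.path.join(tmp, "demo_registry.db")
    with open(other_path, "wb") as fh:
        fh.write(b"\x0a\x10not a sqlite file")

    sqlite_ok = is_sqlite_file(db_path)
    other_ok = is_sqlite_file(other_path)
    print(sqlite_ok, other_ok)
```

In this notebook, the corresponding calls would be `is_sqlite_file("data/online_store.db")` and `is_sqlite_file("data/registry.db")`.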

# D. Generating training data
To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values).
The user can query that table of labels with timestamps and pass that into Feast as an entity dataframe for training data generation. In many cases, Feast will also intelligently join relevant tables to create the relevant feature vectors.
>Note that we include timestamps because we want the features for the same driver at various timestamps to be used in a model.

In [56]:
from datetime import datetime, timedelta
import pandas as pd

from feast import FeatureStore

# The entity dataframe is the dataframe we want to enrich with feature values
entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003],
        "label_driver_reported_satisfaction": [1, 5, 3], 
        "event_timestamp": [
            datetime.now() - timedelta(minutes=11),
            datetime.now() - timedelta(minutes=36),
            datetime.now() - timedelta(minutes=73),
        ],
    }
)
print("\n----- Entity data -----")
display(entity_df)
store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print("----- Feature schema -----")
print(training_df.info(),'\n')

print()
print("----- Example features -----")
display(training_df.head())


----- Entity data -----


| | driver_id | label_driver_reported_satisfaction | event_timestamp |
|---|---|---|---|
| 0 | 1001 | 1 | 2021-09-04 15:19:23.070288 |
| 1 | 1002 | 5 | 2021-09-04 14:54:23.070322 |
| 2 | 1003 | 3 | 2021-09-04 14:17:23.070327 |


----- Feature schema -----
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 6 columns):
 #   Column                              Non-Null Count  Dtype              
---  ------                              --------------  -----              
 0   event_timestamp                     3 non-null      datetime64[ns, UTC]
 1   driver_id                           3 non-null      int64              
 2   label_driver_reported_satisfaction  3 non-null      int64              
 3   conv_rate                           3 non-null      float32            
 4   acc_rate                            3 non-null      float32            
 5   avg_daily_trips                     3 non-null      int32              
dtypes: datetime64[ns, UTC](1), float32(2), int32(1), int64(2)
memory usage: 132.0 bytes
None 


----- Example features -----


| | event_timestamp | driver_id | label_driver_reported_satisfaction | conv_rate | acc_rate | avg_daily_trips |
|---|---|---|---|---|---|---|
| 0 | 2021-09-04 14:17:23.070327+00:00 | 1003 | 3 | 0.829489 | 0.491795 | 375 |
| 1 | 2021-09-04 14:54:23.070322+00:00 | 1002 | 5 | 0.642183 | 0.260732 | 911 |
| 2 | 2021-09-04 15:19:23.070288+00:00 | 1001 | 1 | 0.895251 | 0.315381 | 826 |
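
The join that produces such a training set can be approximated in plain Python. The sketch below is a simplified model, not Feast's implementation: for each entity row it picks the newest feature row with the same key whose timestamp is not later than the entity timestamp and not older than a time-to-live. The function name `point_in_time_join` is an assumption, and the 16:00 feature row is invented to show that later values are never used.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional


def point_in_time_join(
    entity_rows: List[Dict[str, Any]],
    feature_rows: List[Dict[str, Any]],
    join_key: str,
    ttl: timedelta,
) -> List[Dict[str, Any]]:
    """Attach to each entity row the newest feature row that is neither from
    the future nor older than `ttl`, relative to the entity row's timestamp."""
    joined = []
    for entity in entity_rows:
        best: Optional[Dict[str, Any]] = None
        for feat in feature_rows:
            if feat[join_key] != entity[join_key]:
                continue
            age = entity["event_timestamp"] - feat["event_timestamp"]
            if age < timedelta(0) or age > ttl:
                continue  # feature is from the future, or too stale
            if best is None or feat["event_timestamp"] > best["event_timestamp"]:
                best = feat
        row = dict(entity)
        row["conv_rate"] = best["conv_rate"] if best else None
        joined.append(row)
    return joined


features = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 9, 4, 2), "conv_rate": 0.706308},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 9, 4, 3), "conv_rate": 0.895251},
    # Invented row that lies after the entity timestamp and must be skipped.
    {"driver_id": 1001, "event_timestamp": datetime(2021, 9, 4, 16), "conv_rate": 0.5},
]
entities = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 9, 4, 15, 19)},
    {"driver_id": 1002, "event_timestamp": datetime(2021, 9, 4, 14, 54)},
]
training_rows = point_in_time_join(entities, features, "driver_id", timedelta(days=1))
for row in training_rows:
    print(row)
```

Driver 1001 receives the 03:00 value, which matches the `conv_rate` shown for that driver in the output above; driver 1002 has no feature rows in this toy input and gets `None`.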


# E. Load features into the online store
We now serialize the latest values of features since the beginning of time to prepare for serving (note: materialize-incremental serializes all new features since the last materialize call).

#### execute `feast materialize-incremental`

In [126]:
%%bash
export CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
../run-venv.sh  feast-conda-env \
feast materialize-incremental $CURRENT_TIME

Materializing [1m[32m1[0m feature views to [1m[32m2021-09-05 01:33:21+09:00[0m into the [1m[32msqlite[0m online store.

[1m[32mdriver_hourly_stats[0m from [1m[32m2021-09-05 01:21:11+09:00[0m to [1m[32m2021-09-05 01:33:21+09:00[0m:


0it [00:00, ?it/s]


#### check that the online store resides in the SQLite db file

In [121]:
import sqlite3
import pandas as pd
# conn = sqlite3.connect('data/online_store.db')
conn = sqlite3.connect('file:data/online_store.db?mode=ro', uri=True)
curs = conn.cursor()
curs.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()


[('feature_repo_driver_hourly_stats',)]

In [122]:
for row in curs.execute("PRAGMA table_info(feature_repo_driver_hourly_stats)"):
    print(row)

(0, 'entity_key', 'BLOB', 0, None, 1)
(1, 'feature_name', 'TEXT', 0, None, 2)
(2, 'value', 'BLOB', 0, None, 0)
(3, 'event_ts', 'timestamp', 0, None, 0)
(4, 'created_ts', 'timestamp', 0, None, 0)


In [123]:
df = pd.read_sql_query("SELECT * from feature_repo_driver_hourly_stats", con=conn)
# df=df['entity_key'].apply(lambda x: x.decode("utf-8"))
display(df)
df.info()


| | entity_key | feature_name | value | event_ts | created_ts |
|---|---|---|---|---|---|
| 0 | `b'\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x0...` | conv_rate | `b')\x00\x00\x00@\x00a\xe8?'` | 2021-09-04 03:00:00 | 2021-09-04 04:26:19.018000 |
| 1 | `b'\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x0...` | acc_rate | `b')\x00\x00\x00\x00\xd6\xf7\xe3?'` | 2021-09-04 03:00:00 | 2021-09-04 04:26:19.018000 |
| 2 | `b'\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x0...` | avg_daily_trips | `b' \xb9\x01'` | 2021-09-04 03:00:00 | 2021-09-04 04:26:19.018000 |
| 3 | `b'\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x0...` | conv_rate | `b')\x00\x00\x00\xa0\xabp\xeb?'` | 2021-09-04 03:00:00 | 2021-09-04 04:26:19.018000 |
| 4 | `b'\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x0...` | acc_rate | `b')\x00\x00\x00\xc0\xd0\x8b\xe0?'` | 2021-09-04 03:00:00 | 2021-09-04 04:26:19.018000 |
| 5 | `b'\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x0...` | avg_daily_trips | `b' \xdc\x07'` | 2021-09-04 03:00:00 | 2021-09-04 04:26:19.018000 |
| 6 | `b'\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x0...` | conv_rate | `b')\x00\x00\x00\xe0+\x8b\xea?'` | 2021-09-04 03:00:00 | 2021-09-04 04:26:19.018000 |
| 7 | `b'\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x0...` | acc_rate | `b')\x00\x00\x00\xa0\x92y\xdf?'` | 2021-09-04 03:00:00 | 2021-09-04 04:26:19.018000 |
| 8 | `b'\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x0...` | avg_daily_trips | `b' \xf7\x02'` | 2021-09-04 03:00:00 | 2021-09-04 04:26:19.018000 |
| 9 | `b'\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x0...` | conv_rate | `b')\x00\x00\x00 \xc4\x8c\xe4?'` | 2021-09-04 03:00:00 | 2021-09-04 04:26:19.018000 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   entity_key    15 non-null     object
 1   feature_name  15 non-null     object
 2   value         15 non-null     object
 3   event_ts      15 non-null     object
 4   created_ts    15 non-null     object
dtypes: object(5)
memory usage: 728.0+ bytes


In [None]:
conn.close()
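
The `PRAGMA table_info` output marks `entity_key` and `feature_name` as the two primary-key columns, so the table holds one row per entity key and feature. The sketch below rebuilds that layout in an in-memory SQLite database to show how a later write replaces an earlier one. It is only an illustration: the table name `demo_driver_hourly_stats`, the helpers `write_latest` and `read_vector`, and the plain-text key and value encoding are assumptions, whereas Feast stores serialized binary values as seen above.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
# Same column names as the PRAGMA output above; the composite primary key
# allows one stored row per (entity_key, feature_name) pair.
demo_conn.execute(
    """
    CREATE TABLE demo_driver_hourly_stats (
        entity_key BLOB,
        feature_name TEXT,
        value BLOB,
        event_ts timestamp,
        created_ts timestamp,
        PRIMARY KEY (entity_key, feature_name)
    )
    """
)


def write_latest(entity_key: bytes, feature_name: str, value: bytes, event_ts: str) -> None:
    """Overwrite whatever was stored for this key and feature."""
    demo_conn.execute(
        "INSERT OR REPLACE INTO demo_driver_hourly_stats VALUES (?, ?, ?, ?, ?)",
        (entity_key, feature_name, value, event_ts, event_ts),
    )


def read_vector(entity_key: bytes) -> dict:
    """Collect all stored features of one entity key into a dictionary."""
    rows = demo_conn.execute(
        "SELECT feature_name, value FROM demo_driver_hourly_stats WHERE entity_key = ?",
        (entity_key,),
    ).fetchall()
    return {name: value.decode() for name, value in rows}


key = b"driver_id=1001"  # placeholder encoding, not Feast's serialization
write_latest(key, "conv_rate", b"0.706308", "2021-09-04 02:00:00")
write_latest(key, "conv_rate", b"0.895251", "2021-09-04 03:00:00")
write_latest(key, "avg_daily_trips", b"826", "2021-09-04 03:00:00")

vector = read_vector(key)
row_count = demo_conn.execute("SELECT COUNT(*) FROM demo_driver_hourly_stats").fetchone()[0]
print(vector, row_count)
demo_conn.close()
```

Three writes leave two rows, because the second `conv_rate` write replaced the first.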


# F. Fetching feature vectors for inference
At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using get_online_features(). These feature vectors can then be fed to the model.

In [117]:
%%timeit
from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()

pprint(feature_vector)

{'acc_rate': [0.5170673131942749, 0.6240034103393555],
 'avg_daily_trips': [988, 185],
 'conv_rate': [0.8575037121772766, 0.7618409395217896],
 'driver_id': [1004, 1005]}
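
The dictionary returned by `to_dict()` is column-oriented: each key maps to a list with one value per requested entity row. Many models expect one record per entity instead, so a small conversion is often useful. The helper `to_rows` below is written for this illustration and is not part of Feast; the input values are copied from the output above.

```python
from typing import Any, Dict, List


def to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn {'name': [v0, v1, ...]} into [{'name': v0, ...}, {'name': v1, ...}]."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


feature_vector = {
    "acc_rate": [0.5170673131942749, 0.6240034103393555],
    "avg_daily_trips": [988, 185],
    "conv_rate": [0.8575037121772766, 0.7618409395217896],
    "driver_id": [1004, 1005],
}
rows = to_rows(feature_vector)
print(rows[0])
```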

# G. Command Line interface

In [118]:
!../run-venv.sh feast-conda-env \
feast --help

Usage: feast [OPTIONS] COMMAND [ARGS]...

  Feast CLI

  For more information, see our public docs at https://docs.feast.dev/

  For any questions, you can reach us at https://slack.feast.dev/

Options:
  -c, --chdir TEXT  Switch to a different feature repository directory before
                    executing the given subcommand.

  --help            Show this message and exit.

Commands:
  apply                    Create or update a feature store deployment
  entities                 Access entities
  feature-services         Access feature services
  feature-views            Access feature views
  init                     Create a new Feast repository
  materialize              Run a (non-incremental) materialization job to...
  materialize-incremental  Run an incremental materialization job to ingest...
  registry-dump            Print contents of the metadata registry
  teardown                 Tear down deployed feature store infrastructure
  version                  Display Feas

In [109]:
!../run-venv.sh feast-conda-env \
feast entities list

NAME       DESCRIPTION    TYPE
driver_id  driver id      ValueType.INT64


In [113]:
!../run-venv.sh feast-conda-env \
feast entities describe driver_id

spec:
  name: driver_id
  valueType: INT64
  description: driver id
  joinKey: driver_id
meta: {}



In [114]:
!../run-venv.sh feast-conda-env \
feast feature-views list

NAME                 ENTITIES
driver_hourly_stats  ['driver_id']


In [115]:
!../run-venv.sh feast-conda-env \
feast feature-views describe driver_hourly_stats

spec:
  name: driver_hourly_stats
  entities:
  - driver_id
  features:
  - name: conv_rate
    valueType: FLOAT
  - name: acc_rate
    valueType: FLOAT
  - name: avg_daily_trips
    valueType: INT64
  ttl: 86400s
  batchSource:
    type: BATCH_FILE
    eventTimestampColumn: event_timestamp
    createdTimestampColumn: created
    fileOptions:
      fileUrl: /workspace/FeatureStore/1.getting-started/feature_repo/data/driver_stats.parquet
    dataSourceClassType: feast.infra.offline_stores.file_source.FileSource
  online: true
meta:
  materializationIntervals:
  - startTime: '2021-09-03T06:47:26.421407Z'
    endTime: '2021-09-04T06:47:22Z'



In [116]:
!../run-venv.sh feast-conda-env \
feast registry-dump

{
  "spec": {
    "name": "driver_id",
    "valueType": "INT64",
    "description": "driver id",
    "joinKey": "driver_id"
  },
  "meta": {}
}
{
  "spec": {
    "name": "driver_hourly_stats",
    "entities": [
      "driver_id"
    ],
    "features": [
      {
        "name": "conv_rate",
        "valueType": "FLOAT"
      },
      {
        "name": "acc_rate",
        "valueType": "FLOAT"
      },
      {
        "name": "avg_daily_trips",
        "valueType": "INT64"
      }
    ],
    "ttl": "86400s",
    "batchSource": {
      "type": "BATCH_FILE",
      "eventTimestampColumn": "event_timestamp",
      "createdTimestampColumn": "created",
      "fileOptions": {
        "fileUrl": "/workspace/FeatureStore/1.getting-started/feature_repo/data/driver_stats.parquet"
      },
      "dataSourceClassType": "feast.infra.offline_stores.file_source.FileSource"
    },
    "online": true
  },
  "meta": {
    "materializationIntervals": [
      {
        "startTime": "2021-09-03T06:47:26.421407

# H. Understanding `feast materialize-incremental`


Data is uploaded from the offline store to the online store by `feast materialize` or `feast materialize-incremental`. In terms of the scope of stored data, the big difference between the two stores is whether a store contains all historical data or only the latest data. This difference defines how each store is used: the **offline store** can be a source of **training data**, while the **online store** serves the **prediction** process.

Back to the Feast commands: `materialize` always needs two timestamps, a start and an end. `materialize-incremental`, on the other hand, needs only one timestamp, the end of the period, or none at all. Because the online store keeps only the latest value of each feature, `materialize-incremental` needs just an end timestamp to extract data from the offline store; the start continues from the end of the last materialization interval recorded in the registry (see `materializationIntervals` in the `feast feature-views describe` output above). If no timestamp is given, the current timestamp is used.
```shell
$ feast materialize-incremental 2022-01-01T00:00:00
$ feast materialize -v driver_hourly_stats 2020-01-01T00:00:00 2020-01-02T00:00:00
```
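
How the two commands arrive at their time windows can be sketched as follows. This is a simplified model under stated assumptions, not Feast's code: `materialize` is given both ends explicitly, while the hypothetical helper `incremental_window` resumes from the end of the previous interval and, when no interval exists yet, falls back to the current time minus the feature view's `ttl` (86400s in this notebook). The fallback rule is an assumption that appears consistent with the `materializationIntervals` values shown earlier.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def incremental_window(
    end: datetime,
    last_end: Optional[datetime],
    ttl: timedelta,
    now: datetime,
) -> Tuple[datetime, datetime]:
    """Choose the window for an incremental run.

    Resume where the previous run stopped; if nothing has been
    materialized yet, start at `now - ttl` (assumed fallback).
    """
    start = last_end if last_end is not None else now - ttl
    return start, end


ttl = timedelta(seconds=86400)
now = datetime(2021, 9, 4, 6, 47, 26)

# First run: no interval recorded yet, so the window starts at now - ttl.
first = incremental_window(datetime(2021, 9, 4, 6, 47, 22), None, ttl, now)
# Second run: continues from the end of the first window.
second = incremental_window(datetime(2021, 9, 5, 0, 0, 0), first[1], ttl, now)
print(first)
print(second)
```

Because each incremental run begins where the previous one ended, repeated runs cover the timeline without gaps and without the caller having to track start timestamps.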

# I. Next Action Items for the research

1. Configuring Offline and Online stores with Local provider
2. Connecting these local stores to the process of training and serving
3. Configuring Offline and Online stores with GCP provider
4. Applying stores to MLOps full cycle on GCP

