# Enterprise Feature Store

## WHAT

**“EFS is a <u>curated</u>, <u>production-ready</u> store of pre-calculated features, that is model-independent and significantly simplifies the journey from source data to actionable knowledge from advanced analytics by enabling automation and reuse.”**

## WHY

It is generally accepted within the analytics industry that around **80% of model development time is spent preparing data**. Any organization with a long history in Data Management and Analytics will recognize the benefits of reusable Data Products. With time to market a key driver behind the deployment of models, one of the key investments a company can make is to deploy an Enterprise Feature Store (EFS).

### Feature Catalog

The goal of a Feature Catalog is to **enable Feature reuse via discovery and governance**. It stores metadata about each Feature and how Features are related to each other. The Feature Catalog is not a Data Catalog and should not be considered a replacement for a general-purpose Data Catalog in an organization. Instead, it should be viewed as providing additional information on a subset of the data in the Data Catalog which is specific to Advanced Analytics users such as Data Scientists. Note that a Feature Catalog is also referred to as a **Feature Registry** in some popular open-source implementations such as Feast.

### Feature Storage

Feature Storage is the place for the **physical storage** of features within the Feature Store. Teradata Vantage is considered the primary storage for EFS. However, any kind of physical storage within the Connected Data Store can be used to implement the physical data model (in that case, additional considerations such as the use of QueryGrid / NOS must be taken into account).

When considering the massive use of different Remote Features within a single Feature Group, **implications of Query Federation** such as dependency on the availability of source systems must be understood.

#### Offline store
An offline store is **where features are stored (materialized)**. These features can then be used for analysis or training. Data can be both written to and retrieved from an offline store.
#### Online store
An online store is where features are stored for **low-latency access**. Because they can be retrieved quickly, they make inference faster.
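
The relationship between the two stores can be illustrated with a minimal sketch. It assumes the offline store keeps the full history of feature rows and the online store keeps only the newest row per entity key. The names `offline_rows`, `materialize_latest`, `customer_id`, and `avg_basket` are hypothetical and not part of any EFS or Feast API.

```python
from datetime import datetime

# Hypothetical offline store: every historical feature row is kept.
offline_rows = [
    {"customer_id": 1, "event_ts": datetime(2024, 1, 1), "avg_basket": 20.0},
    {"customer_id": 1, "event_ts": datetime(2024, 2, 1), "avg_basket": 25.0},
    {"customer_id": 2, "event_ts": datetime(2024, 1, 15), "avg_basket": 12.5},
]


def materialize_latest(rows, key="customer_id", ts="event_ts"):
    """Build an online view that holds only the newest row per entity key."""
    online = {}
    for row in rows:
        current = online.get(row[key])
        if current is None or row[ts] > current[ts]:
            online[row[key]] = row
    return online


# Assumed online store: a key-value lookup suited to low-latency reads.
online_store = materialize_latest(offline_rows)
```

At inference time, a single dictionary lookup such as `online_store[1]` stands in for the low-latency read, while training jobs would still scan `offline_rows`.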

#### Remote Features
Remote Features are physically stored in a remote system, registered in the Feature Catalog, and accessible via Data Virtualization / Query Federation tools such as QueryGrid.

### SDKs and APIs

To provide an **open way to interact with and manage the Feature Store**, there should be a set of standard APIs. Additionally, there should be several client SDKs for these APIs to provide programmatic access from multiple programming languages such as Python, R, and SQL (Procedures/UDFs). The SDKs should provide a consistent API for managing the Feature Catalog (CRUD operations), for simplifying the reading of training and scoring datasets for Data Scientists, and potentially for managing the actual feature data and data versioning requirements.

Example operations are:
* Create, Read, Update, Delete entries in the Feature Catalog (Feature/Feature Group metadata),
* Read training features for a specific point in time,
* Read scoring features for both batch and real-time.
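
As an illustration of the first operation only, here is a minimal in-memory sketch of catalog CRUD. The `FeatureMeta` and `FeatureCatalog` classes, their fields, and the `search` helper are assumptions made for this example; a real SDK would persist the metadata and expose its own interface.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FeatureMeta:
    """Hypothetical metadata record for one feature."""
    name: str
    group: str
    description: str = ""
    tags: tuple = ()


class FeatureCatalog:
    """In-memory stand-in for a Feature Catalog with CRUD operations."""

    def __init__(self):
        self._features = {}

    def create(self, meta):
        if meta.name in self._features:
            raise ValueError(f"feature {meta.name!r} already registered")
        self._features[meta.name] = meta

    def read(self, name):
        return self._features[name]

    def update(self, name, **changes):
        # Records are immutable, so an update swaps in a modified copy.
        self._features[name] = replace(self._features[name], **changes)
        return self._features[name]

    def delete(self, name):
        del self._features[name]

    def search(self, tag):
        """Discovery helper: names of all features carrying a tag."""
        return [m.name for m in self._features.values() if tag in m.tags]


catalog = FeatureCatalog()
catalog.create(FeatureMeta("avg_basket", "customer_sales", tags=("sales",)))
catalog.update("avg_basket", description="Average basket value")
```

Keeping the metadata records immutable means every change goes through `update`, which is a convenient place to hook in governance checks or versioning.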

### Feature Processor

The role of the Feature Processor is to **interpret the Feature definitions** from the Feature Catalog and process them into the Feature structure called the EFS Master. The EFS Master is a deep, narrow table that holds the full history of Features that can be consumed. As part of the process, Features can be pivoted into the form of an Analytic Data Set (ADS) for training and scoring/serving.
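
The pivot from a deep, narrow table to a wide ADS can be sketched as follows. The row layout `(entity_id, feature_name, valid_from, value)` and the function `pivot_to_ads` are assumptions for illustration, not the actual EFS Master schema.

```python
from datetime import date

# Hypothetical EFS Master rows: (entity_id, feature_name, valid_from, value)
efs_master = [
    (1, "avg_basket", date(2024, 1, 1), 20.0),
    (1, "avg_basket", date(2024, 2, 1), 25.0),
    (1, "visits_30d", date(2024, 1, 10), 4),
    (2, "avg_basket", date(2024, 1, 15), 12.5),
]


def pivot_to_ads(rows, features, as_of):
    """Return one wide row per entity with the latest value known at `as_of`."""
    latest = {}  # (entity_id, feature) -> (valid_from, value)
    for entity_id, feature, valid_from, value in rows:
        if feature not in features or valid_from > as_of:
            continue  # skip unrequested features and values from the future
        key = (entity_id, feature)
        if key not in latest or valid_from > latest[key][0]:
            latest[key] = (valid_from, value)

    ads = {}
    for (entity_id, feature), (_, value) in latest.items():
        # Features with no value yet stay None in the wide row.
        ads.setdefault(entity_id, {f: None for f in features})[feature] = value
    return ads


ads = pivot_to_ads(efs_master, ["avg_basket", "visits_30d"], as_of=date(2024, 1, 20))
```

Because the narrow table keeps every historical value, the same function can produce a training ADS for a past date or a scoring ADS for today simply by changing `as_of`.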

## Feast

Feast was co-created by GO-JEK and Google Cloud and is now governed by the Linux Foundation, with Tecton as the main contributor.

It is one of many feature stores:

<img src='images/feature_store.png' width='800px'>

Comparison: https://mlops.community/learn/feature-store/

### Feast Architecture

<img src='images/Feast.png' width='1000px'>

## Feast Terminology

### Feature View
A Feature View contains features that are properties of a specific object. It consists of a Data Source, entities, a name, a schema, metadata, and a TTL (time-to-live), which limits how far back Feast will look when generating historical datasets.
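
The effect of a TTL on historical lookups can be modelled with a few lines of plain Python. This is a simplified sketch of the idea, not Feast code: the function `lookup_with_ttl`, the sample `history`, and the inclusive window boundaries are all assumptions.

```python
from datetime import datetime, timedelta


def lookup_with_ttl(history, event_ts, ttl):
    """Return the newest value with event_ts - ttl <= ts <= event_ts, else None."""
    best = None
    for ts, value in history:
        if event_ts - ttl <= ts <= event_ts:
            if best is None or ts > best[0]:
                best = (ts, value)
    return None if best is None else best[1]


# Hypothetical feature history for one entity: (timestamp, value) pairs.
history = [
    (datetime(2024, 1, 1), 20.0),
    (datetime(2024, 2, 1), 25.0),
]
ttl = timedelta(days=7)

fresh = lookup_with_ttl(history, datetime(2024, 2, 3), ttl)  # value inside the window
stale = lookup_with_ttl(history, datetime(2024, 3, 1), ttl)  # nothing inside the window
```

A short TTL keeps stale values out of training data at the cost of more missing values; a long TTL does the opposite.
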
### Data source
A Data Source, as the name implies, is **where the feature data is stored**. This can be a Parquet file (stored locally), a Teradata Vantage platform, a Google Cloud Platform bucket, Snowflake, Amazon Redshift, or an S3 bucket. Feast also allows you to have multiple data sources.
### Feature Service
Feature Services come into play when you want to create **logically related groups of feature views**. Hence, a feature service is an object that contains features from one or more feature views.
### Entity
An Entity is a collection of semantically related features, typically identified by a key such as a customer ID.
### Timestamps
In Feast, each feature value carries a timestamp so that features can be stored in chronological order and retrieved as of a specific point in time.

### Entity -< Data Source(s) == Feature View(s) >- Feature Services