# Feast: historical retrieval and online inference (without `val_to_add`)

This notebook demonstrates how to query features from a **FeatureView** and an **OnDemandFeatureView**:

- **Historical retrieval** (`get_historical_features`): builds a training dataset using a point-in-time join.
- **Online retrieval** (`get_online_features`): serves features for low-latency inference.
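
To make the point-in-time join concrete, here is a minimal standard-library sketch of the idea: for each entity timestamp, take the latest feature row that is not newer than that timestamp. The function name `point_in_time_lookup` and the feature history are hypothetical, and this is not how Feast implements the join internally.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(feature_rows, as_of):
    """Return the value of the latest row with timestamp <= as_of, or None.

    feature_rows must be sorted by timestamp: [(timestamp, value), ...].
    """
    timestamps = [ts for ts, _ in feature_rows]
    idx = bisect_right(timestamps, as_of)
    return feature_rows[idx - 1][1] if idx > 0 else None


# Hypothetical feature history for one driver (values are made up).
history = [
    (datetime(2021, 4, 12, 8, 0), 0.50),
    (datetime(2021, 4, 12, 10, 0), 0.70),
    (datetime(2021, 4, 12, 12, 0), 0.90),
]

# The 12:00 row is in the future relative to the request, so it is ignored.
value = point_in_time_lookup(history, datetime(2021, 4, 12, 10, 59, 42))
print(value)
```

Ignoring rows newer than the request timestamp is what keeps future information out of the training set.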

The Feast repository is expected to live in `feature_store/feature_repo` and to already contain these definitions:

- `driver_quality_metrics` (FV)
- `driver_activity_metrics` (FV)
- `driver_realtime_features` (OnDemand FV)
- (optional) FeatureService `driver_activity_v1`


In [1]:
import os
from datetime import datetime

import pandas as pd
from feast import FeatureStore

feature_repo_path = os.path.join("feature_store", "feature_repo")
raw_data_path = os.path.join(feature_repo_path, "data", "driver_stats.parquet")

print("Feature repo:", os.path.abspath(feature_repo_path))
print("Raw data:", os.path.abspath(raw_data_path))


Feature repo: /home/evgeniy/otus/otus-feature-store/feature_store/feature_repo
Raw data: /home/evgeniy/otus/otus-feature-store/feature_store/feature_repo/data/driver_stats.parquet


## 1) Quick data check


In [2]:
df = pd.read_parquet(raw_data_path)
df.head(5)


Unnamed: 0,event_timestamp,driver_id,conv_rate,acc_rate,avg_daily_trips,created
0,2024-10-17 12:07:08.228578+00:00,1001,1.0,1.0,1000,2024-10-17 12:07:08.228581
1,2024-10-02 11:00:00+00:00,1005,0.429879,0.194598,582,2024-10-17 11:30:07.072000
2,2024-10-02 12:00:00+00:00,1005,0.230119,0.642878,551,2024-10-17 11:30:07.072000
3,2024-10-02 13:00:00+00:00,1005,0.1286,0.674187,38,2024-10-17 11:30:07.072000
4,2024-10-02 14:00:00+00:00,1005,0.400603,0.473636,583,2024-10-17 11:30:07.072000


## 2) Initializing the FeatureStore


In [3]:
store = FeatureStore(repo_path=feature_repo_path)
store


<feast.feature_store.FeatureStore at 0x7d8556639210>

## 3) Historical retrieval (for training)

We create `entity_df`, the table of requests: `driver_id` plus `event_timestamp`.

For the on-demand features we also add request fields (the request context): means and standard deviations.


In [4]:
entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
        ],
        # (optional) label: Feast does not process it, it is simply passed through
        "label_driver_reported_satisfaction": [1, 5, 3],

        # request fields for the on-demand features (normalization context)
        "conv_rate_mean": [0.20, 0.20, 0.20],
        "conv_rate_std":  [0.05, 0.05, 0.05],
        "acc_rate_mean":  [0.80, 0.80, 0.80],
        "acc_rate_std":   [0.10, 0.10, 0.10],
    }
)
entity_df


Unnamed: 0,driver_id,event_timestamp,label_driver_reported_satisfaction,conv_rate_mean,conv_rate_std,acc_rate_mean,acc_rate_std
0,1001,2021-04-12 10:59:42,1,0.2,0.05,0.8,0.1
1,1002,2021-04-12 08:12:10,5,0.2,0.05,0.8,0.1
2,1003,2021-04-12 16:40:26,3,0.2,0.05,0.8,0.1


### 3.1 Querying base features from two FeatureViews


In [5]:
training_df_basic = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_quality_metrics:conv_rate",
        "driver_quality_metrics:acc_rate",
        "driver_activity_metrics:avg_daily_trips",
    ],
).to_df()

print("--- basic schema ---")
print(training_df_basic.info())
training_df_basic.head()




--- basic schema ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 10 columns):
 #   Column                              Non-Null Count  Dtype              
---  ------                              --------------  -----              
 0   driver_id                           3 non-null      int64              
 1   event_timestamp                     3 non-null      datetime64[ns, UTC]
 2   label_driver_reported_satisfaction  3 non-null      int64              
 3   conv_rate_mean                      3 non-null      float64            
 4   conv_rate_std                       3 non-null      float64            
 5   acc_rate_mean                       3 non-null      float64            
 6   acc_rate_std                        3 non-null      float64            
 7   conv_rate                           3 non-null      float32            
 8   acc_rate                            3 non-null      float32            
 9   avg_daily_trips           

Unnamed: 0,driver_id,event_timestamp,label_driver_reported_satisfaction,conv_rate_mean,conv_rate_std,acc_rate_mean,acc_rate_std,conv_rate,acc_rate,avg_daily_trips
0,1001,2021-04-12 10:59:42+00:00,1,0.2,0.05,0.8,0.1,0.709758,0.692957,402
1,1002,2021-04-12 08:12:10+00:00,5,0.2,0.05,0.8,0.1,0.718295,0.584081,370
2,1003,2021-04-12 16:40:26+00:00,3,0.2,0.05,0.8,0.1,0.697411,0.19768,25


### 3.2 Querying on-demand features (real-time transformations) together with base features


In [6]:
training_df_with_od = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_quality_metrics:conv_rate",
        "driver_quality_metrics:acc_rate",
        "driver_activity_metrics:avg_daily_trips",
        "driver_realtime_features:conv_rate_z",
        "driver_realtime_features:acc_rate_z",
        "driver_realtime_features:workload_adjusted_quality",
    ],
).to_df()

print("--- with on-demand schema ---")
print(training_df_with_od.info())
training_df_with_od.head()




--- with on-demand schema ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 13 columns):
 #   Column                              Non-Null Count  Dtype              
---  ------                              --------------  -----              
 0   driver_id                           3 non-null      int64              
 1   event_timestamp                     3 non-null      datetime64[ns, UTC]
 2   label_driver_reported_satisfaction  3 non-null      int64              
 3   conv_rate_mean                      3 non-null      float64            
 4   conv_rate_std                       3 non-null      float64            
 5   acc_rate_mean                       3 non-null      float64            
 6   acc_rate_std                        3 non-null      float64            
 7   conv_rate                           3 non-null      float32            
 8   acc_rate                            3 non-null      float32            
 9   avg_daily_trips  

Unnamed: 0,driver_id,event_timestamp,label_driver_reported_satisfaction,conv_rate_mean,conv_rate_std,acc_rate_mean,acc_rate_std,conv_rate,acc_rate,avg_daily_trips,conv_rate_z,acc_rate_z,workload_adjusted_quality
0,1001,2021-04-12 10:59:42+00:00,1,0.2,0.05,0.8,0.1,0.709758,0.692957,402,10.195161,-1.070427,0.100449
1,1002,2021-04-12 08:12:10+00:00,5,0.2,0.05,0.8,0.1,0.718295,0.584081,370,10.365907,-2.159186,0.096095
2,1003,2021-04-12 16:40:26+00:00,3,0.2,0.05,0.8,0.1,0.697411,0.19768,25,9.948224,-6.023196,0.116841
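
The `conv_rate_z` and `acc_rate_z` values in the output above are consistent with a standard z-score, `(value - mean) / std`, computed from the request fields in `entity_df`. The actual transformation lives in the `driver_realtime_features` definition, which this notebook does not show, so the formula below is an assumption checked only against the printed numbers for driver 1001. The helper name `z_score` is hypothetical.

```python
def z_score(value, mean, std):
    """Standard score: how many standard deviations value lies from mean."""
    return (value - mean) / std


# Inputs for driver 1001, copied from the output table (rounded to 6 digits).
conv_rate_z = z_score(0.709758, mean=0.20, std=0.05)
acc_rate_z = z_score(0.692957, mean=0.80, std=0.10)

# Compare with the table values 10.195161 and -1.070427.
print(round(conv_rate_z, 4), round(acc_rate_z, 4))
```

Small differences in the last digits are expected because the table inputs are rounded `float32` values.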


### 3.3 (Optional) Querying through a FeatureService

If you have created a `FeatureService(name="driver_activity_v1", ...)`, you can retrieve a consistent feature set like this:


In [16]:
# Uncomment if the service is defined in feature_repo
# training_df_service = store.get_historical_features(
#     entity_df=entity_df,
#     features=store.get_feature_service("driver_activity_v1"),
# ).to_df()
# training_df_service.head()


## 4) Online retrieval (for inference)

Online inference requires that the **online store is populated**.

This is usually done with:

1. `feast apply`
2. `feast materialize 2021-04-11T00:00:00 2021-04-13T00:00:00` (the window covers the data used in the example)
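
The materialization step can also be run from Python through `FeatureStore.materialize(start_date=..., end_date=...)`. The sketch below wraps that call in a small helper; the name `materialize_window` and the argument check are illustrative assumptions, and the helper only requires an object with a compatible `materialize` method.

```python
from datetime import datetime


def materialize_window(store, start, end):
    """Ask the store to load feature values between start and end into the online store."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    store.materialize(start_date=start, end_date=end)


# Usage with the notebook's `store` object (run `feast apply` first):
# materialize_window(store, datetime(2021, 4, 11), datetime(2021, 4, 13))
```

The commented call uses the same window as the CLI command above.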

An example of reading online features follows.


In [7]:
online_features = store.get_online_features(
    features=[
        "driver_quality_metrics:conv_rate",
        "driver_quality_metrics:acc_rate",
        "driver_activity_metrics:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}, {"driver_id": 1002}],
).to_dict()

print("Online features for drivers 1001, 1002:")
for k, v in online_features.items():
    print(f"{k}: {v}")




Online features for drivers 1001, 1002:
driver_id: [1001, 1002]
conv_rate: [0.7097580432891846, 0.7182953357696533]
acc_rate: [0.6929572820663452, 0.5840813517570496]
avg_daily_trips: [402, 370]
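
The dictionary above is column-oriented: each key maps to a list with one value per requested entity. A model usually expects one record per entity, so a minimal sketch of the reshaping is shown here. The helper name `to_rows` is hypothetical. The check for `None` assumes that a value missing from the online store comes back as `None`, which is worth verifying against your Feast version.

```python
def to_rows(feature_dict):
    """Turn {name: [v1, v2, ...]} into [{name: v1, ...}, {name: v2, ...}]."""
    names = list(feature_dict)
    return [dict(zip(names, values)) for values in zip(*feature_dict.values())]


# Values copied from the output above.
online_features = {
    "driver_id": [1001, 1002],
    "conv_rate": [0.7097580432891846, 0.7182953357696533],
    "acc_rate": [0.6929572820663452, 0.5840813517570496],
    "avg_daily_trips": [402, 370],
}

rows = to_rows(online_features)

# Drivers with at least one missing value (assumed to be returned as None).
missing = [row["driver_id"] for row in rows if any(v is None for v in row.values())]
print(rows[0])
print("Drivers with missing features:", missing)
```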


## 5) FeatureView metadata


In [8]:
fv = store.get_feature_view("driver_quality_metrics")
print("Name:", fv.name)
print("Entities:", fv.entities)
print("TTL:", fv.ttl)
print("Online:", fv.online)
print("Features:", [f.name for f in fv.features])


Name: driver_quality_metrics
Entities: ['driver']
TTL: 1 day, 0:00:00
Online: True
Features: ['conv_rate', 'acc_rate']
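
The TTL of 1 day matters for historical retrieval: a feature row is joined only if it is not older than the entity timestamp minus the TTL. The sketch below illustrates that rule with plain `datetime` arithmetic. The function name `is_within_ttl` is hypothetical, the timestamps are made up, and treating both boundaries as inclusive is an assumption rather than documented Feast behavior.

```python
from datetime import datetime, timedelta


def is_within_ttl(feature_ts, entity_ts, ttl):
    """True if the feature row is not newer than entity_ts and not older than ttl."""
    return entity_ts - ttl <= feature_ts <= entity_ts


ttl = timedelta(days=1)  # matches the TTL printed above
entity_ts = datetime(2021, 4, 12, 10, 59, 42)

fresh = is_within_ttl(datetime(2021, 4, 12, 10, 0), entity_ts, ttl)
stale = is_within_ttl(datetime(2021, 4, 10, 10, 0), entity_ts, ttl)
print(fresh, stale)
```

If historical retrieval returns nulls for a driver, comparing the feature timestamps against this window is a reasonable first check.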
