# Module 1: Introduction to SageMaker Feature Store

**Note:** Please set kernel to `Python 3 (Data Science)` and select instance to `ml.t3.medium`

---

## Contents

1. [Background](#Background)
1. [Setup](#Setup)
1. [Load and explore datasets](#Load-and-explore-datasets)
1. [Create feature definitions and groups](#Create-feature-definitions-and-groups)
1. [Ingest data into feature groups](#Ingest-data-into-feature-groups)
1. [Get feature record from the Online feature store](#Get-feature-record-from-the-Online-feature-store)
1. [List feature groups](#List-feature-groups)

# Background

이 노트북에서는 SageMaker Feature Store에서 `customers`, `products`, 그리고 `orders` 데이터셋에 대한 **3**개의 피처 그룹을 생성하는 방법을 배우게 됩니다. <BR>
그 다음, SageMaker Python SDK를 사용하여 생성된 피처 그룹(온라인 및 오프라인 스토어 모두)에 피처 컬럼을 수집하는 방법을 학습합니다. <BR>
또한 온라인 스토어에서 수집된 피처 레코드를 가져오는 방법도 볼 수 있습니다. <BR>
마지막으로, Feature Store 내에 생성된 모든 피처 그룹을 나열하고 삭제하는 방법을 알게 될 것입니다. <BR>
    
**참고:** 이 노트북에서 생성된 피처 그룹은 앞으로의 모듈에서 사용될 예정입니다.


# Setup

#### Imports

In [65]:
from sagemaker.feature_store.feature_group import FeatureGroup
from time import gmtime, strftime, sleep
from random import randint
import pandas as pd
import numpy as np
import subprocess
import sagemaker
import importlib
import logging
import time
import sys
from sagemaker.feature_store.inputs import TableFormatEnum

In [66]:
logger = logging.getLogger('__name__')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

In [67]:
logger.info(f'Using SageMaker version: {sagemaker.__version__}')
logger.info(f'Using Pandas version: {pd.__version__}')

Using SageMaker version: 2.224.4
Using SageMaker version: 2.224.4
Using Pandas version: 2.1.4
Using Pandas version: 2.1.4


#### Essentials

In [68]:
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
default_bucket = sagemaker_session.default_bucket()
logger.info(f'Default S3 bucket = {default_bucket}')
prefix = 'sagemaker-feature-store'

Default S3 bucket = sagemaker-us-east-1-419974056037
Default S3 bucket = sagemaker-us-east-1-419974056037


In [69]:
region = sagemaker_session.boto_region_name
region

'us-east-1'

# Load and explore datasets

In [70]:
customers_df = pd.read_csv('./data/customers.csv')
customers_df.head(5)

Unnamed: 0,customer_id,sex,is_married,event_time,age_18-29,age_30-39,age_40-49,age_50-59,age_60-69,age_70-plus,n_days_active
0,C1,0,0,2024-07-04T22:32:32.766Z,False,False,False,True,False,False,0.083562
1,C2,1,0,2024-07-04T22:32:32.767Z,True,False,False,False,False,False,0.659589
2,C3,1,1,2024-07-04T22:32:32.769Z,False,False,False,False,True,False,0.402055
3,C4,1,0,2024-07-04T22:32:32.770Z,False,False,False,True,False,False,0.708904
4,C5,1,1,2024-07-04T22:32:32.772Z,False,True,False,False,False,False,0.765753


In [71]:
customers_df.dtypes

customer_id       object
sex                int64
is_married         int64
event_time        object
age_18-29           bool
age_30-39           bool
age_40-49           bool
age_50-59           bool
age_60-69           bool
age_70-plus         bool
n_days_active    float64
dtype: object

## dtype 변환

**Data types for feature store**

- **String Feature Type:** Strings are Unicode with UTF-8 binary encoding. The minimum length of a string can be zero, the maximum length is constrained by the maximum size of a record.

- **Fractional Feature Type:** Fractional feature values must conform to a double precision floating point number as defined by the IEEE 754 standard.

- **Integral Feature Type:** Feature Store supports integral values in the range of a 64-bit signed integer. Minimum value of -263 and a maximum value: 263 - 1.

- **Event Time Features:** All feature groups have an event time feature with nanosecond precision. Any event time with lower than nanosecond precision will lead to backwards incompatibility. The feature can have a feature type of either String or Fractional.

    - A string event time is accepted in **ISO-8601** format, in UTC time, conforming to the pattern(s): `[yyyy-MM-dd'T'HH:mm:ssZ, yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSSZ]`.

    - A fractional event time value is accepted as seconds from unix epoch. Event times must be in the range of [0000-01-01T00:00:00.000000000Z, 9999-12-31T23:59:59.999999999Z]. For feature groups in the Iceberg table format, you can only use String type for the event time.

In [72]:
customers_df = customers_df.convert_dtypes(infer_objects=True, convert_boolean=False)

convert_dtypes는 dypte 자동 추론해주는 기능 (https://wikidocs.net/151423)

In [73]:
customers_df['customer_id'] = customers_df['customer_id'].astype('string')
customers_df['event_time'] = customers_df['event_time'].astype('string')

In [74]:
customers_df.dtypes

customer_id      string[python]
sex                       Int64
is_married                Int64
event_time       string[python]
age_18-29                 Int64
age_30-39                 Int64
age_40-49                 Int64
age_50-59                 Int64
age_60-69                 Int64
age_70-plus               Int64
n_days_active           Float64
dtype: object

In [75]:
products_df = pd.read_csv('./data/products.csv')
products_df.head(5)

Unnamed: 0,product_id,event_time,category_baby_food_formula,category_baking_ingredients,category_candy_chocolate,category_chips_pretzels,category_cleaning_products,category_coffee,category_cookies_cakes,category_crackers,...,category_hair_care,category_ice_cream_ice,category_juice_nectars,category_packaged_cheese,category_refrigerated,category_soup_broth_bouillon,category_spices_seasonings,category_tea,category_vitamins_supplements,category_yogurt
0,P1,2024-07-04T22:34:32.391Z,False,False,False,False,False,False,True,False,...,False,False,False,False,False,False,False,False,False,False
1,P2,2024-07-04T22:34:32.392Z,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
2,P3,2024-07-04T22:34:32.392Z,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
3,P4,2024-07-04T22:34:32.392Z,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,P5,2024-07-04T22:34:32.392Z,False,False,False,False,False,False,False,False,...,False,False,True,False,False,False,False,False,False,False


In [76]:
products_df = products_df.convert_dtypes(infer_objects=True, convert_boolean=False)

In [77]:
products_df['product_id'] = products_df['product_id'].astype('string')
products_df['event_time'] = products_df['event_time'].astype('string')

In [78]:
products_df.dtypes

product_id                       string[python]
event_time                       string[python]
category_baby_food_formula                Int64
category_baking_ingredients               Int64
category_candy_chocolate                  Int64
category_chips_pretzels                   Int64
category_cleaning_products                Int64
category_coffee                           Int64
category_cookies_cakes                    Int64
category_crackers                         Int64
category_energy_granola_bars              Int64
category_frozen_meals                     Int64
category_hair_care                        Int64
category_ice_cream_ice                    Int64
category_juice_nectars                    Int64
category_packaged_cheese                  Int64
category_refrigerated                     Int64
category_soup_broth_bouillon              Int64
category_spices_seasonings                Int64
category_tea                              Int64
category_vitamins_supplements           

In [79]:
orders_df = pd.read_csv('./data/orders.csv')
orders_df

Unnamed: 0,order_id,customer_id,product_id,purchase_amount,is_reordered,event_time,n_days_since_last_purchase
0,O1,C5731,P16,0.913465,1,2024-07-04T22:35:15.644Z,0.201550
1,O2,C3541,P12802,0.663168,1,2024-07-04T22:35:15.644Z,0.575581
2,O3,C7402,P8320,0.629604,1,2024-07-04T22:35:15.644Z,0.073643
3,O4,C7356,P5165,0.618911,0,2024-07-04T22:35:15.644Z,0.552326
4,O5,C5806,P12940,0.053168,1,2024-07-04T22:35:15.644Z,0.616279
...,...,...,...,...,...,...,...
99995,O99996,C7167,P10590,0.896040,0,2024-07-04T22:35:20.940Z,0.044574
99996,O99997,C3642,P6210,0.129109,1,2024-07-04T22:35:20.940Z,0.589147
99997,O99998,C6145,P5740,0.825050,1,2024-07-04T22:35:20.940Z,0.798450
99998,O99999,C7567,P14942,0.602772,1,2024-07-04T22:35:20.940Z,0.775194


In [80]:
orders_df = orders_df.convert_dtypes(infer_objects=True, convert_boolean=False)

In [81]:
orders_df['order_id'] = orders_df['order_id'].astype('string')
orders_df['customer_id'] = orders_df['customer_id'].astype('string')
orders_df['product_id'] = orders_df['product_id'].astype('string')
orders_df['event_time'] = orders_df['event_time'].astype('string')

In [82]:
orders_df.dtypes

order_id                      string[python]
customer_id                   string[python]
product_id                    string[python]
purchase_amount                      Float64
is_reordered                           Int64
event_time                    string[python]
n_days_since_last_purchase           Float64
dtype: object

In [83]:
customers_count = customers_df.shape[0]
%store customers_count
products_count = products_df.shape[0]
%store products_count
orders_count = orders_df.shape[0]
%store orders_count

Stored 'customers_count' (int)
Stored 'products_count' (int)
Stored 'orders_count' (int)


In [84]:
print (f'customers_count: {customers_count}, products_count: {products_count}, orders_count:{orders_count}')

customers_count: 10000, products_count: 17001, orders_count:100000


# Create feature definitions and groups

## Feature group name

In [85]:
import pytz
from datetime import datetime 

In [221]:
def get_korea_time():
    # 한국 시간대 설정
    korea_tz = pytz.timezone('Asia/Seoul')
    
    # 현재 시간을 한국 시간으로 변환
    korea_time = datetime.now(korea_tz).strftime('%m-%d-%H-%M')
        
    return korea_time

def get_event_time():
    korea_tz = pytz.timezone('Asia/Seoul')
    korea_time = datetime.now(korea_tz).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

    return korea_time

In [None]:
from datetime import datetime
import pytz

# 현재 시간을 UTC로 가져오기
utc_now = datetime.now(pytz.UTC)

# ISO 8601 형식으로 포맷팅
formatted_time = utc_now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

print(formatted_time)

In [87]:
current_timestamp = get_korea_time()

In [88]:
# prefix to track all the feature groups created as part of feature store workshop (fsw)
fs_prefix = 'fsw-' 

In [89]:
customers_feature_group_name = f'{fs_prefix}customers-{current_timestamp}'
%store customers_feature_group_name
products_feature_group_name = f'{fs_prefix}products-{current_timestamp}'
%store products_feature_group_name
orders_feature_group_name = f'{fs_prefix}orders-{current_timestamp}'
%store orders_feature_group_name

Stored 'customers_feature_group_name' (str)
Stored 'products_feature_group_name' (str)
Stored 'orders_feature_group_name' (str)


In [90]:
logger.info(f'Customers feature group name = {customers_feature_group_name}')
logger.info(f'Products feature group name = {products_feature_group_name}')
logger.info(f'Orders feature group name = {orders_feature_group_name}')

Customers feature group name = fsw-customers-07-14-13-54
Customers feature group name = fsw-customers-07-14-13-54
Products feature group name = fsw-products-07-14-13-54
Products feature group name = fsw-products-07-14-13-54
Orders feature group name = fsw-orders-07-14-13-54
Orders feature group name = fsw-orders-07-14-13-54


## Feature groups

In [91]:
customers_feature_group = FeatureGroup(
    name=customers_feature_group_name,
    sagemaker_session=sagemaker_session
)
products_feature_group = FeatureGroup(
    name=products_feature_group_name,
    sagemaker_session=sagemaker_session
)
orders_feature_group = FeatureGroup(
    name=orders_feature_group_name,
    sagemaker_session=sagemaker_session
)

### Feature group definitions (dtypes..)
 - df를 넣어 줌으로써, df에 정의된 dtype으로 feature type을 셋팅

In [95]:
customers_feature_group.load_feature_definitions(data_frame=customers_df)

[FeatureDefinition(feature_name='customer_id', feature_type=<FeatureTypeEnum.STRING: 'String'>, collection_type=None),
 FeatureDefinition(feature_name='sex', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='is_married', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='event_time', feature_type=<FeatureTypeEnum.STRING: 'String'>, collection_type=None),
 FeatureDefinition(feature_name='age_18-29', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='age_30-39', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='age_40-49', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='age_50-59', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='age_60-69

In [96]:
products_feature_group.load_feature_definitions(data_frame=products_df)

[FeatureDefinition(feature_name='product_id', feature_type=<FeatureTypeEnum.STRING: 'String'>, collection_type=None),
 FeatureDefinition(feature_name='event_time', feature_type=<FeatureTypeEnum.STRING: 'String'>, collection_type=None),
 FeatureDefinition(feature_name='category_baby_food_formula', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='category_baking_ingredients', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='category_candy_chocolate', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='category_chips_pretzels', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='category_cleaning_products', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='category_coffee', feature_type=<FeatureTypeEn

In [97]:
orders_feature_group.load_feature_definitions(data_frame=orders_df)

[FeatureDefinition(feature_name='order_id', feature_type=<FeatureTypeEnum.STRING: 'String'>, collection_type=None),
 FeatureDefinition(feature_name='customer_id', feature_type=<FeatureTypeEnum.STRING: 'String'>, collection_type=None),
 FeatureDefinition(feature_name='product_id', feature_type=<FeatureTypeEnum.STRING: 'String'>, collection_type=None),
 FeatureDefinition(feature_name='purchase_amount', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>, collection_type=None),
 FeatureDefinition(feature_name='is_reordered', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>, collection_type=None),
 FeatureDefinition(feature_name='event_time', feature_type=<FeatureTypeEnum.STRING: 'String'>, collection_type=None),
 FeatureDefinition(feature_name='n_days_since_last_purchase', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>, collection_type=None)]

### Creat feature groups

Amazon SageMaker Feature Store는 오프라인 스토어를 위해 AWS Glue와 Apache Iceberg 테이블 형식을 지원합니다. 새로운 feature group을 생성할 때 테이블 형식을 선택할 수 있습니다.
이 노트북에서는 Iceberg 테이블 형식을 사용할 것입니다. Apache Iceberg를 사용하여 feature를 저장하면 Iceberg 테이블 압축의 이점을 활용하여 ML 학습 데이터셋을 추출할 때 더 빠른 쿼리 성능을 제공함으로써 모델 개발을 가속화합니다. feature group의 설계와 규모에 따라 이 새로운 기능을 사용하면 학습 쿼리 성능이 10배에서 100배까지 향상될 수 있습니다.
Glue 테이블 형식을 사용해야 하는 경우, 아래 변수를 'Glue'로 업데이트하세요. 오프라인 스토어 형식에 대한 자세한 정보는 문서를 참조하십시오."
이 텍스트는 SageMaker Feature Store의 오프라인 스토어 옵션에 대해 설명하고 있으며, 특히 Apache Iceberg 형식의 이점을 강조하고 있습니다. 또한 사용자가 필요에 따라 Glue 형식을 선택할 수 있는 옵션도 제공하고 있습니다.

https://docs.aws.amazon.com/sagemaker/latest/dg/feature-store-storage-configurations-offline-store.html
Glue 테이블 형식
Glue 형식(기본값)은 AWS Glue를 위한 표준 Hive 타입 테이블 형식입니다. AWS Glue를 사용하면 여러 소스에서 데이터를 발견, 준비, 이동 및 통합할 수 있습니다. 또한 작업 작성, 실행 및 비즈니스 워크플로우 구현을 위한 추가적인 생산성 및 데이터 운영 도구를 포함합니다. AWS Glue에 대한 자세한 정보는 AWS Glue란 무엇인가?를 참조하세요.
Iceberg 테이블 형식
Iceberg 형식(권장)은 매우 큰 분석 테이블을 위한 개방형 테이블 형식입니다. Iceberg를 사용하면 파티션 내의 작은 데이터 파일들을 더 적은 수의 큰 파일로 압축할 수 있어, 쿼리 속도가 크게 향상됩니다. 이 압축 작업은 동시에 이루어지며 피처 그룹의 진행 중인 읽기 및 쓰기 작업에 영향을 미치지 않습니다. Iceberg 테이블 최적화에 대한 자세한 정보는 Amazon Athena 및 AWS Lake Formation 사용 안내서를 참조하세요.
Iceberg는 대규모 파일 모음을 테이블로 관리하며 현대적인 분석 데이터 레이크 작업을 지원합니다. 새 피처 그룹을 만들 때 Iceberg 옵션을 선택하면, Amazon SageMaker Feature Store는 Parquet 파일 형식을 사용하여 Iceberg 테이블을 생성하고, 이 테이블을 AWS Glue Data Catalog에 등록합니다. Iceberg 테이블 형식에 대한 자세한 정보는 Apache Iceberg 테이블 사용하기를 참조하세요.

생성 시 **enable_online_store=True**로 해야 oline store enalblemnt 된다.

In [98]:
table_format_param = 'ICEBERG' # or 'GLUE'

In [99]:
if table_format_param == 'ICEBERG':
    table_format = TableFormatEnum.ICEBERG
else:
    table_format = TableFormatEnum.GLUE

In [100]:
def wait_for_feature_group_creation_complete(feature_group):
    status = feature_group.describe().get('FeatureGroupStatus')
    print(f'Initial status: {status}')
    while status == 'Creating':
        logger.info(f'Waiting for feature group: {feature_group.name} to be created ...')
        time.sleep(5)
        status = feature_group.describe().get('FeatureGroupStatus')
    if status != 'Created':
        raise SystemExit(f'Failed to create feature group {feature_group.name}: {status}')
    logger.info(f'FeatureGroup {feature_group.name} was successfully created.')

In [101]:
customers_feature_group.create(
    s3_uri=f's3://{default_bucket}/{prefix}', 
    record_identifier_name='customer_id', ## key 값 
    event_time_feature_name='event_time', ## event time (timestamp)
    role_arn=role, ## sagemaker role
    enable_online_store=True, ## oline store enalblemnt
    table_format=table_format 
)

{'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:419974056037:feature-group/fsw-customers-07-14-13-54',
 'ResponseMetadata': {'RequestId': '62b551e3-e81f-48a8-8f28-20844e2f63b1',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '62b551e3-e81f-48a8-8f28-20844e2f63b1',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '102',
   'date': 'Sun, 14 Jul 2024 04:56:29 GMT'},
  'RetryAttempts': 0}}

In [102]:
wait_for_feature_group_creation_complete(customers_feature_group)

Waiting for feature group: fsw-customers-07-14-13-54 to be created ...
Waiting for feature group: fsw-customers-07-14-13-54 to be created ...


Initial status: Creating


Waiting for feature group: fsw-customers-07-14-13-54 to be created ...
Waiting for feature group: fsw-customers-07-14-13-54 to be created ...
Waiting for feature group: fsw-customers-07-14-13-54 to be created ...
Waiting for feature group: fsw-customers-07-14-13-54 to be created ...
FeatureGroup fsw-customers-07-14-13-54 was successfully created.
FeatureGroup fsw-customers-07-14-13-54 was successfully created.


In [103]:
products_feature_group.create(
    s3_uri=f's3://{default_bucket}/{prefix}', 
    record_identifier_name='product_id', 
    event_time_feature_name='event_time', 
    role_arn=role, 
    enable_online_store=True,
    table_format=TableFormatEnum.ICEBERG # or 'GLUE'
 )

{'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:419974056037:feature-group/fsw-products-07-14-13-54',
 'ResponseMetadata': {'RequestId': '5fdcc0d9-21bd-46ac-b333-d065db2ac86d',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '5fdcc0d9-21bd-46ac-b333-d065db2ac86d',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '101',
   'date': 'Sun, 14 Jul 2024 04:59:19 GMT'},
  'RetryAttempts': 0}}

In [104]:
wait_for_feature_group_creation_complete(products_feature_group)

Waiting for feature group: fsw-products-07-14-13-54 to be created ...
Waiting for feature group: fsw-products-07-14-13-54 to be created ...


Initial status: Creating


Waiting for feature group: fsw-products-07-14-13-54 to be created ...
Waiting for feature group: fsw-products-07-14-13-54 to be created ...
Waiting for feature group: fsw-products-07-14-13-54 to be created ...
Waiting for feature group: fsw-products-07-14-13-54 to be created ...
Waiting for feature group: fsw-products-07-14-13-54 to be created ...
Waiting for feature group: fsw-products-07-14-13-54 to be created ...
FeatureGroup fsw-products-07-14-13-54 was successfully created.
FeatureGroup fsw-products-07-14-13-54 was successfully created.


In [105]:
orders_feature_group.create(
    s3_uri=f's3://{default_bucket}/{prefix}', 
    record_identifier_name='order_id', 
    event_time_feature_name='event_time', 
    role_arn=role, 
    enable_online_store=True,
    table_format=TableFormatEnum.ICEBERG # or 'GLUE'
)

{'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:419974056037:feature-group/fsw-orders-07-14-13-54',
 'ResponseMetadata': {'RequestId': 'f84dcf9f-cc25-42a0-8c6a-f571c40c0bb2',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'f84dcf9f-cc25-42a0-8c6a-f571c40c0bb2',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '99',
   'date': 'Sun, 14 Jul 2024 04:59:40 GMT'},
  'RetryAttempts': 0}}

In [106]:
wait_for_feature_group_creation_complete(orders_feature_group)

Waiting for feature group: fsw-orders-07-14-13-54 to be created ...
Waiting for feature group: fsw-orders-07-14-13-54 to be created ...


Initial status: Creating


Waiting for feature group: fsw-orders-07-14-13-54 to be created ...
Waiting for feature group: fsw-orders-07-14-13-54 to be created ...
Waiting for feature group: fsw-orders-07-14-13-54 to be created ...
Waiting for feature group: fsw-orders-07-14-13-54 to be created ...
Waiting for feature group: fsw-orders-07-14-13-54 to be created ...
Waiting for feature group: fsw-orders-07-14-13-54 to be created ...
FeatureGroup fsw-orders-07-14-13-54 was successfully created.
FeatureGroup fsw-orders-07-14-13-54 was successfully created.


# Ingest data into feature groups 

## Using ingest(): 대량의 데이터를 Feature Store에 효율적으로 수집하는 데 사용됩니다.
- data는 s3에 저장됩니다. (저장 경로는 feature group 생성(create()) 시 셋팅할 수 있습니다)
- top으로 보여주기

In [107]:
%%time

logger.info(f'Ingesting data into feature group: {customers_feature_group.name} ...')
customers_feature_group.ingest(
    data_frame=customers_df,
    max_processes=16, # the number of processes will be created to work on different partitions of the data_frame in parallel
    wait=True
)
logger.info(f'{len(customers_df)} customer records ingested into feature group: {customers_feature_group.name}')

Ingesting data into feature group: fsw-customers-07-14-13-54 ...
Ingesting data into feature group: fsw-customers-07-14-13-54 ...
10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


CPU times: user 280 ms, sys: 275 ms, total: 555 ms
Wall time: 15.3 s


In [108]:
%%time

logger.info(f'Ingesting data into feature group: {products_feature_group.name} ...')
products_feature_group.ingest(
    data_frame=products_df,
    max_processes=16,
    wait=True
)
logger.info(f'{len(products_df)} product records ingested into feature group: {products_feature_group.name}')  

Ingesting data into feature group: fsw-products-07-14-13-54 ...
Ingesting data into feature group: fsw-products-07-14-13-54 ...
17001 product records ingested into feature group: fsw-products-07-14-13-54
17001 product records ingested into feature group: fsw-products-07-14-13-54


CPU times: user 356 ms, sys: 309 ms, total: 665 ms
Wall time: 27.5 s


In [109]:
%%time

logger.info(f'Ingesting data into feature group: {orders_feature_group.name} ...')
orders_feature_group.ingest(
    data_frame=orders_df,
    max_processes=16,
    wait=True
)
logger.info(f'{len(orders_df)} order records ingested into feature group: {orders_feature_group.name}')

Ingesting data into feature group: fsw-orders-07-14-13-54 ...
Ingesting data into feature group: fsw-orders-07-14-13-54 ...
100000 order records ingested into feature group: fsw-orders-07-14-13-54
100000 order records ingested into feature group: fsw-orders-07-14-13-54


CPU times: user 1.87 s, sys: 309 ms, total: 2.18 s
Wall time: 2min 7s


## Using put_record(): 단일 레코드를 Feature Store에 추가하는 데 사용됩니다.

In [185]:
featurestore_runtime_client = sagemaker_session.boto_session.client(
    'sagemaker-featurestore-runtime',
    region_name=region
)

In [226]:
record = [
    {'FeatureName': 'customer_id', 'ValueAsString': 'C19841011'},
    {'FeatureName': 'sex', 'ValueAsString': '0'},
    {'FeatureName': 'is_married', 'ValueAsString': '0'},
    {'FeatureName': 'event_time', 'ValueAsString': get_event_time()},
    {'FeatureName': 'age_18-29', 'ValueAsString': '0'},
    {'FeatureName': 'age_30-39', 'ValueAsString': '1'},
    {'FeatureName': 'age_40-49', 'ValueAsString': '0'},
    {'FeatureName': 'age_50-59', 'ValueAsString': '0'},
    {'FeatureName': 'age_60-69', 'ValueAsString': '0'},
    {'FeatureName': 'age_70-plus', 'ValueAsString': '0'},
    {'FeatureName': 'n_days_active', 'ValueAsString': '0.19841013'}]
record

[{'FeatureName': 'customer_id', 'ValueAsString': 'C19841011'},
 {'FeatureName': 'sex', 'ValueAsString': '0'},
 {'FeatureName': 'is_married', 'ValueAsString': '0'},
 {'FeatureName': 'event_time', 'ValueAsString': '2024-07-14T15:13:46.689Z'},
 {'FeatureName': 'age_18-29', 'ValueAsString': '0'},
 {'FeatureName': 'age_30-39', 'ValueAsString': '1'},
 {'FeatureName': 'age_40-49', 'ValueAsString': '0'},
 {'FeatureName': 'age_50-59', 'ValueAsString': '0'},
 {'FeatureName': 'age_60-69', 'ValueAsString': '0'},
 {'FeatureName': 'age_70-plus', 'ValueAsString': '0'},
 {'FeatureName': 'n_days_active', 'ValueAsString': '0.19841013'}]

In [227]:
featurestore_runtime_client.put_record(
    FeatureGroupName=customers_feature_group_name,
    Record=record,
    TargetStores=['OnlineStore', 'OfflineStore']
)
    

{'ResponseMetadata': {'RequestId': '73738a57-3610-4719-bba5-440ff7f21d2c',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '73738a57-3610-4719-bba5-440ff7f21d2c',
   'content-type': 'application/json',
   'content-length': '0',
   'date': 'Sun, 14 Jul 2024 06:13:50 GMT'},
  'RetryAttempts': 0}}

# Get feature record from the Online feature store 

In [228]:
featurestore_runtime_client = sagemaker_session.boto_session.client(
    'sagemaker-featurestore-runtime',
    region_name=region
)

Retrieve a record from customers feature group

In [229]:
customer_id =  f'C{randint(1, 10000)}'
customer_id = "C19841011"
logger.info(f'customer_id={customer_id}') 

customer_id=C19841011
customer_id=C19841011


### Using get_record(): 단건 조회

In [230]:
feature_record = featurestore_runtime_client.get_record(
    FeatureGroupName=customers_feature_group_name, 
    RecordIdentifierValueAsString=customer_id
)
feature_record

{'ResponseMetadata': {'RequestId': '30461c02-87aa-4e00-8190-55943069b6cd',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '30461c02-87aa-4e00-8190-55943069b6cd',
   'content-type': 'application/json',
   'content-length': '876',
   'date': 'Sun, 14 Jul 2024 06:13:55 GMT'},
  'RetryAttempts': 0},
 'Record': [{'FeatureName': 'customer_id', 'ValueAsString': 'C19841011'},
  {'FeatureName': 'sex', 'ValueAsString': '0'},
  {'FeatureName': 'is_married', 'ValueAsString': '0'},
  {'FeatureName': 'event_time', 'ValueAsString': '2024-07-14T15:13:46.689Z'},
  {'FeatureName': 'age_18-29', 'ValueAsString': '0'},
  {'FeatureName': 'age_30-39', 'ValueAsString': '1'},
  {'FeatureName': 'age_40-49', 'ValueAsString': '0'},
  {'FeatureName': 'age_50-59', 'ValueAsString': '0'},
  {'FeatureName': 'age_60-69', 'ValueAsString': '0'},
  {'FeatureName': 'age_70-plus', 'ValueAsString': '0'},
  {'FeatureName': 'n_days_active', 'ValueAsString': '0.19841013'}]}

### Using batch_get_record(): 다건 조회

In [123]:
customer_ids = ['C19841010', 'C19841011', 'C9']

In [124]:
feature_records = featurestore_runtime_client.batch_get_record(
    Identifiers=[
        {
            'FeatureGroupName':customers_feature_group_name, 
            'RecordIdentifiersValueAsString':customer_ids
        }
    ]
)

feature_records


{'ResponseMetadata': {'RequestId': '4cd28f5e-4dc8-4c1f-a74a-83604332c282',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '4cd28f5e-4dc8-4c1f-a74a-83604332c282',
   'content-type': 'application/json',
   'content-length': '1983',
   'date': 'Sun, 14 Jul 2024 05:06:06 GMT'},
  'RetryAttempts': 0},
 'Records': [{'FeatureGroupName': 'fsw-customers-07-14-13-54',
   'RecordIdentifierValueAsString': 'C19841011',
   'Record': [{'FeatureName': 'customer_id', 'ValueAsString': 'C19841011'},
    {'FeatureName': 'sex', 'ValueAsString': '0'},
    {'FeatureName': 'is_married', 'ValueAsString': '0'},
    {'FeatureName': 'event_time', 'ValueAsString': '2024-07-04T22:32:36.987Z'},
    {'FeatureName': 'age_18-29', 'ValueAsString': '0'},
    {'FeatureName': 'age_30-39', 'ValueAsString': '1'},
    {'FeatureName': 'age_40-49', 'ValueAsString': '0'},
    {'FeatureName': 'age_50-59', 'ValueAsString': '0'},
    {'FeatureName': 'age_60-69', 'ValueAsString': '0'},
    {'FeatureName': 'age_70-plu

# Query to offline store

## Initialize boto3 runtime

In [125]:
import boto3
from utils import Utils

In [126]:
boto_session = boto3.Session(region_name=region)
sagemaker_client = boto_session.client(
    service_name='sagemaker',
    region_name=region
)

featurestore_runtime = boto_session.client(
    service_name='sagemaker-featurestore-runtime',
    region_name=region
)

feature_store_session = sagemaker.Session(
    boto_session=boto_session, 
    sagemaker_client=sagemaker_client, 
    sagemaker_featurestore_runtime_client=featurestore_runtime
)

### feature group setting

In [127]:
customers_fg = FeatureGroup(name=customers_feature_group_name, sagemaker_session=feature_store_session)  
products_fg = FeatureGroup(name=products_feature_group_name, sagemaker_session=feature_store_session)
orders_fg = FeatureGroup(name=orders_feature_group_name, sagemaker_session=feature_store_session)

### Build Athena join query to combine the 3 features groups - `customers`, `products` & `orders`

In [138]:
customers_query = customers_fg.athena_query()
customers_table = customers_query.table_name

products_query = products_fg.athena_query()
products_table = products_query.table_name

orders_query = orders_fg.athena_query()
orders_table = orders_query.table_name

In [139]:
print (f'customers_table: {customers_table}\nproducts_table: {products_table}\norders_table: {orders_table}')

customers_table: fsw_customers_07_14_13_54_1720932989
products_table: fsw_products_07_14_13_54_1720933159
orders_table: fsw_orders_07_14_13_54_1720933180


In [140]:
query_string = f'SELECT * FROM "{customers_table}", "{products_table}", "{orders_table}" ' \
               f'WHERE ("{orders_table}"."customer_id" = "{customers_table}"."customer_id") ' \
               f'AND ("{orders_table}"."product_id" = "{products_table}"."product_id")'
%store query_string
print (query_string)

Stored 'query_string' (str)
SELECT * FROM "fsw_customers_07_14_13_54_1720932989", "fsw_products_07_14_13_54_1720933159", "fsw_orders_07_14_13_54_1720933180" WHERE ("fsw_orders_07_14_13_54_1720933180"."customer_id" = "fsw_customers_07_14_13_54_1720932989"."customer_id") AND ("fsw_orders_07_14_13_54_1720933180"."product_id" = "fsw_products_07_14_13_54_1720933159"."product_id")


In [211]:
query_string = f'SELECT * FROM "{customers_table}"'

In [142]:
query_results= 'sagemaker-featurestore-workshop'
output_location = f's3://{default_bucket}/{query_results}/query_results/'
print(f'Athena query output location: \n{output_location}')

Athena query output location: 
s3://sagemaker-us-east-1-419974056037/sagemaker-featurestore-workshop/query_results/


### Check to see if data is available in offline store.

In [205]:
# Before extracting the data we need to check if the feature store was populated
offline_store_contents = None
while offline_store_contents is None:    
    customers_total_record_count = Utils.get_historical_record_count(customers_feature_group_name)
    products_total_record_count = Utils.get_historical_record_count(products_feature_group_name)
    orders_total_record_count = Utils.get_historical_record_count(orders_feature_group_name)
    if customers_total_record_count >= customers_count and \
        products_total_record_count >= products_count and \
        orders_total_record_count >= orders_count:
        logger.info('[Features are available in Offline Store!]')
        offline_store_contents = orders_total_record_count
    else:
        logger.info('[Waiting for data in Offline Store ...]')
        time.sleep(60)

[Features are available in Offline Store!]
[Features are available in Offline Store!]


### Run Athena query and load the output as a Pandas dataframe.

Amazon SageMaker Feature Store에서 `write_time`, `api_invocation_time`, `is_deleted` 세 컬럼은 시스템에서 자동으로 관리되는 메타데이터 컬럼입니다. 각각의 의미는 다음과 같습니다:

1. `write_time`:
   - 레코드가 Feature Store에 실제로 기록된 시간을 나타냅니다.
   - Unix 타임스탬프 형식으로 저장됩니다.
   - 이 값은 데이터가 Feature Store에 성공적으로 저장된 정확한 시점을 나타냅니다.

2. `api_invocation_time`:
   - Feature Store API가 호출된 시간을 나타냅니다.
   - Unix 타임스탬프 형식으로 저장됩니다.
   - 이 값은 클라이언트가 데이터를 추가하거나 업데이트하기 위해 API를 호출한 시점을 나타냅니다.
   - `write_time`과 약간의 차이가 있을 수 있습니다. (네트워크 지연 등으로 인해)

3. `is_deleted`:
   - 레코드의 삭제 상태를 나타내는 플래그입니다.
   - 불리언 값(true/false)으로 저장됩니다.
   - Feature Store는 실제로 데이터를 삭제하지 않고, 이 플래그를 사용하여 삭제된 레코드를 표시합니다.
   - 이를 통해 데이터의 이력을 유지하면서도 현재 유효한 데이터만 쉽게 조회할 수 있습니다.

이 세 컬럼의 중요성:

1. 데이터 일관성: `write_time`과 `api_invocation_time`을 비교하여 데이터 쓰기 지연을 모니터링할 수 있습니다.

2. 감사 및 추적: 이 메타데이터를 통해 데이터의 변경 이력을 추적할 수 있습니다.

3. 시간 기반 쿼리: 특정 시점의 데이터 상태를 조회하거나 시간에 따른 데이터 변화를 분석할 때 유용합니다.

4. 논리적 삭제: `is_deleted` 플래그를 사용하여 실제로 데이터를 삭제하지 않고도 삭제된 것처럼 처리할 수 있습니다.

이러한 메타데이터 컬럼은 Feature Store에서 자동으로 관리되므로, 사용자가 직접 값을 설정하거나 수정할 필요가 없습니다. 데이터 분석, 모델 학습, 그리고 Feature Store의 관리 및 모니터링에 이 정보를 활용할 수 있습니다.

In [212]:
orders_query.run(query_string=query_string, output_location=output_location)
orders_query.wait()
joined_df = orders_query.as_dataframe()
joined_df.query('customer_id == "C19841011"')

Unnamed: 0,write_time,api_invocation_time,is_deleted,customer_id,sex,is_married,event_time,age_18-29,age_30-39,age_40-49,age_50-59,age_60-69,age_70-plus,n_days_active
51,2024-07-14 05:10:54.535000 UTC,2024-07-14 05:05:55.000000 UTC,False,C19841011,0,0,2024-07-04T22:32:36.987Z,0,1,0,0,0,0,0.19841
3055,2024-07-14 05:59:54.779000 UTC,2024-07-14 05:55:02.000000 UTC,False,C19841011,0,0,2024-07-04T22:32:36.987Z,0,1,0,0,0,0,0.19841
4908,2024-07-14 05:59:54.779000 UTC,2024-07-14 05:55:47.000000 UTC,False,C19841011,0,0,2024-07-05T22:32:36.987Z,0,1,0,0,0,0,0.19841


Amazon SageMaker Feature Store의 오프라인 스토어에서 `api_invocation_time`과 `write_time` 사이에 5분 정도의 차이가 나는 것은 비정상적이지 않습니다. 이는 오프라인 스토어의 특성과 설계 때문입니다. 이에 대해 자세히 설명해 드리겠습니다:

1. 오프라인 스토어의 특성:
   - 오프라인 스토어는 일반적으로 S3에 데이터를 저장합니다.
   - 데이터는 배치 처리 방식으로 S3에 기록됩니다.

2. 데이터 일관성 모델:
   - 오프라인 스토어는 최종 일관성(eventual consistency) 모델을 따릅니다.
   - 이는 데이터가 즉시 반영되지 않고, 일정 시간이 지난 후에 반영됨을 의미합니다.

3. 배치 처리 주기:
   - Amazon은 오프라인 스토어의 데이터를 주기적으로 업데이트합니다.
   - 이 주기는 일반적으로 몇 분에서 15분 정도입니다.

4. 5분의 지연이 정상인 이유:
   - 이는 배치 처리 주기 내에 있는 시간입니다.
   - 오프라인 스토어는 실시간 처리가 목적이 아니라, 대규모 분석이나 모델 학습을 위한 것이므로 이 정도의 지연은 허용됩니다.

5. 온라인 스토어와의 차이:
   - 온라인 스토어는 거의 실시간으로 데이터를 업데이트합니다.
   - 오프라인 스토어는 비용 효율성과 대규모 데이터 처리를 위해 약간의 지연을 허용합니다.

6. 사용 시 고려사항:
   - 실시간성이 중요한 경우 온라인 스토어를 사용해야 합니다.
   - 대규모 배치 처리나 분석에는 오프라인 스토어가 적합합니다.

7. 모니터링 및 최적화:
   - 5분 이상의 지연이 지속적으로 발생한다면 AWS 지원팀에 문의하는 것이 좋습니다.
   - 대부분의 경우, 이 정도의 지연은 정상적인 작동 범위 내에 있습니다.

결론적으로, 오프라인 스토어에서 5분 정도의 시간 차이는 정상적인 범위 내에 있습니다. 이는 오프라인 스토어의 설계 목적과 일관성 모델에 부합하는 동작입니다. 실시간 데이터 접근이 필요한 경우에는 온라인 스토어를 사용하고, 대규모 데이터 분석이나 모델 학습에는 오프라인 스토어를 활용하는 것이 좋습니다.

In [64]:
joined_df.columns

Index(['write_time', 'api_invocation_time', 'is_deleted', 'customer_id', 'sex',
       'is_married', 'event_time', 'age_18-29', 'age_30-39', 'age_40-49',
       'age_50-59', 'age_60-69', 'age_70-plus', 'n_days_active',
       'write_time.1', 'api_invocation_time.1', 'is_deleted.1', 'product_id',
       'event_time.1', 'category_baby_food_formula',
       'category_baking_ingredients', 'category_candy_chocolate',
       'category_chips_pretzels', 'category_cleaning_products',
       'category_coffee', 'category_cookies_cakes', 'category_crackers',
       'category_energy_granola_bars', 'category_frozen_meals',
       'category_hair_care', 'category_ice_cream_ice',
       'category_juice_nectars', 'category_packaged_cheese',
       'category_refrigerated', 'category_soup_broth_bouillon',
       'category_spices_seasonings', 'category_tea',
       'category_vitamins_supplements', 'category_yogurt', 'write_time.2',
       'api_invocation_time.2', 'is_deleted.2', 'order_id', 'customer_id.1'

# Moving fron Offline to Online

In [213]:
query_string = f'SELECT * FROM "{customers_table}"'

가장 최신의 데이터만 남기는 쿼리

In [214]:
query_string = """
                WITH ranked_data AS (
                  SELECT *,
                         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_time DESC) AS row_num
                  FROM fsw_customers_07_14_13_54_1720932989
                )
                SELECT *
                FROM ranked_data
                WHERE row_num = 1
               """

In [215]:
customers_query.run(query_string=query_string, output_location=output_location)
customers_query.wait()

In [216]:
customers_query.as_dataframe().query('customer_id == "C19841011"')

Unnamed: 0,write_time,api_invocation_time,is_deleted,customer_id,sex,is_married,event_time,age_18-29,age_30-39,age_40-49,age_50-59,age_60-69,age_70-plus,n_days_active,row_num
3061,2024-07-14 05:59:54.779000 UTC,2024-07-14 05:55:47.000000 UTC,False,C19841011,0,0,2024-07-05T22:32:36.987Z,0,1,0,0,0,0,0.19841,1


In [220]:
dfs = customers_query.as_dataframe(chunksize=1000)
for df in dfs:
    cols = [col for col in df.columns if col not in ['write_time', 'api_invocation_time', 'is_deleted', 'row_num']]
    df = df[cols]
    print (f'Shape: {df.shape}')
    customers_feature_group.ingest(
        data_frame=df,
        max_processes=16, # the number of processes will be created to work on different partitions of the data_frame in parallel
        wait=False
    )
    logger.info(f'{len(customers_df)} customer records ingested into feature group: {customers_feature_group.name}')

Shape: (1000, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


Shape: (1000, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


Shape: (1000, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


Shape: (1000, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


Shape: (1000, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


Shape: (1000, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


Shape: (1000, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


Shape: (1000, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


Shape: (1000, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


Shape: (1000, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


Shape: (2, 11)


10000 customer records ingested into feature group: fsw-customers-07-14-13-54
10000 customer records ingested into feature group: fsw-customers-07-14-13-54


# Clean up

In [235]:
# Delete Feature Group, Online-Only storage
feature_groups = [customers_feature_group_name, products_feature_group_name, orders_feature_group_name]
for feature_group_name in feature_groups:
    try:
        response = sagemaker_client.delete_feature_group(FeatureGroupName=feature_group_name)
    except:
        pass

# List feature groups 
Since we created all of our feature groups with a common name pattern, we'll just list all the ones that have our same month and day (e.g., 04-13).

In [50]:
import sys
sys.path.append('..')
from utilities.feature_store_helper import FeatureStore
fs = FeatureStore()

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


In [51]:
fs.list_feature_groups(current_timestamp[0:5])

[{'FeatureGroupName': 'fscw-products-07-04-22-39',
  'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:419974056037:feature-group/fscw-products-07-04-22-39',
  'CreationTime': datetime.datetime(2024, 7, 4, 22, 44, 33, 388000, tzinfo=tzlocal()),
  'FeatureGroupStatus': 'Created'},
 {'FeatureGroupName': 'fscw-orders-07-04-22-39',
  'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:419974056037:feature-group/fscw-orders-07-04-22-39',
  'CreationTime': datetime.datetime(2024, 7, 4, 22, 44, 55, 243000, tzinfo=tzlocal()),
  'FeatureGroupStatus': 'Created'},
 {'FeatureGroupName': 'fscw-customers-07-04-22-39',
  'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:419974056037:feature-group/fscw-customers-07-04-22-39',
  'CreationTime': datetime.datetime(2024, 7, 4, 22, 44, 8, 560000, tzinfo=tzlocal()),
  'FeatureGroupStatus': 'Created'}]