# Centralized Feature Repository with Amazon SageMaker Feature Store

#### Amazon SageMaker Feature Store is a managed repository with capabilities to store, update, retrieve, and share features. SageMaker Feature Store provides the ability to reuse the engineered features in two different scenarios. First, the features can be shared between the training and inference phases of a single ML project resulting in consistent model inputs and reduced training-serving skew. Second, features from SageMaker.

## Creating feature groups

#### In Amazon SageMaker Feature Store, features are stored in a collection called a feature group. A feature group, in turn, is composed of records. Each record is a collection of feature values identified by a unique RecordIdentifier value, and every record in a feature group uses the same feature as its RecordIdentifier. For example, the record identifier for the feature group created for the weather data could be parameter_id or location_id. Think of RecordIdentifier as the primary key of the feature group: it lets you look up features quickly. Each record must contain, at a minimum, a RecordIdentifier and an event time feature. The event time feature is named by EventTimeFeatureName when the feature group is set up. When a record is ingested into a feature group, SageMaker adds three features to it: is_deleted, api_invocation_time, and write_time. is_deleted is used to manage the deletion of records, api_invocation_time is the time when the API call is invoked to write a record to a feature store, and write_time is the time when the feature record is persisted to the offline store.
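To make the record layout concrete, here is a minimal sketch that turns a plain dictionary into the list of `FeatureName`/`ValueAsString` pairs that the `PutRecord` API expects, and rejects records that lack the record identifier or the event time. The helper name `to_feature_record` and the sample values are hypothetical and only serve as illustration.

```python
from datetime import datetime, timezone


def to_feature_record(values, record_identifier_name, event_time_feature_name):
    """Convert a dict of feature values into the PutRecord list-of-dicts layout.

    Hypothetical helper: it only checks that the two mandatory features
    (record identifier and event time) are present.
    """
    for required in (record_identifier_name, event_time_feature_name):
        if values.get(required) is None:
            raise ValueError(f"record is missing required feature '{required}'")
    # Features with no value are skipped; all values are sent as strings.
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in values.items()
        if value is not None
    ]


# Hypothetical sample values, not taken from the weather dataset.
sample = {
    "location": 1042,
    "ismobile": 0,
    "EventTime": datetime(2021, 1, 1, tzinfo=timezone.utc).timestamp(),
}
record = to_feature_record(sample, "location", "EventTime")
print(record)
```

With an existing feature group, a list like `record` could be passed as the `Record` argument of `put_record` on a `sagemaker-featurestore-runtime` client, together with `FeatureGroupName`.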

#### While each feature group is managed and scaled independently, you can search and discover features from multiple feature groups as long as the appropriate access is in place.
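One way to discover feature groups is the SageMaker `Search` API with `Resource="FeatureGroup"`. The sketch below only builds the request, so it runs without AWS credentials. The helper name `build_feature_group_search` and the search term are assumptions made for illustration.

```python
def build_feature_group_search(name_fragment, max_results=10):
    """Build keyword arguments for sagemaker_client.search().

    Hypothetical helper: matches feature groups whose name contains
    the given fragment.
    """
    return {
        "Resource": "FeatureGroup",
        "SearchExpression": {
            "Filters": [
                {
                    "Name": "FeatureGroupName",
                    "Operator": "Contains",
                    "Value": name_fragment,
                }
            ]
        },
        "MaxResults": max_results,
    }


request = build_feature_group_search("location")
print(request)

# With credentials and the appropriate permissions in place, the request
# could be sent like this:
# import boto3
# response = boto3.client("sagemaker").search(**request)
# for result in response["Results"]:
#     print(result["FeatureGroup"]["FeatureGroupName"])
```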

#### When you create a feature group with SageMaker, you can choose to enable an offline store, an online store, or both. When both online and offline stores are enabled, the service replicates the online store contents into the offline store maintained in Amazon S3.
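The three storage options map to two optional settings of the low-level `CreateFeatureGroup` API: `OnlineStoreConfig` and `OfflineStoreConfig`. The following sketch shows how they combine. The helper `build_store_configs` and the bucket name are hypothetical; the SageMaker Python SDK sets these fields for you through the `s3_uri` and `enable_online_store` arguments of `FeatureGroup.create`.

```python
def build_store_configs(s3_uri=None, enable_online_store=False):
    """Return the store-related keyword arguments for create_feature_group.

    Hypothetical helper: at least one of the two stores must be enabled.
    """
    if s3_uri is None and not enable_online_store:
        raise ValueError("enable at least one of the online or offline stores")
    configs = {}
    if enable_online_store:
        configs["OnlineStoreConfig"] = {"EnableOnlineStore": True}
    if s3_uri is not None:
        configs["OfflineStoreConfig"] = {"S3StorageConfig": {"S3Uri": s3_uri}}
    return configs


# "example-bucket" is a placeholder, not a real bucket.
uri = "s3://example-bucket/sagemaker-featurestore-weather"
offline_only = build_store_configs(s3_uri=uri)
online_only = build_store_configs(enable_online_store=True)
offline_and_online = build_store_configs(s3_uri=uri, enable_online_store=True)

for label, cfg in [("offline", offline_only),
                   ("online", online_only),
                   ("both", offline_and_online)]:
    print(label, sorted(cfg))
```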

In [None]:
import boto3
import pandas as pd
import numpy as np
import io
import sagemaker
import sys
import json
import time
from time import gmtime, strftime, sleep

from sagemaker.session import Session
from sagemaker import get_execution_role

from sagemaker.feature_store.feature_group import FeatureGroup
from sagemaker.feature_store.feature_group import FeatureDefinition
from sagemaker.feature_store.feature_group import FeatureTypeEnum

prefix = 'sagemaker-featurestore-weather'
role = get_execution_role()

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
s3_bucket_name = sagemaker_session.default_bucket()

#Create the service clients
sagemaker_fs_runtime_client = sagemaker_session.boto_session.client('sagemaker-featurestore-runtime')
sagemaker_runtime = sagemaker_session.boto_session.client('sagemaker-runtime')
sagemaker_client = sagemaker_session.boto_session.client('sagemaker')
s3_client = boto3.client('s3', region_name=region)



In [None]:
#Feature group name
location_feature_group_name_offline = 'location-feature-group-offline-' + strftime('%d-%H-%M-%S', gmtime())
location_feature_group_name_online = 'location-feature-group-online-' + strftime('%d-%H-%M-%S', gmtime())
location_feature_group_name_offline_online = 'location-feature-group-offline-online-' + strftime('%d-%H-%M-%S', gmtime())

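# Create FeatureDefinitions. Each feature_type must match the data you ingest;
# the valid values are 'Integral', 'Fractional', and 'String'.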
fd_location=FeatureDefinition(feature_name='location', feature_type=FeatureTypeEnum('Fractional'))
fd_value=FeatureDefinition(feature_name='city', feature_type=FeatureTypeEnum('Fractional'))
fd_is_mobile=FeatureDefinition(feature_name='ismobile', feature_type=FeatureTypeEnum('Integral'))
fd_source_name=FeatureDefinition(feature_name='sourcename', feature_type=FeatureTypeEnum('Fractional'))
fd_source_type=FeatureDefinition(feature_name='sourcetype', feature_type=FeatureTypeEnum('Fractional'))
fd_event_time=FeatureDefinition(feature_name='EventTime', feature_type=FeatureTypeEnum('Fractional'))

location_feature_definitions = [
    fd_location,
    fd_value,
    fd_is_mobile,
    fd_source_name,
    fd_source_type,
    fd_event_time,
]

weather_feature_definitions = [fd_location, fd_event_time]

##Define unique identifier
record_identifier_feature_name = "location"


#Create offline feature group
location_feature_group_offline = FeatureGroup(name=location_feature_group_name_offline, 
                                     feature_definitions=location_feature_definitions,
                                     sagemaker_session=sagemaker_session)

location_feature_group_offline.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name="location",
    event_time_feature_name="EventTime",
    role_arn=role,
    tags=[{'Key':'project','Value':'weather-prediction'}]
)

#Describe the feature group
location_feature_group_offline.describe()


#Create offline + online feature group
#Note the usage of enable_online_store parameter
location_feature_group_offline_online = FeatureGroup(name=location_feature_group_name_offline_online, 
                                     feature_definitions=location_feature_definitions,
                                     sagemaker_session=sagemaker_session)

location_feature_group_offline_online.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name="location",
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True,
    tags=[{'Key':'project','Value':'weather-prediction'}]
)

#Describe the feature group
location_feature_group_offline_online.describe()

#Create online feature group
#Note: s3_uri=False disables the offline store, so this feature group is online-only
location_feature_group_online = FeatureGroup(name=location_feature_group_name_online, 
                                     feature_definitions=location_feature_definitions,
                                     sagemaker_session=sagemaker_session)

location_feature_group_online.create(
    s3_uri=False,
    record_identifier_name="location", 
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True,
    tags=[{'Key':'project','Value':'weather-prediction'}]
)

#Describe the feature group
location_feature_group_online.describe()

#List all feature groups
sagemaker_client.list_feature_groups()