## Introduction to Feature Store 

This notebook demonstrates how to get started quickly with Feature Store. We focus on creating feature groups and ingesting your data into them, so that the data is stored in your Feature Store.

The outline of this notebook is as follows:
* Set up
* Ingest your data
* Creating Feature Groups
* Ingest Data into FeatureGroup
* Clean up

Library Dependencies:
* sagemaker>=2.15.0
* numpy
* pandas

Note: You must attach the following policies to your execution role:

* AmazonSageMakerFullAccess
* AmazonS3FullAccess


![Feature Store Policy](images/feature-store-policy.png)

### Set up

In [None]:
import boto3
import pandas as pd
import numpy as np
import io
import sagemaker
from sagemaker.session import Session
from sagemaker import get_execution_role


prefix = 'sagemaker-featurestore-introduction'
role = get_execution_role()

region = boto3.Session().region_name
boto_session = boto3.Session(region_name=region)

sagemaker_client = boto_session.client(service_name='sagemaker', region_name=region)
featurestore_runtime = boto_session.client(service_name='sagemaker-featurestore-runtime', region_name=region)

feature_store_session = Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_featurestore_runtime_client=featurestore_runtime
)
s3_bucket_name = feature_store_session.default_bucket()

### Ingest your data
This example uses synthetic data, which we read from data/customer.csv and data/orders.csv.

In [None]:
customer_data = pd.read_csv("data/customer.csv")
orders_data = pd.read_csv("data/orders.csv")

In [None]:
customer_data.head()

In [None]:
orders_data.head()

Below is an illustration of the steps the data goes through before it is ingested into a Feature Store.

![Feature Store Data Ingest](images/feature_store_data_ingest.svg)

### Creating Feature Groups

We start by generating names for the two feature groups, one for customer_data and one for orders_data. We then create the two Feature Groups themselves.

In [None]:
from time import gmtime, strftime, sleep

customers_feature_group_name = 'customers-feature-group-' + strftime('%d-%H-%M-%S', gmtime())
orders_feature_group_name = 'orders-feature-group-' + strftime('%d-%H-%M-%S', gmtime())
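
The timestamp suffix keeps names distinct across reruns. A minimal sketch of the same idea as a reusable helper follows. `make_feature_group_name` is a hypothetical name, not part of the SageMaker SDK, and the naming rule it checks (alphanumerics separated by hyphens, at most 64 characters) is our understanding of the service constraint, so verify it against the current documentation. Note that `%d-%H-%M-%S` omits the month and year, so a name can repeat in a later month.

```python
import re
from time import gmtime, strftime

# Assumed naming rule: alphanumeric characters separated by hyphens.
_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")
_MAX_NAME_LENGTH = 64  # assumed service limit


def make_feature_group_name(base, when=None):
    """Return base plus a day-hour-minute-second suffix (hypothetical helper)."""
    when = gmtime() if when is None else when
    name = f"{base}-{strftime('%d-%H-%M-%S', when)}"
    if len(name) > _MAX_NAME_LENGTH or not _NAME_PATTERN.match(name):
        raise ValueError(f"Invalid feature group name: {name!r}")
    return name


print(make_feature_group_name("customers-feature-group"))
```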

Instantiate a FeatureGroup object for each of customer_data and orders_data.

In [None]:
from sagemaker.feature_store.feature_group import FeatureGroup

customers_feature_group = FeatureGroup(name=customers_feature_group_name, sagemaker_session=feature_store_session)
orders_feature_group = FeatureGroup(name=orders_feature_group_name, sagemaker_session=feature_store_session)

In [None]:
import time
current_time_sec = int(round(time.time()))

record_identifier_feature_name = "customer_id"

Append an EventTime feature to each data frame. This feature is required, and it timestamps each record.

In [None]:
customer_data["EventTime"] = pd.Series([current_time_sec]*len(customer_data), dtype="float64")
orders_data["EventTime"] = pd.Series([current_time_sec]*len(orders_data), dtype="float64")
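
To our understanding, Feature Store accepts event times either as Unix epoch seconds stored in a Fractional feature (the form used above) or as ISO-8601 UTC strings stored in a String feature. The sketch below derives both forms from one timestamp. `to_epoch_seconds` and `to_iso8601` are hypothetical helper names used only for illustration.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment):
    """Unix epoch seconds as a float, suitable for a Fractional EventTime."""
    return float(moment.timestamp())


def to_iso8601(moment):
    """ISO-8601 UTC string (yyyy-MM-ddTHH:mm:ssZ), for a String EventTime."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


now = datetime.now(timezone.utc)
print(to_epoch_seconds(now), to_iso8601(now))
```

Whichever form you choose, every record in a feature group should use the same one, because the EventTime feature has a single type.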

Load the feature definitions into each feature group.

In [None]:
customers_feature_group.load_feature_definitions(data_frame=customer_data)
orders_feature_group.load_feature_definitions(data_frame=orders_data)
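
`load_feature_definitions` infers a feature type for each column from its pandas dtype. The sketch below approximates that inference locally so the result can be previewed before a feature group is created. The mapping (integer dtypes to Integral, float dtypes to Fractional, string or object dtypes to String) is an assumption about the SDK's behavior, `preview_feature_types` is a hypothetical helper rather than an SDK function, and the sample columns are made up.

```python
import pandas as pd
from pandas.api import types as ptypes


def preview_feature_types(data_frame):
    """Map each column to an assumed feature type, or None if unrecognized."""
    preview = {}
    for column, dtype in data_frame.dtypes.items():
        if ptypes.is_integer_dtype(dtype):
            preview[column] = "Integral"
        elif ptypes.is_float_dtype(dtype):
            preview[column] = "Fractional"
        elif ptypes.is_object_dtype(dtype) or ptypes.is_string_dtype(dtype):
            preview[column] = "String"
        else:
            # For example booleans or datetimes; cast these before loading.
            preview[column] = None
    return preview


# Made-up sample data; the real CSV columns may differ.
sample = pd.DataFrame(
    {
        "customer_id": [1, 2],
        "city_code": ["a", "b"],
        "EventTime": [1.0, 2.0],
        "is_active": [True, False],
    }
)
print(preview_feature_types(sample))
```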

Below we call create on each FeatureGroup object to create the two feature groups, customers_feature_group and orders_feature_group.

In [None]:
customers_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True
)

orders_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True
)
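
Both `create` calls assume the data already contains the record identifier and event time columns. A small pre-flight check, sketched below in plain Python, can catch a missing column before the API call is made. `check_required_columns` is a hypothetical helper, and the column list in the usage line is illustrative rather than taken from the CSV files.

```python
def check_required_columns(columns, record_identifier_name, event_time_feature_name):
    """Raise ValueError if either required feature is absent from columns."""
    missing = [
        name
        for name in (record_identifier_name, event_time_feature_name)
        if name not in columns
    ]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")


# Illustrative column names only.
check_required_columns(["customer_id", "EventTime"], "customer_id", "EventTime")
```

With the notebook's data, this could be called as `check_required_columns(list(customer_data.columns), record_identifier_feature_name, "EventTime")`.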

To confirm that the FeatureGroups have been created, we use the DescribeFeatureGroup and ListFeatureGroups APIs to display them.

In [None]:
customers_feature_group.describe()

In [None]:
orders_feature_group.describe()

In [None]:
sagemaker_client.list_feature_groups() # We use the boto client to list FeatureGroups
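
`list_feature_groups` returns one page of results per call. The sketch below follows the `NextToken` field to collect every name that starts with a given prefix. It accepts any client object that exposes `list_feature_groups`. `list_feature_group_names` is a hypothetical helper, and the response keys it reads (`FeatureGroupSummaries`, `FeatureGroupName`, `NextToken`) reflect our understanding of the response shape.

```python
def list_feature_group_names(client, prefix=""):
    """Collect feature group names starting with prefix across all pages."""
    names = []
    kwargs = {}
    while True:
        response = client.list_feature_groups(**kwargs)
        for summary in response.get("FeatureGroupSummaries", []):
            name = summary["FeatureGroupName"]
            if name.startswith(prefix):
                names.append(name)
        token = response.get("NextToken")
        if not token:
            return names
        # Request the next page on the following iteration.
        kwargs = {"NextToken": token}
```

In this notebook it could be called as `list_feature_group_names(sagemaker_client, "customers-feature-group-")`.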

### Ingest Data into FeatureGroup

After the FeatureGroups have been created, we can put data into them by using the PutRecord API. Ingesting data into both of these FeatureGroups should take less than a minute.

In [None]:
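def check_feature_group_status(feature_group):
    # Poll until the feature group leaves the "Creating" state.
    status = feature_group.describe().get("FeatureGroupStatus")
    while status == "Creating":
        print("Waiting for Feature Group to be Created")
        time.sleep(5)
        status = feature_group.describe().get("FeatureGroupStatus")
    # Any final status other than "Created" means creation did not succeed.
    if status != "Created":
        raise RuntimeError(f"Failed to create FeatureGroup {feature_group.name}: status {status}")
    print(f"FeatureGroup {feature_group.name} successfully created.")

check_feature_group_status(customers_feature_group)
check_feature_group_status(orders_feature_group)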

In [None]:
customers_feature_group.ingest(
    data_frame=customer_data, max_workers=3, wait=True
)
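
`ingest` sends the rows of the data frame to Feature Store through PutRecord. As a rough sketch of what one such payload looks like, the function below turns a single row (a plain dict here) into a `Record` list of `FeatureName` and `ValueAsString` pairs. `row_to_record` is a hypothetical helper, skipping missing values is an assumption about how the SDK treats them, and the sample values are made up.

```python
import math


def row_to_record(row):
    """Convert a {feature: value} dict into a PutRecord-style Record list."""
    record = []
    for name, value in row.items():
        # Skip missing values (None or NaN) instead of sending them.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        record.append({"FeatureName": name, "ValueAsString": str(value)})
    return record


# Made-up sample row.
sample_row = {"customer_id": 573291, "city_code": None, "EventTime": 1608000000.0}
print(row_to_record(sample_row))
```

A record built this way could be sent individually with `featurestore_runtime.put_record(FeatureGroupName=customers_feature_group_name, Record=record)`.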

In [None]:
orders_feature_group.ingest(
    data_frame=orders_data, max_workers=3, wait=True
)
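
To spot-check ingestion, a record can be read back from the online store with the runtime client's `get_record` call. The helper below flattens such a response into a dict. `record_to_dict` is a hypothetical name, and the sample response is hand-written to mirror the expected shape rather than captured from a real call.

```python
def record_to_dict(response):
    """Flatten a GetRecord-style response into {feature_name: value_as_string}."""
    return {
        feature["FeatureName"]: feature["ValueAsString"]
        for feature in response.get("Record", [])
    }


# Hand-written sample mirroring the assumed response shape.
sample_response = {
    "Record": [
        {"FeatureName": "customer_id", "ValueAsString": "573291"},
        {"FeatureName": "EventTime", "ValueAsString": "1608000000.0"},
    ]
}
print(record_to_dict(sample_response))
```

Against the live store, the response would come from `featurestore_runtime.get_record(FeatureGroupName=customers_feature_group_name, RecordIdentifierValueAsString=...)`, with an identifier value that exists in your data.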

#### Clean up
Here we remove the Feature Groups we created. 


In [None]:
customers_feature_group.delete()
orders_feature_group.delete()