# Module 5: Load Features into the Online Store (InMemory Option)
---

**Note:** Please set the kernel to `Python 3 (Data Science)` and the instance type to `ml.t3.medium`.

# Content
1. [Background](#Background)
1. [Setup](#Setup)
1. [Create Feature Group](#Create-Feature-Group)
1. [Bulk Ingest Data to the Online Store](#Bulk-Ingest-Data-to-the-Online-Store)
1. [Read and Write Records to Online Store](#Read-and-Write-Records-to-Online-Store)


# Background

In this example, we demonstrate how to use the [Feature Store Spark Connector](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-ingestion-spark-connector-setup.html) to ingest features directly into a SageMaker Feature Store online store that uses the InMemory storage tier. The InMemory tier is an optional online storage choice backed by Amazon ElastiCache for Redis, and it is designed for very low-latency access to feature data.

### Online-Only Feature Group

In this notebook, we create a new Feature Group whose online store `StorageType` is set to `InMemory`. This configures the Feature Group to use Amazon ElastiCache for Redis for online storage. No offline store is created.

Note: At the time of writing (October 2023), the InMemory online storage tier does *not* support an offline store. Records are therefore not replicated to an offline store.


# Setup

In [None]:
%%capture 

!pip install --upgrade sagemaker
!pip install --upgrade boto3

In [None]:
import sagemaker
import boto3

from time import gmtime, strftime, sleep
from random import randint

import pandas as pd
import numpy as np
import subprocess
import importlib
import logging
import time
import sys


In [None]:
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

In [None]:
# Print SDK library versions
print(f'Boto3 version: {boto3.__version__}')
print(f'Sagemaker version: {sagemaker.__version__}')

#### Essentials

In [None]:
sagemaker_session = sagemaker.session.Session()
role = sagemaker.get_execution_role()
region_name = sagemaker_session.boto_region_name
default_bucket = sagemaker_session.default_bucket()

print(region_name)
print(default_bucket)

sm_client = boto3.client('sagemaker', region_name=region_name)
fs_client = boto3.client('sagemaker-featurestore-runtime', region_name=region_name)

In [None]:
# S3 Location of data files
feature_store_prefix = 'sagemaker-feature-store'
workshop_prefix = 'fscw'
s3_feature_store_workshop_prefix = f'{feature_store_prefix}/{workshop_prefix}'

print(s3_feature_store_workshop_prefix)

### Read CSV file from module 05, orders.csv

We will use the `orders.csv` file that was created locally in module 05 and stored in that module's directory (`../05-module-scalable-batch-ingestion`).

In [None]:
orders_data_file = "../05-module-scalable-batch-ingestion/orders.csv"
orders_df = pd.read_csv(orders_data_file)

In [None]:
orders_df.shape
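Before creating the Feature Group, it can help to confirm that the CSV has every column the feature definitions below expect. It is also worth checking for repeated `order_id` values, because the online store keeps only one record per identifier. Here is a minimal sketch. The helper name `check_orders_frame` and the `EXPECTED_COLUMNS` list are our own, copied from the feature names used in this notebook.

```python
import pandas as pd

# Feature names used in this notebook's FeatureDefinitions (hypothetical constant).
EXPECTED_COLUMNS = [
    "order_id", "customer_id", "product_id", "purchase_amount",
    "is_reordered", "event_time", "n_days_since_last_purchase",
]


def check_orders_frame(df: pd.DataFrame, record_id: str = "order_id") -> list:
    """Return a list of problems found; an empty list means the frame looks ingestible."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if record_id in df.columns:
        # Rows sharing an identifier collapse to a single record in the online store.
        dupes = int(df[record_id].duplicated().sum())
        if dupes:
            problems.append(f"{dupes} duplicate {record_id} value(s)")
    return problems
```

In the notebook, this would be called as `check_orders_frame(orders_df)`.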

# Create Feature Group

First, create a new Feature Group with the online store enabled and the offline store disabled. To use the InMemory tier, which is backed by Amazon ElastiCache for Redis, set `StorageType` to `InMemory` in `OnlineStoreConfig`. To disable the offline store, omit `OfflineStoreConfig` from the `create_feature_group` request.

For more information, please refer to the OnlineStoreConfig [documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OnlineStoreConfig.html).
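The online-only InMemory request can be captured in a small builder. This makes it explicit that no `OfflineStoreConfig` key is sent. The sketch below assumes the same request fields used in the `create_feature_group` call that follows. The helper name `online_only_inmemory_request` is hypothetical and not part of boto3.

```python
def online_only_inmemory_request(feature_group_name, role_arn, feature_definitions,
                                 record_id_name="order_id", event_time_name="event_time"):
    """Hypothetical helper: build kwargs for sm_client.create_feature_group(...)."""
    names = {d["FeatureName"] for d in feature_definitions}
    for required in (record_id_name, event_time_name):
        if required not in names:
            raise ValueError(f"{required!r} must appear in feature_definitions")
    return {
        "FeatureGroupName": feature_group_name,
        "RecordIdentifierFeatureName": record_id_name,
        "EventTimeFeatureName": event_time_name,
        # InMemory storage tier for the online store.
        "OnlineStoreConfig": {"EnableOnlineStore": True, "StorageType": "InMemory"},
        # No "OfflineStoreConfig" key: the feature group is online-only.
        "FeatureDefinitions": feature_definitions,
        "RoleArn": role_arn,
    }
```

Usage would be `sm_client.create_feature_group(**online_only_inmemory_request(feature_group_name, role, definitions))`. It fails early if the record identifier or event time feature is missing from the definitions.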

In [None]:
feature_group_name = 'FG-online-only-inmemory'

In [None]:
# Configure OnlineStore 'InMemory' option for Elasticache/REDIS

sm_client.create_feature_group(
    FeatureGroupName=feature_group_name,
    RecordIdentifierFeatureName='order_id',
    EventTimeFeatureName='event_time',
    # StorageType = InMemory
    OnlineStoreConfig={
      'EnableOnlineStore': True,
      'StorageType': 'InMemory',
    },
    # No OfflineStoreConfig
    FeatureDefinitions=[
        {
            'FeatureName': 'order_id',
            'FeatureType': 'String'
        },
        {
            'FeatureName': 'customer_id',
            'FeatureType': 'String'
        },
        {
            'FeatureName': 'product_id',
            'FeatureType': 'String'
        },
        {
            'FeatureName': 'purchase_amount',
            'FeatureType': 'Fractional'
        },
        {
            'FeatureName': 'is_reordered',
            'FeatureType': 'Integral'
        },
        {
            'FeatureName': 'event_time',
            'FeatureType': 'String'
        },
        {
            'FeatureName': 'n_days_since_last_purchase',
            'FeatureType': 'Fractional'
        }        
    ],
    RoleArn=role
)
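The `FeatureDefinitions` above are written by hand. An alternative is to derive them from the DataFrame's dtypes. The SageMaker Python SDK offers `FeatureGroup.load_feature_definitions(data_frame=...)` for this purpose. The standalone sketch below illustrates the same idea with plain pandas, under these assumptions:

- Integer dtypes map to `Integral` and float dtypes map to `Fractional`.
- Everything else maps to `String`.
- `string_columns` forces identifiers that look numeric to stay `String`.

The function name is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def feature_definitions_from_frame(df: pd.DataFrame, string_columns=()):
    """Sketch: map pandas dtypes to Feature Store feature types."""
    defs = []
    for name, dtype in df.dtypes.items():
        if name in string_columns:
            ftype = "String"
        elif is_integer_dtype(dtype):
            ftype = "Integral"
        elif is_float_dtype(dtype):
            ftype = "Fractional"
        else:
            # Object/string/datetime columns (including event_time here) become String.
            ftype = "String"
        defs.append({"FeatureName": name, "FeatureType": ftype})
    return defs
```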

In [None]:
def wait_for_feature_group_creation_complete(feature_group_name):
    status = sm_client.describe_feature_group(FeatureGroupName=feature_group_name)['FeatureGroupStatus']
    print(f'Initial status: {status}')
    while status == 'Creating':
        logger.info(f'Waiting for feature group: {feature_group_name} to be created ...')
        time.sleep(60)
        status = sm_client.describe_feature_group(FeatureGroupName=feature_group_name)['FeatureGroupStatus']
    if status != 'Created':
        raise SystemExit(f'Failed to create feature group {feature_group_name}: {status}')
    logger.info(f'FeatureGroup {feature_group_name} was successfully created.')
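The wait loop above polls indefinitely while the status is `Creating`. Below is a sketch of a variant that adds a timeout. It takes the status lookup and the sleep function as parameters, so it can be exercised without calling AWS. The name `wait_for_status` and its defaults are our own choices.

```python
import time


def wait_for_status(get_status, pending=("Creating",), target="Created",
                    poll_seconds=60, timeout_seconds=1800, sleep=time.sleep):
    """Poll get_status() until it leaves the pending states; raise on failure or timeout."""
    waited = 0
    status = get_status()
    while status in pending:
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    if status != target:
        raise RuntimeError(f"unexpected status: {status}")
    return status
```

In this notebook it could be used as `wait_for_status(lambda: sm_client.describe_feature_group(FeatureGroupName=feature_group_name)['FeatureGroupStatus'])`.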

In [None]:
wait_for_feature_group_creation_complete(feature_group_name)

# Bulk Ingest Data to the Online Store

We will create a [SageMaker Processing Job](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html) that uses the Feature Store Spark Connector to ingest feature data from a Spark DataFrame directly into the online store.

To use the Feature Store Spark Connector in a Processing Job, we recommend extending the prebuilt SageMaker Spark Processing container as shown in the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-ingestion-spark-connector-setup.html#:~:text=Installation%20on%20a%20Amazon%20SageMaker%20Processing%20Job).

For this example, we will install the Spark Connector to a local directory and submit the required modules and Jar file when we run the processing job.

### Prepare the Feature Store PySpark library

The Feature Store PySpark library is published on PyPI as `sagemaker-feature-store-pyspark-<MAJOR.MINOR>` and imported as `feature_store_pyspark`. Its source is on GitHub, and it can be installed in a Studio notebook with `pip`. The library implements a Spark connector for Feature Store. We use its `FeatureStoreManager` class to ingest records into the online-only InMemory store.

In [None]:
spark_version = '3.1' # MAJOR.MINOR

Install the Spark Connector under `./temp`.

In [None]:
%pip install sagemaker-feature-store-pyspark-{spark_version} -t ./temp --no-binary :all:

Zip up the required Python modules.

In [None]:
import zipfile
import os

with zipfile.ZipFile('feature_store_pyspark.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for f in os.listdir('./temp/feature_store_pyspark'):
        if f.endswith('.py'):
            # Store each module under feature_store_pyspark/ so Spark can import it by package name
            zf.write(os.path.join('./temp/feature_store_pyspark', f), os.path.join('feature_store_pyspark', f))
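Spark adds files passed through `submit_py_files` to the Python path, so the package directory must sit at the root of the archive for `import feature_store_pyspark` to work. A quick sanity check on the zip layout could look like the following sketch. The helper name `check_py_files_zip` is hypothetical.

```python
import zipfile


def check_py_files_zip(zip_path, package="feature_store_pyspark"):
    """Return the archive's member names; raise if the package is not at the archive root."""
    with zipfile.ZipFile(zip_path) as zf:
        names = sorted(zf.namelist())
    init_module = f"{package}/__init__.py"
    if init_module not in names:
        raise ValueError(f"{init_module} not found at the archive root; got {names}")
    return names
```

For example, `check_py_files_zip('feature_store_pyspark.zip')` should list the zipped modules under `feature_store_pyspark/`.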

Use `feature_store_pyspark.classpath_jars()` to get the absolute path to the Jar file.

In [None]:
from temp import feature_store_pyspark

jar_path = feature_store_pyspark.classpath_jars()[0]
jar_path

In [None]:
# Upload 'orders.csv' data file to S3

s3_feature_store_data_prefix = f'{s3_feature_store_workshop_prefix}/data'
s3_uri_upload_prefix = f's3://{default_bucket}/{s3_feature_store_data_prefix}'
sagemaker.s3.S3Uploader.upload(orders_data_file, s3_uri_upload_prefix)

s3_uri_full_csv_path = os.path.join(s3_uri_upload_prefix, "orders.csv")
print(f'\nUploaded CSV file to S3 location: {s3_uri_full_csv_path}')
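`os.path.join` uses the local OS separator, so it is only safe for S3 URIs on POSIX systems such as Studio. Note that `S3Uploader.upload` also returns the S3 URI of the uploaded file. Below is a small, OS-independent sketch for building and splitting S3 URIs. The helper names are hypothetical; the SageMaker SDK also provides `sagemaker.s3.parse_s3_url` for the splitting step.

```python
import posixpath
from urllib.parse import urlparse


def s3_join(prefix_uri, *parts):
    """Join S3 URI segments with '/' regardless of the local operating system."""
    return posixpath.join(prefix_uri.rstrip("/"), *parts)


def split_s3_uri(uri):
    """Split 's3://bucket/key' into ('bucket', 'key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")
```

With these helpers, the path above would be `s3_join(s3_uri_upload_prefix, "orders.csv")`.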


Run a processing job using `scripts/ingest_to_online_store.py` and include the zipped Python modules and Jar file.

In [None]:
from sagemaker.spark.processing import PySparkProcessor

spark_processor = PySparkProcessor(
    base_job_name='sm-processing-pyspark-fs-ingestion',
    framework_version=spark_version,
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    max_runtime_in_seconds=1200, 
    env={'AWS_DEFAULT_REGION': boto3.Session().region_name, 'mode': 'python'}
)

spark_processor.run(
    submit_app='./scripts/ingest_to_online_store.py',
    arguments=[
        '--feature_group_name', feature_group_name,
        '--region_name', region_name,
        '--s3_uri_csv_path', s3_uri_full_csv_path
    ],
    logs=False,
    submit_jars=[jar_path],
    submit_py_files=[
        './feature_store_pyspark.zip'
    ]
)
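The contents of `scripts/ingest_to_online_store.py` are not shown in this notebook. The following is a hedged sketch of what such a script might contain. The argument parsing mirrors the `arguments` passed to `spark_processor.run` above. The ingestion step is an assumption based on the connector's documented `FeatureStoreManager.ingest_data(input_data_frame=..., feature_group_arn=..., target_stores=...)` usage. The column casts assume the numeric feature types defined earlier, and reading `s3://` paths assumes the SageMaker Spark container's S3 support.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments passed by spark_processor.run(arguments=[...])."""
    parser = argparse.ArgumentParser(description="Ingest orders CSV into a Feature Store online store")
    parser.add_argument("--feature_group_name", required=True)
    parser.add_argument("--region_name", required=True)
    parser.add_argument("--s3_uri_csv_path", required=True)
    return parser.parse_args(argv)


def ingest(args):
    """Hypothetical ingestion step; runs only inside the Spark processing container."""
    # Imported lazily so parse_args can be used (and tested) without Spark installed.
    import boto3
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col
    from feature_store_pyspark.FeatureStoreManager import FeatureStoreManager

    spark = SparkSession.builder.appName("fs-online-ingest").getOrCreate()
    # Read every column as a string, then cast numeric features to match the
    # FeatureDefinitions (Fractional -> double, Integral -> long).
    df = spark.read.csv(args.s3_uri_csv_path, header=True)
    df = (df.withColumn("purchase_amount", col("purchase_amount").cast("double"))
            .withColumn("n_days_since_last_purchase", col("n_days_since_last_purchase").cast("double"))
            .withColumn("is_reordered", col("is_reordered").cast("long")))

    sm = boto3.client("sagemaker", region_name=args.region_name)
    fg_arn = sm.describe_feature_group(FeatureGroupName=args.feature_group_name)["FeatureGroupArn"]
    # Assumption: target_stores=["OnlineStore"] restricts ingestion to the online store.
    FeatureStoreManager().ingest_data(
        input_data_frame=df, feature_group_arn=fg_arn, target_stores=["OnlineStore"]
    )
```

In an actual script, the entry point would be `ingest(parse_args())`. It is kept out of the block so the sketch can be imported without running Spark.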

# Read and Write Records to Online Store

Next, we verify that the orders data is available in the online store. Then we read a record, modify it, and write it back to the online store.

In [None]:
# Read one record from Online InMemory Store
response = fs_client.get_record(
    FeatureGroupName=feature_group_name,
    RecordIdentifierValueAsString='O1'
)
record = response['Record']
record
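The Background section states that the InMemory tier targets very low-latency reads. One rough way to observe this from the notebook is to time repeated `get_record` calls. The sketch below is a generic timing helper; the name `time_calls` is our own. The measured numbers include client-side and network overhead from the Studio instance, so they are an upper bound on store latency rather than a benchmark.

```python
import statistics
import time


def time_calls(fn, n=100):
    """Call fn() n times and return latency statistics in milliseconds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()

    def percentile(p):
        # Nearest-rank style percentile on the sorted samples.
        idx = min(len(samples) - 1, int(round(p / 100.0 * (len(samples) - 1))))
        return samples[idx]

    return {"p50": percentile(50), "p99": percentile(99), "mean": statistics.mean(samples)}
```

For example: `time_calls(lambda: fs_client.get_record(FeatureGroupName=feature_group_name, RecordIdentifierValueAsString='O1'))`.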

### Modify and write the record back to Online store

In [None]:
# Modify two fields of the retrieved order record
for feature in record:
    if feature['FeatureName'] == 'purchase_amount':
        feature['ValueAsString'] = '99.99'
    if feature['FeatureName'] == 'customer_id':
        feature['ValueAsString'] = 'C9999'
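The loop above edits the `Record` list in place. As an alternative, the sketch below flattens a record into a dict and builds an updated copy with string values, leaving the original list untouched. It also includes an optional helper for a fresh UTC event time in the `yyyy-MM-ddTHH:mm:ssZ` form. That helper may be useful if you want the written record to carry a newer `event_time` than the stored one. All three function names are hypothetical.

```python
from datetime import datetime, timezone


def record_to_dict(record):
    """Flatten a Feature Store Record (FeatureName/ValueAsString pairs) into a dict."""
    return {f["FeatureName"]: f["ValueAsString"] for f in record}


def with_updates(record, updates):
    """Return a new Record with values replaced or added; the input list is not modified."""
    remaining = dict(updates)
    new_record = []
    for f in record:
        name = f["FeatureName"]
        value = remaining.pop(name, f["ValueAsString"])
        new_record.append({"FeatureName": name, "ValueAsString": str(value)})
    # Features absent from the retrieved record are appended at the end.
    for name, value in remaining.items():
        new_record.append({"FeatureName": name, "ValueAsString": str(value)})
    return new_record


def iso8601_utc_now():
    """Current UTC time formatted as yyyy-MM-ddTHH:mm:ssZ."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

The same change as the loop would then be `record = with_updates(record, {'purchase_amount': '99.99', 'customer_id': 'C9999'})`.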


In [None]:
# Write updated record back to Online store
fs_client.put_record(FeatureGroupName=feature_group_name, Record=record)

Verify that the latest feature data is available in the online store.

In [None]:
response = fs_client.get_record(
    FeatureGroupName=feature_group_name,
    RecordIdentifierValueAsString='O1'
)
record = response['Record']
record

### Use BatchGetRecord to retrieve multiple records from the online store

In [None]:
# Call Batch Get Record to read multiple records from Online store
fs_client.batch_get_record(
    Identifiers=[{
        'FeatureGroupName': feature_group_name,
        'RecordIdentifiersValueAsString': ['O1', 'O2', 'O3', 'O4', 'O5']
    }]
)
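The `batch_get_record` response nests each record under `Records[i]['Record']`, and any per-identifier failures appear under `Errors`. Below is a small sketch that turns the response into a pandas DataFrame plus a list of failed identifiers. The helper name is hypothetical. All values come back as strings, so numeric columns may need `pd.to_numeric` afterwards.

```python
import pandas as pd


def batch_response_to_frame(response):
    """Turn a BatchGetRecord response into (DataFrame of found records, failed identifiers)."""
    rows = [
        {f["FeatureName"]: f["ValueAsString"] for f in item["Record"]}
        for item in response.get("Records", [])
    ]
    failed = [err["RecordIdentifierValueAsString"] for err in response.get("Errors", [])]
    return pd.DataFrame(rows), failed
```

In the notebook, the response from the cell above could be passed straight in: `orders_online_df, failed_ids = batch_response_to_frame(response)`.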

# Cleanup (optional)

To delete the Feature Group created in this notebook, uncomment and run the code below.

In [None]:
# Delete Feature Group, Online-Only storage

#response = sm_client.delete_feature_group(FeatureGroupName=feature_group_name)
#response