# Music Recommendation Example
## Creating Feature Store for User Ratings

This notebook creates a feature group for the music ratings in Amazon SageMaker Feature Store and ingests the ratings data into it.

Feature groups are resources that contain metadata for all data stored in your Feature Store. A feature group is a logical grouping of features, defined in the feature store to describe records. A feature group’s definition is composed of a list of feature definitions, a record identifier name, and configurations for its online and offline store. 

### Overview
1. Set up
2. Inspect your data
3. Create a feature group
4. Ingest data into a feature group
5. Clean up

### Set up

In [None]:
%pip install -qU 'sagemaker' 's3fs'

In [None]:
# SageMaker Python SDK version 2.x is required
import sagemaker
import pandas as pd



In [None]:
import sys
import pprint
sys.path.insert(1, './code')
from parameter_store import ParameterStore
ps = ParameterStore()

parameter = ps.read('music-rec')
pprint.pprint(parameter)

dw_ecrlist = parameter['dw_ecrlist']
fg_name_tracks = parameter['fg_name_tracks']
flow_export_id = parameter['flow_export_id']
flow_s3_uri = parameter['flow_s3_uri']
model_path = parameter['model_path']
prefix = parameter['prefix']
ratings_data_source = parameter['ratings_data_source']
tracks_data_source = parameter['tracks_data_source']

# Parameters not read in this notebook:
# s3_output_path = parameter['s3_output_path']
# train_data_uri = parameter['train_data_uri']
# training_job_name = parameter['training_job_name']
# val_data_uri = parameter['val_data_uri']
# fg_name_ratings = parameter['fg_name_ratings']
# fg_name_user_preferences = parameter['fg_name_user_preferences']

In [None]:
import boto3
import numpy as np
import io
from sagemaker import get_execution_role
import s3fs

role = get_execution_role()

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
s3_bucket_name = sagemaker_session.default_bucket()

### Inspect your data
This example ingests synthetic ratings data, which we read from S3 into a pandas DataFrame.

In [None]:
# TODO: replace with a public bucket
ratings_s3_uri = f's3://{s3_bucket_name}/{prefix}/ratings.csv'
ratings_data = pd.read_csv(ratings_s3_uri)
print(ratings_data.shape)

In [None]:
ratings_data.head()

### Create a feature group

We first generate a unique name for the ratings feature group, then create the feature group itself.

In [None]:
from time import gmtime, strftime, sleep

ratings_feature_group_name = 'ratings-feature-group-' + strftime('%d-%H-%M-%S', gmtime())
print(ratings_feature_group_name)
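
The `'%d-%H-%M-%S'` suffix above contains the day of the month but not the month or year, so two runs exactly one month apart could in principle produce the same name. A minimal sketch of a fuller timestamp suffix follows. The helper name `make_feature_group_name` is hypothetical and is not part of the SageMaker SDK.

```python
from datetime import datetime, timezone
from typing import Optional


def make_feature_group_name(base: str, now: Optional[datetime] = None) -> str:
    """Append a UTC timestamp (year through second) to a base name.

    `make_feature_group_name` is an illustrative helper, not an SDK function.
    """
    now = now or datetime.now(timezone.utc)
    return f"{base}-{now.strftime('%Y-%m-%d-%H-%M-%S')}"


print(make_feature_group_name("ratings-feature-group"))
```

Passing `now` explicitly makes the result reproducible, which is convenient when the name also has to be written to the parameter store.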

In [None]:
fg_name_ratings = ratings_feature_group_name  

ps.add({'fg_name_ratings': fg_name_ratings}, namespace='music-rec')
ps.store()

In [None]:
import time
current_time_sec = int(round(time.time()))

record_identifier_feature_name = "ratingEventId"

In [None]:
column_schemas = [
    {
        "name": "ratingEventId",
        "type": "string"
    },
    {
        "name": "ts",
        "type": "float"
    },
    {
        "name": "userId",
        "type": "long"
    },
    {
        "name": "trackId",
        "type": "string"
    },
    {
        "name": "sessionId",
        "type": "long"
    },
    {
        "name": "itemInSession",
        "type": "long"
    },
    {
        "name": "Rating",
        "type": "float"
    }
]

In [None]:
from sagemaker.feature_store.feature_definition import FeatureDefinition
from sagemaker.feature_store.feature_definition import FeatureTypeEnum

default_feature_type = FeatureTypeEnum.STRING
column_to_feature_type_mapping = {
    "float": FeatureTypeEnum.FRACTIONAL,
    "long": FeatureTypeEnum.INTEGRAL
}

feature_definitions = [
    FeatureDefinition(
        feature_name=column_schema['name'], 
        feature_type=column_to_feature_type_mapping.get(column_schema['type'], default_feature_type)
    ) for column_schema in column_schemas
]
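
Before creating the feature group, it can help to sanity-check the schema against the names used later for the record identifier and the event time. Feature Store expects the event time feature to be of type String or Fractional, which in this notebook's schema labels means `"string"` or `"float"`. The sketch below is one way to check that; `check_schema` is a hypothetical helper, and the schema is repeated in compact form only so the block runs on its own.

```python
# Same schema as in the notebook, written compactly so this block is self-contained.
column_schemas = [
    {"name": "ratingEventId", "type": "string"},
    {"name": "ts", "type": "float"},
    {"name": "userId", "type": "long"},
    {"name": "trackId", "type": "string"},
    {"name": "sessionId", "type": "long"},
    {"name": "itemInSession", "type": "long"},
    {"name": "Rating", "type": "float"},
]


def check_schema(schemas, record_id_name, event_time_name):
    """Return a list of problems found; an empty list means none were found.

    `check_schema` is an illustrative helper, not part of the SageMaker SDK.
    """
    types = {s["name"]: s["type"] for s in schemas}
    problems = []
    if len(types) != len(schemas):
        problems.append("duplicate feature names in schema")
    if record_id_name not in types:
        problems.append(f"record identifier {record_id_name!r} is not in the schema")
    if event_time_name not in types:
        problems.append(f"event time feature {event_time_name!r} is not in the schema")
    elif types[event_time_name] not in ("string", "float"):
        # "string" maps to STRING and "float" to FRACTIONAL in this notebook
        problems.append(f"event time feature {event_time_name!r} must be string or float")
    return problems


print(check_schema(column_schemas, "ratingEventId", "ts"))  # []
```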

Instantiate a FeatureGroup object for ratings_data.

In [None]:
from sagemaker.feature_store.feature_group import FeatureGroup

ratings_feature_group = FeatureGroup(name=ratings_feature_group_name, 
                                     sagemaker_session=sagemaker_session,
                                     feature_definitions=feature_definitions
                                    )

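The feature definitions were already passed to the `FeatureGroup` constructor above. Alternatively, they can be inferred from the DataFrame with `load_feature_definitions`, which is left commented out here.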

In [None]:
# ratings_feature_group.load_feature_definitions(data_frame=ratings_data)

Next, we call `create` to create `ratings_feature_group` with an offline store in S3 and the online store enabled.

In [None]:
ratings_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="ts",
    role_arn=role,
    enable_online_store=True
)
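
For orientation, the `create` call above roughly corresponds to a `CreateFeatureGroup` request with the shape sketched below. The helper `build_create_request` is hypothetical and only assembles a dictionary; it makes no AWS call, and the bucket and role ARN in the usage lines are placeholders rather than values from this notebook.

```python
def build_create_request(name, feature_types, record_id_name, event_time_name,
                         s3_uri, role_arn, enable_online_store=True):
    """Assemble a CreateFeatureGroup-style request as a plain dictionary.

    Illustrative only: nothing is sent to AWS.
    `feature_types` maps feature name -> "String" | "Integral" | "Fractional".
    """
    return {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": record_id_name,
        "EventTimeFeatureName": event_time_name,
        "FeatureDefinitions": [
            {"FeatureName": feature, "FeatureType": feature_type}
            for feature, feature_type in feature_types.items()
        ],
        "OnlineStoreConfig": {"EnableOnlineStore": enable_online_store},
        "OfflineStoreConfig": {"S3StorageConfig": {"S3Uri": s3_uri}},
        "RoleArn": role_arn,
    }


request = build_create_request(
    name="ratings-feature-group-example",                 # placeholder name
    feature_types={"ratingEventId": "String", "ts": "Fractional", "userId": "Integral"},
    record_id_name="ratingEventId",
    event_time_name="ts",
    s3_uri="s3://example-bucket",                         # placeholder bucket
    role_arn="arn:aws:iam::123456789012:role/Example",    # placeholder role
)
print(sorted(request))
```

This mirrors the three parts of a feature group definition described at the top of the notebook: the feature definitions, the record identifier (plus event time), and the online and offline store configuration.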

To confirm that the feature group has been created, we call the `DescribeFeatureGroup` and `ListFeatureGroups` APIs and display the results.

In [None]:
ratings_feature_group.describe()

In [None]:
# Use the boto3 SageMaker client to list feature groups
sagemaker_session.boto_session.client('sagemaker', region_name=region).list_feature_groups()

### Ingest data into a feature group

After the feature group has been created, we can put data into it by using the `PutRecord` API. Ingesting the ratings data should take less than a minute.

In [None]:
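def check_feature_group_status(feature_group):
    """Block until the feature group leaves the "Creating" state."""
    status = feature_group.describe().get("FeatureGroupStatus")
    while status == "Creating":
        print("Waiting for Feature Group to be Created")
        time.sleep(5)
        status = feature_group.describe().get("FeatureGroupStatus")
    if status != "Created":
        # For example "CreateFailed": do not report success in that case
        raise RuntimeError(f"Failed to create feature group {feature_group.name}: {status}")
    print(f"FeatureGroup {feature_group.name} successfully created.")

check_feature_group_status(ratings_feature_group)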
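
The polling loop above keeps waiting for as long as the status is `Creating`. A sketch of a variant with a timeout is shown below. `wait_until_created` is a hypothetical helper that accepts any zero-argument callable returning a describe-style dictionary, so it is demonstrated here with a stub instead of a real feature group; the default timeout and polling interval are arbitrary assumptions.

```python
import time


def wait_until_created(describe, timeout_sec=300.0, poll_sec=5.0, sleep=time.sleep):
    """Poll `describe()` until the status is "Created".

    Raises RuntimeError if creation ends in any other status and
    TimeoutError if the status is still "Creating" at the deadline.
    Illustrative helper, not part of the SageMaker SDK.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        status = describe().get("FeatureGroupStatus")
        if status == "Created":
            return status
        if status != "Creating":
            raise RuntimeError(f"Feature group creation ended in status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Feature group still 'Creating' after {timeout_sec} seconds")
        sleep(poll_sec)


# Stub standing in for a describe call: two "Creating" responses, then "Created".
responses = iter([
    {"FeatureGroupStatus": "Creating"},
    {"FeatureGroupStatus": "Creating"},
    {"FeatureGroupStatus": "Created"},
])
print(wait_until_created(lambda: next(responses), poll_sec=0.0))
```

In this notebook the call would be `wait_until_created(ratings_feature_group.describe)`.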

In [None]:
ratings_feature_group.ingest(
    data_frame=ratings_data, max_workers=3, wait=True
)
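
Each `PutRecord` request carries one record as a list of `FeatureName`/`ValueAsString` pairs. The sketch below shows that shape for a single row. `row_to_record` is a hypothetical helper, the sample values are made up, and skipping missing values is an assumption of this sketch rather than a statement about how `ingest` behaves.

```python
import math


def row_to_record(row):
    """Convert a {feature name: value} mapping into a PutRecord-style record.

    Values are serialized with str(); None and NaN are skipped
    (an assumption of this sketch).
    """
    record = []
    for name, value in row.items():
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        record.append({"FeatureName": name, "ValueAsString": str(value)})
    return record


# Made-up sample row using the notebook's column names.
sample_row = {"ratingEventId": "e1", "ts": 1600000000.0, "userId": 42, "Rating": float("nan")}
for feature_value in row_to_record(sample_row):
    print(feature_value)
```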

In [None]:
print('Data successfully ingested')

### Clean up
To remove the feature group created above, uncomment and run the cell below.

In [None]:
# ratings_feature_group.delete()