<a id='02a-nb'></a>

# Music Recommender 
## Part 2a: Feature Store Creation - Tracks
----

This notebook creates a feature group for our tracks data to place in our feature store using the transformation instructions found in our `.flow` file. [Amazon SageMaker Feature Store](https://www.youtube.com/watch?v=pEg5c6d4etI) is a fully managed, purpose-built repository to store, update, retrieve, and share machine learning (ML) features.

Features are the attributes or properties models use during training and inference to make predictions. For example, in an ML application that recommends a music playlist, features could include song ratings, which songs were listened to previously, and how long songs were listened to. The accuracy of an ML model depends on a precise set and composition of features. Often, these features are used repeatedly by multiple teams training multiple models, and whichever feature set was used to train the model needs to be available to make real-time predictions (inference). Keeping a single source of features that is consistent and up-to-date across these different access patterns is a challenge, as most organizations keep two different feature stores, one for training and one for inference.

Amazon SageMaker Feature Store is a purpose-built repository where you can store and access features so it’s much easier to name, organize, and reuse them across teams. SageMaker Feature Store provides a unified store for features during training and real-time inference without the need to write additional code or create manual processes to keep features consistent. SageMaker Feature Store keeps track of the metadata of stored features (e.g. feature name or version number) so that you can query the features for the right attributes in batches or in real time using Amazon Athena, an interactive query service. SageMaker Feature Store also keeps features updated, because as new data is generated during inference, the single repository is updated so new features are always available for models to use during training and inference.


----
### Contents
- [Overview](00_overview_arch_data.ipynb)
- [Part 1: Data Prep using Data Wrangler](01_music_dataprep.flow)
- [Part 2a: Feature Store Creation - Tracks](02a_export_fg_tracks.ipynb)
    - [Define Feature Group](#02a-define-fg)
    - [Configure Feature Group](#02a-config-fg)
    - [Initialize & Create Feature Group](#02a-init-create-fg)
    - [Inputs and Outputs](#02a-input-output)
    - [Upload flow file](#02a-upload-flow)
    - [Run Processing Job](#02a-run-job)
- [Part 2b: Feature Store Creation - User Preferences](02b_export_fg_5star_features.ipynb)
- [Part 2c: Feature Store Creation - Ratings](02c_fg_create_ratings.ipynb)
- [Part 3: Train Model with Debugger Hooks. Set Artifacts and Register Model.](03_train_model_lineage_registry_debugger.ipynb)
- [Part 4: Deploy Model & Inference using Online Feature Store](04_deploy_infer_explain.ipynb)
- [Part 5: Model Monitor](05_model_monitor.ipynb)
- [Part 6: SageMaker Pipelines](06_pipeline.ipynb)

<div class="alert alert-info"> 💡 <strong> Quick Start </strong>
To save your processed data to feature store, <strong><a style="color: #0397a7 " href="#Create-Feature-Group">
    <u>Click here to create a feature group</u></a> and follow the instruction to run a SageMaker processing job.
</strong>
</div>

This notebook uses Amazon SageMaker Feature Store (Feature Store) to create a feature group, 
executes your Data Wrangler flow `01_music_dataprep.flow` on the entire dataset using a SageMaker 
Processing Job, and ingests the processed data into Feature Store. 


In [1]:
%store -r
%store

Stored variables and their in-db values:
prefix             -> 'music-recommendation'


## Create Feature Group

_What is a feature group_

A single feature corresponds to a column in your dataset. A feature group is a predefined schema for a 
collection of features - each feature in the feature group has a specified data type and name. 
A single record in a feature group corresponds to a row in your dataframe. A feature store is a 
collection of feature groups. To learn more about SageMaker Feature Store, see 
[Amazon Feature Store Documentation](http://docs.aws.amazon.com/sagemaker/latest/dg/feature-store.html).

<a id='02a-define-fg'></a>

### Define Feature Group 
##### [back to top](#02a-nb)
----
Select Record identifier and Event time feature name. These are required parameters for feature group
creation.
* **Record identifier name** is the name of the feature defined in the feature group's feature definitions 
whose value uniquely identifies a Record defined in the feature group's feature definitions.
* **Event time feature name** is the name of the EventTime feature of a Record in FeatureGroup. An EventTime 
is a timestamp that represents the point in time when a new event occurs that corresponds to the creation or 
update of a Record in the FeatureGroup. All Records in the FeatureGroup must have a corresponding EventTime.

<div class="alert alert-info"> 💡Record identifier and Event time feature name are required 
for a feature group. After filling in the values, you can choose <b>Run Selected Cell and All Below</b> 
from the Run menu in the menu bar. 
</div>
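
To make the roles of the two names concrete, here is a minimal pure-Python sketch (not the Feature Store implementation) of the idea that the online store serves the latest record for each identifier: for every `trackId`, the record with the greatest `featureTimestamp` wins. The sample rows and the `latest_records` helper are hypothetical and exist only for illustration.

```python
def latest_records(records, id_name="trackId", time_name="featureTimestamp"):
    """Return {identifier: record}, keeping the record with the newest event time."""
    latest = {}
    for record in records:
        key = record[id_name]
        if key not in latest or record[time_name] > latest[key][time_name]:
            latest[key] = record
    return latest


# Hypothetical rows: two versions of track "a1", one of track "b2".
rows = [
    {"trackId": "a1", "featureTimestamp": 1611100000.0, "energy": 0.40},
    {"trackId": "b2", "featureTimestamp": 1611100500.0, "energy": 0.75},
    {"trackId": "a1", "featureTimestamp": 1611109999.0, "energy": 0.55},
]

online_view = latest_records(rows)
print(online_view["a1"]["energy"])
```

This is why every record needs both values: the identifier says which row is being updated, and the event time says which update is the most recent.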

In [2]:
!pip install sagemaker boto3 --upgrade --quiet



In [3]:
import sys
import pprint
sys.path.insert(1, './code')
from parameter_store import ParameterStore
ps = ParameterStore()




Loading : 

{'music-rec': {'bucket': 'sagemaker-us-west-2-738335684114',
               'model_path': 's://sagemaker-us-west-2-738335684114/music-recommendation/model.tar.gz',
               'prefix': 'music-recommendation',
               'ratings_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/ratings.csv',
               'tracks_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/tracks.csv'}}


In [4]:
parameters = ps.read('music-rec')
pprint.pprint(parameters)


Reading : music-rec

{'music-rec': {'bucket': 'sagemaker-us-west-2-738335684114',
               'model_path': 's://sagemaker-us-west-2-738335684114/music-recommendation/model.tar.gz',
               'prefix': 'music-recommendation',
               'ratings_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/ratings.csv',
               'tracks_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/tracks.csv'}}
{'bucket': 'sagemaker-us-west-2-738335684114',
 'model_path': 's://sagemaker-us-west-2-738335684114/music-recommendation/model.tar.gz',
 'prefix': 'music-recommendation',
 'ratings_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/ratings.csv',
 'tracks_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/tracks.csv'}


In [5]:
bucket = parameters['bucket']
prefix = parameters['prefix']
model_path = parameters['model_path']
ratings_data_source = parameters['ratings_data_source']
tracks_data_source = parameters['tracks_data_source']
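
Note that the `model_path` value printed above starts with `s://` rather than `s3://`. Whether that matters depends on how later notebooks consume it, but a quick sanity check on the loaded parameters can catch this kind of typo early. The sketch below is one assumed way to do that; `find_bad_s3_uris` is a hypothetical helper and the bucket name is a placeholder.

```python
from urllib.parse import urlparse


def find_bad_s3_uris(params, uri_keys):
    """Return the keys whose values are not well-formed s3://bucket/key URIs."""
    bad = []
    for key in uri_keys:
        parsed = urlparse(str(params.get(key, "")))
        if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
            bad.append(key)
    return bad


# Placeholder values shaped like the parameters shown above.
example_params = {
    "model_path": "s://my-bucket/music-recommendation/model.tar.gz",
    "tracks_data_source": "s3://my-bucket/music-recommendation/tracks.csv",
}

print(find_bad_s3_uris(example_params, ["model_path", "tracks_data_source"]))
```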

In [6]:
record_identifier_feature_name = 'trackId'
if record_identifier_feature_name is None:
   raise SystemExit("Select a column name as the feature group record identifier.")

event_time_feature_name = 'featureTimestamp'
if event_time_feature_name is None:
   raise SystemExit("Select a column name as the event time feature name.")

### Feature Definitions
The following is a list of the feature names and feature types of the final dataset that will be produced 
when your data flow is used to process your input dataset. These are automatically generated from the 
step `Custom Pyspark` from `Source: Answers.Csv`. To save from a different step, go to Data Wrangler to 
select a new step to export.

<div class="alert alert-info"> 💡 <strong> Configurable Settings </strong>

1. You can select a subset of the features. By default all columns of the result dataframe will be used as 
features.
2. You can change the Data Wrangler data type to one of the Feature Store supported types 
(<b>Integral</b>, <b>Fractional</b>, or <b>String</b>). The default type is set to <b>String</b>. 
This means that, if a column in your dataset is not a <b>float</b> or <b>long</b> type, it will default 
to <b>String</b> in your Feature Store.

For <b>Event Time</b> features, make sure the format follows the feature store
<strong>
    <a style="color: #0397a7 " href="https://docs.aws.amazon.com/sagemaker/latest/dg/feature-store-quotas.html#feature-store-data-types">
    <u>Event Time feature format</u>
    </a>
</strong>
</div>
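
As a rough illustration of the event time formats, the sketch below produces the two shapes that, to my understanding of the linked documentation, Feature Store accepts: Unix epoch seconds for a Fractional feature (which is what `featureTimestamp` uses in this notebook) and a UTC ISO-8601 string for a String feature. The helper names are hypothetical; check the linked page for the authoritative list of formats.

```python
from datetime import datetime, timezone


def event_time_as_fractional(moment):
    """Unix epoch seconds as a float, for a Fractional event time feature."""
    return moment.timestamp()


def event_time_as_iso_string(moment):
    """UTC ISO-8601 string with second precision, e.g. 2021-01-20T21:27:19Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


moment = datetime(2021, 1, 20, 21, 27, 19, tzinfo=timezone.utc)
print(event_time_as_fractional(moment))
print(event_time_as_iso_string(moment))
```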

The following is a list of the feature names and data types of the final dataset that will be produced when your data flow is used to process your input dataset.

In [7]:
column_schemas = [
    {
        "name": "trackId",
        "type": "string"
    },
    {
        "name": "length",
        "type": "float"
    },
    {
        "name": "energy",
        "type": "float"
    },
    {
        "name": "acousticness",
        "type": "float"
    },
    {
        "name": "valence",
        "type": "float"
    },
    {
        "name": "speechiness",
        "type": "float"
    },
    {
        "name": "instrumentalness",
        "type": "float"
    },
    {
        "name": "liveness",
        "type": "float"
    },
    {
        "name": "tempo",
        "type": "float"
    },
    {
        "name": "genre_Folk",
        "type": "float"
    },
    {
        "name": "genre_Country",
        "type": "float"
    },
    {
        "name": "genre_Latin",
        "type": "float"
    },
    {
        "name": "genre_Jazz",
        "type": "float"
    },
    {
        "name": "genre_RnB",
        "type": "float"
    },
    {
        "name": "genre_Reggae",
        "type": "float"
    },
    {
        "name": "genre_Rap",
        "type": "float"
    },
    {
        "name": "genre_Pop_Rock",
        "type": "float"
    },
    {
        "name": "genre_Electronic",
        "type": "float"
    },
    {
        "name": "genre_Blues",
        "type": "float"
    },
    {
        "name": "featureTimestamp",
        "type": "float"
    },
    {
        "name": "danceability",
        "type": "float"
    }
]

Below we create the SDK input for those feature definitions. Some schema types in Data Wrangler are not 
supported by Feature Store. The following code maps those types to `default_feature_type`, which is set to String.

In [8]:
from sagemaker.feature_store.feature_definition import FeatureDefinition
from sagemaker.feature_store.feature_definition import FeatureTypeEnum

default_feature_type = FeatureTypeEnum.STRING
column_to_feature_type_mapping = {
    "float": FeatureTypeEnum.FRACTIONAL,
    "long": FeatureTypeEnum.INTEGRAL
}

feature_definitions = [
    FeatureDefinition(
        feature_name=column_schema['name'], 
        feature_type=column_to_feature_type_mapping.get(column_schema['type'], default_feature_type)
    ) for column_schema in column_schemas
]
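
Before creating the feature group, it can help to confirm that the record identifier and event time names actually appear in the schema. The following is a small sketch under the assumption that the schema is a list of `{"name": ..., "type": ...}` dicts like `column_schemas`; `check_schema` is a hypothetical helper, not part of the SageMaker SDK.

```python
def check_schema(column_schemas, record_identifier_name, event_time_name):
    """Return a list of problems found in the schema (empty when it looks fine)."""
    names = [column["name"] for column in column_schemas]
    problems = []
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        problems.append(f"duplicate feature names: {duplicates}")
    for required in (record_identifier_name, event_time_name):
        if required not in names:
            problems.append(f"missing required feature: {required}")
    return problems


# A shortened, hypothetical schema in the same shape as column_schemas.
sample_schema = [
    {"name": "trackId", "type": "string"},
    {"name": "energy", "type": "float"},
    {"name": "featureTimestamp", "type": "float"},
]
print(check_schema(sample_schema, "trackId", "featureTimestamp"))
```

In this notebook you could call `check_schema(column_schemas, record_identifier_feature_name, event_time_feature_name)` and stop if the returned list is not empty.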

<a id='02a-config-fg'></a>

## Configure Feature Group
##### [back to top](#02a-nb)
----
<div class="alert alert-info"> 💡 <strong> Configurable Settings </strong>

1. <b>feature_group_name</b>: name of the feature group.
1. <b>feature_store_offline_s3_uri</b>: SageMaker FeatureStore writes the data in the OfflineStore of a FeatureGroup to an S3 location owned by you.
1. <b>enable_online_store</b>: controls if online store is enabled. Enabling the online store allows quick access to the latest value for a Record via the GetRecord API.
1. <b>iam_role</b>: IAM role for executing the processing job.
</div>

In [9]:
from time import gmtime, strftime
import uuid
import sagemaker 
# IAM role for executing the processing job.
iam_role = sagemaker.get_execution_role()


In [10]:



# SageMaker session
sess = sagemaker.Session()

# You can configure this with your own bucket name, e.g.
# bucket = <my-own-storage-bucket>
bucket = sess.default_bucket()

# flow name and a unique ID for this export (used later in the processing job name for the export)
flow_name = "01_music_dataprep"
flow_export_id = f"{strftime('%d-%H-%M-%S', gmtime())}-{str(uuid.uuid4())[:8]}"
flow_export_name = f"flow-{flow_export_id}"

# feature group name, built from a fixed prefix and the unique export ID. You can give it a customized name
feature_group_name = f'track-features-{flow_export_id}'
print(f"Feature Group Name: {feature_group_name}")

# SageMaker FeatureStore writes the data in the OfflineStore of a FeatureGroup to an 
# S3 location owned by you.
feature_store_offline_s3_uri = 's3://' + bucket

# controls if online store is enabled. Enabling the online store allows quick access to 
# the latest value for a Record via the GetRecord API.
enable_online_store = True

Feature Group Name: track-features-20-21-27-19-43cfaf71
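
The export ID above combines a UTC `day-hour-minute-second` stamp with the first eight hex characters of a random UUID. The sketch below reproduces that format with injectable inputs so the result is repeatable; `make_export_id` is a hypothetical helper written for illustration.

```python
import uuid
from time import gmtime, strftime


def make_export_id(now=None, suffix=None):
    """Build '<day>-<hour>-<minute>-<second>-<8 hex chars>' like flow_export_id above."""
    now = gmtime() if now is None else now
    suffix = uuid.uuid4().hex[:8] if suffix is None else suffix
    return f"{strftime('%d-%H-%M-%S', now)}-{suffix}"


# Fixed inputs make the result reproducible for illustration.
example_id = make_export_id(now=gmtime(0), suffix="43cfaf71")
print(f"track-features-{example_id}")
```

Because the time portion carries no month or year, uniqueness across runs rests mostly on the random suffix.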


In [11]:
ps.add({'flow_export_id': flow_export_id}, namespace='music-rec')

Updating Params : 

{'flow_export_id': '20-21-27-19-43cfaf71'}


In [13]:
params = ps.read('music-rec')

Reading : music-rec

{'music-rec': {'bucket': 'sagemaker-us-west-2-738335684114',
               'flow_export_id': '20-21-27-19-43cfaf71',
               'model_path': 's://sagemaker-us-west-2-738335684114/music-recommendation/model.tar.gz',
               'prefix': 'music-recommendation',
               'ratings_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/ratings.csv',
               'tracks_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/tracks.csv'}}


In [14]:
dw_ecrlist = {
    'region':{'us-west-2':'174368400705',
              'us-east-2':'415577184552',
              'us-west-1':'926135532090'
             }
}


ps.add({'dw_ecrlist': dw_ecrlist}, namespace='music-rec')


flow_export_id

Updating Params : 

{'dw_ecrlist': {'region': {'us-east-2': '415577184552',
                           'us-west-1': '926135532090',
                           'us-west-2': '174368400705'}}}


'20-21-27-19-43cfaf71'

In [15]:
fg_name_tracks = feature_group_name

ps.add({'fg_name_tracks': fg_name_tracks}, namespace='music-rec')

print(fg_name_tracks)

Updating Params : 

{'fg_name_tracks': 'track-features-20-21-27-19-43cfaf71'}
track-features-20-21-27-19-43cfaf71


<a id='02a-init-create-fg'></a>

### Initialize & Create Feature Group
##### [back to top](#02a-nb)
----

In [16]:
# Initialize Boto3 session that is required to create feature group
import boto3
from sagemaker.session import Session

region = boto3.Session().region_name
boto_session = boto3.Session(region_name=region)

sagemaker_client = boto_session.client(service_name='sagemaker', region_name=region)
featurestore_runtime = boto_session.client(service_name='sagemaker-featurestore-runtime', region_name=region)

feature_store_session = Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_featurestore_runtime_client=featurestore_runtime
)

The feature group is initialized and created below.

In [17]:
from botocore.exceptions import ClientError
from sagemaker.feature_store.feature_group import FeatureGroup

feature_group = FeatureGroup(
    name=feature_group_name, sagemaker_session=feature_store_session, feature_definitions=feature_definitions)

# only create feature group if it doesn't already exist
try:
    sagemaker_client.describe_feature_group(FeatureGroupName=feature_group_name)
    print("Feature Group {0} already exists. Using {0}".format(feature_group_name))
except ClientError as e:
    # ClientError carries the service error code; other exception types have no .response
    error = e.response.get('Error', {}).get('Code')
    if error == "ResourceNotFound":
        print("Creating Feature Group {}".format(feature_group_name))
        feature_group.create(
            s3_uri=feature_store_offline_s3_uri,
            record_identifier_name=record_identifier_feature_name,
            event_time_feature_name=event_time_feature_name,
            role_arn=iam_role,
            enable_online_store=enable_online_store
        )
    elif error == 'ResourceInUse':
        print("Feature Group {0} already exists. Using {0}".format(feature_group_name))
    else:
        # surface unexpected errors instead of silently ignoring them
        raise

Creating Feature Group track-features-20-21-27-19-43cfaf71


Invoke the Feature Store API to create the feature group and wait until it is ready

In [18]:
import time

def wait_for_feature_group_creation_complete(feature_group):
    """Helper function to wait for the completions of creating a feature group"""
    status = feature_group.describe().get("FeatureGroupStatus")
    while status == "Creating":
        print("Waiting for Feature Group Creation")
        time.sleep(5)
        status = feature_group.describe().get("FeatureGroupStatus")
    if status != "Created":
        raise SystemExit(f"Failed to create feature group {feature_group.name}: {status}")
    print(f"FeatureGroup {feature_group.name} successfully created.")

wait_for_feature_group_creation_complete(feature_group=feature_group)

Waiting for Feature Group Creation
Waiting for Feature Group Creation
Waiting for Feature Group Creation
Waiting for Feature Group Creation
FeatureGroup track-features-20-21-27-19-43cfaf71 successfully created.
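
The helper above loops for as long as the status is `Creating`. If you would rather give up after a fixed time, a bounded variant might look like the sketch below. It is written against a generic `get_status` callable so it can be exercised without AWS; the function name, the default timeout, and the stubbed statuses are all assumptions for illustration.

```python
import time


def wait_until_created(get_status, poll_seconds=5, timeout_seconds=600, sleep=time.sleep):
    """Poll get_status() until it leaves 'Creating'; raise on failure or timeout."""
    waited = 0
    status = get_status()
    while status == "Creating":
        if waited >= timeout_seconds:
            raise TimeoutError(f"still Creating after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    if status != "Created":
        raise RuntimeError(f"feature group creation ended with status {status}")
    return status


# Stub that reports 'Creating' twice and then 'Created'; no AWS calls are made.
fake_statuses = iter(["Creating", "Creating", "Created"])
print(wait_until_created(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

With the real feature group, the callable could be `lambda: feature_group.describe().get("FeatureGroupStatus")`.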


Now that the feature group is created, you will use a processing job to process your 
data at scale and ingest the transformed data into this feature group.

<a id='02a-input-output'></a>

## Inputs and Outputs
##### [back to top](#02a-nb)
----

The settings below configure the inputs and outputs for the flow export.

<div class="alert alert-info"> 💡 <strong> Configurable Settings </strong>

In <b>Input - Source</b> you can configure the data sources that will be used as input by Data Wrangler

1. For S3 sources, configure the source attribute that points to the input S3 prefixes
2. For all other sources, configure attributes like query_string, database in the source's 
<b>DatasetDefinition</b> object.

If you modify the inputs, the provided data must have the same schema and format as the data used in the Flow. 
You should also re-execute the cells in this section if you have modified the settings in any data sources.
</div>

In [19]:
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.dataset_definition.inputs import AthenaDatasetDefinition, DatasetDefinition, RedshiftDatasetDefinition

data_sources = []

## Input - S3 Source: tracks.csv

In [20]:
data_sources.append(ProcessingInput(
    source=f"{tracks_data_source}", # You could override this to point to another dataset on S3
    destination="/opt/ml/processing/tracks.csv",
    input_name="tracks.csv",
    s3_data_type="S3Prefix",
    s3_input_mode="File",
    s3_data_distribution_type="FullyReplicated"
))

## Input - S3 Source: ratings.csv

In [21]:
data_sources.append(ProcessingInput(
    source=f"{ratings_data_source}", # You could override this to point to another dataset on S3
    destination="/opt/ml/processing/ratings.csv",
    input_name="ratings.csv",
    s3_data_type="S3Prefix",
    s3_input_mode="File",
    s3_data_distribution_type="FullyReplicated"
))

### Output: Feature Store 

Below are the inputs required by the SageMaker Python SDK to launch a processing job with feature store as an output. Notice the `output_name` variable below; this ID is found within the `.flow` file at the node point you want to capture transformations up to. The `.flow` file contains instructions for SageMaker Data Wrangler to know where to look for data and how to transform it. Each data transformation action is associated with a node and therefore a node ID. Using the associated node ID + output name tells SageMaker up to what point in the transformation process you want to export to a feature store.

In [22]:
from sagemaker.processing import FeatureStoreOutput

# Output name is auto-generated from the selected node's ID + output name from the .flow file
output_name = "d0d4f05a-3031-4438-867b-c5fd033d6c15.default"

processing_job_output = ProcessingOutput(
    output_name=output_name,
    app_managed=True,
    feature_store_output=FeatureStoreOutput(feature_group_name=feature_group_name),
)
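
If you need to find a different node to export from, one option is to list the candidate output names directly from the flow JSON. The sketch below assumes the flow has a top-level `nodes` list whose entries carry a `node_id` and an `outputs` list of `{"name": ...}` dicts; that layout is an assumption about the `.flow` format, and the sample flow and `list_output_names` helper are hypothetical.

```python
def list_output_names(flow):
    """Return '<node_id>.<output name>' for every output of every node in a flow dict."""
    names = []
    for node in flow.get("nodes", []):
        for output in node.get("outputs", []):
            names.append(f"{node['node_id']}.{output['name']}")
    return names


# Hypothetical, heavily trimmed flow; a real .flow file has many more fields.
sample_flow = {
    "nodes": [
        {"node_id": "11111111-aaaa", "outputs": [{"name": "default"}]},
        {"node_id": "22222222-bbbb", "outputs": [{"name": "default"}]},
    ]
}
print(list_output_names(sample_flow))
```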

<a id='02a-upload-flow'></a>

## Upload Flow to S3
##### [back to top](#02a-nb)
----
To use the Data Wrangler flow as an input to the processing job, first upload your flow file to Amazon S3.

In [23]:
import os
import json
import boto3

# name of the flow file which should exist in the current notebook working directory
flow_file_name = "01_music_dataprep.flow"

# Load .flow file from current notebook working directory 
!echo "Loading flow file from current notebook working directory: $PWD"

with open(flow_file_name) as f:
    flow = json.load(f)

# Upload flow to S3
s3_client = boto3.client("s3")
s3_client.upload_file(flow_file_name, bucket, f"{prefix}/data_wrangler_flows/{flow_export_name}.flow")

flow_s3_uri = f"s3://{bucket}/{prefix}/data_wrangler_flows/{flow_export_name}.flow"


ps.add({'flow_s3_uri': flow_s3_uri}, namespace='music-rec')


print(f"Data Wrangler flow {flow_file_name} uploaded to {flow_s3_uri}")

Loading flow file from current notebook working directory: /root/blogs/music-rec
Updating Params : 

{'flow_s3_uri': 's3://sagemaker-us-west-2-738335684114/music-recommendation/data_wrangler_flows/flow-20-21-27-19-43cfaf71.flow'}
Data Wrangler flow 01_music_dataprep.flow uploaded to s3://sagemaker-us-west-2-738335684114/music-recommendation/data_wrangler_flows/flow-20-21-27-19-43cfaf71.flow


The Data Wrangler Flow is also provided to the Processing Job as an input source which we configure below.

In [24]:
## Input - Flow: 01_music_dataprep.flow
flow_input = ProcessingInput(
    source=flow_s3_uri,
    destination="/opt/ml/processing/flow",
    input_name="flow",
    s3_data_type="S3Prefix",
    s3_input_mode="File",
    s3_data_distribution_type="FullyReplicated"
)

<a id='02a-run-job'></a>

# Run Processing Job 
##### [back to top](#02a-nb)
----
## Job Configurations

<div class="alert alert-info"> 💡 <strong> Configurable Settings </strong>

You can configure the following settings for Processing Jobs. If you change any configurations you will 
need to re-execute this and all cells below it by selecting the Run menu above and clicking 
<b>Run Selected Cell and All Below</b>

1. IAM role for executing the processing job. 
2. A unique name of the processing job. Give a unique name every time you re-execute processing jobs
3. Data Wrangler Container URL.
4. Instance count, instance type and storage volume size in GB.
5. Content type for each output. Data Wrangler supports CSV as default and Parquet.
6. Network Isolation settings
</div>

In [25]:
# Unique processing job name. Give a unique name every time you re-execute processing jobs
#processing_job_name = "dw-flow-proc-music-rec-{}-{}".format(flow_export_id, str(uuid.uuid4())[:8])

processing_job_name = "dw-flow-proc-music-rec-{}-{}".format('tracks',flow_export_id)
print(processing_job_name)

# Data Wrangler Container URL.
container_uri = f"{dw_ecrlist['region'][region]}.dkr.ecr.{region}.amazonaws.com/sagemaker-data-wrangler-container:1.x"

# Processing Job Instance count and instance type.
instance_count = 2
instance_type = "ml.m5.4xlarge"

# Size in GB of the EBS volume to use for storing data during processing
volume_size_in_gb = 30

# Content type for each output. Data Wrangler supports CSV as default and Parquet.
output_content_type = "CSV"

# Network Isolation mode; default is off
enable_network_isolation = False

# Output configuration used as processing job container arguments 
output_config = {
    output_name: {
        "content_type": output_content_type
    }
}

dw-flow-proc-music-rec-tracks-20-21-27-19-43cfaf71
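
The container URI lookup above raises a bare `KeyError` when the notebook runs in a region that is not in `dw_ecrlist`. A slightly more defensive version, sketched here with a hypothetical helper name, builds the same URI but reports which regions are mapped.

```python
def data_wrangler_image_uri(region, account_by_region, tag="1.x"):
    """Build the Data Wrangler container URI, failing clearly for unmapped regions."""
    if region not in account_by_region:
        known = ", ".join(sorted(account_by_region))
        raise ValueError(f"No Data Wrangler account mapped for {region}; known regions: {known}")
    account = account_by_region[region]
    return f"{account}.dkr.ecr.{region}.amazonaws.com/sagemaker-data-wrangler-container:{tag}"


# Same account IDs as dw_ecrlist['region'] earlier in this notebook.
accounts = {
    "us-west-2": "174368400705",
    "us-east-2": "415577184552",
    "us-west-1": "926135532090",
}
print(data_wrangler_image_uri("us-west-2", accounts))
```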


## Create Processing Job

To launch a Processing Job, you will use the SageMaker Python SDK to create a `Processor` object.

In [26]:
#processing_job_name = "data-wrangler-flow-processing-14-04-21-07-ad03dbf6-e0cc740a"
processing_job_name

'dw-flow-proc-music-rec-tracks-20-21-27-19-43cfaf71'

In [27]:
from botocore.exceptions import ClientError
from sagemaker.processing import Processor
from sagemaker.network import NetworkConfig

processor = Processor(
    role=iam_role,
    image_uri=container_uri,
    instance_count=instance_count,
    instance_type=instance_type,
    volume_size_in_gb=volume_size_in_gb,
    network_config=NetworkConfig(enable_network_isolation=enable_network_isolation),
    sagemaker_session=sess
)

# Run the Processing Job only if a job with this name does not already exist
try:
    sagemaker_client.describe_processing_job(ProcessingJobName=processing_job_name)
    print("Processing Job {0} already exists. Using {0}".format(processing_job_name))
except ClientError as e:
    error = e.response.get('Error', {}).get('Code')
    # an unknown job name is reported as a ValidationException
    if error == "ValidationException":
        print("Creating Processing Job: {}".format(processing_job_name))
        processor.run(
            inputs=[flow_input] + data_sources, 
            outputs=[processing_job_output],
            arguments=[f"--output-config '{json.dumps(output_config)}'"],
            wait=False,
            logs=False,
            job_name=processing_job_name
        )
    else:
        # re-raise anything unexpected with its original traceback
        raise

Creating Processing Job: dw-flow-proc-music-rec-tracks-20-21-27-19-43cfaf71

Job Name:  dw-flow-proc-music-rec-tracks-20-21-27-19-43cfaf71
Inputs:  [{'InputName': 'flow', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-738335684114/music-recommendation/data_wrangler_flows/flow-20-21-27-19-43cfaf71.flow', 'LocalPath': '/opt/ml/processing/flow', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'tracks.csv', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-738335684114/music-recommendation/tracks.csv', 'LocalPath': '/opt/ml/processing/tracks.csv', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'ratings.csv', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-738335684114/music-recommendation/ratings.csv', 'LocalPath': '/opt/ml/processing/ratings.csv', 'S3

## Job Status & S3 Output Location

Below, you wait for the processing job to finish. If it finishes successfully, your feature group should be populated 
with transformed feature values. In addition, the raw parameters used by the Processing Job will be printed.

In [28]:
%%time
job_result = sess.wait_for_processing_job(processing_job_name)
job_result

..................................................................................!CPU times: user 337 ms, sys: 32.6 ms, total: 370 ms
Wall time: 6min 51s


{'ProcessingInputs': [{'InputName': 'flow',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-738335684114/music-recommendation/data_wrangler_flows/flow-20-21-27-19-43cfaf71.flow',
    'LocalPath': '/opt/ml/processing/flow',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}},
  {'InputName': 'tracks.csv',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-738335684114/music-recommendation/tracks.csv',
    'LocalPath': '/opt/ml/processing/tracks.csv',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}},
  {'InputName': 'ratings.csv',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-738335684114/music-recommendation/ratings.csv',
    'LocalPath': '/opt/ml/processing/ratings.csv',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3
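
If you describe the job yourself instead of waiting on it, the returned dictionary can be condensed into a short pass/fail summary. The sketch below reads the `ProcessingJobStatus` and `FailureReason` keys of a DescribeProcessingJob-style response; `summarize_job` and the sample dictionaries are hypothetical.

```python
def summarize_job(description):
    """Return (succeeded, message) from a DescribeProcessingJob-style dict."""
    status = description.get("ProcessingJobStatus", "Unknown")
    if status == "Completed":
        return True, "Processing job completed"
    reason = description.get("FailureReason", "no failure reason reported")
    return False, f"Processing job ended with status {status}: {reason}"


# Hypothetical responses; real ones contain many more keys.
print(summarize_job({"ProcessingJobStatus": "Completed"}))
print(summarize_job({"ProcessingJobStatus": "Failed", "FailureReason": "example reason"}))
```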

In [29]:
ps.store()

Storing : 

{'music-rec': {'bucket': 'sagemaker-us-west-2-738335684114',
               'dw_ecrlist': {'region': {'us-east-2': '415577184552',
                                         'us-west-1': '926135532090',
                                         'us-west-2': '174368400705'}},
               'fg_name_tracks': 'track-features-20-21-27-19-43cfaf71',
               'flow_export_id': '20-21-27-19-43cfaf71',
               'flow_s3_uri': 's3://sagemaker-us-west-2-738335684114/music-recommendation/data_wrangler_flows/flow-20-21-27-19-43cfaf71.flow',
               'model_path': 's://sagemaker-us-west-2-738335684114/music-recommendation/model.tar.gz',
               'prefix': 'music-recommendation',
               'ratings_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/ratings.csv',
               'tracks_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/tracks.csv'}}


In [30]:
parameters

{'bucket': 'sagemaker-us-west-2-738335684114',
 'prefix': 'music-recommendation',
 'tracks_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/tracks.csv',
 'ratings_data_source': 's3://sagemaker-us-west-2-738335684114/music-recommendation/ratings.csv',
 'model_path': 's://sagemaker-us-west-2-738335684114/music-recommendation/model.tar.gz',
 'flow_export_id': '20-21-27-19-43cfaf71',
 'dw_ecrlist': {'region': {'us-west-2': '174368400705',
   'us-east-2': '415577184552',
   'us-west-1': '926135532090'}},
 'fg_name_tracks': 'track-features-20-21-27-19-43cfaf71',
 'flow_s3_uri': 's3://sagemaker-us-west-2-738335684114/music-recommendation/data_wrangler_flows/flow-20-21-27-19-43cfaf71.flow'}

In [31]:
for item in parameters:
    print (item)

bucket
prefix
tracks_data_source
ratings_data_source
model_path
flow_export_id
dw_ecrlist
fg_name_tracks
flow_s3_uri


In [49]:
with open('code.py', 'w') as filehandle:
    for var in parameters:
        filehandle.write(f"{var} = parameters['{var}']\n")

In [43]:
!pip install ipynbname IPython --quiet



In [44]:
def get_notebook_name():
    """Execute JS code to save Jupyter notebook name to variable `notebook_name`"""
    from IPython.core.display import Javascript, display_javascript
    js = Javascript("""IPython.notebook.kernel.execute('notebook_name = "' + IPython.notebook.notebook_name + '"');""")
    return display_javascript(js)

In [48]:
import IPython
import json

import ipynbname
%matplotlib inline

nb_fname = get_notebook_name()
nb_fname


You can view the newly created feature group in Studio; refer to [Use Amazon SageMaker Feature Store with Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/feature-store-use-with-studio.html)
for a detailed guide. [Learn more about SageMaker Feature Store](https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-featurestore)
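
Once ingestion has finished and the online store is enabled, a single track can be looked up through the GetRecord API mentioned in the configuration section. The sketch below only shows how such a response might be flattened into a plain dictionary; `record_to_dict` and the sample response are hypothetical, and the commented call at the end is not executed here.

```python
def record_to_dict(response):
    """Flatten a GetRecord-style response into {feature name: value as string}."""
    return {
        feature["FeatureName"]: feature["ValueAsString"]
        for feature in response.get("Record", [])
    }


# Hypothetical response in the shape returned by the GetRecord API.
sample_response = {
    "Record": [
        {"FeatureName": "trackId", "ValueAsString": "example-track"},
        {"FeatureName": "energy", "ValueAsString": "0.55"},
    ]
}
print(record_to_dict(sample_response))

# With real data you might call (not executed here):
# response = featurestore_runtime.get_record(
#     FeatureGroupName=feature_group_name,
#     RecordIdentifierValueAsString="<a trackId from your data>",
# )
```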