<a id='04-nb'></a>

# Music Recommender
## Part 4: Deploy Model & Inference using Online Feature Store
----
In this notebook, we'll deploy our chosen model as an endpoint so that we can run predictions (inferences) against it. Then we'll make music recommendations for a single user by sending that user's data to the endpoint. We'll query our Feature Store for the data to use for inference, and show how [SageMaker Clarify](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-explainability.html) uses SHAP values to explain which features contributed most to the recommended music predictions.

Amazon SageMaker Clarify provides tools to help explain how machine learning models make predictions. These tools can help ML modelers, developers, and other internal stakeholders understand a model's overall characteristics before deployment, and debug its predictions after it's deployed. Transparency about how ML models arrive at their predictions is also critical to consumers and regulators, who need to trust the model's predictions before they accept decisions based on them.

----
### Contents
- [Overview](00_overview_arch_data.ipynb)
- [Part 1: Data Prep using Data Wrangler](01_music_dataprep.flow)
- [Part 2a: Feature Store Creation - Tracks](02a_export_fg_tracks.ipynb)
- [Part 2b: Feature Store Creation - User Preferences](02b_export_fg_5star_features.ipynb)
- [Part 2c: Feature Store Creation - Ratings](02c_fg_create_ratings.ipynb)
- [Part 3: Train Model with Debugger Hooks. Set Artifacts and Register Model.](03_train_model_lineage_registry_debugger.ipynb)
- [Part 4: Deploy Model & Inference using Online Feature Store](04_deploy_inference_explainability.ipynb)
    - [Deploy model](#04-deploy)
    - [Create predictor](#04-predictor)
    - [Infer new songs](#04-infer)
    - [Explain model predictions](#04-explain)
- [Part 5: Model Monitor](05_model_monitor.ipynb)
- [Part 6: SageMaker Pipelines](06_pipeline.ipynb)

In [6]:
# the !pip shell command never raises ModuleNotFoundError, so no try/except is needed
!pip install -qU awswrangler



In [7]:
# pandas>=1.2 adds DataFrame.merge(how='cross'), used below; pinning 1.2.0 also avoids data type issues in the older 1.0 version
!pip install -qU pandas==1.2.0
import pandas as pd
print(pd.__version__)

1.2.0


In [8]:
import time
import boto3
import argparse
import pathlib

import sagemaker
import sagemaker.clarify  # Clarify processor and configs used in the explainability section
from sagemaker.feature_store.feature_group import FeatureGroup
from sagemaker.estimator import Estimator
import awswrangler as wr

import os
import json
import matplotlib.pyplot as plt
import numpy as np

In [12]:
import sys
import pprint
sys.path.insert(1, './code')
from parameter_store import ParameterStore
ps = ParameterStore()

parameter = ps.read('music-rec')
pprint.pprint(parameter)

dw_ecrlist = parameter['dw_ecrlist']
fg_name_ratings = parameter['fg_name_ratings']
fg_name_tracks = parameter['fg_name_tracks']
fg_name_user_preferences = parameter['fg_name_user_preferences']

flow_export_id = parameter['flow_export_id']
flow_s3_uri = parameter['flow_s3_uri']
model_path = parameter['model_path']
prefix = parameter['prefix']
ratings_data_source = parameter['ratings_data_source']
tracks_data_source = parameter['tracks_data_source']
model_name = parameter['model_name']

# values written to the parameter store by the earlier notebooks and used below;
# endpoint_name is generated in the Deploy Model section, so it is not read here
feature_names = parameter['feature_names']
mpg_name = parameter['mpg_name']
train_data_uri = parameter['train_data_uri']
training_job_name = parameter['training_job_name']


In [10]:
sess = sagemaker.Session()
bucket = sess.default_bucket()
region = boto3.Session().region_name
boto3.setup_default_session(region_name=region)

s3_client = boto3.client('s3')
account_id = boto3.client('sts').get_caller_identity()["Account"]

boto_session = boto3.Session(region_name=region)

sagemaker_client = boto_session.client(service_name='sagemaker', region_name=region)

sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client
)

sagemaker_role = sagemaker.get_execution_role()

<a id='04-deploy'></a>

## Deploy Model
##### [back to top](#04-nb)
----

In [11]:
import datetime
endpoint_name = f'{model_name}-{datetime.datetime.utcnow():%Y-%m-%d-%H%M}'
print(endpoint_name)

ps.add({'endpoint_name':endpoint_name},namespace='music-rec')
ps.store()

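SageMaker endpoint names are limited to 63 characters of letters, digits, and hyphens, so a long `model_name` plus the timestamp suffix can produce an invalid name. A minimal sketch, assuming you want to keep the same `{model_name}-YYYY-MM-DD-HHMM` pattern used above; `timestamped_endpoint_name` is a hypothetical helper, not part of the SageMaker SDK:

```python
import re
from datetime import datetime, timezone
from typing import Optional

# SageMaker endpoint names: at most 63 characters, letters/digits/hyphens,
# starting and ending with a letter or digit.
MAX_ENDPOINT_NAME_LEN = 63


def timestamped_endpoint_name(base: str, now: Optional[datetime] = None) -> str:
    """Hypothetical helper: build '<base>-YYYY-MM-DD-HHMM', trimmed to a valid length."""
    if now is None:
        now = datetime.now(timezone.utc)
    suffix = f"-{now:%Y-%m-%d-%H%M}"
    # replace disallowed characters with '-' and drop leading/trailing hyphens
    cleaned = re.sub(r"[^a-zA-Z0-9-]", "-", base).strip("-")
    if not cleaned:
        raise ValueError("base name has no usable characters")
    # truncate the base (never the timestamp) so the result fits the limit
    head = cleaned[: MAX_ENDPOINT_NAME_LEN - len(suffix)].rstrip("-")
    return head + suffix


print(timestamped_endpoint_name("music-recommendation-model", datetime(2021, 5, 6, 21, 6)))
# music-recommendation-model-2021-05-06-2106
```

In the notebook this would be called as `timestamped_endpoint_name(model_name)`.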

In [8]:
# Set use_pretrained = True to deploy the pretrained model at model_path
# (for example, if you skipped model training in the previous notebook).
# Leave it False to use the model you trained in the previous notebook.
use_pretrained = False

if use_pretrained:
    # or use a pretrained model if you skipped model training in the last notebook
    xgb_estimator = sagemaker.model.Model(
        image_uri=sagemaker.image_uris.retrieve("xgboost", region, "0.90-2"),
        model_data=model_path,
        role=sagemaker_role
    )
else:
    # reinstantiate the estimator we trained in the previous notebook
    xgb_estimator = Estimator.attach(training_job_name)



2021-06-14 21:31:17 Starting - Preparing the instances for training
2021-06-14 21:31:17 Downloading - Downloading input data
2021-06-14 21:31:17 Training - Training image download completed. Training in progress.
2021-06-14 21:31:17 Uploading - Uploading generated training model
2021-06-14 21:31:17 Completed - Training job completed


In [15]:
endpoint_list = sagemaker_client.list_endpoints(
    SortBy='CreationTime',
    SortOrder='Descending',
    NameContains=model_name,
    StatusEquals='InService'
)
endpoint_list['Endpoints']

[{'EndpointName': 'music-recommendation-model-2021-05-06-2106',
  'EndpointArn': 'arn:aws:sagemaker:us-east-2:645431112437:endpoint/music-recommendation-model-2021-05-06-2106',
  'CreationTime': datetime.datetime(2021, 5, 6, 21, 6, 10, 820000, tzinfo=tzlocal()),
  'LastModifiedTime': datetime.datetime(2021, 6, 9, 22, 16, 53, 489000, tzinfo=tzlocal()),
  'EndpointStatus': 'InService'}]

In [18]:
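if len(endpoint_list['Endpoints']) > 0:
    # reuse the most recent in-service endpoint and point the later cells at it
    endpoint_name = endpoint_list['Endpoints'][0]['EndpointName']
    print(f"Using existing endpoint: {endpoint_name}")
    ps.add({'endpoint_name':endpoint_name},namespace='music-rec')
    ps.store()
else:
    xgb_estimator.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge',
                         endpoint_name=endpoint_name,
                         )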

Using existing endpoint: music-recommendation-model-2021-05-06-2106


In [19]:
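# approve the most recently registered model package in the group
model_package = sagemaker_client.list_model_packages(
    ModelPackageGroupName=mpg_name,
    SortBy='CreationTime',
    SortOrder='Descending'
)['ModelPackageSummaryList'][0]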
model_package_update = {
    'ModelPackageArn': model_package['ModelPackageArn'],
    'ModelApprovalStatus': 'Approved'
}

update_response = sagemaker_client.update_model_package(**model_package_update)

### Create endpoint config and endpoint
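Deploying the endpoint may take about 8 minutes. The cells below create the endpoint configuration and the endpoint only if they don't already exist, then poll until the endpoint is no longer in the `Creating` state.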

In [None]:
endpoint_instance_count = 1
endpoint_instance_type = "ml.m4.xlarge"

endpoint_config_name=f'{model_name}-endpoint-config'
existing_configs = sagemaker_client.list_endpoint_configs(NameContains=endpoint_config_name, MaxResults = 30)['EndpointConfigs']

if not existing_configs:
    create_ep_config_response = sagemaker_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[{
            'InstanceType': endpoint_instance_type,
            'InitialVariantWeight': 1,
            'InitialInstanceCount': endpoint_instance_count,
            'ModelName': model_name,
            'VariantName': 'AllTraffic'
        }]
    )
    print('Creating endpoint config')

    ps.add({'endpoint_config_name':endpoint_config_name},namespace='music-rec')
    ps.store()
else:
    print('Using existing endpoint config')

In [None]:
existing_endpoints = sagemaker_client.list_endpoints(NameContains=endpoint_name, MaxResults = 30)['Endpoints']
if not existing_endpoints:
    create_endpoint_response = sagemaker_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name)
    
    ps.add({'endpoint_name':endpoint_name},namespace='music-rec')
    ps.store()

endpoint_info = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
endpoint_status = endpoint_info['EndpointStatus']

while endpoint_status == 'Creating':
    endpoint_info = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    endpoint_status = endpoint_info['EndpointStatus']
    print('Endpoint status:', endpoint_status)
    if endpoint_status == 'Creating':
        time.sleep(60)
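The loop above stops as soon as the status leaves `Creating`, but it exits the same way whether the endpoint reached `InService` or `Failed`, and it has no upper bound on waiting. A minimal sketch of a stricter poller, assuming the status strings returned by `describe_endpoint`; `wait_for_status` is a hypothetical helper, and the `sleep` parameter exists only so the demo can run offline without waiting:

```python
import time
from typing import Callable

# Endpoint states that mean "still working"; anything else is treated as final.
TRANSITIONAL_STATES = {"Creating", "Updating", "SystemUpdating", "RollingBack"}


def wait_for_status(get_status: Callable[[], str],
                    poll_seconds: float = 60.0,
                    timeout_seconds: float = 30 * 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Hypothetical helper: poll get_status() until it leaves a transitional state.

    Returns the final status; raises RuntimeError on 'Failed' and
    TimeoutError if the status is still transitional after timeout_seconds.
    """
    waited = 0.0
    status = get_status()
    while status in TRANSITIONAL_STATES:
        if waited >= timeout_seconds:
            raise TimeoutError(f"endpoint still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    if status == "Failed":
        raise RuntimeError("endpoint entered the 'Failed' state")
    return status


# Offline demo with a fake status sequence and no real sleeping.
fake_statuses = iter(["Creating", "Creating", "InService"])
print(wait_for_status(lambda: next(fake_statuses), sleep=lambda s: None))  # InService
```

Against the real endpoint, `get_status` would be `lambda: sagemaker_client.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']`. boto3 also provides a built-in waiter for this case, `sagemaker_client.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)`, which stops with an error if the endpoint fails.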

In [None]:
featurestore_runtime = boto_session.client(service_name='sagemaker-featurestore-runtime', region_name=region)

feature_store_session = sagemaker.Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_featurestore_runtime_client=featurestore_runtime
)

In [None]:
explainability_output_path = f's3://{bucket}/{prefix}/clarify-output/explainability'

<a id='04-predictor'> </a>

## Create a predictor
##### [back to top](#04-nb)
----

In [None]:
predictor = sagemaker.predictor.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker_session)

### Pull user data from feature group

In [None]:
sample_user_id = 11005

In [None]:
# pull the sample user's 5 star preferences record from the feature store
fg_response = featurestore_runtime.get_record(
    FeatureGroupName=fg_name_user_preferences, 
    RecordIdentifierValueAsString=str(sample_user_id)
)

record = fg_response['Record']
df_user = pd.DataFrame(record).set_index('FeatureName')
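`get_record` returns the record as a list of `{'FeatureName': ..., 'ValueAsString': ...}` entries, so every value arrives as a string. A minimal sketch of flattening that response into a dictionary, assuming you want numeric-looking values converted to `float` (IDs stored as numbers become floats too, so pass `numeric=False` to keep the raw strings); `record_to_dict` is a hypothetical helper and the feature names in the example are made up. It also raises a clear error when the response carries no `Record`, which can happen when no record exists for the identifier:

```python
from typing import Any, Dict


def record_to_dict(response: Dict[str, Any], numeric: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: flatten a Feature Store GetRecord response into a dict."""
    record = response.get("Record")
    if not record:
        raise LookupError("no record returned for this identifier")
    out: Dict[str, Any] = {}
    for feature in record:
        value = feature["ValueAsString"]
        if numeric:
            try:
                value = float(value)
            except ValueError:
                pass  # leave non-numeric values (timestamps, labels) as strings
        out[feature["FeatureName"]] = value
    return out


# Example response shape with made-up feature names and values.
example = {"Record": [
    {"FeatureName": "userId", "ValueAsString": "11005"},
    {"FeatureName": "energy_5star", "ValueAsString": "0.62"},
    {"FeatureName": "EventTime", "ValueAsString": "2021-06-14T21:31:17Z"},
]}
print(record_to_dict(example))
```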

### Pull sample of 1000 tracks from feature group

In [None]:
# pull a sample of the tracks data (multiple records) from the offline feature store using an Athena query
fg_name_tracks_obj = FeatureGroup(name=fg_name_tracks, sagemaker_session=feature_store_session)
tracks_query = fg_name_tracks_obj.athena_query()
tracks_table = tracks_query.table_name

# wrap the table name in double quotes since it contains '-' symbols
query_string = ("SELECT * FROM \"{}\" LIMIT 1000".format(tracks_table))
print("Running " + query_string)
print("Running " + query_string)

# run Athena query. The output is loaded to a Pandas dataframe.
tracks_query.run(query_string=query_string, output_location=f"s3://{bucket}/{prefix}/query_results/")
tracks_query.wait()
df_tracks = tracks_query.as_dataframe()
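The query above wraps the Athena table name in double quotes because the generated name contains hyphens. A small sketch of the same idea as reusable functions, assuming Athena follows the ANSI SQL rule that a double quote inside a quoted identifier is written as two double quotes; `quote_identifier` and `build_sample_query` are hypothetical names, and `my-tracks-table` is a placeholder:

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_sample_query(table: str, limit: int = 1000) -> str:
    """Return a SELECT * ... LIMIT query for a table whose name may contain hyphens."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {quote_identifier(table)} LIMIT {int(limit)}"


print(build_sample_query("my-tracks-table"))
# SELECT * FROM "my-tracks-table" LIMIT 1000
```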

In [None]:
data = df_tracks.merge(pd.DataFrame(df_user['ValueAsString']).T, how='cross')
data.columns = [c.lower() for c in data.columns]
inference_df = data[feature_names].copy()  # copy so later column assignments don't modify a view of `data`
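`merge(how='cross')` (available from pandas 1.2) pairs every track row with the single user-preference row, so 1,000 tracks produce 1,000 candidate rows that each carry the same user features. The same operation in plain Python, with hypothetical column names, looks like this; `cross_join` is an illustrative helper, not a pandas API:

```python
from itertools import product
from typing import Dict, List


def cross_join(left: List[Dict], right: List[Dict]) -> List[Dict]:
    """Pair every left row with every right row (a Cartesian product of rows).

    On a column-name clash the right-hand value wins; pandas would instead keep
    both columns with '_x'/'_y' suffixes.
    """
    return [{**l_row, **r_row} for l_row, r_row in product(left, right)]


# Hypothetical columns: two tracks and one user-preference row.
tracks = [{"trackid": "t1", "energy": 0.8}, {"trackid": "t2", "energy": 0.3}]
user_prefs = [{"energy_5star": 0.62}]
rows = cross_join(tracks, user_prefs)
print(len(rows))  # 2: one candidate row per track
```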

### Format the datapoint
Each datapoint must match the input format the model was trained on, with all features in the same order. In this example, that order is given by the `feature_names` list stored in the parameter store, so each row is serialized as a comma-separated string of its values in `feature_names` order.

In [None]:
data_inputs = [','.join([str(i) for i in row]) for row in inference_df.values]
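The `','.join(...)` above relies on `inference_df` already having its columns in training order. A slightly more defensive sketch, assuming each record is a dict keyed by feature name: it serializes values in the order given by `feature_order` and fails fast if a feature is missing, rather than silently shifting columns. `to_csv_lines` is a hypothetical helper; the standard `csv` module also quotes any value that itself contains a comma:

```python
import csv
import io
from typing import Dict, List, Sequence


def to_csv_lines(records: List[Dict], feature_order: Sequence[str]) -> List[str]:
    """Hypothetical helper: one CSV line per record, columns in training order."""
    lines = []
    for rec in records:
        missing = [name for name in feature_order if name not in rec]
        if missing:
            raise KeyError(f"record is missing features: {missing}")
        buf = io.StringIO()
        csv.writer(buf).writerow([rec[name] for name in feature_order])
        lines.append(buf.getvalue().rstrip("\r\n"))  # drop the writer's line terminator
    return lines


# Dict key order does not matter; feature_order decides the column order.
print(to_csv_lines([{"tempo": 120, "energy": 0.8}], ["energy", "tempo"]))  # ['0.8,120']
```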

<a id='04-infer'> </a>

## Infer (predict) new songs using model
##### [back to top](#04-nb)
----

In [None]:
predictions = []
for data_input in data_inputs:
    results = predictor.predict(data_input, initial_args = {"ContentType": "text/csv"})
    prediction = json.loads(results)
    predictions.append(prediction)
print(f'Predicted ratings for user {sample_user_id} (first 10 of {len(predictions)}):', predictions[:10])
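The loop above sends one `InvokeEndpoint` request per track, which means 1,000 round trips for this sample. The built-in XGBoost container accepts several CSV rows in one `text/csv` request, so batching can cut that down. A sketch, assuming the response lists one score per row separated by commas or newlines (check what your container version returns) and that each payload stays under the 6 MB real-time invocation limit; `make_batches` and `parse_scores` are hypothetical helpers:

```python
import re
from typing import List, Union


def make_batches(lines: List[str], batch_size: int = 100) -> List[str]:
    """Group CSV lines into newline-joined payloads of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return ["\n".join(lines[i:i + batch_size]) for i in range(0, len(lines), batch_size)]


def parse_scores(body: Union[bytes, str]) -> List[float]:
    """Parse one score per row, accepting commas and/or newlines as separators."""
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    return [float(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]


print(make_batches(["1,2", "3,4", "5,6"], batch_size=2))  # ['1,2\n3,4', '5,6']
print(parse_scores(b"3.1,4.2\n"))                          # [3.1, 4.2]
```

Under the same assumptions, the per-row loop could become `for payload in make_batches(data_inputs): predictions.extend(parse_scores(predictor.predict(payload, initial_args={"ContentType": "text/csv"})))`.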

In [None]:
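# Save the predicted ratings and features to a local CSV (no header, no index column), then upload it to S3.
inference_df['rating'] = predictions
inference_df = inference_df[['rating']+feature_names]
os.makedirs('data', exist_ok=True)  # make sure the local output folder exists
inference_df.to_csv('data/prediction_data.csv', header=False, index=False)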

s3_client.upload_file('data/prediction_data.csv', bucket, f'{prefix}/data/pred/prediction_data.csv')

pred_data_uri = f's3://{bucket}/{prefix}/data/pred/prediction_data.csv'

In [None]:
df_train = pd.read_csv(train_data_uri)

label = 'rating'

<a id='04-explain'> </a>

## Explain model predictions
##### [back to top](#04-nb)
----

In [None]:
clarify_processor = sagemaker.clarify.SageMakerClarifyProcessor(
    role=sagemaker_role,
    instance_count=1,
    instance_type='ml.c4.xlarge',
    sagemaker_session=sagemaker_session)

model_config = sagemaker.clarify.ModelConfig(
    model_name=model_name,
    instance_type='ml.m4.xlarge',
    instance_count=1,
    accept_type='text/csv')

shap_config = sagemaker.clarify.SHAPConfig(
    baseline=[df_train.median().values[1:].tolist()],  # skip the first column, which is the target
    num_samples=100,
    agg_method='mean_abs')

explainability_data_config = sagemaker.clarify.DataConfig(
    s3_data_input_path=pred_data_uri,
    s3_output_path=explainability_output_path,
    label=label,
    headers=[label]+feature_names,
    dataset_type='text/csv')
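`SHAPConfig` asks Clarify to estimate SHAP values with Kernel SHAP: features are switched between the record's values and the `baseline` values, and `num_samples` controls how many such combinations are evaluated. For a tiny model the exact Shapley values can be enumerated directly, which makes two properties easy to check: the per-feature values add up to `f(x) - f(baseline)`, and `agg_method='mean_abs'` turns per-record values into a global importance by averaging absolute values across records. A sketch with a made-up three-feature model (this is exact enumeration, not Clarify's sampling-based estimator):

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; 'absent' features take their baseline value.

    Enumerates every feature subset (2**n model calls per feature), so this is only
    practical for a handful of features; Kernel SHAP samples subsets instead.
    """
    n = len(x)

    def value(present: set) -> float:
        return f([x[i] if i in present else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs(per_record: List[List[float]]) -> List[float]:
    """Global importance per feature: mean of |SHAP value| across records."""
    return [sum(abs(row[j]) for row in per_record) / len(per_record)
            for j in range(len(per_record[0]))]


def toy_model(z: List[float]) -> float:
    return 2 * z[0] + z[1] * z[2]  # made-up model: linear term plus an interaction


x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, base)
print([round(v, 6) for v in phi])                          # [2.0, 3.0, 3.0]
print(round(sum(phi), 6), toy_model(x) - toy_model(base))  # 8.0 8.0
print(mean_abs([[1.0, -2.0], [-3.0, 4.0]]))                # [2.0, 3.0]
```

The interaction term `z[1] * z[2]` is split evenly between features 1 and 2, while the linear term is credited entirely to feature 0.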


In [None]:
clarify_processor.run_explainability(
    data_config=explainability_data_config,
    model_config=model_config,
    explainability_config=shap_config)

clarify_expl_job_name = clarify_processor.latest_job.name

In [None]:
inference_df['trackid'] = data['trackid']

In [None]:
playlist_length = 10  # number of songs to recommend in playlist
playlist = inference_df.sort_values(by='rating', ascending=False).head(playlist_length)
print('Curated Playlist:\n', playlist['trackid'])
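Sorting all rows to keep the top 10 is fine for 1,000 candidates. If the candidate set grows, or if the same `trackid` can appear more than once (the offline store keeps a history of record versions, so an unfiltered `SELECT *` may return duplicates), a sketch that keeps each track's best score and then selects the top k with `heapq.nlargest` is shown below; `top_k_tracks` is a hypothetical helper and the scores are made up:

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def top_k_tracks(scored: Iterable[Tuple[str, float]], k: int = 10) -> List[Tuple[str, float]]:
    """Keep each track's best predicted rating, then return the k highest, best first."""
    best: Dict[str, float] = {}
    for track_id, rating in scored:
        if track_id not in best or rating > best[track_id]:
            best[track_id] = rating
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])


# Made-up scores; "t1" appears twice, as a duplicated record might.
scores = [("t1", 3.2), ("t2", 4.5), ("t1", 3.9), ("t3", 4.0)]
print(top_k_tracks(scores, k=2))  # [('t2', 4.5), ('t3', 4.0)]
```

With the notebook's DataFrame, `zip(inference_df['trackid'], inference_df['rating'])` would supply the pairs.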

In [None]:
local_explanations_out = pd.read_csv(explainability_output_path+'/explanations_shap/out.csv')
local_explanations_out.columns = feature_names

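# Clarify's out.csv should hold one row of SHAP values per input row, in input order,
# so look up the explanation for the top-rated track by its row position
top_track_idx = playlist.index[0]
print("Model prediction:", playlist.loc[top_track_idx, 'rating'])
plt.figure(figsize=(12,6))
local_explanations_out.iloc[top_track_idx].sort_values().plot.barh(title='Local explanation for prediction')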
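The bar chart orders features by signed SHAP value: positive values pushed this track's predicted rating above the baseline prediction, and negative values pulled it down. To list the features that mattered most regardless of direction, rank them by absolute value. A sketch with hypothetical feature names and values; `top_contributors` is not a Clarify API:

```python
from typing import List, Sequence, Tuple


def top_contributors(shap_row: Sequence[float], names: Sequence[str],
                     n: int = 5) -> List[Tuple[str, float]]:
    """Return the n (feature, SHAP value) pairs with the largest absolute values."""
    if len(shap_row) != len(names):
        raise ValueError("expected exactly one SHAP value per feature")
    pairs = list(zip(names, shap_row))
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)[:n]


# Hypothetical feature names and SHAP values for one track.
print(top_contributors([0.10, -0.50, 0.30], ["tempo", "energy", "valence"], n=2))
# [('energy', -0.5), ('valence', 0.3)]
```

In the notebook, `top_contributors(local_explanations_out.iloc[playlist.index[0]].tolist(), feature_names)` would list the strongest drivers for the top recommendation.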