Music Recommender

Contents

  1. Background
  2. Prereqs
  3. Data
  4. Approach
  5. Clean Up

Background

use case video

Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy machine learning models quickly by bringing together a broad set of purpose-built capabilities. This demo shows how SageMaker can accelerate machine learning development through an example: building a musical playlist tailored to a user's tastes.

Prereqs

You will need an AWS account to use this solution. Sign up for an account before you proceed.

You will also need to have permission to use Amazon SageMaker Studio. All AWS permissions can be managed through AWS IAM. Admin users will have the required permissions, but please contact your account's AWS administrator if your user account doesn't have the required permissions.

Data

Example track (tracks.csv) and user ratings (ratings.csv) data are provided in a publicly available S3 bucket: s3://sagemaker-example-files-prod-{region}/datasets/tabular/synthetic-music. A notebook in the demo downloads the data, so there is no need to download it manually.
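The `{region}` placeholder in the bucket name has to be filled in with an AWS region before the URI can be used. A minimal sketch of that substitution follows. The helper name `dataset_uris` is hypothetical, and the sketch assumes both CSV files sit directly under the prefix shown above, which the notebooks may or may not do.

```python
# Hypothetical helper, not part of the demo notebooks.
S3_TEMPLATE = (
    "s3://sagemaker-example-files-prod-{region}"
    "/datasets/tabular/synthetic-music"
)


def dataset_uris(region: str) -> dict[str, str]:
    """Return S3 URIs for the two example files in the given region."""
    prefix = S3_TEMPLATE.format(region=region)
    # Assumption: tracks.csv and ratings.csv live directly under the prefix.
    return {name: f"{prefix}/{name}" for name in ("tracks.csv", "ratings.csv")}


if __name__ == "__main__":
    for name, uri in dataset_uris("us-east-1").items():
        print(name, "->", uri)
```

In a SageMaker notebook the region would normally come from the active session rather than a hard-coded string.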

tracks.csv

  • trackId: unique identifier for each song/track
  • length: song length in seconds (numerical)
  • energy: (numerical)
  • acousticness: (numerical)
  • valence: (numerical)
  • speechiness: (numerical)
  • instrumentalness: (numerical)
  • liveness: (numerical)
  • tempo: (numerical)
  • genre: (categorical)

ratings.csv

  • ratingEventId: unique identifier for each rating
  • ts: timestamp of the rating event (Unix time, in seconds since 1970)
  • userId: unique id for each user
  • trackId: unique id for each song/track
  • sessionId: unique id for the user's session
  • Rating: user's rating of song on scale from 1 to 5
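To make the ratings.csv schema concrete, here is a small sketch that parses rows with the Python standard library and converts `ts` into a timezone-aware datetime. The sample rows are made up for illustration and are not taken from the real dataset. The `RatingEvent` class and `parse_ratings` function are hypothetical names, and treating `ts` as UTC and the IDs as strings are assumptions.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RatingEvent:
    rating_event_id: str
    ts: datetime
    user_id: str
    track_id: str
    session_id: str
    rating: int


def parse_ratings(csv_text: str) -> list[RatingEvent]:
    """Parse ratings.csv-style text into typed records."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rating = int(float(row["Rating"]))
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        events.append(
            RatingEvent(
                rating_event_id=row["ratingEventId"],
                # Assumption: ts is seconds since the Unix epoch, read as UTC.
                ts=datetime.fromtimestamp(float(row["ts"]), tz=timezone.utc),
                user_id=row["userId"],
                track_id=row["trackId"],
                session_id=row["sessionId"],
                rating=rating,
            )
        )
    return events


# Made-up rows for illustration only.
SAMPLE = """ratingEventId,ts,userId,trackId,sessionId,Rating
e1,1600000000,u1,t1,s1,5
e2,1600000060,u1,t2,s1,2
"""

if __name__ == "__main__":
    for event in parse_ratings(SAMPLE):
        print(event.user_id, event.track_id, event.rating, event.ts.isoformat())
```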

For this tutorial, we'll be using our own generated track and user ratings data, but publicly available datasets and APIs, such as the Million Song Dataset and open-source song ratings APIs, are available for personal research purposes.

Approach

In the following notebooks we'll take two different approaches with the same modeling solution to create our music recommender.

  1. Run the following notebooks in order to walk through each data prep and modeling step
    • 01_music_dataprep.flow: Flow file defining our data input and transformation steps; this file is created in the SageMaker Data Wrangler GUI
    • 02_export_feature_groups.ipynb: export our tracks data, 5-star rated tracks data, and user ratings data created in Data Wrangler to a feature store
    • 03_train_deploy_debugger_explain_monitor_registry.ipynb: train and deploy the model using XGBoost to predict each song rating for each user. We also go over feature importances using SHAP values and set up SageMaker Model Monitor.
  2. Set up a SageMaker Pipeline to do all the aforementioned steps in a single notebook so that it can be run automatically over time
    • end_to_end_pipeline.ipynb: set up each modeling step using the sagemaker.workflow Pipeline object
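As a rough illustration of the "5-star rated tracks" data mentioned in step 02, the sketch below collects, per user, the tracks that user rated 5. This is only one plausible reading: the actual transformation is defined in the Data Wrangler flow file and may differ. The function name `five_star_tracks` and the input rows are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def five_star_tracks(ratings: Iterable[Mapping[str, str]]) -> dict[str, set[str]]:
    """Group the track IDs each user rated 5, keyed by userId."""
    liked: dict[str, set[str]] = defaultdict(set)
    for row in ratings:
        # Assumption: "5-star" means Rating == 5 on the 1 to 5 scale.
        if int(float(row["Rating"])) == 5:
            liked[row["userId"]].add(row["trackId"])
    return dict(liked)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"userId": "u1", "trackId": "t1", "Rating": "5"},
        {"userId": "u1", "trackId": "t2", "Rating": "2"},
        {"userId": "u2", "trackId": "t2", "Rating": "5"},
    ]
    print(five_star_tracks(rows))
```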

Solution Architecture

architecture diagram

Clean Up

To prevent ongoing charges to your AWS account, clean up the resources created during this tutorial by following the clean-up steps at the end of these notebooks: Train, Deploy, and Monitor the Music Recommender Model using SageMaker SDK, and Train, Deploy, and Monitor the Music Recommender Model using SageMaker Pipelines.