# Recommender Playlists

An exploration of recommender system techniques using Spotify playlist data:

- Popularity Filter: Recommend popular songs regardless of the user's preferences
- Collaborative Filter: Predict which songs a user might be interested in from preference information collected across many users
- Content-Based Filter: Use song attributes (e.g. genre) to recommend similar songs
- Hybrid Filter: Combine approaches to overcome common problems in recommender systems, such as the cold start problem and the sparsity problem
- Popularity + Hybrid Filter: An extension of the hybrid approach that weights or mixes in songs based on popularity

For this exploration we focus on the playlist_tracks_df dataset and treat each playlist as a separate user.
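
Treating playlists as users means each playlist becomes a row of a playlist-by-track interaction matrix. A minimal sketch of that idea follows, using a toy frame in place of playlist_tracks_df (the names `toy_tracks` and `interactions` are hypothetical; only the `playlist_id` and `id` columns are taken from the real data):

```python
import pandas as pd

# Toy stand-in for playlist_tracks_df; only the two columns used below.
toy_tracks = pd.DataFrame({
    "playlist_id": ["p1", "p1", "p2", "p2", "p3"],
    "id": ["t1", "t2", "t2", "t3", "t1"],
})

# Rows are playlists ("users"), columns are tracks.
# crosstab counts occurrences; clip turns counts into 0/1 membership flags.
interactions = pd.crosstab(toy_tracks["playlist_id"], toy_tracks["id"]).clip(upper=1)
print(interactions)
```

A matrix like this is typically sparse, since most playlists contain only a small fraction of all tracks.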

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import yaml

In [3]:
from sklearn.model_selection import train_test_split

In [4]:
# To create a playlist and add tracks
import spotipy 
from spotipy.oauth2 import SpotifyOAuth

## Import Data

- Artist and track data were pulled from the Spotify API via the spotipy package
- The data were saved in pickle format using the music_data.py and data_functions.py modules
- The data can now be read quickly by multiple workflows
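
As a rough illustration of the pickle round trip (not the actual code in music_data.py or data_functions.py, which is not shown here), a DataFrame can be written once and read back unchanged. The frame and file name below are hypothetical:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical toy data standing in for a frame built from the Spotify API.
toy_tracks = pd.DataFrame({"id": ["t1", "t2"], "popularity": [61, 0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "toy_tracks.pkl"
    toy_tracks.to_pickle(path)        # save once
    restored = pd.read_pickle(path)   # reload in any later workflow

# Column dtypes and the index survive the round trip.
print(restored.equals(toy_tracks))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.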

In [5]:
top_artist_df = pd.read_pickle("spotify/top_artists.pkl")
followed_artists_df = pd.read_pickle("spotify/followed_artists.pkl")
top_tracks_df = pd.read_pickle("spotify/top_tracks.pkl")
saved_tracks_df = pd.read_pickle("spotify/saved_tracks.pkl")
playlist_tracks_df = pd.read_pickle("spotify/playlist_tracks.pkl")
recommendation_tracks_df = pd.read_pickle("spotify/recommendation_tracks.pkl")

In [7]:
playlist_tracks_df.head()

| | id | name | popularity | type | is_local | explicit | duration_ms | disc_number | track_number | artist_id | ... | speechiness | acousticness | instrumentalness | liveness | valence | tempo | uri | track_href | analysis_url | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6fgbQt13JlpN59PytgTMsA | Snow (Hey Oh) | 61 | audio_features | False | False | 334667 | 1 | 1 | 0L8ExT028jH3ddEcZwqJJ5 | ... | 0.0499 | 0.116 | 1.7e-05 | 0.119 | 0.599 | 104.655 | spotify:track:6fgbQt13JlpN59PytgTMsA | https://api.spotify.com/v1/tracks/6fgbQt13JlpN... | https://api.spotify.com/v1/audio-analysis/6fgb... | 4 |
| 1 | 3JOdLCIBzQYwHIvpN3isVf | Grand Theft Autumn / Where Is Your Boy | 62 | audio_features | False | False | 191373 | 1 | 3 | 4UXqAaa6dQYAk18Lv7PEgX | ... | 0.0608 | 0.000706 | 0.0 | 0.275 | 0.72 | 135.45 | spotify:track:3JOdLCIBzQYwHIvpN3isVf | https://api.spotify.com/v1/tracks/3JOdLCIBzQYw... | https://api.spotify.com/v1/audio-analysis/3JOd... | 4 |
| 2 | 7pAT4dOUzjq8Ziap5ShIqC | Where'd You Go (feat. Holly Brook & Jonah Matr... | 58 | audio_features | False | True | 231867 | 1 | 6 | 7dWYWUbO68rXJOcyA7SpJk | ... | 0.238 | 0.262 | 0.00197 | 0.113 | 0.25 | 179.999 | spotify:track:7pAT4dOUzjq8Ziap5ShIqC | https://api.spotify.com/v1/tracks/7pAT4dOUzjq8... | https://api.spotify.com/v1/audio-analysis/7pAT... | 4 |
| 3 | 1b7vg5T9YKR3NNqXfBYRF7 | Check Yes Juliet | 53 | audio_features | False | False | 220133 | 1 | 3 | 3ao3jf5d70Tf4fPh2bnXVl | ... | 0.0774 | 0.0024 | 0.0 | 0.163 | 0.314 | 166.866 | spotify:track:1b7vg5T9YKR3NNqXfBYRF7 | https://api.spotify.com/v1/tracks/1b7vg5T9YKR3... | https://api.spotify.com/v1/audio-analysis/1b7v... | 4 |
| 4 | 12qZHAeOyTf93YAWvGDTat | All The Small Things | 0 | audio_features | False | False | 168000 | 1 | 8 | 6FBDaR13swtiWwGhX1WQsP | ... | 0.0505 | 0.00844 | 0.0 | 0.529 | 0.712 | 148.119 | spotify:track:12qZHAeOyTf93YAWvGDTat | https://api.spotify.com/v1/tracks/12qZHAeOyTf9... | https://api.spotify.com/v1/audio-analysis/12qZ... | 4 |


In [23]:
with open("spotify/playlists.yml", 'r') as stream:
    playlist_ids = yaml.safe_load(stream)

## Evaluation Metric

Here we use the Top-N accuracy metric, which evaluates the top recommendations provided to a user by comparing them with the items the user has actually interacted with in the test set. The evaluation works as follows:

- For each user
    - For each item the user has interacted with in the test set
        - Sample n other items the user has never interacted with (we assume these are not relevant, although the user may simply not have been aware of them)
        - Ask the recommender model to produce a ranked list of recommended items from a set composed of the one interacted item and the n non-interacted ("non-relevant") items (n = 100 by default)
        - Compute the Top-N accuracy metrics for this user and interacted item from the ranked list (is the item among the Top-N ranked items?)
- Aggregate the global Top-N accuracy metrics
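
The inner step of the procedure above can be sketched in plain Python. This is a simplified illustration on toy IDs, not the evaluator used in this notebook; `hit_at_k` and all variable names are hypothetical:

```python
def hit_at_k(ranked_ids, target_id, negative_ids, k):
    """Return True if target_id lands in the top k once the model's ranking
    is restricted to the target plus the sampled negatives."""
    candidates = set(negative_ids) | {target_id}
    # Keep the model's ordering, but only for the candidate items.
    filtered = [track_id for track_id in ranked_ids if track_id in candidates]
    return target_id in filtered[:k]


# Toy ranking from a model, best first.
ranked = ["a", "b", "c", "d", "e", "f"]
# Items assumed non-relevant for this user.
negatives = ["a", "c", "e"]
# Held-out test items the user did interact with.
held_out = ["d", "f"]

hits = sum(hit_at_k(ranked, item, negatives, k=3) for item in held_out)
recall_at_3 = hits / len(held_out)
print(hits, recall_at_3)
```

Because each held-out item competes only against the sampled negatives, the metric is easier to score well on than a ranking over the full catalogue, so values are best compared across models evaluated the same way.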

In [154]:
def get_interacted_tracks(tracks, playlist_id):
    interacted_track_ids = set(tracks[tracks['playlist_id'] == playlist_id]['id'])
    tracks_interacted = tracks[tracks['id'].isin(interacted_track_ids)]
    tracks_not_interacted = tracks[~tracks['id'].isin(interacted_track_ids)]

    tracks_interacted = tracks_interacted.drop_duplicates(subset='id', keep="first").reset_index()
    tracks_not_interacted = tracks_not_interacted.drop_duplicates(subset='id', keep="first").reset_index()

    return tracks_interacted, tracks_not_interacted

In [175]:
class ModelEvaluator:
    
    def __init__(self, tracks):
        self.tracks = tracks
        
    def evaluate_model_for_playlist(self, model, playlist_id, n=100, seed=42):
        # Hold out 20% of the playlist's tracks as the test set
        tracks_interacted, tracks_not_interacted = get_interacted_tracks(self.tracks, playlist_id)
        # Note: train is not used below; the model is not refit on it
        train, test = train_test_split(tracks_interacted, test_size=0.2, random_state=seed)
        # Get a ranked recommendation list from the model for this playlist
        ranked_recommendations_df = model.recommend_items(playlist_id)

        hits_at_5_count, hits_at_10_count = 0, 0
        for index, row in test.iterrows():
            # Sample n non-interacted tracks (the fixed seed yields the same sample for every row)
            non_interacted_sample = tracks_not_interacted.sample(n, random_state=seed)
            evaluation_ids = [row['id']] + non_interacted_sample['id'].tolist()
            # Restrict the ranking to the held-out track plus the sampled tracks
            evaluation_recommendations_df = ranked_recommendations_df[ranked_recommendations_df['id'].isin(evaluation_ids)]
            # Check whether the held-out track is among the Top-N recommended items
            hits_at_5_count += 1 if row['id'] in evaluation_recommendations_df['id'][:5].tolist() else 0
            hits_at_10_count += 1 if row['id'] in evaluation_recommendations_df['id'][:10].tolist() else 0

        playlist_metrics = {'n': n,
                            'evaluation_count': len(test),
                            'hits@5': hits_at_5_count,
                            'hits@10': hits_at_10_count, 
                            'recall@5': hits_at_5_count / len(test),
                            'recall@10': hits_at_10_count / len(test),
                           }
        
        return playlist_metrics

    def evaluate_model(self, model, n=100, seed=42):
        playlists = []
        for playlist_id in self.tracks['playlist_id'].unique():
            playlist_metrics = self.evaluate_model_for_playlist(model, playlist_id, n=n, seed=seed)  
            playlist_metrics['playlist_id'] = playlist_id
            playlists.append(playlist_metrics)

        detailed_playlists_metrics = pd.DataFrame(playlists).sort_values('evaluation_count', ascending=False)
        
        global_recall_at_5 = detailed_playlists_metrics['hits@5'].sum() / detailed_playlists_metrics['evaluation_count'].sum()
        global_recall_at_10 = detailed_playlists_metrics['hits@10'].sum() / detailed_playlists_metrics['evaluation_count'].sum()
        
        global_metrics = {'model_name': model.model_name,
                          'recall@5': global_recall_at_5,
                          'recall@10': global_recall_at_10,
                         }  
                            
        return global_metrics, detailed_playlists_metrics
    
model_evaluator = ModelEvaluator(playlist_tracks_df)
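
Note that evaluate_model pools hits across playlists (total hits divided by total evaluated tracks), so larger playlists weigh more heavily than they would in a plain average of per-playlist recalls. A small sketch with made-up numbers shows how far the two can diverge (the values and variable names are hypothetical):

```python
# Hypothetical per-playlist results as (hits@5, evaluation_count) pairs.
per_playlist = [(2, 2), (7, 14), (1, 20)]

# Pooled recall, as computed in evaluate_model: total hits / total evaluated.
total_hits = sum(hits for hits, _ in per_playlist)
total_evaluated = sum(count for _, count in per_playlist)
pooled_recall = total_hits / total_evaluated

# Unweighted mean of per-playlist recalls, for comparison.
mean_of_recalls = sum(hits / count for hits, count in per_playlist) / len(per_playlist)

print(round(pooled_recall, 4), round(mean_of_recalls, 4))
```

This is why a few small playlists with perfect recall can sit alongside a low global figure.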

## Popularity Filter

The track data pulled from Spotify includes a 'popularity' score (in the Spotify Web API this is a field on the track object, an integer from 0 to 100). It is 0 in ~10% of cases, which is higher than you'd expect (these are probably default values standing in for missing data), but it is still a natural basis for a Popularity filter.

Because song popularity reflects the "wisdom of the crowds", it usually provides reasonable recommendations overall. However, it is not tailored to the individual user, as a good recommender system should be.
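
One way to check the share of zero-popularity tracks, and to see what a popularity ranking does with them, is sketched below on toy data (the frame and its values are hypothetical; only the `id` and `popularity` column names come from the dataset):

```python
import pandas as pd

# Toy stand-in for the track data.
toy_tracks = pd.DataFrame({
    "id": ["t1", "t2", "t3", "t4", "t5"],
    "popularity": [61, 0, 88, 53, 0],
})

# Fraction of tracks whose popularity is exactly 0.
zero_share = (toy_tracks["popularity"] == 0).mean()
print(f"share of zero-popularity tracks: {zero_share:.0%}")

# Zero-popularity tracks sink to the bottom of a popularity ranking,
# so they are effectively never recommended by this filter.
ranked = toy_tracks.sort_values("popularity", ascending=False)
print(ranked["id"].tolist())
```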

In [127]:
class PopularityRecommender:
    
    def __init__(self, tracks):
        self.tracks = tracks
        self.model_name = 'Popularity'
    
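    def recommend_items(self, playlist_id, ignore_ids=None):
        # Avoid a mutable default argument; None means nothing is excluded
        ignore_ids = [] if ignore_ids is None else ignore_ids
        # playlist_id is unused: the ranking is the same for every playlist
        recommendations_df = self.tracks[~self.tracks['id'].isin(ignore_ids)] \
                                .drop_duplicates(subset='id', keep="first").reset_index() \
                                .sort_values('popularity', ascending=False)

        return recommendations_df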
    
popularity_model = PopularityRecommender(playlist_tracks_df)

In [156]:
# You can see this is essentially sorted by popularity
interacted_tracks, non_interacted_tracks = get_interacted_tracks(playlist_tracks_df, playlist_ids['Chill'])

popularity_model_recommendations = popularity_model.recommend_items(playlist_ids['Chill'], interacted_tracks['id'].tolist())
popularity_model_recommendations.head()

| | index | id | name | popularity | type | is_local | explicit | duration_ms | disc_number | track_number | ... | speechiness | acousticness | instrumentalness | liveness | valence | tempo | uri | track_href | analysis_url | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 203 | 354 | 0VjIjW4GlUZAMYd2vXMi3b | Blinding Lights | 96 | audio_features | False | False | 200040 | 1 | 9 | ... | 0.0598 | 0.00146 | 9.5e-05 | 0.0897 | 0.334 | 171.005 | spotify:track:0VjIjW4GlUZAMYd2vXMi3b | https://api.spotify.com/v1/tracks/0VjIjW4GlUZA... | https://api.spotify.com/v1/audio-analysis/0VjI... | 4 |
| 301 | 539 | 2Fxmhks0bxGSBdJ92vM42m | bad guy | 88 | audio_features | False | False | 194088 | 1 | 2 | ... | 0.375 | 0.328 | 0.13 | 0.1 | 0.562 | 135.128 | spotify:track:2Fxmhks0bxGSBdJ92vM42m | https://api.spotify.com/v1/tracks/2Fxmhks0bxGS... | https://api.spotify.com/v1/audio-analysis/2Fxm... | 4 |
| 406 | 685 | 3AJwUDP919kvQ9QcozQPxg | Yellow | 86 | audio_features | False | False | 266773 | 1 | 5 | ... | 0.0281 | 0.00239 | 0.000121 | 0.234 | 0.285 | 173.372 | spotify:track:3AJwUDP919kvQ9QcozQPxg | https://api.spotify.com/v1/tracks/3AJwUDP919kv... | https://api.spotify.com/v1/audio-analysis/3AJw... | 4 |
| 292 | 526 | 2dpaYNEQHiRxtZbfNsse99 | Happier | 85 | audio_features | False | False | 214290 | 1 | 1 | ... | 0.0452 | 0.191 | 0.0 | 0.167 | 0.671 | 100.015 | spotify:track:2dpaYNEQHiRxtZbfNsse99 | https://api.spotify.com/v1/tracks/2dpaYNEQHiRx... | https://api.spotify.com/v1/audio-analysis/2dpa... | 4 |
| 2304 | 3278 | 7w87IxuO7BDcJ3YUqCyMTT | Pumped Up Kicks | 85 | audio_features | False | False | 239600 | 1 | 2 | ... | 0.0292 | 0.145 | 0.115 | 0.0956 | 0.965 | 127.975 | spotify:track:7w87IxuO7BDcJ3YUqCyMTT | https://api.spotify.com/v1/tracks/7w87IxuO7BDc... | https://api.spotify.com/v1/audio-analysis/7w87... | 4 |


In [179]:
popularity_model_metrics, popularity_model_details = model_evaluator.evaluate_model(popularity_model)

print(popularity_model_metrics)
popularity_model_details[[x for x in popularity_model_details.columns if x != 'playlist_id']] \
    .sort_values('recall@5', ascending=False) \
    .head(10)

{'model_name': 'Popularity', 'recall@5': 0.09318497913769123, 'recall@10': 0.17385257301808066}


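| | n | evaluation_count | hits@5 | hits@10 | recall@5 | recall@10 |
|---|---|---|---|---|---|---|
| 28 | 100 | 2 | 2 | 2 | 1.0 | 1.0 |
| 2 | 100 | 14 | 7 | 7 | 0.5 | 0.5 |
| 12 | 100 | 17 | 7 | 9 | 0.411765 | 0.529412 |
| 9 | 100 | 17 | 7 | 9 | 0.411765 | 0.529412 |
| 14 | 100 | 20 | 8 | 11 | 0.4 | 0.55 |
| 41 | 100 | 21 | 8 | 11 | 0.380952 | 0.52381 |
| 0 | 100 | 19 | 6 | 6 | 0.315789 | 0.315789 |
| 11 | 100 | 36 | 9 | 11 | 0.25 | 0.305556 |
| 46 | 100 | 4 | 1 | 1 | 0.25 | 0.25 |
| 15 | 100 | 20 | 5 | 6 | 0.25 | 0.3 |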


## Content Filter
