# Recommender Playlists

An exploration of recommender system techniques using Spotify playlist data:

- Popularity Recommender: Recommend popular songs regardless of user's preferences
- Content-based Recommender: Use song attributes (e.g. genre) to recommend similar songs
- Collaborative Recommender: Predict what songs a user might be interested in based on a collection of preference information from multiple users
- Hybrid Recommender: A hybrid approach can be used to overcome some of the common problems in recommender systems such as the cold start problem and the sparsity problem
- Popularity + Hybrid Recommender: An extension of the hybrid approach which applies weighting/mixes in songs based on popularity

For this exploration we focus on the playlist_tracks_df dataset and treat each playlist as a separate user.

In [53]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import yaml

In [54]:
from nltk.corpus import stopwords
from scipy.sparse import csr_matrix
from scipy.sparse import vstack
from scipy.sparse.linalg import svds
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, normalize

In [55]:
# To create a playlist and add tracks
import spotipy 
from spotipy.oauth2 import SpotifyOAuth

## Import Data

- Artist and track data was pulled using the Spotify API via the spotipy package
- Data was saved in pickle format using music_data.py and data_functions.py modules
- The data can now be quickly read by multiple workflows

In [56]:
top_artist_df = pd.read_pickle("spotify/top_artists.pkl")
followed_artists_df = pd.read_pickle("spotify/followed_artists.pkl")
top_tracks_df = pd.read_pickle("spotify/top_tracks.pkl")
saved_tracks_df = pd.read_pickle("spotify/saved_tracks.pkl")
playlist_tracks_df = pd.read_pickle("spotify/playlist_tracks.pkl")
recommendation_tracks_df = pd.read_pickle("spotify/recommendation_tracks.pkl")

In [57]:
playlist_tracks_df.head()

Unnamed: 0,id,name,popularity,type,is_local,explicit,duration_ms,disc_number,track_number,artist_id,...,speechiness,acousticness,instrumentalness,liveness,valence,tempo,uri,track_href,analysis_url,time_signature
0,6fgbQt13JlpN59PytgTMsA,Snow (Hey Oh),61,audio_features,False,False,334667,1,1,0L8ExT028jH3ddEcZwqJJ5,...,0.0499,0.116,1.7e-05,0.119,0.599,104.655,spotify:track:6fgbQt13JlpN59PytgTMsA,https://api.spotify.com/v1/tracks/6fgbQt13JlpN...,https://api.spotify.com/v1/audio-analysis/6fgb...,4
1,3JOdLCIBzQYwHIvpN3isVf,Grand Theft Autumn / Where Is Your Boy,62,audio_features,False,False,191373,1,3,4UXqAaa6dQYAk18Lv7PEgX,...,0.0608,0.000706,0.0,0.275,0.72,135.45,spotify:track:3JOdLCIBzQYwHIvpN3isVf,https://api.spotify.com/v1/tracks/3JOdLCIBzQYw...,https://api.spotify.com/v1/audio-analysis/3JOd...,4
2,7pAT4dOUzjq8Ziap5ShIqC,Where'd You Go (feat. Holly Brook & Jonah Matr...,58,audio_features,False,True,231867,1,6,7dWYWUbO68rXJOcyA7SpJk,...,0.238,0.262,0.00197,0.113,0.25,179.999,spotify:track:7pAT4dOUzjq8Ziap5ShIqC,https://api.spotify.com/v1/tracks/7pAT4dOUzjq8...,https://api.spotify.com/v1/audio-analysis/7pAT...,4
3,1b7vg5T9YKR3NNqXfBYRF7,Check Yes Juliet,53,audio_features,False,False,220133,1,3,3ao3jf5d70Tf4fPh2bnXVl,...,0.0774,0.0024,0.0,0.163,0.314,166.866,spotify:track:1b7vg5T9YKR3NNqXfBYRF7,https://api.spotify.com/v1/tracks/1b7vg5T9YKR3...,https://api.spotify.com/v1/audio-analysis/1b7v...,4
4,12qZHAeOyTf93YAWvGDTat,All The Small Things,0,audio_features,False,False,168000,1,8,6FBDaR13swtiWwGhX1WQsP,...,0.0505,0.00844,0.0,0.529,0.712,148.119,spotify:track:12qZHAeOyTf93YAWvGDTat,https://api.spotify.com/v1/tracks/12qZHAeOyTf9...,https://api.spotify.com/v1/audio-analysis/12qZ...,4


In [58]:
with open("spotify/playlists.yml", 'r') as stream:
    playlist_ids = yaml.safe_load(stream)

## Evaluation Metric

Here we use the Top-N accuracy metric, which evaluates the top recommendations provided to a user by comparing them with the items the user has actually interacted with in the test set. This evaluation method works as follows:

- For each user
    - For each item the user has interacted with in the test set
        - Sample n other items the user has never interacted with (assume these are not relevant, although the user may simply not have been aware of them); n defaults to 100 in the code below
        - Ask the recommender model to produce a ranked list of recommended items from a set composed of the one interacted item and the n non-interacted ("non-relevant") items
        - Compute the Top-N accuracy metrics for this user and interacted item from the ranked list of recommendations (is the item among the Top-N ranked items?)
- Aggregate the global Top-N accuracy metrics
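
As a rough illustration of the procedure above, the sketch below checks whether one held-out item lands in the top 5 when ranked against sampled negatives. The `hit_at_n` helper, the toy ids, and the toy scores are all hypothetical and are not part of the notebook; they only mirror the logic in miniature.

```python
import random

def hit_at_n(scores, held_out_id, candidate_negative_ids, n_top=5, n_negatives=100, seed=42):
    """Return True if held_out_id ranks in the top n_top among itself plus sampled negatives.

    scores maps an item id to a model score (higher means more strongly recommended).
    """
    rng = random.Random(seed)
    # Sample items the user never interacted with (assumed non-relevant)
    negatives = rng.sample(candidate_negative_ids, min(n_negatives, len(candidate_negative_ids)))
    candidates = [held_out_id] + negatives
    # Rank the held-out item together with the negatives by model score
    ranked = sorted(candidates, key=lambda item_id: scores.get(item_id, 0.0), reverse=True)
    return held_out_id in ranked[:n_top]

# Toy data: 200 never-interacted tracks with scores spread over [0, 1)
negative_ids = [f"neg_{i}" for i in range(200)]
scores = {item_id: i / 200 for i, item_id in enumerate(negative_ids)}
scores["liked_track"] = 2.0     # scored above every negative
scores["missed_track"] = -1.0   # scored below every negative

hits = [hit_at_n(scores, track_id, negative_ids) for track_id in ["liked_track", "missed_track"]]
recall_at_5 = sum(hits) / len(hits)   # fraction of held-out items that were hits
```

Recall@N for a user is then simply the share of held-out items that count as hits, which is what the evaluator class aggregates across playlists.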

In [135]:
class ModelEvaluator:
    
    def __init__(self, tracks):
        self.tracks = tracks
        
    def evaluate_model_for_playlist(self, model, playlist_id, n=100, seed=42):
        # Getting the items in test set
        tracks_interacted, tracks_not_interacted = get_interacted_tracks(self.tracks, playlist_id)
        train, test = train_test_split(tracks_interacted, test_size=0.2, random_state=seed)
        # Getting a ranked recommendation list from a model for a given user
        ranked_recommendations_df = model.recommend_tracks(playlist_id)

        hits_at_5_count, hits_at_10_count = 0, 0
        for index, row in test.iterrows():
            non_interacted_sample = tracks_not_interacted.sample(n, random_state=seed)
            evaluation_ids = [row['id']] + non_interacted_sample['id'].tolist()
            evaluation_recommendations_df = ranked_recommendations_df[ranked_recommendations_df['id'].isin(evaluation_ids)]
            # Verifying if the current interacted item is among the Top-N recommended items
            hits_at_5_count += 1 if row['id'] in evaluation_recommendations_df['id'][:5].tolist() else 0
            hits_at_10_count += 1 if row['id'] in evaluation_recommendations_df['id'][:10].tolist() else 0

        playlist_metrics = {'n': n,
                            'evaluation_count': len(test),
                            'hits@5': hits_at_5_count,
                            'hits@10': hits_at_10_count, 
                            'recall@5': hits_at_5_count / len(test),
                            'recall@10': hits_at_10_count / len(test),
                           }
        
        return playlist_metrics

    def evaluate_model(self, model, n=100, seed=42):
        playlists = []
        for playlist_id in self.tracks['playlist_id'].unique():
            playlist_metrics = self.evaluate_model_for_playlist(model, playlist_id, n=n, seed=seed)  
            playlist_metrics['playlist_id'] = playlist_id
            playlists.append(playlist_metrics)

        detailed_playlists_metrics = pd.DataFrame(playlists).sort_values('evaluation_count', ascending=False)
        
        global_recall_at_5 = detailed_playlists_metrics['hits@5'].sum() / detailed_playlists_metrics['evaluation_count'].sum()
        global_recall_at_10 = detailed_playlists_metrics['hits@10'].sum() / detailed_playlists_metrics['evaluation_count'].sum()
        
        global_metrics = {'model_name': model.model_name,
                          'recall@5': global_recall_at_5,
                          'recall@10': global_recall_at_10,
                         }  
                            
        return global_metrics, detailed_playlists_metrics
    
model_evaluator = ModelEvaluator(playlist_tracks_df)

### Interacted tracks

Now to evaluate a model for a playlist (and overall), we need to get both interacted and non-interacted tracks for a playlist.

In [60]:
def get_interacted_tracks(tracks, playlist_id, drop_duplicates=True):
    interacted_track_ids = set(tracks[tracks['playlist_id'] == playlist_id]['id'])
    tracks_interacted = tracks[tracks['id'].isin(interacted_track_ids)]
    tracks_not_interacted = tracks[~tracks['id'].isin(interacted_track_ids)]

    if drop_duplicates is True:
        tracks_interacted = tracks_interacted.drop_duplicates(subset='id', keep="first").reset_index()
        tracks_not_interacted = tracks_not_interacted.drop_duplicates(subset='id', keep="first").reset_index()

    return tracks_interacted, tracks_not_interacted

In [61]:
interacted_tracks, non_interacted_tracks = get_interacted_tracks(playlist_tracks_df, playlist_ids['Chill'])

## Popularity Recommender

A popularity based recommender recommends songs in order of overall popularity, regardless of what the user has listened to. Spotify's 'audio features' API call automatically comes with a 'popularity' feature. Although it is 0 in ~10% of cases (higher than you'd expect - these are probably default null values), this is perfect for creating a Popularity recommender.

As song popularity generally accounts for the "wisdom of the crowds", it usually provides good recommendations overall. However this isn't tailored to the user in particular, as a good recommender system should be.

In [120]:
class PopularityRecommender:
    
    def __init__(self, tracks):
        self.tracks = tracks
        self.model_name = 'Popularity Recommender'
    
    def recommend_tracks(self, playlist_id, ignore_ids=[]):
        recommendations_df = self.tracks[~self.tracks['id'].isin(ignore_ids)] \
                                .drop_duplicates(subset='id', keep="first").reset_index() \
                                .sort_values('popularity', ascending=False)

        return recommendations_df
    
popularity_model = PopularityRecommender(playlist_tracks_df)

In [121]:
# You can see this is essentially sorted by popularity
popularity_model_recommendations = popularity_model.recommend_tracks(playlist_ids['Chill'], interacted_tracks['id'].tolist())
popularity_model_recommendations.head()

Unnamed: 0,index,id,name,popularity,type,is_local,explicit,duration_ms,disc_number,track_number,...,acousticness,instrumentalness,liveness,valence,tempo,uri,track_href,analysis_url,time_signature,genres_str
203,354,0VjIjW4GlUZAMYd2vXMi3b,Blinding Lights,96,audio_features,False,False,200040,1,9,...,0.00146,9.5e-05,0.0897,0.334,171.005,spotify:track:0VjIjW4GlUZAMYd2vXMi3b,https://api.spotify.com/v1/tracks/0VjIjW4GlUZA...,https://api.spotify.com/v1/audio-analysis/0VjI...,4,canadian contemporary r&b canadian pop pop
301,539,2Fxmhks0bxGSBdJ92vM42m,bad guy,88,audio_features,False,False,194088,1,2,...,0.328,0.13,0.1,0.562,135.128,spotify:track:2Fxmhks0bxGSBdJ92vM42m,https://api.spotify.com/v1/tracks/2Fxmhks0bxGS...,https://api.spotify.com/v1/audio-analysis/2Fxm...,4,electropop pop
406,685,3AJwUDP919kvQ9QcozQPxg,Yellow,86,audio_features,False,False,266773,1,5,...,0.00239,0.000121,0.234,0.285,173.372,spotify:track:3AJwUDP919kvQ9QcozQPxg,https://api.spotify.com/v1/tracks/3AJwUDP919kv...,https://api.spotify.com/v1/audio-analysis/3AJw...,4,permanent wave pop
292,526,2dpaYNEQHiRxtZbfNsse99,Happier,85,audio_features,False,False,214290,1,1,...,0.191,0.0,0.167,0.671,100.015,spotify:track:2dpaYNEQHiRxtZbfNsse99,https://api.spotify.com/v1/tracks/2dpaYNEQHiRx...,https://api.spotify.com/v1/audio-analysis/2dpa...,4,brostep edm pop pop dance progressive electro ...
2304,3278,7w87IxuO7BDcJ3YUqCyMTT,Pumped Up Kicks,85,audio_features,False,False,239600,1,2,...,0.145,0.115,0.0956,0.965,127.975,spotify:track:7w87IxuO7BDcJ3YUqCyMTT,https://api.spotify.com/v1/tracks/7w87IxuO7BDc...,https://api.spotify.com/v1/audio-analysis/7w87...,4,indie pop indietronica modern rock pop rock


In [64]:
popularity_model_metrics, popularity_model_details = model_evaluator.evaluate_model(popularity_model)

print(popularity_model_metrics)
popularity_model_details[[x for x in popularity_model_details.columns if x != 'playlist_id']] \
    .sort_values('recall@5', ascending=False) \
    .head(10)

{'model_name': 'Popularity Recommender', 'recall@5': 0.09318497913769123, 'recall@10': 0.17385257301808066}


Unnamed: 0,n,evaluation_count,hits@5,hits@10,recall@5,recall@10
28,100,2,2,2,1.0,1.0
2,100,14,7,7,0.5,0.5
12,100,17,7,9,0.411765,0.529412
9,100,17,7,9,0.411765,0.529412
14,100,20,8,11,0.4,0.55
41,100,21,8,11,0.380952,0.52381
0,100,19,6,6,0.315789,0.315789
11,100,36,9,11,0.25,0.305556
46,100,4,1,1,0.25,0.25
15,100,20,5,6,0.25,0.3


As one would expect, recommending solely by popularity is a poor way to recommend tracks. As we will see later, it is however a good method to mix in for variety and to avoid the cold-start problem.

## Content-based Recommender

A content-based recommender leverages attributes from items the user has interacted with to recommend similar items. As it relies only on item attributes and the user's own history, this method avoids the cold-start problem for new items.

For text items we can use a popular information retrieval method used in search engines named TF-IDF. This technique converts unstructured text into a vector structure, where each word is represented by a position in the vector, and the value measures how relevant a given word is for a track. We can then compute the cosine similarity between tracks the user has interacted with and those they haven't.
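
To make the idea concrete, here is a minimal sketch on three made-up genre strings (the strings and variable names are illustrative assumptions, not data from the notebook). It vectorizes the text with TF-IDF and compares the first track against all three using cosine similarity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "documents": one text blob per track (made-up genre strings)
docs = [
    "chill pop acoustic",    # track 0
    "chill pop indie",       # track 1
    "death metal thrash",    # track 2
]

vectorizer = TfidfVectorizer()
toy_tfidf = vectorizer.fit_transform(docs)   # sparse matrix: one row per track, one column per token

# Cosine similarity between track 0 and every track (including itself)
similarities = cosine_similarity(toy_tfidf[0:1], toy_tfidf).flatten()
```

Track 1 shares two tokens with track 0 and so scores well above track 2, which shares none and scores zero.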

### TF-IDF

First we need to apply the TF-IDF technique and use it to build playlist profiles.

In [65]:
def get_tfidf(tracks, ngram_range=(1,2), min_df=0.003, max_df=0.5, max_features=5000):
    # Transform list cols to string, we use bigrams later so no need to remove spaces
    tracks['genres_str'] = tracks['genres'].apply(lambda x: ' '.join(x))

    # Vector size 5000 model
    vectorizer = TfidfVectorizer(analyzer='word',
                                 ngram_range=ngram_range,
                                 min_df=min_df,
                                 max_df=max_df,
                                 max_features=max_features,
                                 stop_words=stopwords.words('english'))  # might need to download stopwords, follow prompt

    # Don't include album_genres column here as similar to genres column and we don't want additional genre weighting in this case 
    # vectorizer.fit_transform takes a string
    tfidf_matrix = vectorizer.fit_transform(tracks['name'] + ' ' +
                                            tracks['artist_name'] + ' ' +
                                            tracks['album_name'] + ' ' +
                                            tracks['playlist_name'] + ' ' +
                                            tracks['genres_str']
                                           )
    tfidf_feature_names = vectorizer.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2

    return tfidf_matrix, tfidf_feature_names

In [66]:
tfidf_matrix, tfidf_feature_names = get_tfidf(playlist_tracks_df)
tfidf_matrix

<3505x948 sparse matrix of type '<class 'numpy.float64'>'
	with 47341 stored elements in Compressed Sparse Row format>

In [114]:
def get_track_profile(tracks, track_id, tfidf_matrix):
    idx = tracks['id'].tolist().index(track_id)
    track_profile = tfidf_matrix[idx:idx+1]
    return track_profile

def get_track_profiles(tracks, track_ids, tfidf_matrix):
    track_profiles_list = [get_track_profile(tracks, x, tfidf_matrix) for x in track_ids]
    track_profiles = vstack(track_profiles_list)
    return track_profiles

def build_playlists_profile(tracks, playlist_id, interactions_indexed_df, tfidf_matrix):
    # There isn't any weighting we want to do in this case, 
    # but a common approach is weighting by interaction strength (liking, commenting, etc.)
    interaction_tracks_df = interactions_indexed_df.loc[playlist_id]  # duplicate interacted tracks, filter to current playlist
    playlist_track_profiles = get_track_profiles(tracks, interaction_tracks_df['id'], tfidf_matrix)
    # Weighted average for event strengths
#     playlist_track_strengths = np.array(interaction_tracks_df['event_strength']).reshape(-1,1)
#     playlist_track_profiles_array = np.sum(playlist_track_profiles.multiply(playlist_track_strengths), axis=0) / np.sum(playlist_track_strengths)
    playlist_track_profiles_array = np.sum(playlist_track_profiles, axis=0)  # Flattens 49x948 matrix to 1x948 array
    playlist_track_profiles_norm = normalize(playlist_track_profiles_array)
    return playlist_track_profiles_norm

def build_playlists_profiles(tracks, tfidf_matrix): 
    playlist_profiles = {}
    for playlist_id in tracks['playlist_id'].unique():
        interacted_tracks, non_interacted_tracks = get_interacted_tracks(tracks, playlist_id, drop_duplicates=False)
        playlist_profiles[playlist_id] = build_playlists_profile(tracks, playlist_id, interacted_tracks.set_index('playlist_id'), tfidf_matrix)
    return playlist_profiles

In [115]:
playlist_profiles = build_playlists_profiles(playlist_tracks_df, tfidf_matrix)
len(playlist_profiles), len(playlist_tracks_df['playlist_id'].unique())  # all playlists accounted for

(49, 49)

In [117]:
# Get the keywords for my "Chill" playlist
chill_profile = playlist_profiles[playlist_ids['Chill']]
print(chill_profile.shape)  # a single normalized profile vector of length 948
pd.DataFrame(sorted(zip(tfidf_feature_names, chill_profile.flatten().tolist()), key=lambda x: -x[1])[:10],  # sort by value desc
             columns=['token', 'relevance'])

(1, 948)


Unnamed: 0,token,relevance
0,chill,0.515348
1,rock,0.315593
2,punk,0.208226
3,dance,0.192835
4,pop punk,0.186408
5,modern,0.166279
6,chill pop,0.148753
7,rap,0.147725
8,chill modern,0.141358
9,modern rock,0.137031


### Content-based Recommender

Now with our playlist profiles setup, we can apply a content-based recommender.

In [138]:
class ContentRecommender:
    
    def __init__(self, tracks, tfidf_matrix, playlist_profiles):
        self.tracks = tracks
        self.tfidf_matrix = tfidf_matrix
        self.playlist_profiles = playlist_profiles
        self.model_name = 'Content-based Recommender'

    def _get_similar_tracks(self, playlist_id):
        #Computes the cosine similarity between the playlist profile and all profiles
        cosine_similarities = cosine_similarity(self.playlist_profiles[playlist_id], self.tfidf_matrix)
        similar_indices = cosine_similarities.argsort().flatten()
        #Sort the similar tracks by similarity
        similar_tracks = sorted([(self.tracks['id'].tolist()[i], cosine_similarities[0,i]) for i in similar_indices], key=lambda x: -x[1])
        return similar_tracks
        
    def recommend_tracks(self, playlist_id, ignore_ids=[]):
        similar_tracks = self._get_similar_tracks(playlist_id)
        similar_tracks_non_interacted = list(filter(lambda x: x[0] not in ignore_ids, similar_tracks))
        recommendations_df = pd.DataFrame(similar_tracks_non_interacted, columns=['id', 'recStrength'])

        return recommendations_df
    
content_model = ContentRecommender(playlist_tracks_df, tfidf_matrix, playlist_profiles)

In [149]:
content_model_recommendations = content_model.recommend_tracks(playlist_ids['Chill'], interacted_tracks['id'].tolist())
# Get track details from original track data
content_model_recommendations_name = pd.merge(content_model_recommendations, playlist_tracks_df.drop_duplicates(subset='id', keep="first"), how='left', on='id')
content_model_recommendations_name[['id', 'name', 'artist_name', 'album_name', 'recStrength']].head()

Unnamed: 0,id,name,artist_name,album_name,recStrength
0,6QZ8h3RqIgTRTo3hfaqryx,Black Butterflies and Déjà Vu,The Maine,Lovely Little Lonely,0.481373
1,7DUCNQRudZwc5TzxThlaRg,Rescue Me,You Me At Six,Rescue Me,0.473033
2,4GMCZUi6cVQJ5wPWwJ36rW,Underdog,You Me At Six,Hold Me Down,0.473033
3,0J8KmsNtnDY0FdfCD1i2zx,Straight to My Head,You Me At Six,VI,0.473033
4,2AeaghRmCn2AXapvOKMbW4,Cold Night,You Me At Six,Cavalier Youth,0.465654


In [142]:
content_model_metrics, content_model_details = model_evaluator.evaluate_model(content_model)

print(content_model_metrics)
content_model_details[[x for x in content_model_details.columns if x != 'playlist_id']] \
    .sort_values('recall@5', ascending=False) \
    .head(10)

{'model_name': 'Content-based Recommender', 'recall@5': 0.9026425591098748, 'recall@10': 0.9652294853963839}


Unnamed: 0,n,evaluation_count,hits@5,hits@10,recall@5,recall@10
4,100,1,1,1,1.0,1.0
8,100,6,6,6,1.0,1.0
34,100,2,2,2,1.0,1.0
36,100,3,3,3,1.0,1.0
6,100,11,11,11,1.0,1.0
26,100,55,55,55,1.0,1.0
43,100,8,8,8,1.0,1.0
10,100,8,8,8,1.0,1.0
48,100,6,6,6,1.0,1.0
7,100,6,6,6,1.0,1.0


From the token relevance and exceptionally high recall, it looks like the genre of a song is a very powerful recommender for me personally. It is likely that the Spotify breakdown of genres (e.g. modern pop and chill pop) helps a lot with this, alongside the bias of the dataset being playlist based (playlists tend to be genre/mood based).

## Collaborative Recommender

There are two main implementation strategies for a collaborative recommender:

- **Memory-based:** Computes user (user-based) or item (item-based) similarities based on past user interactions with items
- **Model-based:** As in the accompanying recommender_playlists.ipynb file, models are developed using different ML algorithms such as neural networks, Bayesian networks, and clustering models, as well as latent factor models such as Singular Value Decomposition (SVD) and probabilistic latent semantic analysis
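
For contrast with the model-based approach, the memory-based (item-based) strategy can be sketched on a tiny made-up interaction matrix. The matrix and variable names below are assumptions for illustration only; the notebook itself does not implement this variant.

```python
import numpy as np

# Rows are playlists ("users"), columns are tracks; 1 means the track is in the playlist
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

# Item-item cosine similarity computed from co-occurrence across playlists
norms = np.linalg.norm(interactions, axis=0)
item_similarity = (interactions.T @ interactions) / np.outer(norms, norms)

# Score tracks for playlist 0 by summing similarity to the tracks it already contains
playlist = interactions[0]
item_scores = item_similarity @ playlist
item_scores[playlist > 0] = -np.inf   # never re-recommend tracks already present
best_track = int(np.argmax(item_scores))
```

Here track 2 wins for playlist 0 because it co-occurs with tracks 0 and 1 in another playlist, while track 3 never co-occurs with either.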

### Matrix Factorization

Latent factor models compress a user-item matrix into a low-dimensional representation in terms of latent factors. This has several advantages, including fewer missing values, better scalability during similarity comparison, and a much smaller matrix to work with in the lower-dimensional space. Note that model-based approaches handle the sparsity of the original matrix better than memory-based ones. An important parameter is the number of factors used to factor the user-item matrix. The higher the number of factors, the more precise the factorization, and the more details of the original matrix are memorized. Reducing the number of factors improves the model's generalization, whilst too many factors can lead to overfitting (high variance).

Here we use a popular latent factor model named Singular Value Decomposition (SVD), which is available in SciPy. Other choices might have been surprise, mrec, or python-recsys.
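
The trade-off around the number of factors can be seen on a small random matrix. The sketch below is only an illustration under assumed toy data (a random 20x30 binary matrix and a hypothetical `reconstruction_error` helper); it shows how the reconstruction error on the original matrix shrinks as k grows, which is the "memorization" effect described above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy sparse binary interaction matrix (20 "playlists" x 30 "tracks"), about 20% filled
rng = np.random.default_rng(0)
toy_matrix = csr_matrix((rng.random((20, 30)) < 0.2).astype(float))

def reconstruction_error(matrix, k):
    """Frobenius norm of the difference between the matrix and its rank-k SVD approximation."""
    u, s, vt = svds(matrix, k=k)          # k must be smaller than min(matrix.shape)
    approx = u @ np.diag(s) @ vt
    return np.linalg.norm(matrix.toarray() - approx)

# More factors reproduce the observed matrix more closely
errors = {k: reconstruction_error(toy_matrix, k) for k in (2, 5, 10)}
```

A lower error here is not automatically better for recommendation: a near-perfect reconstruction predicts close to zero for every unseen track, so k is usually chosen by held-out recall rather than by this error.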

In [184]:
#Creating a sparse pivot table with users in rows and items in columns
playlist_tracks_df['event_strength'] = 1  ## create dummy column for pivot value
playlist_tracks_matrix_df = playlist_tracks_df.pivot_table(index='playlist_id',
                                                           columns='id',
                                                           values='event_strength',
                                                           aggfunc='sum',
                                                          ).fillna(0)

playlist_tracks_matrix_df.values

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [172]:
playlist_tracks_matrix = playlist_tracks_matrix_df.values
playlist_tracks_sparse = csr_matrix(playlist_tracks_matrix)
playlist_tracks_sparse

<49x2609 sparse matrix of type '<class 'numpy.float64'>'
	with 3499 stored elements in Compressed Sparse Row format>

In [174]:
#Performs matrix factorization of the original user item matrix
u, s, vt = svds(playlist_tracks_sparse, k = 15)  # k is number of factors
s = np.diag(s)
print(u.shape, vt.shape, s.shape)

(49, 15) (15, 2609) (15, 15)


In [180]:
playlist_predicted_ratings = np.dot(np.dot(u, s), vt) 
playlist_predicted_ratings
# MinMaxScaler doesn't work for global min(), max()
playlist_predicted_ratings_norm = (playlist_predicted_ratings - playlist_predicted_ratings.min()
                                  ) / (playlist_predicted_ratings.max() - playlist_predicted_ratings.min())
playlist_predicted_ratings_norm

array([[0.17640704, 0.17640294, 0.17647311, ..., 0.17725056, 0.17637331,
        0.17749934],
       [0.1748612 , 0.17509977, 0.15775461, ..., 0.17791478, 0.54944338,
        0.17798536],
       [0.18397378, 0.22086351, 0.64397648, ..., 0.20251616, 0.19243753,
        0.14780581],
       ...,
       [0.1763876 , 0.1763876 , 0.1763876 , ..., 0.1763876 , 0.1763876 ,
        0.1763876 ],
       [0.1717029 , 0.180459  , 0.18870632, ..., 0.25506078, 0.17570142,
        0.16681618],
       [0.17639438, 0.17623027, 0.17602779, ..., 0.17667016, 0.17641338,
        0.1770732 ]])

In [183]:
#Converting the reconstructed matrix back to a Pandas dataframe
matrix_preds_df = pd.DataFrame(playlist_predicted_ratings_norm, columns = playlist_tracks_matrix_df.columns, 
                               index=playlist_tracks_matrix_df.index).transpose()  # pivot_table sorts playlist ids, so reuse its index
len(matrix_preds_df)

2609

### Collaborative Recommender

Now that we have completed our matrix factorisation, we can apply a collaborative recommender.

In [186]:
class CollaborativeRecommender:
    
    def __init__(self, tracks, matrix_preds_df):
        self.tracks = tracks
        self.matrix_preds_df = matrix_preds_df
        self.model_name = 'Collaborative Recommender'
    
    def recommend_tracks(self, playlist_id, ignore_ids=[], topn=10, verbose=False):
        sorted_playlist_predictions = self.matrix_preds_df[playlist_id].sort_values(ascending=False) \
                                        .reset_index().rename(columns={playlist_id: 'recStrength'})        
        recommendations_df = sorted_playlist_predictions[~sorted_playlist_predictions['id'].isin(ignore_ids)] \
                               .sort_values('recStrength', ascending = False)
        
        return recommendations_df

collaborative_model = CollaborativeRecommender(playlist_tracks_df, matrix_preds_df)

In [None]:
collaborative_model_recommendations = collaborative_model.recommend_tracks(playlist_ids['Chill'], interacted_tracks['id'].tolist())
# Get track details from original track data
collaborative_model_recommendations_name = pd.merge(collaborative_model_recommendations, playlist_tracks_df.drop_duplicates(subset='id', keep="first"), how='left', on='id')
collaborative_model_recommendations_name[['id', 'name', 'artist_name', 'album_name', 'recStrength']].head()

In [None]:
collaborative_model_metrics, collaborative_model_details = model_evaluator.evaluate_model(collaborative_model)

print(collaborative_model_metrics)
collaborative_model_details[[x for x in collaborative_model_details.columns if x != 'playlist_id']] \
    .sort_values('recall@5', ascending=False) \
    .head(10)

## Hybrid Recommender

What if we combine the collaborative filtering (CF) and content-based (CB) approaches?
Would that provide us with more accurate recommendations?
In fact, hybrid methods have performed better than individual approaches in many studies and have been used extensively by researchers and practitioners.
Let's build a simple hybridization method: an ensemble that takes a weighted combination of the normalized CF scores and the CB scores, then ranks by the resulting score. The weights for the CF and CB models are set to 100.0 and 1.0, respectively, which assumes that the CF model is much more accurate than the CB model.

In [None]:
class HybridRecommender:
    
    def __init__(self, cb_rec_model, cf_rec_model, tracks, cb_ensemble_weight=1.0, cf_ensemble_weight=1.0):
        self.cb_rec_model = cb_rec_model
        self.cf_rec_model = cf_rec_model
        self.cb_ensemble_weight = cb_ensemble_weight
        self.cf_ensemble_weight = cf_ensemble_weight
        self.tracks = tracks
        self.model_name = 'Hybrid Recommender'
        
    def recommend_tracks(self, playlist_id, ignore_ids=[]):
        # Getting the content-based recommendations
        # (a track id can appear once per playlist it belongs to, so keep the first occurrence)
        cb_recs_df = self.cb_rec_model.recommend_tracks(playlist_id, ignore_ids=ignore_ids) \
                         .drop_duplicates(subset='id', keep="first") \
                         .rename(columns={'recStrength': 'recStrengthCB'})
        
        # Getting the collaborative filtering recommendations
        cf_recs_df = self.cf_rec_model.recommend_tracks(playlist_id, ignore_ids=ignore_ids) \
                         .rename(columns={'recStrength': 'recStrengthCF'})
        
        # Combining the results by track id
        recs_df = cb_recs_df.merge(cf_recs_df, how='outer', on='id').fillna(0.0)
        
        # Computing a hybrid recommendation score as a weighted sum of the CB and CF scores
        recs_df['recStrengthHybrid'] = (recs_df['recStrengthCB'] * self.cb_ensemble_weight) \
                                     + (recs_df['recStrengthCF'] * self.cf_ensemble_weight)
        # Sorting recommendations by hybrid score
        recommendations_df = recs_df.sort_values('recStrengthHybrid', ascending=False)

        return recommendations_df
    
hybrid_model = HybridRecommender(content_model, collaborative_model, playlist_tracks_df,
                                 cb_ensemble_weight=1.0, cf_ensemble_weight=100.0)

### Incorporating Popularity Recommender

Tracks that are popular are, by definition, liked by a lot of people, so logically we should also recommend a few popular tracks. This also helps to avoid a recommendation loop where a user is consistently recommended songs from a certain genre (as similar tracks found via collaborative or content-based recommendations will likely come from a few set genres).
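
One possible way to mix popularity in is a simple weighted blend of a model's score with the track's popularity. The sketch below is an assumption about how this could look, not the notebook's implementation: `blend_with_popularity`, the `popularity_weight` parameter, and the toy data are hypothetical, and it assumes `recStrength` is already on a 0 to 1 scale.

```python
import pandas as pd

def blend_with_popularity(recs_df, popularity_by_id, popularity_weight=0.2):
    """Hypothetical helper: blend a model score with normalized Spotify popularity."""
    blended = recs_df.copy()
    # Spotify popularity is on a 0-100 scale; rescale it to 0-1 (unknown tracks count as 0)
    blended['popularityNorm'] = blended['id'].map(popularity_by_id).fillna(0) / 100.0
    blended['recStrengthBlended'] = ((1 - popularity_weight) * blended['recStrength']
                                     + popularity_weight * blended['popularityNorm'])
    return blended.sort_values('recStrengthBlended', ascending=False).reset_index(drop=True)

# Toy recommendations: 'a' is the model's favorite, 'b' is close behind but far more popular
toy_recs = pd.DataFrame({'id': ['a', 'b', 'c'], 'recStrength': [0.50, 0.45, 0.10]})
toy_popularity = {'a': 10, 'b': 90, 'c': 50}
blended = blend_with_popularity(toy_recs, toy_popularity, popularity_weight=0.2)
```

With a 0.2 weight, the popular track 'b' overtakes 'a', while the weakly matched 'c' stays last; a larger weight would push the ranking further toward the pure popularity ordering.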

## Recommendations