In [1]:
import pandas as pd
import numpy as np
from grouprecommender import GroupRecommender

Using TensorFlow backend.


# Loading the Dataset

The last.fm dataset was extracted from http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html and contains information about the listening habits of almost 1000 users listening to almost a million songs. Each row contains information about one song listened to by one user at a certain time. Its columns are:

* user_id: the identification of the user that listened to the song.
* timestamp: when the song was listened to.
* artist_id: the identification of the artist performing the song.
* artist_name: the corresponding name of said artist.
* track_id: the identification of the song that was listened to.
* track_name: the name of said song.

Here we read the dataset into a pandas dataframe. Some of the rows were poorly formatted and broke the reading process. As there were just 9 of them out of millions, we considered it safe to skip them and leave them out of the dataframe.

In [2]:
poorly_formatted_rows = [2120259, 2446317, 11141080, 11152098, 11152401, 11882086, 12902538, 12935043, 17589538]
df = pd.read_csv('lastfm_data/userid-timestamp-artid-artname-traid-traname.tsv', header=None, skiprows=poorly_formatted_rows, sep='\t')
df.columns = ['user_id', 'timestamp', 'artist_id', 'artist_name', 'track_id', 'track_name']
df.dropna(inplace=True)
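
The row numbers in `poorly_formatted_rows` are hard-coded in the cell above. As a rough sketch of how such rows might be located, the snippet below scans tab-separated lines and reports the index of every line that does not have exactly six fields. It assumes the formatting problem is a wrong number of fields, which the notebook does not state; the helper name `find_malformed_rows` and the sample data are made up for illustration.

```python
import io


def find_malformed_rows(lines, expected_fields=6, sep='\t'):
    """Return 0-based indices of lines whose field count differs from expected_fields."""
    bad = []
    for i, line in enumerate(lines):
        if len(line.rstrip('\n').split(sep)) != expected_fields:
            bad.append(i)
    return bad


# Made-up sample: the second line is missing its track name field.
sample = io.StringIO(
    "user_a\t2009-05-04T23:08:57Z\tart_1\tArtist A\ttrk_1\tSong One\n"
    "user_a\t2009-05-04T13:54:10Z\tart_2\tArtist B\ttrk_2\n"
    "user_b\t2009-05-03T15:48:25Z\tart_1\tArtist A\ttrk_1\tSong One\n"
)
bad_rows = find_malformed_rows(sample)
print(bad_rows)
```

A list produced this way could be passed to `skiprows`, which also takes 0-indexed line numbers. On pandas 1.3 or later, `on_bad_lines='skip'` in `pd.read_csv` may be a simpler alternative, although it would not necessarily skip the same rows.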

# Group Recommendations

The group recommender object is created here. It takes as parameters:

* utility_matrix: path to the utility matrix derived from the above dataset and stored in a pickle format. It has a row for each song and a column for each user. Each entry is the number of times the user in that column listened to the track in that row. As most users have listened to only a small fraction of the tracks, this matrix is sparse and is stored in an appropriate sparse format.
* dataset: the dataset that generated the utility matrix.
* pickled_model_path: path to the pickled ALS model that makes single-user recommendations.
* embedding_model_path: the item2vec model which takes a track id and maps it into an embedding space which captures the "semantics" of the song.
* model_weights_path: the weights of the embedding model.
* embedding_space_path: this is a list where each entry is a song-vector of the chosen vocabulary.
* dicts_path: dictionaries that map from track_ids to inputs to the embedding model and back.
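
To make the shape of the utility matrix concrete, here is a minimal sketch that counts plays per (track, user) pair and stores only the nonzero entries. The dictionary-of-counts layout and the function name `build_play_counts` are assumptions for illustration; the pickled matrix loaded by `GroupRecommender` may well be stored differently.

```python
from collections import Counter


def build_play_counts(listens):
    """Count plays per (track_id, user_id); pairs that never occur are simply absent."""
    return Counter((track_id, user_id) for user_id, track_id in listens)


# Made-up listening events as (user_id, track_id) pairs.
listens = [
    ("user_a", "track_1"),
    ("user_a", "track_1"),
    ("user_b", "track_1"),
    ("user_b", "track_2"),
]
counts = build_play_counts(listens)

tracks = sorted({track for track, _ in counts})
users = sorted({user for _, user in counts})

# Dense view for display only: rows are tracks, columns are users.
for track in tracks:
    print(track, [counts[(track, user)] for user in users])
```

Only three of the four possible (track, user) pairs are stored here; looking up a missing pair in a `Counter` returns 0 without adding it, which is the property that makes a sparse layout worthwhile when most entries are zero.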

In [3]:
gr = GroupRecommender('utility_matrix.pickle',
                      dataset=df,
                      pickled_model_path='model.pickle',
                      embedding_model_path='embedding_model.yaml',
                      model_weights_path='embedding_model.h5',
                      embedding_space_path='embedding.npy',
                      dicts_path='song_dicts.pickle')

## Obtaining Group Recommendations

Here we use the object to make recommendations. We create a group of $N$ random users from the dataset and make `max_recommendations` recommendations to that group.

There are 3 recommendation methods for the group, but the important one is item2vec.

* The `naive` method uses the ALS algorithm to make `max_recommendations` recommendations for each user and then takes the intersection of these recommendations. The intersection may be empty, so the number of songs in the result is not predictable and it is usually better to use large values of `max_recommendations`.
* The `mean` method uses ALS to score all the songs in the dataset for each user, takes the mean of their scores across the group, and selects the `max_recommendations` songs with the highest mean scores.
* The `item2vec` method uses ALS to make the best recommendation for each user in the group, then uses the item2vec embedding model to convert these songs to song-vectors in the embedding space. It then takes the median of these song-vectors, finds the `max_recommendations` nearest neighbors of the median vector, and converts them back to track ids.
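
The three strategies can be illustrated on toy data. The sketch below is not the `GroupRecommender` implementation: the per-user scores stand in for ALS output, the two-dimensional tuples stand in for item2vec embeddings, and the component-wise median and Euclidean distance are assumptions about how the median vector and the nearest neighbors are computed. All function names are hypothetical.

```python
import math
from statistics import median


def naive_group(top_lists):
    """Intersection of every user's top-N list; may be empty."""
    return set.intersection(*(set(items) for items in top_lists))


def mean_group(scores_by_user, k):
    """Average each track's score over the users and keep the k highest."""
    tracks = set.intersection(*(set(scores) for scores in scores_by_user))
    mean_score = {
        t: sum(scores[t] for scores in scores_by_user) / len(scores_by_user)
        for t in tracks
    }
    return sorted(tracks, key=lambda t: mean_score[t], reverse=True)[:k]


def item2vec_group(best_tracks, embedding, k):
    """Component-wise median of the users' best tracks, then the k nearest tracks."""
    vectors = [embedding[t] for t in best_tracks]
    center = tuple(median(component) for component in zip(*vectors))
    return sorted(embedding, key=lambda t: math.dist(embedding[t], center))[:k]


# Toy stand-ins for ALS scores of two users over three tracks.
scores = [
    {"t1": 0.9, "t2": 0.2, "t3": 0.5},
    {"t1": 0.1, "t2": 0.3, "t3": 0.6},
]
# Toy stand-ins for item2vec song-vectors.
embedding = {"t1": (0.0, 0.0), "t2": (2.0, 0.0), "t3": (0.0, 1.0), "t4": (5.0, 5.0)}

naive_result = naive_group([["t1", "t3"], ["t3", "t2"]])
naive_empty = naive_group([["t1"], ["t2"]])
mean_result = mean_group(scores, k=2)
item2vec_result = item2vec_group(["t1", "t2", "t3"], embedding, k=2)

print(naive_result, naive_empty, mean_result, item2vec_result)
```

The second `naive_group` call shows the empty-intersection case described above. In the `item2vec` sketch the outlier `t4` is never a user's best track, yet it would still be a candidate result because the neighbor search runs over the whole embedding space.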

In [4]:
user_ids = np.random.choice(df['user_id'].unique(), 3, replace=False)
max_recommendations = 10
playlist = gr.full_recommendation(user_ids, max_recommendations, df, method='item2vec')

In [5]:
print("Recommended Songs for the Group:", user_ids)
for i, track in enumerate(playlist):
    print(str(i) + ')' + track[0] + ' - ' + track[1])

Recommended Songs for the Group: ['user_000167' 'user_000658' 'user_000213']
0)Sex Pistols - God Save The Queen
1)Placebo - Song To Say Goodbye
2)Uriah Heep - Lady In Black
3)Judas Priest - A Touch Of Evil
4)Eddie Vedder - Society
5)The Who - Baba O'Riley
6)Marilyn Manson - Heart-Shaped Glasses (When The Heart Guides The Hand)
7)The Chemical Brothers - Do It Again
8)Sly & The Family Stone - If You Want Me To Stay
9)The 69 Eyes - Velvet Touch
10)Cat Power - Maybe Not


## Creating Group Playlist

The following cell uses Spotify's API with our app's id and secret to create a session in Spotify.

In [8]:
import os
import spotipy
import pprint
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials

# Read the app credentials from the environment instead of hard-coding them in the notebook.
client_id = os.environ['SPOTIPY_CLIENT_ID']
client_secret = os.environ['SPOTIPY_CLIENT_SECRET']
client_credentials_manager = SpotifyClientCredentials(client_id=client_id,
                                                      client_secret=client_secret)

sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

We use the track ids from the recommendations to get each track's artist and name, combine them into an 'artist track' search string, search for it on Spotify, and store the Spotify id of the first result. This is done for all the recommended tracks.

In [9]:
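spotify_track_ids = []
for track in playlist:
    # track[0] is the artist name and track[1] is the track name
    search_str = track[0] + ' ' + track[1]
    result = sp.search(search_str, limit=1)
    items = result['tracks']['items']
    if items:  # skip tracks for which Spotify returns no match
        spotify_track_ids.append(items[0]['id'])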
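
Spotify's search endpoint also accepts field filters such as `artist:` and `track:` inside the query string, which may give a more precise first hit than a plain concatenation of the two names. The sketch below only builds such a query string; the helper name `build_query` is hypothetical and is not part of the original notebook.

```python
def build_query(artist, track_name):
    """Build a Spotify search query that filters on both artist and track name."""
    return 'artist:{} track:{}'.format(artist.strip(), track_name.strip())


query = build_query('Sex Pistols', 'God Save The Queen')
print(query)

# Possible use with the client from the cell above (not executed here):
# result = sp.search(q=query, type='track', limit=1)
```

Whether the filtered query finds more of the recommended tracks than the plain one would need to be checked against this dataset, since last.fm and Spotify do not always spell artist and track names the same way.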

A playlist is created and added to the profile of the user whose username is provided. This can be repeated for each user in the group.

In [10]:
playlist_name = 'Group recommendations for '
playlist_name += ', '.join(user_ids)

In [11]:
username = '11130460071'
scope = 'playlist-modify-public'
token = util.prompt_for_user_token(username, 
                                   client_id=client_id, 
                                   client_secret=client_secret, 
                                   redirect_uri='https://github.com/alexing10/musicmagal',
                                   scope=scope)

In [12]:
sp = spotipy.Spotify(auth=token)
playlist = sp.user_playlist_create(username, playlist_name)
sp.user_playlist_add_tracks(username, playlist['id'], spotify_track_ids)
print("boom!")

boom!


To check that the playlist was created, we add the tracks to it using its playlist id; the call returns a snapshot id:

In [19]:
playlist_id = '0ChGv9bC33XK1zLtA78y9F'
sp.user_playlist_add_tracks(username, playlist_id, spotify_track_ids)

{'snapshot_id': 'oRoC3cCOO7mnauSmpMKZpVtSHgzRSSjGlAFVyyajNmNXYltq4fC0QA65082rDFtp'}

Currently, our project is not scalable. We can only make recommendations for users already in the dataset and create playlists for users whose Spotify ids are provided to us. There is still no way to make recommendations for unseen users from information taken from Spotify and then add the playlist to their respective accounts.