# Song Recommendation

In this project, I build a popularity-based recommender: a custom function that accepts 3 arguments (the dataset, a threshold on the number of ratings per song, and the number of songs to recommend) and returns the best-rated songs, for example a top 10.

I also build a user-based collaborative filtering recommender: a custom function that accepts 4 arguments (the dataset, a user ID, the number of songs to recommend, and a fitted algorithm) and returns the top songs for a single user, for example a top 10. The recommendations are meant to be songs that the user has not rated yet.

The sections of this project are:

1. Data Cleansing & Preparation
2. Find a top 10 songs recommendation
3. Recommender Model Building for a top 10 songs recommendation for single user

**Import Libraries**

In [24]:
import pandas as pd
pd.set_option('display.max_columns', None)
import numpy as np
from surprise import KNNBasic, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split as tts_surprise
from surprise.model_selection import GridSearchCV as gscv_surprise

# 1. Data Cleansing & Preparation

Load the dataset.

In [2]:
song = pd.read_csv('song_quarter.csv')
song.head()

| | song_id | title | artist_name | user_id | rating |
|---|---|---|---|---|---|
| 0 | 3165 | Mockingbird | Eminem | 20852 | 1.0 |
| 1 | 6263 | Here We Go Again (Album Version) | Paramore | 24713 | 1.0 |
| 2 | 8518 | Love Is Here | Tenth Avenue North | 72102 | 1.0 |
| 3 | 5706 | Mykonos | Fleet Foxes | 44037 | 1.0 |
| 4 | 859 | Stroke You Up (LP Version) | Changing Faces | 8557 | 1.0 |


Take a quick look at the data, starting with duplicate rows.

In [58]:
song.duplicated().sum()

0

There are no duplicate rows.

In [13]:
song.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21736 entries, 0 to 21735
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   song_id      21736 non-null  int64  
 1   title        21736 non-null  object 
 2   artist_name  21736 non-null  object 
 3   user_id      21736 non-null  int64  
 4   rating       21736 non-null  float64
dtypes: float64(1), int64(2), object(2)
memory usage: 849.2+ KB


There are no missing values: every column has 21,736 non-null entries.

# 2. Find a top 10 songs recommendation

First, I build a dictionary that maps each song ID to its title and artist name.

In [3]:
# Map each song_id to (title, artist_name), keeping the first occurrence
song_dict = {}
for song_id, title, artist in zip(song['song_id'], song['title'], song['artist_name']):
    if song_id not in song_dict:
        song_dict[song_id] = (title, artist)

In [9]:
song_dict

{3165: ('Mockingbird', 'Eminem'),
 6263: ('Here We Go Again (Album Version)', 'Paramore'),
 8518: ('Love Is Here', 'Tenth Avenue North'),
 5706: ('Mykonos', 'Fleet Foxes'),
 859: ('Stroke You Up (LP Version)', 'Changing Faces'),
 1891: ('Loud Pipes', 'Ratatat'),
 2764: ('Lesson Learned', 'Ray LaMontagne'),
 4160: ('How You Remind Me', 'Nickelback'),
 8392: ('Aerodynamic (Daft Punk Remix)', 'Daft Punk'),
 2679: ('Sidewinder (Album Version)', 'Avenged Sevenfold'),
 8269: ('Naughty Girl', 'Beyoncé'),
 3291: ('Michael', 'Soltero'),
 3389: ('The Nest', 'José González'),
 9562: ('Riverside', 'Sidney Samson'),
 2030: ("Ball Of Confusion (That's What The World Is Today)",
  'The Temptations'),
 3564: ('Fuck Kitty', 'Frumpies'),
 4684: ('Swagger', 'Flogging Molly'),
 652: ('Charlotte Sometimes', 'The Cure'),
 7381: ('The Rain', 'DMX'),
 1331: ('Para Hacerme Perdonar (En Vivo Teatro Metropolitan)', 'Ely Guerra'),
 8091: ('Passion', 'Gat Décor'),
 5062: ('Re: Your Brains', 'Jonathan Coulton'),
 3

Then I build the DataFrame the recommender needs: the mean rating and the number of ratings for each song.

In [10]:
rating_mean = song.groupby('song_id').mean(numeric_only=True)  # average only the numeric columns; title and artist_name are text
rating_count = song.groupby('song_id').count()

In [12]:
song_df = rating_mean.join(rating_count, lsuffix='_mean', rsuffix='_count').drop(['user_id_mean','user_id_count', 'artist_name', 'title'], axis=1).reset_index()
song_df.head()

| | song_id | rating_mean | rating_count |
|---|---|---|---|
| 0 | 16 | 1.186047 | 172 |
| 1 | 28 | 1.464286 | 28 |
| 2 | 44 | 1.0 | 21 |
| 3 | 45 | 1.0 | 34 |
| 4 | 98 | 1.0 | 22 |
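
The same per-song summary can also be computed in a single pass with a named aggregation on the `rating` column. This is a minimal sketch on a tiny made-up frame (the `toy` data is hypothetical, not taken from `song_quarter.csv`), assuming a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# Hypothetical toy ratings, for illustration only
toy = pd.DataFrame({
    'song_id': [16, 16, 28, 28, 28],
    'user_id': [1, 2, 1, 3, 4],
    'rating': [1.0, 2.0, 1.0, 1.0, 4.0],
})

# One groupby pass: mean rating and number of ratings per song
toy_df = (
    toy.groupby('song_id')['rating']
       .agg(rating_mean='mean', rating_count='count')
       .reset_index()
)
print(toy_df)
```

This avoids the join and the column drops, at the cost of naming the output columns explicitly.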


Build the popularity-based recommender.

In [15]:
def pop_recom(df, thre, n):
    df_recom = df.copy(deep=True)
    ori_title = []
    for id in df_recom['song_id']:
        ori_title.append(song_dict[id])
    df_recom['title'] = ori_title
    
    df_recom = df_recom[df_recom['rating_count'] < thre]
    return df_recom.sort_values('rating_mean', ascending=False).iloc[:n]

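Let's test the recommender by asking for 10 songs with a threshold of 100. Note that the filter as written (`rating_count < thre`) keeps songs with fewer than 100 ratings, which is why every `rating_count` in the output below is under 100. Enforcing a minimum number of ratings would need `>=` instead.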

In [19]:
pop_recom(song_df, 100, 10)

| | song_id | rating_mean | rating_count | title |
|---|---|---|---|---|
| 385 | 8346 | 1.782609 | 23 | (Walk Through Hell (featuring Max Bemis Acoust... |
| 293 | 6183 | 1.772727 | 22 | (Chapel Of Ghouls (Live), Morbid Angel) |
| 111 | 2471 | 1.736842 | 19 | (OVO MI JE `KOLA, Gibonni) |
| 33 | 761 | 1.65 | 20 | (People Got To Be Free, The Rascals) |
| 53 | 1160 | 1.64 | 25 | (Here Comes The Monkey, The Hawaiians) |
| 378 | 8145 | 1.630952 | 84 | (Observándonos (Satélites), Soda Stereo) |
| 205 | 4176 | 1.625 | 16 | (Romanticótico, Cuentos Borgeanos) |
| 71 | 1540 | 1.6 | 15 | (Stratus [The Bottom Shelf], Tommy Bolin) |
| 42 | 874 | 1.583333 | 12 | (Ein Schiff Wird Kommen..., Lale Andersen) |
| 218 | 4400 | 1.578947 | 57 | (KissKiss, Parov Stelar) |
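
If the threshold is meant as a minimum number of ratings, as the introduction describes, the filter would keep songs whose `rating_count` is at least `thre`. A minimal sketch of that variant follows. `pop_recom_min` and the toy frame are hypothetical names and data for illustration, not the function that produced the table above.

```python
import pandas as pd

def pop_recom_min(df, thre, n):
    """Return the n best-rated songs that have at least `thre` ratings."""
    eligible = df[df['rating_count'] >= thre]
    return eligible.sort_values('rating_mean', ascending=False).head(n)

# Hypothetical per-song summary, for illustration only
toy_df = pd.DataFrame({
    'song_id': [1, 2, 3, 4],
    'rating_mean': [1.9, 1.5, 1.2, 1.7],
    'rating_count': [3, 120, 150, 100],
})

top = pop_recom_min(toy_df, thre=100, n=2)
print(top)
```

In this toy example song 1 has the best mean but only 3 ratings, so it is filtered out before ranking.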


# 3. Recommender Model Building for a top 10 songs recommendation for single user

In [21]:
song_us = song.drop(['artist_name', 'title'], axis=1)
song_us.head()

| | song_id | user_id | rating |
|---|---|---|---|
| 0 | 3165 | 20852 | 1.0 |
| 1 | 6263 | 24713 | 1.0 |
| 2 | 8518 | 72102 | 1.0 |
| 3 | 5706 | 44037 | 1.0 |
| 4 | 859 | 8557 | 1.0 |


In [23]:
reader = Reader(rating_scale=(0,5))
data = Dataset.load_from_df(song_us, reader)
trainset, testset = tts_surprise(data, test_size=0.2, random_state=42)

In [46]:
param_grid = {'k': [10, 20, 30],
             'min_k': [3, 6, 9],
             'sim_options': {'name': ['msd', 'pearson', 'cosine'], 'user_based': [True]}}

gscv = gscv_surprise(KNNBasic, param_grid, cv=5, measures = ['rmse'])
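
To get a feel for the cost of this search, the grid can be expanded by hand: 3 values of `k`, 3 values of `min_k`, and 3 similarity names give 27 combinations, and with `cv=5` each combination is fitted on 5 folds. The sketch below only counts them with `itertools.product`. It illustrates the arithmetic and does not call Surprise.

```python
from itertools import product

k_values = [10, 20, 30]
min_k_values = [3, 6, 9]
sim_names = ['msd', 'pearson', 'cosine']
n_folds = 5  # matches cv=5 in the grid search

# Every (k, min_k, similarity) combination in the grid
combos = list(product(k_values, min_k_values, sim_names))
n_fits = len(combos) * n_folds
print(len(combos), n_fits)
```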

In [47]:
gscv.fit(data)

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
...

In [48]:
gscv.best_params

{'rmse': {'k': 10,
  'min_k': 3,
  'sim_options': {'name': 'pearson', 'user_based': True}}}

In [49]:
gscv.best_score

{'rmse': 0.49079114002207086}

In [50]:
knn_basic_user = KNNBasic(**gscv.best_params)
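
A detail worth checking here: `gscv.best_params` is a dictionary keyed by measure name (see its output above), so the tuned settings sit one level down, under `'rmse'`. The sketch below shows the difference with a plain dictionary and a hypothetical stand-in function, `make_model`, which mimics a constructor that accepts extra keyword arguments. It is not Surprise's `KNNBasic`, and its defaults are made up for the illustration.

```python
# Same shape as the grid-search result shown earlier
best_params = {'rmse': {'k': 10, 'min_k': 3,
                        'sim_options': {'name': 'pearson', 'user_based': True}}}

def make_model(k=40, min_k=1, sim_options=None, **kwargs):
    """Hypothetical stand-in: unknown keyword arguments are silently ignored."""
    return {'k': k, 'min_k': min_k, 'sim_options': sim_options or {}}

# Unpacking the outer dict passes a single keyword `rmse`, so defaults remain
untuned = make_model(**best_params)

# Unpacking the inner dict passes k, min_k and sim_options as intended
tuned = make_model(**best_params['rmse'])

print(untuned)
print(tuned)
```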

In [51]:
knn_basic_user.fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x19b55215f10>

In [52]:
prediction = knn_basic_user.test(testset)
accuracy.rmse(prediction)

RMSE: 0.5065


0.5064576520368301

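The model reaches a root mean square error (RMSE) of about 0.5065 on the test set. Note that the fit log above reports the msd similarity rather than the tuned pearson: `gscv.best_params` is keyed by measure name, so the tuned settings live under `gscv.best_params['rmse']`, and unpacking the outer dictionary leaves `KNNBasic` on its defaults.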
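
For reference, RMSE is the square root of the mean squared difference between predicted and true ratings. A minimal sketch with made-up numbers (the `true_ratings` and `predicted` lists are hypothetical, not values from this dataset):

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

true_ratings = [1.0, 2.0, 1.0, 3.0]
predicted = [1.5, 1.5, 1.0, 2.0]

print(rmse(true_ratings, predicted))
```

Because the errors are squared before averaging, one prediction that is off by a full point weighs as much as four predictions that are each off by half a point.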

Next, I build the recommender for a single user.

In [53]:
def recom_cf(df, user, n, model):
    # Song ids the user has already rated, as a set:
    # `in` on a pandas Series tests the index labels, not the values
    rated = set(df.loc[df['user_id'] == user, 'song_id'])

    # Unique songs the user has not rated yet
    unrated = [song_id for song_id in df['song_id'].unique() if song_id not in rated]

    # Predicted rating and (title, artist) for each candidate song
    est_rating = [model.predict(user, song_id).est for song_id in unrated]
    ori = [song_dict[song_id] for song_id in unrated]

    return pd.DataFrame({'song_id': unrated,
                        'est_rating': est_rating,
                        'title': ori}).sort_values('est_rating', ascending=False).iloc[:n].reset_index(drop=True)
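
One pandas detail matters when separating rated from unrated songs: the `in` operator on a Series checks the index labels, not the values. Converting the values to a set (or using `.isin`) tests the values instead. A small sketch with made-up IDs:

```python
import pandas as pd

# Hypothetical song ids rated by one user; the index holds row labels 4 and 9
rated = pd.Series([859, 3165], index=[4, 9])

print(859 in rated)        # membership is tested against the index labels
print(4 in rated)          # 4 is a row label, not a song id
print(859 in set(rated))   # membership is tested against the values
```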

Let's try it for the user with user_id 8557, asking for 10 songs.

In [55]:
recom_cf(song_us, 8557, 10, knn_basic_user)

| | song_id | est_rating | title |
|---|---|---|---|
| 0 | 3165 | 1.188636 | (Mockingbird, Eminem) |
| 1 | 3398 | 1.188636 | (Cold Water, Damien Rice) |
| 2 | 2621 | 1.188636 | (Excuse Me Mr., No Doubt) |
| 3 | 995 | 1.188636 | (Kept It Too Real (Amended Album Version), Plies) |
| 4 | 6852 | 1.188636 | (June Evenings, Air France) |
| 5 | 6784 | 1.188636 | (Mad World, Tears For Fears) |
| 6 | 9105 | 1.188636 | (Rest, The Temper Trap) |
| 7 | 7775 | 1.188636 | (Weenie Beenie, Foo Fighters) |
| 8 | 2990 | 1.188636 | (Right Down The Line, Gerry Rafferty) |
| 9 | 6006 | 1.188636 | (Let's Go All The Way (Short Blix Mix), Sly Fox) |
