## Music Recommendation System (Machine Learning)

This project builds a music recommendation system that suggests tracks based on a listener's taste, inferred from their previously played music and playlists. It takes two approaches: user-to-user recommendation and item-to-item recommendation. The Birch, MiniBatchKMeans and KMeans algorithms are used, along with the 'Surprise' module, to compute the similarity between the recommendations and the user's existing playlist for evaluation.
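
The clustering idea can be sketched on synthetic data. This is a minimal illustration, not the notebook's pipeline: the catalogue, the two feature names and the playlist below are made-up assumptions. Tracks are clustered on their features, the playlist's tracks are assigned to clusters, and the catalogue tracks in the most common cluster are returned as suggestions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical catalogue: 40 tracks with two audio features, forming two clear groups.
features = np.vstack([
    rng.normal(loc=0.2, scale=0.02, size=(20, 2)),
    rng.normal(loc=0.8, scale=0.02, size=(20, 2)),
])
catalogue = pd.DataFrame(features, columns=["energy", "valence"])
catalogue["track_id"] = np.arange(100, 140)

feature_cols = ["energy", "valence"]
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(catalogue[feature_cols])
catalogue["label"] = model.labels_

# Hypothetical playlist: three tracks close to the second group.
playlist = pd.DataFrame({"energy": [0.79, 0.81, 0.80], "valence": [0.82, 0.78, 0.80]})

# The most common predicted cluster stands in for the listener's taste.
dominant = pd.Series(model.predict(playlist[feature_cols])).mode().iloc[0]
suggested = catalogue.loc[catalogue["label"] == dominant, "track_id"]
print(suggested.tolist())
```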

### Obtaining Data

In [1]:
import pandas as pd
import numpy as np

In [2]:
final = pd.read_csv('datasets/final/final.csv')
metadata = pd.read_csv('datasets/final/metadata.csv')

### Model Selection - K Means Algorithm

In [3]:
from sklearn.cluster import KMeans
from sklearn.utils import shuffle

In [4]:
final = shuffle(final)

In [5]:
# .loc selects by index label, so X holds the rows labelled 0-5999 and Y the remaining rows
X = final.loc[list(range(6000))]
Y = final.loc[list(range(6000, final.shape[0]))]

In [6]:
X = shuffle(X)
Y = shuffle(Y)
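
The split above selects rows with `.loc`, which works on index labels rather than on positions. The small sketch below, using a made-up ten-row frame (`df` and its contents are assumptions for illustration), shows the difference after shuffling: `.loc` returns the rows with the requested labels regardless of where the shuffle put them, while `.iloc` returns whatever happens to sit in the first positions.

```python
import pandas as pd
from sklearn.utils import shuffle

df = pd.DataFrame({"track_id": range(10)})
shuffled = shuffle(df, random_state=0)

# Label-based: the rows whose index labels are 0..5, wherever the shuffle placed them.
by_label = shuffled.loc[list(range(6))]

# Position-based: the first six rows of the shuffled frame.
by_position = shuffled.iloc[:6]

print(by_label.index.tolist())     # the labels 0 to 5, in the order requested
print(by_position.index.tolist())  # depends on the shuffle
```

If a random train/test split is intended, the positional form is the one that reflects the shuffle.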

In [7]:
metadata.head()

Unnamed: 0,track_id,album_title,artist_name,genre,track_title
0,2,AWOL - A Way Of Life,AWOL,HipHop,Food
1,3,AWOL - A Way Of Life,AWOL,HipHop,Electric Ave
2,5,AWOL - A Way Of Life,AWOL,HipHop,This World
3,10,Constant Hitmaker,Kurt Vile,Pop,Freeway
4,134,AWOL - A Way Of Life,AWOL,HipHop,Street Music


In [8]:
metadata = metadata.set_index('track_id')

In [9]:
# X.drop(['label'], axis= 1, inplace= True)

In [10]:
kmeans = KMeans(n_clusters=6)
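
One thing worth checking before fitting: the feature frame appears to carry identifier columns (`Unnamed: 0`, `track_id`) and an unscaled `tempo` alongside features in the range 0 to 1, and k-means distances are dominated by whichever columns have the largest range. The sketch below uses a made-up frame (`toy`, its values and the use of `StandardScaler` are illustrative assumptions, not part of this notebook) to show how the clusters change once the ID column is dropped and the remaining features are standardised.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical frame: the ID takes large values, the audio features lie in [0, 1].
toy = pd.DataFrame({
    "track_id": [10, 20, 30, 40, 5010, 5020, 5030, 5040],
    "energy":   [0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9],
    "valence":  [0.1, 0.9, 0.2, 0.8, 0.1, 0.9, 0.2, 0.8],
})

# Clustering on every column: distances are dominated by track_id.
raw_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(toy)

# Clustering on standardised audio features only.
audio = StandardScaler().fit_transform(toy.drop(columns=["track_id"]))
audio_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(audio)

print(raw_labels)
print(audio_labels)
```

In the first run the two clusters split low from high `track_id`; in the second they follow the audio features.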

In [11]:
# data['Cluster'] = kmeans.labels_
# data['Cluster'].sample(n=10)

In [12]:
Y.head()

Unnamed: 0.1,Unnamed: 0,track_id,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence,...,Holiday,Salsa,NuJazz,HipHop Beats,Modern Jazz,Turkish,Tango,Fado,Christmas,Instrumental
10401,10401,48263,0.389023,0.44018,0.664774,0.828462,0.089983,0.034304,113.975,0.684336,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7682,7682,33137,0.086819,0.677494,0.825255,0.663946,0.368027,0.193311,87.508,0.66939,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7579,7579,32875,0.668297,0.57791,0.787367,0.000667,0.114019,0.116422,102.859,0.362615,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8656,8656,38767,0.241785,0.412119,0.38809,0.903038,0.096951,0.044921,93.203,0.296774,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10885,10885,51206,0.001354,0.598659,0.766626,8.1e-05,0.409981,0.032713,129.957,0.247469,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [13]:
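def fit(df, algo, flag=0):
    """Fit `algo` on `df` (full fit if `flag` is truthy, otherwise partial_fit) and store the labels in df['label']."""
    if flag:
        algo.fit(df)
    else:
        algo.partial_fit(df)
    # labels_ holds the cluster assigned to each row of the data just fitted
    df['label'] = algo.labels_
    return (df, algo)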

In [14]:
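def predict(t, Y):
    """Given t = (labelled frame, fitted model), return the labelled rows in the cluster most common among Y."""
    y_pred = t[1].predict(Y)
    mode = pd.Series(y_pred).mode()
    return t[0][t[0]['label'] == mode.iloc[0]]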

In [15]:
def recommend(recommendations, meta, Y):
    """Return (tracks in Y's most common genre, tracks by Y's most common artist, metadata of the clustered recommendations)."""
    track_ids = list(Y['track_id'])
    genre_mode = meta.loc[track_ids, 'genre'].mode()
    artist_mode = meta.loc[track_ids, 'artist_name'].mode()
    return (
        meta[meta['genre'] == genre_mode.iloc[0]],
        meta[meta['artist_name'] == artist_mode.iloc[0]],
        meta.loc[recommendations['track_id']],
    )
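
The mode-based lookup in `recommend` can be traced on a tiny example. The metadata and playlist below are made up for illustration. Two details are worth noting: when several values tie, `mode()` returns them sorted and `.iloc[0]` takes the first, and the genre and artist results include tracks already in the playlist unless they are filtered out (the `unheard` step is an assumed extra, not something the notebook does).

```python
import pandas as pd

# Hypothetical metadata indexed by track_id.
meta = pd.DataFrame(
    {
        "artist_name": ["A", "A", "B", "C", "C"],
        "genre": ["Rock", "Rock", "Rock", "Jazz", "Jazz"],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="track_id"),
)
playlist_ids = [1, 2, 4]

# Most common genre and artist among the playlist's tracks.
genre_mode = meta.loc[playlist_ids, "genre"].mode().iloc[0]
artist_mode = meta.loc[playlist_ids, "artist_name"].mode().iloc[0]

same_genre = meta[meta["genre"] == genre_mode]
same_artist = meta[meta["artist_name"] == artist_mode]

# Optional: drop tracks the listener already has.
unheard = same_genre[~same_genre.index.isin(playlist_ids)]
print(genre_mode, artist_mode, unheard.index.tolist())
```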

In [16]:
t = fit(X, kmeans, 1)

In [17]:
recommendations = predict(t, Y)

In [18]:
output = recommend(recommendations, metadata, Y)
output

(                      album_title                              artist_name  \
 track_id                                                                     
 153                Arc and Sender                           Arc and Sender   
 154                Arc and Sender                           Arc and Sender   
 155               unreleased demo                           Arc and Sender   
 169                  Boss of Goth                               Argumentix   
 170                  Nightmarcher                               Argumentix   
 ...                           ...                                      ...   
 123823                      Vazio  E A Terra Nunca Me Pareceu Tão Distante   
 123824                      Vazio  E A Terra Nunca Me Pareceu Tão Distante   
 124184    Pseudologia Phantastica                                 Bloodgod   
 124185    Pseudologia Phantastica                                 Bloodgod   
 124186    Pseudologia Phantastica                  

In [19]:
genre_recommend, artist_name_recommend, mixed_recommend = output[0], output[1], output[2]

In [20]:
genre_recommend.shape

(3892, 4)

In [21]:
artist_name_recommend.shape

(52, 4)

In [22]:
mixed_recommend.shape

(1142, 4)

In [23]:
# Genre wise recommendations
genre_recommend.head()

track_id,album_title,artist_name,genre,track_title
153,Arc and Sender,Arc and Sender,Rock,Hundred-Year Flood
154,Arc and Sender,Arc and Sender,Rock,Squares And Circles
155,unreleased demo,Arc and Sender,Rock,Maps of the Stars Homes
169,Boss of Goth,Argumentix,Rock,Boss of Goth
170,Nightmarcher,Argumentix,Rock,Industry Standard Massacre


In [24]:
# Artist wise recommendations
artist_name_recommend.head()

track_id,album_title,artist_name,genre,track_title
34660,Zehu,51%,AvantGarde|International|Blues|Jazz|,Hadri Ha'Kat
34661,Zehu,51%,AvantGarde|International|Blues|Jazz|,Blender Tzivoni
34662,Zehu,51%,AvantGarde|International|Blues|Jazz|,Naniah
34663,Zehu,51%,AvantGarde|International|Blues|Jazz|,Yoter Miday
34664,Zehu,51%,AvantGarde|International|Blues|Jazz|,"Yamim, Lielot"


In [25]:
# Mixed Recommendations
mixed_recommend.head()

track_id,album_title,artist_name,genre,track_title
14795,Philly Time!,Pink Skull,Electronic,choco taco
23979,bambada,uiutna,Electronic,MZ001
23978,bambada,uiutna,Electronic,all the crocodiles
23078,Halloween,BLOB,Jazz,Stone Cold
20001,Embered Recollections,Heosphoros,AvantGarde|International|,Tree of Daath (Divinity and Freedom)


In [26]:
recommendations.head()

Unnamed: 0.1,Unnamed: 0,track_id,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence,...,Salsa,NuJazz,HipHop Beats,Modern Jazz,Turkish,Tango,Fado,Christmas,Instrumental,label
3761,3761,14795,0.041038,0.921934,0.654352,0.838161,0.112213,0.363014,130.009,0.035935,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2
5922,5922,23979,0.784419,0.243774,0.10993,0.93767,0.111165,0.045007,149.983,0.238244,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2
5921,5921,23978,2.4e-05,0.622546,0.340256,0.920285,0.087766,0.145666,179.063,0.344469,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2
5687,5687,23078,0.560856,0.269121,0.425737,0.884421,0.098914,0.044708,71.807,0.231842,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2
5056,5056,20001,0.436054,0.416578,0.75664,0.699119,0.113516,0.059987,123.203,0.458656,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2


In [27]:
artist_name_recommend['artist_name'].value_counts()

51%    52
Name: artist_name, dtype: int64

In [28]:
genre_recommend['genre'].value_counts()

Rock    3892
Name: genre, dtype: int64

In [29]:
genre_recommend['artist_name'].value_counts()

Glove Compartment               65
Blah Blah Blah                  62
Mors Ontologica                 50
Les Baudouins Morts             38
Kraus                           35
                                ..
Modern Exteriors                 1
The Dalai Lama Rama Fa Fa Fa     1
Dirge                            1
Moon Duo                         1
Roberto Billi                    1
Name: artist_name, Length: 725, dtype: int64

#### Testing

In [30]:
testing = Y.iloc[6:12]['track_id']

In [31]:
testing

7135     31129
9152     40936
7527     32521
6685     28553
11961    82472
10437    48402
Name: track_id, dtype: int64

In [32]:
ids = testing.loc[testing.index]

In [33]:
# Look up the metadata rows for the sampled track ids
songs = metadata.loc[testing]

In [34]:
songs

track_id,album_title,artist_name,genre,track_title
31129,Antique Phonograph Music Program 03/23/10,Fred van Eps,OldTime|Historic,Maurice Tango
40936,Live on WFMU's The Evan Funk Davies Show 12/08/10,The Sights,Rock,Chat With EFD
32521,Terra Firma,Sascha Müller,Electronic,Contact
28553,ccMixter,_ghost,Electronic,Lullaby
82472,SADITREVNiSAIEHCLOcIMEsSAUd,dUASsEMIcOLCHEIASiNVERTIDAS,AvantGarde|International|,Apneia
48402,Orff: Carmina Burana,MIT Concert Choir,Classical,"Dies, nox et omnia"


In [35]:
re = predict(t, Y.iloc[6:12])

In [36]:
output = recommend(re, metadata, Y.iloc[6:12])

In [37]:
ge_re, ge_ar, ge_mix = output[0], output[1], output[2]

In [38]:
ge_re.head()

track_id,album_title,artist_name,genre,track_title
384,Summer Set,Blanketship,Electronic,Baja Jones
386,Summer Set,Blanketship,Electronic,Clapartroach
387,Summer Set,Blanketship,Electronic,I wish I wish
396,On the Back of a Dying Beast: Volume 1,Borful Tang,Electronic,Juggernaut Soliloquy
397,On the Back of a Dying Beast: Volume 1,Borful Tang,Electronic,The Tides Of Land


In [39]:
ge_ar.head(10)

track_id,album_title,artist_name,genre,track_title
16425,Antique Phonograph Music Program 09/23/2008,Fred van Eps,OldTime|Historic,Rag Pickins
17992,Antique Phonograph Music Program 01/13/2009,Fred van Eps,OldTime|Historic,Irish Hearts
31129,Antique Phonograph Music Program 03/23/10,Fred van Eps,OldTime|Historic,Maurice Tango


In [40]:
ge_mix.head(10)

track_id,album_title,artist_name,genre,track_title
7545,Live at WFMU on Liz Berg's Show on 9/29/2008,Antiguo Automata Mexicano,Electronic,Chez Nobody
1306,Electrified Being,Nicky Andrews,Electronic,Affective-2
19132,"Paul ""Wine"" Jones, T-Model Ford & Kenny Brown ...",T-Model Ford,Blues,I Love My Babe
3924,Live at WFMU on Scott's Show on 5/26/2000,Mink Lungs,Rock,Medley - Ultimate Slumber Party
11870,Wildahead Portibeast,Wildahead Portibeast,HipHop,What The Truth Should Be
4542,Live on WFMU on Brian Turner's Show 4/8/08,Wildildife,Rock,Kross
16383,Antique Phonograph Music Program 08/26/2008,Dorothy Kingsley,OldTime|Historic,Call Round Any Old Time
1608,Ice Machine,Six Star General,Rock,One Jack One Jerry
22434,Pocket Monster,Henry Homesweet,Electronic,Tragic But Magic
19588,Antique Phonograph Music Program 10/06/2009,Ed Morton,OldTime|Historic,What Do You Mean You Lost Your Dog


In [41]:
ge_re.shape

(2170, 4)

In [42]:
ge_ar.shape

(3, 4)

In [43]:
ge_mix.shape

(2807, 4)

### Model Selection - MiniBatchKMeans

In [44]:
from sklearn.cluster import MiniBatchKMeans

In [45]:
mini = MiniBatchKMeans(n_clusters = 6)

In [46]:
X.drop('label', axis=1, inplace=True)

In [47]:
# Let's divide the initial dataset into pieces to demonstrate online learning
part_1, part_2, part_3 = X.iloc[0: 2000], X.iloc[2000:4000], X.iloc[4000:6000]

In [48]:
for part in [part_1, part_2, part_3]:
    # fit() updates the model in place and adds the 'label' column to the slice
    fit(part, mini)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['label'] = algo.labels_


In [49]:
X = pd.concat([part_1, part_2, part_3])
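
A note on the labels produced by the loop above: after `partial_fit`, `labels_` describes only the batch just seen, so earlier batches are labelled by an earlier state of the model. A minimal sketch of one alternative, assuming synthetic two-feature data (`data`, `f1` and `f2` are hypothetical names): train on every batch first, then label all rows with the final model via `predict`. Writing the labels to an explicit `.copy()` also avoids pandas' copy-of-a-slice warning.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Hypothetical data: two well-separated groups of 150 rows, shuffled together.
data = pd.DataFrame(
    np.vstack([
        rng.normal(0.0, 0.05, size=(150, 2)),
        rng.normal(1.0, 0.05, size=(150, 2)),
    ]),
    columns=["f1", "f2"],
).sample(frac=1, random_state=0).reset_index(drop=True)

model = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=0)

# Online learning: feed the data to the model in three batches of 100 rows.
for start in range(0, len(data), 100):
    model.partial_fit(data.iloc[start:start + 100])

# Label every row with the final model, on an explicit copy.
labelled = data.copy()
labelled["label"] = model.predict(data[["f1", "f2"]])
print(labelled["label"].value_counts())
```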

In [50]:
X.columns

Index(['Unnamed: 0', 'track_id', 'acousticness', 'danceability', 'energy',
       'instrumentalness', 'liveness', 'speechiness', 'tempo', 'valence',
       ...
       'Salsa', 'NuJazz', 'HipHop Beats', 'Modern Jazz', 'Turkish', 'Tango',
       'Fado', 'Christmas', 'Instrumental', 'label'],
      dtype='object', length=931)

In [51]:
X.head(3)

Unnamed: 0.1,Unnamed: 0,track_id,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence,...,Salsa,NuJazz,HipHop Beats,Modern Jazz,Turkish,Tango,Fado,Christmas,Instrumental,label
3761,3761,14795,0.041038,0.921934,0.654352,0.838161,0.112213,0.363014,130.009,0.035935,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2
4471,4471,17794,0.994946,0.292073,0.050897,0.110574,0.666249,0.050318,132.85,0.13336,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
2106,2106,7545,0.87332,0.242819,0.364811,0.782725,0.113607,0.059082,192.401,0.267495,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1


In [52]:
X['label'].value_counts()

1    2838
2    1179
3     911
0     794
4     210
5      68
Name: label, dtype: int64

In [53]:
recommendations = predict((X, mini), Y)

In [54]:
output = recommend(recommendations, metadata, Y)

In [55]:
genre_recommend_mini, artist_name_recommend_mini, mixed_mini = output[0], output[1], output[2]

In [56]:
genre_recommend_mini.shape

(3892, 4)

In [57]:
artist_name_recommend_mini.shape

(52, 4)

In [58]:
# Genre wise recommendations
genre_recommend_mini.head()

track_id,album_title,artist_name,genre,track_title
153,Arc and Sender,Arc and Sender,Rock,Hundred-Year Flood
154,Arc and Sender,Arc and Sender,Rock,Squares And Circles
155,unreleased demo,Arc and Sender,Rock,Maps of the Stars Homes
169,Boss of Goth,Argumentix,Rock,Boss of Goth
170,Nightmarcher,Argumentix,Rock,Industry Standard Massacre


In [59]:
# Artist wise recommendations
artist_name_recommend_mini.head()

track_id,album_title,artist_name,genre,track_title
34660,Zehu,51%,AvantGarde|International|Blues|Jazz|,Hadri Ha'Kat
34661,Zehu,51%,AvantGarde|International|Blues|Jazz|,Blender Tzivoni
34662,Zehu,51%,AvantGarde|International|Blues|Jazz|,Naniah
34663,Zehu,51%,AvantGarde|International|Blues|Jazz|,Yoter Miday
34664,Zehu,51%,AvantGarde|International|Blues|Jazz|,"Yamim, Lielot"


In [60]:
# Mixed Recommendations
mixed_mini.head()

track_id,album_title,artist_name,genre,track_title
14795,Philly Time!,Pink Skull,Electronic,choco taco
19330,Viandanze (EP),Fabrizio Paterlini,Classical,Profondo Blu
23979,bambada,uiutna,Electronic,MZ001
23978,bambada,uiutna,Electronic,all the crocodiles
23078,Halloween,BLOB,Jazz,Stone Cold


### Model Selection - Birch

In [61]:
from sklearn.cluster import Birch

In [62]:
birch = Birch(n_clusters = 6)

In [63]:
X.drop('label', axis=1, inplace=True)

In [64]:
# Let's divide the initial dataset into pieces to demonstrate online learning
part_1, part_2, part_3 = X.iloc[0: 2000], X.iloc[2000:4000], X.iloc[4000:6000]

In [65]:
for part in [part_1, part_2, part_3]:
    # fit() updates the Birch model in place and adds the 'label' column to the slice
    fit(part, birch)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['label'] = algo.labels_


In [66]:
X = pd.concat([part_1, part_2, part_3])
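
Birch also supports incremental training through `partial_fit`, growing its tree of subclusters batch by batch. The sketch below is a small illustration on synthetic points (the data, the batch size of 100 and `n_clusters=2` are assumptions chosen for the example). It trains on three batches and then assigns every point with the final model.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)

# Hypothetical data: two far-apart groups of 150 points each.
points = np.vstack([
    rng.normal(0.0, 0.05, size=(150, 2)),
    rng.normal(10.0, 0.05, size=(150, 2)),
])
rng.shuffle(points)  # shuffles the rows in place

birch = Birch(n_clusters=2)

# Online learning: three batches of 100 points.
for start in range(0, len(points), 100):
    birch.partial_fit(points[start:start + 100])

# Assign all points with the model as it stands after the last batch.
labels = birch.predict(points)
print(np.bincount(labels))
```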

In [67]:
X.columns

Index(['Unnamed: 0', 'track_id', 'acousticness', 'danceability', 'energy',
       'instrumentalness', 'liveness', 'speechiness', 'tempo', 'valence',
       ...
       'Salsa', 'NuJazz', 'HipHop Beats', 'Modern Jazz', 'Turkish', 'Tango',
       'Fado', 'Christmas', 'Instrumental', 'label'],
      dtype='object', length=931)

In [68]:
X.head(3)

Unnamed: 0.1,Unnamed: 0,track_id,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence,...,Salsa,NuJazz,HipHop Beats,Modern Jazz,Turkish,Tango,Fado,Christmas,Instrumental,label
3761,3761,14795,0.041038,0.921934,0.654352,0.838161,0.112213,0.363014,130.009,0.035935,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5
4471,4471,17794,0.994946,0.292073,0.050897,0.110574,0.666249,0.050318,132.85,0.13336,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
2106,2106,7545,0.87332,0.242819,0.364811,0.782725,0.113607,0.059082,192.401,0.267495,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3


In [69]:
X['label'].value_counts()

1    1946
3    1690
4     895
0     865
5     462
2     142
Name: label, dtype: int64

In [70]:
recommendations = predict((X, birch), Y)

In [71]:
output = recommend(recommendations, metadata, Y)

In [72]:
genre_recommend_birch, artist_name_recommend_birch, mixed_birch = output[0], output[1], output[2]

In [73]:
genre_recommend_birch.shape

(3892, 4)

In [74]:
artist_name_recommend_birch.shape

(52, 4)

In [75]:
# Genre wise recommendations
genre_recommend_birch.head()

track_id,album_title,artist_name,genre,track_title
153,Arc and Sender,Arc and Sender,Rock,Hundred-Year Flood
154,Arc and Sender,Arc and Sender,Rock,Squares And Circles
155,unreleased demo,Arc and Sender,Rock,Maps of the Stars Homes
169,Boss of Goth,Argumentix,Rock,Boss of Goth
170,Nightmarcher,Argumentix,Rock,Industry Standard Massacre


In [76]:
# Artist wise recommendations
artist_name_recommend_birch.head()

track_id,album_title,artist_name,genre,track_title
34660,Zehu,51%,AvantGarde|International|Blues|Jazz|,Hadri Ha'Kat
34661,Zehu,51%,AvantGarde|International|Blues|Jazz|,Blender Tzivoni
34662,Zehu,51%,AvantGarde|International|Blues|Jazz|,Naniah
34663,Zehu,51%,AvantGarde|International|Blues|Jazz|,Yoter Miday
34664,Zehu,51%,AvantGarde|International|Blues|Jazz|,"Yamim, Lielot"


In [77]:
# Mixed Recommendations
mixed_birch.head()

track_id,album_title,artist_name,genre,track_title
7545,Live at WFMU on Liz Berg's Show on 9/29/2008,Antiguo Automata Mexicano,Electronic,Chez Nobody
1306,Electrified Being,Nicky Andrews,Electronic,Affective-2
19132,"Paul ""Wine"" Jones, T-Model Ford & Kenny Brown ...",T-Model Ford,Blues,I Love My Babe
3924,Live at WFMU on Scott's Show on 5/26/2000,Mink Lungs,Rock,Medley - Ultimate Slumber Party
11870,Wildahead Portibeast,Wildahead Portibeast,HipHop,What The Truth Should Be
