# Music Recommendation System
##### Author: Nikola Obradović
Description: "The music recommendation system recommends songs the user has not yet listened to, based on their listening history."

#### How the system works
The system uses data from the Million Song Dataset.
The data records how many times each user listened to each song, together with metadata that describes each song in more detail.

### Algorithm
We assume that a user did not like a song they listened to only once, and that they like a song they listened to more than once.
We want to recommend songs that the user has not listened to but that match their musical taste.

We split the data into 2 sets:
1. Which songs each user likes
2. Which songs each user does not like

We apply the NearestNeighbors algorithm to the first set to find 5 users with similar taste.
We collect the songs that these similar users like, compare them with the songs the user has already listened to, and recommend only the ones they have not listened to yet.
If fewer than 20 songs are recommended, we add as many of the most popular songs as needed to bring the total to 20.
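
The first steps above can be sketched in plain Python on a made-up listening history. The user names, the song names, the helper functions `split_likes` and `candidates`, and the hard-coded list of similar users are all hypothetical; in the notebook the similar users come from NearestNeighbors.

```python
# Hypothetical toy data: user -> {song: listen_count}
listens = {
    "ana":   {"song_a": 3, "song_b": 1},
    "boris": {"song_a": 2, "song_c": 5},
    "ceca":  {"song_a": 4, "song_d": 2, "song_b": 1},
}

def split_likes(history):
    """Apply the notebook's assumption: more than 1 listen = liked, exactly 1 = disliked."""
    liked = {song for song, count in history.items() if count > 1}
    disliked = {song for song, count in history.items() if count == 1}
    return liked, disliked

def candidates(target, similar_users, listens):
    """Songs liked by similar users that the target user has never listened to."""
    heard = set(listens[target])
    pool = set()
    for other in similar_users:
        liked, _ = split_likes(listens[other])
        pool |= liked
    return sorted(pool - heard)

# Assumed neighbours of "ana" (not computed here)
recs = candidates("ana", ["boris", "ceca"], listens)
print(recs)
```

Using sets makes the "not listened to yet" step a single set difference, and it also removes duplicates when several similar users like the same song.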

## Importing the libraries we use

In [50]:
import numpy as np
import pandas as pd
import sklearn
#from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

## Loading the data

We load the data from: https://static.turi.com/datasets/millionsong/10000.txt and https://static.turi.com/datasets/millionsong/song_data.csv

The first address holds the data as a table: user_id | song_id | listen_count

The second address holds the metadata that describes each song.

In [51]:
#Read userid-songid-listen_count triplets
#This step might take time to download data from external sources
triplets_file = 'https://static.turi.com/datasets/millionsong/10000.txt'
songs_metadata_file = 'https://static.turi.com/datasets/millionsong/song_data.csv'

song_df_1 = pd.read_table(triplets_file,header=None)
song_df_1.columns = ['user_id', 'song_id', 'listen_count']

#Read song  metadata
song_df_2 =  pd.read_csv(songs_metadata_file)

#Merge the two dataframes above to create input dataframe for recommender systems
song_df = pd.merge(song_df_1, song_df_2.drop_duplicates(['song_id']), on="song_id", how="left") 
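
The effect of `drop_duplicates(['song_id'])` before the left merge can be illustrated on two miniature tables. The frames `plays` and `meta` below are hypothetical stand-ins for the real files, and the duplicated `s1` row is an assumption made only to show what the de-duplication guards against.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
plays = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song_id": ["s1", "s2", "s1"],
    "listen_count": [1, 3, 2],
})
meta = pd.DataFrame({
    "song_id": ["s1", "s1", "s2"],          # s1 is listed twice on purpose
    "title": ["First", "First (dup)", "Second"],
    "artist_name": ["A", "A", "B"],
})

# Without de-duplication every play of s1 matches two metadata rows
naive = pd.merge(plays, meta, on="song_id", how="left")

# Keeping one metadata row per song_id preserves the number of listening records
clean = pd.merge(plays, meta.drop_duplicates(["song_id"]), on="song_id", how="left")

print(len(plays), len(naive), len(clean))
```

With `how="left"` every listening record is kept even when a song has no metadata; in that case the metadata columns are filled with NaN.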

We truncate the dataset so that the following steps do not take hours to run

In [52]:
song_df = song_df.head(10000)

We add a new column to the table, combining the artist name and the song title

In [53]:
#Merge song title and artist_name columns to make a merged column
song_df['song'] = song_df['artist_name'].map(str) + " - " + song_df['title']

In [54]:
song_df.head()

,user_id,song_id,listen_count,title,release,artist_name,year,song
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0,Jack Johnson - The Cove
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976,Paco De Lucia - Entre Dos Aguas
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,Stronger,Graduation,Kanye West,2007,Kanye West - Stronger
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,Constellations,In Between Dreams,Jack Johnson,2005,Jack Johnson - Constellations
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,Learn To Fly,There Is Nothing Left To Lose,Foo Fighters,1999,Foo Fighters - Learn To Fly


We rank the songs by popularity

In [55]:
# Most popular songs: 'count' gives the number of listening records (rows) per song,
# not the total number of plays

song_grouped = song_df.groupby(['song']).agg({'listen_count': 'count'}).reset_index()
grouped_sum = song_grouped['listen_count'].sum()
song_grouped['percentage'] = song_grouped['listen_count'].div(grouped_sum)*100
most_popular_songs = song_grouped.sort_values(['listen_count', 'song'], ascending=[False, True])
most_popular_songs

,song,listen_count,percentage
1994,Harmonia - Sehr kosmisch,45,0.45
515,Björk - Undo,32,0.32
1402,Dwight Yoakam - You're The One,32,0.32
1694,Florence + The Machine - Dog Days Are Over (Ra...,28,0.28
3429,OneRepublic - Secrets,28,0.28
988,Coldplay - The Scientist,27,0.27
2603,Kings Of Leon - Use Somebody,27,0.27
2598,Kings Of Leon - Revelry,26,0.26
874,Charttraxx Karaoke - Fireflies,24,0.24
374,Barry Tuckwell/Academy of St Martin-in-the-Fie...,23,0.23
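
The ranking above uses the `'count'` aggregation, so a song's popularity is the number of rows it has rather than the sum of its plays. The sketch below contrasts the two on made-up data; the column names `listeners` and `total_plays` are hypothetical, and whether counting rows or summing plays is the better popularity measure is a design choice the notebook does not discuss.

```python
import pandas as pd

# Hypothetical data: song X has three casual listeners, song Y one heavy listener
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1"],
    "song": ["X", "X", "X", "Y"],
    "listen_count": [1, 1, 1, 10],
})

popularity = (
    toy.groupby("song")["listen_count"]
       .agg(listeners="count", total_plays="sum")
       .reset_index()
)

by_listeners = popularity.sort_values(["listeners", "song"], ascending=[False, True])
by_plays = popularity.sort_values(["total_plays", "song"], ascending=[False, True])

print(by_listeners["song"].tolist(), by_plays["song"].tolist())
```

Counting rows favours songs that reach many users, while summing plays can be dominated by a single user who repeats one song many times.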


### Splitting the data into two sets

#### Set 1
Records of which user listened to which song more than once

#### Set 2
Records of which user listened to which song only once

In [56]:
# Set 1: songs that users listened to more than once (assumed liked)
skup1 = song_df.loc[song_df['listen_count'] > 1]
print(len(skup1))

# Set 2: songs that users listened to exactly once (assumed disliked)
skup2 = song_df.loc[song_df['listen_count'] == 1]
print(len(skup2))

len(song_df)

4349
5651


10000

In [57]:
skup1.head()

,user_id,song_id,listen_count,title,release,artist_name,year,song
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976,Paco De Lucia - Entre Dos Aguas
5,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODDNQT12A6D4F5F7E,5,Apuesta Por El Rock 'N' Roll,Antología Audiovisual,Héroes del Silencio,2007,Héroes del Silencio - Apuesta Por El Rock 'N' ...
11,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOIZAZL12A6701C53B,5,I'll Be Missing You (Featuring Faith Evans & 1...,No Way Out,Puff Daddy,0,Puff Daddy - I'll Be Missing You (Featuring Fa...
14,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOKRIMP12A6D4F5DA3,5,I?'m A Steady Rollin? Man,Diggin' Deeper Volume 7,Robert Johnson,0,Robert Johnson - I?'m A Steady Rollin? Man
16,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOMGIYR12AB0187973,6,Behind The Sea [Live In Chicago],Live In Chicago,Panic At The Disco,0,Panic At The Disco - Behind The Sea [Live In C...


In [58]:
skup2.head()

,user_id,song_id,listen_count,title,release,artist_name,year,song
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0,Jack Johnson - The Cove
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,Stronger,Graduation,Kanye West,2007,Kanye West - Stronger
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,Constellations,In Between Dreams,Jack Johnson,2005,Jack Johnson - Constellations
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,Learn To Fly,There Is Nothing Left To Lose,Foo Fighters,1999,Foo Fighters - Learn To Fly
6,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODXRTY12AB0180F3B,1,Paper Gangsta,The Fame Monster,Lady GaGa,2008,Lady GaGa - Paper Gangsta


### Building the recommendation matrix
We recommend to a user the songs liked by other users who like the same songs as that user.

In [59]:
# Building a Utility matrix

dataframe = skup1.pivot_table(values = 'listen_count', index = 'user_id', columns = 'song', fill_value=0)
dataframe

song,+ / - {Plus/Minus} - The Queen of Nothing,112 - Only You-Bad Boy Remix (Featuring The Notorious B.I.G. & Mase) (Album Version),16Volt - Machine Kit,1990s - Pollockshields,2Pac - Starin' Through My Rear View,3 Doors Down - Here Without You,3 Doors Down - Kryptonite,3 Doors Down - When I'm Gone,3OH!3 - My First Kiss (Feat. Ke$ha) [Album Version],3OH!3 - STARSTRUKK [FEATURINGKATYPERRY] (Explicit Bonus Version),...,matchbox twenty - Unwell (Album Version),the bird and the bee - Baby,the bird and the bee - Diamond Dave,the bird and the bee - Love Letter To Japan,the bird and the bee - Meteor,the bird and the bee - My Love,the bird and the bee - Polite Dance Song,the bird and the bee - Ray Gun,the bird and the bee - What's In The Middle,the bird and the bee - You're A Cad
user_id,,,,,,,,,,,,,,,,,,,,,
0152fcbd02b172a874c75a57a913f0f0109ba272,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
01655ae6bc52e29c9cd100a7dde4e9eeae5e4031,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
03c90bfd09151973863c4cadd5a749cd7982abc0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0560337e6a33be7149c9568b5fde5788294fe101,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
06b31818386e598017a475f8e349b3ca31ba3178,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
07caa920795cd4f20bfeeb0e192a5ddd9566ecdd,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
08c129083a44492415e40b70d8f90755e15f4a91,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0a00498b9d607844a8826184ae7278097d1c008a,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0a004c08b700e4edb74b44c2dbceca4280760a9a,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0afaa5d9d04bf85af720fe8cc566a41ca3e41c97,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
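
A small sketch of the same pivot on made-up data shows one consequence of building the matrix from the first set only: a user who never played any song more than once gets no row, and a song nobody played more than once gets no column. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical listening records
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "song": ["X", "Y", "X", "Z"],
    "listen_count": [3, 1, 2, 1],
})

# Plays the role of skup1: only records with more than one listen
liked = toy.loc[toy["listen_count"] > 1]

# Rows are users, columns are songs, missing pairs become 0
matrix = liked.pivot_table(values="listen_count", index="user_id",
                           columns="song", fill_value=0)

print(matrix)
print("u3" in matrix.index)
```

`pivot_table` aggregates with the mean by default; as long as each user-song pair occurs once in the input, the cell is simply that pair's listen count.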


We use the NearestNeighbors algorithm to find similar users

In [60]:
nbrs = NearestNeighbors(n_neighbors=5).fit(dataframe)

In [61]:
user = dataframe.iloc[[50]]
user

song,+ / - {Plus/Minus} - The Queen of Nothing,112 - Only You-Bad Boy Remix (Featuring The Notorious B.I.G. & Mase) (Album Version),16Volt - Machine Kit,1990s - Pollockshields,2Pac - Starin' Through My Rear View,3 Doors Down - Here Without You,3 Doors Down - Kryptonite,3 Doors Down - When I'm Gone,3OH!3 - My First Kiss (Feat. Ke$ha) [Album Version],3OH!3 - STARSTRUKK [FEATURINGKATYPERRY] (Explicit Bonus Version),...,matchbox twenty - Unwell (Album Version),the bird and the bee - Baby,the bird and the bee - Diamond Dave,the bird and the bee - Love Letter To Japan,the bird and the bee - Meteor,the bird and the bee - My Love,the bird and the bee - Polite Dance Song,the bird and the bee - Ray Gun,the bird and the bee - What's In The Middle,the bird and the bee - You're A Cad
user_id,,,,,,,,,,,,,,,,,,,,,
28b232e7ecb32c47c05b795a017786d4be96ef7e,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [62]:
similar_users = nbrs.kneighbors(user)
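
`kneighbors` returns a `(distances, indices)` tuple. Because the queried user is one of the rows the model was fitted on, that user normally comes back as their own nearest neighbour at distance 0, so `n_neighbors=5` leaves four other users. The sketch below, on a hypothetical 4 x 3 matrix, asks for one extra neighbour and drops the query row; the variable names are illustrative only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical utility matrix: 4 users x 3 songs
X = np.array([
    [5, 0, 0],
    [4, 0, 0],
    [0, 3, 0],
    [0, 0, 2],
])

k = 2                                    # number of *other* users wanted
query_row = 0                            # position of the target user in X

# Ask for k + 1 neighbours because the query row itself is in the fitted data
nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, indices = nbrs.kneighbors(X[[query_row]])

# Drop the query row and keep the k closest remaining users
others = [int(i) for i in indices[0] if i != query_row][:k]
print(distances[0], indices[0], others)
```

The default metric is Euclidean distance on the raw counts, so users with large listen counts can look far apart even when they like the same songs.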

In [63]:
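# similar_users is a (distances, indices) tuple; take the row positions of the neighbours
indexi = list(similar_users[1])
preporuke = []
for user_index in indexi[0]:
    userTest = dataframe.iloc[[user_index]]
    # Collect every song this neighbour listened to more than once
    for val in list(userTest.columns.values):
        if(userTest.iloc[0][val] > 0):
            preporuke.append(val)
set(preporuke)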

{"Ashra - Don't Trust The Kids",
 'Blackalicious - Side To Side (Featuring Lateef & Pigeon John) (Album Version)',
 'Damian Marley - Welcome To Jamrock',
 'Diana Ross - Touch Me In The Morning',
 'Flogging Molly - Drunken Lullabies',
 'Gino Fioravanti_ Gianluigi Toso - Il tempio interiore',
 'Harmonia - Sehr kosmisch',
 'Ignition - D.O.A. Part 2',
 'Metric - Gold Guns Girls',
 'Metric - Grow Up and Blow Away',
 'Metric - Wet Blanket',
 "Minnie Riperton - Lovin' You (1993 Digital Remaster)",
 'Notes From Underground - Be Like You',
 'Red Hot Chili Peppers - Fortune Faded (Album Version)',
 'Shotta - El Justiciero',
 'The Rolling Stones - Angie (1993 Digital Remaster)'}
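
The set above still contains songs that the target user already likes, because the user is among their own neighbours. A minimal vectorised sketch of the same collection step that leaves those songs out is shown below, on a hypothetical matrix; `target_pos` and `neighbour_pos` stand in for the positions returned by `kneighbors`.

```python
import pandas as pd

# Hypothetical utility matrix (rows: users, columns: songs)
matrix = pd.DataFrame(
    [[3, 0, 0, 0],
     [2, 4, 0, 0],
     [5, 0, 2, 0],
     [0, 0, 0, 6]],
    index=["u1", "u2", "u3", "u4"],
    columns=["A", "B", "C", "D"],
)

target_pos = 0                 # positional index of the target user
neighbour_pos = [0, 1, 2]      # assumed output of kneighbors, target included

# Songs with a positive value for at least one neighbour
liked_by_neighbours = (matrix.iloc[neighbour_pos] > 0).any(axis=0)
# Songs the target user already has in the matrix
already_liked = matrix.iloc[target_pos] > 0

mask = (liked_by_neighbours & ~already_liked).to_numpy()
new_songs = matrix.columns[mask].tolist()
print(new_songs)
```

This only excludes songs the user listened to more than once; songs played exactly once are not in the matrix and still have to be filtered against the second set.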

In [64]:
# If there are fewer than 20 recommendations, add the most popular songs


brojPreporuka = len(preporuke)

if(brojPreporuka < 20):
    most_popular_songs_list = list(most_popular_songs['song'].head(20-brojPreporuka))
    
    for song in most_popular_songs_list:
        preporuke.append(song)

preporuke

["Ashra - Don't Trust The Kids",
 'Blackalicious - Side To Side (Featuring Lateef & Pigeon John) (Album Version)',
 'Damian Marley - Welcome To Jamrock',
 'Diana Ross - Touch Me In The Morning',
 'Flogging Molly - Drunken Lullabies',
 'Gino Fioravanti_ Gianluigi Toso - Il tempio interiore',
 'Ignition - D.O.A. Part 2',
 'Metric - Gold Guns Girls',
 'Metric - Wet Blanket',
 'Notes From Underground - Be Like You',
 'Red Hot Chili Peppers - Fortune Faded (Album Version)',
 'Shotta - El Justiciero',
 'Harmonia - Sehr kosmisch',
 "Minnie Riperton - Lovin' You (1993 Digital Remaster)",
 'The Rolling Stones - Angie (1993 Digital Remaster)',
 'Metric - Grow Up and Blow Away',
 'Harmonia - Sehr kosmisch',
 'Björk - Undo',
 "Dwight Yoakam - You're The One",
 'Florence + The Machine - Dog Days Are Over (Radio Edit)']

In [65]:
# Which songs does the user like?

userLikes = []
for val in list(user.columns.values):
    if(user.iloc[0][val] > 0):
        userLikes.append(val)
userLikes

["Ashra - Don't Trust The Kids",
 'Blackalicious - Side To Side (Featuring Lateef & Pigeon John) (Album Version)',
 'Damian Marley - Welcome To Jamrock',
 'Diana Ross - Touch Me In The Morning',
 'Flogging Molly - Drunken Lullabies',
 'Gino Fioravanti_ Gianluigi Toso - Il tempio interiore',
 'Ignition - D.O.A. Part 2',
 'Metric - Gold Guns Girls',
 'Metric - Wet Blanket',
 'Notes From Underground - Be Like You',
 'Red Hot Chili Peppers - Fortune Faded (Album Version)',
 'Shotta - El Justiciero']

In [66]:
# Which songs does the user not like?

userDislikeTest = dataframe.iloc[[50]]
userDislikeTestIndex = userDislikeTest.index[0]
userDislikes = list(skup2.loc[skup2['user_id'] == userDislikeTestIndex]['song'])

userDislikes

['Atmosphere - Godlovesugly',
 'Red Hot Chili Peppers - Easily (Album Version)',
 'Metric - Grow Up and Blow Away',
 'Red Hot Chili Peppers - Dosed (Album Version)',
 'Red Hot Chili Peppers - Death Of A Martian (Album Version)',
 'Red Hot Chili Peppers - Especially In Michigan (Album Version)',
 "Red Hot Chili Peppers - C'mon Girl (Album Version)",
 'Red Hot Chili Peppers - Hard To Concentrate (Album Version)',
 'Red Hot Chili Peppers - Californication (Album Version)',
 'Metric - Collect Call',
 'Atmosphere - Trying To Find A Balance',
 'Angels and Airwaves - The Gift',
 'Rancid - Fall Back Down (Album Version)',
 'Sublime - Right Back',
 'Kari Hansa & Gregers Hes - Se Dagen Kom']

In [67]:
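# Drop recommendations the user listened to only once (assumed disliked).
# Loop over a copy: removing from a list while iterating over it skips elements.
for s1 in list(preporuke):
    if s1 in userDislikes:
        print('user does not like the song ', s1)
        preporuke.remove(s1)
len(preporuke)

user does not like the song  Metric - Grow Up and Blow Away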


19

In [68]:
# If there are fewer than 20 recommendations, add the most popular songs

brojPreporuka = len(preporuke)

if(brojPreporuka < 20):
    most_popular_songs_list = list(most_popular_songs['song'].head(20-brojPreporuka))
    
    for song in most_popular_songs_list:
        preporuke.append(song)

preporuke

["Ashra - Don't Trust The Kids",
 'Blackalicious - Side To Side (Featuring Lateef & Pigeon John) (Album Version)',
 'Damian Marley - Welcome To Jamrock',
 'Diana Ross - Touch Me In The Morning',
 'Flogging Molly - Drunken Lullabies',
 'Gino Fioravanti_ Gianluigi Toso - Il tempio interiore',
 'Ignition - D.O.A. Part 2',
 'Metric - Gold Guns Girls',
 'Metric - Wet Blanket',
 'Notes From Underground - Be Like You',
 'Red Hot Chili Peppers - Fortune Faded (Album Version)',
 'Shotta - El Justiciero',
 'Harmonia - Sehr kosmisch',
 "Minnie Riperton - Lovin' You (1993 Digital Remaster)",
 'The Rolling Stones - Angie (1993 Digital Remaster)',
 'Harmonia - Sehr kosmisch',
 'Björk - Undo',
 "Dwight Yoakam - You're The One",
 'Florence + The Machine - Dog Days Are Over (Radio Edit)',
 'Harmonia - Sehr kosmisch']
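
In the final list above 'Harmonia - Sehr kosmisch' appears three times, because `head(20 - brojPreporuka)` always takes songs from the top of the popularity table regardless of what is already in the list. A possible way to pad the list without repeats is sketched below; the function `top_up`, its parameters, and the sample inputs are hypothetical and not part of the notebook.

```python
def top_up(recommendations, popular, exclude=(), size=20):
    """Return up to `size` unique songs: recommendations first, then popular ones.

    Songs listed in `exclude` (for example, the user's disliked songs) are skipped.
    """
    blocked = set(exclude)
    result = []
    for song in list(recommendations) + list(popular):
        if len(result) == size:
            break
        if song not in blocked and song not in result:
            result.append(song)
    return result

# Hypothetical inputs
recs = ["A", "B", "B"]
popular = ["B", "C", "D", "E"]
final = top_up(recs, popular, exclude={"C"}, size=4)
print(final)
```

In the notebook this could be called with `preporuke`, `most_popular_songs['song']`, and the songs the user has already listened to, which would replace both padding cells and the removal loop with a single step.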