# Music Recommendation System
This music recommendation system makes `Content Based` recommendations.

In the content-based approach, the focus is on the songs' lyrics.

In [1]:
import pandas as pd
import numpy as np

In [2]:
musics = pd.read_csv('spotify_millsongdata.csv')
musics.head()

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \r\nA..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \r\nTouch me gen..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \r\nWhy I had...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...


In [3]:
musics.shape

(57650, 4)

Null values are checked.

In [4]:
musics.isnull().sum()

artist    0
song      0
link      0
text      0
dtype: int64

Fields that the recommender will not use are dropped from the dataset.

In our application, showing results quickly matters as much as showing the right ones. For this reason, the dataset is reduced to a random sample of 5000 rows.

In [5]:
musics = musics.sample(5000).drop('link',axis = 1).reset_index(drop=True)
musics

Unnamed: 0,artist,song,text
0,Kenny Chesney,She Don't Know She's Beautiful,We go out to a party somewhere \r\nThe moment...
1,Leo Sayer,You Make Me Feel Like Dancing,You've got a cute way of talking \r\nYou got ...
2,Roxy Music,End Of The Line,Think I'll walk out in the rain \r\nCalled yo...
3,Enya,Only If...,"When there's a shadow near, reach for the sun ..."
4,Eminem,Business,"Marshall, sounds like an S.O.S. \r\nHoly whac..."
...,...,...,...
4995,Warren Zevon,Model Citizen,Don't bring the milk in \r\nLeave it on the p...
4996,Kenny Loggins,Blue On Blue,Found your picture in an old coat \r\nAnd the...
4997,The White Stripes,Screwdriver,Tuesday mornin' now \r\nI gotta have somewher...
4998,Guns N' Roses,One In A Million,Yes I needed some time to get away \r\nI need...


In [6]:
musics.shape

(5000, 3)

## Text Preprocessing
The text in the dataset is very long and contains unnecessary characters. It needs to be cleaned.

In [7]:
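# Lowercase the lyrics and replace "\n" with a space. The first replace() has no regex=True,
# so it only matches a cell equal to that literal string and changes nothing here.
musics['text'] = musics['text'].str.lower().replace(r'^\w\s','').replace(r'\n',' ',regex = True)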
musics

Unnamed: 0,artist,song,text
0,Kenny Chesney,She Don't Know She's Beautiful,we go out to a party somewhere \r the moment ...
1,Leo Sayer,You Make Me Feel Like Dancing,you've got a cute way of talking \r you got t...
2,Roxy Music,End Of The Line,think i'll walk out in the rain \r called you...
3,Enya,Only If...,"when there's a shadow near, reach for the sun ..."
4,Eminem,Business,"marshall, sounds like an s.o.s. \r holy whack..."
...,...,...,...
4995,Warren Zevon,Model Citizen,don't bring the milk in \r leave it on the po...
4996,Kenny Loggins,Blue On Blue,found your picture in an old coat \r and the ...
4997,The White Stripes,Screwdriver,tuesday mornin' now \r i gotta have somewhere...
4998,Guns N' Roses,One In A Million,yes i needed some time to get away \r i neede...
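
The output above still contains `\r` and punctuation, because `Series.replace` without `regex=True` only matches whole cell values. A minimal sketch of a stricter cleanup follows, assuming the intent was to lowercase the lyrics, drop punctuation, and collapse line breaks. The helper name `clean_lyrics` and the sample row are hypothetical and not part of the notebook.

```python
import pandas as pd

def clean_lyrics(series: pd.Series) -> pd.Series:
    """Lowercase lyrics, strip punctuation, and collapse whitespace."""
    return (
        series.str.lower()
        # Remove every character that is neither a word character nor whitespace.
        .str.replace(r"[^\w\s]", "", regex=True)
        # Collapse \r\n and runs of spaces into a single space.
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )

sample = pd.Series(["Take it easy with me, please  \r\nTouch me gently"])
cleaned = clean_lyrics(sample)
print(cleaned[0])  # take it easy with me please touch me gently
```

Note that this variant also removes apostrophes, so "don't" becomes "dont", which changes how such words are tokenized later.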


* The text is converted to tokens.
* Vectors are derived from the tokens.
* One of three techniques is used: TF-IDF, Bag of Words, Word2Vec

During tokenization, similar words matter a great deal: they should be represented by the same number.
A stemmer is used to reduce them to a common form.
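
As a small illustration of why stemming helps, the sketch below runs NLTK's `PorterStemmer` over a few inflections of one word. The word list is a hypothetical example chosen for this note.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# Inflected forms that should count as the same term in the vocabulary.
words = ["dance", "dances", "danced", "dancing"]
stems = [stemmer.stem(w) for w in words]

# All four forms are reduced to one shared stem.
print(dict(zip(words, stems)))
```

Because every form collapses to one stem, the vectorizer sees a single vocabulary entry instead of four.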

Vectorization represents all of the textual data as vectors.

The distance between vectors is computed, and songs are recommended according to that distance.
* Euclidean distance is not used.
* Cosine similarity will be used.
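
One common reason to prefer cosine similarity for text is that it compares direction rather than length. The sketch below uses hypothetical raw word-count vectors (not taken from the dataset) to show a case where Euclidean distance and cosine similarity disagree.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical word counts over a three-word vocabulary.
short_song = np.array([1.0, 2.0, 0.0])
long_song = 10 * short_song            # same word proportions, ten times longer
other_song = np.array([0.0, 1.0, 3.0]) # different word proportions

# Euclidean distance treats the longer song as far away.
print(np.linalg.norm(short_song - long_song))
print(np.linalg.norm(short_song - other_song))

# Cosine similarity ignores length and compares word proportions only.
print(cosine_sim(short_song, long_song))
print(cosine_sim(short_song, other_song))
```

With these vectors, Euclidean distance ranks `other_song` closer to `short_song`, while cosine similarity ranks `long_song` as the best match.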

## Tokenization
Sentences are split into tokens.

In [8]:
import nltk 
from nltk.stem.porter import PorterStemmer
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\emine\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [9]:
stemmer = PorterStemmer()

In [10]:
def token(txt):
    # Split the text into word tokens, stem each one, and rejoin with spaces.
    words = nltk.word_tokenize(txt)
    tokens = [stemmer.stem(w) for w in words]
    return " ".join(tokens)

In [11]:
token('you are beatiful')

'you are beati'

The whole dataset is tokenized.

In [12]:
# The stemmed text is only displayed here; it is not assigned back to musics['text'].
musics['text'].apply(lambda x:token(x))

0       we go out to a parti somewher the moment we wa...
1       you 've got a cute way of talk you got the bet...
2       think i 'll walk out in the rain call you time...
3       when there 's a shadow near , reach for the su...
4       marshal , sound like an s.o. . holi whack unly...
                              ...                        
4995    do n't bring the milk in leav it on the porch ...
4996    found your pictur in an old coat and the ghost...
4997    tuesday mornin ' now i got ta have somewher to...
4998    ye i need some time to get away i need some pe...
4999    freedom , freedom get up stand up , let 's cel...
Name: text, Length: 5000, dtype: object

## Vectorization 
A number is assigned to each token, and the sentences are turned into vectors.

The TF-IDF technique is used.
* TfidfVectorizer

cosine_similarity is used to compute the distance.
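
To make the two steps concrete before applying them to the lyrics, here is a minimal sketch on a hypothetical three-document corpus. It uses the same `TfidfVectorizer` settings as the notebook; the documents and variable names are illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini corpus: the first two documents share vocabulary.
docs = [
    "dance with me tonight",
    "dance all night tonight",
    "rain falls on the river",
]

vectorizer = TfidfVectorizer(analyzer="word", stop_words="english")
tfidf_matrix = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# Pairwise cosine similarity between all rows.
sim = cosine_similarity(tfidf_matrix)
print(sim.shape)      # (3, 3)
print(sim.round(2))
```

Each row of `sim` holds one document's similarity to every other document, which is the structure the recommender later indexes into.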

In [13]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [14]:
tfid = TfidfVectorizer(analyzer= 'word',stop_words = 'english')

In [15]:
matrix = tfid.fit_transform(musics['text'])

In [16]:
smiler = cosine_similarity(matrix)

In [17]:
smiler[0]

array([1.        , 0.02062241, 0.08004748, ..., 0.01386358, 0.08565929,
       0.05221624])

## Recommender

In [18]:
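def recommender(song_name):
    # Row index of the first song whose title matches exactly.
    idx =  musics[musics['song'] == song_name].index[0]
    # (row, similarity) pairs sorted from most to least similar.
    distance = sorted(list(enumerate(smiler[idx])),reverse=True, key=lambda x:x[1])
    song = []
    # Skip position 0 (usually the query song itself) and keep the next 19 titles.
    for s_id in distance[1:20]:
        song.append(musics.iloc[s_id[0]].song)
    return song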

In [21]:
songs = recommender("You Make Me Feel Like Dancing")

In [22]:
songs

['Daily Disco',
 'Jupiter Spin',
 'Dancing In The Dark',
 'Back Street Joe',
 'I Could Sing Of Your Love Forever',
 'Cheek To Cheek',
 'Cheek To Cheek',
 'Cheek To Cheek',
 'Cheek To Cheek',
 "John, I'm Only Dancing",
 'Like A Machine',
 'Do You Wanna Dance?',
 'We Own The Night',
 'Why',
 'I Feel Like Dying',
 'Only Girl',
 'Nothing To Fear',
 'I Wanna Dance With Somebody',
 'Feel You All Over']
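
The list above repeats 'Cheek To Cheek' four times, since different artists share that title, and the function raises an error for a title that is not in the sample. A possible variant is sketched below under stated assumptions: the DataFrame has a default integer index (as after `reset_index(drop=True)`), and the name `recommend_top_n` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def recommend_top_n(df, similarity, song_name, n=3):
    """Return up to n (artist, song) pairs most similar to song_name."""
    matches = df.index[df["song"] == song_name]
    if len(matches) == 0:
        return []  # unknown title: return nothing instead of raising
    idx = matches[0]
    # Row positions sorted from most to least similar.
    order = np.argsort(similarity[idx])[::-1]
    results, seen = [], {song_name}
    for i in order:
        if i == idx:
            continue  # skip the query song itself
        title = df.loc[i, "song"]
        if title in seen:
            continue  # skip repeated titles
        seen.add(title)
        results.append((df.loc[i, "artist"], title))
        if len(results) == n:
            break
    return results

# Toy data standing in for the lyrics DataFrame and similarity matrix.
toy = pd.DataFrame({
    "artist": ["A", "B", "C", "D"],
    "song": ["One", "Two", "Two", "Three"],
})
toy_sim = np.array([
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])
print(recommend_top_n(toy, toy_sim, "One"))  # [('B', 'Two'), ('D', 'Three')]
```

Returning the artist alongside the title also avoids a later lookup by title alone, which cannot tell apart songs that share a name.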

In [20]:
musics

Unnamed: 0,artist,song,text
0,Kenny Chesney,She Don't Know She's Beautiful,we go out to a party somewhere \r the moment ...
1,Leo Sayer,You Make Me Feel Like Dancing,you've got a cute way of talking \r you got t...
2,Roxy Music,End Of The Line,think i'll walk out in the rain \r called you...
3,Enya,Only If...,"when there's a shadow near, reach for the sun ..."
4,Eminem,Business,"marshall, sounds like an s.o.s. \r holy whack..."
...,...,...,...
4995,Warren Zevon,Model Citizen,don't bring the milk in \r leave it on the po...
4996,Kenny Loggins,Blue On Blue,found your picture in an old coat \r and the ...
4997,The White Stripes,Screwdriver,tuesday mornin' now \r i gotta have somewhere...
4998,Guns N' Roses,One In A Million,yes i needed some time to get away \r i neede...


In [102]:
songs = musics[musics['song'].isin(songs)][['artist', 'song']]

In [103]:
songs

Unnamed: 0,artist,song
340,Peter Cetera,Only Love Knows Why
664,NOFX,American Errorist
872,Hanson,I Am
1368,Christina Aguilera,Make The World Move
1613,Offspring,Cool To Hate
2284,Alphaville,Spirit Of The Age
2314,Kris Kristofferson,Jesus Was A Capricorn
2391,Beautiful South,Bell-Bottomed Tear
3131,Dusty Springfield,Another Night
3174,Unseen,No Master Race


The result is saved to a JSON file.

In [104]:
songs.to_json("songs.json")
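
By default, `DataFrame.to_json` writes a column-oriented object keyed by index. If the consumer of the file expects a list of song objects instead, `orient="records"` is one option. The sketch below shows the difference on a hypothetical one-row frame and returns a string instead of writing a file.

```python
import json
import pandas as pd

# Hypothetical one-row frame with the same columns as `songs`.
picks = pd.DataFrame({"artist": ["ABBA"], "song": ["Bang"]})

# With no path argument, to_json returns the JSON text.
records_json = picks.to_json(orient="records")
print(records_json)

# The text parses back into a list with one dict per row.
parsed = json.loads(records_json)
```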