# Music Recommendation System
This music recommendation system makes `Content Based` recommendations.

The content-based recommender here focuses on the lyrics of the songs.

In [61]:
import pandas as pd
import numpy as np

In [62]:
musics = pd.read_csv('spotify_millsongdata.csv')
musics.head()

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \r\nA..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \r\nTouch me gen..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \r\nWhy I had...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...


In [63]:
musics.shape

(57650, 4)

Null values are checked.

In [64]:
musics.isnull().sum()

artist    0
song      0
link      0
text      0
dtype: int64

Fields that the recommender will not use are dropped from the dataset.

In our application, showing results quickly matters as much as their accuracy. For this reason the dataset is reduced to a random sample of 5000 songs.

In [65]:
musics = musics.sample(5000).drop('link',axis = 1).reset_index(drop=True)
musics

Unnamed: 0,artist,song,text
0,Cyndi Lauper,A Part Hate,Somber sister \r\nThis is a strange and bitte...
1,Cat Stevens,Randy,"Oh randy, if they knew. \r\nI think they'd ta..."
2,Pat Benatar,Tell Me,Why do I have these thoughts go through my hea...
3,Ziggy Marley,Head Top,"The session is nice, feel real good \r\nThe s..."
4,Grateful Dead,Hey Jude,Hey Jude don't make it bad \r\nTake a sad son...
...,...,...,...
4995,Christina Perri,Backwards,Take me backwards \r\nTurn me around \r\nCan...
4996,John Prine,Everybody Wants To Feel Like You,While out sailing on the ocean \r\nWhile out ...
4997,Halloween,Witches' Brew,"By Hap Palmer \r\n \r\nDead leaves, seaweed,..."
4998,Allman Brothers Band,Dreams,"Just one more mornin, I had to wake up with th..."


In [66]:
musics.shape

(5000, 3)

## Text Preprocessing
The `text` column in the dataset is very long and contains unnecessary characters, so it needs to be cleaned.

In [67]:
# Lowercase the lyrics and replace newline characters with spaces
musics['text'] = musics['text'].str.lower().replace(r'\n', ' ', regex=True)
musics

Unnamed: 0,artist,song,text
0,Cyndi Lauper,A Part Hate,somber sister \r this is a strange and bitter...
1,Cat Stevens,Randy,"oh randy, if they knew. \r i think they'd tak..."
2,Pat Benatar,Tell Me,why do i have these thoughts go through my hea...
3,Ziggy Marley,Head Top,"the session is nice, feel real good \r the se..."
4,Grateful Dead,Hey Jude,hey jude don't make it bad \r take a sad song...
...,...,...,...
4995,Christina Perri,Backwards,take me backwards \r turn me around \r canno...
4996,John Prine,Everybody Wants To Feel Like You,while out sailing on the ocean \r while out s...
4997,Halloween,Witches' Brew,"by hap palmer \r \r dead leaves, seaweed, r..."
4998,Allman Brothers Band,Dreams,"just one more mornin, i had to wake up with th..."
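
The output above still contains `\r` characters and punctuation. As a minimal sketch of a more thorough cleaning step, assuming punctuation carries no useful signal for the recommender, the hypothetical helper `clean_lyrics` below (not part of the notebook) uses only the standard `re` module:

```python
import re

def clean_lyrics(text):
    """Hypothetical helper: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    # Replace anything that is not a word character or whitespace with a space
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace (including \r and \n) into single spaces
    return re.sub(r"\s+", " ", text).strip()

sample = "Take it easy with me, please  \r\nTouch me gently"
print(clean_lyrics(sample))
```

Note that this also splits contractions (`it's` becomes `it s`); whether that is acceptable depends on the tokenizer used afterwards.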


* The text is converted into tokens.
* Vectors are derived from the tokens.
* One of three techniques is used: TF-IDF, Bag of Words, or Word2Vec.

During tokenization, similar words matter a great deal: they need to be represented by the same number.
A stemmer is used to reduce them to a common form.

Vectorization represents all of the textual data as vectors.

The distance between each pair of vectors is computed, and songs are recommended according to that distance.
* Euclidean distance is not used.
* Cosine similarity is used.
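
The notebook does not say why Euclidean distance is avoided. One common reason, offered here as an assumption, is that Euclidean distance grows with document length, whereas cosine similarity compares only the direction of the vectors. The sketch below uses made-up word-count vectors (`short_song`, `long_song`, `other_song`) and plain-Python helpers that are not part of the notebook:

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical word counts over a three-word vocabulary
short_song = [1, 2, 0]
long_song = [10, 20, 0]   # same word proportions, ten times longer
other_song = [0, 1, 2]    # different word proportions

print("cosine    short vs long :", round(cosine_sim(short_song, long_song), 3))
print("cosine    short vs other:", round(cosine_sim(short_song, other_song), 3))
print("euclidean short vs long :", round(euclidean(short_song, long_song), 3))
print("euclidean short vs other:", round(euclidean(short_song, other_song), 3))
```

In this toy case, Euclidean distance places `short_song` closer to `other_song` than to `long_song`, while cosine similarity treats `short_song` and `long_song` as pointing in the same direction.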

## Tokenization
Sentences are split into tokens.

In [68]:
import nltk 
from nltk.stem.porter import PorterStemmer
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\emine\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [69]:
stemmer = PorterStemmer()

In [70]:
def token(txt):
    # Split the text into word tokens, then reduce each token to its stem
    words = nltk.word_tokenize(txt)
    stems = [stemmer.stem(w) for w in words]
    return " ".join(stems)

In [71]:
token('you are beatiful')

'you are beati'

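The tokenizer is applied to the whole dataset. The result is only displayed here; it is not assigned back to `musics['text']`, so the cells below work on the unstemmed text.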

In [72]:
musics['text'].apply(lambda x:token(x))

0       somber sister thi is a strang and bitter fruit...
1       oh randi , if they knew . i think they 'd take...
2       whi do i have these thought go through my head...
3       the session is nice , feel real good the set i...
4       hey jude do n't make it bad take a sad song an...
                              ...                        
4995    take me backward turn me around can not find m...
4996    while out sail on the ocean while out sail on ...
4997    by hap palmer dead leav , seawe , rotten egg ,...
4998    just one more mornin , i had to wake up with t...
4999    it cut both way our love is like knife that cu...
Name: text, Length: 5000, dtype: object

## Vectorization 
Each token is mapped to a number, and the sentences are turned into vectors.

The TF-IDF technique is used.
* `TfidfVectorizer`

`cosine_similarity` is used to measure how close two vectors are.
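
To make the weighting concrete, here is a small hand-rolled TF-IDF sketch in plain Python. It is an illustration only, not the notebook's method: the toy `docs`, the helper `tfidf_vectors`, and the whitespace tokenization are assumptions, and the smoothed idf formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization is one common variant.

```python
import math
from collections import Counter

# Toy "lyrics" (hypothetical data)
docs = ["love me love", "hate me", "love you"]

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for whitespace-tokenized docs."""
    tokenized = [d.split() for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n = len(docs)
    # Document frequency: in how many documents each word appears
    df = {w: sum(w in doc for doc in tokenized) for w in vocab}
    # Smoothed inverse document frequency: rarer words get larger weights
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vec = [counts[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec])
    return vocab, vectors

vocab, vectors = tfidf_vectors(docs)
print(vocab)
for doc, vec in zip(docs, vectors):
    print(doc, [round(v, 3) for v in vec])
```

In the second toy document, `hate` (which appears in one document) receives a larger weight than `me` (which appears in two), which is the effect TF-IDF is meant to produce.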

In [73]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [74]:
tfid = TfidfVectorizer(analyzer= 'word',stop_words = 'english')

In [75]:
matrix = tfid.fit_transform(musics['text'])

In [76]:
smiler = cosine_similarity(matrix)

In [77]:
smiler[0]

array([1.        , 0.01494966, 0.00973464, ..., 0.        , 0.01250236,
       0.03270661])

## Recommender

In [98]:
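def recommender(song_name):
    # Row index of the first song whose title matches exactly
    idx = musics[musics['song'] == song_name].index[0]
    # Pair each row index with its similarity to the query, most similar first
    distance = sorted(enumerate(smiler[idx]), key=lambda x: x[1], reverse=True)
    # Skip the first entry (normally the query song itself) and keep the next 19 titles
    return [musics.iloc[i].song for i, _ in distance[1:20]]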

In [99]:
songs = recommender("A Part Hate")

In [100]:
songs

['Cool To Hate',
 'I Am',
 'Platypus',
 'Love To Hate',
 'Make Believe',
 "I'll Wear It Proudly",
 'I Hate It When That Happens To Me',
 'Only Love Knows Why',
 'Bell-Bottomed Tear',
 'Spirit Of The Age',
 'No Master Race',
 'Another Night',
 'Soul Creation By Cinder',
 'Jesus Was A Capricorn',
 'American Errorist',
 'Crying Days',
 'Pearl In The Shell',
 'Make The World Move',
 'For You, and Your Denial']
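
The `recommender` function raises an `IndexError` when the title is not in the dataset, and it relies on the query song sorting to the first position. A more defensive variant could look like the sketch below. It is self-contained and uses toy data: `recommend_top_n`, `titles`, and the small `similarity` matrix are hypothetical stand-ins for `musics['song']` and `smiler`.

```python
def recommend_top_n(song_name, titles, similarity, n=3):
    """Return up to n titles most similar to song_name, or [] if it is unknown."""
    if song_name not in titles:
        return []
    idx = titles.index(song_name)
    # Rank every row index by its similarity to the query song, highest first
    ranked = sorted(range(len(titles)), key=lambda i: similarity[idx][i], reverse=True)
    # Exclude the query song explicitly instead of assuming it is ranked first
    return [titles[i] for i in ranked if i != idx][:n]

# Toy data (hypothetical): four songs and their pairwise similarities
titles = ["A", "B", "C", "D"]
similarity = [
    [1.0, 0.2, 0.7, 0.1],
    [0.2, 1.0, 0.3, 0.5],
    [0.7, 0.3, 1.0, 0.0],
    [0.1, 0.5, 0.0, 1.0],
]

print(recommend_top_n("A", titles, similarity, n=2))
print(recommend_top_n("Unknown", titles, similarity))
```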

In [101]:
musics

Unnamed: 0,artist,song,text
0,Cyndi Lauper,A Part Hate,somber sister \r this is a strange and bitter...
1,Cat Stevens,Randy,"oh randy, if they knew. \r i think they'd tak..."
2,Pat Benatar,Tell Me,why do i have these thoughts go through my hea...
3,Ziggy Marley,Head Top,"the session is nice, feel real good \r the se..."
4,Grateful Dead,Hey Jude,hey jude don't make it bad \r take a sad song...
...,...,...,...
4995,Christina Perri,Backwards,take me backwards \r turn me around \r canno...
4996,John Prine,Everybody Wants To Feel Like You,while out sailing on the ocean \r while out s...
4997,Halloween,Witches' Brew,"by hap palmer \r \r dead leaves, seaweed, r..."
4998,Allman Brothers Band,Dreams,"just one more mornin, i had to wake up with th..."


In [102]:
songs = musics[musics['song'].isin(songs)][['artist', 'song']]

In [103]:
songs

Unnamed: 0,artist,song
340,Peter Cetera,Only Love Knows Why
664,NOFX,American Errorist
872,Hanson,I Am
1368,Christina Aguilera,Make The World Move
1613,Offspring,Cool To Hate
2284,Alphaville,Spirit Of The Age
2314,Kris Kristofferson,Jesus Was A Capricorn
2391,Beautiful South,Bell-Bottomed Tear
3131,Dusty Springfield,Another Night
3174,Unseen,No Master Race


The result is saved to a JSON file.

In [104]:
songs.to_json("songs.json")
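
The `isin` filter returns rows in DataFrame index order, so the similarity ranking is lost before the file is written. If the order matters to the consumer of `songs.json`, one option is to store an explicit rank with each record. The sketch below uses the standard `json` module; the `ranked` list reuses the first two recommendations and their artists from the outputs above, and the record layout is an assumption.

```python
import json

# Recommendations in ranked order (most similar first), as (artist, song) pairs
ranked = [("Offspring", "Cool To Hate"), ("Hanson", "I Am")]

# Attach an explicit rank so the order survives serialization
records = [
    {"rank": position, "artist": artist, "song": song}
    for position, (artist, song) in enumerate(ranked, start=1)
]

payload = json.dumps(records, ensure_ascii=False, indent=2)
print(payload)

# Reading the payload back gives the records in the same order
restored = json.loads(payload)
```

With pandas, `songs.to_json("songs.json", orient="records")` produces a similar list-of-records layout.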