# Song Search by Lyrics Using TF-IDF and Cosine Similarity


Name : Ahmad Fajar Kusumajati

Student ID (NIM) : A11.2020.12995

Group : A11.4708


Import the required libraries


In [1]:
import warnings
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import cross_val_score, train_test_split

warnings.filterwarnings("ignore")

## 1. Dataset


The dataset used in this experiment comes from Kaggle, under the title [Song lyrics from 79 musical genres](https://www.kaggle.com/datasets/neisse/scrapped-lyrics-from-6-genres). Two files are used: one with song lyrics and one with artist information. The lyrics file has 5 columns (ALink, SName, SLink, Lyric, and language) and 379931 rows. The artists file has 5 columns (Artist, Genres, Songs, Popularity, and Link) and 4168 rows. Only lyrics whose language is `en` are kept. The two files are then merged into a single dataset with 4 columns, ALink, Artist, SName, and Lyric, which are renamed to artist_id, artist, title, and lyrics.


Import the song lyrics dataset


In [2]:
df_lyrics = pd.read_csv('data/lyrics-data.csv')
df_lyrics = df_lyrics[df_lyrics['language'] == 'en']
df_lyrics.rename(columns={'SName': 'title',
                          'Lyric': 'lyrics', 'ALink': 'artist_id'}, inplace=True)
df_lyrics.drop(columns=['SLink', 'language'], inplace=True)
df_lyrics.dropna(inplace=True)
df_lyrics.head()

|  | artist_id | title | lyrics |
|---|---|---|---|
| 69 | /ivete-sangalo/ | Careless Whisper | I feel so unsure\nAs I take your hand and lead... |
| 86 | /ivete-sangalo/ | Could You Be Loved / Citação Musical do Rap: S... | Don't let them fool, ya\nOr even try to school... |
| 88 | /ivete-sangalo/ | Cruisin' (Part. Saulo) | Baby, let's cruise, away from here\nDon't be c... |
| 111 | /ivete-sangalo/ | Easy | Know it sounds funny\nBut, I just can't stand ... |
| 140 | /ivete-sangalo/ | For Your Babies (The Voice cover) | You've got that look again\nThe one I hoped I ... |


Import the artists dataset


In [3]:
df_artists = pd.read_csv('data/artists-data.csv')
df_artists.drop(columns=['Genres', 'Songs', 'Popularity'], inplace=True)
df_artists.rename(columns={'Artist': 'artist',
                  'Link': 'artist_id'}, inplace=True)
df_artists.columns = df_artists.columns.str.lower()
df_artists.dropna(inplace=True)
df_artists = df_artists[df_artists['artist_id'].isin(
    df_lyrics['artist_id'].unique())]
df_artists.head()

|  | artist | artist_id |
|---|---|---|
| 0 | Ivete Sangalo | /ivete-sangalo/ |
| 4 | Claudia Leitte | /claudia-leitte/ |
| 7 | Daniela Mercury | /daniela-mercury/ |
| 8 | Olodum | /olodum/ |
| 13 | Carlinhos Brown | /carlinhos-brown/ |


Merge the lyrics and artists datasets


In [4]:
df = pd.merge(df_lyrics, df_artists, on='artist_id')
df = df[['artist_id', 'artist', 'title', 'lyrics']]
df.head()

|  | artist_id | artist | title | lyrics |
|---|---|---|---|---|
| 0 | /ivete-sangalo/ | Ivete Sangalo | Careless Whisper | I feel so unsure\nAs I take your hand and lead... |
| 1 | /ivete-sangalo/ | Ivete Sangalo | Could You Be Loved / Citação Musical do Rap: S... | Don't let them fool, ya\nOr even try to school... |
| 2 | /ivete-sangalo/ | Ivete Sangalo | Cruisin' (Part. Saulo) | Baby, let's cruise, away from here\nDon't be c... |
| 3 | /ivete-sangalo/ | Ivete Sangalo | Easy | Know it sounds funny\nBut, I just can't stand ... |
| 4 | /ivete-sangalo/ | Ivete Sangalo | For Your Babies (The Voice cover) | You've got that look again\nThe one I hoped I ... |


Check for missing values


In [5]:
df.isnull().sum()

artist_id    0
artist       0
title        0
lyrics       0
dtype: int64

## 2. Problem Statement or Experiment Goal


The goal of this experiment is to find out how to search for a song based on lyrics entered by the user. Using TF-IDF and cosine similarity, the search is expected to return the song that matches the entered lyrics.


## 3. Experiment Steps


Starting from the merged dataset, the lyrics column is preprocessed: punctuation, digits, unimportant words, and stopwords are removed, and all words are lowercased. Tokenization and stemming follow. Once preprocessing is done, terms are weighted with TF-IDF. The search then compares the lyrics entered by the user against every song using cosine similarity, and the songs found are shown to the user. In the code in this notebook, lowercasing, tokenization, and English stop word removal are delegated to `TfidfVectorizer`; no separate stemming or digit removal step appears.
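
To make the weighting and comparison steps concrete, here is a minimal pure-Python sketch of TF-IDF and cosine similarity on a toy corpus. The corpus, the query, and the helper names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`) are hypothetical and not part of the notebook. The smoothed IDF formula is an assumption meant to mirror scikit-learn's default behaviour, and stop word removal is left out for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; these strings are not taken from the dataset.
corpus = [
    "hello from the other side",
    "save me from myself",
    "blue sky and shining sun",
]


def tokenize(text):
    # Lowercase and keep runs of two or more word characters,
    # which also drops punctuation.
    return re.findall(r"\b\w\w+\b", text.lower())


def build_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Term frequency times IDF; terms unseen in the corpus are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {term: tf * idf[term] for term, tf in counts.items()}


def cosine(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


idf = build_idf(corpus)
doc_vectors = [tfidf_vector(doc, idf) for doc in corpus]
query_vector = tfidf_vector("hello other side", idf)
scores = [cosine(query_vector, vec) for vec in doc_vectors]
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best], round(scores[best], 3))
```

Documents that share no term with the query score exactly 0, and the score grows as more of the rare, highly weighted terms overlap.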


Use TF-IDF and cosine similarity to measure the similarity between song lyrics


In [6]:
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(df['lyrics'])

In [7]:
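def recommender(input_text, top_n=10):
    # Project the query into the TF-IDF space fitted on the corpus
    input_tfidf_vector = tfidf_vectorizer.transform([input_text])
    # Similarity of the query to every song, flattened to shape (n_songs,)
    similarities = cosine_similarity(input_tfidf_vector, tfidf_matrix)[0]
    # Positions of the top_n highest scores, best first
    similar_songs_indices = similarities.argsort()[-top_n:][::-1]
    # Copy so that adding a column does not write into a view of df
    recommended_songs = df.iloc[similar_songs_indices][['artist', 'title']].copy()
    recommended_songs['similarity_score'] = similarities[similar_songs_indices]
    return recommended_songs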

## 4. Testing


Testing is done manually by entering song lyrics. The lyrics entered belong to songs that exist in the dataset. A test passes when the program finds the intended song and shows it to the user.
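
The manual checks could also be summarised numerically. The sketch below is one possible way to do that, not something the notebook itself does: `rank_of` and `hit_rate` are hypothetical helpers, and the result lists are made-up placeholders standing in for the recommender output.

```python
def rank_of(results, artist, title):
    """Return the 1-based rank of (artist, title) in results, or None if absent."""
    for position, (res_artist, res_title) in enumerate(results, start=1):
        if res_artist == artist and res_title == title:
            return position
    return None


def hit_rate(cases, k=10):
    """Fraction of cases whose expected song appears within the top k results."""
    hits = 0
    for results, artist, title in cases:
        if rank_of(results[:k], artist, title) is not None:
            hits += 1
    return hits / len(cases)


# Hypothetical ranked results, standing in for the recommender output.
found_second = [
    ("Artist A", "Song X (Remix)"),
    ("Artist A", "Song X"),
    ("Artist B", "Song Y"),
]
not_found = [("Artist C", "Song Z"), ("Artist D", "Song W")]

# Each case: (ranked results, expected artist, expected title).
cases = [
    (found_second, "Artist A", "Song X"),
    (not_found, "Artist E", "Song V"),
]

print(rank_of(found_second, "Artist A", "Song X"))
print(hit_rate(cases, k=10))
```

Assuming the DataFrame returned by `recommender` keeps its `artist` and `title` columns, `list(zip(result['artist'], result['title']))` would give a list in the shape these helpers expect.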


### 1. Example: searching for "Shadow Moses" by Bring Me The Horizon with the lyric "This is sempiternal"


In [8]:
input_text = "This is sempiternal"
top_n = 10
recommended_songs = recommender(input_text, top_n=top_n)

print(f"\nRecommended Songs for Input: '{input_text}'")
print(recommended_songs)


Recommended Songs for Input: 'This is sempiternal'
                     artist                          title  similarity_score
71036  Bring Me The Horizon                   Shadow Moses          0.501670
92092                Samael                           Ave!          0.242627
52640             Eluveitie                       Sucellos          0.153403
52623             Eluveitie                      Inception          0.150330
63791               Bauhaus        Spirit (single version)          0.000000
63799               Bauhaus             The Spy In The Cab          0.000000
63798               Bauhaus                  The Passenger          0.000000
63797               Bauhaus    The Man With the X-Ray Eyes          0.000000
63796               Bauhaus  The Lady in the Radiator Song          0.000000
63795               Bauhaus     Terror Couple Kill Colonel          0.000000


In [9]:
sempiternal = df[(df['title'] == 'Shadow Moses')
                 & (df['artist'] == 'Bring Me The Horizon')]
sempiternal

|  | artist_id | artist | title | lyrics |
|---|---|---|---|---|
| 71036 | /bring-me-the-horizon/ | Bring Me The Horizon | Shadow Moses | Can you tell from the look in our eyes?\nWe're... |


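The song "Shadow Moses" by Bring Me The Horizon is found with the lyric "This is sempiternal": it ranks first with a similarity score of about 0.50. Results from rank 5 onward have a score of 0.000000, so they share no weighted terms with the query and only fill the list up to `top_n`.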
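
Rows with a score of zero carry no information about the query. As a minimal sketch of one way to drop them, the hypothetical helper `top_matches` below keeps only scores strictly above a threshold. It works on a plain list of scores and is not part of the notebook's `recommender`.

```python
def top_matches(scores, top_n=10, min_score=0.0):
    """Return (index, score) pairs for the top_n scores strictly above min_score."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [(index, score) for index, score in ranked[:top_n] if score > min_score]


# Hypothetical similarity scores for six songs.
scores = [0.0, 0.50, 0.0, 0.24, 0.15, 0.0]

matches = top_matches(scores, top_n=5)
print(matches)
```

With these scores only three songs survive the filter, even though five were requested.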


### 2. Example: searching for "Blue (da Ba Dee)" by Blue with the lyric "I'm blue da ba dee da ba daa"


In [10]:
# Usage of the recommender function
input_text = "I'm blue da ba dee da ba daa"
top_n = 10
recommended_songs = recommender(input_text, top_n=top_n)

# Display the recommended songs with similarity scores
print(f"\nRecommended Songs for Input: '{input_text}'")
print(recommended_songs)


Recommended Songs for Input: 'I'm blue da ba dee da ba daa'
                  artist                                    title  \
138118              Blue                         Blue (da Ba Dee)   
114507            Beulah               A Good Man Is Easy To Kill   
97429   Gym Class Heroes  Cupid's Chokehold (feat. Patrick Stump)   
5283             Cherish                        Cupid's Chokehold   
142924      All Time Low                                 Birthday   
119267          LazyTown                    Cleaning All Together   
186920           America                     People In The Valley   
118851        Gummy Bear                         Blue (Da Ba Dee)   
141172    Geri Halliwell                Perhaps, Perhaps, Perhaps   
31901            Madonna                  To Have And Not To Hold   

        similarity_score  
138118          0.851909  
114507          0.785105  
97429           0.776989  
5283            0.776896  
142924          0.755564  
119267          0

In [11]:
blue = df[(df['title'] == 'Blue (da Ba Dee)') & (df['artist'] == 'Blue')]
blue

|  | artist_id | artist | title | lyrics |
|---|---|---|---|---|
| 138118 | /blue/ | Blue | Blue (da Ba Dee) | Yo listen up here's a story\nAbout a little gu... |


The song "Blue (da Ba Dee)" by Blue is found with the lyric "I'm blue da ba dee da ba daa": it ranks first with a similarity score of about 0.85.


### 3. Example: searching for "Hello" by Adele with at least one paragraph of its lyrics


In [12]:
# Usage of the recommender function
input_text = "Hello from the otherside I must've called a thousand times To tell you I'm sorry for everything that I've done But when I call, you never seem to be home"
top_n = 10
recommended_songs = recommender(input_text, top_n=top_n)

# Display the recommended songs with similarity scores
print(f"\nRecommended Songs for Input: '{input_text}'")
print(recommended_songs)


Recommended Songs for Input: 'Hello from the otherside I must've called a thousand times To tell you I'm sorry for everything that I've done But when I call, you never seem to be home'
                artist                        title  similarity_score
127932           Adele  Hello (Bradley Allan Remix)          0.566918
127895           Adele                        Hello          0.511660
146632    Boyce Avenue                        Hello          0.508983
139921   Conor Maynard           Hello (Feat. Anth)          0.460379
127933           Adele      Hello (Dave Audé Remix)          0.447459
58245        Third Day                    Otherside          0.376696
113743  Perfume Genius                    Otherside          0.376658
133216            Glee                  Hello Again          0.375084
142204        Paramore                  Hello Hello          0.360677
111870  Tegan And Sara        Hello, I'm Right Here          0.355922


In [13]:
# Look up the song titled 'Hello' by the artist 'Adele'
hello = df[(df['title'] == 'Hello') & (df['artist'] == 'Adele')]
hello

|  | artist_id | artist | title | lyrics |
|---|---|---|---|---|
| 127895 | /adele/ | Adele | Hello | Hello\nIt's me\nI was wondering if after all t... |


The song "Hello" by Adele is found when at least one paragraph of its lyrics is entered: it ranks second with a similarity score of about 0.51, behind the Bradley Allan remix of the same song (about 0.57).


### 4. Example: searching for "Drown" by Bring Me The Horizon with at least one paragraph of its lyrics


In [14]:
# Usage of the recommender function
input_text = "Who will fix me now? Dive in when I'm down? Save me from myself Don't let me drown Who will make me fight? Drag me out alive? Save me from myself Don't let me drown"
top_n = 10
recommended_songs = recommender(input_text, top_n=top_n)

# Display the recommended songs with similarity scores
print(f"\nRecommended Songs for Input: '{input_text}'")
print(recommended_songs)


Recommended Songs for Input: 'Who will fix me now? Dive in when I'm down? Save me from myself Don't let me drown Who will make me fight? Drag me out alive? Save me from myself Don't let me drown'
                      artist  \
147440   5 Seconds Of Summer   
70957   Bring Me The Horizon   
129036         Avril Lavigne   
95833                Ja Rule   
71591            Limp Bizkit   
3192             Chris Brown   
66198            Soundgarden   
129035         Avril Lavigne   
90934                 Dokken   
70005                   Kiss   

                                                    title  similarity_score  
147440  Drown (Bring Me the Horizon cover at the Live ...          0.675575  
70957                                               Drown          0.659537  
129036                Head Above Water (Ft. Travis Clark)          0.574428  
95833                                               Drown          0.546025  
71591                                               Drown   

In [15]:
drown = df[(df['title'] == 'Drown') & (df['artist'] == 'Bring Me The Horizon')]
drown

|  | artist_id | artist | title | lyrics |
|---|---|---|---|---|
| 70957 | /bring-me-the-horizon/ | Bring Me The Horizon | Drown | What doesn't kill you makes you wish you were ... |


The song "Drown" by Bring Me The Horizon is found when at least one paragraph of its lyrics is entered: it ranks second with a similarity score of about 0.66, behind the 5 Seconds Of Summer cover of the same song (about 0.68).


### 5. Example: searching for "Mr. Blue Sky" by Electric Light Orchestra with at least one paragraph of its lyrics


In [16]:
# Usage of the recommender function
input_text = "Sun is shinin' in the sky There ain't a cloud in sight It's stopped rainin', everybody's in the play And don't you know It's a beautiful new day? Hey"
top_n = 10
recommended_songs = recommender(input_text, top_n=top_n)

# Display the recommended songs with similarity scores
print(f"\nRecommended Songs for Input: '{input_text}'")
print(recommended_songs)


Recommended Songs for Input: 'Sun is shinin' in the sky There ain't a cloud in sight It's stopped rainin', everybody's in the play And don't you know It's a beautiful new day? Hey'
                     artist                              title  \
159910             Al Green                Rainin' In My Heart   
23198      Carrie Underwood  Tears Of Gold (with David Bisbal)   
159685          Ray Charles             Rainy Night in Georgia   
23578         George Strait          Beautiful Day For Goodbye   
24091          Brad Paisley                        Rainin' You   
177849  Grand Funk Railroad                         Shinin' On   
48234            Neil Young                Rainin' In Paradise   
145009                   U2                      One Tree Hill   
49688          Van Morrison                  Really Don't Know   
11854        Rolling Stones                Get Off Of My Cloud   

        similarity_score  
159910          0.433681  
23198           0.415143  
159685    

In [17]:
mr_blue_sky = df[(df['title'] == 'Mr. Blue Sky')
                 & (df['artist'] == 'Electric Light Orchestra')]
mr_blue_sky

|  | artist_id | artist | title | lyrics |
|---|---|---|---|---|
| 148424 | /electric-light-orchestra/ | Electric Light Orchestra | Mr. Blue Sky | "warning! todays forecast calls for blue skys"... |


The song "Mr. Blue Sky" by Electric Light Orchestra is not found when at least one paragraph of its lyrics is entered: although the song exists in the dataset, it does not appear among the top 10 results.
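
One possible explanation, offered here only as a hypothesis, is spelling: the query uses clipped forms such as "shinin'" and "rainin'", and the top results are dominated by titles containing those same forms. The sketch below shows how such words are split into tokens. The regular expression is assumed to match the default `token_pattern` of `TfidfVectorizer`, the `tokens` helper and the shortened query are hypothetical, and how the dataset spells these words in "Mr. Blue Sky" is not known from the output above.

```python
import re

# Assumed to match the default token_pattern of scikit-learn's TfidfVectorizer.
TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def tokens(text):
    # Lowercase, then keep runs of two or more word characters.
    return re.findall(TOKEN_PATTERN, text.lower())


# Shortened, hypothetical variant of the query used in the test above.
query = "Sun is shinin' in the sky, it's stopped rainin'"
query_tokens = tokens(query)

print(query_tokens)
print("shining" in query_tokens)
```

Because "shinin" and "shining" are different tokens, a lyric stored with the full spelling would not match on those words. A stemming step, or normalising the trailing apostrophe before vectorising, might reduce this mismatch.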


## 5. Deployment
