Path for Imports

In [None]:
import sys
import os

sys.path.append(os.path.abspath(".."))


Import Required Libraries

In [11]:
import pandas as pd
import numpy as np
import tensorflow as tf

from src.preprocess import clean_text
from src.tf_model import load_data, build_model
from src.similarity_model import build_similarity_model, predict_song_similarity


Load Dataset (Preview)

In [12]:
DATA_PATH = "../data/spotify_lyrics.csv"

df = pd.read_csv(DATA_PATH)
df.head()


Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \nAnd..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \nTouch me gentl..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \nWhy I had t...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...


TensorFlow Model – Data Preparation

In [13]:
MAX_WORDS = 10000
MAX_LEN = 100

X, y, tokenizer, df_tf = load_data(
    DATA_PATH,
    max_words=MAX_WORDS,
    max_len=MAX_LEN
)

print("Input shape:", X.shape)
print("Number of classes:", len(df_tf))


Input shape: (2000, 100)
Number of classes: 2000


Build TensorFlow Model

In [14]:
model = build_model(
    vocab_size=MAX_WORDS,
    max_len=MAX_LEN,
    num_classes=len(df_tf)
)

model.summary()


Train TensorFlow Model (Demo Training)

In [15]:
model.fit(
    X,
    y,
    epochs=3,
    batch_size=32,
    validation_split=0.2
)


Epoch 1/3
50/50 ━━━━━━━━━━━━━━━━━━━━ 2s 18ms/step - accuracy: 0.0000e+00 - loss: 7.6094 - val_accuracy: 0.0000e+00 - val_loss: 7.6238
Epoch 2/3
50/50 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.0113 - loss: 7.5752 - val_accuracy: 0.0000e+00 - val_loss: 7.8128
Epoch 3/3
50/50 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - accuracy: 0.0019 - loss: 7.3391 - val_accuracy: 0.0000e+00 - val_loss: 8.7761


<keras.src.callbacks.history.History at 0x28191083f90>
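One way to read the numbers above is to compare them with chance level. The sketch below is a rough check under one stated assumption: that `load_data` gives every row its own label (label `i` for row `i`), which is what `num_classes=len(df_tf)` suggests but the notebook does not show. It computes the cross-entropy of a uniform guess over 2000 classes and checks whether any class can appear in both the training and validation parts when Keras takes the last 20% of the rows for validation.

```python
import math

NUM_CLASSES = 2000        # from the "Number of classes" output above
VALIDATION_PERCENT = 20   # corresponds to validation_split=0.2 in model.fit

# Cross-entropy of a uniform guess over N classes is ln(N).
chance_loss = math.log(NUM_CLASSES)
print(f"chance-level loss: {chance_loss:.2f}")  # about 7.60

# Assumption: one class per song, i.e. row i carries label i.
labels = list(range(NUM_CLASSES))

# Keras carves the validation set from the end of the arrays, before shuffling.
split_at = NUM_CLASSES * (100 - VALIDATION_PERCENT) // 100
train_labels = set(labels[:split_at])
val_labels = set(labels[split_at:])

# Classes that the model could both train on and be validated on.
shared = train_labels & val_labels
print("classes present in both splits:", len(shared))
```

Under that assumption the first-epoch loss of 7.6094 sits almost exactly at chance level, and the validation songs are classes the model never trains on, which would explain a validation accuracy of zero regardless of how long training runs.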

TensorFlow Prediction Function

In [16]:
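def predict_song_tf(text):
    """Return the (song, artist) pair the classifier scores highest for a lyric snippet."""
    # Normalize the query with the project's cleaning helper.
    text = clean_text(text)

    # Convert the words to integer ids and pad to the model's input length.
    seq = tokenizer.texts_to_sequences([text])
    padded = tf.keras.preprocessing.sequence.pad_sequences(seq, maxlen=MAX_LEN)

    # pred has shape (1, num_classes); pick the top class for the single query.
    pred = model.predict(padded)
    idx = int(pred[0].argmax())

    # Class index i is looked up as row i of df_tf.
    row = df_tf.iloc[idx]
    return row["song"], row["artist"]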
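To see what the padding step does to a short query, the following sketch mimics the default behaviour of `pad_sequences` (zeros added at the front, long sequences cut from the front) in plain Python. The helper `pad_pre` and the token ids are hypothetical and only for illustration; the notebook itself uses the Keras function.

```python
def pad_pre(seq, maxlen, value=0):
    """Hypothetical stand-in for pad_sequences with its default
    padding='pre' and truncating='pre' settings."""
    seq = list(seq)
    if len(seq) > maxlen:
        # Too long: drop tokens from the front, keep the last maxlen.
        seq = seq[len(seq) - maxlen:]
    # Too short: fill the front with the padding value.
    return [value] * (maxlen - len(seq)) + seq


# Made-up token ids standing in for a five-word query.
query_ids = [12, 7, 3, 45, 9]
padded = pad_pre(query_ids, maxlen=100)

print(len(padded))                        # 100
print(padded[-5:])                        # [12, 7, 3, 45, 9]
print(sum(1 for t in padded if t == 0))   # 95
```

A five-word query therefore reaches the model as 95 padding positions followed by 5 real tokens, so the classifier has very little signal to work with.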


Test TensorFlow Prediction

In [17]:
print(predict_song_tf("hello from the other side"))


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step
('Marriage And Divorce', 'Hank Snow')


Build Similarity Model (TF-IDF)

In [18]:
df_sim, vectorizer, tfidf = build_similarity_model(DATA_PATH)


Test Similarity-Based Prediction

In [19]:
predict_song_similarity(
    "Look at her face, it's a wonderful face",
    df_sim,
    vectorizer,
    tfidf
)


('Face To Face', 'Foreigner', np.float64(0.656728393498878))

Comparison Explanation

### Model Comparison

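- The TensorFlow model treats song identification as a multi-class classification problem with one class per song (2000 classes for 2000 rows in this run).
- With that many song labels and only short lyric snippets as queries, accuracy is limited: over the three demo epochs, training accuracy stayed near zero and validation accuracy was 0.
- The TF-IDF cosine similarity model is better suited to short text queries because it retrieves the lyrics most similar to the query instead of predicting one of thousands of labels, and it returns a similarity score (about 0.66 in the test above) that indicates how close the match is.

This comparison illustrates why similarity-based approaches are generally preferred for lyric search tasks.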
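The retrieval idea behind the similarity model can be sketched from scratch. The code below is a minimal illustration, not the contents of `src.similarity_model`, which the notebook does not show. It weights terms with a smoothed TF-IDF (the same smoothed IDF form that scikit-learn's `TfidfVectorizer` uses by default), L2-normalizes each vector, and ranks documents by cosine similarity. The function names and the three toy lyric strings are assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic whitespace tokenizer; a real pipeline would clean the text first.
    return text.lower().split()


def to_vector(tokens, idf):
    """Build an L2-normalized sparse TF-IDF vector (dict: term -> weight)."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return idf, [to_vector(toks, idf) for toks in tokenized]


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def most_similar(query, idf, vectors):
    """Return (index, score) of the document closest to the query."""
    q = to_vector(tokenize(query), idf)
    scores = [cosine(q, v) for v in vectors]
    best = max(range(len(vectors)), key=scores.__getitem__)
    return best, scores[best]


# Toy corpus loosely based on the dataset preview; not real data rows.
lyrics = [
    "look at her face it is a wonderful face",
    "take it easy with me please",
    "making somebody happy is a question of give and take",
]
idf, vectors = build_index(lyrics)
idx, score = most_similar("wonderful face", idf, vectors)
print(idx, round(score, 3))
```

Because the score is a direct measure of word overlap weighted by rarity, even a two-word query can single out the right lyric, and a query with no known words simply scores 0 instead of being forced into some class.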
