# Imports 

In [10]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

from sklearn.model_selection import GridSearchCV

In [11]:
metadata = pd.read_csv('../data/processed_metadata.csv')

# Modelling and Model Evaluations - classifying decade from lyrics

In [12]:
X_train, X_test, y_train, y_test = train_test_split(
    metadata['processed_lyrics'],
    metadata['decade'],
    test_size=0.2,
    random_state=42,
    stratify=metadata['decade'],
)

tfidf = TfidfVectorizer(max_features=10000)
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)

decade_model = LogisticRegression(max_iter=1000)
decade_model.fit(X_train_tfidf, y_train)

y_pred = decade_model.predict(X_test_tfidf)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

        1950       0.39      0.14      0.20       294
        1960       0.32      0.34      0.33       682
        1970       0.25      0.23      0.24       790
        1980       0.27      0.28      0.27       935
        1990       0.22      0.18      0.20       892
        2000       0.23      0.19      0.21       956
        2010       0.38      0.54      0.45      1126

    accuracy                           0.29      5675
   macro avg       0.29      0.27      0.27      5675
weighted avg       0.29      0.29      0.28      5675



Grid search over the regularisation strength C and the solver to optimise model performance.

In [9]:
param_grid = {
    'C': [0.1, 1, 10, 100],
    'solver': ['newton-cg', 'lbfgs', 'liblinear']
}

grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train_tfidf, y_train)

print(f"Best parameters: {grid_search.best_params_}")
best_decade_model = grid_search.best_estimator_

y_pred_best = best_decade_model.predict(X_test_tfidf)
print(classification_report(y_test, y_pred_best))

Best parameters: {'C': 1, 'solver': 'lbfgs'}
              precision    recall  f1-score   support

        1950       0.34      0.14      0.20       282
        1960       0.31      0.34      0.32       646
        1970       0.27      0.25      0.26       804
        1980       0.27      0.29      0.28       952
        1990       0.22      0.16      0.19       917
        2000       0.25      0.22      0.23       946
        2010       0.38      0.53      0.44      1128

    accuracy                           0.30      5675
   macro avg       0.29      0.28      0.27      5675
weighted avg       0.29      0.30      0.29      5675
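In the cells above, the TF-IDF vectorizer is fit on all of X_train before GridSearchCV splits it into folds, so each validation fold has already influenced the vocabulary (max_features) and the IDF weights. A minimal sketch of one way to avoid this wraps both steps in a scikit-learn Pipeline, so the vectorizer is refit inside every fold. It also scores on macro F1, the metric this notebook emphasises for imbalanced classes, and adds ngram_range and class_weight to the grid. The toy corpus, the step names and the grid values are illustrative assumptions, not the notebook's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for metadata['processed_lyrics'] / metadata['decade']
texts = [
    "jukebox baby rock roll tonight", "sock hop dance baby darling",
    "darling jukebox dance all night", "rock roll baby sweet darling",
    "sweet jukebox sock hop rhythm", "baby darling rhythm rock roll",
    "phone text club party lights", "instagram party club tonight",
    "text phone lights city club", "party lights phone city dream",
    "club city instagram text dream", "lights party dream phone city",
]
labels = [1950] * 6 + [2010] * 6

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),                # refit on the training part of every CV fold
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid; the "tfidf__" / "clf__" prefixes route each parameter to its step
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1, 10],
    "clf__class_weight": [None, "balanced"],
}

grid = GridSearchCV(pipe, param_grid, cv=3, scoring="f1_macro")
grid.fit(texts, labels)  # raw text goes in; no separate fit_transform step
print(grid.best_params_)
print(grid.predict(["baby jukebox rock roll"]))
```

With the notebook's data, the same search would be fit on X_train (the raw lyrics) rather than X_train_tfidf and evaluated with grid.predict(X_test); max_features=10000 can be kept as an argument to TfidfVectorizer.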



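**Baseline class distribution to compare against the accuracy score**

The cell below shows the distribution of genre, not of decade, which is the target of the model above. For the decade model, the relevant baseline is the share of the largest class: 2010 accounts for 1126 of the 5675 test songs (about 19.8%), so always predicting 2010 would score roughly 0.20 accuracy, compared with 0.29 to 0.30 for logistic regression.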

In [13]:
metadata['genre'].value_counts(normalize = True)

genre
pop        0.248202
country    0.191915
blues      0.162273
rock       0.142182
jazz       0.135521
reggae     0.088045
hip hop    0.031862
Name: proportion, dtype: float64
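A majority-class baseline for the decade model can be computed with scikit-learn's DummyClassifier. In the notebook it would be fit on X_train_tfidf and y_train and scored on the test set; so that this sketch runs on its own, it instead rebuilds stand-in labels from the test-set supports in the first classification report (an assumption made only for illustration; the stratified split keeps the class proportions the same in training and test data).

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Test-set supports per decade, copied from the first classification report
decades = np.array([1950, 1960, 1970, 1980, 1990, 2000, 2010])
support = np.array([294, 682, 790, 935, 892, 956, 1126])

# Stand-in labels (assumption: used in place of y_train / y_test so the sketch is self-contained)
y_stand_in = np.repeat(decades, support)
X_placeholder = np.zeros((len(y_stand_in), 1))  # the most_frequent strategy ignores features

dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_placeholder, y_stand_in)
baseline_acc = dummy.score(X_placeholder, y_stand_in)
print(f"Always predicting {dummy.predict(X_placeholder[:1])[0]}: accuracy {baseline_acc:.3f}")
# → Always predicting 2010: accuracy 0.198
```

With the notebook's variables this reduces to DummyClassifier(strategy="most_frequent").fit(X_train_tfidf, y_train).score(X_test_tfidf, y_test).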

**Key Points of Interpretation**
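Imbalanced Classes and F1-Score: the decades are unevenly represented (1950 has 294 test songs, 2010 has 1126), so accuracy alone can flatter a model that favours the larger classes. The F1-Score is the harmonic mean of precision and recall, so it stays low if either one is low, which makes it a better measure for evaluating models on imbalanced datasets.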
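The harmonic-mean definition, F1 = 2PR / (P + R), can be checked against the first report. The helper below (a hypothetical name, f1) is a minimal sketch of that formula. Because the report's precision and recall are rounded to two decimals, recomputing F1 from them can differ from the reported value by 0.01.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 2010 row of the first report: precision 0.38, recall 0.54 (reported F1: 0.45)
print(round(f1(0.38, 0.54), 2))  # → 0.45

# A lopsided classifier: the harmonic mean sits far below the arithmetic mean of 0.5
print(round(f1(0.9, 0.1), 2))    # → 0.18
```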

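**Effect of Tuning:**
The grid search selected C=1 with the lbfgs solver, which are scikit-learn's default settings for LogisticRegression, so the "tuned" model is configured identically to the baseline model. The small differences between the two reports (accuracy 0.29 vs 0.30, weighted average F1-Score 0.28 vs 0.29, macro average F1-Score 0.27 in both) are therefore not an effect of tuning. The per-class supports also differ between the reports (for example, 294 vs 282 songs from the 1950s), and the grid-search cell's execution count (In [9]) precedes that of the split cell (In [12]), which indicates that its output was produced on an earlier, different train/test split. The differences are best read as split-to-split variation.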
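The gap between the macro and weighted averages follows directly from how each is computed. This minimal sketch recomputes both from the rounded per-class F1-Scores and supports in the first classification report: the macro average gives every decade equal weight, while the weighted average weights each decade by its support.

```python
# Per-class F1-Score and support from the first (pre-grid-search) classification report
f1_scores = [0.20, 0.33, 0.24, 0.27, 0.20, 0.21, 0.45]  # 1950 ... 2010
support = [294, 682, 790, 935, 892, 956, 1126]

# Macro average: every decade counts equally
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each decade counts in proportion to its number of test songs
weighted_f1 = sum(f * n for f, n in zip(f1_scores, support)) / sum(support)

print(f"macro={macro_f1:.2f}, weighted={weighted_f1:.2f}")  # → macro=0.27, weighted=0.28
```

Both values match the report. The weighted average comes out higher because 2010, the decade with the highest F1-Score, also has the largest support.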

**Evaluating the model's performance by decade**

**1950:** The F1-Score is 0.20 in both reports, among the lowest of any decade. Recall is only 0.14, so most songs from the 1950s are assigned to other decades; the model rarely predicts 1950 at all.

**1960 & 1970:** The two decades moved in opposite directions: the F1-Score for 1960 dipped from 0.33 to 0.32, while 1970 rose from 0.24 to 0.26.

**1980:** The F1-Score rose slightly, from 0.27 to 0.28.

**1990:** The F1-Score remains low (0.20, then 0.19), suggesting difficulty in classifying this decade accurately.

**2000:** The F1-Score rose from 0.21 to 0.23, but remains low.

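**2010:** This decade has the highest F1-Score in both reports (0.45, then 0.44). This may partly reflect more distinctive lyrical features, but 2010 is also the largest class in the test set (1126 of 5675 songs), and its recall (0.54) is well above its precision (0.38), meaning the model predicts 2010 more often than it actually occurs.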
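The per-decade changes can be tabulated directly from the two reports. This minimal sketch copies the rounded F1-Scores from both classification reports and computes the change for each decade. Because the two reports have different per-class supports, and so come from different test splits, these changes are best read as run-to-run variation rather than as effects of tuning.

```python
decades = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
f1_before = [0.20, 0.33, 0.24, 0.27, 0.20, 0.21, 0.45]  # first classification report
f1_after = [0.20, 0.32, 0.26, 0.28, 0.19, 0.23, 0.44]   # grid-search classification report

# Change in F1-Score per decade, rounded to the reports' two-decimal precision
deltas = {d: round(after - before, 2) for d, before, after in zip(decades, f1_before, f1_after)}
print(deltas)
# → {1950: 0.0, 1960: -0.01, 1970: 0.02, 1980: 0.01, 1990: -0.01, 2000: 0.02, 2010: -0.01}
```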

**Overall Observations**
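Overall Accuracy: 0.29 in the first report and 0.30 in the grid-search report. Both are well above the roughly 0.20 that always predicting the largest class would achieve (2010 makes up 1126 of the 5675 test songs), but the 0.01 difference between the two runs is not meaningful.
Macro vs. Weighted Avg: In both reports the weighted average F1-Score (0.28, then 0.29) sits slightly above the macro average (0.27). Weighting by support favours 2010, the largest and best-classified decade, while the macro average gives equal weight to weaker decades such as 1950 and 1990.
Class Imbalance: Class imbalance appears to play a role: the largest class (2010, 1126 test songs) performs best, and the smallest (1950, 294 songs) is among the worst. It does not explain everything, however: 1960 has the second-smallest support (682) but the second-highest F1-Score (0.33), while the well-represented 1990 (892) and 2000 (956) score only 0.20 and 0.21.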

**Next Steps for Improvement**
Further Feature Engineering: Additional features could be engineered to capture more distinctive attributes of each decade, for example word bigrams (TfidfVectorizer's ngram_range parameter) alongside the current single-word vocabulary.
Advanced Models: Experiment with more sophisticated models such as Gradient Boosting, Neural Networks, or ensemble methods.
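Resampling Techniques: Techniques such as SMOTE (Synthetic Minority Over-sampling Technique) could be applied to balance the dataset and improve performance on underrepresented classes. Resampling should be applied only to the training data (within each cross-validation fold), never to the test set.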
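A lighter alternative to resampling is class reweighting. SMOTE is provided by the separate imbalanced-learn package rather than scikit-learn, whereas LogisticRegression accepts class_weight="balanced" directly, scaling each class's contribution to the loss by n_samples / (n_classes * class_count). This minimal sketch shows the weights that setting would assign to each decade, using stand-in labels rebuilt from the first report's test-set supports (an assumption in place of y_train).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Test-set supports per decade, copied from the first classification report
decades = np.array([1950, 1960, 1970, 1980, 1990, 2000, 2010])
support = np.array([294, 682, 790, 935, 892, 956, 1126])
y_stand_in = np.repeat(decades, support)  # assumption: stand-in for y_train

# "balanced" weight per class = n_samples / (n_classes * class_count)
weights = compute_class_weight(class_weight="balanced", classes=decades, y=y_stand_in)
for decade, weight in zip(decades, weights):
    print(decade, round(float(weight), 2))

# Passing class_weight="balanced" makes LogisticRegression apply these weights in its loss
balanced_model = LogisticRegression(max_iter=1000, class_weight="balanced")
```

Rare decades such as 1950 receive weights above 1 and common ones such as 2010 below 1. balanced_model could replace LogisticRegression(max_iter=1000) in the modelling cell, or class_weight could be added to the grid as another hyperparameter; unlike SMOTE, it creates no synthetic samples.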