# Music Genre Classification using multiple classifiers
Team Members: Lisa Korntheuer, Jan Birkert, Adrian Desiderato, Jan Wangerin, Spyridon Spyropoulos

## Imports

In [41]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler, LabelEncoder, StandardScaler
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

## 0. Data understanding
Overview of the data (features, target, etc.):
- `filename` and `length` are irrelevant for ML
- 57 features, so dimensionality reduction with PCA may be worthwhile
- all data are numerical except for the class labels (`label`)

In [None]:
df = pd.read_csv('./data/features_30_sec.csv')
df.info()

In [None]:
df.head()

Correlations between features:

In [None]:
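# Select the feature columns by name so the result does not depend on column positions
cor = df.loc[:, 'chroma_stft_mean':'mfcc20_var'].corr()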
fig, ax = plt.subplots(figsize=(12,12))
ax = sns.heatmap(cor, square = True, xticklabels=True, yticklabels=True) 
plt.show()

Since there are quite a few feature combinations with high correlations, PCA may be worth a try. (See Data Prep)
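The heatmap gives a visual impression only. As a rough sketch, the strongly correlated pairs can also be listed explicitly. The helper `top_correlated_pairs`, the threshold of 0.8, and the synthetic columns below are assumptions for illustration and not part of the original notebook:

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(frame, threshold=0.8):
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = frame.corr().abs()
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # NaN entries (masked cells) never pass the comparison, so they are filtered out here
    return pairs[pairs > threshold].sort_values(ascending=False)


# Synthetic stand-in for the feature table (column names are made up)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "feat_a": a,
    "feat_b": a + rng.normal(scale=0.1, size=200),  # nearly a copy of feat_a
    "feat_c": rng.normal(size=200),                 # independent noise
})
strong_pairs = top_correlated_pairs(demo, threshold=0.8)
print(strong_pairs)
```

Applied to the real data, the function would be called on the feature columns of `df` instead of `demo`.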

## 1. Data preparation
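Jan W.

Planned steps:
- encode the class labels with `LabelEncoder`
- scale the features with `MinMaxScaler` and, alternatively, `StandardScaler`
- split the data into training and test sets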


In [45]:
LabelEnc = LabelEncoder()
y = df['label']
y = pd.DataFrame(LabelEnc.fit_transform(y))
df['label_enc'] = y
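For reading results later, it helps to know which integer stands for which genre. The following minimal sketch shows how a fitted `LabelEncoder` exposes that mapping; the three genre strings are made-up sample labels, not the dataset's full label set:

```python
from sklearn.preprocessing import LabelEncoder

genres = ["rock", "blues", "jazz", "blues", "rock"]  # made-up sample labels
encoder = LabelEncoder()
codes = encoder.fit_transform(genres)

# classes_ holds the sorted original labels; the position is the integer code
mapping = {label: code for code, label in enumerate(encoder.classes_)}

# inverse_transform turns integer codes back into the original labels
decoded = encoder.inverse_transform(codes)
print(mapping)
print(list(decoded))
```

In the notebook, the same calls would work on the fitted `LabelEnc` object.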

In [None]:
scaler_mms = MinMaxScaler()
scaler_ss = StandardScaler()
X = df.loc[:, 'chroma_stft_mean' : 'mfcc20_var']
X_scaled_array_mms = scaler_mms.fit_transform(X)
X_scaled_array_ss = scaler_ss.fit_transform(X)
X_scaled_mms = pd.DataFrame(X_scaled_array_mms, columns=X.columns)
X_scaled_ss = pd.DataFrame(X_scaled_array_ss, columns=X.columns)
print(X)
print(X_scaled_mms)
print(X_scaled_ss)

PCA: (copied from Material Notebook 04, probably has to be adjusted later on)

In [47]:
pca = PCA() # typically the number of components is passed here, e.g. n_components=2
            # we leave it blank to get all of them
pcs = pca.fit_transform(X_scaled_ss) # principal components

Eigenvalues:

In [None]:
print(pca.explained_variance_)
print(pca.explained_variance_ratio_)

Principal Components (Dot Product of Data and Eigenvectors):

In [None]:
print(pcs[:5])
print()
print(len(pcs))

Scree Plot with Kaiser Criterion

In [None]:
import matplotlib.ticker as ticker
fig = plt.figure()
ax = plt.axes()

pc_values = np.arange(pca.n_components_) + 1
ax.plot(pc_values, pca.explained_variance_, 'o-', linewidth=2, color='blue')
ax.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))

plt.title('Scree Plot')
plt.xlabel('Principal Component')
plt.ylabel('Eigenvalue')
plt.axhline(y=1, linewidth=1, color='r')
plt.show()

According to the Kaiser criterion, many dimensions could potentially be removed. The following output shows how much "information" (explained variance) is contained in the first 10, 15, 30, and 45 principal components:

In [None]:
for i in [10, 15, 30, 45]:
    print(np.sum(pca.explained_variance_ratio_[:i]))

To fight the curse of dimensionality, some dimensions could be removed, for example the last 12 or even the last 27, since about 94% of the "information" is contained in the first 30 PCs.
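Both selection rules mentioned above (Kaiser criterion and a cumulative variance target) can be computed instead of read off the plot. This is a sketch on synthetic data; the data generation and the names `n_kaiser` and `n_94` are assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data: 5 latent factors mixed into 20 noisy features
latent = rng.normal(size=(500, 5))
X_demo = latent @ rng.normal(size=(5, 20)) + 0.1 * rng.normal(size=(500, 20))
X_demo = StandardScaler().fit_transform(X_demo)

pca_full = PCA().fit(X_demo)

# Kaiser criterion: keep components with eigenvalue > 1 (on standardized data)
n_kaiser = int(np.sum(pca_full.explained_variance_ > 1))

# Smallest number of components whose cumulative variance ratio reaches 94%
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
n_94 = int(np.searchsorted(cumulative, 0.94) + 1)

# Passing a float in (0, 1) lets PCA choose that number itself
pca_94 = PCA(n_components=0.94).fit(X_demo)
print(n_kaiser, n_94, pca_94.n_components_)
```

With the notebook's data, `X_demo` would be replaced by `X_scaled_ss`.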

In [52]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
X_train_mms, X_test_mms, y_train_mms, y_test_mms = train_test_split(X_scaled_mms, y, test_size=0.2, random_state=0, stratify=y)
X_train_ss, X_test_ss, y_train_ss, y_test_ss = train_test_split(X_scaled_ss, y, test_size=0.2, random_state=0, stratify=y)
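Note that the scalers above are fitted on the full feature table before the split, so statistics of the test rows influence the scaled training data. One possible alternative, shown here only as a sketch on synthetic data, is to put the scaler into a `Pipeline` so that it is fitted on the training rows alone. The data set and the choice of `n_neighbors=5` are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the feature table
X_demo, y_demo = make_classification(n_samples=300, n_features=10, n_informative=5,
                                     n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=0, stratify=y_demo)

# The scaler is fitted inside the pipeline, i.e. on the training rows only
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)

# score() scales the test rows with the training statistics before predicting
test_accuracy = model.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:.3f}")
```

A pipeline like this can also be passed to `GridSearchCV`, in which case the scaler is refitted within each cross-validation fold.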

## 2. Model training 

Each model is trained and the quality of the classifier (accuracy) is displayed.

### 2.1 Random Forests
Spyridon 

This section tests Random Forest as a classifier. First, the required libraries are imported.

In [173]:
from sklearn.ensemble import RandomForestClassifier
import sklearn.tree as tree

#### 2.1.1 Simple Hyperparameter tuning 
Random forests usually reach good accuracy without heavy hyperparameter tuning. Bagging and random feature selection inject randomness into the construction of the trees, which mainly reduces variance and with it the risk of overfitting; tuning the hyperparameters addresses the remaining risk of underfitting. It is therefore enough to tune only the number of trees in the ensemble (`n_estimators`) and the splitting criterion (`criterion`). All other hyperparameters keep their default values.

In [None]:

rf = RandomForestClassifier(random_state=0, n_jobs=-1)
param_grid = {'n_estimators': np.array([ 100, 250, 500, 1000, 2000, 7000]), 
              'criterion':['gini','entropy', 'log_loss'],
              }
grid_search_rf_simple = GridSearchCV(rf, param_grid, n_jobs=-1, cv=2, scoring='accuracy', verbose=1, refit=True)
grid_search_rf_simple.fit(X_train, y_train.values.ravel())
y_pred_rf_simple = grid_search_rf_simple.predict(X_test)


The grid search has selected the best model. The results:

In [None]:
grid_search_rf_simple.score(X_test, y_test)
print("Best Score: %f" % grid_search_rf_simple.best_score_)
print("Optimal Hyperparameter Values: ", grid_search_rf_simple.best_params_)
print("Optimal Model: ", grid_search_rf_simple.best_estimator_)
print(f"Accuracy: {accuracy_score(y_test, y_pred_rf_simple)}")

So in ~20 seconds (on my machine), `GridSearchCV` found a model with 76% accuracy. That's a really good result!

#### 2.1.2 Heavy Hyperparameter tuning

But let's also try some heavy hyperparameter tuning to see what results can be achieved. (This takes some time.)

In [None]:

rf_heavy = RandomForestClassifier(random_state=0, n_jobs=-1)
param_grid_rf_heavy = {
    'n_estimators': [100, 250, 500, 1000, 2000, 7000],
    'criterion': ['gini', 'entropy', 'log_loss'],
    'max_depth': [3, 5, 7, 10, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2'],
}
grid_search_rf_heavy = GridSearchCV(rf_heavy, param_grid_rf_heavy, n_jobs=-1, cv=2, scoring='accuracy', verbose=1, refit=True)
grid_search_rf_heavy.fit(X_train, y_train.values.ravel())
y_pred_heavy = grid_search_rf_heavy.predict(X_test)


In [None]:
grid_search_rf_heavy.score(X_test, y_test)
print("Best Score: %f" % grid_search_rf_heavy.best_score_)
print("Optimal Hyperparameter Values: ", grid_search_rf_heavy.best_params_)
print("Optimal Model: ", grid_search_rf_heavy.best_estimator_)
print(f"Accuracy: {accuracy_score(y_test, y_pred_heavy)}")
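To judge whether a larger grid pays off, the per-candidate scores and fit times stored in `cv_results_` can be tabulated. The sketch below uses a small synthetic data set and a deliberately tiny grid, both of which are assumptions chosen to keep the example fast:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_classification(n_samples=200, n_features=10, n_informative=5,
                                     n_classes=3, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [10, 50], "max_depth": [3, None]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays with one entry per parameter combination
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "mean_fit_time", "rank_test_score"]
summary = results[columns].sort_values("rank_test_score")
print(summary.to_string(index=False))
```

If the top candidates differ by less than their `std_test_score`, the additional tuning time is hard to justify. The same table can be built from `grid_search_rf_simple.cv_results_` or `grid_search_rf_heavy.cv_results_`.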

#### 2.1.3 Best Random Forest

Heavy hyperparameter tuning took more than 15 minutes, yet the resulting forest does not perform noticeably better. Its cross-validation score may be slightly higher, but its accuracy on the test set is worse. So our best Random Forest model is the following:

In [None]:
rf_simple_best = grid_search_rf_simple.best_estimator_
rf_simple_best 

Let's analyze the resulting model.

We start by looking into the feature importance of the model: 

In [None]:
rf_simple_best = grid_search_rf_simple.best_estimator_

importances = rf_simple_best.feature_importances_
indices = np.argsort(importances)[::-1]
feature_names = X.columns


top_n = 15
top_indices = indices[:top_n]
top_importances = importances[top_indices]

top_feature_names = [feature_names[idx] for idx in top_indices]
print("Feature ranking with names:")
for rank, (name, importance) in enumerate(zip(top_feature_names, top_importances), start=1):
    print(f"{rank:2d}. {name}: {importance:.4f}")

plt.figure()
plt.title(f"Top {top_n} Feature Importances")
plt.bar(range(top_n), top_importances, align="center")
plt.xticks(range(top_n), top_feature_names, rotation=45, ha='right')
plt.xlabel('Feature')
plt.ylabel('Feature Importance')
plt.xlim([-1, top_n])
plt.show()


### 2.2 Decision trees

Jan W.

First attempt: post-pruning on the entire dataset. The pruning strength (`ccp_alpha`) is selected via hyperparameter tuning with `GridSearchCV`.

In [None]:
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn import tree

clf = DecisionTreeClassifier(random_state=0) #maybe use variable for random state so that all classifiers can be adjusted at the same time
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

fig, ax = plt.subplots()
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")

plt.show()

In [None]:
parameters = {'ccp_alpha':ccp_alphas[:-1].tolist()}
gs = GridSearchCV(DecisionTreeClassifier(random_state=0), parameters, cv=10, refit=True)
gs.fit(X_train,y_train)
tree_best = gs.best_estimator_
pred = tree_best.predict(X_test)
print('Accuracy', accuracy_score(y_test, pred))

In [None]:
rules = export_text(tree_best, feature_names=X.columns.to_list())
print(rules)
print()
print("Feature importance:\n")
# Pair each feature name with its importance and sort in descending order
features_sorted = sorted(zip(X.columns, tree_best.feature_importances_),
                         key=lambda x: x[1], reverse=True)
for feature in features_sorted:
    print(feature)

In [None]:
fig = plt.figure(figsize=(10,10))
text = tree.plot_tree(tree_best, 
                   feature_names=X.columns.to_list(), 
                   filled=True)

plt.show()

Pre-pruning with a lower maximum tree depth is another option, although it probably won't lead to better results.

In [None]:
cls = DecisionTreeClassifier(random_state=0)

params = {'max_depth':np.arange(3,15),
#          'min_samples_leaf':[3,5,10,15,20],
#          'min_samples_split':[8,10,12,18,20,16],
          'criterion':['gini','entropy']}
gs = GridSearchCV(cls, params, scoring='accuracy', cv=10, verbose=3, n_jobs=-1)
gs.fit(X_train, y_train)
params_optimal = gs.best_params_

print("Best Score: %f" % gs.best_score_)
print("Optimal Hyperparameter Values: ", params_optimal)

In [None]:
tree_best = DecisionTreeClassifier(random_state=0, criterion='entropy', max_depth=11) #, min_samples_leaf=20, min_samples_split=8)
tree_best.fit(X_train, y_train)
pred = tree_best.predict(X_test)

print('Test accuracy',accuracy_score(y_test, pred))

In [None]:
fig = plt.figure(figsize=(10,10))
text = tree.plot_tree(tree_best, 
                   feature_names=X.columns.to_list(), 
                   filled=True)

plt.show()

Also try reduction of dimensions with PCA (only first 30 or so dimensions?)
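A possible way to try this idea is a pipeline that standardizes the data, keeps the first 30 principal components, and then fits the tree. The sketch below runs on synthetic data with 57 features; the data set and the 5-fold cross-validation are assumptions, and no claim is made about the result on the real data:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same number of features as the notebook's data
X_demo, y_demo = make_classification(n_samples=300, n_features=57, n_informative=10,
                                     n_classes=3, random_state=0)

# Scaling and PCA are refitted on the training part of every fold
pca_tree = make_pipeline(
    StandardScaler(),
    PCA(n_components=30),
    DecisionTreeClassifier(random_state=0),
)
scores = cross_val_score(pca_tree, X_demo, y_demo, cv=5, scoring="accuracy")
print(f"Mean CV accuracy with 30 PCs: {scores.mean():.3f}")
```

The number of components could itself be tuned by passing the pipeline to `GridSearchCV` with a grid over `pca__n_components`.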

### 2.3 KNN

Now the music genres are classified with the **k-Nearest Neighbors** algorithm. To enhance model performance, it can be useful to tune the following three hyperparameters via cross validation:
* ***n_neighbors***  :  number of neighbors $k$
* ***weights***  :  weights assigned to the nearest neighbors, especially relevant in case of ties
  - 'uniform'  :  all neighbors have equal weights
  - 'distance'  :  neighbors closer to the target point have higher weights
* ***metric***  :  method for distance computation
  - 'euclidean'  :  Euclidean distance
  - 'manhattan'  :  Manhattan distance

The first step is to define the possible values for each of these parameters in a dictionary.

In [62]:
# Define parameter combinations for hyperparameter tuning via cross validation 
params = {'n_neighbors': np.arange(1,20),               # parameter 'k' 
              'weights': ['uniform', 'distance'],       # parameter 'weights'
              'metric' : ['euclidean','manhattan']}     # parameter 'metric'

Then hyperparameter tuning is performed with the help of *GridSearchCV*, using 10-fold cross validation and accuracy as evaluation measure. The model is trained on the training data which have been normalized with the *MinMaxScaler*.

In [None]:

# Create KNN classifier
knn = KNeighborsClassifier()
# Use GridSearchCV to tune the chosen parameters
gs = GridSearchCV(knn, params, scoring='accuracy', cv=10, verbose=3, n_jobs=-1, refit=True)
# Train
gs.fit(X_train_mms, y_train_mms.values.ravel())    # Use training data scaled with MinMaxScaler

As can be seen from the optimal parameter set, choosing $k=3$ nearest neighbors, distance-dependent weights and Manhattan distance turns out to be the best combination in this experiment. Yet, it must be noted that distance-related weights are also computed if there are no ties, which might lead to overfitting.

In [None]:
params_optimal = gs.best_params_

print("Best score: %f" % gs.best_score_)
print("Optimal hyperparameters: ", params_optimal)
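The effect of the `weights` parameter can be seen on a tiny example where the vote is not tied. In this sketch the three one-dimensional training points and the query are made up for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# One nearby point of class 0 and two more distant points of class 1
X_toy = np.array([[0.0], [2.0], [2.1]])
y_toy = np.array([0, 1, 1])
query = np.array([[0.5]])  # distances to the training points: 0.5, 1.5, 1.6

knn_uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_toy, y_toy)
knn_distance = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_toy, y_toy)

# Uniform: two votes for class 1 against one vote for class 0
pred_uniform = knn_uniform.predict(query)[0]
# Distance: class 0 gets 1/0.5 = 2.0, class 1 gets 1/1.5 + 1/1.6, which is about 1.29
pred_distance = knn_distance.predict(query)[0]
print(pred_uniform, pred_distance)
```

The two classifiers disagree although the uniform vote is a clear 2:1, which illustrates that distance weighting changes predictions even without ties.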

Finally, this optimal classifier is taken to predict the music genres in the corresponding test set.

In [73]:
# Choose optimal classifier to predict
knn_optimal = gs.best_estimator_
y_pred_optimal = knn_optimal.predict(X_test_mms)

The evaluation shows that the tuned kNN model reaches an accuracy of 74% on the test data.

In [None]:
# Accuracy for tuned KNN
accuracy = accuracy_score(y_test_mms, y_pred_optimal)
print('Accuracy:', accuracy)  

### 2.4 Neural Networks

## 3. Comparing Models 

In this section the resulting best models will be compared. Let's import needed libraries: 

In [167]:
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score

As a first step, we create an array with the best models: 

In [168]:
models = [knn_optimal, tree_best, rf_simple_best]

### 3.1 Accuracy

In [None]:
for model in models: 
    model.fit(X_train_mms, y_train_mms.values.ravel())  # pass the labels as a 1-D array, as in the training cells
    print(model)
    y_pred = model.predict(X_test_mms)
    accuracy = accuracy_score(y_test_mms, y_pred)
    print(f"Accuracy: {accuracy}")


### 3.2 Precision 

In [None]:
# Precision of each model 
for model in models: 
    y_pred = model.predict(X_test_mms)
    print(model)
    precision = precision_score(y_test_mms, y_pred, average='weighted')
    print(f"Precision: {precision}")

### 3.3 Recall 

In [None]:
# Recall of each model

for model in models: 
    y_pred = model.predict(X_test_mms)
    print(model)
    recall = recall_score(y_test_mms, y_pred, average='weighted')
    print(f"Recall: {recall}")
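When reading these numbers, keep in mind that recall with `average='weighted'` is always identical to accuracy, because each class recall is weighted by its share of the true labels. The macro average treats all classes equally and can differ. A small sketch with made-up labels:

```python
from sklearn.metrics import accuracy_score, recall_score

# Made-up labels: class 0 has four samples, class 1 has two
y_true = [0, 0, 0, 0, 1, 1]
y_hat = [0, 0, 0, 1, 1, 1]

acc = accuracy_score(y_true, y_hat)
# Per-class recall is 3/4 for class 0 and 2/2 for class 1
recall_weighted = recall_score(y_true, y_hat, average="weighted")  # (4*0.75 + 2*1.0) / 6
recall_macro = recall_score(y_true, y_hat, average="macro")        # (0.75 + 1.0) / 2
print(acc, recall_weighted, recall_macro)
```

If the classes in the test set are balanced, the macro and weighted averages coincide.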

### 3.4 ROC, AUC Curve

In [None]:



# Binarize the output
y_test_bin = label_binarize(y_test_mms, classes=np.unique(y))
y_train_bin = label_binarize(y_train_mms, classes=np.unique(y))
n_classes = y_test_bin.shape[1]

plt.figure(figsize=(10, 10))
colors = ['red', 'blue', 'green']
linestyles = ['-', '--', '-.']
classifiers = models
labels = ['KNN', 'Decision Trees', 'Random Forest']

for clf, label, clr, ls in zip(classifiers, labels, colors, linestyles):
    classifier = OneVsRestClassifier(clf)
    y_score = classifier.fit(X_train_mms, y_train_bin).predict_proba(X_test_mms)    
    # Compute micro-average ROC curve and AUC
    fpr, tpr, _ = roc_curve(y_test_bin.ravel(), y_score.ravel())
    roc_auc = auc(fpr, tpr)
    
    # Plot the micro-average ROC curve
    plt.plot(fpr, tpr, color=clr, linestyle=ls, label='%s (AUC = %0.2f)' % (label, roc_auc))

# Add a diagonal line for reference
plt.plot([0, 1], [0, 1], linestyle='--', color='gray', linewidth=2)

plt.legend(loc='lower right')
plt.xlim([-0.1, 1.1])
plt.ylim([-0.1, 1.1])
plt.grid()
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')

plt.tight_layout()
# plt.savefig('./figures/roc.png', dpi=300)
plt.show()
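The plot above retrains each model inside a `OneVsRestClassifier` and micro-averages the curve. As a possible cross-check, `roc_auc_score` can compute a one-vs-rest AUC directly from the class probabilities of a multiclass model. This sketch uses synthetic data and a kNN classifier with `n_neighbors=5`, both assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_demo, y_demo = make_classification(n_samples=400, n_features=10, n_informative=5,
                                     n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=0, stratify=y_demo)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)

# predict_proba returns one column per class; each row sums to 1
proba = clf.predict_proba(X_te)

# One-vs-rest AUC per class, then averaged with equal class weights
auc_macro = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
print(f"Macro OvR AUC: {auc_macro:.3f}")
```

The macro value is not directly comparable to the micro-averaged AUC in the plot, since the two averaging schemes weight classes differently.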

### 3.5 Comparison of models

In [None]:
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


# Create a table comparing our three main models: KNN, Decision Tree, and Random Forest

model_entries = []
model_list = [
    ("KNN", knn_optimal),
    ("Decision Tree", tree_best),
    ("Random Forest", rf_simple_best)
]

for model_name, model_obj in model_list:
    # Retrain each model on the (MinMax scaled) training set
    model_obj.fit(X_train_mms, y_train_mms.values.ravel())  # pass the labels as a 1-D array
    
    # Predict on the (MinMax scaled) test set
    y_pred = model_obj.predict(X_test_mms)
    
    # Calculate standard classification metrics
    acc = accuracy_score(y_test_mms, y_pred)
    prec = precision_score(y_test_mms, y_pred, average='weighted')
    rec = recall_score(y_test_mms, y_pred, average='weighted')
    f1 = f1_score(y_test_mms, y_pred, average='weighted')
    
    # For ROC AUC in multi-class, do a OneVsRest scheme (micro-average)
    classifier_ovr = OneVsRestClassifier(model_obj)
    y_score = classifier_ovr.fit(X_train_mms, y_train_bin).predict_proba(X_test_mms)
    fpr, tpr, _ = roc_curve(y_test_bin.ravel(), y_score.ravel())
    roc_auc_val = auc(fpr, tpr)
    
    # Collect metrics into a dictionary
    model_entries.append({
        "Model": model_name,
        "Accuracy": acc,
        "Precision (weighted)": prec,
        "Recall (weighted)": rec,
        "F1 Score (weighted)": f1,
        "ROC AUC (micro)": roc_auc_val
    })

# Create and display the comparison table
comparison_df = pd.DataFrame(model_entries)
comparison_df.style.format(precision=3)

| **Algorithm**       | **Description**                                                                 | **Advantages**                                                                                     | **Disadvantages**                                                                                  | **Use Cases**                                                                                   |
|---------------------|---------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|
| **Random Forests**  | An ensemble method that combines multiple decision trees to improve performance | - High accuracy<br>- Robust to overfitting<br>- Handles large datasets well<br>- Feature importance | - Computationally intensive<br>- Less interpretable than single decision trees                    | - Classification and regression<br>- Feature selection<br>- Anomaly detection                   |
| **K-Nearest Neighbors (KNN)** | A simple, instance-based learning algorithm that classifies based on the majority vote of nearest neighbors | - Simple to implement<br>- No training phase<br>- Works well with small datasets                   | - Computationally expensive for large datasets<br>- Sensitive to irrelevant features and noise    | - Classification and regression<br>- Recommender systems<br>- Image and pattern recognition     |
| **Artificial Neural Networks (ANN)** | A computational model inspired by the human brain, consisting of interconnected nodes (neurons) | - High accuracy for complex problems<br>- Capable of learning non-linear relationships<br>- Versatile | - Requires large amounts of data<br>- Computationally intensive<br>- Difficult to interpret       | - Image and speech recognition<br>- Natural language processing<br>- Time series forecasting    |
| **Decision Trees**  | A tree-like model of decisions and their possible consequences                  | - Easy to interpret and visualize<br>- Handles both numerical and categorical data<br>- Non-parametric | - Prone to overfitting<br>- Can be unstable with small variations in data                         | - Classification and regression<br>- Feature selection<br>- Decision analysis and support       |

## 4. OPTIONAL: Song import and classify

## 5. References 