### Kaggle

Group 1

Jenewein Matthias - Jenewein Matthias

Kalarickal Dominic - Kalarickal Dominic

Leander Leirissa - Bitterzoet

Timmer Lars - laltir

# 0. Loading packages

If some libraries are not installed yet, uncomment and run the cell below.

In [1]:
#%pip install -r requirements.txt

In [2]:
import warnings 

warnings.filterwarnings('ignore')

import functions as f

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA, NMF
from sklearn.preprocessing import normalize, MinMaxScaler
from scipy.cluster.hierarchy import dendrogram, linkage
import os
from sklearn.neighbors import KNeighborsClassifier

pd.set_option('display.max_columns', None)

# 1. Loading labels for the labeled data

In [3]:
labels = pd.read_csv('Datasets/labels_new.csv')

In [None]:
labels.head()

# 2. Feature Engineering

In [None]:
labeled_files = os.listdir('Datasets/labeled')
unlabeled_files = os.listdir('Datasets/unlabeled')

print("Labeled files:", labeled_files)
print("Unlabeled files:", unlabeled_files)

### Load and Process Features from Audio Files  
This cell handles the loading and processing of features extracted from labeled and unlabeled audio files:  
- An instance of the `DataLoader` class is initialized.  
- The `featureDataFrame` method is called twice to generate DataFrames containing features for labeled and unlabeled files:  
  - Labeled features are extracted from the `labeled_files` directory and saved in `labeled_features_df`.  
  - Unlabeled features are extracted from the `unlabeled_files` directory and saved in `unlabeled_features_df`.  
- The `labeled_features_df` DataFrame is merged with the `labels` DataFrame on the `filename` column to associate labels with the features.  

Finally, both the labeled and unlabeled DataFrames are displayed to preview their structure and contents.  


In [None]:
dl = f.DataLoader()

labeled_features_df = dl.featureDataFrame(labeled_files, 'Datasets/labeled')
unlabeled_features_df = dl.featureDataFrame(unlabeled_files, 'Datasets/unlabeled')

labeled_features_df = labeled_features_df.merge(labels, on='filename')

print("Labeled Features DataFrame")
display(labeled_features_df.head())

print("\nUnlabeled Features DataFrame")
display(unlabeled_features_df.head())

## Feature Explanations and Calculations in Machine Learning Audio Analysis

To extract different features from the audio files, we used the `librosa` library. The librosa library offers many different features to explore and is therefore particularly useful for a task like this.

For reference, please have a look at `functions.py`, more specifically the `extract_features` method, as well as the librosa documentation (https://librosa.org/doc/latest/index.html). All the methods used were taken from there.

### 1. Spectral Centroid

**Explanation:**  
The spectral centroid represents the center of gravity of the spectral energy distribution. A higher spectral centroid is indicative of a brighter sound. It quantifies the frequency where the majority of the signal's energy resides (Sable, 2021).  

**Calculation:**  
Using the `librosa` library, the spectral centroid is computed as the weighted mean of the frequencies, with the magnitudes serving as weights (Librosa, n.d.-a).  

**Mathematical Formula:**  
$$
\text{Spectral Centroid}[t] = \frac{\sum_{k} S[k, t] \cdot \text{freq}[k]}{\sum_{j} S[j, t]}
$$
Where:  
- \(S\): Magnitude spectrogram  
- \(\text{freq}\): Array of frequency values  

(Librosa, n.d.-a).
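
To make the formula concrete, here is a minimal NumPy sketch for a single frame. The frequencies and magnitudes are made-up illustration values, not taken from our dataset, and the sketch mirrors the formula above rather than librosa's internal implementation.

```python
import numpy as np

# Hypothetical single frame: magnitude S[k] at frequency freq[k] (in Hz).
freq = np.array([100.0, 200.0, 400.0, 800.0])
S = np.array([1.0, 2.0, 1.0, 0.0])

# Weighted mean of the frequencies, with the magnitudes as weights.
centroid = np.sum(S * freq) / np.sum(S)
print(centroid)  # 225.0
```

Most of the energy sits in the low bins, so the centroid ends up close to the low end of the frequency axis.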

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the spectral centroid differs vastly between genres.

In [None]:
f.visualize_feature('spectral_centroid', labeled_features_df)

### 2. Spectral Bandwidth

**Explanation:**  
Spectral bandwidth represents the spread of the frequencies, i.e., the distance between the high and low frequencies around the spectral centroid. The bandwidth indicates how noisy or pure a sound is (Jakeli, 2023).  

**Calculation:**  
The spectral bandwidth is computed as the weighted standard deviation of frequencies (Music Information Retrieval, n.d.).

**Mathematical Formula:**  
$$
\left( \sum_{k} S[k, t] \cdot \left( \text{freq}[k, t] - \text{centroid}[t] \right)^p \right)^{\frac{1}{p}}
$$ 
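Where:  
- \(S[k, t]\): Magnitude of frequency bin \(k\) in frame \(t\).  
- \(\text{freq}[k, t]\): Frequency value of bin \(k\).  
- \(\text{centroid}[t]\): Spectral centroid of frame \(t\).  
- \(p\): Order of the bandwidth; with \(p = 2\) the result is a weighted standard deviation.  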

Librosa (n.d.-b)
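
The following sketch evaluates the formula for one hypothetical frame with \(p = 2\). It assumes that the magnitudes are normalised to sum to one per frame (to our understanding this matches librosa's default `norm=True`); the numbers are illustration values only.

```python
import numpy as np

# Hypothetical single frame: magnitude S[k] at frequency freq[k] (in Hz).
freq = np.array([100.0, 200.0, 400.0, 800.0])
S = np.array([1.0, 2.0, 1.0, 0.0])
p = 2  # p = 2 gives a weighted standard deviation

# Assumption: weights are the magnitudes normalised to sum to one.
weights = S / S.sum()
centroid = np.sum(weights * freq)

# Weighted p-th order deviation of the frequencies from the centroid.
bandwidth = np.sum(weights * np.abs(freq - centroid) ** p) ** (1.0 / p)
print(round(bandwidth, 2))
```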

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the spectral bandwidth differs vastly between genres. 

In [None]:
f.visualize_feature('spectral_bandwidth', labeled_features_df)

### 3. Zero Crossing Rate (ZCR)

**Explanation:**  
Zero Crossing Rate measures the rate at which a signal crosses the zero amplitude line, i.e., how often its sign changes from positive to negative or vice versa (OpenAE, n.d.-a). ZCR is an important indicator of the smoothness of an audio signal (Bäckström et al., 2022).  

**Calculation:**  
The ZCR is computed by summing up the zero crossings within an analysis window and then normalizing by the number of samples in that window (\(N\)) (Bäckström et al., 2022).

**Mathematical Formula:**  
$$
ZCR_k = \frac 1 N \sum_{h=kM}^{kM+N} |\text{sign}(x_h) - \text{sign}(x_{h-1})|,
$$

where \( M \) is the step between analysis windows and \( N \) the analysis window length (Bäckström et al., 2022).
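
As a small illustration, the sketch below counts the zero crossings in one hypothetical analysis window. Note one assumption: the term \(|\text{sign}(x_h) - \text{sign}(x_{h-1})|\) equals 2 at every sign change, so the sketch halves the sum in order to count each crossing once. Some definitions omit this factor, which only rescales the feature.

```python
import numpy as np

# Hypothetical analysis window of N = 6 samples.
x = np.array([0.5, -0.2, -0.4, 0.3, 0.1, -0.6])
N = len(x)

# |sign(x_h) - sign(x_{h-1})| is 2 at a sign change and 0 otherwise.
sign_changes = np.abs(np.diff(np.sign(x)))
num_crossings = int(np.sum(sign_changes) / 2)

zcr = num_crossings / N
print(num_crossings, zcr)  # 3 0.5
```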

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the zero crossing rate differs vastly between genres.


In [None]:
f.visualize_feature('zero_crossing_rate', labeled_features_df)

### 4. Root Mean Square (RMS)

**Explanation:**  
Root Mean Square (RMS) quantifies the loudness or energy of an audio signal. Higher RMS values correspond to louder sounds (Miraglia, 2024).  

**Calculation:**  
The RMS is computed as the square root of the mean of the squared amplitudes (Wikipedia, 2024-a).  

**Mathematical Formula:**  
$$
RMS = \sqrt{\frac{1}{N} \sum_{i=0}^{N-1} x_i^2}
$$

(OpenAE, n.d.-b)
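
A minimal sketch of this formula with made-up sample values:

```python
import numpy as np

# Hypothetical frame of N = 4 samples.
x = np.array([3.0, -4.0, 0.0, 0.0])

# Square root of the mean of the squared amplitudes.
rms = np.sqrt(np.mean(x ** 2))
print(rms)  # 2.5
```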

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the root mean square differs vastly between genres.


In [None]:
f.visualize_feature('rms', labeled_features_df)

### 5. Spectral Rolloff

**Explanation:**  
Spectral rolloff defines the frequency below which a specified percentage (e.g., 85%) of the total spectral energy is concentrated (OpenAE, n.d.-b). It can therefore serve as an approximation of the highest frequencies present in the sound (Librosa, n.d.).

**Calculation:**  
The n% spectral rolloff point is the lowest frequency bin \(r\) for which the energy accumulated up to that bin reaches at least n% (e.g., 85%) of the total spectral energy (OpenAE, n.d.-c).

**Mathematical Formula:**  
$$
\sum_{m=0}^{r} X_p[m] \geq \frac{n}{100} \sum_{m=0}^{M-1} X_p[m]
$$ 

(OpenAE, n.d.-c)
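
The inequality can be evaluated with a cumulative sum. The sketch below uses a hypothetical five-bin spectrum and assumes that \(X_p\) holds the energy per frequency bin; it is an illustration of the formula, not librosa's implementation.

```python
import numpy as np

# Hypothetical spectrum: energy per bin and the bin frequencies in Hz.
freq = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
energy = np.array([4.0, 3.0, 2.0, 0.5, 0.5])
n = 85  # rolloff percentage

cumulative = np.cumsum(energy)            # [4.0, 7.0, 9.0, 9.5, 10.0]
threshold = n / 100 * cumulative[-1]      # 85% of the total energy

# First bin r whose cumulative energy reaches the threshold.
r = int(np.searchsorted(cumulative, threshold))
rolloff = freq[r]
print(r, rolloff)  # 2 400.0
```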

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the spectral rolloff differs vastly between genres.

In [None]:
f.visualize_feature('spectral_rolloff', labeled_features_df)

### 6. MFCC Means

**Explanation:**  
Mel-Frequency Cepstral Coefficients (MFCCs) characterize the tonal and textural qualities of an audio signal (perception of loudness or tempo for example). Nowadays, MFCCs are widely used to characterize sound (Sable, 2021).

**Calculation:**  
The MFCC is calculated by computing the cepstrum coefficient for each frame (Wikipedia, 2024-c). Computing the mean MFCC provides a summary of these features across an entire audio clip.

**Mathematical Formula:**

$$
c_i = \sum_{n=1}^{N_f} S_n \cos \left( (n - 0.5) \left( \frac{i \pi}{N_f} \right) \right), \quad i = 1, \dots, L
$$

(Wikipedia, 2024-c)

The mean MFCC value of coefficient \(i\) over all \(N\) frames is calculated as:

$$
\text{MFCC}_i = \frac{1}{N} \sum_{n=1}^{N} \text{MFCC}(i, n)
$$  

(Wikipedia, 2025-b).
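
Both formulas can be sketched in a few lines of NumPy. We assume here that \(S_n\) denotes the log energy of the \(n\)-th mel band (the formula above does not state this explicitly), and the helper `cepstral_coefficients` as well as the input values are hypothetical.

```python
import numpy as np

def cepstral_coefficients(band_energies, num_coeffs):
    """Evaluate c_i = sum_n S_n * cos((n - 0.5) * i * pi / N_f) for i = 1..num_coeffs."""
    S = np.asarray(band_energies, dtype=float)
    N_f = len(S)
    n = np.arange(1, N_f + 1)
    return np.array([
        np.sum(S * np.cos((n - 0.5) * i * np.pi / N_f))
        for i in range(1, num_coeffs + 1)
    ])

# Hypothetical clip: 3 frames with 4 (log) mel band energies each.
frames = np.array([
    [1.0, 1.0, 1.0, 1.0],
    [4.0, 3.0, 2.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
])

# One row of coefficients per frame, then the mean over all frames.
mfcc = np.array([cepstral_coefficients(frame, num_coeffs=2) for frame in frames])
mfcc_mean = mfcc.mean(axis=0)
print(mfcc.shape, mfcc_mean)
```

A flat frame such as the first one yields coefficients of zero, because the cosine terms cancel out.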

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the MFCC mean differs vastly between genres.

In [None]:
f.visualize_feature('mfcc_mean_1', labeled_features_df)

To display not just one mean value, we can use a multi-line plot.

In [None]:
f.visualize_feature_multiline(df=labeled_features_df, feature_prefix='mfcc_mean_', num_sub_features=7, x_col='genre', x_label='Genre',y_label='Average MFCC Mean',title='MFCC Means per Genre')

### 7. Chroma Mean

**Explanation:**  
Chroma features capture the energy distribution across the 12 pitch classes (e.g., C, D, E, etc.) within an octave (Sable, 2022). The Chroma Mean represents the average intensity of these pitch classes over the entire audio clip.  

**Calculation:**  
To calculate "such chroma vectors all tones of different octave of the corresponding 12 half tones are mapped into one octave. This means that for example tone ”A” is added to a value, whose sum represents a component of the chroma vector, regardless of its respective octave" (Englmeier et al., 2023, p. 185).

**Mathematical Formula:**  
$$
CV(i) = \sum_{m=0}^{M-1} |X_{CQ}(i + 12m)|
$$

(Englmeier et al., 2023)
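
The folding of all octaves into one can be sketched as follows. The constant-Q magnitudes are placeholder values, and we assume exactly 12 bins per octave, ordered from the lowest to the highest octave.

```python
import numpy as np

M = 3  # number of octaves in the hypothetical constant-Q spectrum

# Placeholder magnitudes for 12 * M semitone bins.
X_cq = np.arange(12 * M, dtype=float)

# CV(i) = sum over octaves m of |X_cq(i + 12 * m)|:
# reshape to (octave, pitch class) and sum over the octave axis.
chroma = np.abs(X_cq).reshape(M, 12).sum(axis=0)
print(chroma)
```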

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the chroma mean differs vastly between genres.

In [None]:
f.visualize_feature('chroma_mean_1', labeled_features_df)

To display not just one mean value, we can use a multi-line plot.

In [None]:
f.visualize_feature_multiline(df=labeled_features_df, feature_prefix='chroma_mean_', num_sub_features=7, x_col='genre', x_label='Genre',y_label='Average Chroma Mean',title='Chroma Means per Genre')

### 8. Tempo

**Explanation:**  
Tempo represents the speed of an audio signal, typically measured in beats per minute (BPM). It plays a crucial role in genre classification (Wikipedia, 2025).

**Calculation:**  
Librosa first estimates a global tempo for the entire audio file. This global tempo is then used to build a cost function, which in turn is used to find the best-fitting beat times. These beat times should represent the tempo of the audio as well as possible (Elis, 2007). 

**Mathematical Formula:**  
$$
\text{Tempo (BPM)} = \frac{60}{\text{Average Beat Interval (seconds)}}
$$

(OmniCalculator, 2024)
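
As a quick arithmetic check of this formula, consider hypothetical beat times that are half a second apart:

```python
import numpy as np

# Hypothetical beat times in seconds.
beat_times = np.array([0.5, 1.0, 1.5, 2.0, 2.5])

# Average interval between consecutive beats.
average_interval = np.mean(np.diff(beat_times))

tempo_bpm = 60.0 / average_interval
print(tempo_bpm)  # 120.0
```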

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the tempo differs vastly between genres.

In [None]:
f.visualize_feature('tempo', labeled_features_df)

### 9. Spectral Contrast

**Explanation:**  
Spectral Contrast quantifies the amplitude difference between high-energy (peaks/top quartile) and low-energy (valleys/bottom quartile) regions within frequency bands (Sable, 2021). 

**Calculation:**  
Spectral contrast is calculated by dividing the peak (the point with the highest energy) by the valley (the point with the lowest energy). Taking 10 × log10 of this ratio converts the quotient into decibels (Yuto, 2024). The mean spectral contrast provides an overall summary of these values across all frames.

**Mathematical Formula:**  
$$
\text{Spectral Contrast} = 10 \times \log_{10} \left( \frac{\text{Peak Value}}{\text{Valley Value}} \right)
$$

(Yuto, 2024)

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the spectral contrast mean differs vastly between genres.

In [None]:
f.visualize_feature('spectral_contrast', labeled_features_df)

To display not just one mean value, we can use a multi-line plot.

In [None]:
f.visualize_feature_multiline(df=labeled_features_df,feature_prefix='contrast_mean_',num_sub_features=7,x_col='genre', x_label='Genre',y_label='Average Spectral Contrast Mean', title='Spectral Contrast Means per Genre')

### 10. Tonnetz Mean

**Explanation:**  
Tonnetz features represent harmonic relationships between pitches, such as intervals or chords (Wikipedia, 2024-b).   

**Calculation:**  
Librosa transforms chroma features into the Tonnetz space and maps intervals such as the major third onto two-dimensional coordinates (Librosa, n.d.-d). The Tonnetz mean is the average of these values over time.

**Mathematical Formula:**  
$$
\text{Tonnetz Mean}_i = \frac{1}{N} \sum_{n=1}^{N} \text{Tonnetz}(i, n)
$$

Unfortunately, we were not able to find a good formula for this feature. Hence, we decided to simply describe how the arithmetic mean of the observed Tonnetz values is calculated over the \(N\) frames (Wikipedia, 2025-b).

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the tonnetz mean differs vastly between genres. 


In [None]:
f.visualize_feature('tonnetz_mean_1', labeled_features_df)

To display not just one mean value, we can use a multi-line plot.

In [None]:
f.visualize_feature_multiline(df=labeled_features_df,feature_prefix='tonnetz_mean_', num_sub_features=6, x_col='genre', x_label='Genre',y_label='Average Tonnetz Mean',title='Tonnetz Means per Genre')

### 11. Spectral Flatness

**Explanation:**  
Spectral Flatness measures the resemblance of a sound to a pure tone. Lower values indicate purer tones, while higher values suggest noise-like signals (Wikipedia, 2024-c).  

**Calculation:**  
The spectral flatness is calculated by dividing the geometric mean of the spectral magnitudes by their arithmetic mean (Wikipedia, 2024-c).

**Mathematical Formula:**  
$$
\text{Flatness} = \frac{\text{geometric mean}}{\text{arithmetic mean}} = \frac{\sqrt[N]{\prod_{n=0}^{N-1} x(n)}}{\frac{1}{N} \sum_{n=0}^{N-1} x(n)} 
= \frac{\exp\left(\frac{1}{N} \sum_{n=0}^{N-1} \ln x(n)\right)}{\frac{1}{N} \sum_{n=0}^{N-1} x(n)}
$$

(Wikipedia, 2024-c)
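
The sketch below evaluates the formula for two hypothetical spectra. The helper name `spectral_flatness` is our own, and it assumes strictly positive values because of the logarithm.

```python
import numpy as np

def spectral_flatness(x):
    """Geometric mean divided by arithmetic mean (x must be strictly positive)."""
    x = np.asarray(x, dtype=float)
    geometric_mean = np.exp(np.mean(np.log(x)))
    arithmetic_mean = np.mean(x)
    return geometric_mean / arithmetic_mean

flat_spectrum = spectral_flatness([1.0, 1.0, 1.0, 1.0])  # noise-like
uneven_spectrum = spectral_flatness([1.0, 4.0])          # more tonal
print(flat_spectrum, uneven_spectrum)  # approximately 1.0 and 0.8
```

A perfectly flat spectrum reaches the maximum value of 1, while any unevenness pushes the value towards 0.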

**Reason for including this feature:**  
The chart below shows that this feature is important to distinguish genres as the spectral flatness differs vastly between genres. 

In [None]:
f.visualize_feature('flatness_mean', labeled_features_df)

The genres "pop" and "classical" in particular often differ considerably from the other genres, which makes them potential candidates for separate clusters. Of course, this is so far based only on the labeled dataset and has to be analysed more thoroughly. We therefore calculate the mean of every feature per genre, average these means into one overall value per genre, and plot the result against the average across all genres:

In [None]:
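# Average every numeric feature per genre (non-numeric columns such as 'filename' are skipped)
avg_features_by_genre = labeled_features_df.groupby('genre').mean(numeric_only=True).reset_index()

avg_features_by_genre['overall_mean'] = avg_features_by_genre.drop('genre', axis=1).mean(axis=1)

average_mean_across_genres = avg_features_by_genre.mean(numeric_only=True)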

In [None]:
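# Average every numeric feature per genre (non-numeric columns such as 'filename' are skipped)
df_genre = labeled_features_df.groupby('genre').mean(numeric_only=True).reset_index()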

df_genre['overall_mean'] = df_genre.drop('genre', axis=1).mean(axis=1)

average_mean_across_genres = df_genre['overall_mean'].mean()

plt.figure(figsize=(10, 6))

plt.scatter(df_genre['genre'], df_genre['overall_mean'], color='blue', s=100, label='Genre Overall Mean')

plt.axhline(y=average_mean_across_genres, color='red', linestyle='--', linewidth=2, label='Average Mean Across Genres')

plt.xlabel('Genre')
plt.ylabel('Overall Mean')
plt.title('Overall Genre Means\nMapped around the Average Mean Across Genres')

plt.xticks(rotation=45)
plt.legend()

plt.tight_layout()
plt.show()

Plotting the average value of all features per genre against the overall average across all features and genres supports our first guess that "classical" and "pop" are good candidates for separate clusters, based on the labeled dataset. A third potential cluster could be either metal or hiphop, because their values appear to lie close to the average of the remaining genres (all except classical and pop). 

Of course, this is only for the labeled data and needs more in-depth analysis for the unlabeled dataset. 

### Scale Numeric Features for NMF Compatibility  
This cell processes numeric features from the labeled and unlabeled feature DataFrames to prepare them for Non-Negative Matrix Factorization (NMF):  
- Numeric columns are extracted from both `unlabeled_features_df` and `labeled_features_df` using `select_dtypes(include=[np.number])`.  
- If a `cluster` column exists, it is dropped from both DataFrames to ensure only relevant features are included.  
- A `MinMaxScaler` is initialized with a range of 1 to 2. This range was chosen to avoid the problems with NMF's non-negativity requirement that we encountered when using `StandardScaler` or a `MinMaxScaler` with a 0 to 1 range.  
- The scaler is fitted on the unlabeled numeric features (`unlabeled_numeric`) and applied to transform both unlabeled and labeled numeric features.  

The resulting scaled feature arrays (`unlabeled_scaled` and `labeled_scaled`) are now ready for dimensionality reduction or further analysis.


In [24]:
unlabeled_numeric = unlabeled_features_df.select_dtypes(include=[np.number])
labeled_numeric = labeled_features_df.select_dtypes(include=[np.number])

if 'cluster' in unlabeled_numeric.columns:
    unlabeled_numeric.drop('cluster', axis=1, inplace=True)
    
if 'cluster' in labeled_numeric.columns:
    labeled_numeric.drop('cluster', axis=1, inplace=True)
    
scaler = MinMaxScaler((1, 2))

unlabeled_scaled = scaler.fit_transform(unlabeled_numeric)
labeled_scaled = scaler.transform(labeled_numeric)
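
To illustrate why the shifted range helps, here is a small sketch with made-up one-dimensional data. A scaler that is fitted on one dataset and applied to another can produce values outside the target range; with a range of 1 to 2 there is some margin before such values become negative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: the scaler is fitted on `fit_data` only.
fit_data = np.array([[0.0], [5.0], [10.0]])
other_data = np.array([[-5.0], [15.0]])  # lies outside the fitted min/max

scaler_demo = MinMaxScaler(feature_range=(1, 2))
fit_scaled = scaler_demo.fit_transform(fit_data)   # [[1.0], [1.5], [2.0]]
other_scaled = scaler_demo.transform(other_data)   # [[0.5], [2.5]]

print(fit_scaled.ravel(), other_scaled.ravel())
```

With a 0 to 1 range, the value -5 would have been mapped to -0.5, which NMF cannot handle.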

In [25]:
unlabeled_knn = pd.DataFrame(unlabeled_scaled, columns=unlabeled_numeric.columns)
labeled_knn = pd.DataFrame(labeled_scaled, columns=labeled_numeric.columns)

# 3. Unsupervised Learning

### K-Means Clustering: Overview and Workflow  

K-Means Clustering is an **unsupervised learning algorithm** designed to partition an unlabeled dataset into distinct clusters. The parameter $ K $ specifies the number of clusters, e.g., $ K = 5 $ creates **five clusters**, and $ K = 10 $ forms **ten clusters**.  

#### Objectives of K-Means Clustering:  
1. Identify the optimal positions of $ K $ cluster centers (centroids).  
2. Assign each data point to the closest cluster based on **distance metrics**.  

---

### Workflow of the K-Means Algorithm  

#### **Step 1: Choose the Number of Clusters ($ K $)**  
- Determine $ K $ using methods like the **Elbow Method** or domain knowledge.  

#### **Step 2: Initialize $ K $ Cluster Centers**  
- Randomly select $ K $ data points as initial cluster centers.  

#### **Step 3: Compute Distances**  
- Calculate the distance between each data point and all cluster centers using the **Euclidean distance**:  
  $$  
  d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}  
  $$  

#### **Step 4: Assign Data Points to Clusters**  
- Allocate each data point to the cluster with the nearest center based on the computed distances.  

#### **Step 5: Update Cluster Centers**  
- Recalculate the cluster centers as the **mean** of the data points assigned to each cluster:  
  $$  
  c_j = \frac{1}{N_j} \sum_{i \in C_j} x_i  
  $$  
  Where:  
  - $ c_j $: New centroid of cluster $ j $.  
  - $ N_j $: Number of points in cluster $ j $.  

#### **Step 6: Repeat Until Convergence**  
- Iterate **Steps 3 to 5** until one of the following occurs:  
  - Cluster assignments **stabilize**.  
  - A **maximum number of iterations** is reached.  
  - The cluster centers **no longer change**.  
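
The six steps can be sketched in plain NumPy as follows. This is a simplified illustration under our own assumptions (random initial centers drawn from the data, a fixed seed, toy data points), not the scikit-learn implementation that we use later.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: Euclidean distance from every point to every center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Step 4: assign each point to its nearest center.
        assignments = dists.argmin(axis=1)
        # Step 5: move each center to the mean of its points
        # (an empty cluster keeps its old center).
        new_centers = np.array([
            X[assignments == j].mean(axis=0) if np.any(assignments == j) else centers[j]
            for j in range(k)
        ])
        # Step 6: stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assignments

# Toy data: two well-separated groups of three points each.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
toy_centers, toy_assignments = kmeans(X_toy, k=2)
print(toy_centers)
print(toy_assignments)
```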

---

### Key Notes:  
- The algorithm assumes clusters are spherical and equally sized.  
- Performance may depend on the initial choice of cluster centers, which is why multiple initializations are often used.  

#### Reference  
Dharmaraj, 2022  


### Hierarchical Clustering  

Hierarchical clustering is an **unsupervised learning algorithm** that groups data points based on similarity. Unlike **K-Means clustering**, hierarchical clustering does not require specifying the number of clusters beforehand. Instead, it generates a **tree-like structure (dendrogram)** that visualizes the relationships between data points.  

---

### Workflow of Hierarchical Clustering  

1. **Compute Distance Matrix**  
   - Measure the similarity or distance between all pairs of data points using metrics like **Euclidean distance**.

2. **Initialize Each Data Point as a Separate Cluster**  
   - Begin with $ n $ clusters, where each cluster contains a single data point.

3. **Merge the Closest Clusters**  
   - Combine the two clusters that are the most similar.

4. **Repeat Until One Cluster Remains**  
   - Continue merging clusters iteratively until all points are grouped into a **single cluster**.

---

### Dendrogram: Visual Representation of Clusters  

A **dendrogram** is a hierarchical tree structure that illustrates how clusters are formed.  
- The **height of each merge** indicates the **dissimilarity** between clusters.  
- A **horizontal cut** at a specific height determines the final number of clusters.  
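
The workflow and the cut can be sketched with SciPy. The five data points are made up for illustration; `linkage` builds the merge tree and `fcluster` cuts it so that at most three clusters remain.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: two tight pairs and one isolated point.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0],
                   [5.0, 5.0], [5.0, 6.0],
                   [20.0, 20.0]])

# Each row of Z_demo describes one merge: the two merged clusters,
# the merge height (dissimilarity), and the size of the new cluster.
Z_demo = linkage(X_demo, method='ward')

# Cut the tree so that at most three clusters remain.
cluster_ids = fcluster(Z_demo, t=3, criterion='maxclust')
print(Z_demo[:, 2])   # merge heights
print(cluster_ids)
```

With $ n $ data points there are always $ n - 1 $ merges, which is why `Z_demo` has four rows here.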

---

### Types of Hierarchical Clustering  

#### **1. Agglomerative Clustering (Bottom-Up)**  
- Begins with **each data point as its own cluster**.  
- Clusters are **iteratively merged** based on similarity.  
- This is the **most common approach** to hierarchical clustering.  

#### **2. Divisive Clustering (Top-Down)**  
- Starts with **one large cluster** containing all data points.  
- The cluster is **recursively split** into smaller clusters.  
- Computationally more intensive and less commonly used.  

---

### Applications and Key Notes  
- **Applications**: Gene expression analysis, market segmentation, and text analysis.  
- **Advantages**: No need to predefine $ K $, and dendrograms provide a clear visualization of relationships.  
- **Disadvantages**: Computationally expensive for large datasets.  

---

### **Reference**  
Batra, 2022  


### Gaussian Mixture Model (GMM)

The **Gaussian Mixture Model (GMM)** is a **probabilistic model** used for **unsupervised clustering**. Unlike **K-Means**, which assigns each data point to a single cluster, GMM is a **soft clustering method**, meaning each data point has a **probability of belonging to multiple clusters**.

GMM assumes that data is generated from a mixture of multiple **Gaussian distributions**, where each distribution represents a **cluster** in the dataset.

---

#### **How GMM Works**

A **Gaussian Mixture Model** consists of **$K$ Gaussian distributions**, where each Gaussian is defined by three parameters:

1. **Mean ($\mu$)** – The center of the Gaussian distribution.
2. **Covariance ($\Sigma$)** – Defines the spread and shape of the data.
3. **Mixing Coefficient ($\pi$)** – The probability of each Gaussian component.

Each data point is assigned a probability of belonging to a Gaussian component based on the **Gaussian probability density function (PDF)**:

$$
p(x | \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp \left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)
$$

where:
- $x$ is a data point,
- $d$ is the number of dimensions,
- $\mu$ is the mean vector,
- $\Sigma$ is the covariance matrix.
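
The density can be evaluated directly from this formula. The following sketch uses a helper of our own (`gaussian_pdf`) and a standard two-dimensional Gaussian as a hypothetical example.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density p(x | mu, Sigma)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    # Solve Sigma * y = diff instead of inverting Sigma explicitly.
    exponent = -0.5 * diff @ np.linalg.solve(sigma, diff)
    return norm_const * np.exp(exponent)

# Density of a standard 2D Gaussian at its mean: 1 / (2 * pi).
density_at_mean = gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
print(density_at_mean)
```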

---

#### **Expectation-Maximization (EM) Algorithm**

The **Expectation-Maximization (EM) algorithm** is an iterative optimization method used to find **maximum-likelihood estimates** for model parameters when the data is **incomplete** or contains **hidden variables**. EM helps estimate missing data points and refines model parameters iteratively until convergence.

##### **Step 1: Expectation (E-Step)**  
- Initialize the **model parameters**:  
  - Mean ($\mu_k$)  
  - Covariance matrix ($\Sigma_k$)  
  - Mixing coefficients ($\pi_k$)  
- For each data point, calculate the **posterior probabilities** (i.e., the probability that a point belongs to a specific Gaussian component), represented by the **latent variables $\gamma_k$**.

##### **Step 2: Maximization (M-Step)**  
- Update the **parameters** using the computed probabilities:
  - **Mean ($\mu_k$)**: Update using the **weighted average** of data points.
  - **Covariance Matrix ($\Sigma_k$)**: Update using the **weighted variance**.
  - **Mixing Coefficients ($\pi_k$)**: Update using the **average of latent probabilities**.

##### **Step 3: Iterate Until Convergence**  
- Repeat the **E-step and M-step** iteratively until:
  - The **log-likelihood function stabilizes**.
  - The **parameter updates become minimal**.
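
A minimal sketch of these steps for a one-dimensional mixture of two Gaussians is shown below. The data is synthetic, the initialisation is deliberately crude, and for brevity the loop runs for a fixed number of iterations instead of checking the log-likelihood.

```python
import numpy as np

# Synthetic 1D data drawn from two Gaussians (means 0 and 8).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(8.0, 1.0, 200)])

K = 2
mu = np.array([x.min(), x.max()])  # crude initial means
var = np.full(K, x.var())          # initial variances
pi = np.full(K, 1.0 / K)           # initial mixing coefficients

for _ in range(50):
    # E-step: responsibilities gamma[i, k] of component k for point i.
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    weighted = pi * dens
    gamma = weighted / weighted.sum(axis=1, keepdims=True)

    # M-step: update the parameters with the responsibilities as weights.
    Nk = gamma.sum(axis=0)
    mu = (gamma * x[:, None]).sum(axis=0) / Nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(x)

print(mu, var, pi)
```

Each row of `gamma` sums to one, which is the soft assignment that distinguishes GMM from K-Means.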

---

#### **Reference**
- (Carrasco, 2024)


#### Why K-Means is the Best Choice

##### **1. Computational Efficiency**
- K-Means is significantly **faster** than hierarchical clustering and GMM.  
- It operates in **$O(n \times k \times t)$** time complexity, where:  
  - $n$ = number of data points  
  - $k$ = number of clusters  
  - $t$ = number of iterations  

##### **2. Scalability**
- Performs **efficiently on large datasets**, making it ideal for clustering **music genres** with **hundreds or thousands** of feature vectors.

##### **3. Interpretability & Simplicity**
- The **Elbow Method** helps determine the optimal number of clusters.  
- Cluster assignments are **hard (non-probabilistic)**, so each data point belongs to **exactly one cluster**.

##### **4. Suitable for Music Genre Classification**
- While GMM provides **probabilistic assignments**, K-Means is **better suited for distinct genre separation**.  
- Most **music features** are naturally **separable**, making **hard clustering** an effective and efficient choice.


Although we decided to use KMeans, we first check whether clusters exist at all using a dendrogram (hierarchical clustering). We use hierarchical clustering for this quick check because dendrograms are useful tools to visualise clusters and to determine whether there are actually appropriate clusters to work with (Wilson, n.d.). 

In [None]:
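# Average all numeric features per genre; 'filename' and 'genre' itself are excluded
group_cols = [col for col in labeled_features_df.select_dtypes(include=[np.number]).columns if col != 'genre']
genre_agg = labeled_features_df.groupby('genre')[group_cols].mean().reset_index()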

display(genre_agg)

feature_columns = [col for col in genre_agg.columns if col != 'genre']
X = genre_agg[feature_columns].values

Z = linkage(X, method='ward')

plt.figure(figsize=(12, 8))

dendrogram(
    Z,
    labels=genre_agg['genre'].values,
    leaf_rotation=90,  
    leaf_font_size=10,
    color_threshold=0.7 * max(Z[:, 2]) 
)

plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Sample (Genre)')
plt.ylabel('Distance')
plt.tight_layout()
plt.show()

The dendrogram clearly shows that clusters exist. Based on the structure of the dendrogram, there are most likely three clusters in this dataset. 

1. Cluster: (Classical, Blues, Jazz)
2. Cluster: (Pop, Country)
3. Cluster: (Disco, Hiphop, Reggae, Metal, Rock)

## 3.1 KMeans 

## Mathematical operations of KMeans

Below is an explanation of KMeans using a small fictitious dataset of four data points (A, B, C, D) and two features (Feature 1 and Feature 2). To calculate the distances, the Euclidean distance is used. For simplicity, we assume that the number of clusters is 2. 

| Point | Feature 1 (x₁) | Feature 2 (x₂) |
|-------|----------------|----------------|
| A     | 1.0            | 2.0            |
| B     | 2.0            | 3.0            |
| C     | 6.0            | 7.0            |
| D     | 7.0            | 8.0            |

At first, two random points are selected to act as the centroids of the clusters. In our case, we select Point A for cluster 1 and Point D for cluster 2. 

<br>

Then, we will calculate the distance from each point to the centroids using the Euclidean distance (theory see above). 

Point A: The distance towards the first centroid is 0 since it is the same point. Point A therefore belongs to cluster 1.

Point B: 
The distance from Point B towards the first centroid can be calculated by: $ \sqrt{(2 - 1)^2 + (3 - 2)^2} = 1.41 $

The distance from Point B towards the second centroid can be calculated by: $ \sqrt{(2 - 7)^2 + (3 - 8)^2} = 7.07 $

**It is quite clear, that Point B belongs to Cluster 1.**

Point C:
The distance from Point C towards the first centroid can be calculated by: $ \sqrt{(6 - 1)^2 + (7 - 2)^2} = 7.07 $

The distance from Point C towards the second centroid can be calculated by: $ \sqrt{(6 - 7)^2 + (7 - 8)^2} = 1.41 $

**It is quite clear, that Point C belongs to Cluster 2.**

Point D: The distance towards the second centroid is 0 since it is the same point. Point D therefore belongs to cluster 2.

<br>
  
After that, the centroids of the newly formed clusters are calculated again (mean of all datapoints):

For Cluster 1: Point A + B = $ ((1, 2) + (2, 3))/2 = (1.5, 2.5) $

For Cluster 2: Point C + D = $ ((6, 7) + (7, 8))/2 = (6.5, 7.5) $

Based on the newly calculated centroids, the distance calculations from the first step are repeated for all points (A, B, C, D):

Point A: 
The distance from Point A towards the first centroid can be calculated by: $ \sqrt{(1 - 1.5)^2 + (2 - 2.5)^2} = 0.71 $  

The distance from Point A towards the second centroid can be calculated by: $ \sqrt{(1 - 6.5)^2 + (2 - 7.5)^2} = 7.78 $

**It is quite clear, that Point A belongs to Cluster 1.**

Point B: 
The distance from Point B towards the first centroid can be calculated by: $ \sqrt{(2 - 1.5)^2 + (3 - 2.5)^2} = 0.71 $

The distance from Point B towards the second centroid can be calculated by: $ \sqrt{(2 - 6.5)^2 + (3 - 7.5)^2} = 6.36 $

**It is quite clear, that Point B belongs to Cluster 1.**

Point C:
The distance from Point C towards the first centroid can be calculated by: $ \sqrt{(6 - 1.5)^2 + (7 - 2.5)^2} = 6.36 $

The distance from Point C towards the second centroid can be calculated by: $ \sqrt{(6 - 6.5)^2 + (7 - 7.5)^2} = 0.71 $

**It is quite clear, that Point C belongs to Cluster 2.**

Point D: 
The distance from Point D towards the first centroid can be calculated by: $ \sqrt{(7 - 1.5)^2 + (8 - 2.5)^2} = 7.78 $

The distance from Point D towards the second centroid can be calculated by: $ \sqrt{(7 - 6.5)^2 + (8 - 7.5)^2} = 0.71 $

**It is quite clear, that Point D belongs to Cluster 2.**

<br>

Based on the calculations, KMeans would put Point A and B into the first cluster and Point C and D into the second cluster (Sena, 2024).
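
The hand calculation can be checked with a few lines of NumPy. The sketch runs the two iterations described in this section, starting from Point A and Point D as centroids, and prints the distance table of each iteration (rows are the points A to D, columns the two centroids).

```python
import numpy as np

points = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 7.0], [7.0, 8.0]])  # A, B, C, D
centroids = points[[0, 3]]  # initial centroids: Point A and Point D

for iteration in range(2):
    # Euclidean distance of every point to both centroids.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    clusters = distances.argmin(axis=1)  # 0 = cluster 1, 1 = cluster 2
    print(f"Iteration {iteration + 1}")
    print(np.round(distances, 2))
    # New centroid of each cluster: mean of its points.
    centroids = np.array([points[clusters == j].mean(axis=0) for j in range(2)])

print(clusters)   # [0 0 1 1]
print(centroids)  # [[1.5 2.5] [6.5 7.5]]
```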

### Determine Optimal Number of Clusters for K-Means  
This cell uses the `KMeansClustering` class to identify the optimal number of clusters for the unlabeled dataset:  
- An instance of the `KMeansClustering` class is initialized with the scaled unlabeled feature data (`unlabeled_scaled`) and the original `unlabeled_features_df`.  
- The `finding_k` method is called with a range of cluster numbers (`[2, 11]`).  

The range `[2, 11]` was selected based on domain knowledge:  
- The labeled dataset contains 10 genres, so we do not expect the unlabeled dataset to contain more than 10 meaningful clusters.  
- A minimum of 2 clusters was chosen because having only 1 cluster would indicate no clustering, defeating the purpose of the analysis.  

This process helps determine the ideal number of clusters for the dataset using metrics like the elbow method or silhouette scores.


In [None]:
kmc = f.KMeansClustering(unlabeled_scaled, unlabeled_features_df)

kmc.finding_k([2, 11])

### Determining the Optimal Number of Clusters

An iterative approach was applied to determine the optimal number of clusters for the KMeans algorithm. Cluster values ranging from 2 to 10 were evaluated. For each cluster count:

- A KMeans model was trained using the scaled features of the "unlabeled" dataset.
- The **inertia score** (sum of squared distances of samples to their closest cluster center) was recorded to quantify the model's performance.

The results were visualized using an **Elbow Method plot** to identify the ideal number of clusters, where the inertia score shows a significant decrease before plateauing (GeeksforGeeks, 2024).
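
The procedure can be sketched with scikit-learn directly. This is not the code of our `finding_k` method; the synthetic data (three well-separated blobs) and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_blobs = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

# Fit one KMeans model per cluster count and record its inertia.
inertias = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_blobs)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

For this synthetic data the inertia drops sharply from two to three clusters and only marginally afterwards, which is the "elbow" one looks for in the plot.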



### Optimal Number of Clusters

Based on the **Elbow Plot**, the optimal number of clusters was determined to be **three**. This conclusion is drawn from the point where the inertia score shows a significant decrease and begins to plateau, indicating diminishing returns for higher cluster counts.


In [28]:
clustered_df = kmc.create_kmeans(3)

### Prepare Data for Clustering and Classification  
This cell prepares the data for further analysis and classification:  
- The `cluster` column from the clustered DataFrame (`clustered_df`) is added to the `unlabeled_knn` DataFrame.  
- The unlabeled dataset is split into features (`unlabeled_X`) and labels (`unlabeled_y`), where the `cluster` column serves as the label.  
- The `labeled_knn` DataFrame is assigned to `labeled_X` for use in comparison or model training.  

These steps ensure that both labeled and unlabeled datasets are properly structured for clustering and classification tasks.


In [29]:
unlabeled_knn['cluster'] = clustered_df['cluster']

unlabeled_X, unlabeled_y = unlabeled_knn.drop('cluster', axis=1), unlabeled_knn['cluster']
labeled_X = labeled_knn

### 3.1.1 Determining genres

In [None]:
pcv = f.PostClusteringVisualizations(clustered_df, labeled_features_df)

pcv.scatter_plot('spectral_centroid', 'spectral_bandwidth')

#### Clustering Analysis: Spectral Bandwidth vs. Spectral Centroid  

From the analysis of **spectral bandwidth** and **spectral centroid**, distinct clustering patterns were identified. The mapping of clusters to genres is as follows:  

| **Cluster** | **Genres** |  
| --- | --- |  
| 0 | Pop |  
| 1 | Classical |  
| 2 | Hip-Hop, Metal, Rock |  

This mapping highlights that certain genres, such as **Pop**, form a clearly defined cluster due to unique spectral properties, while others, such as **Hip-Hop**, **Metal**, and **Rock**, share overlapping features, grouping them into a single cluster. These results align with the expectation that genres with similar acoustic characteristics often exhibit overlapping clusters.  


In [None]:
pcv.scatter_plot('spectral_centroid', 'zero_crossing_rate')

#### Clustering Analysis: Zero Crossing Rate vs. Spectral Centroid  

Analyzing the plots of **zero crossing rate** and **spectral centroid** revealed the following genre distributions across clusters:  

| **Cluster** | **Genres** |  
| --- | --- |  
| 0 | Pop |  
| 1 | Classical |  
| 2 | Hip-Hop, Metal |  

#### Key Insights:  
- **Cluster 0**: Strongly associated with the **Pop** genre due to its distinct audio features, making it a well-defined cluster.  
- **Cluster 1**: Clearly aligned with the **Classical** genre, indicating its unique spectral properties.  
- **Cluster 2**: Encompasses genres such as **Hip-Hop** and **Metal**, suggesting overlapping characteristics in terms of zero crossing rate and spectral centroid.  

This clustering analysis establishes a clear mapping for genres with distinct audio profiles, while highlighting the need for additional features or techniques to better separate overlapping genres within Cluster 2.  

In [None]:
pcv.scatter_plot('spectral_centroid', 'contrast_mean_2')

#### Clustering Analysis: Contrast Mean 2 vs. Spectral Centroid

By analyzing the plots of **Contrast Mean 2** and **Spectral Centroid**, the following genre distributions across clusters were identified:

| **Cluster** | **Genres** |
| --- | --- |
| 0 | Pop |
| 1 | Classical | 
| 2 | Metal |

#### Final Genre Assignments
From all previous visualizations, we can conclude the following:
- **Cluster 0**: Pop
- **Cluster 1**: Classical
- **Cluster 2**: Metal

#### Next Steps
To validate these assumptions, a **K-Nearest Neighbors (KNN)** model will be trained to predict the clusters in the labeled dataset. This will help confirm the accuracy of the clustering results and the genre assignments.


In [33]:
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(unlabeled_X, unlabeled_y)

predicted_labels = knn.predict(labeled_X)

In [None]:
labeled_features_df['cluster'] = predicted_labels

clusters_genres = labeled_features_df.groupby(['cluster', 'genre']).size().reset_index().sort_values(by=['cluster', 0], ascending=False)

cluster_0 = clusters_genres[clusters_genres['cluster'] == 0]
cluster_1 = clusters_genres[clusters_genres['cluster'] == 1]
cluster_2 = clusters_genres[clusters_genres['cluster'] == 2]

display(cluster_0.head())
display(cluster_1.head())
display(cluster_2.head())

#### Validation Results: KNN Model

After running the KNN model, we can confirm that the visual analysis and the KNN predictions arrive at the same conclusion regarding genre assignments:

| **Cluster** | **Genres** |
| --- | --- |
| 0 | Pop |
| 1 | Classical | 
| 2 | Metal |

This alignment between visual insights and predictive modeling validates the clustering results, confirming the accuracy of the genre assignments.
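
One way to quantify this kind of agreement is the share of each cluster taken up by its most frequent genre. The sketch below uses a small made-up table, not the project data; the column names mirror the notebook, everything else is an assumption for illustration.

```python
import pandas as pd

# Toy data, invented purely for illustration.
toy = pd.DataFrame({
    'cluster': [0, 0, 0, 1, 1, 2, 2, 2],
    'genre': ['pop', 'pop', 'disco', 'classical', 'classical',
              'metal', 'metal', 'rock'],
})

# Row-normalised crosstab: each row sums to 1 and holds the genre shares
# within that cluster.
shares = pd.crosstab(toy['cluster'], toy['genre'], normalize='index')

dominant_genre = shares.idxmax(axis=1)  # most frequent genre per cluster
purity = shares.max(axis=1)             # share of that genre in the cluster

print(pd.DataFrame({'dominant_genre': dominant_genre, 'purity': purity}))
```

A purity close to 1 means the cluster is dominated by one genre; lower values indicate that several genres share the cluster.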

### 3.1.2 Mapping clusters to genres

In [35]:
c0_g = 'pop'
c1_g = 'classical'
c2_g = 'metal'

cluster_genre_mapping = {0: c0_g, 1: c1_g, 2: c2_g}

kmc.cluster_to_genre(cluster_genre_mapping)

kmc.create_submission()

## 3.2 PCA

In [None]:
pcar = f.PCAReduction(unlabeled_scaled)

pcar.find_n()

#### Principal Component Analysis (PCA) Feature Selection  

From the PCA plot, it is evident that there is a significant decline in explained variance between the 0th and 1st components, as well as between the 1st and 2nd components. However, beyond the 2nd component, the decline in explained variance becomes negligible.  

Based on this observation, we will select **2 PCA features** for further analysis.  
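
The quantity read off such a plot is the explained variance ratio per component. A minimal sketch with scikit-learn on synthetic data follows; it is not the code behind `find_n`, and the data with two dominant directions is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data (assumption): two strong latent directions mixed into
# six observed features, plus a little noise.
latent = rng.normal(size=(200, 2)) * [5.0, 3.0]
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + rng.normal(scale=0.1, size=(200, 6))

pca = PCA().fit(X)
ratios = pca.explained_variance_ratio_  # one value per component, descending
cumulative = np.cumsum(ratios)

for i, (r, c) in enumerate(zip(ratios, cumulative)):
    print(f"component {i}: ratio={r:.3f}, cumulative={c:.3f}")
```

The number of components is then chosen where the ratios flatten out, or where the cumulative sum passes a chosen threshold.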


In [37]:
pca_features = pcar.reduction(2)
pca_features_labeled = pcar.reduce_labeled(labeled_scaled)

In [38]:
pca_features_df = pd.DataFrame(pca_features, columns=['PCA 1', 'PCA 2'])
pca_features_labeled_df = pd.DataFrame(pca_features_labeled, columns=['PCA 1', 'PCA 2'])
pca_features_labeled_df['genre'] = labeled_features_df['genre']

### 3.2.1 Clustering with PCA

In [39]:
kmc_pca = f.KMeansClustering(pca_features, unlabeled_features_df)

clustered_df_pca = kmc_pca.create_kmeans(3)

In [40]:
pca_features_df['cluster'] = clustered_df_pca['cluster']

In [None]:
pcv = f.PostClusteringVisualizations(pca_features_df, pca_features_labeled_df)
pcv.scatter_plot('PCA 1', 'PCA 2')

#### PCA Results  

The PCA reduction has effectively created two features that distinctly separate the three clusters, as observed in the plot. This clear separation indicates that the two principal components are sufficient to capture the underlying structure of the data and distinguish between the clusters.


### 3.2.2 Determining genres

In [None]:
pd.crosstab(clustered_df_pca['cluster'], clustered_df['cluster'])

#### Cluster Consistency  

From the crosstab, we can observe that the clusters are identical, with no differences between them. Therefore, we will apply the same cluster-to-genre mapping as we did for the standard KMeans clustering approach.
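
One caveat when reusing a mapping across runs: KMeans cluster IDs are arbitrary, so two runs can produce the same grouping under different numbers. The sketch below, on made-up labels, shows how a crosstab and the adjusted Rand index can check this. The label lists are hypothetical.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Two hypothetical label vectors describing the same grouping,
# but with the cluster IDs permuted.
labels_a = [0, 0, 1, 1, 2, 2, 2]
labels_b = [2, 2, 0, 0, 1, 1, 1]

table = pd.crosstab(pd.Series(labels_a, name='run_a'),
                    pd.Series(labels_b, name='run_b'))
print(table)

# The adjusted Rand index ignores the numbering: 1.0 means identical groupings.
ari = adjusted_rand_score(labels_a, labels_b)

# For each cluster in run A, the cluster in run B that shares most samples.
id_mapping = table.idxmax(axis=1).to_dict()
print("ARI:", ari, "ID mapping:", id_mapping)
```

If all counts in the crosstab lie on the diagonal, the IDs already match and the same cluster-to-genre mapping can be reused directly; otherwise the mapping has to be translated through `id_mapping` first.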


In [43]:
kmc_pca.cluster_to_genre(cluster_genre_mapping)

kmc_pca.create_submission()

### 3.2.3 Theory PCA

PCA (principal component analysis) is a dimensionality reduction technique. It analyses the dimensions (features) of the dataset and reduces them to a smaller set of components, which improves runtime and makes models less complex. A downside of PCA is that the resulting components are hard to interpret (Jaadi, 2024).

PCA can be divided into 5 steps:
1. Scale the dataset: PCA is sensitive to the variance of each feature, so the features should be standardized first.
2. Calculate the covariance matrix: PCA uses variance to determine which components to keep, so the covariance between all pairs of features is needed.
3. Calculate the eigenvectors and eigenvalues of the covariance matrix: the eigenvectors give the directions of the new components, and each eigenvalue gives the variance explained along its eigenvector.
4. Sort the eigenvalues from high to low and order the eigenvectors accordingly.
5. Keep the chosen number of components and project the data onto them.
- (Jaadi, 2024)
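
The five steps can be sketched in NumPy as follows. This is an illustrative implementation on random data, not the `PCAReduction` class used in this notebook; the function name `pca_from_scratch` is hypothetical.

```python
import numpy as np


def pca_from_scratch(X, n_components):
    """Project X onto its first n_components principal components."""
    # 1. Scale: zero mean and unit variance per feature.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the features (columns).
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition; eigh is suited to symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort the eigenvalues from high to low, reorder the eigenvectors to match.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # 5. Keep the leading components and project the data onto them.
    components = eigenvectors[:, :n_components]
    return X_std @ components, eigenvalues


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] += 2 * X[:, 0]  # introduce correlation between two features

projected, eigenvalues = pca_from_scratch(X, 2)
print(projected.shape)
print(eigenvalues / eigenvalues.sum())  # explained variance ratio
```

The variance of each projected column equals the corresponding eigenvalue, which is why sorting by eigenvalue ranks the components by how much variance they explain.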

When would you use PCA?
1. When the structure in the data is approximately linear: techniques such as t-SNE and UMAP are better suited to non-linear structure.
2. Computation: PCA is computationally efficient.
3. Information preservation: for a given number of components, PCA retains the maximum possible variance of any linear projection, so information is preserved efficiently.
- (Ibm, 2024)

## 3.3 NMF

In [None]:
unlabeled_scaled

In [45]:
nmfr = f.NMFReduction(unlabeled_scaled)
df_nmf_unlabeled = nmfr.reduction()
df_nmf_labeled = nmfr.reduce_labeled(labeled_scaled)

In [None]:
df_nmf_labeled['genre'] = labeled_features_df['genre']

display(df_nmf_unlabeled.head())
display(df_nmf_labeled.head())

### 3.3.1 Clustering with NMF

In [47]:
kmc_nmf = f.KMeansClustering(df_nmf_unlabeled, unlabeled_features_df)

clustered_df_nmf = kmc_nmf.create_kmeans(3)

In [48]:
df_nmf_unlabeled['cluster'] = clustered_df_nmf['cluster']

In [None]:
pcv = f.PostClusteringVisualizations(df_nmf_unlabeled, df_nmf_labeled)
pcv.scatter_plot(0, 1)

### 3.3.2 Determining genres

In [None]:
pd.crosstab(df_nmf_unlabeled['cluster'], clustered_df['cluster'])

#### Cluster Consistency  

From the crosstab, we can observe that the clusters are identical, with no differences between them. Therefore, we will apply the same cluster-to-genre mapping as we did for the standard KMeans clustering approach.


In [51]:
# Use the NMF-based clustering object here (not the PCA one from section 3.2)
kmc_nmf.cluster_to_genre(cluster_genre_mapping)

kmc_nmf.create_submission()

### 3.3.3 Theory NMF

NMF (non-negative matrix factorization) factorizes a non-negative matrix into two smaller non-negative matrices whose product approximates the original. It is an unsupervised learning algorithm that reduces dimensionality. It is used for recommendation systems, text mining, and image analysis applications (Eunus, 2025).
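
A minimal sketch of the factorization with scikit-learn follows. It runs on random non-negative data and is not the `NMFReduction` class from this notebook; the number of components and the initialisation are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# NMF requires non-negative input, for example features scaled to [0, 1].
X = rng.random((50, 8))

model = NMF(n_components=2, init='nndsvda', max_iter=1000, random_state=0)
W = model.fit_transform(X)  # (50, 2): reduced representation of each sample
H = model.components_       # (2, 8): contribution of each feature per component

# W @ H approximates X; the reconstruction error measures how well.
print(W.shape, H.shape)
print("reconstruction error:", model.reconstruction_err_)
```

Because both `W` and `H` are non-negative, each sample is described as an additive combination of components, which tends to be easier to interpret than PCA components with mixed signs.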

# 4. Conclusion


### 1. Important Features for Clustering
After conducting feature engineering and unsupervised learning, we observed that the following sound features were crucial for effective clustering of music genres:

- **Spectral Centroid**
- **Spectral Bandwidth**
- **Zero-Crossing Rate (ZCR)**
- **Mel-Frequency Cepstral Coefficients (MFCCs)**
- **Chroma Features**
- **Tempo (BPM)**
- **Spectral Contrast**
- **Tonnetz (Tonal Centroid Features)**

These features collectively improved the accuracy of clustering and genre classification.

### 2. Effect and Usefulness of Dimensionality Reduction
We applied two dimensionality reduction techniques:

- **Principal Component Analysis (PCA)**: Reduced the feature space while retaining most of the variance in the data. The results showed that PCA helped in visualizing genre clusters effectively, while producing the same cluster assignments as KMeans on the full feature set.
- **Non-negative Matrix Factorization (NMF)**: Provided a parts-based representation of data, which was particularly useful for uncovering underlying patterns in frequency distributions.

### 3. Additional Data for Better Recommendations
To enhance the clustering and recommendation system, we identified additional data that could improve results:

- **Lyrics Analysis**: Lyrics-based features could enhance genre classification by providing contextual meaning to songs.
- **Instrumentation Tags**: Identifying dominant instruments in each song (e.g., guitar for rock, synthesizers for electronic) could refine clustering.
- **User Listening History**: Incorporating listener preferences and interaction data would make recommendations more personalized.

# 5. Sources

- Batra, R. (2022, February 12). The math behind the K-Means and hierarchical clustering algorithm! Medium. https://medium.com/@rohit_batra/the-math-behind-the-k-means-and-hierarchical-clustering-algorithm-1d9a36a56c08

- Carrasco, O. C. (2024, March 1). Gaussian mixture models explained - towards data science. Medium. https://towardsdatascience.com/gaussian-mixture-models-explained-6986aaf5a95 

- Dharmaraj. (2022, January 29). The math behind K-Means Clustering - Dharmaraj - Medium. Medium. https://medium.com/@draj0718/the-math-behind-k-means-clustering-4aa85532085e 

- Harpale, V. K., & Bairagi, V. K. (2021). Seizure detection methods and analysis. In Elsevier eBooks (pp. 51–100). https://doi.org/10.1016/b978-0-32-391120-7.00008-6 

- Ibm. (2024, December 19). PCA. Think. https://www.ibm.com/think/topics/principal-component-analysis 

- Jaadi, Z. (2024, February 23). Principal Component Analysis (PCA): A Step-by-Step Explanation. Built In. https://builtin.com/data-science/step-step-explanation-principal-component-analysis 

- Librosa. (n.d.-a). Spectral Centroid. Librosa. Retrieved January 3rd, 2025, from https://librosa.org/doc/main/generated/librosa.feature.spectral_centroid.html

- Librosa. (n.d.-b). Spectral Bandwidth. Librosa. Retrieved January 3rd, 2025, from https://librosa.org/doc/main/generated/librosa.feature.spectral_bandwidth.html

- Librosa. (n.d.-c). Zero-crossing rate. Librosa. https://librosa.org/doc/main/generated/librosa.feature.zero_crossing_rate.html

- Librosa (n.d.-k). Spectral Rolloff. Librosa. Retrieved January 6th, 2025 from https://librosa.org/doc/main/generated/librosa.feature.spectral_rolloff.html

- Selecting the number of clusters with silhouette analysis on KMeans clustering. (n.d.). Scikit-learn. https://scikit-learn.org/1.5/auto_examples/cluster/plot_kmeans_silhouette_analysis.html

- So, N. L., Edwards, J. A., & Woolley, S. M. (2019). Auditory selectivity for spectral contrast in cortical neurons and behavior. Journal of Neuroscience, 40(5), 1015–1027. https://doi.org/10.1523/jneurosci.1200-19.2019

- Wikipedia. (2025, January 5). Tempo. Wikipedia. Retrieved January 5th, 2025, from https://en.wikipedia.org/wiki/Tempo

- Wikipedia contributors. (2024, October 28). Spectral centroid. Wikipedia. https://en.wikipedia.org/wiki/Spectral_centroid

- ZalaRushirajsinh. (2023, November 4). The elbow method: finding the optimal number of clusters. Medium. https://medium.com/@zalarushirajsinh07/the-elbow-method-finding-the-optimal-number-of-clusters-d297f5aeb189

- Sena, M. (2024, May 22). Mastering K-means clustering: Implement the K-Means algorithm from scratch with this step-by-step Python tutorial. Towards Data Science. Retrieved January, 25th, 2025, from https://towardsdatascience.com/mastering-k-means-clustering-065bc42637e4

- GeeksforGeeks. (2024, November 2). Elbow Method for optimal value of k in KMeans. GeeksforGeeks. Retrieved January 10th, 2025, from https://www.geeksforgeeks.org/elbow-method-for-optimal-value-of-k-in-kmeans/

- Wilson, B. (n.d.). Visualization with hierarchical clustering and t-SNE [Video]. DataCamp. Retrieved, January 20th, 2025, from https://campus.datacamp.com/courses/unsupervised-learning-in-python/visualization-with-hierarchical-clustering-and-t-sne?ex=1

- Sable, A. (2021). Introduction to audio analysis and synthesis. Paperspace Blog. Retrieved January 6th, 2025, from https://blog.paperspace.com/introduction-to-audio-analysis-and-synthesis/

- Sable, A. (2022). An Introduction to Audio Analysis and Processing: Music Analysis. Paperspace Blog. Retrieved January 6th, 2025, from https://blog.paperspace.com/audio-analysis-processing-maching-learning/

- Wikipedia. (2024, November 14-a). Spectral Flatness. Wikipedia. Retrieved, January 5th, 2025 from https://en.wikipedia.org/wiki/Spectral_flatness

- OpenAE (n.d.-a). Zero-crossing rate. OpenAE. Retrieved January 6th, 2025 from https://openae.io/features/latest/zero-crossing-rate/

- OpenAE (n.d.-b). RMS. OpenAE. Retrieved January 6th, 2025 from https://openae.io/features/latest/rms/

- OpenAE (n.d.-c). Spectral Rolloff. OpenAE. Retrieved January 6th, 2025 from https://openae.io/features/latest/spectral-rolloff/

- Bäckström, T., Räsänen, O., Zewoudie, A., Pérez Zarazaga, P., Koivusalo, L., Das, S., Gómez Mellado, E., Bouafif Mansali, M., Ramos, D., Kadiri, S., Alku, P., & Vali, M. H. (2022). *Introduction to Speech Processing* (2nd ed.). Retrieved January 7th, 2025, from https://speechprocessingbook.aalto.fi (doi: https://doi.org/10.5281/zenodo.6821775)

- Miraglia, D. (2024, January 18-a). What is RMS in audio? The absolute BEST beginner’s guide. Unison Audio. Retrieved January 7th, 2025, from https://unison.audio/what-is-rms-in-audio/

- Wikipedia. (2024, November 2-b). Tonnetz. Wikipedia. Retrieved, January 5th, 2025 from https://en.wikipedia.org/wiki/Tonnetz

- Ellis, D. P. W. (2007, July 16). Beat tracking by dynamic programming. LabROSA, Columbia University. Retrieved, January 14th, 2025 from https://www.ee.columbia.edu/~dpwe/pubs/Ellis07-beattrack.pdf

- Wikipedia. (2024, December 15-d). RMS. Retrieved January 10th, 2025, from https://en.wikipedia.org/wiki/Root_mean_square

- OmniCalculator. (2024, July 28). OmniCalculator. Retrieved January 7th, 2025, from https://www.omnicalculator.com/other/bpm

- Englmeier, D., Hubig, N., Goebl, S., & Bohm, C. (2015). Musical similarity analysis based on chroma features and text retrieval methods. University of Munich; Helmholtz Center Munich.  Retrieved January, 28th from https://www.medien.ifi.lmu.de/pubdb/publications/pub/englmeier2015btw/englmeier2015btw.pdf

- Yuto. (2024, April 21). Musical similarity analysis based on chroma features and text retrieval methods. Zenn. Retrieved January 14th, 2025, from https://zenn.dev/yuto_mo/articles/7413ca2ed4eb5f

- Wikipedia. (2025, January 8-b). Arithmetic mean. Wikipedia. Retrieved, January 28th, 2025 from https://en.wikipedia.org/wiki/Arithmetic_mean

- Wikipedia. (2024, November 10). Mel-frequency cepstrum. Wikipedia. Retrieved, January 28th, 2025 from https://en.wikipedia.org/wiki/Mel-frequency_cepstrum

- Spectral features. (n.d.). Music Information Retrieval. Retrieved, January 28th, 2025 from https://musicinformationretrieval.com/spectral_features.html

- Jakeli, N. (2023, April 25). Clustering audio features. The MCT Blog. Retrieved, January 28th, 2025 from https://mct-master.github.io/blog/