# Modeling and Evaluation

Train and evaluate a series of KMeans models, then choose the value of **k** that gives the best-performing model.

## Steps:

1. **Load the Clean, Combined Dataset**
   - Load the preprocessed and combined dataset containing the audio features.

2. **Select Audio Features Based on Description**
   - Choose the relevant audio features from the dataset for clustering.

3. **Scale the Dataset**
   - Standardize the features (e.g., with StandardScaler) before training, so that features with larger numeric ranges do not dominate the distance calculations KMeans relies on.

4. **Train a Range of Models with Different k Values**
   - Train multiple KMeans models using different values for **k** (e.g., k=2, 3, 4, ..., 10).

5. **Evaluate and Select the Top 2 Values for k**
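   - Use the **Elbow Method**: plot inertia (the within-cluster sum of squared distances) against **k** and look for the point where increasing **k** further yields only small improvements.
   - Use the **Silhouette Score** to measure how cohesive and well separated the clusters are; scores range from -1 to 1, and higher is better.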
   
6. **Try a Live Test with the Selected Models**
   - Test the two top-performing models (based on the Elbow Method and Silhouette Score) in a live setting.
   - Select the best performing value of **k** based on the test results.
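Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual pipeline: the data is a synthetic stand-in generated with `make_blobs` (an assumption, since the real audio-feature table is not shown here), and the names `results` and `top_two` are hypothetical. For simplicity the sketch ranks candidates by silhouette score only; the elbow inspection described above would still be done visually on the recorded inertia values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the audio-feature table (assumption: the real
# notebook would load its own cleaned dataset here instead).
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=42)

# Step 3: standardize each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Step 4: train one KMeans model per candidate k and record its metrics.
results = []
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    results.append(
        {
            "k": k,
            "inertia": model.inertia_,  # plotted against k for the elbow method
            "silhouette": silhouette_score(X_scaled, labels),
        }
    )

# Step 5 (simplified): keep the two k values with the highest silhouette score.
top_two = sorted(results, key=lambda r: r["silhouette"], reverse=True)[:2]

for row in results:
    print(f"k={row['k']:2d}  inertia={row['inertia']:.1f}  silhouette={row['silhouette']:.3f}")
print("Candidate k values:", [row["k"] for row in top_two])
```

Fixing `random_state` keeps the comparison across values of **k** repeatable, and setting `n_init` explicitly avoids depending on the library default, which has changed between scikit-learn releases.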

```python
import os
import pickle

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
```
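The `os` and `pickle` imports suggest that fitted objects may be saved to disk and reloaded, for example for the live test in step 6. That is an assumption, not something the steps above state. Under that assumption, a minimal sketch of saving a fitted scaler and model together and reusing them on new samples might look like this; the random data, the file name `kmeans_k3.pkl`, and the `bundle` dictionary layout are all hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix (assumption: stands in for the scaled audio features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# Fit the scaler and the model on the training data.
scaler = StandardScaler().fit(X)
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaler.transform(X))

# Save both objects together so new samples are scaled exactly as in training.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "kmeans_k3.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump({"scaler": scaler, "model": model}, f)
    with open(path, "rb") as f:
        bundle = pickle.load(f)

# Assign clusters to unseen samples with the original and the reloaded objects.
new_tracks = rng.normal(size=(5, 4))
original_labels = model.predict(scaler.transform(new_tracks))
reloaded_labels = bundle["model"].predict(bundle["scaler"].transform(new_tracks))

print("Labels match after reload:", np.array_equal(original_labels, reloaded_labels))
```

Storing the scaler alongside the model matters because KMeans assigns clusters by distance to the centroids, so new samples must go through the same transformation that was fitted on the training data.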