---
title: "Unsupervised Learning"
format:
    html: 
        code-fold: false
---

<!-- After digesting the instructions, you can delete this cell, these are assignment instructions and do not need to be included in your final submission.  -->

{{< include instructions.qmd >}} 

# Code 

Provide the source code used for this section of the project here.

If you're using a package for code organization, you can import it at this point. However, make sure that the **actual workflow steps**—including data processing, analysis, and other key tasks—are conducted and clearly demonstrated on this page. The goal is to show the technical flow of your project, highlighting how the code is executed to achieve your results.

If relevant, link to additional documentation or external references that explain any complex components. This section should give readers a clear view of how the project is implemented from a technical perspective.

Remember, this page is a technical narrative, NOT just a notebook with a collection of code cells; include in-line prose to describe what is going on.

```python
# import required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering, SpectralClustering
from sklearn.metrics import silhouette_score, calinski_harabasz_score
from sklearn.neighbors import NearestNeighbors
```

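```python
# suppress library warnings so the rendered page is not cluttered
# (note: this hides ALL warnings, including potentially useful ones)
import warnings
warnings.filterwarnings("ignore")
```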

```python
# load the data
df = pd.read_csv("../../data/processed-data/race_track_features.csv")
df.head()
```

Output of `df.head()`:

| | Year | Grand Prix | Track Length (m) | Max Speed (km/h) | Full Throttle (%) | Number of Corners | Number of Straights | Unnamed: 7 |
|---:|---:|:---|---:|---:|---:|---:|---:|---:|
| 0 | 2020 | Pre-Season Test 1 | -1.000607 | -0.11567 | 1.059667 | -0.789651 | -0.938394 | NaN |
| 1 | 2020 | Pre-Season Test 2 | -1.000607 | -0.11567 | 1.059667 | -0.789651 | -0.938394 | NaN |
| 2 | 2020 | Austrian Grand Prix | -1.000607 | -0.11567 | 1.059667 | -0.789651 | -0.938394 | NaN |
| 3 | 2020 | Styrian Grand Prix | -1.024865 | -1.84098 | -1.757479 | -0.275003 | -0.037811 | NaN |
| 4 | 2020 | Hungarian Grand Prix | -0.957039 | -0.490737 | -0.407433 | -1.3043 | -0.037811 | NaN |
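
In the preview, the last column, `Unnamed: 7`, is empty in every row shown. That pattern usually comes from a CSV whose lines end with a trailing comma: pandas reads the empty header field as an extra column and names it by its position. This is an inference from the preview, not something the source confirms. The sketch below assumes that layout and shows one way to drop such a column after loading. It uses an in-memory copy of the first two preview rows instead of the real file, and the variable names are illustrative.

```python
import io

import pandas as pd

# Hypothetical stand-in for race_track_features.csv, built from the first two
# preview rows. Every line ends with a trailing comma, which pandas reads as an
# extra, empty column named "Unnamed: 7" (named after its 0-based position).
raw_csv = io.StringIO(
    "Year,Grand Prix,Track Length (m),Max Speed (km/h),Full Throttle (%),"
    "Number of Corners,Number of Straights,\n"
    "2020,Pre-Season Test 1,-1.000607,-0.11567,1.059667,-0.789651,-0.938394,\n"
    "2020,Pre-Season Test 2,-1.000607,-0.11567,1.059667,-0.789651,-0.938394,\n"
)

df_raw = pd.read_csv(raw_csv)
print(df_raw.columns.tolist()[-1])  # the spurious trailing column

# Drop columns that pandas auto-named "Unnamed: N" AND that hold no values at all,
# so a genuinely unnamed but populated column would be kept.
empty_unnamed = [
    col for col in df_raw.columns
    if str(col).startswith("Unnamed") and df_raw[col].isna().all()
]
df_clean = df_raw.drop(columns=empty_unnamed)
print(df_clean.columns.tolist())
```

The clustering step only selects the five feature columns, so the empty column does not affect the results. Dropping it mainly keeps later tables and plots tidy.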


## Clustering with K-Means
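
K-Means assigns each row to one of $k$ centroids. It alternates two steps until the assignments stop changing:

- an assignment step that attaches every point to its nearest centroid;
- an update step that moves each centroid to the mean of its points.

The minimal NumPy sketch below implements that loop (Lloyd's algorithm) on six hand-made 2-D points. `lloyd_kmeans` is a hypothetical helper for illustration only. The workflow itself relies on scikit-learn's `KMeans`, which additionally uses k-means++ initialization and several restarts.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical plain Lloyd's K-Means: returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    # initialise centroids with k distinct rows chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # assignment step: squared Euclidean distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    # inertia: squared distance of each point to its own centroid, summed
    inertia = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, inertia


# two obvious groups of three points each (hypothetical toy data)
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids, inertia = lloyd_kmeans(X_toy, k=2)
print(labels, round(float(inertia), 3))
```

On this toy data the loop separates the first three points from the last three. Because each run depends on its random starting centroids, library implementations repeat the fit from several starts and keep the lowest-inertia result.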

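To choose the number of clusters $k$, K-Means is fit for every $k$ from 2 to 9 on the five track features, and two metrics are recorded for each fit:

- the inertia (within-cluster sum of squares, WCSS), which generally decreases as $k$ grows and is usually read for an "elbow";
- the silhouette score, where values closer to 1 indicate tighter, better-separated clusters.

```python
# hyperparameter tuning: evaluate K-Means for k = 2, ..., 9
req_cols = ["Track Length (m)", "Max Speed (km/h)", "Full Throttle (%)", "Number of Corners", "Number of Straights"]
X = df[req_cols]

# Initialize lists to store evaluation metrics
em = []  # inertia (within-cluster sum of squares, WCSS) for each k
ss = []  # silhouette score for each k
for k in range(2, 10):
    # fixing n_init and random_state (an arbitrary seed) makes the runs
    # reproducible across executions and scikit-learn versions
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)

    # inertia = within-cluster sum of squares (WCSS)
    em.append(kmeans.inertia_)
    # silhouette score: mean of (b - a) / max(a, b) over all rows
    ss.append(silhouette_score(X, kmeans.labels_))
```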
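
The two metrics recorded in the loop have standard definitions. For clusters $C_1, \dots, C_k$ with centroids $\mu_j$, the inertia is

$$
\text{WCSS} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2 ,
$$

and the silhouette of point $i$ is

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} ,
$$

where:

- $a(i)$ is the mean distance from $i$ to the other points of its own cluster;
- $b(i)$ is the smallest mean distance from $i$ to the points of any other cluster.

`silhouette_score` reports the mean of $s(i)$. The sketch below recomputes both metrics by hand, checks them against scikit-learn, and picks the $k$ with the highest silhouette. It rests on three assumptions:

- it runs on synthetic blobs rather than the race-track data;
- the helpers `wcss` and `mean_silhouette` are hypothetical;
- choosing $k$ by maximum silhouette is one common rule, not necessarily the one this project adopts.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_score

# Hypothetical stand-in for df[req_cols]: four well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=200,
    centers=[[0, 0], [0, 8], [8, 0], [8, 8]],
    cluster_std=0.5,
    random_state=0,
)


def wcss(X, labels, centers):
    """Within-cluster sum of squares: squared distance of each point to its own centroid."""
    return float(((X - centers[labels]) ** 2).sum())


def mean_silhouette(X, labels):
    """Average of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    D = pairwise_distances(X)  # Euclidean, same default metric as silhouette_score
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # a(i) excludes the point itself
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


ks = list(range(2, 10))
inertias, sil_scores = [], []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)
    sil_scores.append(silhouette_score(X, km.labels_))
    # the hand-written formulas agree with scikit-learn's values
    assert np.isclose(wcss(X, km.labels_, km.cluster_centers_), km.inertia_)
    assert np.isclose(mean_silhouette(X, km.labels_), sil_scores[-1])

best_k = ks[int(np.argmax(sil_scores))]
print("silhouette-optimal k:", best_k)
```

Inertia alone cannot pick $k$, because it keeps shrinking as clusters are added. The silhouette penalizes both merged clusters (large $a(i)$) and needlessly split ones (small $b(i)$), which is why the two are usually read side by side.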