# Clustering Car Parking Spaces for Analysis: Unsupervised Learning


## Fetch data

In [30]:
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.cluster import KMeans
import hvplot.pandas  # registers the .hvplot accessor on pandas DataFrames

# Read in data
file_path = Path('../Resources/carspaces_data.csv')
df_res = pd.read_csv(file_path, parse_dates=True, index_col='Timestamp')
df_res = df_res.drop('Unnamed: 0',axis=1)
df_res.head()

Timestamp,Available
2022-03-11 00:00:46,148
2022-03-11 00:10:47,148
2022-03-11 00:20:49,148
2022-03-11 00:30:51,148
2022-03-11 00:40:52,148


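Find the best number of clusters using the elbow curve: plot the inertia for each value of `k` and look for the point where the curve stops dropping sharply.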

In [31]:
inertia = []
k = list(range(1, 11))
 
# Calculate the inertia for the range of k values
for i in k:
    km = KMeans(n_clusters=i, random_state=0)
    km.fit(df_res)
    inertia.append(km.inertia_)

# Create the Elbow Curve using hvPlot
elbow_data = {"k": k, "inertia": inertia}
df_elbow = pd.DataFrame(elbow_data)
df_elbow.hvplot.line(x="k", y="inertia", xticks=k, title="Elbow Curve")
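
The elbow can also be read numerically by looking at how much the inertia falls at each step. The sketch below is a minimal illustration of that idea; `relative_drops` is a hypothetical helper and the inertia values are made up for the example, not results from this dataset.

```python
def relative_drops(inertia):
    """Return the fractional decrease in inertia for each step k -> k + 1."""
    drops = []
    for prev, curr in zip(inertia, inertia[1:]):
        # Guard against division by zero if an earlier fit was already perfect
        drops.append((prev - curr) / prev if prev else 0.0)
    return drops


# Hypothetical inertia values for k = 1..5 (illustrative only)
example_inertia = [1000.0, 400.0, 200.0, 180.0, 170.0]
drops = relative_drops(example_inertia)

for k, drop in zip(range(2, len(example_inertia) + 1), drops):
    print(f"k={k}: inertia fell by {drop:.0%}")
```

A large drop followed by much smaller ones suggests the elbow sits at the last `k` with a large drop. In the notebook, the same helper could be called on the `inertia` list built above.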

Create a function called `get_clusters(k, data)` that finds `k` clusters using K-Means on `data`. The function should return a copy of the `data` DataFrame with a new column, `class`, containing the cluster assigned to each row.

In [25]:
def get_clusters(k, data):
    # Work on a copy so the caller's DataFrame is left unchanged
    data = data.copy()

    # Initialize the K-Means model
    model = KMeans(n_clusters=k, random_state=0)

    # Fit the model and predict the cluster for each row
    predictions = model.fit_predict(data)

    # Add the predicted clusters as a new column
    data["class"] = predictions

    return data
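
To see what K-Means does with a single feature such as `Available`, here is a toy one-dimensional version of the assign-and-update loop. It is only a sketch for intuition: `kmeans_1d` and the sample counts are hypothetical, the initialization is a simple deterministic spread, and scikit-learn's `KMeans` uses a more careful initialization and implementation.

```python
def kmeans_1d(values, k, n_iter=100):
    """Toy Lloyd's algorithm for one-dimensional data."""
    ordered = sorted(values)
    # Deterministic start: pick k centers spread across the sorted values
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)

    for _ in range(n_iter):
        # Assignment step: each value joins its nearest center
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]

        # Update step: each center moves to the mean of its members
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])

        # Stop once the centers no longer move
        if new_centers == centers:
            break
        centers = new_centers

    return labels, centers


# Hypothetical counts of available spaces (illustrative only)
available = [148, 148, 147, 30, 28, 25, 146, 27]
labels, centers = kmeans_1d(available, k=2)
print(labels)
print(centers)
```

With these sample values the loop separates the nearly empty readings from the nearly full ones, which is the kind of split the `class` column captures.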

**Analyzing Clusters with the First Best Value of `k`**

In [32]:
# Looking for clusters with the first best value of k
two_clusters = get_clusters(2, df_res)

# Plotting the 2D scatter with x="Timestamp" and y="Available", colored by cluster
two_clusters.hvplot.scatter(x="Timestamp", y="Available", by="class")

**Analyzing Clusters with the Second Best Value of `k`**

In [33]:
# Looking for clusters with the second best value of k
three_clusters = get_clusters(3, df_res)

# Plotting the 2D scatter with x="Timestamp" and y="Available", colored by cluster
three_clusters.hvplot.scatter(y="Available", x="Timestamp", by="class")
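
Alongside the scatter plots, a per-cluster summary can help describe what each cluster represents. The sketch below assumes a DataFrame shaped like the output of `get_clusters` (an `Available` column, a `class` column, and a `Timestamp` index); the values are hypothetical, not taken from the dataset.

```python
import pandas as pd

# Hypothetical clustered data in the same shape as the get_clusters output
clustered = pd.DataFrame(
    {"Available": [148, 148, 147, 30, 28, 25], "class": [1, 1, 1, 0, 0, 0]},
    index=pd.date_range("2022-03-11 00:00", periods=6, freq="10min"),
)
clustered.index.name = "Timestamp"

# Count, minimum, mean and maximum of available spaces within each cluster
summary = clustered.groupby("class")["Available"].agg(["count", "min", "mean", "max"])
print(summary)
```

In the notebook, the same `groupby` could be applied to `two_clusters` or `three_clusters` to compare the range of `Available` covered by each cluster.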