An easy method for choosing the best value for K is the elbow curve. Elbow curves get their name from their shape: the curve bends sharply at a specific value, which looks a bit like an elbow!

To create an elbow curve, we'll plot the number of clusters (K) on the x-axis and the values of a selected objective function on the y-axis.

Inertia is one of the most common objective functions to use when creating an elbow curve. While the math behind it can get complicated, inertia basically measures how tightly the data points are grouped within their clusters: it is the sum of squared distances from each point to the center of its assigned cluster. Lower inertia means more compact clusters.
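The sketch below recomputes inertia by hand, as the sum of squared distances from each sample to its assigned cluster center, and checks the result against scikit-learn's `inertia_` attribute. It assumes a small synthetic dataset from `make_blobs` as a stand-in (not the iris file used in this lesson), and `manual_inertia` is just an illustrative name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data: 150 points around 3 centers (not the lesson's dataset)
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X)

# Look up the centroid assigned to each sample
assigned_centers = km.cluster_centers_[km.labels_]

# Inertia: sum of squared distances from each sample to its assigned centroid
manual_inertia = np.sum((X - assigned_centers) ** 2)

# The two values should agree up to floating-point error
print(manual_inertia, km.inertia_)
```

Because inertia only measures distance to the nearest center, it generally keeps shrinking as K grows, which is why we look for the bend in the curve rather than the minimum.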

In [7]:
# Initial imports
import pandas as pd
from sklearn.cluster import KMeans
import plotly.express as px
import hvplot.pandas

In [8]:
file_path = "/Users/itr/Desktop/Class Folder/Cryptocurrencies/Resources/new_iris_data.csv"
df_iris = pd.read_csv(file_path)
df_iris.head()

,sepal_length,petal_length,sepal_width,petal_width
0,5.1,1.4,3.5,0.2
1,4.9,1.4,3.0,0.2
2,4.7,1.3,3.2,0.2
3,4.6,1.5,3.1,0.2
4,5.0,1.4,3.6,0.2


## Store Values of K to Plot

Create an empty list to hold the inertia values, and store the range of K values we want to test.

In [9]:
inertia = []
k = list(range(1, 11))

## Loop through K Values and Find Inertia

Loop through each K value, fit a K-means model, and store its inertia in the list.

In [18]:
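# Looking for the best K: fit one model per candidate K and record its inertia
for i in k:
    # n_init=10 is set explicitly because the default differs across scikit-learn versions
    km = KMeans(n_clusters=i, n_init=10, random_state=0)
    km.fit(df_iris)
    # inertia_ is the sum of squared distances from each sample to its closest centroid
    inertia.append(km.inertia_)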

## Create a DataFrame and Plot the Elbow Curve

Create a DataFrame that stores the K values and their corresponding inertia values. This lets us plot the results with `hvplot`.

In [19]:
# Define the dataframe to plot the elbow curve using hvPlot
elbow_data = {"k": k, "inertia": inertia}
df_elbow = pd.DataFrame(elbow_data)
df_elbow.hvplot.line(x="k", y="inertia", title="Elbow Curve", xticks=k)
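If the CSV file is not at hand, the same steps can be tried end to end with the iris measurements bundled with scikit-learn. This is a minimal sketch under that assumption: `load_iris` stands in for `new_iris_data.csv`, and the names `k_values`, `inertia_values`, `df_check`, and the `drop` column are illustrative rather than part of the lesson. Instead of plotting, it tabulates how much inertia falls with each extra cluster, which is another way to read the elbow.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Stand-in for new_iris_data.csv: the four iris measurements shipped with scikit-learn
X = load_iris().data

k_values = list(range(1, 11))
inertia_values = []
for n in k_values:
    km = KMeans(n_clusters=n, n_init=10, random_state=0)
    km.fit(X)
    inertia_values.append(km.inertia_)

df_check = pd.DataFrame({"k": k_values, "inertia": inertia_values})

# Drop in inertia gained by each additional cluster (NaN for the first row)
df_check["drop"] = -df_check["inertia"].diff()
print(df_check)
```

The elbow sits roughly where the `drop` column stops shrinking quickly and levels off; past that point, extra clusters buy little additional compactness.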