### **K-Means Algorithm Implementation**

This notebook demonstrates the implementation of the K-Means clustering algorithm using the Iris dataset. We will explore the steps involved, visualize the clusters, and use the Elbow Method to determine the optimal number of clusters.

#### **1. Import Libraries**

In [None]:
from sklearn.cluster import KMeans
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris

#### **2. Load the Iris Dataset**

In [None]:
iris = load_iris()

#### **3. Create a DataFrame**
We load the Iris dataset into a pandas DataFrame for easier manipulation and visualization.

In [None]:
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.head()

#### **4. Add Target Labels (Flower Types)**
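The target column is added to the DataFrame so the actual species labels can be inspected alongside the features. The labels also remain available as `iris.target` for comparing against the clusters later.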

In [None]:
df['flower'] = iris.target
df.head()

#### **5. Drop Unnecessary Columns**
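We drop the `sepal length (cm)` and `sepal width (cm)` columns, along with the target label, so that clustering uses only `petal length (cm)` and `petal width (cm)`. In the Iris data the petal measurements separate the species more clearly than the sepal measurements, and the label has to go because K-Means is unsupervised and should not see it as a feature.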

In [None]:
df.drop(['sepal length (cm)', 'sepal width (cm)', 'flower'], axis='columns', inplace=True)
df.head(3)

#### **6. Apply K-Means Clustering**
Using the K-Means algorithm, we group the data into 3 clusters and predict cluster assignments.

In [None]:
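# n_init and random_state are set explicitly so the cluster numbering is repeatable across runs
km = KMeans(n_clusters=3, n_init=10, random_state=42)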
yp = km.fit_predict(df)
yp
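To make the grouping step less of a black box, here is a minimal sketch of the basic K-Means iteration (often called Lloyd's algorithm) written with NumPy. It is not how scikit-learn implements `KMeans`, which by default uses k-means++ initialisation and an optimised solver. The names `lloyd_kmeans`, `nearest_centroid`, `X_petal`, and the `seed` and `n_iter` parameters are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris


def nearest_centroid(X, centroids):
    """Return, for each row of X, the index of the closest centroid."""
    # Squared Euclidean distances, shape (n_samples, n_centroids)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration: assign points, recompute centroids, repeat."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise with k randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(X, centroids)
        # New centroid = mean of assigned points; keep the old one if a cluster is empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return nearest_centroid(X, centroids), centroids


X_petal = load_iris().data[:, 2:4]  # petal length and petal width
labels_sketch, centroids_sketch = lloyd_kmeans(X_petal, k=3)
print(centroids_sketch)
```

Because the starting centroids are random, a single run like this can settle in a poorer local optimum than scikit-learn's result, which is one reason `KMeans` supports several restarts through `n_init`.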

#### **7. Verify Cluster Assignments**
Rows 50 to 99 of the Iris dataset are all *Iris versicolor*, so most of these samples should share a single cluster id.

In [None]:
yp[50:100]

#### **8. Add Clusters to the DataFrame**
Add a `cluster` column to the DataFrame to analyze cluster assignments.

In [None]:
df['cluster'] = yp
df.head()

#### **9. Check Unique Clusters**
Verify the unique clusters created by the algorithm.

In [None]:
df.cluster.unique()
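Since the label column is dropped before clustering, one way to compare clusters against the actual species is to cross-tabulate them from `iris.target` directly. The following is a sketch under that assumption. The variable names (`iris_data`, `petal_features`, `cluster_ids`, `comparison`) and the fixed `random_state` are illustrative, and the cluster ids it produces need not match the numbering elsewhere in the notebook.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris_data = load_iris()
petal_features = iris_data.data[:, 2:4]  # petal length and petal width

# random_state and n_init are fixed only to make this sketch repeatable
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(petal_features)

# Map the integer targets to species names, then count species per cluster
species = pd.Series(iris_data.target_names[iris_data.target], name='species')
comparison = pd.crosstab(species, pd.Series(cluster_ids, name='cluster'))
print(comparison)
```

Each row of the table shows how the 50 samples of one species are spread across the clusters. Cluster ids are arbitrary, so a cluster should be matched to a species by its largest count rather than by its number.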

#### **10. Separate Clusters for Visualization**
Create separate DataFrames for each cluster to plot them individually.

In [None]:
df1 = df[df.cluster == 0]
df2 = df[df.cluster == 1]
df3 = df[df.cluster == 2]

#### **11. Visualize Clusters**
Visualize the clusters using a scatter plot with `petal length` and `petal width`.

In [None]:
plt.scatter(df1['petal length (cm)'], df1['petal width (cm)'], color='blue', label='Cluster 0')
plt.scatter(df2['petal length (cm)'], df2['petal width (cm)'], color='green', label='Cluster 1')
plt.scatter(df3['petal length (cm)'], df3['petal width (cm)'], color='yellow', label='Cluster 2')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.legend()
plt.title('K-Means Clustering')
plt.show()

### **Elbow Method for Optimal K**

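The Elbow Method is used to choose the number of clusters by evaluating the Sum of Squared Errors (SSE) for different values of `k`. SSE always decreases as `k` grows, so the lowest value is not the target. Instead, the optimal `k` is taken to be the "elbow" of the curve: the point after which adding another cluster reduces SSE only marginally.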

#### **1. Calculate Sum of Squared Errors (SSE)**

In [None]:
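sse = []
k_rng = range(1, 10)
# Fit on the two petal features only; df also holds the 'cluster' column by now
features = df[['petal length (cm)', 'petal width (cm)']]
for k in k_rng:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(features)
    sse.append(km.inertia_)  # inertia_ is the within-cluster sum of squared distances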
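The `inertia_` attribute used above is the sum of squared distances from each sample to the centroid of its assigned cluster. As a quick sanity check, the sketch below recomputes that quantity by hand and compares it with `inertia_`. The names `X_petal`, `model`, and `manual_sse` are illustrative, and the fixed `random_state` is only there to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X_petal = load_iris().data[:, 2:4]  # petal length and petal width
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_petal)

# Look up the centroid each sample was assigned to, then sum the squared distances
assigned_centroids = model.cluster_centers_[model.labels_]
manual_sse = ((X_petal - assigned_centroids) ** 2).sum()

print(manual_sse, model.inertia_)
```

The two printed values should agree up to floating-point rounding.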

#### **2. Plot Elbow Curve**
Visualize the SSE values for different numbers of clusters to identify the optimal `k`. 

In [None]:
plt.xlabel('K')
plt.ylabel('Sum of Squared Errors')
plt.plot(k_rng, sse)
plt.title('Elbow Method for Optimal K')
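Reading the elbow off a plot is somewhat subjective. As a rough numerical aid, the sketch below prints how much SSE falls with each additional cluster. This is only a heuristic and not a definitive selection rule. The names `k_values`, `sse_values`, and `drops` are illustrative, and the data is reloaded so the block runs on its own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X_petal = load_iris().data[:, 2:4]  # petal length and petal width
k_values = list(range(1, 10))
sse_values = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_petal).inertia_
    for k in k_values
]

# Reduction in SSE gained by going from k-1 to k clusters
drops = -np.diff(sse_values)
for k, drop in zip(k_values[1:], drops):
    print(f"k={k}: SSE reduced by {drop:.2f}")
```

The elbow sits roughly where these reductions stop being large relative to the ones before them.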