# KMeans Clustering: Advanced Tutorial

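**KMeans** is an unsupervised learning algorithm that partitions data into `k` clusters by minimizing the within-cluster sum of squared distances between each point and its cluster centroid (the quantity scikit-learn reports as `inertia_`).
It is one of the most widely used clustering algorithms because it is simple to implement and scales well to large datasets.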
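
To make the objective concrete, the following is a minimal NumPy sketch of the classic Lloyd iteration (assign each point to its nearest center, then move each center to the mean of its points). The helper name `lloyd_kmeans`, its parameters, and the toy data are illustrative assumptions, not part of scikit-learn, and the sketch omits refinements such as k-means++ initialization and multiple restarts.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd iterations from random starting centers."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the starting centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    # Final assignment and objective value for the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centers, inertia

# Toy data (assumed for illustration): two groups of 50 points each.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels_demo, centers_demo, inertia_demo = lloyd_kmeans(X_demo, k=2)
print("Centers:\n", centers_demo)
print("Inertia:", inertia_demo)
```

Because the starting centers are random, different seeds can converge to different local minima, which is why library implementations usually add smarter initialization and restarts.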

## 1. Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

sns.set_theme(style='whitegrid')  # set_theme is the current name; sns.set is an alias for it


## 2. Generate Synthetic Data

In [None]:
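# Four Gaussian blobs; the true labels y are not used for fitting
X, y = make_blobs(n_samples=500, centers=4, cluster_std=0.60, random_state=0)
# Standardize each feature so both axes contribute equally to Euclidean distances
X = StandardScaler().fit_transform(X)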

plt.scatter(X[:, 0], X[:, 1], s=40)
plt.title("Synthetic Dataset for Clustering")
plt.show()


## 3. Fit KMeans Model

In [None]:
# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='red', marker='X')
plt.title("KMeans Clustering Results")
plt.show()
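
A fitted model can also assign previously unseen points to the existing clusters. The sketch below assumes the scaler is kept as a separate object so new points can be transformed the same way as the training data; the names `scaler`, `km`, and `new_points`, and the coordinates of the new points, are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the tutorial, but the fitted scaler is retained.
X_raw, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.60, random_state=0)
scaler = StandardScaler().fit(X_raw)
X_scaled = scaler.transform(X_raw)

km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X_scaled)

# Hypothetical new observations, given in the original (unscaled) units.
new_points = np.array([[0.0, 4.0], [2.0, 1.0]])
new_scaled = scaler.transform(new_points)

new_labels = km.predict(new_scaled)   # index of the nearest centroid
dists = km.transform(new_scaled)      # distance to every centroid, shape (2, 4)

print("Assigned clusters:", new_labels)
print("Distances to centroids:\n", dists)
```

The distances returned by `transform` give a rough sense of how confidently a point belongs to its cluster: a point almost equidistant from two centroids sits near a cluster boundary.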


## 4. Evaluate Clustering Quality

In [None]:
print("Inertia:", kmeans.inertia_)
print("Silhouette Score:", silhouette_score(X, labels))
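
To show what these two numbers measure, the sketch below recomputes them by hand: inertia as the sum of squared distances from each point to its assigned centroid, and the silhouette score as the mean of the per-sample silhouette values. The variable names are assumptions chosen for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.preprocessing import StandardScaler

X_raw, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.60, random_state=0)
X_scaled = StandardScaler().fit_transform(X_raw)
km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X_scaled)

# Inertia: squared distance from each point to the centroid of its own cluster.
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = ((X_scaled - assigned_centers) ** 2).sum()

# Silhouette: one value per sample in [-1, 1]; the score is their mean.
per_sample = silhouette_samples(X_scaled, km.labels_)
manual_silhouette = per_sample.mean()

print("Inertia (manual vs. model):", manual_inertia, km.inertia_)
print("Silhouette (manual vs. function):", manual_silhouette,
      silhouette_score(X_scaled, km.labels_))
```

Inertia always decreases as `k` grows, so it is only comparable between models with the same `k`; the silhouette score does not have that bias, which makes it a useful second opinion.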


## 5. Elbow Method for Optimal k

In [None]:
inertia = []
k_range = range(1, 10)

for k in k_range:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    inertia.append(model.inertia_)

plt.plot(k_range, inertia, marker='o')
plt.title("Elbow Method for Choosing k")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Inertia")
plt.show()
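
The elbow can be ambiguous to read by eye, so a complementary check is to compute the silhouette score for each candidate `k` and look for the maximum. The sketch below is one way to do that under the same data assumptions as the tutorial; the silhouette score is undefined for a single cluster, so the range starts at 2. The names `sil_by_k` and `best_k` are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X_raw, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.60, random_state=0)
X_scaled = StandardScaler().fit_transform(X_raw)

sil_by_k = {}
for k in range(2, 10):  # silhouette needs at least 2 clusters
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    sil_by_k[k] = silhouette_score(X_scaled, labels_k)

# Candidate k with the highest mean silhouette value.
best_k = max(sil_by_k, key=sil_by_k.get)

for k, score in sil_by_k.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Highest silhouette at k =", best_k)
```

When the elbow and the silhouette maximum disagree, that is usually a sign the data do not have a single clear cluster count, and domain knowledge should decide.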


## 6. Summary

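- KMeans is simple and efficient, but the number of clusters `k` must be chosen in advance
- Use `inertia_` (lower is better for a fixed `k`) and `silhouette_score` (ranges from -1 to 1, higher is better) to evaluate clustering
- Use the Elbow Method as a heuristic for choosing the number of clusters
- Results are sensitive to centroid initialization and to outliers, which pull centroids toward themselves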
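
The sensitivity to initialization can be observed directly by running KMeans from a single random start under several seeds and comparing the resulting inertia values with a run that uses k-means++ initialization and multiple restarts. This is a sketch under the tutorial's data assumptions; on well-separated data like these blobs the spread may turn out to be small or zero, and the variable names are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X_raw, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.60, random_state=0)
X_scaled = StandardScaler().fit_transform(X_raw)

# One purely random initialization per seed, with no restarts.
inertias_random = [
    KMeans(n_clusters=4, init="random", n_init=1, random_state=seed)
    .fit(X_scaled)
    .inertia_
    for seed in range(10)
]

# k-means++ initialization, keeping the best of 10 restarts.
inertia_multi = (
    KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0)
    .fit(X_scaled)
    .inertia_
)

print("Single random start, best :", min(inertias_random))
print("Single random start, worst:", max(inertias_random))
print("k-means++ with 10 restarts:", inertia_multi)
```

A large gap between the best and worst single-start runs indicates that some starts converged to poor local minima, which is the case that restarts are meant to guard against.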