# K-Means Clustering

K-Means is an **unsupervised learning algorithm** that partitions unlabeled data into groups (clusters) of similar points.

### Key Concepts:
- K-Means partitions the data into **K clusters**.
- Each cluster is represented by its **centroid** (mean position).
- The algorithm works iteratively:
  1. Choose K initial centroids.
  2. Assign each data point to the nearest centroid.
  3. Update each centroid to the mean of the points assigned to it.
  4. Repeat steps 2 and 3 until the assignments stop changing (convergence).
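
The four steps above can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's implementation: the function name `kmeans_sketch`, its parameters, and the toy array `X_demo` are assumptions made for this example, and initial centroids are simply drawn at random from the data.

```python
import numpy as np


def kmeans_sketch(X, k, n_iters=100, seed=0):
    """Plain Lloyd-style K-Means (illustrative); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two obvious groups of 2-D points.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans_sketch(X_demo, k=2)
print(centroids)
print(labels)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.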

### Pros:
- Simple and fast.
- Works well with spherical, well-separated clusters.

### Cons:
- Must specify K in advance.
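- Sensitive to outliers and to feature scaling, because assignments rely on Euclidean distance.
- Assumes clusters are roughly spherical, so it struggles with elongated or irregular shapes.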
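
To illustrate the sensitivity to scaling, the sketch below builds a hypothetical two-feature dataset where the group structure lives in a small-scale feature and a large-scale feature carries only noise. The data, the variable names, and the use of `StandardScaler` and `adjusted_rand_score` are choices made for this example rather than part of the original notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 100 points in each of two groups.
group = np.repeat([0, 1], 100)
small = group * 1.0 + rng.normal(0, 0.05, size=200)  # separates the groups
large = rng.normal(0, 1000.0, size=200)              # pure noise, huge scale
X_raw = np.column_stack([small, large])

# Standardize each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_raw)

labels_raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_raw)
labels_scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

# Adjusted Rand index: about 0 for unrelated groupings, 1 for identical ones.
ari_raw = adjusted_rand_score(group, labels_raw)
ari_scaled = adjusted_rand_score(group, labels_scaled)
print("ARI raw:   ", ari_raw)
print("ARI scaled:", ari_scaled)
```

On the raw data the noisy large-scale feature dominates the distance computation, so the recovered clusters should bear little relation to the true groups; after standardization the informative feature can drive the split.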


In [1]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans


In [2]:
# Generate synthetic dataset
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

# No colors are passed via `c`, so a colormap would be ignored here.
plt.scatter(X[:, 0], X[:, 1], s=30)
plt.title("Synthetic Data for K-Means")
plt.show()

In [3]:
# Train K-Means
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
kmeans.fit(X)

# Predicted cluster labels
y_kmeans = kmeans.predict(X)
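
Because the synthetic data comes with `y_true`, the predicted clusters can be checked against it. Cluster IDs are arbitrary (cluster 0 need not correspond to blob 0), so a direct element-wise comparison is misleading. The sketch below uses the adjusted Rand index, one permutation-invariant option chosen for this example; it regenerates the same data so that it runs on its own.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Same synthetic data and model settings as in the cells above.
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)
y_kmeans = KMeans(n_clusters=4, random_state=42, n_init=10).fit_predict(X)

# Compare groupings, not label values: the score ignores how clusters are numbered.
ari = adjusted_rand_score(y_true, y_kmeans)
print(f"Adjusted Rand index: {ari:.3f}")
```

A value close to 1 indicates that the clusters match the generating blobs almost exactly.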

In [4]:
# Visualize clusters
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=30, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X', label='Centroids')
plt.title("K-Means Clustering Results")
plt.legend()
plt.show()

In [5]:
# Elbow Method to choose optimal K
inertia = []
K_range = range(1, 10)

for k in K_range:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X)
    inertia.append(km.inertia_)

plt.plot(K_range, inertia, 'bo-')
plt.xlabel('Number of Clusters K')
plt.ylabel('Inertia (Within-cluster sum of squares)')
plt.title('Elbow Method for Optimal K')
plt.show()
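
The inertia plotted above is the sum of squared distances from each point to the centroid of its assigned cluster. As a sanity check, the sketch below recomputes it by hand for K = 4 and compares it with `inertia_`; the variable names `assigned_centers` and `manual_inertia` are introduced only for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)
km = KMeans(n_clusters=4, random_state=42, n_init=10).fit(X)

# Look up the centroid assigned to each point, then sum the squared distances.
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = np.sum((X - assigned_centers) ** 2)

print("manual: ", manual_inertia)
print("sklearn:", km.inertia_)
```

Inertia can only decrease or stay flat as K grows, which is why the elbow method looks for the point where the decrease levels off rather than for a minimum.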

### Key Takeaways:
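- K-Means is simple and effective when clusters are compact and well separated.
- You must choose the number of clusters **K** in advance; the Elbow method or the Silhouette score can guide the choice.
- Works well when clusters are roughly round and similar in size.
- Sensitive to feature scaling and to centroid initialization; standardizing features and using several restarts (`n_init`) helps.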
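
As a complement to the elbow plot, the Silhouette score mentioned above can be computed for a range of K values. This is a minimal sketch on the same synthetic data; the `scores` dictionary and `best_k` are names introduced for this example, and taking the highest average score is one common heuristic rather than a definitive rule.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

scores = {}
for k in range(2, 10):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Heuristic: prefer the K with the highest average silhouette score.
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"K={k}: silhouette={score:.3f}")
print("Best K by silhouette:", best_k)
```

Scores range from -1 to 1, with higher values indicating tighter and better-separated clusters.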
