# K-Means Clustering Implementation from Scratch

This notebook demonstrates how to implement the K-Means clustering algorithm using only NumPy. K-Means is one of the most widely used unsupervised learning algorithms. It partitions a dataset into K distinct, non-overlapping clusters.

## How K-Means Works
1. **Initialize centroids**: Randomly pick K data points as the initial cluster centers (centroids).
2. **Assign labels**: Assign each data point to the nearest centroid based on Euclidean distance.
3. **Update centroids**: Calculate the mean of all data points assigned to each cluster and set it as the new centroid.
4. **Repeat**: Continue steps 2 and 3 until the centroids no longer change significantly (convergence).
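
Formally, K-Means looks for clusters $C_1, \dots, C_K$ with centroids $\boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_K$ that minimize the within-cluster sum of squared Euclidean distances (often called the inertia):

$$
J = \sum_{j=1}^{K} \sum_{\mathbf{x}_i \in C_j} \lVert \mathbf{x}_i - \boldsymbol{\mu}_j \rVert^2,
\qquad
\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{\mathbf{x}_i \in C_j} \mathbf{x}_i .
$$

With the centroids held fixed, step 2 cannot increase $J$, because each point moves to its closest centroid. With the assignments held fixed, step 3 cannot increase $J$ either, because the mean of a set of points minimizes the sum of squared distances to those points. The iteration therefore converges, but only to a local minimum that depends on the initial centroids chosen in step 1.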

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from k_means_scratch import KMeans
%matplotlib inline

# Set random seed for reproducibility
np.random.seed(42)

## Generating Synthetic Data

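We use scikit-learn's `make_blobs` to generate synthetic Gaussian blobs for clustering. scikit-learn only creates the data; the clustering itself is done by our scratch implementation.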

In [None]:
from sklearn.datasets import make_blobs

# Create 300 samples with 4 centers
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], s=50)
plt.title("Synthetic Dataset")
plt.show()

## Fitting our Scratch Implementation

Now we fit our `KMeans` class to the data, predict a cluster label for each point, and plot the result with the learned centroids overlaid.
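
The `k_means_scratch` module is not shown in this notebook. For reference, here is a minimal sketch of a class with the same interface used below (a `k` argument, `fit`, `predict`, and a `centroids` attribute). It follows the four steps listed earlier. The class name `KMeansSketch` and its `max_iters`, `tol`, and `seed` parameters are hypothetical, and the real module may differ in its details (initialization, stopping rule, empty-cluster handling).

```python
import numpy as np


class KMeansSketch:
    """Hypothetical minimal K-Means in NumPy; not the k_means_scratch module."""

    def __init__(self, k=3, max_iters=100, tol=1e-4, seed=None):
        self.k = k
        self.max_iters = max_iters  # assumed cap on the number of iterations
        self.tol = tol              # assumed convergence threshold on centroid movement
        self.seed = seed
        self.centroids = None

    def _assign(self, X):
        # Step 2: squared Euclidean distance from every point to every centroid,
        # shape (n_samples, k); argmin picks the nearest centroid for each point.
        dists = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.seed)
        # Step 1: choose k distinct data points as the initial centroids.
        idx = rng.choice(len(X), size=self.k, replace=False)
        self.centroids = X[idx].copy()
        for _ in range(self.max_iters):
            labels = self._assign(X)
            # Step 3: move each centroid to the mean of its assigned points;
            # a cluster that ends up empty keeps its previous centroid.
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else self.centroids[j]
                for j in range(self.k)
            ])
            # Step 4: stop once no centroid moves farther than tol.
            shift = np.linalg.norm(new_centroids - self.centroids, axis=1).max()
            self.centroids = new_centroids
            if shift <= self.tol:
                break
        return self

    def predict(self, X):
        # Label new points by their nearest learned centroid.
        return self._assign(np.asarray(X, dtype=float))
```

Used as `KMeansSketch(k=4, seed=42).fit(X).predict(X)`, it can stand in for `KMeans(k=4)` in the cell below. The sketch draws its initial centroids from a local `np.random.default_rng(seed)`, so its results do not depend on the global `np.random.seed(42)` call. If the actual module uses NumPy's global generator instead, that earlier call is what makes its runs repeatable.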

In [None]:
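# Fit the scratch model with K=4 (matching the 4 generating centers) and label each point
kmeans = KMeans(k=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

# Color each point by its predicted cluster
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')

# Overlay the learned centroids
centers = kmeans.centroids
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X', label='Centroids')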
plt.title("K-Means Clustering Result")
plt.legend()
plt.show()
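
Beyond the plot, the fit can be quantified by the within-cluster sum of squared distances (the inertia). The helper below is a hypothetical sketch. It assumes that `kmeans.centroids` is a `(k, n_features)` array, as the plotting code above implies, and that `predict` returns integer labels in `0..k-1`.

```python
import numpy as np


def inertia(X, labels, centroids):
    """Hypothetical helper: sum of squared Euclidean distances to assigned centroids."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # centroids[labels] pairs each point with the centroid of its cluster.
    diffs = X - centroids[labels]
    return float((diffs ** 2).sum())
```

For example, `inertia(X, y_kmeans, kmeans.centroids)` scores the result above. Lower values mean tighter clusters, but the score is only comparable across runs with the same K. Because random initialization can converge to a poor local minimum, a common practice is to fit several times with different seeds and keep the run with the lowest inertia.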