Clustering is an unsupervised learning technique that groups similar data points into clusters based on their features. The goal is to identify patterns or structures in the data without prior knowledge of class labels. Examples:
- **Customer Segmentation:** Group customers by demographics, behavior, or preferences to target marketing efforts.
- **Image Segmentation:** Cluster pixels in an image to identify objects or regions.
- **Gene Expression Analysis:** Cluster genes based on their expression levels to identify co-regulated genes.
- **Anomaly Detection:** Identify outliers or unusual patterns in data using clustering algorithms.

**K-Means Algorithm:** K-Means is a popular clustering algorithm that partitions the data into $K$ clusters, assigning each point to the cluster whose centroid (mean point) is nearest. It iteratively updates the cluster centroids and reassigns data points to the closest cluster until convergence.
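
One common way to make "closest cluster" precise is to write down the quantity that K-Means tries to reduce at each iteration. The notation here is introduced for illustration and is not from the original notebook: $x_i$ is a data point, $C_k$ is the set of points assigned to cluster $k$, and $\mu_k$ is that cluster's centroid.

$$
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
$$

The assignment step lowers $J$ by moving each point to its nearest centroid, and the update step lowers it by resetting each $\mu_k$ to the mean of its cluster.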

**How it Works:**
1. Initialize $K$ cluster centroids randomly.
2. Assign each data point to the closest centroid based on Euclidean distance.
3. Update the centroid of each cluster by calculating the mean of all data points assigned to it.
4. Repeat steps 2-3 until the centroids converge or a stopping criterion is reached.
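
The four steps above can be sketched directly in NumPy. This is a minimal illustration rather than how scikit-learn implements `KMeans`: the function name `kmeans_sketch`, its parameters, and the toy data are hypothetical, and it assumes initial centroids are drawn from the data points themselves.

```python
import numpy as np

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Plain K-Means iterations on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids (an assumption).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(centers):
        # Euclidean distance from every point to every centroid, shape (n, k).
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return distances.argmin(axis=1)

    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid.
        labels = assign(centroids)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster simply keeps its previous centroid).
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final labels are computed against the final centroids.
    return centroids, assign(centroids)

# Toy data: two well-separated groups of 2-D points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans_sketch(points, k=2)
print(centroids)
print(labels)
```

On data this cleanly separated, the first three points should end up in one cluster and the last three in the other. On real data the result can depend on the random initialization, which is one reason library implementations usually run several restarts.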

Increasing the number of clusters ($K$) produces finer-grained groups, but too many clusters can overfit, splitting the data along noise rather than real structure. Decreasing $K$ may underfit, merging distinct groups and hiding important patterns. Each centroid is the mean of the points in its cluster, so as $K$ changes, the centroids shift to reflect the new cluster assignments.
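
One way to see this trade-off numerically is to fit K-Means for several values of $K$ and compare the within-cluster sum of squared distances, which scikit-learn exposes as `inertia_`. The sketch below is an addition to the original notebook; it assumes the same two Iris features used later, and the range of $K$ values is an arbitrary choice for illustration.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

# Same two features as the visualization: sepal length and sepal width.
X = datasets.load_iris().data[:, :2]

inertias = {}
for k in range(1, 7):
    # n_init=10 runs ten random restarts and keeps the best one.
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"K={k}: inertia={value:.2f}")
```

Inertia generally keeps falling as $K$ grows, so a low value alone does not mean a better clustering. A common heuristic (the "elbow" method, not part of the original text) is to look for the $K$ after which the improvement flattens out.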

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import datasets
from ipywidgets import interact, IntSlider
from IPython.display import display, clear_output

In [14]:
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # We'll use only the first two features for simplicity

In [15]:
# Create a function for interactive clustering visualization
def plot_kmeans_clusters(num_clusters=2):
    # Set n_init explicitly: its default differs across scikit-learn versions
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)
    
    plt.figure(figsize=(10, 6))
    # Plot data points for each cluster
    for i in range(num_clusters):
        plt.scatter(X[labels == i, 0], X[labels == i, 1], label=f'Cluster {i + 1}')
    # Plot all centroids once
    plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
                s=200, c='black', marker='X', label='Centroids')
    
    # Label the axes with the actual feature names (sepal length and sepal width)
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.title(f'K-Means Clustering (Number of Clusters: {num_clusters})')
    plt.legend()
    plt.grid(True)
    plt.show()

In [16]:
# Create a slider for adjusting the number of clusters
num_clusters_slider = IntSlider(value=2, min=2, max=5, description='Number of Clusters')

# Create an interactive widget
interact(plot_kmeans_clusters, num_clusters=num_clusters_slider)


### Observations
- The model groups data points into clusters based on feature similarity without using labels (unsupervised learning).
- Increasing the number of clusters results in smaller, more focused groups but can lead to over-segmentation.
- The cluster centroids represent the mean position of points in each cluster, helping visualize cluster centers.

### Analogy for Better Understanding
- Students enter a classroom through different doors and sit freely where they feel most comfortable—usually near friends or familiar faces—forming one big natural group.
- The teacher observes this seating to understand how students naturally cluster by affinity, noting strengths and weaknesses within the group.
- Using these observations, the teacher divides the class into smaller groups, arranging desks into clusters that balance and optimize learning—similar to calculating centroids in K-Means clustering.
- Over time, the teacher adjusts these groups based on ongoing observation and class dynamics, allowing clusters to stabilize into effective teams.
- This process mirrors how K-Means starts with data points together, then iteratively forms clusters by assigning points to centroids, recalculating them, and refining groupings for cohesion.
- Like building a strong team, clustering seeks to find natural, balanced groups through repeated refinement.

---
1. Multivariate Classification (Supervised) - Understand the complexity of labeled input features.
2. MLP (Deep) - See how the model handles that complexity to predict correctly.
3. K-Means (Unsupervised) - Group data by similarity to discover natural clusters without labels.