<a href="https://colab.research.google.com/github/JdMohamed/machine-learning/blob/main/unsupervised_machine_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# K-Means

## 📊 What is K-Means Clustering?

K-Means is an unsupervised machine learning algorithm used to group data points into clusters based on similarity.

It’s one of the simplest and most widely used clustering algorithms.

## 🔍 Definition:

K-Means clustering partitions data into K clusters, where each data point belongs to the cluster with the nearest centroid (the mean of the points in that cluster).
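
One common way to state this definition precisely is as an objective function. In the notation below (introduced here for illustration, not taken from the text above), $C_k$ is the set of points in cluster $k$ and $\mu_k$ is its centroid. K-Means looks for the assignment that makes the total squared distance from each point to its centroid as small as possible:

$$
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
$$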

## 🧠 How it Works (Step-by-Step):

1. Choose K, the number of clusters.
2. Initialize K random centroids.
3. Assign each data point to the nearest centroid.
4. Update each centroid by calculating the mean of the points in its cluster.
5. Repeat steps 3-4 until:
   - Points stop changing clusters (convergence), or
   - A maximum number of iterations is reached.
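
The steps above can be sketched in plain NumPy. This is a minimal illustration, not how scikit-learn implements it: the function name `kmeans_sketch`, the toy data, and the choice to initialize centroids by sampling K data points are all assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(X, k, max_iters=100, seed=0):
    """Plain NumPy K-Means sketch; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 2: use k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iters):
        # Step 3: distance from every point to every centroid, shape (n, k)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 5: stop as soon as no point changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Two well-separated toy groups (made-up data)
X_demo = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
labels_demo, centroids_demo = kmeans_sketch(X_demo, k=2)
print(labels_demo)
print(centroids_demo)
```

The cluster numbers themselves (0 or 1) depend on the random initialization; only the grouping is meaningful.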

## 📌 Example:

Imagine clustering customers based on age and annual income:

| Age | Annual Income |
|-----|---------------|
| 22  | 15,000        |
| 35  | 70,000        |
| 60  | 90,000        |
| 25  | 20,000        |
| 40  | 60,000        |

K-Means can group them into, for example, 3 clusters: low, medium, and high-income customers.

## 🧪 Python Example (with scikit-learn):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Example data: one row per customer, columns are [age, annual income]
X = np.array([
    [22, 15000],
    [35, 70000],
    [60, 90000],
    [25, 20000],
    [40, 60000]
])

# Train KMeans with 3 clusters
# (n_init is set explicitly so the run behaves the same across scikit-learn versions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X)

# Get the cluster label of each point and the centroid coordinates
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Plot each cluster in its own color
for i in range(3):
    plt.scatter(X[labels == i][:, 0], X[labels == i][:, 1], label=f'Cluster {i+1}')

# Mark the centroids
plt.scatter(centroids[:, 0], centroids[:, 1], s=200, c='black', marker='X', label='Centroids')
plt.xlabel('Age')
plt.ylabel('Income')
plt.legend()
plt.title('K-Means Clustering')
plt.show()
```

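
Once a model is fitted, a new point can be assigned to the cluster with the nearest centroid, which is what the definition describes. A minimal sketch, assuming the same five customers; the `new_customer` values are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

# Same example data: columns are [age, annual income]
X = np.array([[22, 15000], [35, 70000], [60, 90000], [25, 20000], [40, 60000]])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Hypothetical new customer, not part of the original data
new_customer = np.array([[30, 25000]])
new_label = kmeans.predict(new_customer)[0]

# Check by hand: the predicted cluster is the one whose centroid is closest
distances = np.linalg.norm(kmeans.cluster_centers_ - new_customer, axis=1)
print(new_label, distances.argmin())
```

Both printed numbers should match, since `predict` picks the closest centroid.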
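## ✅ When to Use K-Means:

- Customer segmentation
- Market basket analysis (for example, grouping customers by purchasing patterns)
- Image compression (color quantization)
- Document clustering (grouping unlabeled documents by topic; classification, by contrast, needs labeled data)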

## ⚠️ Limitations:

| Issue | Description |
|-------|-------------|
| Need to choose K | You must specify the number of clusters in advance |
| Sensitive to scale | Features with larger numeric ranges dominate the distance calculation and can skew results |
| Sensitive to outliers | Outliers can pull centroids away from the real groups |
| Assumes spherical clusters | Performs poorly when clusters have irregular shapes |
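
For the first limitation, one widely used heuristic (not covered above, so treat it as a suggestion) is to fit the model for several values of K and compare `inertia_`, the within-cluster sum of squared distances that scikit-learn stores after fitting. The synthetic three-blob data below is made up for the illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three blobs around made-up centers
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5]])
X_blobs = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit K-Means for K = 1..6 and record the inertia of each fit
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_blobs)
    inertias[k] = model.inertia_
    print(k, round(model.inertia_, 1))
```

Inertia keeps shrinking as K grows, so the lowest value is not the target. The usual reading is to look for the K after which the improvement flattens out, which for this data should be around K = 3.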

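✅ Tip: scale the features (for example with scikit-learn's `StandardScaler`) before clustering, so that features with large numeric ranges do not dominate the distance calculation.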
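
A minimal sketch of that scaling step on the customer data from the example, where income is in the tens of thousands and age is in the tens. Chaining the scaler and the clusterer with `make_pipeline` is one convenient option, not the only one:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Columns are [age, annual income]
X = np.array([[22, 15000], [35, 70000], [60, 90000], [25, 20000], [40, 60000]])

# After scaling, every column has mean 0 and standard deviation 1
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))

# Scale and cluster in one step, so the same scaling is reused on new data
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=42),
)
scaled_labels = pipeline.fit_predict(X)
print(scaled_labels)
```

With scaling, age and income contribute on comparable terms, so the grouping may differ from the unscaled run.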

## 📚 Summary

| Feature | Value |
|---------|-------|
| Algorithm type | Unsupervised learning |
| Goal | Group similar data points |
| Input | Unlabeled data |
| Output | Cluster assignments (labels) |
| Popular use cases | Segmentation, grouping, analysis |