# K-Means Clustering

K-Means Clustering is a popular **unsupervised machine learning algorithm** used to group data points into distinct clusters. Unlike supervised learning methods, it does not rely on labeled data; instead, it discovers structure in the data on its own. The letter **k** in K-Means is the number of clusters the algorithm divides the data into. Each cluster is represented by its centroid, the mean position of all the points assigned to that cluster.

## What Does K-Means Do?
K-Means works by finding groups in the data and ensuring that:
- Points within the same cluster are as similar as possible.
- Points in different clusters are as different as possible.
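
Formally, the standard K-Means objective is to minimize the within-cluster sum of squares (often called *inertia*). For clusters $C_1, \dots, C_k$ with centroids $\boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_k$:

$$
J = \sum_{j=1}^{k} \sum_{\mathbf{x}_i \in C_j} \left\lVert \mathbf{x}_i - \boldsymbol{\mu}_j \right\rVert^2,
\qquad
\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{\mathbf{x}_i \in C_j} \mathbf{x}_i
$$

The second goal follows from the first: the total sum of squares of a fixed dataset splits into a within-cluster part ($J$) and a between-cluster part, so making $J$ small makes the spread between clusters large.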

The number of clusters, **k**, is a parameter you choose before running the algorithm. To determine the best value for **k**, you may try several values and evaluate the results using techniques like the **Elbow Method**.
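
The Elbow Method runs K-Means for a range of k values, records the inertia for each, and looks for the k where the curve stops dropping sharply. Below is a minimal stdlib-only sketch, assuming points are tuples of numbers; `kmeans_inertia` and `squared_distance` are hypothetical helpers, not library functions, and in practice you would plot the values rather than print them.

```python
import random


def squared_distance(p, q):
    """Sum of squared coordinate differences between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_inertia(points, k, max_iter=100, seed=0):
    """Run a basic K-Means and return its inertia (within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as starting centroids
    for _ in range(max_iter):
        # Group points by nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its group; an empty group keeps its centroid.
        new_centroids = [
            tuple(sum(c) / len(group) for c in zip(*group)) if group else centroids[j]
            for j, group in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Inertia: squared distance from every point to its nearest final centroid, summed.
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Hypothetical toy data: three well-separated groups of three points each.
data = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8), (1, 9), (1.5, 8.5), (2, 9)]
for k in range(1, 7):
    print(k, round(kmeans_inertia(data, k), 2))
```

The best achievable inertia never increases as k grows and reaches 0 when k equals the number of points, so the smallest inertia is not the goal; the elbow is where one more cluster stops buying much improvement. Since each run starts from random centroids, a single run per k can land in a poor local optimum, so running several seeds per k and keeping the lowest inertia makes the curve more reliable.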

---

## Step-by-Step Explanation of K-Means Clustering

Let’s walk through the algorithm step by step with an example:

### Step 1: Start with Your Data
We begin with a dataset containing multiple points. These points could represent anything, such as customer preferences, geographical locations, or measurements. For example, consider the following graph showing a set of data points:  

![Step 1: Initial Data](images/Data.png)
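
In code, a dataset like this is often just a list of points. The sketch below builds a hypothetical 2-D toy dataset with Python's standard `random` module; `make_toy_data`, its centers, and its spread are illustrative choices, not the data shown in the figure.

```python
import random


def make_toy_data(centers, points_per_center=20, spread=0.5, seed=42):
    """Generate 2-D points scattered around each (x, y) center (hypothetical toy data)."""
    rng = random.Random(seed)
    data = []
    for cx, cy in centers:
        for _ in range(points_per_center):
            # Gaussian noise around the center; spread is the standard deviation.
            data.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return data


data = make_toy_data([(1, 1), (5, 5), (1, 6)])
print(len(data))  # 60 points, each an (x, y) tuple
```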

---

### Step 2: Choose Initial Centroids Randomly
The next step is to select **k** random points from the data as the initial centroids. These centroids will act as the centers of our clusters. For instance, if **k = 3**, we pick three points randomly and mark them as the centroids:

![Step 2: Random Centroids](images/Random_Centroids.png)

Initially, these centroids may not be close to the actual cluster centers, but they will move as the algorithm progresses.
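
A minimal sketch of this initialization step, assuming points are stored as tuples in a Python list; `init_centroids` is a hypothetical helper name. Sampling without replacement guarantees that no data point is picked twice.

```python
import random


def init_centroids(points, k, seed=None):
    """Pick k data points at random to serve as the starting centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    return rng.sample(points, k)  # without replacement: no data point is chosen twice


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0), (1.0, 9.0)]
centroids = init_centroids(points, k=3, seed=0)
print(centroids)
```

Because the starting centroids are random, different seeds can lead to different final clusters. Libraries often use smarter seeding instead; scikit-learn's `KMeans`, for example, defaults to the k-means++ initialization.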

---

### Step 3: Assign Each Data Point to the Nearest Centroid
Now we calculate the distance between each data point and every centroid using the **Euclidean distance** formula. Based on these distances, each point is assigned to the cluster of its nearest centroid. For example:
- Points closest to the green centroid are assigned to the green cluster.
- Points closest to the red centroid are assigned to the red cluster.
- Points closest to the blue centroid are assigned to the blue cluster.

![Step 3: Assign Points](images/Setting_Points.png)

At this stage, every data point belongs to one of the three clusters.
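
The assignment step can be sketched as follows, using the Euclidean distance $d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_i (p_i - q_i)^2}$. The helper names `euclidean` and `assign_clusters` are hypothetical, and the example points are made up for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance: square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [euclidean(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))  # ties go to the lower index
    return labels


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0), (1.0, 9.0)]
centroids = [(1.0, 1.0), (8.0, 8.0), (1.0, 9.0)]
print(assign_clusters(points, centroids))  # [0, 0, 1, 1, 2]
```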

---

### Step 4: Update the Centroids
For each cluster, we calculate the average position (mean) of all the points in that cluster. This average becomes the centroid's new position, so each centroid shifts toward the center of the points currently assigned to it:

![Step 4: Updated Centroids](images/Final_Points.png)

These updated centroids better represent the clusters they are associated with.
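
A minimal sketch of the update step, assuming cluster labels are integer indices as produced by the assignment step. The original text does not say what happens when a cluster receives no points; keeping its old centroid, as done here, is one common choice and an assumption of this sketch. `update_centroids` is a hypothetical name.

```python
def update_centroids(points, labels, old_centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = []
    for j, old in enumerate(old_centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            # Coordinate-wise mean: average the x values, the y values, and so on.
            new_centroids.append(tuple(sum(coords) / len(members) for coords in zip(*members)))
        else:
            # Empty cluster: keep the old centroid (assumption; other strategies exist).
            new_centroids.append(old)
    return new_centroids


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0), (1.0, 9.0)]
labels = [0, 0, 1, 1, 2]
print(update_centroids(points, labels, [(1.0, 1.0), (8.0, 8.0), (1.0, 9.0)]))
# [(1.25, 1.5), (8.25, 8.5), (1.0, 9.0)]
```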

---

### Step 5: Repeat Steps 3 and 4
Steps 3 and 4 are repeated in a loop. In each iteration:
1. Points are reassigned to the nearest centroid.
2. Centroids are updated based on the new cluster assignments.

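The algorithm repeats this process until the centroids stabilize and stop moving significantly. At that point the cluster assignments have settled and the algorithm has converged. Convergence is to a local optimum, so different initial centroids can produce different final clusters.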
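
Putting Steps 2 to 5 together gives the classic iterative procedure (often called Lloyd's algorithm). This is a stdlib-only sketch, not a production implementation: it assumes points are equal-length numeric tuples, uses `math.dist` (Python 3.8+), and treats "significantly" as a hypothetical tolerance `tol` on how far any centroid moves. The `kmeans` function name and its parameters are illustrative.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-6, seed=None):
    """Basic K-Means (Lloyd's algorithm) on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Step 2: k random data points
    labels = []
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Step 4: move each centroid to the mean of its points (empty clusters stay put).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        # Step 5: stop once no centroid moved farther than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels


data = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2, seed=0)
print(centroids)
print(labels)
```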

---

## When Does K-Means Stop?
K-Means stops when:
1. The centroids no longer move significantly between iterations.
2. A maximum number of iterations is reached (a safeguard that caps the running time when convergence is slow).
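
The two stopping conditions can be expressed as a single check. In this sketch, `should_stop` is a hypothetical helper, `iteration` is a zero-based count of completed iterations, and the default `tol` and `max_iter` values are illustrative assumptions rather than values from the original text.

```python
import math


def should_stop(old_centroids, new_centroids, iteration, tol=1e-4, max_iter=300):
    """Return True when K-Means should stop iterating.

    Stops if every centroid moved less than tol (condition 1) or the
    iteration budget is used up (condition 2).
    """
    largest_shift = max(math.dist(a, b) for a, b in zip(old_centroids, new_centroids))
    return largest_shift < tol or iteration + 1 >= max_iter


old = [(1.0, 1.0), (8.0, 8.0)]
new = [(1.00001, 1.0), (8.0, 8.00002)]
print(should_stop(old, new, iteration=5))  # True: largest shift is about 2e-05
print(should_stop(old, [(1.5, 1.0), (8.0, 8.0)], iteration=5))  # False: a centroid moved 0.5
print(should_stop(old, [(1.5, 1.0), (8.0, 8.0)], iteration=299))  # True: 300th iteration done
```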

---
