### K-Means Clustering

K-Means clustering is an **unsupervised learning** algorithm used to group similar data points into clusters. Since it is unsupervised, it does not require labeled data; instead, the algorithm automatically identifies patterns and creates clusters based on the features of the data points. The goal of K-Means is to partition the data into **K distinct, non-overlapping clusters** in which each data point belongs to the cluster with the nearest mean (or centroid).

#### Principles of K-Means Clustering
The basic principles of K-Means clustering are:
- It minimizes the **within-cluster variance**, measured as the sum of squared distances between each point and its assigned centroid.
- The number of clusters, **K**, must be chosen by the user in advance.
- Each data point is assigned to the cluster whose centroid is closest to it.
- Centroids are updated iteratively so that the within-cluster distances keep decreasing.
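
The quantity being minimized can be written compactly. The notation here is introduced only for illustration and is not taken from the text: $C_k$ is the set of points assigned to cluster $k$, $\mu_k$ is its centroid, and $x_i$ is a data point.

$$
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
$$

The assignment step lowers $J$ by choosing the best $C_k$ for fixed centroids, and the update step lowers it by choosing the best $\mu_k$ for fixed assignments.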

### Steps of the K-Means Algorithm

1. **Choose the number of clusters (K)**:
   - Before clustering starts, you need to specify the number of clusters, **K**. The value is usually based on prior knowledge or determined with a method such as the **Elbow Method**.

2. **Initialize the centroids**:
   - Pick **K** data points at random to serve as the initial centroids. Each centroid represents the center of one cluster.

3. **Assign clusters**:
   - Assign each data point to its nearest centroid according to the chosen distance metric, typically **Euclidean distance**.

4. **Move centroids**:
   - Once all points are assigned to clusters, the centroids are updated by computing the **mean** of all points assigned to each cluster. This step recalculates the centroids to reflect the center of the points within each cluster.

5. **Repeat until convergence**:
   - Repeat steps 3 and 4: data points are reassigned to the nearest centroids, and the centroids are recalculated after each reassignment. The process stops at convergence, when the centroids no longer move (or move only negligibly), or when a maximum number of iterations is reached.
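
As a rough illustration of steps 3 and 4, the sketch below runs a single assignment and update pass on a tiny hand-made dataset. The data values, the hand-picked initial centroids, and the variable names are assumptions chosen for illustration; the broadcasting-based distance computation is one possible way to write the step, not the only one.

```python
import numpy as np

# Toy data: two visually obvious groups in 2-D (illustrative values only).
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])

# Step 2: initial centroids, fixed by hand here so the result is reproducible.
centroids = X[[0, 3]].copy()

# Step 3: squared Euclidean distance from every point to every centroid.
# Broadcasting: (n, 1, d) - (1, K, d) -> (n, K, d), summed over d -> (n, K).
sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
labels = sq_dist.argmin(axis=1)  # index of the nearest centroid per point

# Step 4: move each centroid to the mean of the points assigned to it.
new_centroids = np.array(
    [X[labels == k].mean(axis=0) for k in range(len(centroids))]
)

print(labels)
print(new_centroids)
```

With this data the first three points end up in cluster 0 and the last three in cluster 1, and each centroid moves to the mean of its three points. Repeating the two steps until `new_centroids` stops changing gives the full algorithm.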

### Elbow Method

The **Elbow Method** is a technique used to determine a suitable number of clusters (K) for K-Means clustering. It involves running the K-Means algorithm for a range of K values and plotting, for each K, the **sum of squared distances** from each point to its assigned centroid (known as **inertia**).

- The goal is to find the value of **K** at which the **inertia** starts decreasing at a noticeably slower rate, forming an "elbow" shape in the plot.
- The point where the elbow appears is often taken as the optimal K, because adding more clusters beyond it yields only small further reductions in inertia.
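
The sketch below shows one way the inertia curve might be computed. The helper `kmeans_inertia`, the synthetic three-blob dataset, and the fixed seeds are all assumptions made for this example; it is a minimal NumPy-only version, not a reference implementation.

```python
import numpy as np

def kmeans_inertia(X, k, max_iter=100, seed=0):
    """Run a basic K-Means (hypothetical helper) and return the final inertia."""
    rng = np.random.default_rng(seed)
    # Initial centroids: k distinct rows of X chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Mean of each cluster; an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Inertia: sum of squared distances to the nearest final centroid.
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.min(axis=1).sum()

# Synthetic data: three well-separated blobs (illustrative only).
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

ks = range(1, 7)
inertias = [kmeans_inertia(X, k) for k in ks]
for k, value in zip(ks, inertias):
    print(k, round(value, 1))
```

Plotting `inertias` against `ks` gives the elbow curve. Because K-Means depends on its random initialization, a single run per K can land in a poor local optimum, so in practice it is common to repeat each K with several seeds and keep the lowest inertia.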



In [4]:
import pandas as pd
import numpy as np
import random

In [6]:
class KMeans:
    
    def __init__(self, n_cluster = 2, max_iter = 20):
        self.n_cluster = n_cluster
        self.max_iter = max_iter
        self.centroids = None

    # assign each point to the cluster of its nearest centroid
    def assign_cluster(self, X):

        cluster_group = []

        for row in X:
            # reset for every point so distances from earlier rows do not leak in
            distances = []
            for centroid in self.centroids:

                # Euclidean distance between the point and this centroid
                eucd_dist = np.sqrt(np.dot((row - centroid), (row - centroid)))
                distances.append(eucd_dist)
            # minimum distance from this point to any centroid
            min_dist = min(distances)
            # index of the nearest centroid
            index_pos = distances.index(min_dist)
            cluster_group.append(index_pos)

        return np.array(cluster_group)
               
    
    # move each centroid to the mean of the points assigned to it
    def move_centroid(self, X, cluster_group):

        new_centroids = []

        for k in range(len(self.centroids)):
            members = X[cluster_group == k]
            if len(members) > 0:
                # the new centroid is the mean of the cluster's points
                new_centroids.append(members.mean(axis = 0))
            else:
                # empty cluster: keep the previous centroid so K stays fixed
                new_centroids.append(self.centroids[k])

        return np.array(new_centroids)
            
        
    # fit the model and return the cluster index of each point
    # (X is expected to be a NumPy array of shape (n_samples, n_features))
    def fit_predict(self, X):

        # pick n_cluster distinct rows of X as the initial centroids
        random_idx = random.sample(range(0, X.shape[0]), self.n_cluster)
        self.centroids = X[random_idx]

        # iterate at most max_iter times
        for i in range(self.max_iter):

            # assign each point to its nearest centroid
            cluster_group = self.assign_cluster(X)
            old_centroid = self.centroids

            self.centroids = np.array(self.move_centroid(X, cluster_group))

            # stop once the centroids no longer move (convergence)
            if np.array_equal(old_centroid, self.centroids):
                break

        return cluster_group