### Algorithm

1. Decide the Number of Clusters
2. Select random centroids:

    Suppose our dataset has 100 points. How many centroids do we need?

    Ans: We need one centroid for each cluster, so the number of centroids
    to select equals the number of clusters.
3. Assign Clusters:

    We create a list called cluster_group. We calculate the distance of each
    point from each of the centroids; with 2 clusters and 100 points, that is
    2*100 = 200 distance values. Each data point is then assigned to the
    cluster whose centroid is at the minimum distance.
4. Move Centroids:

    Each centroid is moved to the mean of the points currently assigned to
    its cluster.
5. Check finish:

    Stop when the centroids no longer change, or when the maximum number of
    iterations is reached.
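
The assignment step described above can be sketched with NumPy broadcasting. This is only an illustration under stated assumptions: the 100 points are randomly generated, there are 2 centroids, and the names `points`, `centroids` and `distances` are hypothetical rather than taken from the class below.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, only for reproducibility
points = rng.normal(size=(100, 2))    # 100 made-up 2-D points (assumed data)

# Pick 2 distinct rows as the initial centroids (one per cluster)
centroids = points[rng.choice(100, size=2, replace=False)]

# Euclidean distance of every point from every centroid:
# shape (100, 2), i.e. 2 * 100 = 200 distance values
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

# For each point, the index of the centroid at minimum distance
cluster_group = distances.argmin(axis=1)

print(distances.shape)
print(cluster_group[:10])
```

Computing all distances in one array avoids the nested Python loops, at the cost of holding the full distance matrix in memory.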

In [1]:
import random
import numpy as np

In [2]:
class Kmeans(object):
  def __init__(self,n_clusters=2,max_iteration=100): #By default the number of clusters=2 and the maximum iterations=100
    #Declaring the variables
    self.n_clusters = n_clusters
    self.max_iteration = max_iteration
    self.centroids = None
  
  def fit_predict(self,X):
    
    random_index = random.sample(range(0,X.shape[0]), self.n_clusters) #Randomly select n_clusters distinct row indices (2 by default)
    self.centroids = X[random_index] #Use those observations as the initial centroids

    for i in range(self.max_iteration):
      #We have to perform 3 things throughout this loop
      #1. Assign Clusters
      cluster_group = self.assign_clusters(X)
      old_centroids = self.centroids
      #2. Move Centroids (store the result so the next iteration uses the updated centroids)
      self.centroids = self.move_centroids(X,cluster_group)
      #3. Check Finish: stop once the centroids no longer change
      if np.array_equal(old_centroids, self.centroids):
        break
    
    return cluster_group

  def assign_clusters(self, X):       #Performing the first task
    cluster_group = []
    distances =[]

    for row in X:    
      for centroid in self.centroids:       #Checking for each data-point with each cluster
        distances.append(np.sqrt(np.dot(row-centroid,row-centroid)))
      min_distance = min(distances)
      index_pos = distances.index(min_distance)
      cluster_group.append(index_pos)
      distances.clear()        
    
    return np.array(cluster_group)

  def move_centroids(self,X,cluster_group):
    new_centroids = []

    cluster_type = np.unique(cluster_group)

    for cluster in cluster_type:
      #Mean of all points currently assigned to this cluster
      new_centroids.append(X[cluster_group == cluster].mean(axis=0))

    return np.array(new_centroids)
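
Because `move_centroids` loops over `np.unique(cluster_group)`, a cluster that ends up with no points would drop out of the returned array. One possible way to guard against that is sketched below; the function name `move_centroids_keep_empty` and the toy data are assumptions for illustration, not part of the class above.

```python
import numpy as np

def move_centroids_keep_empty(X, cluster_group, old_centroids):
    """Move each centroid to the mean of its points.

    A centroid whose cluster is empty is left where it was, so the
    number of centroids stays constant.
    """
    new_centroids = np.asarray(old_centroids, dtype=float).copy()
    for k in range(len(new_centroids)):
        members = X[cluster_group == k]
        if len(members) > 0:
            new_centroids[k] = members.mean(axis=0)
    return new_centroids

# Toy example: cluster 1 has no points assigned to it
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 0, 2])
old = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])

new = move_centroids_keep_empty(X, labels, old)
print(new)
```

Keeping the old position is only one option; re-seeding the empty centroid from a random data point is another common choice.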
  

### Checking the above algorithm on a dataset:

In [3]:
import pandas as pd
customer  = pd.read_csv("https://raw.githubusercontent.com/Halaarav/Some-Projects/main/customer_data.csv")

In [4]:
customer.head()

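|  | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5524 | 1957 | Graduation | Single | 58138 | 0 | 0 | 04-09-2012 | 58 | 635 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 1 | 2174 | 1954 | Graduation | Single | 46344 | 1 | 1 | 08-03-2014 | 38 | 11 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2 | 4141 | 1965 | Graduation | Together | 71613 | 0 | 0 | 21-08-2013 | 26 | 426 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 3 | 6182 | 1984 | Graduation | Together | 26646 | 1 | 0 | 10-02-2014 | 26 | 11 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 4 | 5324 | 1981 | PhD | Married | 58293 | 1 | 0 | 19-01-2014 | 94 | 173 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |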


In [5]:
customer = customer.drop(['ID', 'Education', 'Marital_Status', 'Year_Birth', 'Dt_Customer'], axis=1)

In [23]:
X = customer.Income #The column on the basis of which clustering will be performed
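
Row-wise distance code is usually written for a 2-D array of shape (n_samples, n_features), whereas a single column such as `customer.Income` is 1-D. A minimal sketch of one way to reshape it is shown below; it uses the first five Income values from the `customer.head()` output above, and the names `income` and `X_2d` are hypothetical.

```python
import numpy as np
import pandas as pd

# First five Income values from the head() output, standing in for customer.Income
income = pd.Series([58138, 46344, 71613, 26646, 58293], name="Income")

# Convert to a NumPy array with one row per customer and one feature column
X_2d = income.to_numpy().reshape(-1, 1)

print(X_2d.shape)
```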

In [24]:
km = Kmeans(n_clusters = 4)

In [25]:
y_means = km.fit_predict(X)
y_means

array([1, 3, 1, 3, 1, 1, 1, 3, 3, 0, 0, 1, 1, 0, 2, 3, 3, 1, 0, 1, 1, 3,
       0, 1, 3, 2, 0, 3, 3, 3, 1, 1, 1, 1, 2, 0, 0, 0, 2, 1, 1, 2, 3, 2,
       1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 0, 3, 1, 1, 1, 1, 3, 3, 1, 1, 3, 3,
       2, 3, 3, 1, 2, 1, 3, 0, 3, 1, 1, 2, 1, 2, 0, 1, 1, 3, 2, 2, 1, 1,
       2, 3, 1, 2, 1, 3, 0, 3, 2, 1, 2, 1, 1, 1, 3, 1, 0, 3, 3, 3, 2, 1,
       2, 1, 3, 1, 0, 3, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1, 0, 1, 2, 3, 1, 1,
       1, 3, 3, 3, 1, 1, 1, 2, 2, 3, 3, 3, 3, 0, 3, 1, 1, 3, 0, 1, 1, 3,
       1, 2, 2, 1, 1, 1, 1, 2, 3, 0, 3, 1, 3, 1, 3, 2, 3, 3, 1, 3, 3, 1,
       2, 3, 1, 3, 1, 1, 2, 3, 3, 0, 1, 2, 0, 1, 0, 1, 1, 3, 2, 1, 1, 1,
       3, 3])
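
To get a quick feel for such a label array, the cluster sizes can be counted with `np.unique`. The sketch below uses only the first ten labels from the output above, so the counts it prints are for that sample and not for the full result.

```python
import numpy as np

# First ten labels copied from the y_means output above
labels = np.array([1, 3, 1, 3, 1, 1, 1, 3, 3, 0])

# Distinct cluster ids and how many points fall in each
clusters, counts = np.unique(labels, return_counts=True)

for c, n in zip(clusters, counts):
    print(f"cluster {c}: {n} points")
```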

In [28]:
X = pd.DataFrame(X)

In [29]:
X["Clusters"] = y_means

In [30]:
X

|  | Income | Clusters |
|---|---|---|
| 0 | 58138 | 1 |
| 1 | 46344 | 3 |
| 2 | 71613 | 1 |
| 3 | 26646 | 3 |
| 4 | 58293 | 1 |
| ... | ... | ... |
| 195 | 75027 | 1 |
| 196 | 67546 | 1 |
| 197 | 65176 | 1 |
| 198 | 31160 | 3 |
