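Write a Python function that implements the k-means clustering algorithm. The function takes a list of points, the number of clusters k, a list of initial centroids and a maximum number of iterations, and returns a list of the final centroids. K-means clustering partitions n points into k clusters: the goal is to group similar points together and represent each group by its center, called the centroid.
In each iteration, every point is assigned to its nearest centroid, and each centroid is then updated to the mean of the points in its cluster. This repeats until the centroids stop changing or max_iterations is reached.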
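The update step can be checked by hand on the left-hand cluster of the example used at the end of this notebook. A minimal sketch, assuming the points (1, 2), (1, 4) and (1, 0) have already been assigned to the same cluster: the new centroid is simply their column-wise mean.

```python
import numpy as np

# Points assumed to be assigned to one cluster (the left-hand group of the example below)
cluster = np.array([(1, 2), (1, 4), (1, 0)], dtype=float)

# The updated centroid is the mean of each coordinate over the cluster's points
centroid = cluster.mean(axis=0)
print(centroid.tolist())  # [1.0, 2.0]
```

The x-coordinates are all 1, and (2 + 4 + 0) / 3 = 2, so the centroid lands at (1, 2).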

In [None]:
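import numpy as np

def euclidean_distance(a, b):
    # Distance from a single point a (shape (d,)) to every row of b (shape (n, d))
    return np.sqrt(((a - b) ** 2).sum(axis=1))

def k_means_clustering(points, k, initial_centroids, max_iterations):
    centroids = np.array(initial_centroids, dtype=float)
    points = np.array(points, dtype=float)

    for _ in range(max_iterations):
        # distances[i, j] = distance from centroid i to point j, shape (k, n)
        distances = np.array([euclidean_distance(centroid, points) for centroid in centroids])
        # Assign each point to its nearest centroid
        assignments = np.argmin(distances, axis=0)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty
        new_centroids = np.array([
            points[assignments == i].mean(axis=0) if np.any(assignments == i) else centroids[i]
            for i in range(k)
        ])
        # Round before the convergence check so it compares the same values that are stored
        new_centroids = np.round(new_centroids, 4)
        # Check for convergence: stop once no centroid moves
        if np.all(centroids == new_centroids):
            break
        centroids = new_centroids

    # .tolist() yields plain Python floats rather than NumPy scalars
    return [tuple(centroid) for centroid in centroids.tolist()]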
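The distances line loops over the centroids in Python. As an alternative, here is a minimal sketch that assumes it is acceptable to build an intermediate (k, n, d) array. It computes the same (k, n) distance matrix with a single broadcast expression. pairwise_distances is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def pairwise_distances(centroids, points):
    # Hypothetical helper: distances[i, j] = Euclidean distance from centroid i to point j
    centroids = np.asarray(centroids, dtype=float)  # shape (k, d)
    points = np.asarray(points, dtype=float)        # shape (n, d)
    # (k, 1, d) - (1, n, d) broadcasts to (k, n, d); the norm over the last axis gives (k, n)
    diff = centroids[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=2)

points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]
distances = pairwise_distances([(1, 1), (10, 1)], points)
print(distances.shape)                    # (2, 6)
print(distances.argmin(axis=0).tolist())  # [0, 0, 0, 1, 1, 1]
```

Because the result keeps the (k, n) layout, np.argmin(distances, axis=0) picks the nearest centroid for each point exactly as in the loop version.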

Boolean Indexing:

The condition assignments == i produces a 1D Boolean array with the same shape as assignments, where each element is True if the corresponding entry of assignments equals i.
When this Boolean array is used to index points, NumPy selects the rows where the array is True.
The mask is applied along the first axis (rows) of points, because its length matches the number of rows in points.
For example, if assignments == i is [True, True, False, True, False], the rows with indices 0, 1 and 3 are selected.
.mean(axis=0) then aggregates down the rows, producing one mean value per column, which gives the coordinates of the new centroid.
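The mask example above can be reproduced directly. In this sketch, the points and the assignments array are made-up values chosen so that assignments == 0 gives exactly the mask [True, True, False, True, False].

```python
import numpy as np

# Made-up data: five 2D points and the cluster index assigned to each
points = np.array([(1, 2), (1, 4), (5, 5), (4, 6), (9, 9)], dtype=float)
assignments = np.array([0, 0, 1, 0, 1])

mask = assignments == 0
print(mask.tolist())               # [True, True, False, True, False]
print(np.flatnonzero(mask).tolist())  # [0, 1, 3]  (indices of the selected rows)

selected = points[mask]            # boolean indexing along the first axis
print(selected.tolist())           # [[1.0, 2.0], [1.0, 4.0], [4.0, 6.0]]
print(selected.mean(axis=0).tolist())  # [2.0, 4.0]  (one mean per column)
```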

centroids == new_centroids:

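Performs an element-wise comparison between centroids and new_centroids, producing a Boolean array with one entry per coordinate of each centroid.
np.all():

Evaluates whether all elements in the Boolean array are True.
If every element is True (no centroid moved), np.all() returns True and the loop stops; otherwise, it returns False.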
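The sketch below runs the convergence check on two made-up centroid arrays that differ in a single coordinate. It also shows np.allclose, a tolerance-based comparison that is a swapped-in alternative and not what the function above uses. It can be useful when tiny floating-point differences should still count as converged. The atol value is an arbitrary assumption.

```python
import numpy as np

# Made-up centroids from two consecutive iterations
centroids = np.array([[1.0, 2.0], [10.0, 2.0]])
new_centroids = np.array([[1.0, 2.0], [10.0, 2.0000001]])

comparison = centroids == new_centroids
print(comparison.tolist())  # [[True, True], [True, False]]
print(np.all(comparison))   # False: one coordinate differs, so exact equality fails

# Alternative (assumption): treat differences below atol as "no movement"
print(np.allclose(centroids, new_centroids, atol=1e-4))  # True
```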

In [None]:
points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]
k = 2
initial_centroids = [(1, 1), (10, 1)]
max_iterations = 10
print(k_means_clustering(points, k, initial_centroids, max_iterations))

[(1.0, 2.0), (10.0, 2.0)]
