### K-Means

The KMeans algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares (see below). The number of clusters must be specified in advance. The algorithm scales well to large numbers of samples and is used across a wide range of application areas.

The k-means algorithm divides a set of $N$ samples $X$ into $K$ disjoint clusters $C$, each described by the mean $\mu_j$ of the samples in the cluster. The means are commonly called the cluster “centroids”; note that they are not, in general, points from $X$, although they live in the same space.
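To make the definition of a centroid concrete, here is a minimal sketch on a hypothetical four-point dataset. The array `X`, the assignment `labels`, and the choice of two clusters are assumptions made for illustration only; they are not taken from the iris example further down.

```python
import numpy as np

# Toy data (hypothetical): N = 4 samples in 2-D, split into K = 2 clusters.
X = np.array([[0.0, 0.0],
              [0.0, 2.0],
              [4.0, 0.0],
              [4.0, 2.0]])
labels = np.array([0, 0, 1, 1])  # assumed cluster assignment for each sample

# Each centroid mu_j is the mean of the samples assigned to cluster j.
centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])
print(centroids)

# A centroid lives in the same space as X but need not be one of its rows.
centroid_in_X = [bool((X == mu).all(axis=1).any()) for mu in centroids]
print(centroid_in_X)
```

In this toy case both centroids fall midway between two samples, so neither coincides with a point of `X`.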

The K-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion:


$$\sum_{i=1}^{N} \min_{\mu_j \in C} \lVert x_i - \mu_j \rVert^2$$
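The criterion adds up, over all samples, the squared distance from each sample to its nearest centroid. A minimal NumPy sketch of that computation follows. The data `X` and the two `centroids` are hypothetical values chosen so the result is easy to check by hand.

```python
import numpy as np

# Hypothetical toy data and centroids, chosen for easy hand-checking.
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
centroids = np.array([[0.0, 1.0], [4.0, 1.0]])

# Squared Euclidean distance from every sample to every centroid: shape (N, K).
sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

# Each sample contributes its squared distance to the nearest centroid.
inertia = sq_dist.min(axis=1).sum()
print(inertia)
```

Every sample here lies at distance 1 from its nearest centroid, so the inertia is 4.0. A fitted scikit-learn `KMeans` estimator exposes the same quantity as its `inertia_` attribute.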

In [1]:
from sklearn import cluster, datasets

In [2]:
iris = datasets.load_iris()

In [3]:
X_iris = iris.data

In [4]:
y_iris = iris.target

In [5]:
k_means = cluster.KMeans(n_clusters=3)

In [6]:
k_means.fit(X_iris) 

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)

In [7]:
print(k_means.labels_[::10])

[1 1 1 1 1 0 0 0 0 0 2 2 2 2 2]


In [8]:
print(y_iris[::10])

[0 0 0 0 0 1 1 1 1 1 2 2 2 2 2]
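The two printed arrays differ only in how the groups are numbered: cluster ids are arbitrary and depend on the random initialisation, so a different run may print a different permutation. The sketch below uses one possible way to line them up, a majority vote per cluster, applied to the fifteen sampled values shown above. The names `mapping` and `relabelled` are introduced here for illustration.

```python
from collections import Counter

# Every tenth value, copied from the two printed outputs above.
cluster_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2]
true_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

# For each cluster id, pick the most common true class among its members.
mapping = {}
for c in sorted(set(cluster_labels)):
    members = [t for k, t in zip(cluster_labels, true_labels) if k == c]
    mapping[c] = Counter(members).most_common(1)[0][0]

# Translate cluster ids into class ids and compare with the ground truth.
relabelled = [mapping[k] for k in cluster_labels]
print(mapping)
print(relabelled == true_labels)
```

On these fifteen values cluster 1 corresponds to class 0, cluster 0 to class 1, and cluster 2 to class 2. This says nothing about the other 135 samples, which would need the full `k_means.labels_` and `y_iris` arrays.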
