## Unsupervised learning 
### K-Means clustering

In [3]:
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target

In [5]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3)
y_pred = kmeans.fit_predict(X)

In [10]:
y_pred

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

In [7]:
y_pred is kmeans.labels_

True

In [9]:
kmeans.cluster_centers_

array([[1.462     , 0.246     ],
       [5.59583333, 2.0375    ],
       [4.26923077, 1.34230769]])

In [13]:
import numpy as np

X_new = np.array([[0, 2], [3, 2], [-3, 3], [-3, 2.5]])
kmeans.predict(X_new)

array([0, 2, 0, 0], dtype=int32)

In [14]:
kmeans.transform(X_new)

array([[2.28340973, 5.59595898, 4.31959379],
       [2.33280089, 2.59610419, 1.42951248],
       [5.24346832, 8.64955241, 7.45584735],
       [4.9989959 , 8.60826678, 7.36084013]])

Assigning each instance to a single cluster is called hard clustering.
It can also be useful to give each instance a score per cluster, which
is called soft clustering. The score can be the distance between the
instance and each centroid, which is what the transform() method returns.
If you transform a high-dimensional dataset this way, you end up with a
k-dimensional dataset, so this transformation can be a very efficient
nonlinear dimensionality reduction technique.
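
The distance-based scores can be checked by hand. The sketch below refits the model on the same petal features and compares `transform()` with Euclidean distances computed directly from `cluster_centers_`. The `n_init=10` and `random_state=42` arguments are assumptions added here for reproducibility, so the cluster indices may not match the outputs shown in the cells above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data[:, 2:]  # petal length and width, as in the cells above

# n_init and random_state are assumptions, chosen only for reproducibility.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

X_new = np.array([[0, 2], [3, 2], [-3, 3], [-3, 2.5]])

# Soft scores: one Euclidean distance per (instance, centroid) pair.
distances = kmeans.transform(X_new)

# The same distances computed by hand with broadcasting:
# (4, 1, 2) - (3, 2) -> (4, 3, 2), then the norm over the feature axis.
manual = np.linalg.norm(
    X_new[:, np.newaxis, :] - kmeans.cluster_centers_, axis=2
)

# The hard assignment is the index of the smallest distance in each row.
hard_labels = distances.argmin(axis=1)

print(np.allclose(distances, manual))
print(hard_labels, kmeans.predict(X_new))
```

Taking the `argmin` of each row recovers the hard assignment, which is why the soft scores carry strictly more information than `predict()` alone.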

In [17]:
kmeans.score(X)

-31.37135897435897

In [18]:
kmeans.inertia_

31.37135897435897
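
The two values above differ only in sign: `inertia_` is the sum of squared distances between each training instance and its closest centroid, and `score()` returns its negative so that a higher score is better. A minimal sketch that recomputes the inertia from `transform()` follows. The `n_init=10` and `random_state=42` arguments are assumptions added for reproducibility and are not part of the original cells.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data[:, 2:]  # petal length and width

# n_init and random_state are assumptions, chosen only for reproducibility.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Squared distance from each instance to its closest centroid.
closest_sq_dist = kmeans.transform(X).min(axis=1) ** 2

# Summing these squared distances gives the inertia.
manual_inertia = closest_sq_dist.sum()

print(manual_inertia, kmeans.inertia_, kmeans.score(X))
```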

### Accelerated K-Means and mini-batch K-Means
Mini-batch K-Means does not use the full dataset at each iteration.
Instead, it works on mini-batches, moving the centroids just slightly at
each iteration. This typically speeds up the algorithm by a factor of
three or four and makes it possible to cluster huge datasets that do not
fit in memory. Although mini-batch K-Means is much faster than regular
K-Means, its inertia is generally slightly worse, especially as the
number of clusters increases.

In [19]:
from sklearn.cluster import MiniBatchKMeans

minibatch_kmeans = MiniBatchKMeans(n_clusters=3)
minibatch_kmeans.fit(X)
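
When the data really does not fit in memory, one option is to feed the model one batch at a time with `partial_fit()` instead of calling `fit()` on the whole array. The sketch below simulates this on the small iris dataset. The batch count of 5, the shuffling step, and `random_state=42` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_iris

X = load_iris().data[:, 2:]  # petal length and width

# Shuffle once so that every batch mixes all three species
# (the raw iris array is sorted by class). The seed is an assumption.
rng = np.random.default_rng(42)
X_shuffled = X[rng.permutation(len(X))]

minibatch_kmeans = MiniBatchKMeans(n_clusters=3, random_state=42)

# Simulate streaming: each partial_fit call sees only one batch,
# so the full dataset never has to be in memory at once.
for batch in np.array_split(X_shuffled, 5):
    minibatch_kmeans.partial_fit(batch)

labels = minibatch_kmeans.predict(X)
print(minibatch_kmeans.cluster_centers_)
```

In a real out-of-core setting the batches would come from disk or a data stream rather than from `np.array_split`, and each batch must contain at least `n_clusters` instances for the first call to initialize the centroids.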