# 9. Unsupervised Learning

Although most machine learning applications today rely on supervised learning, the vast majority of available data is unlabeled.

This is where unsupervised learning shines. In this chapter, we will look at three unsupervised learning tasks:

1. **Clustering**: group similar instances into clusters
2. **Anomaly detection**: learn what normal data looks like in order to detect abnormal instances
3. **Density estimation**: estimate the probability density function (PDF) of the random process that generated the dataset
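
As a rough illustration of how the last two tasks relate, the sketch below estimates a density with scikit-learn's `GaussianMixture` and then flags low-density instances as anomalies. The synthetic data, the single Gaussian component, the 2% cutoff, and the helper name `is_anomaly` are all assumptions made for this example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic "normal" data: 500 points drawn from a 2D standard Gaussian (assumption)
rng = np.random.default_rng(0)
X_normal = rng.normal(size=(500, 2))

# Density estimation: fit a single Gaussian component to the data
gm = GaussianMixture(n_components=1, random_state=0).fit(X_normal)

# score_samples returns the log of the estimated density at each instance
log_density = gm.score_samples(X_normal)

# Anomaly detection: use the 2nd percentile of the training log-densities as the cutoff
threshold = np.percentile(log_density, 2)

def is_anomaly(points):
    """Hypothetical helper: True where the estimated density falls below the cutoff."""
    return gm.score_samples(points) < threshold

# One point at the center of the data, one far away from it
flags = is_anomaly(np.array([[0.0, 0.0], [6.0, 6.0]]))
print(flags)
```

The point at the origin sits in the densest region and is not flagged, while the point at (6, 6) lies far from all training data and is. The cutoff percentile is a tuning choice that depends on how many anomalies you expect.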

### 1. Clustering

Clustering is used in a wide variety of applications, including:

* Segmentation
* Data analysis
* Dimensionality reduction
* Anomaly detection
* Semi-supervised learning
* Search engines
* Image compression
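
To make one of these applications concrete, here is a minimal sketch of image compression by color quantization: the pixels are clustered by color, and each pixel is replaced by the center of its cluster. The random synthetic image and the choice of 8 colors are assumptions for illustration; a real photo would be loaded as an array of the same shape.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 32x32 RGB image with values in [0, 1) (stand-in for a real photo)
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))

# Treat every pixel as one instance with three features (R, G, B)
pixels = image.reshape(-1, 3)

# Cluster the pixel colors into a small palette
n_colors = 8
kmeans = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)

# Replace each pixel with the center of the cluster it belongs to
quantized = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)

n_unique = len(np.unique(quantized.reshape(-1, 3), axis=0))
print(f"colors after quantization: {n_unique}")
```

The quantized image keeps its shape but uses at most `n_colors` distinct colors, so it can be stored as a small palette plus one index per pixel.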

Let's now look at two particular algorithms.

#### K-Means

K-means is a simple yet powerful algorithm: it tries to find each cluster’s center (its centroid) and assigns each instance to the cluster whose center is closest.
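
The idea can be sketched in a few lines of NumPy. This is a simplified version of the classic alternating procedure (often called Lloyd's algorithm), not scikit-learn's actual implementation. The function name `kmeans_sketch`, the random initialization, and the toy two-blob data are assumptions for illustration.

```python
import numpy as np

def assign_labels(X, centroids):
    """Index of the closest centroid for every instance."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans_sketch(X, k, n_iters=100, seed=0):
    """Minimal K-means: alternate between assigning instances and moving centroids."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct instances chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each instance joins its closest centroid
        labels = assign_labels(X, centroids)
        # Update step: each centroid moves to the mean of its instances
        # (a centroid with no instances stays where it is)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, assign_labels(X, centroids)

# Toy data: two tight, well-separated blobs around (0, 0) and (5, 5)
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                    rng.normal(5.0, 0.1, size=(50, 2))])
centroids, labels = kmeans_sketch(X_demo, k=2)
print(centroids)
```

The result depends on the initial centroids, and on harder data this procedure can converge to a poor solution, which is why library implementations use smarter initialization and multiple restarts.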

Let's try it out on a synthetic dataset of blobs generated with `make_blobs`:

In [4]:
from sklearn.datasets import make_blobs
import numpy as np

# Centers of the five blobs
blob_centers = np.array(
    [[ 0.2,  2.3],
     [-1.5,  2.3],
     [-2.8,  1.8],
     [-2.8,  2.8],
     [-2.8,  1.3]])
# Standard deviation of each blob, in the same order as blob_centers
blob_std = np.array([0.4, 0.3, 0.1, 0.1, 0.1])

In [5]:
X, y = make_blobs(n_samples=2000, centers=blob_centers,
                  cluster_std=blob_std, random_state=7)

In [6]:
from sklearn.cluster import KMeans

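k = 5  # number of clusters to look for (one per blob)
kmeans = KMeans(n_clusters=k, random_state=42)  # fixed seed for reproducible results
y_pred = kmeans.fit_predict(X)  # fit the model and get each instance's cluster index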
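
Once the model is fitted, it can be inspected and used on new data. The sketch below recreates the blob dataset so that it runs on its own. The points in `X_new` are arbitrary examples, and `n_init=10` is set explicitly here as an assumption, since its default differs across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Recreate the blob dataset from the cells above so this snippet is self-contained
blob_centers = np.array([[0.2, 2.3], [-1.5, 2.3], [-2.8, 1.8],
                         [-2.8, 2.8], [-2.8, 1.3]])
blob_std = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
X, y = make_blobs(n_samples=2000, centers=blob_centers,
                  cluster_std=blob_std, random_state=7)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
y_pred = kmeans.fit_predict(X)

# fit_predict returns the same labels that are stored on the fitted model
print(np.array_equal(y_pred, kmeans.labels_))

# One row per centroid, one column per feature
print(kmeans.cluster_centers_)

# Assign new instances to the cluster with the closest centroid
X_new = np.array([[0.0, 2.0], [3.0, 2.0], [-3.0, 3.0], [-3.0, 2.5]])
new_labels = kmeans.predict(X_new)
print(new_labels)

# transform returns the distance from each instance to every centroid
distances = kmeans.transform(X_new)
print(distances.shape)
```

Note that the cluster indices are arbitrary: they identify which cluster an instance was assigned to and are not expected to match the blob indices in `y`.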