#### Agglomerative clustering tends to create uneven cluster sizes
* This example illustrates the “rich get richer” behavior of agglomerative clustering, which tends to create uneven cluster sizes.

* This behavior is pronounced for the average and complete linkage strategies, which end up with a couple of clusters containing few datapoints.

* With single linkage, we see one very large cluster covering most digits, an intermediate-size (clean) cluster containing most of the zero digits, and all other clusters drawn from noise points around the fringes.

The Ward linkage leads to more evenly distributed clusters, which are therefore likely to be less sensitive to a random resampling of the dataset.
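One way to check the cluster-size claim numerically is to count how many points each linkage assigns to each cluster. The snippet below is a minimal, self-contained sketch under a few assumptions: the names `X_2d` and `cluster_sizes` are introduced here for illustration, and `random_state=0` is an arbitrary choice to make the embedding repeatable. The exact counts depend on the scikit-learn version and the embedding, so none are quoted here.

```python
import numpy as np
from sklearn import cluster, datasets, manifold

# Load the digits and embed them in 2D (random_state=0 is an arbitrary, assumed seed).
X, _ = datasets.load_digits(return_X_y=True)
X_2d = manifold.SpectralEmbedding(n_components=2, random_state=0).fit_transform(X)

cluster_sizes = {}
for linkage in ("ward", "average", "complete", "single"):
    model = cluster.AgglomerativeClustering(n_clusters=10, linkage=linkage)
    labels = model.fit_predict(X_2d)
    # Number of points per cluster, sorted from largest to smallest.
    cluster_sizes[linkage] = np.sort(np.bincount(labels, minlength=10))[::-1]
    print(f"{linkage:>8}: {cluster_sizes[linkage].tolist()}")
```

A linkage that produces uneven clusters shows up as one or two large counts followed by many very small ones.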

In [38]:
import numpy as np
from matplotlib import pyplot as plt

from sklearn import manifold, datasets, cluster

### Data

In [39]:
digits = datasets.load_digits()
X, y = digits.data, digits.target
n_samples, n_features = X.shape
print(f'Number of samples is {n_samples} and number of features is {n_features}')

Number of samples is 1797 and number of features is 64


#### Manifold learning
* Manifold learning is an approach to non-linear dimensionality reduction.
* Manifold learning is based on the idea that the dimensionality of many data sets is only artificially high.

#### Spectral Embedding
* Projects the samples onto the first eigenvectors of the graph Laplacian.
* The adjacency matrix is used to compute a normalized graph Laplacian whose spectrum (especially the eigenvectors associated with the smallest eigenvalues) has an interpretation in terms of the minimal number of cuts necessary to split the graph into comparably sized components.
* This embedding can also 'work' even if the adjacency variable is not strictly the adjacency matrix of a graph but, more generally, an affinity or similarity matrix between samples.

**However, care must be taken to always make the affinity matrix symmetric so that the eigenvector decomposition works as expected.**
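The steps described above can be sketched with NumPy alone. This is an illustrative toy version, not scikit-learn's implementation: the two-blob data, the RBF affinity and its bandwidth, and the names `points`, `affinity`, `laplacian`, and `embedding` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (assumed): two small blobs of 20 points each.
points = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(3.0, 0.3, size=(20, 2)),
])

# RBF affinity between all pairs of points (bandwidth chosen arbitrarily).
sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
affinity = np.exp(-sq_dists / 2.0)

# Symmetrize explicitly. An RBF affinity is already symmetric, but a
# k-nearest-neighbors graph, for example, generally is not.
affinity = 0.5 * (affinity + affinity.T)

# Normalized graph Laplacian: L = I - D^{-1/2} A D^{-1/2}.
degree = affinity.sum(axis=1)
d_inv_sqrt = 1.0 / np.sqrt(degree)
laplacian = np.eye(len(points)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

# eigh expects a symmetric matrix and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(laplacian)

# Skip the first (trivial) eigenvector and keep the next two as coordinates.
embedding = eigenvectors[:, 1:3]
print(embedding.shape)
```

`np.linalg.eigh` only reads one triangle of its input, so an asymmetric affinity matrix would silently give the decomposition of a different matrix. That is one concrete reason for the symmetry requirement.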



In [40]:
X_reduced = manifold.SpectralEmbedding(n_components=2).fit_transform(X)
print('Shape of reduced X', X_reduced.shape)

Shape of reduced X (1797, 2)


### Agglomerative Clustering on a 2D embedding of digits

In [41]:
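def agglomerative_clustering(linkage='ward', metric='euclidean', n_clusters=10):
    # Note: linkage='ward' only supports metric='euclidean'.
    agglomerative = cluster.AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage, metric=metric)
    # Clustering is unsupervised, so the digit labels y are not needed for fitting.
    agglomerative.fit(X_reduced)
    return agglomerative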

In [42]:

def get_colors(agglomerative):
    # Map each cluster label to a color; dividing by 10 assumes the default n_clusters=10.
    colors = plt.cm.nipy_spectral(agglomerative.labels_.astype('float') / 10)
    return colors


In [45]:
def plot_results(agglomerative, color, linkage, metric):
    plt.figure(figsize=(15, 10))
    for label in np.unique(agglomerative.labels_):
        mask = agglomerative.labels_ == label
        # Draw every point of this cluster using its cluster label as the marker.
        plt.scatter(X_reduced[mask, 0], X_reduced[mask, 1], marker=f"${label}$", color=color[mask], s=50)
    plt.title(f'Agglomerative Clustering | linkage:{linkage} | Metric:{metric}')

In [44]:
def cluster_digits(linkage='ward', metric='euclidean', n_clusters=10):
    agglomerative = agglomerative_clustering(linkage, metric, n_clusters)
    colors = get_colors(agglomerative)
    plot_results(agglomerative, colors, linkage, metric)


#### Single Linkage | uses the minimum of the distances between all observations of the two sets.

In [55]:
cluster_digits(linkage='single')

<img src='./plots/single-linkage-digits-dataset.png'>

#### Complete linkage | uses the maximum of the distances between all observations of the two sets.

In [56]:
cluster_digits(linkage='complete')

<img src='./plots/complete-linkage-digits-dataset.png'>

#### Average linkage | uses the average of the distances of each observation of the two sets

In [57]:
cluster_digits(linkage='average')

<img src='./plots/average-linkage-digits-dataset.png'>

#### Ward linkage | minimizes the variance of the clusters being merged.

In [58]:
cluster_digits(linkage='ward')

<img src='./plots/ward-linkage-digits-dataset.png'>
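To make the four linkage criteria used above concrete, here is a small worked example on two hand-picked sets of points. It is a sketch for intuition only: the sets `set_a` and `set_b` are made up, and the Ward quantity shown is the standard textbook increase in within-cluster sum of squares caused by a merge, which may differ by a scaling factor from what scikit-learn computes internally.

```python
import numpy as np

# Two hypothetical clusters on a line (second coordinate is always 0).
set_a = np.array([[0.0, 0.0], [1.0, 0.0]])
set_b = np.array([[3.0, 0.0], [5.0, 0.0]])

# All pairwise Euclidean distances between a point of A and a point of B.
pairwise = np.linalg.norm(set_a[:, None, :] - set_b[None, :, :], axis=-1)

single = pairwise.min()     # closest pair: (1, 0) and (3, 0)
complete = pairwise.max()   # farthest pair: (0, 0) and (5, 0)
average = pairwise.mean()   # mean over the four pairs

# Ward: increase in within-cluster sum of squares if A and B were merged.
n_a, n_b = len(set_a), len(set_b)
centroid_gap = set_a.mean(axis=0) - set_b.mean(axis=0)
ward_increase = n_a * n_b / (n_a + n_b) * np.dot(centroid_gap, centroid_gap)

print(single, complete, average, ward_increase)  # 2.0 5.0 3.5 12.25
```

At each step, agglomerative clustering merges the pair of clusters with the smallest value of the chosen criterion. Single linkage only needs one close pair of points to merge two clusters, which is consistent with the single large, chained cluster seen in its plot.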