#### Adding connectivity constraints
* A useful feature of `AgglomerativeClustering` is that connectivity constraints can be added to the algorithm.
* Connectivity constraints imply that only adjacent clusters can be merged.
* We impose them through a connectivity matrix that defines, for each sample, its neighboring samples according to a given structure of the data.
* These constraints are useful for imposing a certain local structure, and they also make the algorithm faster, especially when the number of samples is high.

**The connectivity constraints are imposed via a connectivity matrix: a SciPy sparse matrix whose only stored entries lie at the intersection of the row and column indices of samples that should be connected.**
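To make the definition above concrete, here is a minimal sketch that builds such a matrix by hand. The toy data `X_toy` and the chain structure (each sample connected only to its immediate predecessor and successor) are assumptions made for illustration; they are not part of the swiss-roll example.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import AgglomerativeClustering

# Six 1-D samples forming two well-separated groups (toy data, assumed).
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
n = X_toy.shape[0]

# Chain structure: sample i may only be merged with samples i-1 and i+1.
rows = np.arange(n - 1)
cols = rows + 1
data = np.ones(n - 1)
chain = sparse.csr_matrix((data, (rows, cols)), shape=(n, n))
chain = chain + chain.T  # store each connection in both directions

model = AgglomerativeClustering(n_clusters=2, linkage="ward", connectivity=chain)
toy_labels = model.fit_predict(X_toy)

print(chain.toarray().astype(int))  # dense view, practical only for tiny matrices
print(toy_labels)
```

Entry `(i, j)` is stored only when samples `i` and `j` are allowed to merge directly; every other pair is left out of the sparse matrix.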

* This matrix can be constructed from a priori information:<br> for instance, you may wish to cluster web pages by only merging pages with a link pointing from one to another.
* It can also be learned from the data:<br> for instance,
    * using `sklearn.neighbors.kneighbors_graph` to restrict merging to nearest neighbors 
    * using `sklearn.feature_extraction.image.grid_to_graph` to enable only merging of neighboring pixels on an image.
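As a small sketch of the image case, the snippet below builds a pixel-adjacency graph with `grid_to_graph` and uses it to cluster a synthetic image. The 8×8 two-tone `image`, the noise level, and the seed are assumptions chosen only to keep the example tiny.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph

# Synthetic 8x8 image (assumed): dark left half, bright right half, slight noise.
rng = np.random.default_rng(0)
image = np.zeros((8, 8))
image[:, 4:] = 1.0
image += 0.01 * rng.standard_normal(image.shape)

# One graph node per pixel; edges link pixels that are neighbors on the grid.
pixel_graph = grid_to_graph(*image.shape)
print("Shape of pixel graph :", pixel_graph.shape)

# Each pixel is a sample with a single feature (its intensity).
model = AgglomerativeClustering(n_clusters=2, linkage="ward", connectivity=pixel_graph)
pixel_labels = model.fit_predict(image.reshape(-1, 1)).reshape(image.shape)
print(pixel_labels)
```

Because merges are restricted to neighboring pixels, each resulting cluster is a spatially contiguous region of the image.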


**For instance, in the swiss-roll example below, the connectivity constraints forbid the merging of points that are not adjacent on the swiss roll, and thus avoid forming clusters that extend across overlapping folds of the roll.**

In [4]:
import matplotlib.pyplot as plt
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_swiss_roll

#### Swiss Roll dataset

In [9]:
# Generate the Swiss Roll dataset (make_swiss_roll is already imported above).

n_samples = 1500
noise = 0.05
X, _ = make_swiss_roll(n_samples, noise=noise)
# Make it thinner
X[:, 1] *= 0.5
print('Shape of X :',X.shape)

Shape of X : (1500, 3)


#### Clustering without any connectivity constraints

In [36]:
agglomerative = AgglomerativeClustering(n_clusters=6, linkage='ward', metric='euclidean')
agglomerative.fit(X)
labels_without_connectivity = agglomerative.labels_

In [38]:
plt.figure(figsize=(10,10))
ax = plt.subplot(projection='3d', elev=7, azim=-80)
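# Plot from the saved labels so the figure does not depend on which model
# the name `agglomerative` currently refers to.
for i in range(6):
    ax.scatter(
        X[labels_without_connectivity == i, 0],
        X[labels_without_connectivity == i, 1],
        X[labels_without_connectivity == i, 2],
        edgecolors='k'
    )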

plt.title('Agglomerative clustering without connectivity constraints');


<img src='./plots/Agglomerative-clustering-without-connectivity-constraints.png'>

#### Clustering with connectivity constraints

In [31]:
# k-Nearest Neighbors with 10 neighbors
connectivity = kneighbors_graph(X, p=2, metric='minkowski', mode='connectivity', n_neighbors=10)
print('Shape of connectivity :',connectivity.shape)

Shape of connectivity : (1500, 1500)
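
Before passing a k-nearest-neighbors graph to the clustering step, it can help to inspect it. The sketch below is self-contained and regenerates a comparable swiss roll with a fixed `random_state` (an assumption added for reproducibility, so the numbers will not match the unseeded run above). It reports how many entries are stored, how many are one-directional, and how many connected components the graph has.

```python
from scipy.sparse.csgraph import connected_components
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# Comparable data to the cells above, but seeded (assumption) for reproducibility.
X_demo, _ = make_swiss_roll(1500, noise=0.05, random_state=0)
X_demo[:, 1] *= 0.5
knn = kneighbors_graph(X_demo, n_neighbors=10, mode="connectivity")

# Each row stores exactly n_neighbors entries (the sample itself is excluded).
print("Stored entries :", knn.nnz)

# "j is among i's nearest neighbors" does not imply the reverse,
# so the raw graph is generally not symmetric.
one_way = (knn - knn.T).count_nonzero()
print("Entries without a symmetric counterpart :", one_way)

# Treat edges as undirected and count the connected components.
n_components, _ = connected_components(knn, directed=False)
print("Connected components :", n_components)
```

If the graph turns out to have more than one component, increasing `n_neighbors` is one way to connect it.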


In [32]:
agglomerative = AgglomerativeClustering(n_clusters=6, linkage='ward', metric='euclidean', connectivity=connectivity)
agglomerative.fit(X)
labels_connectivity = agglomerative.labels_

In [39]:
plt.figure(figsize=(10,10))
ax = plt.subplot(projection='3d', elev=7, azim=-80)
# Plot from the saved labels of the constrained model.
for i in range(6):
    ax.scatter(
        X[labels_connectivity == i, 0],
        X[labels_connectivity == i, 1],
        X[labels_connectivity == i, 2],
        edgecolors='k'
    )
plt.title('Agglomerative clustering with connectivity constraints');

<img src='./plots/Agglomerative-clustering-with-connectivity-constraints.png'>
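
The visual comparison can be complemented with a rough numeric check. `make_swiss_roll` also returns the position `t` of each sample along the roll, which the cells above discard. The sketch below is self-contained and uses a seeded dataset, which is an assumption, so it will not reproduce the plots exactly. The helper `mean_spread` is a hypothetical name introduced here; it averages the per-cluster standard deviation of `t`. A cluster that spans several folds mixes distant values of `t`, so a smaller value would suggest clusters that stay local on the roll.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# Seeded data (assumption); t is each sample's position along the roll.
X_demo, t = make_swiss_roll(1500, noise=0.05, random_state=0)
X_demo[:, 1] *= 0.5
knn = kneighbors_graph(X_demo, n_neighbors=10, mode="connectivity")


def mean_spread(labels, position):
    """Average, over clusters, of the standard deviation of `position`."""
    return float(np.mean([position[labels == k].std() for k in np.unique(labels)]))


free = AgglomerativeClustering(n_clusters=6, linkage="ward").fit_predict(X_demo)
constrained = AgglomerativeClustering(
    n_clusters=6, linkage="ward", connectivity=knn
).fit_predict(X_demo)

spread_free = mean_spread(free, t)
spread_constrained = mean_spread(constrained, t)
print("Mean spread of t without constraints :", spread_free)
print("Mean spread of t with constraints    :", spread_constrained)
```

This is only a heuristic summary; the 3D plots remain the more direct way to see whether clusters cross folds.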