## Types of Clustering
There are many types of clustering algorithms. Four of the best-known families are:

* Connectivity-based Clustering
* Centroid-based Clustering
* Distribution-based Clustering
* Density-based Clustering

### Clustering Principles
All clustering algorithms try to group data points based on similarities between the data. What does this actually mean?


This is usually described in terms of **`inter-cluster heterogeneity`** and **`intra-cluster homogeneity`**.

* `Inter-cluster heterogeneity`<br> The clusters should be as different from one another as possible: the characteristics of one cluster differ clearly from those of every other cluster. Well-separated clusters are more stable and reliable.
* `Intra-cluster homogeneity`<br> The data points within a cluster should be as similar to one another as possible. The more similar they are, the more cohesive, and hence more stable, the cluster is.

* **Hence the objective of clustering is to maximise the inter-cluster distance (high inter-cluster heterogeneity) and minimise the intra-cluster distance (high intra-cluster homogeneity).**
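
To make the two quantities concrete, here is a minimal sketch that measures them for a given labelling. The helper name `cluster_cohesion_and_separation`, the use of centroids as cluster representatives, and the toy points are assumptions made for illustration; other definitions of intra- and inter-cluster distance exist.

```python
import numpy as np

def cluster_cohesion_and_separation(X, labels):
    """Return (mean intra-cluster distance, mean distance between centroids)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])

    # Intra-cluster: average distance of each point to its own centroid.
    own_centroid = centroids[np.searchsorted(ids, labels)]
    intra = np.linalg.norm(X - own_centroid, axis=1).mean()

    # Inter-cluster: average distance between every pair of centroids.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    inter = dists[np.triu_indices(len(ids), k=1)].mean()
    return intra, inter

# Four toy points forming two obvious groups, around x=0 and x=10.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = np.array([0, 0, 1, 1])  # groups nearby points together
bad = np.array([0, 1, 0, 1])   # mixes the two groups

print(cluster_cohesion_and_separation(X, good))  # small intra, large inter
print(cluster_cohesion_and_separation(X, bad))   # large intra, small inter
```

The sensible labelling gives a small intra-cluster distance and a large inter-cluster distance, while the mixed labelling gives the opposite.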

## Hierarchical Clustering
* Here the distance is calculated between the points themselves rather than to a centroid. Hence it is called a connectivity-based clustering algorithm.

Hierarchical clustering can be achieved in two ways:
* Agglomerative <br>
 This is an iterative, bottom-up process: every data point starts as its own cluster, and the two closest clusters keep getting fused until we get one large cluster containing all the points.<br>
* Divisive <br>
This is a top-down process: we start with all the data in a single cluster, then iteratively divide it into sub-clusters according to a similarity criterion.
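
The bottom-up procedure can be sketched in a few lines. This is a deliberately naive illustration, not how scikit-learn implements it: the function name `naive_agglomerative` is made up, and it assumes the distance between two clusters is the smallest point-to-point distance (single linkage, one of the linkage choices covered further down).

```python
import numpy as np

def naive_agglomerative(X, n_clusters):
    """Merge the two closest clusters until n_clusters remain (single linkage)."""
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]  # bottom-up: every point starts alone
    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Assumed cluster distance: smallest point-to-point distance.
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # fuse the closest pair
        del clusters[b]
    return clusters

# Toy 1-D data: two tight pairs and one far-away point.
points = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])
print(naive_agglomerative(points, 3))
print(naive_agglomerative(points, 2))
```

Each returned list holds the row indices of the points in one cluster. Asking for fewer clusters simply lets the loop run for more merges.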

This iterative process leads to the formation of a tree-like structure called the dendrogram. The height at which two clusters are joined in the dendrogram is a measure of the dissimilarity between them.
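
As a rough sketch of how such a tree can be built and inspected, the snippet below uses SciPy's `scipy.cluster.hierarchy` module rather than the scikit-learn estimator used in the rest of these notes. The toy points and the choice of `method="single"` are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Toy 1-D data: two tight pairs and one far-away point.
points = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])

# Each row of Z records one merge: [cluster_i, cluster_j, distance, new_size].
Z = linkage(points, method="single")
print(Z)

# Column 2 holds the merge heights, i.e. the dissimilarity at each join.
print(Z[:, 2])

# no_plot=True returns the tree layout without drawing;
# drop it (with matplotlib available) to draw the dendrogram.
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right order
```

Cutting the tree at a given height yields a flat clustering: the lower the cut, the more clusters remain.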

During both types of hierarchical clustering, the distance between two sub-clusters needs to be computed. The different types of linkage describe different approaches to measuring the distance between two sub-clusters of data points.

The different types of linkage are:
* Single linkage
* Complete linkage
* Average linkage
* Ward linkage

In [10]:
import time
import warnings

import numpy as np
import matplotlib.pyplot as plt

from sklearn import cluster, datasets
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction.image import grid_to_graph
from itertools import cycle, islice

from tempfile import mkdtemp
from joblib import memory



#### Generate datasets
* We choose a size big enough to show how the algorithms scale, but not so big that running times become too long

In [85]:
n_samples = 1500

# Make a large circle containing a smaller circle in 2d.
noisy_circles = datasets.make_circles(n_samples=n_samples, factor=0.5, noise=0.05)
# Make two interleaving half circles.
noisy_moons = datasets.make_moons(n_samples=n_samples, noise=0.05)
# Generate isotropic Gaussian blobs for clustering.
blobs = datasets.make_blobs(n_samples=n_samples, random_state=8, centers=3)

no_structure = np.random.rand(n_samples, 2), None

# Anisotropically distributed data
random_state = 170
X, y = datasets.make_blobs(n_samples=n_samples, random_state=random_state)
transformation = [[0.6, -0.6], [-0.4, 0.8]]
X_aniso = np.dot(X, transformation)
aniso = (X_aniso, y)

# blobs with varied variances
varied = datasets.make_blobs(
    n_samples=n_samples, cluster_std=[1.0, 2.5, 0.5], random_state=random_state
)


data = {
    'Noisy Circles': noisy_circles,
    'Noisy Moons': noisy_moons,
    'Blobs': blobs,
    'No structure': no_structure,
    'Anisotropic': aniso,
    'Different Variance': varied
}

#### Clustering util

In [100]:
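def find_cluster(ax, data, n_cluster=2, linkage='ward', metric='euclidean', title='Agglomerative Clustering'):
    """Fit agglomerative clustering on one dataset and scatter-plot the labels on `ax`."""
    # Note: `metric` requires scikit-learn >= 1.2 (older versions call it `affinity`).
    agglomerative_cluster = cluster.AgglomerativeClustering(
        n_clusters=n_cluster, metric=metric, linkage=linkage)

    # Each dataset is an (X, y) tuple; the true labels are not needed for fitting.
    x, _ = data
    agglomerative_cluster.fit(x)

    # Map each cluster label to a distinct colour of the nipy_spectral colormap.
    labels = agglomerative_cluster.labels_
    colors = plt.cm.nipy_spectral(labels.astype(float) / len(np.unique(labels)))

    ax.scatter(x[:, 0], x[:, 1], c=colors)
    ax.set(title=f'{title}\nLinkage: {linkage}, Metric: {metric}')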

### Single linkage | uses the minimum of the distances between all observations of the two sets

<img src='./notes/single-linkage.PNG'>

In single linkage, the distance between two clusters is defined as the minimum distance between the members of the two clusters.

If you calculate the pairwise distance between every point in `cluster A` and every point in `cluster B`, the smallest distance is taken as the distance between the clusters.

This tends to produce very loose, elongated clusters, because two clusters are merged as soon as any single pair of their points is close. It also means that the `intra-cluster variance` can be very high.
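
A minimal sketch of this definition with plain NumPy follows. The function name `single_linkage_distance` and the toy points are made up for illustration, and Euclidean distance is assumed.

```python
import numpy as np

def single_linkage_distance(A, B):
    """Smallest Euclidean distance between any point of A and any point of B."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    # Broadcasting builds the full len(A) x len(B) table of pairwise distances.
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return pairwise.min()

# Toy data: A is a long chain of points, B sits just past its far end.
cluster_a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
cluster_b = np.array([[4.0, 0.0], [5.0, 0.0]])

print(single_linkage_distance(cluster_a, cluster_b))  # 1.0
```

The two clusters count as only 1.0 apart even though their far ends are 5.0 apart, which is why single linkage readily chains points into long, loose clusters.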

In [122]:
fig, ax = plt.subplots(nrows=1, ncols=6, figsize=(20,4), constrained_layout=True)
ax = ax.ravel()


for i, (frame, (k, v)) in enumerate(zip(ax, data.items())):
    n_cluster = 3 if k in ['Blobs', 'Different Variance', 'Anisotropic'] else 2
    find_cluster(ax=frame, data=v, title=k, linkage='single', n_cluster=n_cluster)

<img src='./plots/single-linkage.png'>

### Average linkage | uses the average of the distances of each observation of the two sets

<img src='./notes/average-linkage.PNG'>

In average linkage, the distance between two clusters is the average of all distances between members of the two clusters,<br> i.e. the distance from every point in one cluster to every point in the other cluster is calculated, and the average of all these distances is taken.
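
The same idea as a small NumPy sketch, again assuming Euclidean distance. The function name `average_linkage_distance` and the toy points are hypothetical.

```python
import numpy as np

def average_linkage_distance(A, B):
    """Mean Euclidean distance over all pairs (one point from A, one from B)."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    # Full len(A) x len(B) table of pairwise distances via broadcasting.
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return pairwise.mean()

cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[3.0, 0.0], [6.0, 0.0]])

# Pairwise distances are 3, 6, 2 and 5, so the average is 4.0.
print(average_linkage_distance(cluster_a, cluster_b))  # 4.0
```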

In [123]:
fig, ax = plt.subplots(nrows=1, ncols=6, figsize=(20,4), constrained_layout=True)
ax = ax.ravel()


for i, (frame, (k, v)) in enumerate(zip(ax, data.items())):
    n_cluster = 3 if k in ['Blobs', 'Different Variance', 'Anisotropic'] else 2
    find_cluster(ax=frame, data=v, title=k, linkage='average', n_cluster=n_cluster)

<img src='./plots/average-linkage.png'>

### Complete linkage | uses the maximum of the distances between all observations of the two sets

<img src='./notes/complete-linkage.PNG'>

In complete linkage, the distance between two clusters is defined as the maximum distance between the members of the two clusters. This tends to produce stable, close-knit clusters.
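
A small NumPy sketch of this definition, assuming Euclidean distance. The function name `complete_linkage_distance` and the toy points are made up for illustration.

```python
import numpy as np

def complete_linkage_distance(A, B):
    """Largest Euclidean distance between any point of A and any point of B."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    # Full len(A) x len(B) table of pairwise distances via broadcasting.
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return pairwise.max()

cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[3.0, 0.0], [6.0, 0.0]])

# Pairwise distances are 3, 6, 2 and 5, so the farthest pair decides: 6.0.
print(complete_linkage_distance(cluster_a, cluster_b))  # 6.0
```

Because the farthest pair decides, two clusters are only merged when all of their points are reasonably close, which keeps the merged cluster compact.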



In [124]:
fig, ax = plt.subplots(nrows=1, ncols=6, figsize=(20,4), constrained_layout=True)
ax = ax.ravel()


for i, (frame, (k, v)) in enumerate(zip(ax, data.items())):
    n_cluster = 3 if k in ['Blobs', 'Different Variance', 'Anisotropic'] else 2
    find_cluster(ax=frame, data=v, title=k, linkage='complete', n_cluster=n_cluster)

<img src='./plots/complete-linkage.png'>

### Ward linkage

* Ward’s method is also known as the minimum variance method or Ward’s Minimum Variance Clustering Method
* `'ward'` minimizes the variance of the clusters being merged.
* Ward linkage tends to create compact, even-sized clusters
* If linkage is `"ward"`, only `"euclidean"` is accepted as metric.
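
One way to read "minimizes the variance of the clusters being merged" is as the increase in the within-cluster sum of squared errors (SSE) that a merge would cause. The sketch below computes that increase directly and checks it against the well-known closed form based on cluster sizes and centroids. The helper names `sse` and `ward_merge_cost` and the toy clusters are assumptions for illustration, and library implementations may scale this quantity differently.

```python
import numpy as np

def sse(points):
    """Sum of squared distances of the points to their centroid."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean(axis=0)) ** 2).sum())

def ward_merge_cost(A, B):
    """Increase in total within-cluster SSE caused by merging A and B."""
    return sse(np.vstack([A, B])) - sse(A) - sse(B)

def ward_merge_cost_closed_form(A, B):
    """Same quantity: n_a * n_b / (n_a + n_b) * ||centroid_a - centroid_b||^2."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    gap = A.mean(axis=0) - B.mean(axis=0)
    return len(A) * len(B) / (len(A) + len(B)) * float(gap @ gap)

cluster_a = np.array([[0.0, 0.0], [2.0, 0.0]])
cluster_b = np.array([[10.0, 0.0], [12.0, 0.0]])
cluster_c = np.array([[3.0, 0.0], [5.0, 0.0]])

print(ward_merge_cost(cluster_a, cluster_b))  # far apart: expensive merge
print(ward_merge_cost(cluster_a, cluster_c))  # close together: cheap merge
```

At each step the cheapest merge is chosen, so here `cluster_a` would be fused with `cluster_c` before `cluster_b`.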

In [125]:
fig, ax = plt.subplots(nrows=1, ncols=6, figsize=(20,4), constrained_layout=True)
ax = ax.ravel()


for i, (frame, (k, v)) in enumerate(zip(ax, data.items())):
    n_cluster = 3 if k in ['Blobs', 'Different Variance', 'Anisotropic'] else 2
    find_cluster(ax=frame, data=v, title=k, linkage='ward', n_cluster=n_cluster)

<img src='./plots/ward-linkage.png'>