# Models based on distances: Clustering
Recall that clustering consists of grouping samples based on their features.
<img src="images/4_clustering.png">

# Hierarchical Clustering
Unsupervised learning: Clustering
Variable type: all

Hierarchical clustering works in an iterative way. The algorithm is:

    Input: samples
    Begin
        Each sample is a cluster
        Repeat until there is only one cluster
            Join the two nearest clusters
    End

<img src="images/4_hierarchical_clustering.png">
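
The steps above can be sketched in plain Python. This is a minimal, unoptimized illustration and not a reference implementation: it assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest pair of points), neither of which is fixed by the pseudocode. The names `single_linkage` and `agglomerative` are made up for this example.

```python
import math

def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters: their closest pair of points (assumed linkage)."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def agglomerative(samples):
    """Join the two nearest clusters until one remains; return the merge history."""
    clusters = [[tuple(s)] for s in samples]  # each sample starts as its own cluster
    history = []
    while len(clusters) > 1:
        # Search every pair of clusters for the smallest linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Delete j first (j > i) so that index i is still valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history

samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
for left, right, dist in agglomerative(samples):
    print(left, "+", right, "at distance", round(dist, 3))
```

The merge history (which clusters were joined, and at what distance) is exactly the information a dendrogram draws.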

The clustering process can be represented with a __dendrogram__: a binary tree in which the height of each junction is the distance at which the two clusters were joined. It can be used to choose the number of clusters: cutting (pruning) the tree at a given height yields the clusters. [See examples](https://www.google.com/search?q=hierarchical+clustering&safe=strict&rlz=1C1SQJL_enMX896MX896&sxsrf=ALeKk02baKxOqwwmU5bdgI_nNPd5XKnDYg:1615495969868&source=lnms&tbm=isch&sa=X&ved=2ahUKEwjV0627j6nvAhXihK0KHa18CoQQ_AUoAXoECAkQAw&biw=1517&bih=631).
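
As a rough illustration of pruning, the sketch below stores a dendrogram as nested tuples and cuts it at a distance threshold. The tree layout `(left, right, merge_distance)`, the sample indices, and the distances are all hypothetical, chosen only for this example. It also assumes that merge distances never decrease toward the root, which holds for common linkages such as single, complete, and average.

```python
def leaves(node):
    """Collect the sample indices found under a dendrogram node."""
    if isinstance(node, int):
        return [node]
    left, right, _height = node
    return leaves(left) + leaves(right)

def cut(node, threshold):
    """Prune the tree: a subtree joined at or below the threshold is one cluster."""
    if isinstance(node, int) or node[2] <= threshold:
        return [leaves(node)]
    left, right, _height = node
    return cut(left, threshold) + cut(right, threshold)

# Hypothetical dendrogram over 5 samples: (left, right, merge_distance).
tree = (((0, 1, 1.0), 2, 2.5), (3, 4, 1.5), 7.0)

print(cut(tree, 3.0))  # [[0, 1, 2], [3, 4]]
print(cut(tree, 2.0))  # [[0, 1], [2], [3, 4]]
print(cut(tree, 0.5))  # [[0], [1], [2], [3], [4]]
```

Lowering the threshold produces more, smaller clusters; raising it produces fewer, larger ones.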

__Disadvantage__: It is computationally expensive, because the distances between all pairs of clusters (initially, all pairs of samples) must be calculated. For $n$ samples, that is $n(n-1)/2$ distances at the start alone.
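
To get a feel for how quickly this cost grows, the number of distinct sample pairs can be counted directly. The sample sizes below are arbitrary, picked only for illustration.

```python
import math

for n in (10, 1_000, 100_000):
    pairs = math.comb(n, 2)  # n * (n - 1) / 2 distinct pairs of samples
    print(f"{n:>7} samples -> {pairs:>13,} pairwise distances")
```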

# k-Means
Unsupervised learning: Clustering
Variable type: all

k-means finds k groups in the unlabeled data. The algorithm is:


    Input: samples and k (the number of clusters)
    Begin
        Randomly select k prototypes
        Repeat until the prototypes don’t move
            Assign each sample to its nearest prototype
            Update each prototype as the centroid of the samples assigned to it
    End

<img src="images/4_kmeans.png">
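
The loop above can be written out as a short Python sketch. It is a minimal illustration under a few assumptions that the pseudocode does not state: Euclidean distance, initial prototypes drawn from the samples themselves, an iteration cap (`max_iter`), and keeping the old prototype when a cluster ends up empty. The function name `kmeans` and its parameters are chosen for this example.

```python
import math
import random

def kmeans(samples, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length tuples of floats."""
    rng = random.Random(seed)
    prototypes = rng.sample(samples, k)  # randomly select k samples as prototypes
    groups = []
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest prototype.
        groups = [[] for _ in range(k)]
        for s in samples:
            nearest = min(range(k), key=lambda c: math.dist(s, prototypes[c]))
            groups[nearest].append(s)
        # Update step: each prototype becomes the centroid of its group.
        updated = []
        for c, group in enumerate(groups):
            if group:
                updated.append(tuple(sum(col) / len(group) for col in zip(*group)))
            else:
                updated.append(prototypes[c])  # assumption: keep an empty cluster's prototype
        if updated == prototypes:  # the prototypes did not move: stop
            break
        prototypes = updated
    return prototypes, groups

samples = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
prototypes, groups = kmeans(samples, k=2)
for p, g in zip(prototypes, groups):
    print("prototype", p, "->", g)
```

The result depends on the random starting prototypes, so in practice the algorithm is often run several times with different seeds.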

__Disadvantage__: You need to set the number of groups __k__ in advance.

# Gaussian Mixture Models (GMM)
Unsupervised learning: Clustering
Variable type: all

The next figure shows the normal distribution, whose density is a Gaussian function.
<img src="images/4_gmm_gaussian.png">

The probability density function of a normal distribution centered at $\mu$ with variance $\sigma^2$ is:
$ P(x|\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} $
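
The univariate density can be evaluated directly from this formula. The sketch below is a literal transcription; the function name `normal_pdf` and its argument names are chosen for this example.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density at x of a normal distribution with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

print(normal_pdf(0.0, 0.0, 1.0))  # peak of the standard normal, 1 / sqrt(2 * pi)
print(normal_pdf(1.0, 0.0, 1.0))  # one standard deviation away from the mean
```

The density is symmetric around the mean and falls off faster when the variance is small.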

The probability density function of a multivariate normal distribution centered at $\mu \in \mathbb{R}^d$, with covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$, is:
$ P(X|\mu,\Sigma)= \frac{1}{\sqrt{ (2\pi)^d |\Sigma|}} e^{-\frac{1}{2} (X-\mu)^T \Sigma^{-1}(X-\mu)} $
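
To make the multivariate formula concrete, here is a sketch restricted to the case $d = 2$, where the determinant and the inverse of the covariance matrix can be written by hand. The function name `mvn_pdf_2d` and the nested-list representation of the matrix are assumptions made for this example; a general implementation would rely on a linear algebra library.

```python
import math

def mvn_pdf_2d(x, mu, cov):
    """Density at x of a 2-D normal; cov is a 2x2 covariance matrix as nested lists."""
    (a, b), (c, d) = cov
    det = a * d - b * c                                # determinant of the covariance
    inv = [[d / det, -b / det], [-c / det, a / det]]   # inverse of the covariance
    dx = [x[0] - mu[0], x[1] - mu[1]]                  # X - mu
    # Quadratic form (X - mu)^T * inverse covariance * (X - mu)
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** 2 * det)

identity = [[1.0, 0.0], [0.0, 1.0]]
print(mvn_pdf_2d((0.0, 0.0), (0.0, 0.0), identity))  # peak value, 1 / (2 * pi)
print(mvn_pdf_2d((1.0, 2.0), (0.0, 0.0), [[4.0, 0.0], [0.0, 9.0]]))
```

With a diagonal covariance matrix, the result equals the product of two univariate normal densities, one per feature.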

A GMM can be used to find k groups in unlabeled data. The main difference from k-means is that it models each group as a multivariate normal distribution, so the groups can be elongated and differently sized ellipsoids instead of spheres. Its algorithm is:

    Input: X (training set, nSamples x nFeatures) and k (number of Gaussians)
    Randomly initialize k prototypes (centroids and covariance matrices)
    Repeat until convergence
        Assign each sample to the Gaussian with the highest likelihood
        Calculate the parameters of the k Gaussians based on the samples assigned to them
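
The loop above can be sketched for one-dimensional data, where each covariance matrix reduces to a single variance. This is a simplified illustration under several assumptions not stated in the pseudocode: a starting variance of 1.0, a variance floor `min_var` so that a component holding one sample does not get zero variance, an iteration cap, and an optional `init_means` argument for reproducible runs. All function and parameter names are made up for this example. It follows the hard assignment described above; the commonly used EM algorithm for GMMs instead weights every sample by its probability under each Gaussian.

```python
import math
import random

def log_normal_pdf(x, mu, var):
    """Log-density of a 1-D normal with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def hard_gmm_1d(xs, k, init_means=None, seed=0, max_iter=100, min_var=1e-6):
    """Fit k 1-D Gaussians by alternating hard assignment and re-estimation."""
    rng = random.Random(seed)
    means = list(init_means) if init_means is not None else rng.sample(xs, k)
    variances = [1.0] * k  # assumed starting variance
    labels = None
    for _ in range(max_iter):
        # Assign each sample to the Gaussian under which it is most likely.
        new_labels = [
            max(range(k), key=lambda c: log_normal_pdf(x, means[c], variances[c]))
            for x in xs
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Re-estimate each Gaussian from the samples assigned to it.
        for c in range(k):
            members = [x for x, label in zip(xs, labels) if label == c]
            if not members:
                continue  # assumption: keep the parameters of an empty component
            means[c] = sum(members) / len(members)
            var = sum((x - means[c]) ** 2 for x in members) / len(members)
            variances[c] = max(var, min_var)
    return means, variances, labels

xs = [0.8, 1.0, 1.1, 1.2, 4.7, 5.0, 5.1, 5.3]
means, variances, labels = hard_gmm_1d(xs, k=2, init_means=[1.0, 5.0])
print(means, variances, labels)
```

With random initialization, this hard-assignment variant can get stuck with one Gaussian collapsed onto a single sample, so several restarts are advisable.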