# 1. Hierarchical Clustering
We are now going to talk about a different technique for building clusters, known as **Agglomerative Clustering**. If you have ever studied algorithms, you will recognize this as a greedy algorithm: we are going to be purposefully short-sighted and, at each step, make what appears to be the best decision at the time.

The basic idea looks something like this:
> * **Start with a set of points**
* **Merge the 2 closest points (or groups of points) into 1 group**
* **Repeat until only 1 group, containing all the points, remains**
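
The three steps above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration rather than a reference implementation: the names `cluster_distance` and `agglomerate` are made up for this example, and the distance between two groups is assumed to be the distance between their closest pair of points.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def cluster_distance(a, b):
    """Assumed choice: distance between the closest pair of points, one from each group."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points):
    """Greedily merge the two closest groups until a single group remains.

    Returns the merge history as (group_a, group_b, distance) tuples.
    """
    clusters = [(p,) for p in points]  # start: every point is its own group
    history = []
    while len(clusters) > 1:
        # Greedy step: find the pair of groups with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        history.append((a, b, cluster_distance(a, b)))
        # Replace the two groups with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(a + b)
    return history


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
history = agglomerate(points)
for a, b, d in history:
    print(a, "+", b, "at distance", round(d, 3))
```

With `n` points there are always `n - 1` merges. Recomputing every pairwise distance on each pass keeps the sketch short, but it is far slower than what a library would do.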

## 1.1 Agglomerative Clustering 

An awesome video to help explain this can be seen <a href="https://www.youtube.com/watch?v=XJ3194AmH40">here</a>.

In order to work with the outcome of agglomerative clustering, we can use what is referred to as a **dendrogram**. In a dendrogram, the height of the line that joins $A$ and $B$ is proportional to the distance between clusters $A$ and $B$ at the time they were merged.
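
To make the idea of merge heights concrete, here is a small sketch that builds a dendrogram as nested dictionaries and prints it as indented text. It assumes 1-D points and the closest-pair distance between clusters; `single_link`, `build_dendrogram`, and `render` are hypothetical helper names, not part of any library.

```python
from itertools import combinations


def single_link(a, b):
    """Smallest gap between a leaf of node a and a leaf of node b (1-D points)."""
    return min(abs(p - q) for p in a["leaves"] for q in b["leaves"])


def build_dendrogram(values):
    """Merge 1-D points bottom-up; each merge node stores the merge distance as its height."""
    nodes = [{"leaves": [v], "height": 0.0, "children": []} for v in values]
    while len(nodes) > 1:
        a, b = min(combinations(nodes, 2), key=lambda pair: single_link(*pair))
        merged = {
            "leaves": a["leaves"] + b["leaves"],
            "height": single_link(a, b),  # height of the line joining a and b
            "children": [a, b],
        }
        nodes = [n for n in nodes if n is not a and n is not b] + [merged]
    return nodes[0]


def render(node, depth=0):
    """Return the tree as indented text lines: merges with heights, leaves as values."""
    indent = "  " * depth
    if not node["children"]:
        return [f"{indent}{node['leaves'][0]}"]
    lines = [f"{indent}merge at height {node['height']}"]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


root = build_dendrogram([1.0, 2.0, 6.0, 9.0])
print("\n".join(render(root)))
```

Clusters that merge early (small distance) sit low in the tree, and the final merge at the root has the largest height.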

## 1.2 How to calculate distance?
So far, we have assumed that we are using **Euclidean Distance** as our measure of the closeness of the clusters. However, this does not need to be the case. There are other distance measures that can be used as well, seen <a href="https://numerics.mathdotnet.com/distance.html">here</a>. Note that a valid distance must satisfy certain properties (for example, symmetry and the triangle inequality), which are discussed more <a href="https://en.wikipedia.org/wiki/Metric_(mathematics)">here</a>.
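
As an illustration, here are three common distance functions in plain Python, together with a rough check of the properties a valid distance must have. The helper `looks_like_metric` is a made-up name, and it only spot-checks a finite sample of points, so passing it is evidence rather than proof. Squared Euclidean distance is included as an example that fails the triangle inequality.

```python
from itertools import product
from math import sqrt


def euclidean(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def squared_euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def looks_like_metric(d, points, tol=1e-9):
    """Spot-check the metric properties on a finite sample of points (not a proof)."""
    for p, q, r in product(points, repeat=3):
        if d(p, q) < 0:  # non-negativity
            return False
        if (d(p, q) <= tol) != (p == q):  # zero distance only between identical points
            return False
        if abs(d(p, q) - d(q, p)) > tol:  # symmetry
            return False
        if d(p, r) > d(p, q) + d(q, r) + tol:  # triangle inequality
            return False
    return True


sample = [(0, 0), (1, 0), (2, 0), (3, 4)]
for name, d in [("euclidean", euclidean), ("manhattan", manhattan),
                ("chebyshev", chebyshev), ("squared_euclidean", squared_euclidean)]:
    print(name, d((0, 0), (3, 4)), looks_like_metric(d, sample))
```

Squared Euclidean fails here because going from `(0, 0)` to `(2, 0)` directly costs 4, while going through `(1, 0)` costs only 1 + 1.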

## 1.3 How do we join clusters together?
Since we have not gone too in depth regarding hierarchical clustering, we still have not defined the distance between 2 clusters. We have several options:

> * **The mean distance between two clusters**
* **The distance between the 2 closest cluster points**
* **The distance between the 2 furthest cluster points**

<img src="images/linkages.jpg">

### 1.3.1 Single-Linkage
This is where we look at every pair of points, 1 from cluster 1 and 1 from cluster 2, and take the distance between the closest pair. This looks like:
> `d(clusterA, clusterB) = min distance between any 2 points, 1 from A, 1 from B`

The pseudocode may look something like this:

**Pseudocode**<br>
```
min_dist = Infinity
for p1 in cluster1:
    for p2 in cluster2:
        min_dist = min( d(p1, p2), min_dist)
```

A downside to this method is that we may get something called **the chaining effect**. Because only the single closest pair of points matters, we just keep choosing whatever is beside our current cluster. The result can be a long, stretched-out cluster in which the points at opposite ends are very far apart.
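
The chaining effect is easy to see on a row of evenly spaced points. The sketch below grows one cluster by hand, point by point (it does not run the full algorithm), and records both the single-linkage distance to the next point and the width of the cluster after the merge. The names `single_link` and `diameter` are made up for this example.

```python
def single_link(a, b):
    """Single-linkage distance between two clusters of 1-D points."""
    return min(abs(p - q) for p in a for q in b)


def diameter(cluster):
    """Distance between the two furthest points of a 1-D cluster."""
    return max(cluster) - min(cluster)


chain = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # neighbours are always 1 apart
cluster = [chain[0]]
steps = []
for point in chain[1:]:
    gap = single_link(cluster, [point])  # distance that single linkage sees
    cluster.append(point)
    steps.append((gap, diameter(cluster)))
    print(f"merged {point} at distance {gap}, cluster width is now {diameter(cluster)}")
```

Every merge looks equally cheap (distance 1), yet the finished cluster is 5 units wide.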

### 1.3.2 Complete-Linkage
The opposite of single linkage clustering is **complete linkage clustering**. This is where we look at every pair of points, 1 from cluster 1 and 1 from cluster 2, and take the distance between the furthest pair. This looks like:

> `d(clusterA, clusterB) = max distance between any 2 points, 1 from A, 1 from B`

The pseudocode may look something like this:

**Pseudocode**<br>
```
# distances are never negative, so 0 is a safe starting value
max_dist = 0
for p1 in cluster1:
    for p2 in cluster2:
        max_dist = max( d(p1, p2), max_dist)
```

### 1.3.3 Mean Distance
The third type of linkage, which is probably the most intuitive, is to just take the mean distance over all pairs of points, 1 from each cluster. This is often called average linkage.

**Pseudocode**<br>
```
dist = 0
for p1 in cluster1:
    for p2 in cluster2:
        dist += d(p1, p2)
# divide once, after both loops, by the number of pairs
dist = dist / (len(cluster1)*len(cluster2))
```
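
For comparison, here is a runnable Python sketch of all three options side by side. It assumes Euclidean distance for `d` (via `math.dist`), and the function names are chosen for this example only.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def single_linkage(cluster1, cluster2):
    """Distance between the 2 closest points, 1 from each cluster."""
    return min(dist(p1, p2) for p1 in cluster1 for p2 in cluster2)


def complete_linkage(cluster1, cluster2):
    """Distance between the 2 furthest points, 1 from each cluster."""
    return max(dist(p1, p2) for p1 in cluster1 for p2 in cluster2)


def mean_linkage(cluster1, cluster2):
    """Mean distance over all pairs of points, 1 from each cluster."""
    total = sum(dist(p1, p2) for p1 in cluster1 for p2 in cluster2)
    return total / (len(cluster1) * len(cluster2))


cluster1 = [(0.0, 0.0), (0.0, 3.0)]
cluster2 = [(4.0, 0.0), (4.0, 3.0)]

print("single:  ", single_linkage(cluster1, cluster2))    # 4.0
print("complete:", complete_linkage(cluster1, cluster2))  # 5.0
print("mean:    ", mean_linkage(cluster1, cluster2))      # 4.5
```

For any two clusters the mean distance lies between the single-linkage and complete-linkage distances, since it averages over the same set of pairs that the other two take the minimum and maximum of.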