### Hierarchical clustering

Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. It is often used when the user wants to understand the relationships between clusters at various levels of granularity. Unlike K-means clustering, hierarchical clustering does not require the number of clusters to be specified in advance.

**There are two main types of hierarchical clustering: agglomerative (bottom-up) and divisive (top-down).**

***Agglomerative (Bottom-Up) Hierarchical Clustering***
1. Initialization:
    * Start with each data point as its own cluster. If you have 𝑛 data points, you start with 𝑛 clusters.

2. Merge Clusters:
    * At each step, merge the two closest clusters. The distance between clusters can be defined in several ways (e.g., single linkage, complete linkage, average linkage, or Ward’s method).

3. Repeat:
    * Repeat the merging steps until only one cluster remains, forming a hierarchical tree (dendrogram).

<img src="agglomerative-clustering.png" width="450" height="350">
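
The three agglomerative steps can be sketched in plain Python. This is a deliberately naive illustration, not an optimized implementation: the function names `linkage_distance` and `agglomerative`, the sample `points`, and the choice of Euclidean distance are all assumptions made for this example, and only single, complete, and average linkage are covered.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def linkage_distance(a, b, method="single"):
    """Distance between clusters a and b (lists of points) under a linkage rule."""
    pairwise = [dist(p, q) for p in a for q in b]
    if method == "single":      # distance between the closest pair
        return min(pairwise)
    if method == "complete":    # distance between the farthest pair
        return max(pairwise)
    if method == "average":     # mean distance over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {method}")


def agglomerative(points, method="single"):
    """Naive bottom-up clustering; returns the merge history."""
    # Step 1: every point starts as its own cluster, keyed by an integer id.
    clusters = {i: [p] for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    # Step 3: repeat until a single cluster remains.
    while len(clusters) > 1:
        # Step 2: find and merge the two closest clusters.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(
                clusters[pair[0]], clusters[pair[1]], method
            ),
        )
        height = linkage_distance(clusters[a], clusters[b], method)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges


# Hypothetical sample data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (10.0, 10.0)]
for a, b, height in agglomerative(points, method="single"):
    print(f"merge {a} + {b} at distance {height:.2f}")
```

On this sample the two tight pairs merge first and the distant point joins last. The list of merges, with the distance at which each happened, is the information a dendrogram draws.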

***Divisive (Top-Down) Hierarchical Clustering***
1. Initialization:
    * Start with all data points in a single cluster.

2. Split Clusters:
    * At each step, choose one cluster and split it into two sub-clusters that are as dissimilar from each other as possible.

3. Repeat:
    * Repeat the splitting steps until each data point is in its own cluster.

<img src="Divisive.png" width="450" height="350">

<img src="Divisive-vs-Agglomerative-Clustering.png" width="450">
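
A rough sketch of the divisive procedure follows. The text above does not say which cluster to split or how to split it, so this example fills both gaps with simple heuristics that are assumptions of the sketch: it always splits the cluster with the largest diameter, and it splits by seeding with the two farthest-apart points and assigning every other point to the nearer seed. The names `diameter`, `split`, and `divisive` are hypothetical, and the code assumes all points are distinct.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def diameter(cluster):
    """Largest pairwise distance inside a cluster (0 for a singleton)."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def split(cluster):
    """Split one cluster in two, seeded by its two farthest-apart points."""
    seed_a, seed_b = max(combinations(cluster, 2), key=lambda pq: dist(*pq))
    left = [p for p in cluster if dist(p, seed_a) <= dist(p, seed_b)]
    right = [p for p in cluster if dist(p, seed_a) > dist(p, seed_b)]
    return left, right


def divisive(points):
    """Top-down clustering; returns the sequence of splits.

    Assumes the points are distinct, so every split yields two non-empty parts.
    """
    clusters = [list(points)]  # Step 1: one cluster holding everything
    splits = []
    # Step 3: repeat until every cluster is a single point.
    while any(len(c) > 1 for c in clusters):
        # Step 2: split the widest remaining cluster.
        widest = max(clusters, key=diameter)
        clusters.remove(widest)
        left, right = split(widest)
        clusters += [left, right]
        splits.append((widest, left, right))
    return splits


# Hypothetical sample data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (10.0, 10.0)]
for parent, left, right in divisive(points):
    print(f"{parent} -> {left} | {right}")
```

On this sample the distant point is separated first, then the two pairs are pulled apart, and finally each pair is split into single points.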

### Dendrogram

A dendrogram is a tree-like diagram that records the sequence of merges or splits. The height at which two branches join indicates the distance or dissimilarity between the clusters being merged. Cutting the dendrogram at a specific height yields a flat clustering: a lower cut gives more clusters and a higher cut gives fewer.

<img src="dendogram.png">
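
Cutting a dendrogram can be illustrated with a few lines of Python. The sketch below assumes the tree is given as a list of merges `(id_a, id_b, height)` in which ids `0..n-1` are the original points and the k-th merge creates cluster id `n + k`, and that merge heights never decrease. The function name `cut_dendrogram` and the merge values are made up for this example.

```python
def cut_dendrogram(merges, n_points, height):
    """Flat cluster labels obtained by cutting the tree at `height`.

    merges: list of (id_a, id_b, merge_height) with non-decreasing heights.
    Ids 0..n_points-1 are the original points; merge k creates id n_points + k.
    """
    parent = list(range(n_points + len(merges)))
    for k, (a, b, merge_height) in enumerate(merges):
        if merge_height <= height:  # keep only merges at or below the cut
            parent[a] = parent[b] = n_points + k

    def find(x):
        # Walk up to the top-most cluster that survives the cut.
        while parent[x] != x:
            x = parent[x]
        return x

    # Number the surviving clusters 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n_points)]


# Hypothetical merge history for five points.
merges = [(0, 1, 1.0), (2, 3, 1.5), (5, 6, 5.0), (4, 7, 9.86)]
print(cut_dendrogram(merges, n_points=5, height=2.0))  # [0, 0, 1, 1, 2]
print(cut_dendrogram(merges, n_points=5, height=6.0))  # [0, 0, 0, 0, 1]
```

The same tree gives three clusters when cut at height 2.0 and two clusters when cut at 6.0, with no re-clustering needed. In SciPy, `scipy.cluster.hierarchy.fcluster` with `criterion="distance"` performs this kind of cut on a linkage matrix.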

## Difference between K-means and hierarchical clustering


| Feature                     | K-means Clustering                                    | Hierarchical Clustering                               |
|-----------------------------|-------------------------------------------------------|------------------------------------------------------|
| **Initialization**          | Requires the number of clusters \( K \) to be specified beforehand. | Does not require the number of clusters to be specified. |
| **Algorithm Type**          | Partition-based                                      | Hierarchical (can be agglomerative or divisive)      |
| **Cluster Formation**       | Iteratively updates centroids and assigns points to nearest centroid. | Builds a tree of clusters by successive merging (agglomerative) or splitting (divisive). |
| **Distance Metric**         | Typically uses Euclidean distance, but others can be used. | Multiple linkage criteria (single, complete, average, Ward's method) with various distance metrics. |
| **Time Complexity**         | Generally \( O(n \cdot k \cdot t) \), where \( n \) is the number of points, \( k \) is the number of clusters, and \( t \) is the number of iterations. | \( O(n^3) \) for a naive agglomerative implementation; \( O(n^2 \log n) \) with a priority queue, and \( O(n^2) \) for optimized single or complete linkage (SLINK, CLINK). More computationally intensive than K-means. |
| **Scalability**             | Scalable to large datasets.                          | Not as scalable to large datasets due to higher computational complexity. |
| **Data Size**               | Suitable for large datasets.                         | More suitable for smaller datasets (typically less than a few thousand data points). |
| **Handling of Data Types**  | Primarily suitable for numerical data. Extensions like K-prototypes exist for mixed data types. | Can handle both numerical and categorical data, depending on the distance metric used. |
| **Output**                  | Flat partition of clusters.                          | Dendrogram representing nested clusters.             |
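| **Cluster Shape**           | Assumes roughly spherical clusters of similar size.   | Depends on the linkage: single linkage can follow elongated or irregular shapes, while complete linkage and Ward's method favor compact clusters. |
| **Stability**               | Sensitive to initial centroid placement; multiple runs with different initializations recommended. | Deterministic for a given dataset, distance metric, and linkage (apart from tie-breaking), since there is no random initialization. |
| **Handling of Noise**       | Sensitive to noise and outliers, which pull centroids toward them. | Also sensitive to noise and outliers, in a way that depends on the linkage: single linkage can chain clusters together through noisy points, and a merge cannot be undone later. |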
| **Hierarchical Structure**  | Does not provide hierarchical relationships between clusters. | Provides a hierarchy of clusters, useful for exploring data at different levels of granularity. |
| **Memory Usage**            | Typically lower memory usage.                        | Higher memory usage due to the storage of the distance matrix. |
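
The memory row can be made concrete with a little arithmetic. A full pairwise distance matrix for \( n \) points has \( n(n-1)/2 \) distinct entries; the sketch below assumes each is stored as an 8-byte float, and the helper name `condensed_matrix_bytes` is made up for this example.

```python
def condensed_matrix_bytes(n, bytes_per_value=8):
    """Bytes needed to store the n*(n-1)/2 distinct pairwise distances."""
    return n * (n - 1) // 2 * bytes_per_value


for n in (1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: {condensed_matrix_bytes(n) / 1e9:.3f} GB")
```

Under these assumptions the matrix grows from about 4 MB at 1,000 points to about 0.4 GB at 10,000 points and about 40 GB at 100,000 points, which is why hierarchical clustering is usually applied to smaller datasets.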
