## 1


Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters. Unlike partition-based methods such as K-means, it does not require specifying the number of clusters in advance. It produces a dendrogram, a tree-like diagram that records the sequence of merges or splits.

Differences from Other Clustering Techniques
K-means Clustering:

Cluster Structure: K-means produces a flat partition of clusters without any hierarchy, whereas hierarchical clustering produces a dendrogram.
Number of Clusters: K-means requires the number of clusters to be specified beforehand. Hierarchical clustering does not require this; clusters can be formed by cutting the dendrogram at the desired level.
Distance Metric: K-means uses Euclidean distance by default, while hierarchical clustering can use various distance metrics and linkage criteria.
Centroids vs. Linkages: K-means assigns points to the nearest centroid, whereas hierarchical clustering uses linkages to determine the closeness of clusters.


## 2

Types of Hierarchical Clustering
Agglomerative (Bottom-Up) Clustering:

Process:
Start with each data point as its own cluster.
Iteratively merge the closest pairs of clusters.
Continue until all points are in a single cluster or a desired number of clusters is reached.
Characteristics:
The process is represented as a dendrogram, which can be cut at a desired level to obtain a specific number of clusters.
Common linkage criteria include single linkage (minimum distance), complete linkage (maximum distance), average linkage, and Ward's method (minimizes variance).
Divisive (Top-Down) Clustering:

Process:
Start with all data points in a single cluster.
Iteratively split clusters into smaller clusters.
Continue until each data point is in its own cluster or a desired number of clusters is reached.
Characteristics:
Less commonly used due to higher computational cost compared to agglomerative clustering.
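
The agglomerative process described above can be sketched in a few lines of plain Python. This is a naive illustration that assumes 2-D points, Euclidean distance, and single linkage; the names `agglomerative` and `single_linkage` are hypothetical helpers, not part of any library, and the pairwise search is far too slow for real datasets.

```python
import math
from itertools import combinations


def single_linkage(a, b):
    """Smallest pairwise Euclidean distance between two clusters of points."""
    return min(math.dist(x, y) for x in a for y in b)


def agglomerative(points, n_clusters=1, linkage=single_linkage):
    """Naive bottom-up clustering; returns (clusters, merge_heights)."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    heights = []
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        heights.append(linkage(clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # combinations() guarantees i < j, so delete j first to keep i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters, heights


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, heights = agglomerative(points, n_clusters=3)
print(clusters)
print(heights)
```

Stopping at `n_clusters=1` instead records every merge height, which is the information a dendrogram displays.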

## 3

1. Distance Between Clusters (Linkage Criteria)
The distance between two clusters $C_i$ and $C_j$ is typically based on the distances between their member data points. Several common linkage criteria or distance metrics are used to compute this distance:

2. Common Distance Metrics (Linkage Criteria)
Single Linkage (Minimum Linkage):

Definition: Measures the shortest distance between clusters based on the closest pair of points (one from each cluster).
Formula: $d(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} \operatorname{dist}(x, y)$
Characteristics: Tends to form elongated clusters and is sensitive to outliers and noise.
Complete Linkage (Maximum Linkage):

Definition: Measures the longest distance between clusters based on the farthest pair of points (one from each cluster).
Formula: $d(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} \operatorname{dist}(x, y)$
Characteristics: Produces more compact clusters and is less sensitive to outliers compared to single linkage.
Average Linkage:

Definition: Measures the average distance between all pairs of points (one from each cluster).
Formula: $d(C_i, C_j) = \frac{1}{|C_i| \cdot |C_j|} \sum_{x \in C_i,\, y \in C_j} \operatorname{dist}(x, y)$
Characteristics: Balances between single and complete linkage, providing a compromise between cluster compactness and sensitivity to noise.
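
The three linkage formulas above translate almost directly into code. The sketch below assumes clusters are lists of coordinate tuples and that `dist` is Euclidean distance; the function names are illustrative only.

```python
import math


def single_linkage(ci, cj):
    """Minimum distance over all cross-cluster pairs."""
    return min(math.dist(x, y) for x in ci for y in cj)


def complete_linkage(ci, cj):
    """Maximum distance over all cross-cluster pairs."""
    return max(math.dist(x, y) for x in ci for y in cj)


def average_linkage(ci, cj):
    """Mean distance over all |ci| * |cj| cross-cluster pairs."""
    total = sum(math.dist(x, y) for x in ci for y in cj)
    return total / (len(ci) * len(cj))


ci = [(0.0, 0.0), (0.0, 1.0)]
cj = [(3.0, 0.0), (4.0, 0.0)]
print(single_linkage(ci, cj))
print(complete_linkage(ci, cj))
print(average_linkage(ci, cj))
```

For any pair of clusters the average linkage value lies between the single and complete linkage values, which is one way to see why it behaves as a compromise.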

## 4

Determining the optimal number of clusters in hierarchical clustering can be approached in several ways, leveraging the dendrogram and other metrics. Here are some common methods used for determining the optimal number of clusters:

1. Inspecting the Dendrogram:
Method: Visual examination of the dendrogram.
Explanation: The dendrogram shows how clusters are merged as you move up from the leaves to the root. A common heuristic is to find the tallest vertical segment that no horizontal merge line crosses, then cut the dendrogram with a horizontal line through that segment. The number of vertical lines the cut crosses is the number of clusters; a tall gap means the clusters below the cut are well separated from one another.
2. Using the Elbow Method:
Method: Analyzing the rate of change of within-cluster dissimilarities.
Explanation: Calculate the dissimilarity measure (e.g., distance) at each merge step and plot it against the number of clusters. Look for the point where the curve bends sharply (forming an "elbow"): beyond it, adding more clusters provides diminishing returns in terms of reducing within-cluster dissimilarity.
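
The dendrogram inspection heuristic can be approximated numerically from the merge heights alone. The sketch below is a hypothetical helper, not a library function, and it assumes the merge heights are non-decreasing (as they are for single, complete, and average linkage) and that there are at least two merges.

```python
def clusters_from_largest_gap(merge_heights, n_points):
    """Pick the cluster count by cutting inside the largest gap between merges."""
    heights = sorted(merge_heights)
    if len(heights) < 2:
        raise ValueError("need at least two merge heights to compare gaps")
    # Gap k separates merge k from merge k + 1 (0-indexed).
    gaps = [upper - lower for lower, upper in zip(heights, heights[1:])]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    merges_below_cut = k + 1
    # Every merge reduces the cluster count by one.
    return n_points - merges_below_cut


# Five points: three cheap merges, then one expensive merge.
print(clusters_from_largest_gap([1.0, 1.2, 1.5, 6.0], n_points=5))  # 2
```

This only automates the visual rule; it is worth checking the suggested count against the dendrogram itself.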

## 5

Dendrograms in hierarchical clustering are tree-like structures that visually represent the merging (agglomerative) or splitting (divisive) of clusters at each step of the clustering process. They are fundamental to hierarchical clustering as they provide a detailed and intuitive way to understand the relationships and hierarchy among clusters and data points. Here’s how dendrograms are structured and why they are useful in analyzing clustering results:

Structure of Dendrograms:
Vertical Axis: Represents the distance or dissimilarity between clusters or data points. The height of each fusion (or split) in the dendrogram indicates how dissimilar (or similar) the clusters being merged (or split) are.

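Horizontal Axis: Represents individual data points, which appear as the leaves of the tree. Each data point starts as its own cluster, and clusters are progressively merged (agglomerative clustering) or split (divisive clustering) as you move along the vertical axis, not from left to right. The left-to-right order of the leaves is chosen for readability, and horizontal spacing does not represent distance.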

Branches and Nodes: Each node joins two branches and marks a merge (or split). The height of the node on the vertical axis is the distance or dissimilarity at which that merge or split occurs.
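
Because each node in a dendrogram carries a height, cutting the tree at a chosen height yields a flat set of clusters. The sketch below assumes a hypothetical encoding of the dendrogram as a list of `(a, b, height)` merges, where ids `0..n-1` are the original points and merge number `s` creates cluster id `n + s`. It also assumes heights are non-decreasing.

```python
def cut_dendrogram(merges, n_points, height):
    """Return one cluster label per point, applying merges at or below height."""
    parent = list(range(n_points + len(merges)))

    def find(x):
        # Follow parent links to the current root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for step, (a, b, h) in enumerate(merges):
        new_id = n_points + step
        if h <= height:
            parent[find(a)] = new_id
            parent[find(b)] = new_id

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    label_of = {}
    return [label_of.setdefault(find(i), len(label_of)) for i in range(n_points)]


# Five points; ids 5, 6, 7 are the clusters created by the first three merges.
merges = [(0, 1, 1.0), (2, 3, 1.0), (5, 6, 6.4), (7, 4, 7.1)]
print(cut_dendrogram(merges, n_points=5, height=2.0))  # [0, 0, 1, 1, 2]
print(cut_dendrogram(merges, n_points=5, height=7.0))  # [0, 0, 0, 0, 1]
```

Raising the cut height can only merge clusters, never split them, which mirrors reading the dendrogram from the leaves toward the root.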

## 6

Yes, hierarchical clustering can be used for both numerical and categorical data. However, the choice of distance metrics or similarity measures differs depending on the type of data being clustered:

Numerical Data:
For numerical data, distance metrics typically measure the distance between data points in a continuous space. Common distance metrics used include:

Euclidean Distance:

Definition: Measures the straight-line distance between two points in a Euclidean space.
Formula: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Characteristics: Suitable for data where the magnitude and scale of differences between numerical attributes matter.
Manhattan Distance (City Block Distance):

Definition: Measures the sum of absolute differences between corresponding attributes.
Formula: $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
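Characteristics: Less dominated by a single large coordinate difference than Euclidean distance, because differences are not squared. It does not by itself correct for attributes measured in different units or scales, so such attributes should be standardized first.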
Cosine Similarity:

Definition: Measures the cosine of the angle between two vectors in a multi-dimensional space.
Formula: $\text{similarity}(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}$
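Characteristics: Suitable for high-dimensional data where the direction of the vectors matters more than their magnitude (for example, text represented as term counts). Because it is a similarity rather than a distance, it is usually converted to a distance such as $1 - \text{similarity}(x, y)$ before clustering.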
Categorical Data:
For categorical data, different distance metrics that account for the discrete nature of attributes are used:

Hamming Distance:

Definition: Measures the number of positions at which corresponding elements are different between two vectors.
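
The four measures defined in this section are short enough to write out directly. The sketch below assumes equal-length sequences and, for cosine similarity, non-zero vectors; the function names are illustrative.

```python
import math


def euclidean(x, y):
    """Straight-line distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (both must be non-zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def hamming(x, y):
    """Number of positions at which the two sequences differ."""
    return sum(a != b for a, b in zip(x, y))


print(euclidean((0, 0), (3, 4)))          # 5.0
print(manhattan((0, 0), (3, 4)))          # 7
print(cosine_similarity((1, 0), (0, 1)))  # 0.0
print(hamming(("red", "small", "round"), ("red", "large", "round")))  # 1
```

Note that `cosine_similarity((1, 0), (2, 0))` is 1 even though the vectors have different lengths, which shows that the measure ignores magnitude.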

## 7

Perform Hierarchical Clustering:

Apply hierarchical clustering to your dataset using an appropriate distance metric (e.g., Euclidean distance for numerical data, Hamming distance for categorical data).
Choose a linkage method (e.g., single linkage, complete linkage) that suits your data and clustering objectives.
Construct the Dendrogram:

Visualize the resulting dendrogram to understand how clusters are formed and how data points are grouped together.
The dendrogram provides insights into the hierarchical structure of clusters and helps in identifying outliers based on their distance from other clusters.
Identify Outlier Branches or Singletons:

Look for branches in the dendrogram that are isolated or have a large height (distance) compared to others.
Points that do not merge with any other cluster until a high distance in the dendrogram can indicate potential outliers.
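
The last step can be automated by recording the height at which each point first joins another cluster. The sketch below assumes a hypothetical `(a, b, height)` merge list in which ids `0..n-1` are original points and merge number `s` creates cluster id `n + s`, and it assumes the list is complete so every point appears exactly once. The threshold is an arbitrary illustrative value that would need to be chosen from the data.

```python
def first_merge_heights(merges, n_points):
    """Height at which each original point first joins another cluster."""
    first = {}
    for a, b, h in merges:
        for node in (a, b):
            if node < n_points:  # an original point, not an earlier merged cluster
                first[node] = h
    return [first[i] for i in range(n_points)]


def flag_late_joiners(merges, n_points, threshold):
    """Indices of points that stay singletons until above the threshold."""
    heights = first_merge_heights(merges, n_points)
    return [i for i, h in enumerate(heights) if h > threshold]


# Points 0-3 pair up early; point 4 only joins at the final, tallest merge.
merges = [(0, 1, 1.0), (2, 3, 1.0), (5, 6, 6.4), (7, 4, 7.1)]
print(first_merge_heights(merges, n_points=5))               # [1.0, 1.0, 1.0, 1.0, 7.1]
print(flag_late_joiners(merges, n_points=5, threshold=3.0))  # [4]
```

A flagged point is only a candidate outlier; it still needs to be checked against the dendrogram and the original data.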