# Assignment

## Q1. What is hierarchical clustering, and how is it different from other clustering techniques?
Hierarchical clustering is a method of clustering that builds a hierarchy of clusters. It either begins with each data point as its own cluster and iteratively merges them (agglomerative) or starts with all data points in a single cluster and recursively splits them (divisive). Unlike other clustering techniques like K-means, hierarchical clustering doesn't require the number of clusters to be predefined.

Key Differences:

- No need for a predefined number of clusters: Unlike K-means, hierarchical clustering doesn't require the number of clusters K to be specified in advance.
- Hierarchical structure: It provides a dendrogram, which is a tree-like diagram showing how clusters are merged or split at each step.
- Flexibility in distance metrics: Hierarchical clustering allows various distance metrics to determine how clusters are merged.
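
As a rough illustration of the first point, the sketch below builds the hierarchy once and then reads off flat clusterings for several values of K from the same tree. It assumes SciPy's `scipy.cluster.hierarchy` module, which the answer above does not mention; the toy data and the choice of average linkage are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (made up for illustration): three separated pairs of 2-D points.
X = np.array([
    [0.0, 0.0], [0.0, 1.0],
    [10.0, 0.0], [10.0, 1.0],
    [25.0, 0.0], [25.0, 1.0],
])

# Build the full hierarchy once; no cluster count is supplied here.
Z = linkage(X, method="average", metric="euclidean")

# Extract flat clusterings for several values of K from the same hierarchy.
cuts = {k: fcluster(Z, t=k, criterion="maxclust") for k in (2, 3)}
for k, labels in cuts.items():
    print(k, labels)
```

With K-means, by contrast, each value of K would need a separate run of the algorithm.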

## Q2. What are the two main types of hierarchical clustering algorithms? Describe each in brief.
Agglomerative Hierarchical Clustering:

- Approach: A "bottom-up" approach where each data point starts in its own cluster. At each step, the two closest clusters are merged, based on a chosen distance metric and linkage criterion, until all points belong to one large cluster.
- Steps:
  1. Treat each data point as an individual cluster.
  2. Compute the distance between all pairs of clusters and merge the closest two.
  3. Repeat until all data points are merged into a single cluster.

Divisive Hierarchical Clustering:

- Approach: A "top-down" approach that starts with all data points in one large cluster and splits clusters recursively until each data point is its own cluster.
- Steps:
  1. Treat all data points as one cluster.
  2. Recursively split the clusters based on a distance metric until each data point forms its own cluster.
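
The agglomerative steps can be sketched in a few lines of plain Python. This is a deliberately naive illustration (it recomputes every pairwise distance at each step), not how a library would implement it; the function name `agglomerative`, its `linkage` parameter, and the sample points are all hypothetical. Only the bottom-up variant is shown.

```python
from itertools import combinations
from math import dist

def agglomerative(points, linkage=min):
    """Naive bottom-up clustering; returns the merge history.

    `linkage` aggregates the point-to-point distances between two clusters:
    `min` gives single linkage, `max` gives complete linkage.
    """
    # Step 1: every point starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters under the chosen linkage.
        def cluster_distance(pair):
            a, b = pair
            return linkage(dist(points[i], points[j]) for i in a for j in b)

        a, b = min(combinations(clusters, 2), key=cluster_distance)
        merges.append((sorted(a), sorted(b), cluster_distance((a, b))))
        # Step 3: replace the two clusters by their union and repeat.
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return merges

# Made-up sample: two nearby pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (20.0, 0.0)]
history = agglomerative(points)
for left, right, height in history:
    print(left, right, round(height, 3))
```

Each entry of `history` records which two clusters were merged and at what distance, which is the information a dendrogram draws.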

## Q3. How do you determine the distance between two clusters in hierarchical clustering, and what are the common distance metrics used?
The distance between two clusters in hierarchical clustering can be determined using different linkage criteria:

Single Linkage:

- The distance between two clusters is the minimum distance between any point in one cluster and any point in the other.
- Effect: Can result in elongated, chain-like clusters.

Complete Linkage:

- The distance between two clusters is the maximum distance between any point in one cluster and any point in the other.
- Effect: Tends to produce more compact, spherical clusters.

Average Linkage:

- The distance between two clusters is the average distance over all pairs of points, taking one point from each cluster.
- Effect: Provides a balance between single and complete linkage.

Centroid Linkage:

- The distance between two clusters is the distance between their centroids (mean points of the clusters).
- Effect: Works well when clusters have similar shapes and sizes.
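
To make the four criteria concrete, the following sketch computes each of them for one pair of small clusters using Euclidean distance. The helper names `centroid` and `linkage_distances` and the sample clusters are hypothetical, chosen only for this illustration.

```python
from math import dist
from statistics import mean

def centroid(cluster):
    """Coordinate-wise mean of a cluster of points."""
    return tuple(mean(coords) for coords in zip(*cluster))

def linkage_distances(a, b):
    """Distance between clusters a and b under four linkage criteria."""
    # Every pair takes one point from each cluster.
    pairwise = [dist(p, q) for p in a for q in b]
    return {
        "single": min(pairwise),
        "complete": max(pairwise),
        "average": mean(pairwise),
        "centroid": dist(centroid(a), centroid(b)),
    }

# Made-up clusters for illustration.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (7.0, 0.0)]
for name, value in linkage_distances(a, b).items():
    print(name, round(value, 3))
```

The single-linkage value is never larger than the average, which is never larger than the complete-linkage value, since they are the minimum, mean, and maximum of the same set of pairwise distances.
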

Common distance metrics include:

- Euclidean Distance: Most commonly used, measures straight-line distance between points.
- Manhattan Distance: Measures the sum of the absolute differences between coordinates.
- Cosine Distance: One minus the cosine of the angle between two vectors, so it compares direction rather than magnitude (often used in text mining).
- Mahalanobis Distance: Accounts for correlations between features by scaling differences with the inverse of the data's covariance matrix.
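
A short sketch of these four metrics, assuming SciPy's `scipy.spatial.distance` module is available (the answer does not name a library). The vectors and the small sample used to estimate the covariance matrix are made up for the example.

```python
import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

d_euclid = distance.euclidean(u, v)     # straight-line distance
d_manhattan = distance.cityblock(u, v)  # sum of absolute coordinate differences
d_cosine = distance.cosine(u, v)        # 1 - cosine similarity
print(d_euclid, d_manhattan, d_cosine)

# Mahalanobis distance needs the inverse covariance matrix of the data.
# This four-point sample is made up purely for illustration.
sample = np.array([[2.0, 2.0], [4.0, 5.0], [6.0, 5.0], [8.0, 9.0]])
VI = np.linalg.inv(np.cov(sample, rowvar=False))
d_mahal = distance.mahalanobis(sample[0], sample[3], VI)
print(d_mahal)
```

When the inverse covariance matrix is the identity, the Mahalanobis distance reduces to the Euclidean distance.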

## Q4. How do you determine the optimal number of clusters in hierarchical clustering, and what are some common methods used for this purpose?
To determine the optimal number of clusters in hierarchical clustering, the following methods are commonly used:

Dendrogram Cutting:

- Visualize the dendrogram and "cut" it with a horizontal line inside the tallest vertical gap between successive merges (the largest jump in merge distance, indicating the greatest dissimilarity between the clusters being joined). The number of clusters equals the number of vertical branches the cut crosses.

Elbow Method:

- Plot the within-cluster sum of squares (or another evaluation metric) as a function of the number of clusters. The "elbow" point, where the rate of improvement drops off sharply, indicates the optimal number of clusters.

Silhouette Score:

- Measures how similar a data point is to its own cluster compared to other clusters, on a scale from -1 to 1. A higher average silhouette score indicates better-defined clusters, so the number of clusters with the highest average score is preferred.

Gap Statistic:

- Compares the within-cluster dispersion of the hierarchical clustering solution to the dispersion expected under a reference distribution with no cluster structure (for example, points drawn uniformly over the data's range). A large gap between the two suggests a good number of clusters; a common rule picks the smallest number of clusters whose gap is at least the next value's gap minus one standard error.
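
Two of these methods, dendrogram cutting and the silhouette score, can be sketched as follows. The example assumes SciPy and NumPy, uses Ward linkage as an arbitrary choice, and runs on made-up data with three obvious groups. The `mean_silhouette` helper is a simplified, hand-written version for illustration (it scores singleton clusters as 0 and needs at least two clusters), not a library function.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

# Made-up data: three tight groups of five points around distant centres.
offsets = np.array([[0.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [0.0, 0.5], [0.0, -0.5]])
centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + offsets for c in centres])

Z = linkage(X, method="ward")
heights = Z[:, 2]  # merge distances, in merge order

# Dendrogram cutting: cut inside the largest jump between successive merges.
j = int(np.argmax(np.diff(heights)))
k_gap = len(X) - (j + 1)  # clusters remaining after merge j

def mean_silhouette(D, labels):
    """Mean silhouette over all points, from a square distance matrix D."""
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = D[i, same].mean()  # mean distance to the point's own cluster
        b = min(D[i, labels == other].mean()  # nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Silhouette: score each candidate K and keep the best.
D = squareform(pdist(X))
scores = {k: mean_silhouette(D, fcluster(Z, t=k, criterion="maxclust"))
          for k in (2, 3, 4)}
k_sil = max(scores, key=scores.get)
print(k_gap, k_sil)
```

On clean data like this the two methods agree; on real data they can disagree, in which case the dendrogram itself is worth inspecting.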

## Q5. What are dendrograms in hierarchical clustering, and how are they useful in analyzing the results?
A dendrogram is a tree-like diagram that illustrates the arrangement of clusters produced by hierarchical clustering. The leaves represent the individual data points, each junction between branches represents a merge (or split), and the vertical axis represents the distance or dissimilarity at which clusters are merged.

Usefulness:

- Visualizing the clustering process: The dendrogram shows the sequence of cluster merges or splits, helping to visualize how clusters form.
- Choosing the number of clusters: By cutting the dendrogram at different levels, you can explore various clustering solutions.
- Interpreting the similarity between clusters: The height at which two branches join reflects the dissimilarity between the merged clusters. Lower joins indicate more similar clusters.
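
The information a dendrogram draws can also be read directly from the merge table. The sketch below assumes SciPy's `linkage` and `dendrogram` functions; the one-dimensional data, the single-linkage choice, and the leaf labels are made up. Passing `no_plot=True` skips drawing and only returns the layout.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[0.0], [1.0], [5.0], [6.5], [20.0]])  # made-up 1-D data
Z = linkage(X, method="single")

# Each row of Z is one merge: (cluster id, cluster id, merge height, new size).
# Ids 0..n-1 are the original points; id n+i is the cluster formed by row i.
for a, b, height, size in Z:
    print(int(a), int(b), height, int(size))

# Compute the dendrogram layout without drawing it.
tree = dendrogram(Z, no_plot=True, labels=["a", "b", "c", "d", "e"])
print(tree["ivl"])  # leaf labels in left-to-right order
```

The third column of `Z` holds the heights plotted on the vertical axis, so a large value in the last rows corresponds to a long vertical branch near the top of the tree.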

## Q6. Can hierarchical clustering be used for both numerical and categorical data? If yes, how are the distance metrics different for each type of data?
Yes, hierarchical clustering can be applied to both numerical and categorical data, but the distance metrics differ:

Numerical Data:

- Common metrics include Euclidean distance, Manhattan distance, and Mahalanobis distance.

Categorical Data:

- Metrics such as Hamming distance (the number, or in normalized form the proportion, of attributes on which two records differ) or Jaccard distance (one minus the Jaccard similarity, for binary or set-valued data) are used.

Mixed Data:

- When dealing with both numerical and categorical data, a combined measure like Gower's distance can be used. It computes a dissimilarity per feature (range-scaled absolute difference for numerical features, simple mismatch for categorical ones) and averages them.
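
The three measures can be sketched in plain Python as below. These are simplified, hand-written versions for illustration: the function names, the `ranges` parameter, and the sample records are hypothetical, and the Gower sketch ignores missing values and feature weights.

```python
def hamming(u, v):
    """Proportion of attributes on which two records disagree."""
    return sum(a != b for a, b in zip(u, v)) / len(u)

def jaccard_distance(u, v):
    """1 - |intersection| / |union| of the attributes set to 1 (binary data)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1 - both / either

def gower(u, v, ranges):
    """Simplified Gower distance for one pair of mixed records.

    `ranges[i]` is the observed range of numeric feature i, or None if
    feature i is categorical.
    """
    parts = []
    for a, b, r in zip(u, v, ranges):
        if r is None:
            parts.append(0.0 if a == b else 1.0)  # categorical: simple mismatch
        else:
            parts.append(abs(a - b) / r)          # numeric: range-scaled difference
    return sum(parts) / len(parts)

# Made-up records for illustration.
print(hamming(("red", "small", "round"), ("red", "large", "round")))
print(jaccard_distance((1, 1, 0, 0), (1, 0, 1, 0)))
print(gower((30, "red", 1.5), (40, "blue", 1.5), ranges=(50, None, 2.0)))
```

A matrix of such pairwise distances can then be clustered with a linkage that works on arbitrary dissimilarities, such as single, complete, or average linkage; centroid-based linkages assume Euclidean geometry.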

## Q7. How can you use hierarchical clustering to identify outliers or anomalies in your data?
Outliers or anomalies in hierarchical clustering can be identified by analyzing the structure of the dendrogram:

Large distances between data points and clusters: In the dendrogram, data points that merge with clusters at a much larger distance than other points can be considered outliers. These points join the rest of the tree only near the top of the dendrogram.

Small singleton clusters: Points that remain in their own clusters until the very end of the clustering process may be outliers.

By examining how individual points behave during the clustering process (for example, joining clusters only at late, high-distance stages), hierarchical clustering can help detect anomalies.
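
One way to turn this idea into code is to record the height at which each original point first joins another cluster and flag points that join unusually late. The sketch below assumes SciPy's `linkage` with single linkage; the data are made up, and the threshold of three times the median joining height is a hypothetical rule of thumb, not an established standard.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Made-up data: two small groups and one point far from both.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],
              [30.0, 30.0]])
n = len(X)
Z = linkage(X, method="single")

# Height at which each original point first joins another cluster.
join_height = np.empty(n)
for a, b, height, _ in Z:
    for idx in (int(a), int(b)):
        if idx < n:  # ids below n are original points, not merged clusters
            join_height[idx] = height

# Hypothetical rule of thumb: flag points that join far above the typical height.
threshold = 3 * np.median(join_height)
outliers = np.flatnonzero(join_height > threshold)
print(join_height, outliers)
```

Points flagged this way are only candidates; whether they are genuine anomalies depends on the data and the chosen linkage.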