# Questions

Q1. What is hierarchical clustering, and how is it different from other clustering techniques?

Q2. What are the two main types of hierarchical clustering algorithms? Describe each in brief.

Q3. How do you determine the distance between two clusters in hierarchical clustering, and what are the
common distance metrics used?

Q4. How do you determine the optimal number of clusters in hierarchical clustering, and what are some
common methods used for this purpose?

Q5. What are dendrograms in hierarchical clustering, and how are they useful in analyzing the results?

Q6. Can hierarchical clustering be used for both numerical and categorical data? If yes, how are the
distance metrics different for each type of data?

Q7. How can you use hierarchical clustering to identify outliers or anomalies in your data?


## Solutions

#Sol1...

Hierarchical clustering is a clustering technique that builds a hierarchy of clusters in one of two ways:

- **Agglomerative (Bottom-Up)**: starts with each data point as its own cluster and repeatedly merges the two most similar clusters until all points are in one cluster.
- **Divisive (Top-Down)**: starts with all points in one cluster and repeatedly splits clusters into smaller ones.

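Key Differences from other clustering techniques:
- No number of clusters k has to be fixed in advance (unlike K-means); the tree can be cut at any level afterwards.
- It produces a dendrogram, which shows the clustering at every level of granularity.
- It is slower and more memory-hungry on large datasets than K-means or DBSCAN, because standard agglomerative algorithms work from the O(n²) matrix of pairwise distances.
- With single linkage it can follow elongated or non-convex shapes, whereas K-means favors compact, roughly spherical clusters (Ward linkage shares that preference).

It is well suited to exploring nested structure in small to medium datasets, but it scales poorly to large ones.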
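The following minimal sketch illustrates the "no predefined k" and "complex shapes" points. It assumes SciPy, which the text does not name. The script builds the full agglomerative hierarchy on two synthetic concentric rings without choosing a number of clusters, then cuts the tree afterwards at two different heights. Single linkage is used here because it can chain along the non-convex rings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic, illustrative data: two concentric rings (a non-convex layout)
theta = np.linspace(0, 2 * np.pi, 50, endpoint=False)
inner = np.c_[np.cos(theta), np.sin(theta)]        # 50 points on a ring of radius 1
outer = 5 * np.c_[np.cos(theta), np.sin(theta)]    # 50 points on a ring of radius 5
X = np.vstack([inner, outer])

# Build the whole hierarchy; no number of clusters is supplied at this point
Z = linkage(X, method="single", metric="euclidean")

# Decide afterwards: a cut below the ring-to-ring gap (4.0) separates the rings...
labels = fcluster(Z, t=2.0, criterion="distance")
# ...while a cut above it puts everything into one cluster
labels_coarse = fcluster(Z, t=5.0, criterion="distance")

print(np.unique(labels), np.unique(labels_coarse))
```

The same linkage matrix `Z` serves every cut, which is what makes choosing k after the fact cheap.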

#Sol2...

The two main types of hierarchical clustering algorithms are:

### 1. Agglomerative Hierarchical Clustering (Bottom-Up)
- **Process**: This is the most common type. It starts with each data point as its own cluster. The algorithm then repeatedly merges the two closest clusters until all points are merged into a single cluster or a stopping criterion is met. "Closest" is defined by a point-to-point distance (e.g., Euclidean distance) combined with a linkage criterion (e.g., single, complete, average, or Ward).
- **Output**: The result is a dendrogram. It records the order and height of every merge and can be cut at different levels to obtain different numbers of clusters.
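The merging loop described above can be sketched directly in NumPy. This is a deliberately naive O(n³) illustration using single linkage, not how production libraries implement it. The function name `naive_single_linkage` is hypothetical. The function returns the merge history, which is the information a dendrogram draws.

```python
import numpy as np

def naive_single_linkage(X):
    """Hypothetical helper: naive agglomerative clustering with single linkage.

    Returns a list of merges as (members_a, members_b, merge_distance).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = [[i] for i in range(len(X))]  # every point starts as its own cluster
    history = []
    while len(clusters) > 1:
        best = None
        # Scan all cluster pairs; single linkage = smallest point-to-point distance
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        # Merge cluster b into cluster a (b > a, so deleting b keeps index a valid)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return history

X = [[0.0], [1.0], [5.0], [6.5]]
for step in naive_single_linkage(X):
    print(step)
```

Swapping `.min()` for `.max()` or `.mean()` would turn this into complete or average linkage respectively.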

### 2. Divisive Hierarchical Clustering (Top-Down)
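- **Process**: This method begins with all data points in a single cluster and recursively splits a cluster into two. One approach peels off the points most dissimilar to the rest (as in DIANA); another runs 2-means on the cluster (bisecting K-means). The number of ways to split a cluster grows exponentially with its size, so practical divisive algorithms rely on such heuristics. Splitting continues until each data point is in its own cluster or a stopping criterion is reached.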

- **Output**: Similar to agglomerative clustering, divisive clustering can also produce a dendrogram showing the splitting process.
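A top-down procedure can be sketched as follows, assuming scikit-learn's `KMeans` (not mentioned in the text). The splitting rule here is bisecting 2-means, which is only one divisive strategy. Other divisive algorithms such as DIANA choose splits differently. The helper name `bisecting_divisive` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def bisecting_divisive(X, n_clusters, random_state=0):
    """Hypothetical helper: repeatedly split the cluster with the largest
    within-cluster sum of squares (SSE) into two using 2-means."""
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]  # start: every point in one cluster
    while len(clusters) < n_clusters:
        sse = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        i = int(np.argmax(sse))
        if len(clusters[i]) < 2 or sse[i] == 0:
            break  # nothing left that can be meaningfully split
        target = clusters.pop(i)
        # Split the chosen cluster in two with 2-means
        km = KMeans(n_clusters=2, n_init=10, random_state=random_state)
        halves = km.fit_predict(X[target])
        clusters.extend([target[halves == 0], target[halves == 1]])
    return clusters

# Illustrative data: three tight groups of three points each
X = np.array([
    [0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
    [5.0, 5.0], [5.2, 5.0], [5.0, 5.2],
    [10.0, 0.0], [10.2, 0.0], [10.0, 0.2],
])
for idx in bisecting_divisive(X, n_clusters=3):
    print(sorted(idx.tolist()))
```

Recording each split instead of only the final clusters would give the top-down counterpart of a dendrogram.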

Both methods create a hierarchical structure but approach the clustering process from opposite directions.


#Sol3...

In hierarchical clustering, the distance between two clusters is given by a linkage criterion, which is built on top of a point-to-point distance measure:

### **Linkage Criteria (Cluster-to-Cluster Distance)**:
1. **Single Linkage**: Distance between the closest points in two clusters.
2. **Complete Linkage**: Distance between the furthest points in two clusters.
3. **Average Linkage**: Average distance between all pairs of points in the two clusters.
4. **Centroid Linkage**: Distance between the centroids (mean vectors) of the clusters.
5. **Ward’s Method**: Merges the pair of clusters whose union gives the smallest increase in total within-cluster sum of squares (variance).
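To make the five criteria concrete, the sketch below computes each one for two hypothetical two-point clusters. It assumes NumPy and SciPy, which the text does not mention. The Ward value is computed as the increase in total within-cluster sum of squares caused by the merge. That increase equals n_a·n_b/(n_a+n_b) times the squared distance between the centroids. SciPy's `linkage` may report Ward merge heights on a different scale, so this number is not meant to match those heights directly.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical clusters in 2-D
A = np.array([[0.0, 0.0], [0.0, 2.0]])
B = np.array([[3.0, 0.0], [5.0, 0.0]])

D = cdist(A, B)  # all point-to-point Euclidean distances between A and B

single = D.min()      # closest pair
complete = D.max()    # furthest pair
average = D.mean()    # mean over all pairs
centroid = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))  # distance between means

def sse(P):
    """Within-cluster sum of squared distances to the cluster mean."""
    return ((P - P.mean(axis=0)) ** 2).sum()

# Ward: how much the total within-cluster sum of squares grows if A and B merge
ward_increase = sse(np.vstack([A, B])) - sse(A) - sse(B)

print(single, complete, average, centroid, ward_increase)
```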

### **Point-to-Point Distance Measures**:
- **Euclidean Distance**: Straight-line distance.
- **Manhattan Distance**: Sum of absolute differences.
- **Cosine Distance**: One minus the cosine of the angle between two vectors (cosine similarity); it ignores vector magnitude.
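The three point-level measures can be compared with SciPy's `pdist` (an assumed library choice). Metric names are `"euclidean"`, `"cityblock"` for Manhattan, and `"cosine"`. Note how the cosine distance between `[1, 0]` and `[2, 0]` is zero: they point in the same direction, even though they are different points.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])

# squareform turns the condensed distance vector into a full n x n matrix
euclid = squareform(pdist(X, metric="euclidean"))
manhattan = squareform(pdist(X, metric="cityblock"))
cosine_dist = squareform(pdist(X, metric="cosine"))  # 1 - cosine similarity

print(euclid.round(3), manhattan, cosine_dist.round(3), sep="\n\n")
```

Any of these metric names can also be passed as `metric=` to `scipy.cluster.hierarchy.linkage` together with a compatible linkage method.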

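The linkage criterion decides which clusters are merged, while the point-level measure defines how close two observations are. Ward's and centroid linkage are defined through cluster means and squared Euclidean distances, so they are normally paired with Euclidean distance.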
    

#Sol4...

Determining the optimal number of clusters in hierarchical clustering can be approached using several methods:

### **Common Methods**:

Dendrogram Analysis:
Visual Inspection: Examine the dendrogram and cut it with a horizontal line where the vertical gap between successive merge heights is largest. The number of branches the line crosses is the number of clusters.
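The "largest vertical gap" rule can be automated as a heuristic. The sketch below assumes SciPy and a linkage method whose merge heights never decrease (true for single, complete, average, and Ward, but not for centroid or median). It finds the biggest jump between successive merge heights and cuts halfway inside it. The helper name `k_from_largest_gap` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def k_from_largest_gap(Z):
    """Hypothetical helper: suggest (k, cut_height) from the largest gap
    between successive merge heights of a linkage matrix Z."""
    heights = Z[:, 2]             # merge distances, assumed non-decreasing
    gaps = np.diff(heights)       # gap between merge i and merge i + 1
    i = int(np.argmax(gaps))      # cut between merge i and merge i + 1
    n = Z.shape[0] + 1            # number of original observations
    k = n - (i + 1)               # after i + 1 merges, n - (i + 1) clusters remain
    return k, (heights[i] + heights[i + 1]) / 2

# Illustrative 1-D data with three obvious groups
X = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]).reshape(-1, 1)
Z = linkage(X, method="average")

k, cut = k_from_largest_gap(Z)
labels = fcluster(Z, t=cut, criterion="distance")
print(k, round(cut, 3), labels)
```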
    
Silhouette Score:
Cut the tree at several values of k and calculate the mean silhouette score for each. Scores range from -1 to 1, and a higher score indicates tighter, better-separated clusters, so the k with the highest score is chosen.
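A sketch of the silhouette approach follows, assuming SciPy for the hierarchy and scikit-learn's `silhouette_score`. Neither library is named in the text. The data is synthetic: three Gaussian blobs with a fixed seed. The silhouette is only defined for 2 ≤ k ≤ n - 1, so the search starts at k = 2.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

Z = linkage(X, method="ward")

scores = {}
for k in range(2, 7):
    # "maxclust" cuts the tree so that at most k flat clusters are formed
    labels = fcluster(Z, t=k, criterion="maxclust")
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()}, best_k)
```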
                        
Gap Statistic:

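Compare the within-cluster dispersion W_k of the actual clustering with its expected value on reference datasets that have no cluster structure (for example, points drawn uniformly over the data's bounding box): Gap(k) = E*[log W_k] - log W_k. Tibshirani et al. choose the smallest k such that Gap(k) ≥ Gap(k+1) - s_(k+1), where s_(k+1) reflects the simulation error of the reference datasets. A simpler variant takes the k with the largest gap.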
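Below is a compact sketch of the gap statistic applied to cuts of a Ward hierarchy. It rests on several assumptions: SciPy is used, the reference datasets are uniform over the data's bounding box, and the count of 20 reference sets was chosen for illustration. `W_k` is the pooled within-cluster sum of squares. The helpers `within_ss`, `log_wk` and `gap_statistic` are hypothetical names.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def within_ss(X, labels):
    """Pooled within-cluster sum of squared Euclidean distances to cluster means."""
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))

def log_wk(X, k):
    """log W_k for the k-cluster cut of a Ward hierarchy built on X."""
    labels = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")
    return np.log(within_ss(X, labels))

def gap_statistic(X, k_max=6, n_refs=20, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Reference datasets with no cluster structure: uniform over the bounding box
    refs = [rng.uniform(lo, hi, size=X.shape) for _ in range(n_refs)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        ref_logs = np.array([log_wk(R, k) for R in refs])
        gaps.append(ref_logs.mean() - log_wk(X, k))
        s.append(ref_logs.std() * np.sqrt(1 + 1 / n_refs))
    # Rule: smallest k with Gap(k) >= Gap(k+1) - s_(k+1)
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps

# Synthetic data: three blobs of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [0.0, 4.0], [8.0, 2.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

k_best, gaps = gap_statistic(X)
print(k_best, np.round(gaps, 3))
```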
                        
Elbow Method:

While less common in hierarchical clustering, you can plot the total within-cluster sum of squares against the number of clusters. The "elbow" is the point after which adding more clusters yields only small further decreases.
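The numbers behind an elbow plot can be computed by cutting one hierarchy at k = 1, 2, ... and summing squared distances to each cluster mean. This sketch assumes SciPy and synthetic three-blob data. Because the cuts of a single tree are nested, the totals can only stay the same or shrink as k grows. The elbow shows up as the last large drop.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic data: three blobs of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

Z = linkage(X, method="ward")

def within_ss(X, labels):
    """Total within-cluster sum of squared distances to each cluster mean."""
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))

# Within-cluster sum of squares for each cut of the same tree
wss = {k: within_ss(X, fcluster(Z, t=k, criterion="maxclust")) for k in range(1, 7)}
for k, w in wss.items():
    print(k, round(w, 1))  # plot these pairs to look for the elbow
```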
                                                                                                                                    

#Sol5...

Dendrograms are tree-like diagrams used in hierarchical clustering to represent the sequence of merges (or splits) performed during clustering. Each leaf represents an individual data point, and each horizontal link joins two clusters. The height of that link is the distance (under the chosen linkage) at which the two clusters were merged.
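In SciPy (assumed here; the text names no library), the dendrogram is drawn from a linkage matrix. Each row of that matrix is one merge: the two cluster indices, the merge height, and the size of the new cluster. Indices below n refer to original points. Index n + r refers to the cluster created in row r.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four illustrative 1-D points
X = np.array([[0.0], [2.0], [3.0], [9.0]])
Z = linkage(X, method="single")

# Columns: [cluster_i, cluster_j, merge height, number of points in the new cluster]
for r, (i, j, h, size) in enumerate(Z):
    print(f"row {r}: merge {int(i)} and {int(j)} at height {h} -> cluster {len(X) + r} (size {int(size)})")
```

Reading the rows top to bottom traces the dendrogram from the leaves up to the root.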

### Usefulness of Dendrograms:
1. **Visual Representation**: They provide a clear visual representation of the clustering structure, making it easier to understand how clusters are
    formed.
2. **Determining the Number of Clusters**: Cutting the tree with a horizontal line at a chosen height gives a flat clustering, and the number of vertical branches the line crosses is the number of clusters. Long vertical branches suggest a natural place to cut.
3. **Identifying Relationships**: They help identify relationships and similarities between data points and clusters.
4. **Data Exploration**: Dendrograms can facilitate data exploration by highlighting potential subgroups within the data.
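The sketch below assumes SciPy and connects the threshold idea to code. `dendrogram(..., no_plot=True)` computes the layout without needing matplotlib, and `color_threshold` is the height below which subtrees are colored as separate clusters. The same height passed to `fcluster` yields the corresponding flat labels. Set `no_plot=False` with matplotlib installed to draw the figure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Two illustrative groups of three points
X = np.array([
    [0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
    [4.0, 4.0], [4.2, 4.1], [3.9, 4.3],
])
Z = linkage(X, method="average")

threshold = 2.0  # cut height chosen by inspecting the merge heights
info = dendrogram(Z, no_plot=True, color_threshold=threshold)
print(info["leaves"])  # leaf order from left to right; each subtree is contiguous

labels = fcluster(Z, t=threshold, criterion="distance")
print(labels)
```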

Overall, dendrograms are essential for interpreting and validating the results of hierarchical clustering.


#Sol6...

Yes, hierarchical clustering can be used for both numerical and categorical data, but the distance metrics used to measure 
similarity or dissimilarity differ between the two types.

### Distance Metrics

1. **Numerical Data**:
   - **Euclidean Distance**: Commonly used for continuous data, measuring the straight-line distance between points in a multidimensional space.
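   - **Manhattan Distance**: Sums the absolute differences between coordinates. Because differences are not squared, a single large deviation dominates less than with Euclidean distance, which makes it somewhat more robust to outliers. It is also often preferred for high-dimensional data.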

2. **Categorical Data**:
   - **Hamming Distance**: The proportion of attributes on which two records take different categories; suitable for nominal and binary attributes.
   - **Jaccard Distance**: One minus the Jaccard similarity of two sets (the size of the intersection divided by the size of the union). It is commonly used for binary presence/absence or set-valued attributes, where shared absences should not count as similarity.
   - **Gower’s Distance**: A flexible metric for mixed data types. Each numerical attribute contributes its absolute difference divided by the attribute's range. Each categorical attribute contributes 0 for a match and 1 for a mismatch. The contributions are then averaged.
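The three categorical measures can be tried on small hypothetical records. This sketch assumes SciPy: `pdist` provides `"hamming"` and `"jaccard"`. Gower's distance is written by hand under simplifying assumptions: equal weights and no missing values. The helper `gower_matrix` is hypothetical. Average linkage is used on the precomputed matrix, because Ward's and centroid linkage assume Euclidean distances.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical records: (colour, size) as integer category codes
codes = np.array([
    [0, 0],  # red, small
    [0, 1],  # red, large
    [1, 1],  # blue, large
    [1, 1],  # blue, large
])
hamming = squareform(pdist(codes, metric="hamming"))  # fraction of differing attributes

# Hypothetical presence/absence tags; Jaccard ignores shared absences
tags = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]], dtype=bool)
jaccard = squareform(pdist(tags, metric="jaccard"))

def gower_matrix(num, cat):
    """Hypothetical helper: Gower distance with equal weights and no missing values."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns contribute 0 instead of dividing by 0
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / ranges  # range-scaled differences
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)   # 0 = match, 1 = mismatch
    return np.concatenate([d_num, d_cat], axis=2).mean(axis=2)

ages = np.array([[25], [30], [60], [58]])  # numerical attribute for the same 4 records
G = gower_matrix(ages, codes)

# Cluster on the precomputed (condensed) Gower distances with average linkage
Z = linkage(squareform(G, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(hamming, jaccard.round(3), G.round(3), labels, sep="\n\n")
```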

### Summary
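Hierarchical clustering works with both numerical and categorical data; what changes is the distance metric, which should match the data type. For categorical or mixed data, a precomputed distance matrix is typically combined with single, complete, or average linkage, because Ward's and centroid linkage assume Euclidean distances.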

#Sol7...

Hierarchical clustering can flag outliers as follows:

1. **Clustering**: Build the hierarchy. Single linkage is a natural choice because an isolated point joins the tree only at a large merge height.
2. **Dendrogram Analysis**: Look for leaves or small branches that attach to the rest of the tree only near the top; these are potential outliers.
3. **Setting Thresholds**: Cut the dendrogram at a height that separates the main clusters; points far from everything end up in clusters of their own.
4. **Cluster Size Examination**: Flag clusters whose size falls below a chosen minimum (often singletons) as outlier candidates.

This process helps pinpoint data points that deviate significantly from the norm.
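The threshold-and-size steps can be sketched as follows, assuming SciPy. The data is two synthetic blobs plus one injected far-away point. The helper `flag_small_clusters` and its `threshold` and `min_size` values are hypothetical and would need tuning on real data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def flag_small_clusters(X, threshold, min_size, method="single"):
    """Hypothetical helper: cut the tree at `threshold` and flag points whose
    flat cluster has fewer than `min_size` members."""
    labels = fcluster(linkage(X, method=method), t=threshold, criterion="distance")
    sizes = np.bincount(labels)          # labels start at 1; index 0 stays unused
    return sizes[labels] < min_size, labels

# Synthetic data: two blobs of 30 points plus one injected outlier (index 60)
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(30, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.3, size=(30, 2)),
    [[12.0, -4.0]],
])

outliers, labels = flag_small_clusters(X, threshold=2.0, min_size=3)
print(np.flatnonzero(outliers))
```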
