# question 1

In [1]:
# Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. It is distinct from other clustering techniques like K-means clustering in several ways, including its approach to forming clusters, its flexibility in determining the number of clusters, and its ability to produce a nested sequence of clusters.

# Types of Hierarchical Clustering
# There are two main types of hierarchical clustering:

# Agglomerative Hierarchical Clustering (AHC)

# Bottom-up approach: Starts with each data point as a single cluster and iteratively merges the closest pair of clusters until only one cluster remains or a stopping criterion is met.
# Steps:
# 1. Compute the distance (dissimilarity) matrix between all data points.
# 2. Merge the two closest clusters.
# 3. Update the distance matrix to reflect the distance between the new cluster and the remaining clusters.
# 4. Repeat steps 2 and 3 until the desired number of clusters is reached or all points are in a single cluster.
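
# The agglomerative steps above can be sketched with SciPy. This is a minimal illustration, assuming six made-up 2-D points and average linkage (both are choices made here, not part of the original answer): `linkage` performs the merges and `fcluster` applies the "desired number of clusters" stopping criterion.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six 2-D points forming two visually separate groups (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Each row of Z records one merge: the two cluster ids that were merged,
# the distance at which they merged, and the size of the new cluster.
Z = linkage(X, method="average")

# Stop at two clusters instead of merging everything into one.
labels = fcluster(Z, t=2, criterion="maxclust")

print(Z.shape)  # one row per merge, so n - 1 rows
print(labels)
```
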
# Divisive Hierarchical Clustering (DHC)

# Top-down approach: Starts with all data points in one cluster and iteratively splits the most heterogeneous cluster until each data point is its own cluster or a stopping criterion is met.
# Steps:
# 1. Start with a single cluster containing all data points.
# 2. Split the cluster into two based on the greatest dissimilarity.
# 3. Repeat step 2 for each resulting cluster until the desired number of clusters is reached or each point is in its own cluster.
# Comparison with Other Clustering Techniques
# K-means Clustering:

# Approach: K-means is a partitional clustering method that partitions the data into K clusters by minimizing the sum of squared distances between data points and their corresponding cluster centroids.
# Number of Clusters: Requires the number of clusters K to be specified in advance.
# Cluster Shape: Assumes spherical clusters of similar size.
# Initial Centroids: Results can vary based on the initial choice of centroids.
# DBSCAN (Density-Based Spatial Clustering of Applications with Noise):

# Approach: Forms clusters based on the density of data points, identifying regions of high density separated by regions of low density.
# Number of Clusters: Does not require the number of clusters to be specified in advance.
# Cluster Shape: Can find clusters of arbitrary shape.
# Handling Noise: Explicitly identifies noise points (outliers).
# Key Characteristics of Hierarchical Clustering
# Dendrogram: Hierarchical clustering results are often visualized using a dendrogram, a tree-like diagram that shows the order in which clusters are merged or split. The height of the branches represents the distance or dissimilarity at which clusters are merged or split.
# Flexibility: Does not require the number of clusters to be specified in advance. The number of clusters can be chosen by cutting the dendrogram at the desired level.
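# Cluster Shape: The shapes it can recover depend on the linkage criterion: single linkage can follow elongated or irregular clusters, while complete linkage and Ward's method favor compact ones.
# Computational Complexity: Generally more computationally intensive than K-means, especially for large datasets: the pairwise distance matrix alone takes O(n^2) memory, and a naive agglomerative implementation takes O(n^3) time.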
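
# To make the comparison concrete, here is a hedged sketch that runs the three techniques on the same data with scikit-learn. The synthetic blobs and the parameter values (`eps`, `min_samples`, `n_clusters`) are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Synthetic data (an assumption for illustration): three compact blobs.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [8, 8], [-8, 8]],
                  cluster_std=0.5, random_state=0)

# K-means: K must be given up front; results depend on the initial centroids.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# DBSCAN: no cluster count, but eps and min_samples must be chosen;
# points labeled -1 are treated as noise.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# Agglomerative clustering: merges bottom-up until 3 clusters remain.
agglo_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

for name, labels in [("KMeans", kmeans_labels),
                     ("DBSCAN", dbscan_labels),
                     ("Agglomerative", agglo_labels)]:
    n_found = len(set(labels) - {-1})
    n_noise = int(np.sum(labels == -1))
    print(f"{name}: {n_found} clusters, {n_noise} noise points")
```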


# question 2

In [2]:
# The two main types of hierarchical clustering algorithms are Agglomerative Hierarchical Clustering (AHC) and Divisive Hierarchical Clustering (DHC). Here’s a brief description of each:

# 1. Agglomerative Hierarchical Clustering (AHC)
# Description: Agglomerative hierarchical clustering is a bottom-up approach. It starts with each data point as a single cluster and then successively merges the closest pairs of clusters until all points are in a single cluster or the desired number of clusters is achieved.

# Steps:

# 1. Initialization: Begin with n clusters (each data point is its own cluster).
# 2. Compute Distances: Calculate the distance (or dissimilarity) matrix for all pairs of clusters.
# 3. Merge Closest Clusters: Identify the two clusters that are closest to each other and merge them to form a new cluster.
# 4. Update Distances: Recalculate the distance matrix to reflect the distances between the new cluster and the remaining clusters.
# 5. Repeat: Repeat steps 3 and 4 until only one cluster remains or the desired number of clusters is reached.
# Key Characteristics:

# Linkage Criteria: Various methods to measure the distance between clusters, such as single linkage (minimum distance), complete linkage (maximum distance), average linkage (mean distance), and Ward's method (minimizes the total within-cluster variance).
# Dendrogram: The results are often visualized using a dendrogram, which shows the order in which clusters are merged and the distances at which merges occur.
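
# The agglomerative procedure can also be written out by hand. The function below is a naive sketch, not an efficient implementation: the name `agglomerative` and its parameters are hypothetical, and instead of updating a cluster-level distance matrix it recomputes cluster distances from the point-level matrix on every pass.

```python
import numpy as np

def agglomerative(X, n_clusters, method="single"):
    """Naive agglomerative clustering; returns a list of point-index lists."""
    # Initialization: every point starts as its own cluster.
    clusters = [[i] for i in range(len(X))]
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Linkage criterion: how point distances are reduced to a cluster distance.
    agg = {"single": np.min, "complete": np.max, "average": np.mean}[method]

    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        # Find the closest pair of clusters under the chosen linkage.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = agg(D[np.ix_(clusters[i], clusters[j])])
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i. Cluster distances are recomputed
        # from D on the next pass, which stands in for the "update" step.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Illustrative 1-D data: two tight groups and one isolated point.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [20.0]])
result = agglomerative(X, n_clusters=3, method="single")
print(result)  # [[0, 1, 2], [3, 4], [5]]
```
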
# 2. Divisive Hierarchical Clustering (DHC)
# Description: Divisive hierarchical clustering is a top-down approach. It starts with all data points in a single cluster and then recursively splits the most heterogeneous clusters until each data point is in its own cluster or the desired number of clusters is achieved.

# Steps:

# 1. Initialization: Begin with a single cluster containing all data points.
# 2. Select Cluster: Identify the cluster to split based on some criterion of heterogeneity (e.g., the cluster with the highest variance).
# 3. Compute Distances: Calculate the distance matrix for the data points within the cluster to be split.
# 4. Form Subclusters: Split the selected cluster into two smaller clusters based on the chosen distance measure and splitting criterion.
# 5. Repeat: Repeat steps 2 to 4 until each data point is its own cluster or the desired number of clusters is reached.
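
# A rough sketch of the divisive procedure follows. The description above leaves the heterogeneity and splitting criteria open, so this example makes two assumptions: the cluster with the largest within-cluster sum of squares is selected, and it is split with 2-means (a bisecting strategy). The function name `divisive` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters, random_state=0):
    """Top-down sketch: repeatedly bisect the most spread-out cluster."""
    # Initialization: one cluster holding every point index.
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        # Heterogeneity criterion (an assumption): within-cluster sum of squares.
        sse = [((X[c] - X[c].mean(axis=0)) ** 2).sum() for c in clusters]
        target = clusters.pop(int(np.argmax(sse)))
        # Splitting rule (an assumption): 2-means on the selected cluster.
        # This requires the selected cluster to hold at least two distinct points.
        km = KMeans(n_clusters=2, n_init=10, random_state=random_state)
        halves = km.fit_predict(X[target])
        clusters.append(target[halves == 0])
        clusters.append(target[halves == 1])
    return clusters

# Illustrative 1-D data: two tight groups and one isolated point.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [20.0]])
parts = divisive(X, n_clusters=3)
print(sorted(sorted(p.tolist()) for p in parts))
```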

# question 3

In [3]:

# In hierarchical clustering, the distance between two clusters is a key factor in deciding which clusters to merge or split at each step. The method used to compute this distance is known as a linkage criterion. There are several common linkage criteria, each with its own way of defining the distance between clusters. Here are the most commonly used linkage criteria:

# 1. Single Linkage (Minimum Linkage)
# Definition: The distance between two clusters is defined as the minimum distance between any single point in the first cluster and any single point in the second cluster.

# Formula:

# D(A,B)=min{d(a,b)∣a∈A,b∈B}

# Characteristics:

# Tends to create long, "chain-like" clusters.
# Sensitive to noise and outliers.
# Useful when the clusters are naturally elongated.
# 2. Complete Linkage (Maximum Linkage)
# Definition: The distance between two clusters is defined as the maximum distance between any single point in the first cluster and any single point in the second cluster.

# D(A,B)=max{d(a,b)∣a∈A,b∈B}

# Characteristics:

# Tends to create compact, spherical clusters.
# Less sensitive to noise and outliers compared to single linkage.
# Useful when the clusters are compact and well-separated.
# 3. Average Linkage (Mean Linkage)
# Definition: The distance between two clusters is defined as the average of all pairwise distances between points in the first cluster and points in the second cluster.


# D(A,B) = (1 / (∣A∣⋅∣B∣)) Σ_{a∈A} Σ_{b∈B} d(a,b)

# Characteristics:

# Provides a balance between single and complete linkage.
# Tends to create clusters with moderate compactness and separation.
# 4. Centroid Linkage
# Definition: The distance between two clusters is defined as the distance between their centroids (mean points).

# Characteristics:

# Sensitive to the shape and size of clusters.
# Can lead to distortions if clusters vary greatly in size.
# 5. Ward's Method
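# Definition: The distance between two clusters is defined as the increase in the total within-cluster variance (sum of squared deviations from the cluster mean) that merging them would cause. At each step, the pair of clusters with the smallest such increase is merged, which keeps the total within-cluster variance as low as possible.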
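
# As a worked example, the five criteria can be evaluated for two small clusters. The points in `A` and `B` are made up for illustration, and Ward's criterion is computed here directly as the increase in the within-cluster sum of squares (library implementations may report a rescaled value).

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [5.0, 0.0]])

# All pairwise distances d(a, b) for a in A, b in B: [[3, 5], [2, 4]].
pairwise = cdist(A, B)

single = pairwise.min()      # 2.0  (closest pair)
complete = pairwise.max()    # 5.0  (farthest pair)
average = pairwise.mean()    # 3.5  (mean over the |A| * |B| pairs)
centroid = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))  # 3.5

def sse(P):
    """Sum of squared deviations of the points in P from their mean."""
    return ((P - P.mean(axis=0)) ** 2).sum()

# Ward: how much the total within-cluster sum of squares grows on merging.
ward_increase = sse(np.vstack([A, B])) - sse(A) - sse(B)    # 12.25

print(single, complete, average, centroid, ward_increase)
```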



# question 4

In [4]:
# Determining the optimal number of clusters in hierarchical clustering is an important step in the clustering process. Since hierarchical clustering does not require the number of clusters to be specified a priori, several methods can be employed to decide on the optimal number. Here are some common techniques used for this purpose:

# 1. Dendrogram
# Description: A dendrogram is a tree-like diagram that shows the arrangement of the clusters produced by hierarchical clustering. The vertical axis represents the distance or dissimilarity between clusters.

# Method:

# Visual Inspection: Cut the dendrogram at a height where there is a significant increase in the distance between successive merges. This point often represents a natural division in the data.
# Largest Gap: Identify the largest vertical gap (height difference) in the dendrogram and cut at this point.
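
# The largest-gap rule can be applied to the merge heights stored in a SciPy linkage matrix. This is a heuristic sketch on made-up 1-D data with single linkage; the choice to cut halfway between the two merge heights is an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative data: three groups separated by wide empty stretches.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [20.0], [21.0]])
Z = linkage(X, method="single")

heights = Z[:, 2]           # merge distances, in non-decreasing order
gaps = np.diff(heights)     # jump between successive merges
i = int(np.argmax(gaps))    # the largest jump lies between merge i and merge i + 1

# Cut halfway inside the largest gap.
cut_height = (heights[i] + heights[i + 1]) / 2
labels = fcluster(Z, t=cut_height, criterion="distance")
n_clusters = len(np.unique(labels))

print(cut_height, n_clusters)  # 4.5 3
```
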
# 2. Elbow Method
# Description: The elbow method involves plotting the total within-cluster variance (also known as inertia) against the number of clusters and identifying the "elbow" point where the rate of decrease sharply slows down.

# Method:

# Perform hierarchical clustering and compute the within-cluster variance for different numbers of clusters.
# Plot the within-cluster variance against the number of clusters.
# Identify the point where the curve starts to flatten, resembling an "elbow."
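
# The elbow procedure can be sketched as follows, assuming synthetic blob data, Ward linkage, and the within-cluster sum of squares as the quantity being tracked. The helper name `within_cluster_ss` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

# Synthetic data (an assumption for illustration): three compact blobs.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [8, 8], [-8, 8]],
                  cluster_std=0.5, random_state=0)

# Build the hierarchy once; every cluster count below reuses it.
Z = linkage(X, method="ward")

def within_cluster_ss(X, labels):
    """Total squared distance of each point to its own cluster mean."""
    return sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
               for k in np.unique(labels))

ks = list(range(1, 9))
wcss = [within_cluster_ss(X, fcluster(Z, t=k, criterion="maxclust")) for k in ks]

# The elbow is where the drop from one k to the next becomes small.
for k, w in zip(ks, wcss):
    print(k, round(w, 1))
```
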
# 3. Silhouette Analysis
# Description: The silhouette score measures how similar an object is to its own cluster compared to other clusters. The average silhouette score over all the data points indicates the quality of the clustering.

# Method:

# Perform hierarchical clustering for different numbers of clusters.
# Compute the silhouette score for each number of clusters.
# Plot the average silhouette score against the number of clusters.
# The optimal number of clusters is the one that maximizes the average silhouette score.
# 4. Gap Statistic
# Description: The gap statistic compares the total within-cluster variance for different numbers of clusters with the expected within-cluster variance under a null reference distribution.

# Method:

# Perform hierarchical clustering for different numbers of clusters.
# Compute the within-cluster variance for each number of clusters.
# Generate reference datasets (e.g., by sampling uniformly from the data space) and compute the within-cluster variance for these reference datasets.
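# Calculate the gap statistic: the expected log of the within-cluster variance under the reference distribution minus the log of the observed within-cluster variance.
# Choose a number of clusters with a large gap. The rule proposed with the method picks the smallest number of clusters k whose gap is at least the gap at k + 1 minus the standard error of the reference estimate at k + 1.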
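
# A compact sketch of the gap statistic for hierarchical clustering is given below. It assumes Ward linkage, reference data drawn uniformly from the bounding box of the data, and 10 reference datasets; the function names `log_wcss_curve` and `gap_statistic` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

def log_wcss_curve(X, ks):
    """Log within-cluster sum of squares for each cluster count in ks."""
    Z = linkage(X, method="ward")
    out = []
    for k in ks:
        labels = fcluster(Z, t=k, criterion="maxclust")
        w = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                for c in np.unique(labels))
        out.append(np.log(w))
    return np.array(out)

def gap_statistic(X, ks, n_refs=10, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    observed = log_wcss_curve(X, ks)
    # Reference datasets: uniform samples over the bounding box of X.
    refs = np.array([log_wcss_curve(rng.uniform(lo, hi, size=X.shape), ks)
                     for _ in range(n_refs)])
    gaps = refs.mean(axis=0) - observed
    errs = refs.std(axis=0) * np.sqrt(1 + 1 / n_refs)
    return gaps, errs

# Synthetic data (an assumption for illustration): three compact blobs.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [8, 8], [-8, 8]],
                  cluster_std=0.5, random_state=0)
ks = list(range(1, 7))
gaps, errs = gap_statistic(X, ks)

# Smallest k whose gap is within one standard error of the gap at k + 1.
candidates = [k for i, k in enumerate(ks[:-1]) if gaps[i] >= gaps[i + 1] - errs[i + 1]]
best_k = candidates[0] if candidates else ks[-1]
print(best_k)
```
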
# 5. Cross-Validation
# Description: Cross-validation can be used to evaluate the stability and robustness of the clustering for different numbers of clusters.

# Method:

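# Perform hierarchical clustering on subsets of the data (e.g., using k-fold cross-validation) for different numbers of clusters.
# Compare the cluster assignments obtained on the different subsets (e.g., with the adjusted Rand index).
# Prefer the number of clusters whose assignments are most consistent across subsets.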
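
# One way to sketch this idea is a subsampling-based stability check, which is an interpretation assumed here rather than a standard cross-validation routine: cluster two random subsets, then compare the labels of the points they share using the adjusted Rand index. The function name `stability` and its defaults are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

def stability(X, k, n_rounds=10, frac=0.8, seed=0):
    """Mean agreement between clusterings of two random subsets of X."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = int(frac * n)
    scores = []
    for _ in range(n_rounds):
        a = rng.choice(n, size=m, replace=False)
        b = rng.choice(n, size=m, replace=False)
        # Cluster each subset independently and map point index -> label.
        la = dict(zip(a, fcluster(linkage(X[a], method="ward"), t=k, criterion="maxclust")))
        lb = dict(zip(b, fcluster(linkage(X[b], method="ward"), t=k, criterion="maxclust")))
        # Compare the two labelings on the points present in both subsets.
        shared = np.intersect1d(a, b)
        scores.append(adjusted_rand_score([la[i] for i in shared],
                                          [lb[i] for i in shared]))
    return float(np.mean(scores))

# Synthetic data (an assumption for illustration): three compact blobs.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [8, 8], [-8, 8]],
                  cluster_std=0.5, random_state=0)
scores_by_k = {k: stability(X, k) for k in range(2, 7)}
for k, s in scores_by_k.items():
    print(k, round(s, 3))
```
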

# Example: dendrogram and silhouette analysis on the Iris dataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = load_iris()
X = iris.data

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Perform hierarchical clustering
linked = linkage(X_scaled, method='ward')

# Plot the dendrogram
plt.figure(figsize=(10, 7))
dendrogram(linked, truncate_mode='level', p=5)
plt.title('Dendrogram')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()

# Determine the optimal number of clusters using silhouette analysis
silhouette_scores = []
range_n_clusters = list(range(2, 11))

for n_clusters in range_n_clusters:
    cluster_labels = fcluster(linked, n_clusters, criterion='maxclust')
    silhouette_avg = silhouette_score(X_scaled, cluster_labels)
    silhouette_scores.append(silhouette_avg)

# Plot silhouette scores
plt.figure(figsize=(10, 7))
plt.plot(range_n_clusters, silhouette_scores, marker='o')
plt.title('Silhouette Analysis')
plt.xlabel('Number of clusters')
plt.ylabel('Silhouette Score')
plt.show()

# Optimal number of clusters is the one with the highest silhouette score
optimal_n_clusters = range_n_clusters[np.argmax(silhouette_scores)]
print(f'Optimal number of clusters: {optimal_n_clusters}')


# question 5

In [None]:
# Dendrograms in Hierarchical Clustering
# A dendrogram is a tree-like diagram that records the sequence of merges or splits in hierarchical clustering. It provides a visual representation of the hierarchical relationships between clusters formed during the clustering process. The vertical axis of the dendrogram represents the distance or dissimilarity between clusters, while the horizontal axis represents the data points.

# Structure of a Dendrogram
# Leaves: The leaves (bottom nodes) of the dendrogram represent individual data points.
# Branches: The branches (edges) of the dendrogram represent clusters formed by merging or splitting data points or other clusters.
# Heights: The height of each branch represents the distance (or dissimilarity) at which the merge or split occurs. Larger heights indicate greater dissimilarity.
# How to Read a Dendrogram
# Vertical Position: The vertical position of a merge indicates the distance between the clusters being merged. Merges that occur at lower heights indicate more similar clusters.
# Horizontal Lines: Each horizontal line represents a merge. The height of the line indicates the distance at which the merge occurs.
# Cutting the Dendrogram: By cutting the dendrogram at a certain height, you can obtain a desired number of clusters. The number of vertical lines that intersect the cut line corresponds to the number of clusters.
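
# Cutting at a height can be tried out numerically with SciPy's `fcluster` and `criterion="distance"`. The sketch below uses made-up 1-D data with single linkage and also checks the equivalent count: the number of points minus the number of merges at or below the cut.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative 1-D data: two tight groups and one isolated point.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [20.0]])
Z = linkage(X, method="single")   # merge heights: 1, 1, 1, 8, 9

counts = {}
for height in (0.5, 5.0, 8.5, 25.0):
    labels = fcluster(Z, t=height, criterion="distance")
    counts[height] = len(np.unique(labels))
    # Every merge at or below the cut reduces the cluster count by one.
    assert counts[height] == len(X) - int(np.sum(Z[:, 2] <= height))

print(counts)  # {0.5: 6, 5.0: 3, 8.5: 2, 25.0: 1}
```
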
# Usefulness of Dendrograms
# Visualizing Cluster Hierarchies:

# Dendrograms provide a clear and intuitive visualization of the hierarchical structure of clusters.
# They show how clusters are nested within one another and the sequence of merges or splits.
# Determining the Number of Clusters:

# By inspecting the dendrogram, you can decide where to cut the tree to form clusters.
# A significant increase in the height of merges suggests natural divisions in the data, helping to determine the optimal number of clusters.
# Understanding Cluster Similarity:

# The height of merges indicates the similarity between clusters. Clusters merged at lower heights are more similar than those merged at higher heights.
# This helps in understanding the relative similarity between different clusters.
# Identifying Outliers:

# Data points that merge with others at very high heights (distances) may be outliers.
# Dendrograms can help identify these outliers visually.
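
# The outlier heuristic can be sketched from the linkage matrix: find the height at which each original point first joins another cluster and flag points that join unusually late. The data, the average linkage, and the threshold of three times the median merge height are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four points in a tight square plus one far-away point (illustrative data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
Z = linkage(X, method="average")
n = len(X)

# For each original point, record the height of the merge it first takes part in.
first_merge = np.zeros(n)
for a, b, height, _ in Z:
    for idx in (int(a), int(b)):
        if idx < n:               # ids below n refer to original points
            first_merge[idx] = height

# Heuristic threshold (an assumption): three times the median merge height.
threshold = 3 * np.median(Z[:, 2])
outliers = np.where(first_merge > threshold)[0]
print(outliers)  # [4]
```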