In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
print("Loading dataset...")
iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
target = pd.Series(iris.target, name='target')

# Display first few rows of the dataset
print("First few rows of the dataset:")
print(data.head())

# Standardize the features
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
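
# Hierarchical clustering operates on pairwise distances between the
# standardized rows. A minimal sketch of three common choices (Euclidean,
# Manhattan, cosine distance) on two made-up points; point_a and point_b are
# hypothetical values chosen for easy arithmetic, not rows of the dataset.
import numpy as np
from scipy.spatial.distance import cityblock, cosine, euclidean

point_a = np.array([1.0, 2.0, 3.0])
point_b = np.array([4.0, 6.0, 3.0])

d_euclid = euclidean(point_a, point_b)      # sqrt(3^2 + 4^2 + 0^2) = 5.0
d_manhattan = cityblock(point_a, point_b)   # |3| + |4| + |0| = 7.0
d_cosine = cosine(point_a, point_b)         # 1 - cos(angle between the vectors)

print(f"Euclidean: {d_euclid:.3f}, Manhattan: {d_manhattan:.3f}, Cosine distance: {d_cosine:.3f}")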

# Perform Hierarchical Clustering
print("Performing Hierarchical Clustering...")
Z = linkage(data_scaled, method='ward')
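
# The Ward linkage above is one of several agglomerative linkage methods.
# A minimal sketch comparing a few of them, assuming the cophenetic
# correlation coefficient is an acceptable yardstick (an assumption made for
# illustration, not something this notebook prescribes). The *_demo names are
# hypothetical and rebuilt from scratch so the snippet runs on its own.
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_demo = StandardScaler().fit_transform(load_iris().data)
pairwise_demo = pdist(X_demo, metric='euclidean')  # condensed distance matrix

cophenetic_scores = {}
for method_name in ['single', 'complete', 'average', 'ward']:
    Z_demo = linkage(X_demo, method=method_name)
    # Correlation between the original distances and the dendrogram merge heights
    coeff, _ = cophenet(Z_demo, pairwise_demo)
    cophenetic_scores[method_name] = float(coeff)
    print(f"{method_name:>8}: cophenetic correlation = {coeff:.3f}")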

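# Plot the dendrogram; leaves are labeled with the true class (0, 1, 2), not the sample index
plt.figure(figsize=(12, 8))
dendrogram(Z, labels=target.values, leaf_rotation=90, leaf_font_size=6)
plt.title('Dendrogram (Ward linkage, standardized Iris features)')
plt.xlabel('Sample (true class label)')
plt.ylabel('Merge distance')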
plt.show()
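
# Reading a cutoff off the dendrogram can also be done numerically. A rough
# sketch of the "largest gap" heuristic: find the biggest jump between
# successive merge heights and cut in the middle of it. This is one heuristic
# among several and often favors very few clusters; the *_gap names are
# hypothetical and rebuilt here so the snippet runs on its own.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_gap = StandardScaler().fit_transform(load_iris().data)
Z_gap = linkage(X_gap, method='ward')

merge_heights = Z_gap[:, 2]          # height of each merge, in merge order
gaps = np.diff(merge_heights)        # jump from one merge to the next
gap_idx = int(np.argmax(gaps))       # position of the largest jump
suggested_cut = (merge_heights[gap_idx] + merge_heights[gap_idx + 1]) / 2

labels_gap = fcluster(Z_gap, suggested_cut, criterion='distance')
n_clusters_gap = len(np.unique(labels_gap))
print(f"Suggested cut height: {suggested_cut:.2f} -> {n_clusters_gap} clusters")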

# Cut the dendrogram at a fixed height to obtain flat clusters
max_d = 7.5  # Example cutoff distance, not a tuned optimum
clusters = fcluster(Z, max_d, criterion='distance')
data['Cluster'] = clusters
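
# The cutoff above is only an example. A hedged sketch of checking candidate
# cluster counts with the silhouette score: cut the same kind of Ward tree
# into k flat clusters for several k and compare the mean scores. The range
# 2..6 and the *_sil names are assumptions made for illustration.
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X_sil = StandardScaler().fit_transform(load_iris().data)
Z_sil = linkage(X_sil, method='ward')

silhouette_by_k = {}
for k in range(2, 7):
    labels_k = fcluster(Z_sil, t=k, criterion='maxclust')  # at most k clusters
    silhouette_by_k[k] = float(silhouette_score(X_sil, labels_k))
    print(f"k={k}: mean silhouette = {silhouette_by_k[k]:.3f}")

best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print(f"Highest mean silhouette at k={best_k}")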

# Summary of clustering results
print("Cluster distribution:")
print(data['Cluster'].value_counts())
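
# Cluster sizes like the ones printed above can also hint at outliers: a
# cluster with very few members is a candidate. A small sketch on synthetic
# data (a tight blob plus one distant point); the data, the cut height of 3.0,
# and the size threshold of 1 are all hypothetical choices for illustration.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
blob = rng.normal(loc=0.0, scale=0.5, size=(30, 2))   # 30 points near the origin
far_point = np.array([[10.0, 10.0]])                  # one isolated point
X_out = np.vstack([blob, far_point])

Z_out = linkage(X_out, method='single')
labels_out = fcluster(Z_out, t=3.0, criterion='distance')

# Flag every point that belongs to a cluster with at most one member
cluster_ids, cluster_sizes = np.unique(labels_out, return_counts=True)
small_cluster_ids = cluster_ids[cluster_sizes <= 1]
outlier_indices = np.where(np.isin(labels_out, small_cluster_ids))[0]
print("Candidate outlier indices:", outlier_indices.tolist())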

print("""
Q1. Hierarchical Clustering:
Hierarchical clustering builds a hierarchy of nested clusters. Unlike K-Means, it does not need the number of clusters in advance; the number can be chosen after the algorithm has run, by cutting the hierarchy at a chosen level.

**Differences from Other Clustering Techniques**:
- **K-Means**: Partitional method that requires specifying K clusters. Hierarchical clustering creates a tree of clusters without needing a predefined number of clusters.
- **DBSCAN**: Density-based method that can find clusters of arbitrary shapes. Hierarchical clustering builds a nested hierarchy of clusters.

Q2. Types of Hierarchical Clustering Algorithms:
1. **Agglomerative Clustering**:
   - **Bottom-Up Approach**: Starts with each data point as its own cluster and merges the closest pairs of clusters until all points belong to a single cluster or a stopping criterion is met.
   - **Linkage Methods**: Define how the distance between two clusters is measured (e.g., single, complete, average, or Ward linkage; the code above uses Ward).

2. **Divisive Clustering**:
   - **Top-Down Approach**: Starts with a single cluster containing all data points and recursively splits the cluster into smaller clusters until each point is in its own cluster or a stopping criterion is met.
   - **Less common** compared to agglomerative clustering.

Q3. Distance Metrics for Clustering:
- **Euclidean Distance**: Standard metric for continuous data; the straight-line distance between two points. Ward linkage assumes this metric.
- **Manhattan Distance**: The sum of absolute differences between coordinates.
- **Cosine Distance**: One minus the cosine of the angle between two vectors (the cosine similarity); often used for text data.

Q4. Determining the Optimal Number of Clusters:
**Methods**:
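1. **Dendrogram Analysis**: Find the largest vertical gap between successive merges in the dendrogram and cut horizontally through it; the number of branches the cut crosses is the number of clusters.
2. **Silhouette Score**: Measures how similar each point is to its own cluster compared with the nearest other cluster (range -1 to 1); prefer the number of clusters with the highest average score.
3. **Elbow Method**: Plot the within-cluster variance against the number of clusters and look for the "elbow" where adding more clusters stops reducing it substantially.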

Q5. Dendrograms:
- **Definition**: A dendrogram is a tree-like diagram that shows the arrangement of clusters produced by hierarchical clustering.
- **Usage**: Helps in visualizing the clustering process and determining the number of clusters by cutting the dendrogram at a certain height.

Q6. Hierarchical Clustering for Different Data Types:
- **Numerical Data**: Use Euclidean or Manhattan distance.
- **Categorical Data**: Use a dissimilarity such as Jaccard or Hamming distance, after encoding the categories numerically (e.g., one-hot encoding).

Q7. Identifying Outliers:
- **Outliers**: Appear as points that stay in their own cluster until late in the merging process and join the rest of the tree only at a large distance.
- **Method**: Cut the dendrogram and look for isolated clusters or clusters with very few members.
""")
