# Case Study: Iris
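Let's proceed with the tutorial on hierarchical clustering using the Iris dataset. We will demonstrate agglomerative hierarchical clustering, the bottom-up approach: each sample starts in its own cluster, and the two closest clusters are merged repeatedly until only the requested number of clusters remains (or, to build the full hierarchy, until a single cluster is left).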
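To make the merge loop concrete, here is a minimal, deliberately unoptimized sketch in NumPy. It is our own illustration, not part of scikit-learn: the function `naive_agglomerative` and the toy points are hypothetical. For simplicity it uses single linkage (the distance between two clusters is the distance between their closest members), not the Ward linkage that `AgglomerativeClustering` uses by default.

```python
import numpy as np


def naive_agglomerative(X, n_clusters):
    """Hypothetical teaching sketch: single-linkage agglomerative clustering, O(n^3)."""
    # Start with every sample in its own cluster
    clusters = [[i] for i in range(len(X))]
    # Precompute all pairwise Euclidean distances between samples
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    merges = []
    while len(clusters) > n_clusters:
        best_d, best_a, best_b = np.inf, -1, -1
        # Single linkage: cluster distance = distance of their closest members
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        # Record the merge, then fold cluster b into cluster a (b > a, so indices stay valid)
        merges.append((clusters[best_a], clusters[best_b], best_d))
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]
    # Turn the membership lists into one integer label per sample
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels, merges


# Toy data: two tight pairs and one isolated point
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]])
labels, merges = naive_agglomerative(X_toy, n_clusters=2)
for left, right, d in merges:
    print(f"merge {left} + {right} at distance {d:.2f}")
print(labels)
```

On the toy data, the two tight pairs merge first (distance 1.00 each), then the pairs join at about 6.40, leaving the isolated point on its own, so the labels are `[0 0 0 0 1]`. Real implementations avoid recomputing every cluster distance on each pass, which is why library code scales far better than this loop.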

## Setup
### Dataset Loading and Exploration

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Convert to DataFrame for easier manipulation (optional)
df = pd.DataFrame(data=np.c_[X, y], columns=data.feature_names + ['target'])

# Explore the dataset
print(df.head())
print(df.describe())
df.info()  # info() prints its summary itself and returns None, so no print() is needed
```

```text
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0     0.0  
1     0.0  
2     0.0  
3     0.0  
4     0.0  
       sepal length (cm)  sepal width (cm)  petal length (cm)  \
count         150.000000        150.000000         150.000000   
mean            5.843333          3.057333           3.758000   
std             0.828066          0.435866           1.765298   
min             4.300000          2.000000           1.000000   
25%             5.100000          2.800000           1.600000   
50%             5.800000          3.000000           4.350000  
```

## Data Preprocessing
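Before applying hierarchical clustering, we standardize the features to zero mean and unit variance. The clustering is driven by Euclidean distances, so without scaling, the features with the largest spread would dominate those distances: in the summary above, petal length has a standard deviation of about 1.77 cm, versus about 0.44 cm for sepal width.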

```python
# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
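As a quick sanity check (our own addition, not part of the original workflow), the scaled matrix should equal the z-score `z = (x - mean) / std` applied column by column, leaving every feature with mean 0 and standard deviation 1. Note that `StandardScaler` divides by the population standard deviation (`ddof=0`), whereas `df.describe()` reports the sample standard deviation (`ddof=1`), so the two differ slightly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)

# Manual z-score per column; NumPy's std defaults to ddof=0, matching StandardScaler
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))                       # the two should agree
print(np.allclose(X_scaled.mean(axis=0), 0.0))               # each column centred at 0
print(np.allclose(X_scaled.std(axis=0), 1.0))                # each column has unit std
```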

## Agglomerative Hierarchical Clustering

```python
# Initialize agglomerative clustering with n_clusters=3. By default scikit-learn
# uses Ward linkage, which merges the pair of clusters whose union least
# increases the within-cluster variance.
agglomerative = AgglomerativeClustering(n_clusters=3)

# Fit the model to the standardized data
agglomerative.fit(X_scaled)

# Get the cluster assignment for each sample
agglomerative_labels = agglomerative.labels_

print("Agglomerative Hierarchical Clustering:")
print(f"Cluster Assignments: {agglomerative_labels}")
```

```text
Agglomerative Hierarchical Clustering:
Cluster Assignments: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 2 1 1 1 1 1 1 1 1 0 0 0 2 0 2 0 2 0 2 2 0 2 0 2 0 2 2 2 2 0 0 0 0
 0 0 0 0 0 2 2 2 2 0 2 0 0 2 2 2 2 0 2 2 2 2 2 0 2 2 0 0 0 0 0 0 2 0 0 0 0
 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0]
```
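The cluster IDs 0, 1 and 2 are arbitrary and need not match the species codes in `target`. One way to check how well the clusters recover the species (a sketch of our own, using the true labels that the clustering never saw) is a contingency table from `pd.crosstab` plus the adjusted Rand index from `sklearn.metrics.adjusted_rand_score`, which ignores how the cluster IDs are numbered.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

data = load_iris()
X_scaled = StandardScaler().fit_transform(data.data)
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X_scaled)

# Rows: true species; columns: cluster IDs; cells: number of samples
table = pd.crosstab(
    pd.Series(data.target_names[data.target], name="species"),
    pd.Series(labels, name="cluster"),
)
print(table)

# ARI is 1.0 for identical partitions and close to 0 for a random assignment
ari = adjusted_rand_score(data.target, labels)
print(f"Adjusted Rand index: {ari:.3f}")
```

Reading the table row by row shows which cluster each species mostly falls into and where the clusters mix species; the ARI condenses that into a single permutation-invariant score.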
