# Lab Exercise: Comparing Clustering Algorithms on the Digits Dataset

### Objective:
In this lab, you will compare three clustering algorithms (K-Means, DBSCAN, and Hierarchical Clustering) on the **Digits** dataset. You will evaluate each algorithm with three internal clustering metrics: **Silhouette Score**, **Dunn Index**, and **Davies-Bouldin Index**.
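
For reference, the three metrics can be written as below. This is a sketch of the usual textbook definitions; the symbols ($a(i)$, $b(i)$, $\delta$, $\Delta$, $\sigma_i$, $d_{ij}$) are notation assumed here for illustration and do not appear in the lab code.

```latex
% Silhouette: a(i) = mean distance from point i to the other points in its cluster,
%             b(i) = mean distance from point i to the points of the nearest other cluster
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},
\qquad
\text{Silhouette} = \frac{1}{n} \sum_{i=1}^{n} s(i)

% Dunn Index (classical form): \delta = smallest distance between two clusters,
%                              \Delta = diameter (largest pairwise distance) of a cluster
\text{Dunn} = \frac{\min_{i \neq j} \delta(C_i, C_j)}{\max_{k} \Delta(C_k)}

% Davies-Bouldin Index: \sigma_i = mean distance of cluster i's points to its centroid,
%                       d_{ij} = distance between the centroids of clusters i and j
\text{DB} = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d_{ij}}
```

Higher is better for the Silhouette Score (which lies in $[-1, 1]$) and for the Dunn Index; lower is better for the Davies-Bouldin Index.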

### Tasks:

1. **Data Preprocessing**:
   - Load the **Digits** dataset from `sklearn.datasets`.
   - Preprocess the data as needed (scaling, reshaping, etc.).

2. **Clustering**:
   - Apply the following clustering algorithms:
     - **K-Means**
     - **DBSCAN**
     - **Hierarchical Clustering**

3. **Evaluation**:
   - For each algorithm, compute the clustering evaluation metrics:
     - **Silhouette Score**
     - **Dunn Index**
     - **Davies-Bouldin Index**

4. **Comparison**:
   - Compare the clustering results and discuss which algorithm performs best on the Digits dataset based on the evaluation metrics.
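
Before running the metrics on the full dataset, it can help to check how each one behaves on data where the answer is obvious. The sketch below uses a hypothetical four-point toy set (`X_toy`, `labels_toy`) and a helper named `classical_dunn`, all of which are illustrative assumptions rather than part of the lab. The helper uses the cluster diameter (largest pairwise distance) as the denominator, which is the classical form of the Dunn Index.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Hypothetical toy data: two tight pairs of points, 4 units apart
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
labels_toy = np.array([0, 0, 1, 1])


def classical_dunn(X, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    # Diameter of a cluster: largest pairwise distance (0 for a single point)
    max_diameter = max(pdist(c).max() if len(c) > 1 else 0.0 for c in clusters)
    # Gap between two clusters: distance of their closest pair of points
    min_gap = min(
        cdist(a, b).min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_gap / max_diameter


sil_toy = silhouette_score(X_toy, labels_toy)
db_toy = davies_bouldin_score(X_toy, labels_toy)
dunn_toy = classical_dunn(X_toy, labels_toy)

print(f"Silhouette Score:     {sil_toy:.3f}  (higher is better)")
print(f"Davies-Bouldin Index: {db_toy:.3f}  (lower is better)")
print(f"Dunn Index:           {dunn_toy:.3f}  (higher is better)")
```

On well-separated clusters like these, the Silhouette Score is close to 1, the Davies-Bouldin Index is close to 0, and the Dunn Index is well above 1. Scores on the Digits data will look much weaker than this.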

### Discussion:

1. **Which algorithm performed best according to the Silhouette Score, the Davies-Bouldin Index, and the Dunn Index?**

2. **How did DBSCAN perform, given its sensitivity to the `eps` and `min_samples` parameters?**

3. **How does K-Means compare with Hierarchical Clustering, especially in terms of cluster structure?**
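
One way to approach the DBSCAN question is to vary `eps` around the value used in the solution (`eps=5`, `min_samples=10`) and watch how the number of clusters and noise points changes. The sketch below assumes a hypothetical helper `sweep_eps` and an arbitrary grid `eps_grid`; neither comes from the lab itself.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

# Same preprocessing as the lab: standardized 64-dimensional pixel features
X_scaled = StandardScaler().fit_transform(load_digits().data)


def sweep_eps(X, eps_values, min_samples=10):
    """Run DBSCAN once per eps and record (eps, n_clusters, n_noise)."""
    rows = []
    for eps in eps_values:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        # DBSCAN marks noise with the label -1, which is not a real cluster
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = int(np.sum(labels == -1))
        rows.append((eps, n_clusters, n_noise))
    return rows


eps_grid = [3.0, 4.0, 5.0, 6.0, 7.0]  # hypothetical grid around eps=5
sweep = sweep_eps(X_scaled, eps_grid)
for eps, n_clusters, n_noise in sweep:
    print(f"eps={eps:.1f}  clusters={n_clusters:3d}  noise points={n_noise:4d}")
```

With `min_samples` fixed, the number of noise points can only stay the same or fall as `eps` grows. The number of clusters may rise or fall, which is one reason DBSCAN results on this dataset are hard to compare with the fixed ten clusters of the other two algorithms.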


In [1]:
### Solution:
# Step 1: Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score
from scipy.spatial.distance import cdist

In [2]:
# Step 2: Load and preprocess the Digits dataset
digits = load_digits()
X = digits.data
y = digits.target

# Standardize each pixel feature to zero mean and unit variance so that
# no single feature dominates the Euclidean distances used for clustering
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

In [3]:
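# Step 3: Apply K-Means Clustering
# n_clusters=10 matches the ten digit classes (0-9); random_state fixes the initialization
kmeans = KMeans(n_clusters=10, random_state=42)
kmeans_labels = kmeans.fit_predict(X_scaled)

# Step 4: Apply DBSCAN Clustering
# DBSCAN chooses the number of clusters itself and labels noise points as -1
dbscan = DBSCAN(eps=5, min_samples=10)
dbscan_labels = dbscan.fit_predict(X_scaled)

# Step 5: Apply Hierarchical Clustering
# AgglomerativeClustering uses Ward linkage by default
hierarchical = AgglomerativeClustering(n_clusters=10)
hierarchical_labels = hierarchical.fit_predict(X_scaled)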

In [4]:
# Step 6: Compute Clustering Evaluation Metrics

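# Function to compute a Dunn-style index (higher is better)
def dunn_index(X, labels):
    """Smallest between-cluster gap divided by the largest within-cluster spread.

    Note: the spread used here is the mean pairwise distance inside a cluster
    (zero self-distances included). The classical Dunn Index uses the maximum
    pairwise distance (the cluster diameter) instead.
    """
    unique_labels = np.unique(labels)
    inter_cluster_distances = []
    intra_cluster_distances = []
    for i, label in enumerate(unique_labels):
        cluster_points = X[labels == label]
        # Within-cluster spread: mean of all pairwise distances
        intra_cluster_distances.append(np.mean(cdist(cluster_points, cluster_points, metric='euclidean')))
        # Between-cluster gap: closest pair of points, visiting each cluster pair once
        for other_label in unique_labels[i + 1:]:
            other_cluster_points = X[labels == other_label]
            inter_cluster_distances.append(np.min(cdist(cluster_points, other_cluster_points, metric='euclidean')))
    return np.min(inter_cluster_distances) / np.max(intra_cluster_distances)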

# Compute the metrics
silhouette_kmeans = silhouette_score(X_scaled, kmeans_labels)
davies_bouldin_kmeans = davies_bouldin_score(X_scaled, kmeans_labels)
dunn_kmeans = dunn_index(X_scaled, kmeans_labels)

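# DBSCAN labels noise as -1; the scores below treat those points as one more cluster.
# With a single label the metrics are undefined, so -1 is stored as a placeholder.
dbscan_has_clusters = len(set(dbscan_labels)) > 1
silhouette_dbscan = silhouette_score(X_scaled, dbscan_labels) if dbscan_has_clusters else -1
davies_bouldin_dbscan = davies_bouldin_score(X_scaled, dbscan_labels) if dbscan_has_clusters else -1
dunn_dbscan = dunn_index(X_scaled, dbscan_labels) if dbscan_has_clusters else -1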

silhouette_hierarchical = silhouette_score(X_scaled, hierarchical_labels)
davies_bouldin_hierarchical = davies_bouldin_score(X_scaled, hierarchical_labels)
dunn_hierarchical = dunn_index(X_scaled, hierarchical_labels)

In [8]:
# Step 7: Display the Results
print("K-Means:")
print(f"Silhouette Score: {silhouette_kmeans}")
print(f"Davies-Bouldin Score: {davies_bouldin_kmeans}")
print(f"Dunn Index: {dunn_kmeans}")

print("\nDBSCAN:")
print(f"Silhouette Score: {silhouette_dbscan}")
print(f"Davies-Bouldin Score: {davies_bouldin_dbscan}")
print(f"Dunn Index: {dunn_dbscan}")

print("\nHierarchical Clustering:")
print(f"Silhouette Score: {silhouette_hierarchical}")
print(f"Davies-Bouldin Score: {davies_bouldin_hierarchical}")
print(f"Dunn Index: {dunn_hierarchical}")


K-Means:
Silhouette Score: 0.13558208876901615
Davies-Bouldin Score: 1.8060790632374897
Dunn Index: 0.15696004443168518

DBSCAN:
Silhouette Score: -0.029162417635556125
Davies-Bouldin Score: 3.4498585825016015
Dunn Index: 0.18592980647707277

Hierarchical Clustering:
Silhouette Score: 0.12532527779196986
Davies-Bouldin Score: 1.9671781554189507
Dunn Index: 0.10830374388193056
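
As a starting point for the comparison, the printed values can be ranked per metric, keeping in mind that the Davies-Bouldin Index is the only one of the three where lower is better. The sketch below copies the numbers from the output above into a hypothetical `scores` dictionary; the variable names are illustrative.

```python
# Values copied from the printed results above
scores = {
    "K-Means": {
        "silhouette": 0.13558208876901615,
        "davies_bouldin": 1.8060790632374897,
        "dunn": 0.15696004443168518,
    },
    "DBSCAN": {
        "silhouette": -0.029162417635556125,
        "davies_bouldin": 3.4498585825016015,
        "dunn": 0.18592980647707277,
    },
    "Hierarchical": {
        "silhouette": 0.12532527779196986,
        "davies_bouldin": 1.9671781554189507,
        "dunn": 0.10830374388193056,
    },
}

# Direction of each metric: True means a larger value is better
higher_is_better = {"silhouette": True, "davies_bouldin": False, "dunn": True}

best = {}
for metric, larger_wins in higher_is_better.items():
    pick = max if larger_wins else min
    best[metric] = pick(scores, key=lambda name: scores[name][metric])
    print(f"Best by {metric}: {best[metric]}")
```

K-Means comes out ahead on the Silhouette Score and the Davies-Bouldin Index, while DBSCAN has the largest Dunn value. The DBSCAN figures should be read with caution: any points it labelled as noise (`-1`) were scored as if they formed a cluster of their own, and the `dunn_index` function above uses mean pairwise distance rather than the cluster diameter.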
