**Lab: Unsupervised Learning and Clustering**

In this lab, you will explore clustering algorithms such as K-Means, Hierarchical Clustering, and DBSCAN in Python. By the end of this lab, you should be able to apply these clustering techniques to a dataset, visualize the results, and compare the performance of each algorithm.

**Lab Setup:**

**Required Libraries:**

Make sure you have the following Python libraries installed:
* numpy
* matplotlib
* pandas 
* scikit-learn
* scipy

You can install them using:

In [None]:
%pip install numpy matplotlib pandas scikit-learn scipy

**Dataset:**

We'll use the **Iris dataset**, which is available in scikit-learn. This dataset contains measurements of 150 iris flowers across four features: sepal length, sepal width, petal length, and petal width.

**Load and Explore the Iris Dataset**

Step 1: Import libraries and load the dataset

In [None]:
from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['target'] = iris.target  # Add target labels for later comparison

# Print the first few rows
print(data.head())

**Task:**

* Examine the dataset. What features are present?
* How many different species (targets) are there?
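One possible way to approach these questions is sketched below. It reloads the dataset so the cell stands alone; the variable name `species_counts` is introduced here for illustration and is not part of the lab code.

```python
from sklearn.datasets import load_iris
import pandas as pd

# Reload the dataset so this cell runs on its own
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['target'] = iris.target

# Shape of the table: rows are flowers, columns are features plus the target
print(data.shape)

# Names of the measured features
print(list(iris.feature_names))

# Map the numeric targets to species names and count each one
species_counts = pd.Series(iris.target_names[iris.target]).value_counts()
print(species_counts)
```

`data.describe()` is another quick way to see the range and spread of each feature.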

**K-Means Clustering**

Step 2: Implement K-Means Clustering

In [None]:
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Initialize and fit K-Means with 3 clusters (fixed seed so labels are reproducible)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(data[iris.feature_names])  # Use only the four feature columns

# Add the K-Means cluster labels to the DataFrame
data['kmeans_labels'] = kmeans.labels_

# Visualize the clustering
plt.scatter(data.iloc[:, 0], data.iloc[:, 1], c=data['kmeans_labels'], cmap='viridis')
plt.title('K-Means Clustering on Iris Dataset')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()

**Task:**

* Run the K-Means algorithm and visualize the clusters.
* Experiment with different numbers of clusters (**n_clusters** parameter) and observe how the results change.
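As a rough aid for this experiment, the sketch below fits K-Means for several values of `n_clusters` and prints the inertia (within-cluster sum of squared distances) for each. The range of 1 to 6 and the `inertias` dictionary are assumptions made for illustration, not part of the lab.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

# Feature matrix only (no species labels)
X = load_iris().data

# Fit K-Means for k = 1..6 and record the inertia of each fit
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
```

Inertia always tends to fall as k grows, so look for the point where the drop flattens out rather than for the smallest value.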

**Hierarchical Clustering**

Step 3: Perform Hierarchical Clustering

In [None]:
from scipy.cluster.hierarchy import dendrogram, linkage

# Perform hierarchical clustering (Ward's method) on the four feature columns only
linked = linkage(data[iris.feature_names], method='ward')

# Plot the dendrogram
plt.figure(figsize=(10, 7))
dendrogram(linked, labels=data['target'].values, leaf_rotation=90)
plt.title("Hierarchical Clustering Dendrogram (Ward's Method)")
plt.show()

**Task:**

* Interpret the dendrogram. How many clusters are evident from the plot?
* Where could you potentially "cut" the dendrogram to obtain meaningful clusters?
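Cutting the dendrogram can also be done in code. The sketch below uses SciPy's `fcluster` with the `maxclust` criterion to request at most three flat clusters; the choice of three and the name `hier_labels` are assumptions made for illustration.

```python
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import load_iris
import numpy as np

# Feature matrix only (no species labels)
X = load_iris().data

# Build the same Ward linkage as in the dendrogram step
linked = linkage(X, method='ward')

# Cut the tree so that at most 3 flat clusters remain (labels start at 1)
hier_labels = fcluster(linked, t=3, criterion='maxclust')

# Size of each resulting cluster
cluster_ids, sizes = np.unique(hier_labels, return_counts=True)
print(dict(zip(cluster_ids.tolist(), sizes.tolist())))
```

Changing `t` gives a different number of clusters from the same linkage, without refitting.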

**DBSCAN Clustering**

Step 4: Implement DBSCAN Clustering

In [None]:
from sklearn.cluster import DBSCAN
import numpy as np

# Initialize and fit DBSCAN (points labeled -1 are treated as noise)
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(data[iris.feature_names])

# Add DBSCAN cluster labels to the DataFrame
data['dbscan_labels'] = dbscan_labels

# Visualize DBSCAN clustering
plt.scatter(data.iloc[:, 0], data.iloc[:, 1], c=data['dbscan_labels'], cmap='plasma')
plt.title('DBSCAN Clustering on Iris Dataset')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()

**Task:**

* Experiment with different values of **eps** and **min_samples** to see how the clustering changes.
* Identify any noise points (points that don't belong to any cluster).
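DBSCAN assigns the label `-1` to points it considers noise, so they can be counted directly. The following is a minimal sketch using the same parameters as above; `noise_mask` and `n_clusters` are illustrative names.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris

# Feature matrix only (no species labels)
X = load_iris().data

# Fit DBSCAN with the same parameters used in this step
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Noise points carry the label -1
noise_mask = labels == -1

# Number of real clusters, excluding the noise label
n_clusters = len(set(labels.tolist()) - {-1})

print(f"clusters found: {n_clusters}")
print(f"noise points:   {int(noise_mask.sum())}")
print("indices of noise points:", np.flatnonzero(noise_mask))
```

Rerunning this cell with other values of `eps` and `min_samples` shows how the noise count responds to each parameter.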

**Comparison of Clustering Algorithms**

Step 5: Compare Clustering Results

In [None]:
# Create a table to compare clustering labels with true labels
comparison_table = data[['target', 'kmeans_labels', 'dbscan_labels']]

# Print the comparison
print(comparison_table.head(10))

**Task:**

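* Compare the K-Means and DBSCAN cluster assignments with the actual target labels. Cluster IDs are arbitrary, so cluster 0 need not correspond to species 0; look at which points are grouped together rather than at matching numbers.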
* Which algorithm seems to best capture the underlying patterns of the data?
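Reading the first ten rows only goes so far. One way to summarize agreement over all 150 flowers is a contingency table plus the adjusted Rand index, which ignores how the cluster IDs are numbered. The sketch below is an assumption about how you might do this, not a required part of the lab; it refits both models so the cell is self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score
import pandas as pd

iris = load_iris()
X, y = iris.data, iris.target

# Refit both models so this cell runs on its own
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Contingency tables: rows are true species, columns are cluster IDs
print(pd.crosstab(y, kmeans_labels, rownames=['species'], colnames=['kmeans']))
print(pd.crosstab(y, dbscan_labels, rownames=['species'], colnames=['dbscan']))

# Adjusted Rand index: 1.0 is perfect agreement, values near 0 are chance level
ari_kmeans = adjusted_rand_score(y, kmeans_labels)
ari_dbscan = adjusted_rand_score(y, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.3f}")
print(f"DBSCAN ARI:  {ari_dbscan:.3f}")
```

Note that DBSCAN's noise label `-1` is treated as just another group by both the table and the score.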

**Lab Questions:**

1) **K-Means Analysis:**

* Why did we use K = 3 for K-Means in this lab? What happens if you change it to 2 or 4 clusters?
* Discuss how K-Means clustering results compare with the true species labels.

2) **Hierarchical Clustering Analysis:**

* What insights can you gather from the dendrogram? How does hierarchical clustering differ from K-Means?
* Could you "cut" the dendrogram at a different point to obtain a different number of clusters?

3) **DBSCAN Analysis:**

* What effect does changing the **eps** and **min_samples** parameters have on DBSCAN clustering?
* How does DBSCAN handle noise compared to K-Means and Hierarchical Clustering?

4) **Overall Comparison:**

* Compare the results of K-Means, Hierarchical, and DBSCAN clustering. Which algorithm performed the best, and why?
* Which algorithm would you recommend for clustering the Iris dataset and why?