# DBSCAN Clustering on Iris Dataset
**Authors:** Magudeshwaran and Senthilkumaran

**Goal:** Apply the DBSCAN clustering algorithm to group similar data points in the Iris dataset.

### Step 1: Import Libraries
We need `pandas` for loading the data, `numpy` for numerical arrays, `matplotlib` for plotting, and `sklearn` (scikit-learn) for DBSCAN and missing-value imputation.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
from sklearn.cluster import DBSCAN

### Step 2: Load the Data
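We load the Iris dataset as a CSV file from the seaborn-data repository. Each of its 150 rows gives the sepal length, sepal width, petal length, and petal width of one iris flower, along with its species.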

In [None]:
url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
df = pd.read_csv(url)
df.head()

### Step 3: Prepare the Data
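We select two features (`petal_length` and `petal_width`) for clustering. We also pass them through a `SimpleImputer` with the `mean` strategy, which replaces any missing value with the mean of its column. The Iris dataset is usually clean, so this step is a safeguard rather than a necessity.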

In [None]:
X = df[['petal_length', 'petal_width']].values

# Handle potential missing values (though Iris is usually clean)
imputer = SimpleImputer(strategy='mean')
X = imputer.fit_transform(X)
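
To make the imputation step concrete, here is a minimal sketch on a small made-up array rather than the Iris data. The names `toy`, `imputer_demo`, and `filled` are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy array (hypothetical values) with one missing entry in each column
toy = np.array([[1.0, 2.0],
                [np.nan, 4.0],
                [3.0, np.nan]])

imputer_demo = SimpleImputer(strategy='mean')
filled = imputer_demo.fit_transform(toy)

# Column means are computed from the observed values only:
# column 0 -> (1 + 3) / 2 = 2, column 1 -> (2 + 4) / 2 = 3
print(filled)
```

On data with no missing entries, as is expected for Iris, `fit_transform` returns the same values it was given.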

### Step 4: Apply DBSCAN Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points that are closely packed and marks points that lie alone in low-density regions as outliers.

- **`eps`:** The maximum distance between two samples for one to be considered as in the neighborhood of the other.
- **`min_samples`:** The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself.
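
The two parameters above combine into the core-point test, which can be sketched with plain NumPy. This is an illustration of the definition only, not how scikit-learn implements it; the helper `count_neighbors`, the toy points, and the choice of `min_samples = 4` are all assumptions made for this example.

```python
import numpy as np

def count_neighbors(points, eps):
    """Return, for each point, how many points lie within eps (itself included)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))  # pairwise Euclidean distances
    return (dists <= eps).sum(axis=1)

# Hypothetical toy data: four points packed together and one far away
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])

neighbor_counts = count_neighbors(pts, eps=0.35)
is_core = neighbor_counts >= 4  # min_samples = 4 for this toy example

print(neighbor_counts)
print(is_core)
```

A point that fails this test is not automatically noise: if it lies within `eps` of a core point, it joins that cluster as a border point. Only points that are neither core nor border are labeled as noise.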

In [None]:
# Initialize and fit DBSCAN model
dbscan = DBSCAN(eps=0.35, min_samples=30)
clusters = dbscan.fit_predict(X)
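
Before plotting, it can help to check how many clusters and noise points a label array contains. The sketch below uses a made-up stand-in array named `labels` so that it runs on its own; in the notebook, `clusters` would take its place.

```python
import numpy as np

# Stand-in label array (hypothetical); in the notebook, use `clusters` instead
labels = np.array([0, 0, 1, -1, 1, 0, -1, 1])

unique_labels, counts = np.unique(labels, return_counts=True)
n_clusters = int((unique_labels != -1).sum())  # -1 marks noise, not a cluster
n_noise = int((labels == -1).sum())

for label, count in zip(unique_labels, counts):
    name = 'noise' if label == -1 else f'cluster {label}'
    print(f'{name}: {count} points')
print(f'{n_clusters} clusters, {n_noise} noise points')
```

If nearly every point comes back as noise, or everything falls into a single cluster, that is usually a sign that `eps` or `min_samples` needs adjusting for the scale of the features.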

### Step 5: Visualize the Clusters
We plot the data points, colored by their assigned cluster. Points labeled `-1` are considered noise (outliers) by DBSCAN.

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis', marker='o', s=50, edgecolor='k')
plt.title('DBSCAN Clustering of Iris Petal Data')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.colorbar(label='Cluster Label (-1 for Noise)')
plt.grid(True)
plt.show()