# DBSCAN Clustering

**DBSCAN (Density-Based Spatial Clustering of Applications with Noise)** is an unsupervised clustering algorithm that treats clusters as regions of high point density separated by regions of low density.

## Key Ideas:
- Groups together points that lie **close** to one another, i.e. in regions of high density.
- Points in low-density regions are treated as **noise/outliers**.
- Does not require specifying the number of clusters `k` in advance.

### Key Parameters:
- `eps`: Maximum distance between two samples for them to be considered neighbors.
- `min_samples`: Minimum number of points (counting the point itself) that must lie within `eps` of a point for it to be a core point of a dense region.
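
To make the two parameters concrete, here is a minimal sketch in plain Python of how `eps` and `min_samples` split points into core, border, and noise points. The helper `classify_points` and the toy coordinates are hypothetical and exist only for illustration; the sketch follows the scikit-learn convention that a point counts as its own neighbor, and it does not perform the cluster-expansion step of the full algorithm.

```python
from math import dist


def classify_points(points, eps, min_samples):
    """Label each point 'core', 'border', or 'noise' (illustrative helper)."""
    # Neighborhood of each point: indices of all points within eps,
    # including the point itself.
    neighborhoods = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    # A core point has at least min_samples points in its neighborhood.
    is_core = [len(nbrs) >= min_samples for nbrs in neighborhoods]

    kinds = []
    for i, nbrs in enumerate(neighborhoods):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in nbrs):
            # Not dense itself, but within eps of a core point.
            kinds.append("border")
        else:
            kinds.append("noise")
    return kinds


# Toy data: a tight square of four points, one nearby point, one far away.
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5), (1.2, 0.5), (5.0, 5.0)]
kinds = classify_points(points, eps=0.8, min_samples=4)
print(kinds)
```

With these values the four corner points are core points, `(1.2, 0.5)` is a border point (it is within `eps` of the core point `(0.5, 0.5)` only), and `(5.0, 5.0)` is noise.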

## Import Libraries and Dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Load Iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)

# Standardize the dataset (important for DBSCAN: eps is a single distance
# threshold, so features on larger scales would otherwise dominate it)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Preview the first five rows
X.head()

## Apply DBSCAN Clustering

In [None]:
# Create DBSCAN model
dbscan = DBSCAN(eps=0.6, min_samples=5)
clusters = dbscan.fit_predict(X_scaled)

# Add cluster labels to dataset (label -1 marks noise points)
X['cluster'] = clusters
# Number of points per cluster label
X['cluster'].value_counts()
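
Beyond `value_counts()`, the fitted estimator exposes a few attributes that help when inspecting the result. The sketch below repeats the same preprocessing and parameters so that it runs on its own; the variable names `n_clusters`, `n_noise`, and `n_core` are just illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Same preprocessing and parameters as in the cell above.
X_scaled = StandardScaler().fit_transform(load_iris().data)
dbscan = DBSCAN(eps=0.6, min_samples=5).fit(X_scaled)
labels = dbscan.labels_

# Noise is labeled -1 and is not a cluster, so exclude it from the count.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
# Indices of core samples, i.e. points with a dense eps-neighborhood.
n_core = len(dbscan.core_sample_indices_)

print(f"clusters: {n_clusters}, noise points: {n_noise}, core points: {n_core}")
```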

## Visualize Clusters

In [None]:
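# Plot the first two (unscaled) features, colored by cluster label;
# noise points (label -1) get their own color
plt.scatter(X.iloc[:, 0], X.iloc[:, 1], c=clusters, cmap='plasma', alpha=0.7)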
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('DBSCAN Clustering on Iris Dataset')
plt.show()

## Observations
- DBSCAN automatically found clusters based on **density**.
- Points labeled as `-1` are considered **outliers/noise**.
- Unlike KMeans, we don’t need to specify the number of clusters.
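- In general, DBSCAN works well for datasets with irregular (non-convex) cluster shapes, because clusters grow by linking dense neighborhoods rather than around a centroid.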
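
The Iris data does not show the point about irregular shapes very clearly, so here is a small sketch on a synthetic two-moons dataset that compares DBSCAN with KMeans. It is an illustration only: `make_moons`, the value `eps=0.2`, and the use of the adjusted Rand index are choices made for this example and are not part of the analysis above, and `eps=0.2` is a guess rather than a tuned value.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-circles: clusters that are not convex blobs.
X_moons, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# DBSCAN with an illustrative (untuned) eps; KMeans is told k=2 up front.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X_moons)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_moons)

# Adjusted Rand index: 1.0 means perfect agreement with the true moons.
db_ari = adjusted_rand_score(y_true, db_labels)
km_ari = adjusted_rand_score(y_true, km_labels)
print(f"DBSCAN ARI: {db_ari:.3f}")
print(f"KMeans ARI: {km_ari:.3f}")
```

A higher score for DBSCAN here would reflect that it follows the curved shape of each moon, whereas KMeans splits the plane with a straight boundary between two centroids.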

## Key Notes:
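- Choosing the right `eps` and `min_samples` is important: too small an `eps` labels most points as noise, while too large an `eps` merges distinct clusters.
- Can struggle with datasets of varying density, because a single global `eps` is applied to every cluster.
- Very useful for anomaly detection: points labeled `-1` are natural outlier candidates.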
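
One common heuristic for picking `eps`, which the notebook above does not use, is the k-distance curve: for each point, compute the distance to its `min_samples`-th nearest neighbor, sort those distances, and look for the "elbow" where they start to rise sharply. The sketch below assumes the same standardized Iris data and `min_samples=5`; the printed percentiles are only a rough summary, and reading the elbow remains a judgment call.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)
min_samples = 5  # same value as in the DBSCAN call above

# Querying the training points themselves returns each point as one of its
# own neighbors (distance 0), which matches DBSCAN counting the point itself.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_scaled)
distances, _ = nn.kneighbors(X_scaled)

# Distance to the min_samples-th neighbor, sorted ascending.
k_distances = np.sort(distances[:, -1])

# Rough summary of the curve; candidate eps values sit near the elbow.
for q in (50, 75, 90, 95):
    print(f"{q}th percentile of k-distance: {np.percentile(k_distances, q):.3f}")
```

Plotting `k_distances` with `plt.plot(k_distances)` gives the usual visual version of this curve.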