# DBSCAN

- [Clustering with DBSCAN, Clearly Explained!!!](https://youtu.be/RDZUdRSDOok?t=293)
- [Sklearn: DBSCAN](https://scikit-learn.org/stable/modules/clustering.html#dbscan)
- [datascientest](https://datascientest.com/machine-learning-clustering-dbscan)
- [Analyticsvidhya: DBSCAN](https://www.analyticsvidhya.com/blog/2020/09/how-dbscan-clustering-works/)

**DBSCAN (Density-Based Spatial Clustering of Applications with Noise)** is a popular clustering algorithm used in data analysis and **pattern recognition**. It groups data points by density: high-density regions become clusters, and points in low-density regions are classified as noise (outliers).

![9951609.fig.003.jpg](attachment:9951609.fig.003.jpg)

![1_GZQsTGh1s3fAIQUx9QQntw.png](attachment:1_GZQsTGh1s3fAIQUx9QQntw.png)

![1_yT96veo7Zb5QeswV7Vr7YQ.png](attachment:1_yT96veo7Zb5QeswV7Vr7YQ.png)
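A quick way to see density-based grouping and the noise label in action: the sketch below runs scikit-learn's `DBSCAN` on two interleaving half-moons from `make_moons`, plus one hand-placed point far away from both. The dataset, the extra point at `(3, 3)`, and the values `eps=0.2`, `min_samples=5` are illustrative assumptions, not tuned recommendations. Dense regions become clusters even though they are not round, while the isolated point receives the noise label `-1`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: dense, but not spherical, clusters
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
# Append one isolated point far away from both moons (illustrative outlier)
X = np.vstack([X, [[3.0, 3.0]]])

# eps and min_samples are illustrative values chosen for this data
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Noise (label -1) is not counted as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
print("label of the isolated point:", labels[-1])  # -1 means noise
```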

______________

# DBSCAN Clustering with scikit-learn

### - Introduction

**DBSCAN** (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm used in machine learning to group together data points based on their density. In this document, we'll provide a step-by-step explanation of DBSCAN, followed by an implementation using scikit-learn in Python.

### - Key Concepts

- **Epsilon (eps):**
  - Maximum distance between two samples for one to be considered in the neighborhood of the other.
  
- **MinPts (min_samples):**
  - Minimum number of samples in a neighborhood (including the point itself) for a point to be considered a core point.
  
- **Core Points:**
  - Points that have at least MinPts samples (counting themselves) within epsilon distance.
  
- **Border Points:**
  - Points that are not core points themselves but lie within epsilon distance of at least one core point.
  
- **Noise:**
  - Points that are neither core nor border are considered noise.
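To make these definitions concrete, the sketch below fits scikit-learn's `DBSCAN` and splits the points into core, border, and noise using the fitted `core_sample_indices_` and `labels_` attributes. It then re-derives the core points directly from the definition, assuming Euclidean distance and a neighborhood that includes points at distance `<= eps` (the point itself included). The data and parameters match the scikit-learn example later in this document.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)
eps, min_samples = 0.5, 5

db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)

# Core points as reported by scikit-learn
core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True
noise_mask = db.labels_ == -1           # noise points carry label -1
border_mask = ~core_mask & ~noise_mask  # in a cluster, but not core

# Re-derive core points from the definition (assumption: Euclidean, <= eps):
# at least min_samples points, counting the point itself, within eps
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
neighbor_counts = (dists <= eps).sum(axis=1)
manual_core = neighbor_counts >= min_samples

print("core:", core_mask.sum(), "border:", border_mask.sum(), "noise:", noise_mask.sum())
print("definition matches sklearn:", np.array_equal(manual_core, core_mask))
```

Every border point should have at least one core point within `eps`, and no noise point should.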

## DBSCAN Algorithm Overview

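1. **Initialization:**
   - Set the parameters: epsilon (ε), the neighborhood radius, and MinPts, the minimum number of points in a dense neighborhood.
   - Choose an arbitrary unvisited data point.

2. **Core Points:**
   - Retrieve the point's ε-neighborhood. If it contains at least MinPts points, the point is a core point and starts a new cluster; otherwise it is provisionally marked as noise.

3. **Expand Cluster:**
   - Add every point in the core point's ε-neighborhood to the cluster. For each added point that is itself a core point, add its neighbors too, and continue until no new points are density-reachable.

4. **Border Points:**
   - Non-core points reached during expansion join the cluster as border points, but their own neighbors are not expanded. A point provisionally marked as noise can become a border point this way.

5. **Noise/Outliers:**
   - Points that are never reached from any core point remain noise (scikit-learn labels them `-1`).

6. **Repeat:**
   - Pick the next unvisited point and return to step 2 until all points have been visited.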
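The steps above can be sketched directly in NumPy. This is a minimal, hypothetical implementation (`dbscan_from_scratch` is our own name, not a scikit-learn function), assuming Euclidean distance and a small dataset, since it builds the full pairwise distance matrix. It expands clusters breadth-first; scikit-learn's implementation differs internally, so cluster ids may be numbered differently and, in rare cases, a border point that touches two clusters may end up in the other one.

```python
import numpy as np
from collections import deque
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def dbscan_from_scratch(X, eps, min_samples):
    """Minimal DBSCAN sketch: returns one label per point, -1 for noise."""
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory, fine for small datasets)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # eps-neighborhood of every point; each point counts as its own neighbor
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)            # every point starts as noise
    visited = np.zeros(n, dtype=bool)  # points already queued for some cluster
    cluster_id = 0
    for i in range(n):
        # Only an unvisited core point can seed a new cluster
        if visited[i] or not is_core[i]:
            continue
        queue = deque([i])
        visited[i] = True
        while queue:
            p = queue.popleft()
            labels[p] = cluster_id
            if not is_core[p]:
                continue               # border point: joins, but is not expanded
            for q in neighbors[p]:
                if not visited[q]:
                    visited[q] = True
                    queue.append(q)
        cluster_id += 1
    return labels


# Usage on the same kind of data as the scikit-learn example
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)
labels = dbscan_from_scratch(X, eps=0.5, min_samples=5)

# Compare with scikit-learn (cluster ids may be numbered differently)
ref = DBSCAN(eps=0.5, min_samples=5).fit(X)
print("clusters (sketch):", labels.max() + 1)
print("clusters (sklearn):", ref.labels_.max() + 1)
```

Points that never become reachable keep their initial `-1`, so noise needs no separate pass.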

## DBSCAN Implementation in Python (scikit-learn)

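```python
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Generate synthetic 2-D data: 300 points around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Create the DBSCAN instance: eps is the neighborhood radius,
# min_samples is the neighborhood size (including the point itself) for a core point
dbscan = DBSCAN(eps=0.5, min_samples=5)

# Fit the model and get a cluster label for each point (-1 means noise)
labels = dbscan.fit_predict(X)

# Summarize the result; noise is not counted as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"Estimated clusters: {n_clusters}, noise points: {n_noise}")

# Visualize the results (noise points share the color assigned to label -1)
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=60, edgecolors='k')
plt.title('DBSCAN Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
```
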
![1689773520294.jpeg](attachment:1689773520294.jpeg)
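Because the result depends heavily on `eps`, it can help to rerun the same fit over a few values and compare. The `eps` values below are arbitrary choices for illustration, with `min_samples=5` held fixed. With `min_samples` fixed, a larger `eps` can only keep or shrink the number of noise points, while the number of clusters can move either way: very small values tend to label most points as noise or split blobs into fragments, and very large values can merge nearby blobs.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

results = []
for eps in [0.1, 0.2, 0.3, 0.5, 0.8, 1.5]:  # illustrative values only
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    results.append((eps, n_clusters, n_noise))
    print(f"eps={eps:<4} clusters={n_clusters:<3} noise={n_noise}")
```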