<h1 align=center> DBSCAN In Depth </h1>

![dbscanprofile.png](attachment:dbscanprofile.png)

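- DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise
- Unsupervised learning algorithm
- Handles outliers by labeling points in low-density regions as noise
- Is distance-based, so features on very different scales usually need to be scaled first for a single ε to be meaningful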

### **Key Concepts:**

![dbscan3.png](attachment:dbscan3.png)

- **Parameters to Tune in DBSCAN:**
    - **Epsilon (ε):** The distance threshold that defines a point's neighborhood, i.e. the maximum distance between two points for them to be considered neighbors (`eps` in scikit-learn)
    - **MinPts:** The minimum number of points required within an ε-neighborhood for a point to be considered a core point (`min_samples` in scikit-learn)
- **Core Point:** A point with at least MinPts points within its ε-neighborhood, including itself
- **Border Point:** A point that is not a core point but falls within the ε-neighborhood of a core point
- **Noise Point:** A point that is neither a core point nor a border point; it is treated as noise (an outlier)
- **Density:** The number of data points within a given region of space. In DBSCAN it is measured locally, as the number of points inside the ε-neighborhood of each point in the dataset.
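
The definitions above can be turned into a few lines of NumPy. The following is a minimal sketch, not a library routine: the function name `classify_points` and the toy coordinates are made up for illustration, and Euclidean distance is assumed.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label each row of X as 'core', 'border', or 'noise' (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Boolean neighborhood matrix; every point counts as its own neighbor.
    neighbors = dists <= eps
    # Core: at least min_pts points (itself included) within eps.
    is_core = neighbors.sum(axis=1) >= min_pts
    # Border: not core, but inside the eps-neighborhood of some core point.
    is_border = ~is_core & (neighbors & is_core[None, :]).any(axis=1)
    kinds = np.full(len(X), "noise", dtype=object)
    kinds[is_border] = "border"
    kinds[is_core] = "core"
    return kinds

# Hypothetical toy data: three close points, one straggler, one far-away point.
X = [[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [0.7, 0.0], [5.0, 5.0]]
kinds = classify_points(X, eps=0.5, min_pts=3)
print(kinds)  # ['core' 'core' 'core' 'border' 'noise']
```

The fourth point has only one other point within ε, so it is not a core point, but that neighbor is a core point, which makes it a border point.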

### **How DBSCAN Works:**

1. **Initialization:** DBSCAN picks an arbitrary unvisited point from the dataset.
2. **Neighborhood Search:** It finds all points within that point's ε-neighborhood.
3. **Core Point Identification:** The point is a core point if its ε-neighborhood contains at least MinPts points, counting itself. Core points form the dense interior of clusters.
4. **Cluster Formation:** Starting from a core point, DBSCAN adds every point in its ε-neighborhood to the cluster, then repeats the expansion from each neighbor that is itself a core point. All points that are density-reachable from the starting core point in this way end up in the same cluster.
5. **Border Point Assignment:** A border point joins the cluster of a core point whose ε-neighborhood contains it. If it is within reach of core points from different clusters, it goes to whichever cluster reaches it first.
6. **Noise Classification:** Once every point has been visited, points that are neither core nor border points are labeled noise (outliers). They do not belong to any cluster.
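
The steps above can be sketched from scratch as follows. This is a simplified illustration, not scikit-learn's implementation: `dbscan_sketch` is a hypothetical name, it uses a brute-force distance matrix with Euclidean distance, and it scans points in index order instead of picking them at random.

```python
from collections import deque
import numpy as np

NOISE = -1

def dbscan_sketch(X, eps, min_pts):
    """Return one cluster label per row of X; -1 marks noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Step 2: epsilon-neighborhood of every point (brute force, O(n^2)).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighborhoods = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    # Step 3: core points have at least min_pts neighbors, themselves included.
    is_core = np.array([len(nb) >= min_pts for nb in neighborhoods])

    labels = np.full(n, NOISE)
    cluster_id = 0
    for start in range(n):
        # Only an unassigned core point can seed a new cluster.
        if labels[start] != NOISE or not is_core[start]:
            continue
        labels[start] = cluster_id
        queue = deque([start])
        # Step 4: breadth-first expansion through core points.
        while queue:
            p = queue.popleft()
            for q in neighborhoods[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster_id  # Step 5: border points get a label...
                    if is_core[q]:
                        queue.append(q)     # ...but only core points keep expanding.
        cluster_id += 1
    # Step 6: anything never reached keeps the NOISE label.
    return labels

# Small toy array: two tight groups and one far-away point.
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
labels = dbscan_sketch(X, eps=3, min_pts=2)
print(labels)  # [ 0  0  0  1  1 -1]
```

A queue is used instead of literal recursion so that large clusters cannot hit Python's recursion limit; the set of points reached is the same.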

### **Practical Example:**

- The raw data is shown below

![dbscan1.png](attachment:dbscan1.png)

- For this example, MinPts = 3 and ε = 0.5
- DBSCAN picks an unvisited point at random
- For each point, it finds all points within its ε-neighborhood
- Each point is classified as a core point, border point, or noise point, and the clusters are formed

![dbscan2.png](attachment:dbscan2.png)

- Red points: core points
- Yellow points: border points
- Purple points: noise points
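
The same three roles can be recovered from a fitted scikit-learn model. The sketch below uses the example's settings (MinPts = 3, ε = 0.5), but the coordinates are hypothetical, since the data behind the figures is only available as an image.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical points (NOT the data in the figures): two small groups,
# one straggler near the first group, and one isolated point.
X = np.array([[1.0, 1.0], [1.2, 1.1], [1.1, 1.3], [1.6, 1.2],
              [4.0, 4.0], [4.2, 4.1], [4.1, 3.8],
              [8.0, 1.0]])

model = DBSCAN(eps=0.5, min_samples=3).fit(X)

# core_sample_indices_ lists the core points; label -1 marks noise.
core_mask = np.zeros(len(X), dtype=bool)
core_mask[model.core_sample_indices_] = True
noise_mask = model.labels_ == -1
# Border points belong to a cluster but are not core points.
border_mask = ~core_mask & ~noise_mask

print("labels:", model.labels_)                  # [ 0  0  0  0  1  1  1 -1]
print("core:  ", np.flatnonzero(core_mask))      # [0 1 2 4 5 6]
print("border:", np.flatnonzero(border_mask))    # [3]
print("noise: ", np.flatnonzero(noise_mask))     # [7]
```

Point 3 has only one other point within ε, so it cannot be a core point, but it is attached to the first cluster because that neighbor is a core point.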

### **Advantages:**

- **No Predefined Cluster Count:** Unlike K-Means, DBSCAN doesn't require specifying the number of clusters, making it suitable for data with unknown cluster structures.
- **Robust to Outliers:** DBSCAN can effectively handle outliers that fall outside dense regions and label them as noise.
- **Flexible Shape Detection:** DBSCAN can find clusters of arbitrary shapes, unlike K-Means, which tends to produce roughly spherical (convex) clusters.
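
To make the shape argument concrete, here is a small comparison on scikit-learn's two-moons dataset. It is a sketch under assumed settings: the sample size, noise level, `eps=0.2`, and `min_samples=5` are choices made for this illustration, and the adjusted Rand index is used only as a convenient agreement score against the known moon membership.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: non-convex clusters.
X, y_true = make_moons(n_samples=1000, noise=0.05, random_state=42)

# DBSCAN is not told how many clusters to find.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# K-Means must be given the cluster count up front.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# Agreement with the true moon membership (1.0 means a perfect match).
ari_dbscan = adjusted_rand_score(y_true, db_labels)
ari_kmeans = adjusted_rand_score(y_true, km_labels)
print(f"DBSCAN ARI:  {ari_dbscan:.3f}")
print(f"K-Means ARI: {ari_kmeans:.3f}")
```

K-Means splits the plane with a straight boundary between two centroids, so it cuts across both moons, while DBSCAN follows each dense arc.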

### **Limitations:**

- **Parameter Sensitivity:** Results can change substantially depending on the choice of ε and MinPts
- **High Time Complexity:** Calculating distances and identifying core points can be computationally expensive for large datasets; without a spatial index, the neighborhood queries take O(n²) time in the worst case
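
The parameter sensitivity is easy to see by sweeping ε over the same six points used in the scikit-learn cell of this notebook. The particular ε values below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])

results = {}
for eps in (0.5, 3, 10, 100):
    # Same data and min_samples every time; only eps changes.
    results[eps] = DBSCAN(eps=eps, min_samples=2).fit_predict(X).tolist()
    print(f"eps={eps:>5}: {results[eps]}")

# eps=  0.5: [-1, -1, -1, -1, -1, -1]   everything is noise
# eps=    3: [0, 0, 0, 1, 1, -1]        two clusters, one outlier
# eps=   10: [0, 0, 0, 0, 0, -1]        the two groups merge
# eps=  100: [0, 0, 0, 0, 0, 0]         the outlier is absorbed too
```

The data never changes, yet the result ranges from "all noise" to "one big cluster", which is why ε and MinPts have to be chosen with the scale of the data in mind.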

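```python
from sklearn.cluster import DBSCAN
import numpy as np

# Six 2-D points: two tight groups and one far-away point
X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])

# eps is ε and min_samples is MinPts (the point itself is counted)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
print(clustering.labels_)  # one cluster index per point; -1 marks noise

print(clustering)
```

```
[ 0  0  0  1  1 -1]
DBSCAN(eps=3, min_samples=2)
```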
