# DBSCAN - Density-based spatial clustering of applications with noise

The examples below show cases where DBSCAN clusters the data better than K-means. The clustering model should match the shape of the data: K-means works well on compact, roughly spherical groups, while DBSCAN can follow clusters of arbitrary shape.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Data exploration

In [None]:
blobs = pd.read_csv('blobs.csv')
blobs.head()

In [None]:
sns.scatterplot(data=blobs, x='X1', y='X2')

In [None]:
moons = pd.read_csv('moons.csv')
moons.head()

In [None]:
sns.scatterplot(data=moons, x='X1', y='X2')

In [None]:
circles = pd.read_csv('circles.csv')
circles.head()

In [None]:
sns.scatterplot(data=circles, x='X1', y='X2')

## K-means vs DBSCAN

The following utility function fits a model to the data and plots the resulting cluster labels.

In [None]:
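def display_categories(model, data):
    """Fit `model` on `data` and plot the points colored by cluster label."""
    # fit_predict returns one label per row; DBSCAN marks noise points as -1.
    labels = model.fit_predict(data)
    sns.scatterplot(data=data, x='X1', y='X2', hue=labels, palette='Set1')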

### K-means

In [None]:
from sklearn.cluster import KMeans

import warnings
warnings.filterwarnings('ignore')

In [None]:
model = KMeans(n_clusters=3, n_init=10)
display_categories(model, blobs)

K-means cannot properly cluster data shaped like the moons below, because it assigns each point to the nearest centroid and therefore favors compact, convex groups.

In [None]:
model = KMeans(n_clusters=2, n_init=10)
display_categories(model, moons)

The circles data below is another case where K-means produces a poor clustering.

In [None]:
model = KMeans(n_clusters=2, n_init=10)
display_categories(model, circles)

### DBSCAN

In [None]:
from sklearn.cluster import DBSCAN

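With the default parameters (`eps=0.5`, `min_samples=5`), DBSCAN labels points that lie outside every dense region as noise (label `-1`). These outliers appear as the red points in the plot below.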

In [None]:
model = DBSCAN()
display_categories(model, blobs)
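
The noise label is easiest to see on a toy array. The sketch below is a minimal illustration, assuming six hand-picked points that are not taken from `blobs.csv`: five packed closely together and one far away. With the scikit-learn defaults, the five close points form one cluster and the distant point is labelled `-1`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (hypothetical, not from blobs.csv): five tightly packed points
# plus one point far away from them.
points = np.array([
    [0.00, 0.00],
    [0.00, 0.10],
    [0.10, 0.00],
    [0.10, 0.10],
    [0.05, 0.05],
    [5.00, 5.00],
])

# Defaults are eps=0.5 and min_samples=5 (the point itself is counted).
labels = DBSCAN().fit_predict(points)
print(labels)

# Noise points carry the label -1; every other label is a cluster id.
n_noise = int(np.sum(labels == -1))
n_clusters = len(set(labels.tolist()) - {-1})
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```

Counting the `-1` labels this way gives a quick check of how many points a given parameter setting treats as outliers.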

With a smaller neighborhood radius (`eps=0.15`), DBSCAN correctly clusters the moons and circles data.

In [None]:
model = DBSCAN(eps=0.15)
display_categories(model, moons)
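
To get a feel for how sensitive the result is to `eps`, one option is to sweep a few values and count clusters and noise points. The sketch below is only an illustration: it uses `make_moons` as a synthetic stand-in for `moons.csv` (an assumption, since the real file may differ in size, scale, and noise level), and `summarize` is a hypothetical helper, not part of scikit-learn.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Synthetic stand-in for moons.csv (assumption: the real file may differ
# in size, scale, and noise level).
X, _ = make_moons(n_samples=500, noise=0.05, random_state=42)


def summarize(eps, data, min_samples=5):
    """Return (number of clusters, number of noise points) for one eps."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(data).tolist()
    n_clusters = len(set(labels) - {-1})  # -1 is the noise label, not a cluster
    n_noise = labels.count(-1)
    return n_clusters, n_noise


# Very small eps: no point has enough neighbors, so everything is noise.
# Very large eps: every point is reachable, so everything merges into one cluster.
results = {eps: summarize(eps, X) for eps in (1e-6, 0.05, 0.15, 0.5, 10.0)}
for eps, (n_clusters, n_noise) in results.items():
    print(f"eps={eps:<6} clusters={n_clusters} noise={n_noise}")
```

A useful `eps` sits between the two extremes, where the cluster count matches the structure visible in the scatter plot and few points are marked as noise.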

In [None]:
model = DBSCAN(eps=0.15)
display_categories(model, circles)
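
The visual comparison can also be expressed as a number when true labels are available. The sketch below is a hedged illustration rather than part of the original analysis: it generates concentric circles with `make_circles` (an assumption, since `circles.csv` may not carry ground-truth labels) and scores both models with the adjusted Rand index, where 1.0 means perfect agreement with the true grouping and values near 0 mean agreement no better than chance.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for circles.csv with known ground-truth labels
# (assumption: the real file may differ in scale and noise level).
X, y_true = make_circles(n_samples=500, factor=0.5, noise=0.03, random_state=42)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.15).fit_predict(X)

# Adjusted Rand index: compares a clustering against the true grouping,
# independent of how the cluster ids are numbered.
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)

print(f"K-means ARI: {kmeans_ari:.3f}")
print(f"DBSCAN ARI:  {dbscan_ari:.3f}")
```

On this synthetic data K-means splits the plane in half and cuts through both rings, so its score stays low, while DBSCAN follows each ring.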