***DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.***

### In DBSCAN there are three types of points:
* Core point
* Border point
* Noise point

### Core point:
A point is a core point if at least minPts points (including the point itself) lie within its surrounding area of radius eps.

### Border point:
A border point is not a core point itself, but it is reachable from one: it lies within eps of a core point and has fewer than minPts points within its own surrounding area. In DBSCAN, border points are part of a cluster without being dense themselves, so every cluster member that is not a core point is a border point. The follow-up algorithm HDBSCAN discards the concept of border points.

### Noise point:
A noise point is a point that is neither a core point nor a border point. Such points are outliers that are not associated with any dense cluster. In figure 1, the blue point is identified as a noise point.

<img src='dbscan.png' />

[Article for DBSCAN Understanding](https://medium.com/@agarwalvibhor84/lets-cluster-data-points-using-dbscan-278c5459bee5)
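
The three definitions above can be sketched directly in NumPy. This is a minimal illustration, not the scikit-learn implementation: the helper `classify_points`, the toy coordinates, and the values `eps=1.0` and `min_pts=3` are all assumptions chosen for the example.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label each row of X as 'core', 'border', or 'noise' (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Neighbourhood membership; every point counts itself (distance 0).
    neighbors = dists <= eps
    # Core: at least min_pts points within eps, including the point itself.
    is_core = neighbors.sum(axis=1) >= min_pts
    # Border: not core, but at least one core point lies within eps.
    near_core = (neighbors & is_core[None, :]).any(axis=1)
    kinds = np.where(is_core, "core", np.where(near_core, "border", "noise"))
    return kinds.tolist()

# Toy data: a small square, one point hanging off its side, one far away.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [8.0, 8.0]]
kinds = classify_points(points, eps=1.0, min_pts=3)
for point, kind in zip(points, kinds):
    print(point, kind)
```

The point at `[2.0, 1.0]` has only one neighbour besides itself, so it is not dense, but that neighbour is a core point, which makes it a border point. The point at `[8.0, 8.0]` has no neighbours at all and ends up as noise.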

## Advantages:
* Separates high-density regions from low-density regions well within a dataset.
* Handles outliers in the dataset well.

## Disadvantages:
* Does not work well with clusters of varying densities. DBSCAN separates dense regions from sparse ones well, but a single eps and minPts apply to the whole dataset, so it struggles when the clusters themselves differ in density.
* Struggles with high-dimensional data.
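
The varying-density limitation can be seen on a small synthetic example. The sketch below is only an illustration under assumed values: the `grid` helper, the three toy groups, and both `eps` settings are made up for the demonstration and are unrelated to the mall dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def grid(spacing, offset):
    """Return a 5x5 grid of 2-D points with the given spacing and (x, y) offset."""
    ticks = np.arange(5) * spacing
    xx, yy = np.meshgrid(ticks, ticks)
    return np.column_stack([xx.ravel(), yy.ravel()]) + np.asarray(offset)

# Two dense groups close to each other and one sparse group far away.
dense_a = grid(0.1, (0.0, 0.0))
dense_b = grid(0.1, (1.0, 0.0))
sparse_c = grid(1.0, (10.0, 10.0))
X_toy = np.vstack([dense_a, dense_b, sparse_c])

def labels_per_group(eps, min_samples=4):
    """Return the distinct DBSCAN labels found in each of the three groups."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_toy)
    groups = np.split(labels, [25, 50])
    return [sorted(set(g.tolist())) for g in groups]

small_eps = labels_per_group(eps=0.15)  # tuned to the dense groups
large_eps = labels_per_group(eps=1.1)   # tuned to the sparse group
print("eps=0.15:", small_eps)
print("eps=1.1: ", large_eps)
```

With the small `eps` the two dense groups are separated but the whole sparse group is labelled `-1` (noise). With the large `eps` the sparse group becomes a cluster, but the two dense groups merge into one. Neither setting recovers all three groups.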

In [1]:
import pandas as pd
import numpy as np

In [4]:
df = pd.read_csv('Mall_customer.csv')
df.head()

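| Unnamed: 0 | CustomerID | Genre | Age | Annual Income (k$) | Spending Score (1-100) |
|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15 | 39 |
| 1 | 2 | Male | 21 | 15 | 81 |
| 2 | 3 | Female | 20 | 16 | 6 |
| 3 | 4 | Female | 23 | 16 | 77 |
| 4 | 5 | Female | 31 | 17 | 40 |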


In [5]:
X = df.iloc[:, [3, 4]].values


# Create the DBSCAN model; unlike k-means, the number of clusters is not specified in advance
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=3, min_samples=4)  # eps: neighbourhood radius, min_samples: minimum points for a core point
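
The values `eps=3` and `min_samples=4` are set by hand here. One common heuristic for choosing `eps`, which the notebook does not use, is to sort every point's distance to its k-th nearest neighbour and look for a sharp jump (an "elbow") in the sorted values. The sketch below shows the idea on assumed toy data; `X_toy` and its coordinates are hypothetical, and picking `k` equal to `min_samples` is a convention, not a rule.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data (an assumption, not the mall dataset): a tight 5x5 grid plus two far outliers.
ticks = np.arange(5) * 0.1
xx, yy = np.meshgrid(ticks, ticks)
X_toy = np.vstack([np.column_stack([xx.ravel(), yy.ravel()]),
                   [[10.0, 10.0], [-10.0, 5.0]]])

min_samples = 4
# Querying the training data returns each point itself at distance 0, so the
# last column is the distance to the min_samples-th point counting the point itself.
distances, _ = NearestNeighbors(n_neighbors=min_samples).fit(X_toy).kneighbors(X_toy)
k_dist = np.sort(distances[:, -1])

# Grid points have a small k-distance, the outliers a large one; a candidate
# eps sits just below the jump between the two.
print("smallest:", k_dist[:3])
print("largest: ", k_dist[-3:])
```

On real data the jump is rarely this clean, so the sorted curve is usually plotted and read by eye.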

In [6]:
# Fitting the model

model=dbscan.fit(X)

labels=model.labels_

In [7]:
from sklearn import metrics

# Build a boolean mask that marks the core points
sample_cores=np.zeros_like(labels,dtype=bool)

sample_cores[dbscan.core_sample_indices_]=True

In [8]:
# Count the clusters; the label -1 marks noise and is not counted as a cluster

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)



print(metrics.silhouette_score(X,labels))

-0.1908319132560097
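
When reading a silhouette score for DBSCAN output, note that `silhouette_score` treats the noise label `-1` as one more cluster, so scattered noise points can pull the score down. The sketch below uses assumed toy data rather than the mall dataset, and does not claim this is why the score above is negative. It compares the score over all points with the score over clustered points only; `X_demo` and the parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Toy data: two tight 5x5 grids and four scattered outliers.
ticks = np.arange(5) * 0.1
xx, yy = np.meshgrid(ticks, ticks)
blob = np.column_stack([xx.ravel(), yy.ravel()])
outliers = np.array([[-20.0, -20.0], [20.0, 20.0], [-20.0, 20.0], [20.0, -20.0]])
X_demo = np.vstack([blob, blob + [5.0, 0.0], outliers])

demo_labels = DBSCAN(eps=0.15, min_samples=4).fit_predict(X_demo)
n_noise = int(np.sum(demo_labels == -1))

# Score over every point: the noise points are scored as if -1 were a cluster.
score_all = silhouette_score(X_demo, demo_labels)

# Score over clustered points only (needs at least two clusters to be defined).
clustered = demo_labels != -1
score_clustered = silhouette_score(X_demo[clustered], demo_labels[clustered])

print("noise points:", n_noise)
print("silhouette, all points:     ", score_all)
print("silhouette, clustered only: ", score_clustered)
```

Reporting the number of noise points next to the score helps, because a high score over only a handful of clustered points says little about the dataset as a whole.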
