<div class="alert alert-block alert-success">
    <h1 align="center"> DBScan (Weather Stations Dataset)</h1>
    <h3 align="center"><a href="https://github.com/amirhosein-ziaei">Amirhosein Ziaei</a></h3>
</div>

## *What is DBScan?*

***DBSCAN (Density-Based Spatial Clustering of Applications with Noise)*** 

DBSCAN finds core samples of high density and expands clusters from them. ... The neighborhood radius, epsilon (`eps` in scikit-learn), is not a maximum bound on the distances of points within a cluster. It is the most important DBSCAN parameter to choose appropriately for your data set and distance function.

Density-Based Clustering refers to unsupervised learning methods that identify distinctive groups/clusters in the data, based on the idea that a cluster in data space is a contiguous region of high point density, separated from other such clusters by contiguous regions of low point density.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a baseline algorithm for density-based clustering. It can discover clusters of different shapes and sizes in large data sets that contain noise and outliers.
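
To make the idea of "clusters plus noise" concrete, here is a minimal sketch on synthetic data. The two point groups, the isolated point, and the values `eps=1.0` and `min_samples=5` are assumptions chosen for illustration, not values taken from the weather data. scikit-learn marks noise points with the label `-1`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Two tight groups of points plus one isolated point far from both
# (synthetic data, assumed for illustration only).
group_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(30, 2))
group_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(30, 2))
outlier = np.array([[20.0, 20.0]])
X = np.vstack([group_a, group_b, outlier])

# fit_predict returns one label per point; -1 means "noise".
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```

Note that the number of clusters is not passed in anywhere; it falls out of the density structure of the data.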

## *DBScan (Step by Step)*

<p align="center">
    <img src="Images/dbscan_1.jpeg" width="680"/>
</p>

<p align="center">
    <img src="Images/dbscan_2.png" width="680"/>
</p>

DBSCAN usually has two parameters to tune:
* <font color='red'>Epsilon</font> 
* <font color='red'>Minimum Points</font>

<font color='red'>Epsilon $(\epsilon)$ </font> determines how close points must be to each other to be considered part of the same cluster, and <font color='red'>Minimum Points _(MinPts)_ </font> determines how many samples (points) must lie within radius $\epsilon$ of a point for it to be considered a _Core Point_. _MinPts_ includes the point itself.
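
The core-point test described above can be sketched directly in NumPy. The helper name `core_point_mask` and the four sample points are hypothetical, introduced only for illustration; Euclidean distance is assumed. Because every point is at distance 0 from itself, it is counted among its own neighbors, which matches the convention that _MinPts_ includes the point itself.

```python
import numpy as np

def core_point_mask(X, eps, min_pts):
    """Return a boolean mask that is True where a row of X is a core point."""
    # Pairwise Euclidean distances, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Count the points within eps of each point; the point itself
    # is at distance 0 and is therefore included in the count.
    neighbor_counts = (dists <= eps).sum(axis=1)
    return neighbor_counts >= min_pts

# Hypothetical toy points: three close together, one far away.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]])
mask = core_point_mask(X, eps=1.0, min_pts=3)
print(mask)
```

Only the first point has three points (itself and two others) within distance 1, so it is the only core point here.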


We start by loading the Canada weather data set. We will __cluster weather stations that show similar weather conditions__, and show feature selection and applications of the clustering. Since the data domain is not well understood, it is best to experiment with the $\epsilon$ and _MinPts_ parameters (`eps` and `min_samples` in scikit-learn).
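
One simple way to experiment with the two parameters is a small grid sweep that reports the number of clusters and noise points for each combination. This is only a sketch: the helper `sweep`, the candidate values, and the synthetic `make_blobs` stand-in data are assumptions, since the weather-station features have not been loaded at this point.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the weather-station features).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
X = StandardScaler().fit_transform(X)

def sweep(X, eps_values, min_pts_values):
    """Fit DBSCAN for every (eps, MinPts) pair and summarise each result."""
    rows = []
    for eps in eps_values:
        for min_pts in min_pts_values:
            labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
            n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
            n_noise = int(np.sum(labels == -1))
            rows.append((eps, min_pts, n_clusters, n_noise))
    return rows

results = sweep(X, eps_values=[0.1, 0.3, 0.5], min_pts_values=[5, 10])
for eps, min_pts, n_clusters, n_noise in results:
    print(f"eps={eps:.1f}  MinPts={min_pts:2d}  clusters={n_clusters}  noise={n_noise}")
```

For a fixed _MinPts_, increasing $\epsilon$ can only turn noise points into cluster members, never the reverse, so the noise count is a useful quick check on whether $\epsilon$ is too small.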

* Core: a point that has at least _MinPts_ points (itself included) within distance $\epsilon$ of itself.
* Border: a point that is not a Core point but has at least one Core point within distance $\epsilon$.
* Noise: a point that is neither a Core nor a Border point. It has fewer than _MinPts_ points within distance $\epsilon$ of itself, and none of them is a Core point.
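
The three point types above can be recovered from a fitted scikit-learn model: `core_sample_indices_` lists the core points, a label of `-1` marks noise, and every other point is a border point. The helper `point_roles` and the one-dimensional toy data are hypothetical, used only to keep the example easy to check by hand.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def point_roles(X, eps, min_pts):
    """Label every row of X as 'core', 'border' or 'noise'."""
    model = DBSCAN(eps=eps, min_samples=min_pts).fit(X)
    # Start from 'border', then overwrite the noise and core points.
    roles = np.full(len(X), "border", dtype=object)
    roles[model.labels_ == -1] = "noise"
    roles[model.core_sample_indices_] = "core"
    return roles

# Hypothetical toy data: four points on a line, one point far away.
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
roles = point_roles(X, eps=1.0, min_pts=3)
print(list(roles))
```

With $\epsilon = 1$ and _MinPts_ $= 3$, the two middle points each have three points within reach and are core points, the two end points of the chain are border points, and the far point is noise.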

<p align="center">
    <img src="Images/dbscan_3.jpeg" width="680"/>
</p>

<p align="center">
    <img src="Images/dbscan_4.png" width="680"/>
</p>

<p align="center">
    <img src="Images/dbscan_5.jpeg" width="680"/>
</p>

<p align="center">
    <img src="Images/dbscan_6.png" width="680"/>
</p>

<p align="center">
    <img src="Images/dbscan_7.png" width="680"/>
</p>

<p align="center">
    <img src="Images/dbscan_8.png" width="680"/>
</p>

<p align="center">
    <img src="Images/dbscan_9.jpeg" width="680"/>
</p>

## *DBScan vs K-means?* 

<p align="center">
    <img src="Images/dbscan_10.jpeg" width="680"/>
</p>
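
The difference between the two algorithms is easy to try out on a toy data set. The sketch below is only an illustration under stated assumptions: it uses scikit-learn's synthetic `make_moons` data (two interleaved half-circles), hand-picked values `eps=0.2` and `min_samples=5`, and the adjusted Rand index as an agreement score against the known moon membership.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-circles: clusters that are not convex blobs.
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

# K-means needs the number of clusters up front and separates points
# by distance to the nearest centroid.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN follows regions of high density and needs no cluster count.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true moons.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN  ARI: {ari_dbscan:.2f}")
```

On this kind of data K-means tends to cut each moon in half, while DBSCAN can follow the curved shapes, provided $\epsilon$ is smaller than the gap between the moons.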

## *DBScan Animation*

http://primo.ai/index.php?title=Density-Based_Spatial_Clustering_of_Applications_with_Noise_(DBSCAN)

## *Importing Libraries*

In [1]:
import numpy as np 
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns