# DBSCAN and OPTICS with R

**CS5483 Data Warehousing and Data Mining**

___

This Jupyter notebook demonstrates how to cluster the iris2D data set using density-based methods. It uses the language *R* and can be run live using an [R kernel](https://github.com/IRkernel/IRkernel).

## Setup

The following code loads the iris data and creates the `iris2D` data set:

In [None]:
data("iris") # load the iris data set
x <- as.matrix(iris[, 1:2]) # keep the first two input attributes: sepal length and sepal width
plot(x)

DBSCAN and OPTICS are implemented in the following package:

In [None]:
library(dbscan) # for DBSCAN and OPTICS
help(package="dbscan") # More information about the package

## DBSCAN

DBSCAN is implemented by the function `dbscan`:

In [None]:
?dbscan

To apply DBSCAN to the iris data set with $\varepsilon=0.3$ and $\text{minPts} = 4$:

In [None]:
db <- dbscan(x, eps = .3, minPts = 4)
db
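To see what the two parameters control, the core-point condition can be checked by hand. The following is a minimal base-R sketch, not the package's implementation: it assumes Euclidean distance and that a point counts as its own neighbor, and the names `d`, `n_neighbors` and `is_core` are introduced here for illustration only.

```r
data("iris")
x <- as.matrix(iris[, 1:2])

eps <- 0.3
minPts <- 4

# Pairwise Euclidean distances between all points
d <- as.matrix(dist(x))

# Number of points within eps of each point (the point itself is counted)
n_neighbors <- rowSums(d <= eps)

# A point is a core point if its eps-neighborhood holds at least minPts points
is_core <- n_neighbors >= minPts
table(is_core)
```

Points that are not core points end up either as border points of a cluster or as noise, depending on whether a core point lies within `eps` of them.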

To visualize the clustering solution, we can plot the points in different clusters with different colors:

In [None]:
pairs(x, col = db$cluster + 1L)

**Exercise** What are the points colored in black? 

YOUR ANSWER HERE

For each data point, we can calculate the [local outlier factor (LOF)](https://en.wikipedia.org/wiki/Local_outlier_factor), which quantifies how much of a local outlier a point is by comparing its local density, computed from reachability distances, with that of its neighbors:

In [None]:
lof_scores <- lof(x, minPts = 5) # separate name so the function `lof` is not shadowed
pairs(x, cex = lof_scores) # plot the points with size scaled by the LOF score
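Besides the plot, it can help to list the points with the largest scores. The sketch below recomputes the scores so that it runs on its own; the names `lof_scores` and `top` and the choice of showing five points are arbitrary assumptions for illustration.

```r
library(dbscan)
data("iris")
x <- as.matrix(iris[, 1:2])

# LOF score of every point, using the same minPts as above
lof_scores <- lof(x, minPts = 5)

# Indices of the five points with the largest LOF scores
top <- order(lof_scores, decreasing = TRUE)[1:5]

# Show their coordinates next to their scores
data.frame(index = top, x[top, ], lof = lof_scores[top])
```

Scores close to 1 indicate a point whose density is similar to that of its neighbors, while scores well above 1 suggest an outlier.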

## OPTICS

OPTICS is implemented by the function `optics`:

In [None]:
?optics

To apply OPTICS with $\varepsilon=1$ and $\text{minPts} = 4$:

In [None]:
opt <- optics(x, eps=1, minPts = 4)
plot(opt)
opt
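The reachability plot is drawn from two components of the returned object. As a rough sketch, assuming the object exposes `order` and `reachdist` as described in `?optics`, the plotted values can be recovered as follows (the name `reach` is introduced here for illustration):

```r
library(dbscan)
data("iris")
x <- as.matrix(iris[, 1:2])

opt <- optics(x, eps = 1, minPts = 4)

# Order in which OPTICS visited the points (indices into the rows of x)
head(opt$order)

# Reachability distances rearranged into the visiting order,
# i.e. the bar heights of the reachability plot from left to right
reach <- opt$reachdist[opt$order]
head(reach)
```

Valleys in `reach` correspond to dense regions, and tall bars mark the jumps between them.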

We can identify the clusters with a threshold, say 0.3, on the reachability distance:

In [None]:
opt <- extractDBSCAN(opt, eps_cl = .3)
plot(opt)
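Cutting the reachability plot at 0.3 should give a clustering very similar to running DBSCAN directly with $\varepsilon=0.3$. The sketch below cross-tabulates the two sets of labels to check this; the name `opt_cl` is introduced here for illustration, and the cluster numbering and the assignment of some border points may differ between the two methods.

```r
library(dbscan)
data("iris")
x <- as.matrix(iris[, 1:2])

# Clusters extracted from the OPTICS ordering with threshold 0.3
opt <- optics(x, eps = 1, minPts = 4)
opt_cl <- extractDBSCAN(opt, eps_cl = 0.3)

# Clusters from DBSCAN with the same eps and minPts
db <- dbscan(x, eps = 0.3, minPts = 4)

# Rows: OPTICS-extracted labels; columns: DBSCAN labels (0 means noise)
table(optics = opt_cl$cluster, dbscan = db$cluster)
```

If the two solutions agree, each row of the table has most of its counts in a single column.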

**Exercise** Identify the cluster boundaries using a minimum steepness threshold instead of a fixed reachability threshold. You can call `extractXi` with your own choice of parameters.

In [None]:
# your R code here
# end of R code

plot(opt)
hullplot(x,opt)
opt