# Exercise - Density-based Spatial Clustering of Applications with Noise (DBSCAN) in R
This notebook is designed to help get you familiar with using the DBSCAN algorithm in R through a simple exercise. 

In this notebook, we will load and explore 'seeds.txt', a dataset containing measurements of the geometrical properties of kernels belonging to three different varieties of wheat.

The DBSCAN algorithm groups data points that are closely packed together into clusters, and labels points in sparse regions as noise, based on the parameters **eps** and **minPts**.

The **eps** parameter is the radius of the neighborhood around each point.

The **minPts** parameter is the minimum number of points that must lie within this neighborhood for it to count as a dense area.
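
To make the two parameters concrete, here is a minimal base R sketch that counts how many points fall within **eps** of each point and flags the ones that reach **minPts**. The toy coordinates and the variable names are made up for illustration. The count includes the point itself, which is the convention documented for the dbscan package.

```r
# Toy 2-D data (hypothetical values): four tightly packed points and one far away
pts <- matrix(c(0.00, 0.00,
                0.05, 0.00,
                0.00, 0.05,
                0.05, 0.05,
                1.00, 1.00),
              ncol = 2, byrow = TRUE)

eps    <- 0.08  # neighborhood radius
minPts <- 4     # points required inside the radius, counting the point itself

# Pairwise Euclidean distances between all points
d <- as.matrix(dist(pts))

# Number of points within eps of each point
# (the diagonal is 0, so every point counts itself)
neighbor_counts <- rowSums(d <= eps)

# A point lies in a dense area when its neighborhood holds at least minPts points
is_core <- neighbor_counts >= minPts

print(neighbor_counts)
print(is_core)
```

The four packed points each have four points in their neighborhood and qualify as dense, while the isolated fifth point only finds itself.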

This exercise covers:

1. Installing and loading the DBSCAN library and the data in R
2. Creating a subset from the dataset
3. Using the dbscan function to obtain clusters
4. Converting the clusters to factors 
5. Attaching the clusters to the measurements
6. Visualizing the results


## Installing and loading the DBSCAN library and the data in R
We begin by installing the dbscan package and loading it along with the dataset.



In [None]:
# Installing the library 'dbscan'
install.packages("dbscan", dependencies = TRUE)
library('dbscan')
# Downloading the file in the Data Scientist Workbench
download.file("https://raw.githubusercontent.com/domwarr/R-Mining/master/seeds.txt", "/resources/seeds.txt")
# Load the tab-separated data from the path it was downloaded to
seeds <- read.csv("/resources/seeds.txt", sep = "\t")
head(seeds)

    area perimeter compactness length width asymmetry_coeff
1   15.26     14.84      0.8710  5.763 3.312           2.221
2   14.88     14.57      0.8811  5.554 3.333           1.018
3   14.29     14.09      0.9050  5.291 3.337           2.699
4   13.84     13.94      0.8955  5.324 3.379           2.259
5   16.14     14.99      0.9034  5.658 3.562           1.355
6   14.38     14.21      0.8951  5.386 3.312           2.462
    groove_length type
1         5.220    1
2         4.956    1
3         4.825    1
4         4.805    1
5         5.175    1
6         4.956    1


## Creating a subset from the dataset


In [None]:
# Creating the subset: The subset contains the kernel width and the kernel length
seeds.sub <- subset(seeds, select = c(width,length))
head(seeds.sub)


   width length
1 3.312  5.763
2 3.333  5.554
3 3.337  5.291
4 3.379  5.324
5 3.562  5.658
6 3.312  5.386


## Using the dbscan function to obtain clusters


In [None]:
# Use the dbscan function to obtain clusters and store them in clusters_assignments1 
# For this exercise we will use eps = 0.08 and minPts = 4
# Note: you can change the values to see the formation of different clusters
clusters_assignments1 <- dbscan(seeds.sub, eps = .08, minPts = 4)
clusters_assignments1 


DBSCAN clustering for 210 objects.
Parameters: eps = 0.08, minPts = 4
The clustering contains 4 cluster(s) and 40 noise points.

  0   1   2   3   4
 40 128   8   4  30
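
When trying other values, one common heuristic for picking **eps** is to look at the sorted distance from each point to its k-th nearest neighbor and choose a value near the bend in that curve. The sketch below does this in base R on simulated data, not on seeds.txt; the data, the seed, and the variable names are assumptions for illustration. The dbscan package also provides `kNNdistplot()` for this kind of plot.

```r
# Simulated stand-in for two measurement columns (hypothetical data, not seeds.txt)
set.seed(42)
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.1), ncol = 2),  # dense group of 30 points
  matrix(rnorm(60, mean = 2, sd = 0.1), ncol = 2),  # second dense group of 30 points
  matrix(runif(10, min = -1, max = 3), ncol = 2)    # 5 scattered points
)

k <- 3  # minPts - 1 neighbors, because minPts = 4 counts the point itself

# Distance from every point to its k-th nearest neighbor
d <- as.matrix(dist(toy))
# Position 1 of each sorted row is the point itself (distance 0), so take k + 1
knn_dist <- apply(d, 1, function(row) sort(row)[k + 1])

# Sorted k-NN distances: a sharp bend suggests a candidate eps
plot(sort(knn_dist), type = "l",
     xlab = "Points sorted by distance",
     ylab = paste0(k, "-NN distance"))
```

Points in dense groups have small k-NN distances and scattered points have large ones, so an **eps** just below the bend tends to keep the dense groups together while leaving the scattered points as noise.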


## Converting the clusters to factors


In [None]:
# Convert the cluster labels to a factor so each cluster can be plotted in its own color
clusters_assignments1$cluster <- as.factor(clusters_assignments1$cluster)
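
The conversion matters because dbscan labels noise as 0, and a factor stores its levels as integer codes starting at 1. The following small sketch uses made-up labels to show that mapping; the vector contents and names are illustrative only.

```r
# Hypothetical cluster labels in the form dbscan reports them: 0 marks noise
labels <- c(0, 1, 1, 2, 0, 4)

cluster_factor <- as.factor(labels)

# Levels are the sorted distinct labels
print(levels(cluster_factor))

# The integer codes behind the factor start at 1, so label 0 becomes code 1
codes <- as.integer(cluster_factor)
print(codes)

# In base R's default palette, color index 1 is black
print(palette()[1])
```

Because the noise label 0 becomes code 1, noise points are drawn in the first palette color when the factor is passed as `col` to `plot()`.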




## Attaching the clusters to the measurements


In [None]:
# Combine the cluster assignments with the subset
seeds.sub$cluster_no <- clusters_assignments1$cluster
head(seeds.sub)


  width length cluster_no
1 3.312  5.763          1
2 3.333  5.554          1
3 3.337  5.291          2
4 3.379  5.324          2
5 3.562  5.658          0
6 3.312  5.386          2
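
Since the dataset also records the wheat variety in the `type` column, one way to inspect the assignments is to cross-tabulate variety against cluster. The sketch below uses only the six rows printed in this notebook, typed in by hand; the data frame name `first_six` is an assumption for illustration.

```r
# First six rows as printed in this notebook: wheat variety and assigned cluster
first_six <- data.frame(
  type       = c(1, 1, 1, 1, 1, 1),
  cluster_no = factor(c(1, 1, 2, 2, 0, 2))
)

# Rows are varieties, columns are cluster labels (0 is noise)
comparison <- table(type = first_six$type, cluster = first_six$cluster_no)
print(comparison)
```

With the full data loaded, the same idea would be `table(seeds$type, seeds.sub$cluster_no)`.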


## Visualizing the results


In [None]:
# Visualize results (noise is shown in black) using a simple plot
plot(seeds.sub$width, seeds.sub$length, col = clusters_assignments1$cluster, pch = 16, main = "Scatterplot Displaying Clusters", xlab = "Kernel Width", ylab = "Kernel Length")

legend(x = 2.6, y = 6.75, legend = levels(clusters_assignments1$cluster), col = c(1:5), pch = 16, title = "Clusters")


### Thank you for completing this exercise!

Notebook created by: Dominique Warren


<hr>
Copyright &copy; 2016 [Big Data University](https://bigdatauniversity.com/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).