# Exercise - Density-based Spatial Clustering of Applications with Noise (DBSCAN) in R
This notebook introduces the DBSCAN algorithm in R through a simple exercise.

In this notebook, we will load and explore 'seeds.txt' in R, which is a dataset containing measurements of geometrical properties of kernels belonging to three different varieties of wheat. Specifically, this exercise covers:

1. Installing and loading the DBSCAN library and the data in R
2. Creating a subset from the dataset
3. Using the dbscan function to obtain clusters
4. Converting the clusters to factors 
5. Attaching the clusters to the measurements
6. Visualizing the results


## Installing and loading the DBSCAN library and the data in R
We begin by installing the dbscan package, then load it along with the dataset.



In [None]:
# Installing the library 'dbscan'
install.packages("dbscan", dependencies = TRUE)
library('dbscan')
# Downloading the file in the Data Scientist Workbench
download.file("https://raw.githubusercontent.com/domwarr/R-Mining/master/seeds.txt", "/resources/seeds.txt")
# Load data from the path it was downloaded to (the file is tab-separated)
seeds <- read.csv("/resources/seeds.txt", sep = "\t")
head(seeds)

## Creating a subset from the dataset


In [None]:
# Creating the subset: The subset contains the kernel width and the kernel length
seeds.sub <- subset(seeds, select = c(width,length))
head(seeds.sub)


## Using the dbscan function to obtain clusters


In [None]:
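# Use the dbscan function to obtain clusters and store them in clusters_assignments1
# eps is the neighborhood radius; minPts is the minimum number of points (including the point itself) required for a core point
clusters_assignments1 <- dbscan(seeds.sub, eps = .08, minPts = 4)
# Print a summary of the clustering (cluster 0 holds the noise points)
clusters_assignments1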
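
To make the roles of `eps` and `minPts` concrete, the following is a minimal base-R sketch of the core-point test that DBSCAN applies. The data frame `pts` is made-up toy data (not the seeds measurements), and the names `pts`, `neighbours`, and `is_core` are hypothetical. It illustrates the idea only and does not reproduce how the dbscan package is implemented.

```r
# Toy data, invented for illustration: four points close together and one far away
pts <- data.frame(
  width  = c(1.00, 1.05, 1.02, 1.04, 3.00),
  length = c(2.00, 2.03, 1.98, 2.05, 5.00)
)
eps <- 0.08
minPts <- 4

# Pairwise Euclidean distances between all points
d <- as.matrix(dist(pts))

# Number of points within eps of each point (the point itself is counted)
neighbours <- rowSums(d <= eps)

# A point is a core point when its eps-neighborhood holds at least minPts points
is_core <- neighbours >= minPts
is_core
```

In this toy set, the four nearby points each have four points within `eps` and qualify as core points, while the isolated fifth point does not and would be treated as noise.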


## Converting the clusters to factors


In [None]:
# Convert the cluster IDs to a factor so that every cluster, including noise (ID 0), maps to a distinct plot color
clusters_assignments1$cluster <- as.factor(clusters_assignments1$cluster)
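
The small sketch below shows why the factor conversion helps when coloring points. The vector `raw_ids` is a made-up example of cluster IDs, not output from the seeds data. When a factor is passed as `col`, base graphics uses its integer codes, so the noise ID 0 becomes code 1 (the first palette color) rather than color 0, which base graphics treats as the background color.

```r
# Hypothetical cluster IDs, where 0 marks noise
raw_ids <- c(0, 1, 2, 0, 1)

# Converting to a factor sorts the distinct IDs into levels
ids <- as.factor(raw_ids)
id_levels <- levels(ids)

# The integer codes start at 1, so noise (level "0") gets code 1
id_codes <- as.integer(ids)
id_codes
```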


## Attaching the clusters to the measurements


In [None]:
# Combine the cluster assignments with the subset
seeds.sub$cluster_no <- clusters_assignments1$cluster
head(seeds.sub)
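
Once the cluster labels sit next to the measurements, a quick tabulation shows how many points fall in each cluster. The sketch below uses a made-up data frame `toy` in place of `seeds.sub`; only the column name `cluster_no` is taken from the cell above, and the values are invented.

```r
# Toy stand-in for seeds.sub with an attached cluster column (values are invented)
toy <- data.frame(
  width      = c(3.1, 3.3, 3.2, 2.8, 3.4, 3.0),
  cluster_no = factor(c(0, 1, 1, 2, 1, 0))
)

# Count the points assigned to each cluster label
sizes <- table(toy$cluster_no)
sizes

# Points labelled 0 are the ones DBSCAN marks as noise
n_noise <- sum(toy$cluster_no == "0")
n_noise
```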


## Visualizing the results


In [None]:
# Visualize results (noise is shown in black) using a simple plot
plot(seeds.sub$width, seeds.sub$length,
     col = clusters_assignments1$cluster, pch = 16,
     main = "Scatterplot Displaying Clusters",
     xlab = "Kernel Width", ylab = "Kernel Length")
# Legend colors follow the factor levels so they match the colors used in the plot
legend(x = 2.6, y = 6.75, legend = levels(clusters_assignments1$cluster),
       col = seq_along(levels(clusters_assignments1$cluster)), pch = 16, title = "Clusters")


### References:

https://en.wikipedia.org/wiki/DBSCAN <br>
https://cran.r-project.org/web/packages/dbscan/dbscan.pdf <br>
https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf <br>
http://www.cse.buffalo.edu/~jing/cse601/fa12/materials/clustering_density.pdf <br>

<hr>
Copyright &copy; 2016 [Big Data University](https://bigdatauniversity.com/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).