kselection: Selection of k in k-means clustering
kselection implements the method proposed by Pham, Dimov and Nguyen in 2004.
To install the latest development build of kselection directly from GitHub, run:

    library(devtools)
    install_github('drodriguezperez/kselection')
kselection implements the method proposed by Pham, Dimov and Nguyen for selecting the number of clusters for the K-means algorithm. In this method a function $f(K)$ is used to evaluate the quality of the resulting clustering and help decide on the optimal value of $K$ for each data set.
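As a rough illustration of how $f(K)$ can be computed, the sketch below follows the definition given in the Pham, Dimov and Nguyen paper, using base R's `kmeans`. It is not the package's implementation: the function name `f_k_sketch` and its arguments `max_k` and `nstart` are made up for this example, and the distortion $S_K$ is assumed to be the total within-cluster sum of squares reported by `kmeans`.

```r
# Illustrative sketch of f(K); hypothetical helper, not part of kselection.
f_k_sketch <- function(x, max_k = 10, nstart = 25) {
  x   <- as.matrix(x)
  n_d <- ncol(x)  # number of dimensions N_d

  # S_K: total within-cluster sum of squares for K = 1, ..., max_k
  s_k <- sapply(seq_len(max_k), function(k) {
    kmeans(x, centers = k, nstart = nstart)$tot.withinss
  })

  alpha <- rep(NA_real_, max_k)  # weights alpha_K, defined for K >= 2
  f_k   <- rep(1, max_k)         # f(1) = 1 by definition

  for (k in seq_len(max_k)[-1]) {
    # alpha_2 = 1 - 3 / (4 N_d); alpha_K = alpha_{K-1} + (1 - alpha_{K-1}) / 6
    alpha[k] <- if (k == 2) {
      1 - 3 / (4 * n_d)
    } else {
      alpha[k - 1] + (1 - alpha[k - 1]) / 6
    }
    # f(K) = S_K / (alpha_K * S_{K-1}); stays at 1 when S_{K-1} is zero
    if (s_k[k - 1] != 0) {
      f_k[k] <- s_k[k] / (alpha[k] * s_k[k - 1])
    }
  }
  f_k
}

# Two well-separated clusters in two dimensions
set.seed(1)
dat <- rbind(matrix(rnorm(200,  2, .1), 100, 2),
             matrix(rnorm(200, -2, .1), 100, 2))
f <- f_k_sketch(dat, max_k = 6)
which.min(f)  # candidate number of clusters: the K with the smallest f(K)
```

Small values of $f(K)$ indicate that moving from $K-1$ to $K$ clusters reduces the distortion far more than would be expected for uniformly distributed data, which is why the minimum is taken as the candidate for $K$.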

    library(kselection)

    # Create a data set with two clusters
    dat <- matrix(c(rnorm(100, 2, .1), rnorm(100, 3, .1),
                    rnorm(100, -2, .1), rnorm(100, -3, .1)), 200, 2)

    # Run the method
    sol <- kselection(dat)

    # Get the results
    k   <- num_clusters(sol) # optimal number of clusters
    f_k <- get_f_k(sol)      # the f(k) vector

    # Plot the results
    plot(sol)
- D. T. Pham, S. S. Dimov, and C. D. Nguyen, "Selection of K in K-means clustering", Mechanical Engineering Science, 2004, pp. 103-119.
I would like to thank Harold Pimentel for all of their helpful discussions during the development of the package.