An implementation of the gap statistic algorithm to compute the number of clusters in a set of numerical data.
Latest commit aabafb1 (Feb 17, 2011) by Edwin Chen.

About

An implementation of the gap statistic algorithm from Tibshirani, Walther, and Hastie's "Estimating the number of clusters in a data set via the gap statistic". A description of the algorithm can be found here.
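As a rough illustration of the idea, the gap statistic compares log(W_k), the log of the pooled within-cluster dispersion for k clusters, with its average over reference data sets drawn uniformly from the bounding box of the data: Gap(k) = E*[log(W_k)] - log(W_k). The estimated number of clusters is the smallest k with Gap(k) >= Gap(k+1) - s_(k+1), where s_k is the simulation error. The following is a minimal sketch in base R, not the code in gap-statistic.R: the helper names (`within_dispersion`, `reference_sample`, `gap_sketch`) and the defaults (`max_k`, `n_ref`) are hypothetical, and k-means with squared Euclidean distance is assumed as the clustering method.

```r
# Minimal gap statistic sketch (hypothetical helper names, not the repo's API).

# Pooled within-cluster sum of squares W_k from a k-means fit.
within_dispersion <- function(data, k) {
  if (k == 1) {
    # One cluster: squared distances of all points to the column means.
    return(sum(scale(data, center = TRUE, scale = FALSE)^2))
  }
  kmeans(data, centers = k, nstart = 10)$tot.withinss
}

# Reference data: each feature drawn uniformly over its observed range.
reference_sample <- function(data) {
  apply(data, 2, function(col) runif(length(col), min(col), max(col)))
}

gap_sketch <- function(data, max_k = 6, n_ref = 20) {
  stopifnot(max_k >= 2)
  data <- as.matrix(data)
  ks <- seq_len(max_k)

  # log(W_k) on the observed data.
  log_w <- sapply(ks, function(k) log(within_dispersion(data, k)))

  # log(W_k) on each reference data set: rows = references, columns = k.
  ref_log_w <- t(replicate(n_ref, {
    ref <- reference_sample(data)
    sapply(ks, function(k) log(within_dispersion(ref, k)))
  }))

  gap <- colMeans(ref_log_w) - log_w

  # Simulation error s_k = sd_k * sqrt(1 + 1/B), with sd_k using the 1/B form.
  sd_k <- apply(ref_log_w, 2, function(v) sqrt(mean((v - mean(v))^2)))
  s <- sd_k * sqrt(1 + 1 / n_ref)

  # Smallest k such that Gap(k) >= Gap(k + 1) - s_(k + 1); fall back to max_k.
  ok <- which(gap[-max_k] >= gap[-1] - s[-1])
  k_hat <- if (length(ok) > 0) ok[1] else max_k

  list(gap = gap, s = s, k_hat = k_hat)
}

# Usage on three simulated clusters in 2 dimensions.
set.seed(1)
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5))
y <- c(rnorm(20, mean = 0), rnorm(20, mean = 5), rnorm(20, mean = 0))
result <- gap_sketch(cbind(x, y))
result$k_hat
```

Because both k-means and the reference sampling are random, the estimate can vary between runs; fixing the seed and raising `n_ref` makes it more stable.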

Examples

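    # Load the implementation from this repository (defines gap_statistic)
    source("gap-statistic.R")

    # Single cluster in 5 dimensions: 20 points, each coordinate standard normal
    data = cbind(rnorm(20), rnorm(20), rnorm(20), rnorm(20), rnorm(20))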

    png("examples/1_cluster_5d_gaps.png")
    gap_statistic(data)
    dev.off()

Figure: gap statistic plot for a single cluster in 5 dimensions (examples/1_cluster_5d_gaps.png)

    # Three clusters in 2 dimensions
    x = c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5))
    y = c(rnorm(20, mean = 0), rnorm(20, mean = 5), rnorm(20, mean = 0))
    data = cbind(x, y)

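    library(ggplot2)  # provides qplot

    png("examples/3_clusters_2d.png")
    print(qplot(x, y))  # explicit print so the plot is drawn when run via source()
    dev.off()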

Figure: scatter plot of 3 clusters in 2 dimensions (examples/3_clusters_2d.png)

    png("examples/3_clusters_2d_gaps.png")
    gap_statistic(data)
    dev.off()

Figure: gap statistic plot for 3 clusters in 2 dimensions (examples/3_clusters_2d_gaps.png)

    # Four clusters in 3 dimensions
    x = c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5), rnorm(20, mean = -10))
    y = rnorm(80, mean = 0)
    z = c(rnorm(40, mean = -5), rnorm(40, mean = 0))
    data = cbind(x, y, z)

    library(scatterplot3d)  # provides scatterplot3d

    png("examples/4_clusters_3d.png")
    scatterplot3d(x, y, z)
    dev.off()

Figure: 3D scatter plot of 4 clusters in 3 dimensions (examples/4_clusters_3d.png)

    png("examples/4_clusters_3d_gaps.png")
    gap_statistic(data)
    dev.off()

Figure: gap statistic plot for 4 clusters in 3 dimensions (examples/4_clusters_3d_gaps.png)
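To cross-check a result, the same simulated data can be run through `clusGap` from the `cluster` package, which is a separate implementation of the gap statistic and not part of this repository. The sketch below assumes the `cluster` package is installed and uses k-means as the clustering function; `d.power = 2` is chosen because the package documentation describes it as the setting that matches Tibshirani et al., and `K.max`, `B`, and `nstart` are arbitrary illustrative values.

```r
library(cluster)  # assumed installed; provides clusGap() and maxSE()

set.seed(1)

# Four clusters in 3 dimensions, simulated as in the example above.
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5), rnorm(20, mean = -10))
y <- rnorm(80, mean = 0)
z <- c(rnorm(40, mean = -5), rnorm(40, mean = 0))
data <- cbind(x, y, z)

# Gap statistic for k = 1..8 with 50 reference data sets; nstart is passed on to kmeans.
gap <- clusGap(data, FUNcluster = kmeans, K.max = 8, B = 50,
               d.power = 2, verbose = FALSE, nstart = 10)

# Smallest k with Gap(k) >= Gap(k + 1) - s_(k + 1), the rule from the paper.
k_hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "Tibs2001SEmax")
k_hat
```

The two implementations draw their reference data independently, so small differences in the gap values are expected even when the chosen number of clusters agrees.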
