An implementation of the gap statistic algorithm to compute the number of clusters in a set of numerical data.
latest commit aabafb19d5
Edwin Chen authored

About

An implementation of the gap statistic algorithm from Tibshirani, Walther, and Hastie's "Estimating the number of clusters in a data set via the gap statistic". A description of the algorithm can be found here.
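As a rough illustration of the idea, the sketch below compares the log of the within-cluster sum of squares on the data with its average over uniformly sampled reference data sets, then picks the smallest k whose gap is within one standard error of the next gap. This is not the code in `gap-statistic.R`: the function `gap_sketch` and its parameters `max_k` and `n_ref` are hypothetical names chosen for this example, and it assumes `stats::kmeans` as the clustering method.

```r
# Hypothetical helper for illustration only; not the repository's gap_statistic().
gap_sketch = function(data, max_k = 6, n_ref = 10) {
  data = as.matrix(data)

  # Log of the total within-cluster sum of squares for a k-means fit with k clusters
  log_wk = function(x, k) log(kmeans(x, centers = k, nstart = 10)$tot.withinss)

  # Reference data set: each column drawn uniformly over that column's observed range
  ref_sample = function() {
    apply(data, 2, function(col) runif(length(col), min(col), max(col)))
  }
  refs = replicate(n_ref, ref_sample(), simplify = FALSE)

  gaps = numeric(max_k)
  s = numeric(max_k)
  for (k in 1:max_k) {
    ref_logs = sapply(refs, log_wk, k = k)
    gaps[k] = mean(ref_logs) - log_wk(data, k)
    # Standard deviation with a 1/n_ref denominator, inflated for simulation error
    s[k] = sqrt(mean((ref_logs - mean(ref_logs))^2)) * sqrt(1 + 1 / n_ref)
  }

  # Smallest k with gap(k) >= gap(k + 1) - s(k + 1); fall back to max_k if none
  ok = which(gaps[-max_k] >= gaps[-1] - s[-1])
  best_k = if (length(ok) > 0) ok[1] else max_k
  list(gaps = gaps, s = s, k = best_k)
}

# Usage: two well-separated clusters in 2 dimensions
set.seed(1)
data = cbind(c(rnorm(20, mean = 0), rnorm(20, mean = 10)),
             c(rnorm(20, mean = 0), rnorm(20, mean = 10)))
result = gap_sketch(data)
result$k  # estimated number of clusters
```

The result depends on the random reference samples and on the k-means starts, so repeated runs without a fixed seed can give different estimates.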

Examples

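    # Single cluster in 5 dimensions
    source("gap-statistic.R")  # defines gap_statistic()
    data = cbind(rnorm(20), rnorm(20), rnorm(20), rnorm(20), rnorm(20))

    # Save the plot drawn by gap_statistic() as a PNG
    png("examples/1_cluster_5d_gaps.png")
    gap_statistic(data)
    dev.off()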

Figure: gap statistic for a single cluster in 5 dimensions (examples/1_cluster_5d_gaps.png)

    # Three clusters in 2 dimensions
    library(ggplot2)  # provides qplot()
    x = c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5))
    y = c(rnorm(20, mean = 0), rnorm(20, mean = 5), rnorm(20, mean = 0))
    data = cbind(x, y)

    png("examples/3_clusters_2d.png")
    print(qplot(x, y))  # explicit print() so the plot is also drawn when run via source()
    dev.off()

Figure: scatter plot of 3 clusters in 2 dimensions (examples/3_clusters_2d.png)

    png("examples/3_clusters_2d_gaps.png")
    gap_statistic(data)
    dev.off()

Figure: gap statistic for 3 clusters in 2 dimensions (examples/3_clusters_2d_gaps.png)

    # Four clusters in 3 dimensions
    library(scatterplot3d)  # provides scatterplot3d()
    x = c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5), rnorm(20, mean = -10))
    y = rnorm(80, mean = 0)  # no cluster structure along y
    z = c(rnorm(40, mean = -5), rnorm(40, mean = 0))
    data = cbind(x, y, z)

    png("examples/4_clusters_3d.png")
    scatterplot3d(x, y, z)
    dev.off()

Figure: 3D scatter plot of 4 clusters in 3 dimensions (examples/4_clusters_3d.png)

    png("examples/4_clusters_3d_gaps.png")
    gap_statistic(data)
    dev.off()

Figure: gap statistic for 4 clusters in 3 dimensions (examples/4_clusters_3d_gaps.png)
