An implementation of the gap statistic algorithm to compute the number of clusters in a set of numerical data. Written in R by Edwin Chen.

About

An implementation of the gap statistic algorithm from Tibshirani, Walther, and Hastie, "Estimating the number of clusters in a data set via the gap statistic" (Journal of the Royal Statistical Society: Series B, 2001). A description of the algorithm can be found here.
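The core computation can be sketched in a few lines of R. This is a minimal sketch, not the repository's gap_statistic(): the names log_wk() and gap_sketch() are hypothetical, k-means is assumed as the clustering method, and reference data are drawn uniformly over each feature's observed range (the simpler of the two reference distributions in the paper). For each k it computes log W_k, the log of the pooled within-cluster sum of squares; averages log W*_k over B uniform reference data sets; and reports Gap(k) = mean(log W*_k) - log W_k with standard error s_k = sd_k * sqrt(1 + 1/B). The estimated number of clusters is the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.

```r
# Minimal gap statistic sketch (after Tibshirani, Walther & Hastie, 2001).
# log_wk() and gap_sketch() are hypothetical names, not the repository's API.

# log W_k: log of the pooled within-cluster sum of squares for k clusters.
# Assumption: k-means is the clustering method.
log_wk <- function(x, k) {
  if (k == 1) {
    # A single cluster: sum of squared deviations from the column means.
    w <- sum(scale(x, center = TRUE, scale = FALSE)^2)
  } else {
    w <- kmeans(x, centers = k, nstart = 10, iter.max = 50)$tot.withinss
  }
  log(w)
}

gap_sketch <- function(x, K.max = 8, B = 20) {
  x <- as.matrix(x)
  n <- nrow(x)
  ks <- seq_len(K.max)
  logW <- vapply(ks, function(k) log_wk(x, k), numeric(1))

  # Reference data: each feature drawn uniformly over its observed range.
  rng <- apply(x, 2, range)  # 2 x p: row 1 = minima, row 2 = maxima
  logW_ref <- matrix(NA_real_, nrow = B, ncol = K.max)
  for (b in seq_len(B)) {
    ref <- apply(rng, 2, function(r) runif(n, min = r[1], max = r[2]))
    logW_ref[b, ] <- vapply(ks, function(k) log_wk(ref, k), numeric(1))
  }

  gap <- colMeans(logW_ref) - logW
  # sd_k uses a 1/B denominator as in the paper (sd() uses 1/(B - 1)).
  sd_k <- apply(logW_ref, 2, sd) * sqrt((B - 1) / B)
  s_k <- sd_k * sqrt(1 + 1 / B)

  # Smallest k with Gap(k) >= Gap(k + 1) - s_{k + 1}; fall back to K.max.
  ok <- which(gap[-K.max] >= gap[-1] - s_k[-1])
  k_hat <- if (length(ok) > 0) ok[1] else K.max
  list(gap = gap, s = s_k, k = k_hat)
}

# Usage on data shaped like the three-cluster example in the Examples section.
set.seed(1)
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5))
y <- c(rnorm(20, mean = 0), rnorm(20, mean = 5), rnorm(20, mean = 0))
res <- gap_sketch(cbind(x, y))
res$k
```

Drawing the reference sets from the bounding box keeps the sketch short; the paper's alternative aligns the box with the data's principal components, which tends to matter more for elongated data.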

Examples

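    # Single cluster in 5 dimensions
    # Assumes the working directory is the repository root, so that
    # gap-statistic.R (which defines gap_statistic()) and examples/ exist.
    source("gap-statistic.R")

    data <- matrix(rnorm(20 * 5), ncol = 5)  # 20 points from one 5-d standard normal

    png("examples/1_cluster_5d_gaps.png")
    gap_statistic(data)  # gap statistic plot, written to the open PNG device
    dev.off()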

Figure: gap statistic plot for a single cluster in 5 dimensions (examples/1_cluster_5d_gaps.png)

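    # Three clusters in 2 dimensions, centered at (0, 0), (3, 5), and (5, 0)
    library(ggplot2)  # for qplot()

    x <- c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5))
    y <- c(rnorm(20, mean = 0), rnorm(20, mean = 5), rnorm(20, mean = 0))
    data <- cbind(x, y)

    png("examples/3_clusters_2d.png")
    print(qplot(x, y))  # ggplot objects must be printed explicitly inside scripts
    dev.off()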

Figure: scatterplot of 3 clusters in 2 dimensions (examples/3_clusters_2d.png)

    png("examples/3_clusters_2d_gaps.png")
    gap_statistic(data)
    dev.off()

Figure: gap statistic plot for 3 clusters in 2 dimensions (examples/3_clusters_2d_gaps.png)
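As an independent cross-check, the `cluster` package (a recommended package normally installed with R) implements the same statistic in clusGap(), and maxSE() with method = "Tibs2001SEmax" applies the paper's selection rule. This sketch assumes k-means as the clustering function and sets d.power = 2, which the clusGap documentation describes as matching Tibshirani et al.'s squared Euclidean distances; clusGap's default reference distribution (spaceH0 = "scaledPCA") is the principal-components-aligned variant rather than a plain bounding box. The seed, B, and K.max are arbitrary choices, so the estimate may differ from the plot above.

```r
library(cluster)  # provides clusGap() and maxSE()

# Regenerate data shaped like the three-cluster example (seed is arbitrary).
set.seed(42)
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5))
y <- c(rnorm(20, mean = 0), rnorm(20, mean = 5), rnorm(20, mean = 0))
data <- cbind(x, y)

# clusGap() calls kmeans(data, k, nstart = 10) for each k > 1 and compares
# log W_k against B reference data sets.
gs <- clusGap(data, FUNcluster = kmeans, K.max = 8, B = 50,
              d.power = 2, verbose = FALSE, nstart = 10)

# Columns: logW, E.logW, gap, SE.sim
print(gs$Tab)

# Smallest k with gap(k) >= gap(k + 1) - SE.sim(k + 1)
k_hat <- maxSE(gs$Tab[, "gap"], gs$Tab[, "SE.sim"], method = "Tibs2001SEmax")
k_hat
```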

    # Four clusters in 3 dimensions, centered at (0, 0, -5), (3, 0, -5), (5, 0, 0), and (-10, 0, 0)
    library(scatterplot3d)  # for scatterplot3d()

    x <- c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5), rnorm(20, mean = -10))
    y <- rnorm(80, mean = 0)
    z <- c(rnorm(40, mean = -5), rnorm(40, mean = 0))
    data <- cbind(x, y, z)

    png("examples/4_clusters_3d.png")
    scatterplot3d(x, y, z)
    dev.off()

Figure: 3-d scatterplot of 4 clusters in 3 dimensions (examples/4_clusters_3d.png)

    png("examples/4_clusters_3d_gaps.png")
    gap_statistic(data)
    dev.off()

Figure: gap statistic plot for 4 clusters in 3 dimensions (examples/4_clusters_3d_gaps.png)
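Once the gap plot suggests a number of clusters, the final partition can be fitted with that k. This sketch assumes k-means and k = 4 (the number of groups the four-cluster example was generated with); the seed and nstart are arbitrary, and `truth` is a hypothetical label vector rebuilt from the order in which x and z were generated. The cross-tabulation shows how closely the fitted clusters line up with the generating groups.

```r
# Regenerate data shaped like the four-cluster example (seed is arbitrary).
set.seed(7)
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5), rnorm(20, mean = -10))
y <- rnorm(80, mean = 0)
z <- c(rnorm(40, mean = -5), rnorm(40, mean = 0))
data <- cbind(x, y, z)

# Generating group of each row: 20 points per group, in the order built above.
truth <- rep(1:4, each = 20)

# Fit k-means with the assumed k = 4; nstart > 1 guards against poor local optima.
fit <- kmeans(data, centers = 4, nstart = 25)

# Rows: generating group; columns: k-means cluster label (labels are arbitrary).
print(table(truth = truth, cluster = fit$cluster))
```

Because k-means labels are arbitrary, a clean recovery shows up as one dominant cell per row rather than as a diagonal.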
