Ruby bindings to Cluster 3.0
Provides bindings to clustering methods in Cluster 3.0.
* K-Means
* Kohonen Self-Organizing Maps
* Tree Cluster or Hierarchical Clustering
    require 'pp'
    require 'flock'

    # 13 rows x 4 columns
    data = [
      [0.1, 0.0, 9.6, 5.6],
      [1.4, 1.3, 0.0, 3.8],
      [1.2, 2.5, 0.0, 4.8],
      [2.3, 1.5, 9.2, 4.3],
      [1.7, 0.7, 9.6, 3.4],
      [0.0, 3.9, 9.8, 5.1],
      [6.7, 3.9, 5.5, 4.8],
      [0.0, 6.3, 5.7, 4.3],
      [5.7, 6.9, 5.6, 4.3],
      [0.0, 2.2, 5.4, 0.0],
      [3.8, 3.5, 5.5, 9.6],
      [0.0, 2.3, 3.6, 8.5],
      [4.1, 4.5, 5.8, 7.6]
    ]

    # mask[i][j] == 0 marks data[i][j] as missing
    mask = [
      [1, 1, 1, 1],
      [1, 1, 0, 1],
      [1, 1, 0, 1],
      [1, 1, 1, 1],
      [1, 1, 1, 1],
      [0, 1, 1, 1],
      [1, 1, 1, 1],
      [0, 1, 1, 1],
      [1, 1, 1, 1],
      [1, 1, 1, 0],
      [1, 1, 1, 1],
      [0, 1, 1, 1],
      [1, 1, 1, 1]
    ]

    weights = Array.new(13) { 1.0 }

    pp Flock.kcluster(6, data, mask: mask)

    # method: (kcluster)
    #   - Flock::METHOD_AVERAGE (kmeans, this is the default)
    #   - Flock::METHOD_MEDIAN  (kmedians)
    #
    # method: (treecluster)
    #   - Flock::METHOD_AVERAGE_LINKAGE (default)
    #   - Flock::METHOD_SINGLE_LINKAGE
    #   - Flock::METHOD_MAXIMUM_LINKAGE
    #   - Flock::METHOD_CENTROID_LINKAGE
    #
    # metric:
    #   - Flock::METRIC_EUCLIDIAN (default)
    #   - Flock::METRIC_CITY_BLOCK
    #   - Flock::METRIC_CORRELATION
    #   - Flock::METRIC_ABSOLUTE_CORRELATION
    #   - Flock::METRIC_UNCENTERED_CORRELATION
    #   - Flock::METRIC_ABSOLUTE_UNCENTERED_CORRELATION
    #   - Flock::METRIC_SPEARMAN
    #   - Flock::METRIC_KENDALL
    #
    # seed: (initial cluster assignment)
    #   - Flock::SEED_RANDOM (uniform random, this is the default)
    #   - Flock::SEED_KMEANS_PLUSPLUS (kmeans++ - initial cluster centers chosen weighted by distance from closest center)
    #   - Flock::SEED_SPREADOUT (similar to kmeans++ but deterministic, spreads out cluster centers)

    pp Flock.kcluster(
      6,
      data,
      mask: mask,
      method: Flock::METHOD_AVERAGE,
      metric: Flock::METRIC_EUCLIDIAN,
      transpose: 0,
      weights: weights,
      seed: Flock::SEED_RANDOM
    )

    pp Flock.treecluster(
      6,
      data,
      mask: mask,
      method: Flock::METHOD_AVERAGE,
      metric: Flock::METRIC_EUCLIDIAN,
      transpose: 0,
      weights: weights
    )
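To make the role of mask and weights more concrete, here is a plain-Ruby sketch of a masked, weighted squared Euclidean distance between two rows. It does not call flock. `masked_distance` is a hypothetical helper, and both the "skip a column if either row is masked" rule and the normalisation by total weight are assumptions modelled on the Cluster 3.0 C library, not something this README specifies.

```ruby
# Hypothetical helper, not part of flock: distance between two rows that
# skips any column where either row is masked out (mask value 0).
def masked_distance(row_a, mask_a, row_b, mask_b, weights)
  sum = 0.0
  total_weight = 0.0
  row_a.each_index do |j|
    next if mask_a[j].zero? || mask_b[j].zero?
    diff = row_a[j] - row_b[j]
    sum += weights[j] * diff * diff
    total_weight += weights[j]
  end
  # Assumption: average over the columns that were actually compared.
  total_weight.zero? ? 0.0 : sum / total_weight
end

# Rows 0 and 1 of the example data; column 2 of row 1 is masked out.
row0 = [0.1, 0.0, 9.6, 5.6]
mask0 = [1, 1, 1, 1]
row1 = [1.4, 1.3, 0.0, 3.8]
mask1 = [1, 1, 0, 1]
column_weights = [1.0, 1.0, 1.0, 1.0]

distance = masked_distance(row0, mask0, row1, mask1, column_weights)
puts distance
```

Only columns 0, 1 and 3 contribute here, so the placeholder 0.0 stored in the masked cell never affects the result.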
    require 'pp'
    require 'flock'

    data = []
    # keys don't need to be numeric
    data << { 1 => 0.5, 2 => 0.5 }
    data << { 3 => 1, 4 => 1 }
    data << { 4 => 1, 5 => 0.3 }
    data << { 2 => 0.75 }
    data << { 1 => 0.60 }

    pp Flock.kcluster(2, data, sparse: true)

    data = []
    # a much simpler way to cluster text labels.
    data << %w(apple orange)
    data << %w(black white)
    data << %w(white cyan)
    data << %w(orange)
    data << %w(apple)

    # additional options such as metric, iterations can be passed in a hash.
    pp Flock.kcluster(2, data, sparse: true)
    pp Flock.treecluster(2, data, sparse: true)
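As a rough picture of what sparse input amounts to, the sketch below expands sparse rows into a dense matrix with one column per distinct key. It is plain Ruby and does not call flock. `densify` is a hypothetical helper, and the choices it makes (labels count as 1.0, absent keys become 0.0, columns ordered by first appearance) are assumptions for illustration rather than flock's documented behaviour.

```ruby
# Hypothetical helper, not part of flock: expand sparse rows into a dense
# matrix with one column per distinct key.
def densify(rows)
  # Accept either hashes (key => value) or arrays of labels (value 1.0 each).
  hashes = rows.map do |row|
    row.is_a?(Hash) ? row : row.map { |label| [label, 1.0] }.to_h
  end

  # Assign each distinct key a column index in order of first appearance.
  columns = {}
  hashes.each do |hash|
    hash.each_key { |key| columns[key] = columns.size unless columns.key?(key) }
  end

  dense = hashes.map do |hash|
    vector = Array.new(columns.size, 0.0) # assumption: absent keys become 0.0
    hash.each { |key, value| vector[columns[key]] = value.to_f }
    vector
  end

  [columns.keys, dense]
end

keys, dense = densify([%w(apple orange), %w(black white), %w(orange)])
p keys
dense.each { |row| p row }
```

Because keys are only used to look up a column index, they can be integers, strings, or any other hashable value.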
Self-Organizing Maps (SOM) require you to specify a 2D grid on which data points can cluster. Some grid points may end up empty while others have clusters mapped to them, so you do not provide a fixed number of clusters; the grid dimensions only set an upper bound (a 2x2 grid yields at most 4 clusters).
    require 'pp'
    require 'flock'

    data = []
    # a much simpler way to cluster text
    data << %w(apple orange)
    data << %w(black white)
    data << %w(white cyan)
    data << %w(orange)
    data << %w(apple)

    # nxgrid, nygrid, data are required.
    # additional options such as metric, iterations can be passed in a hash.
    # cluster up to 4 groups in a 2x2 grid.
    pp Flock.self_organizing_map(2, 2, data, sparse: true)
Note: SOM clustering returns a 2D grid coordinate for each vector, rather than the integer cluster value that kcluster and treecluster return for each vector.
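If downstream code expects one integer label per vector, grid coordinates can be flattened into a single index. The sketch below assumes the coordinates are available as an array of [x, y] pairs. The exact return structure of Flock.self_organizing_map is not described here, so that shape, the helper name `grid_to_cluster_ids`, and the sample coordinates are all hypothetical.

```ruby
# Hypothetical helper, not part of flock: map [x, y] grid coordinates to a
# single integer id in row-major order (id = x * nygrid + y).
def grid_to_cluster_ids(coordinates, nygrid)
  coordinates.map { |x, y| x * nygrid + y }
end

# Made-up coordinates for five vectors on a 2x2 grid (not real flock output).
coordinates = [[0, 0], [1, 1], [1, 1], [0, 0], [0, 1]]
ids = grid_to_cluster_ids(coordinates, 2)

# Group vector indices by cell id to see which vectors landed together.
members = Hash.new { |hash, id| hash[id] = [] }
ids.each_with_index { |id, index| members[id] << index }

p ids
p members
```

Cells that never appear in `members` correspond to empty grid points.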
* kmeans: use kcluster instead
* sparse_kmeans: use kcluster with option sparse: true
* sparse_treecluster: use treecluster with option sparse: true
* sparse_self_organizing_map: use self_organizing_map with option sparse: true

* kmeans, treecluster and self_organizing_map no longer take mask as a parameter.
* mask needs to be passed along with other optional parameters in the options hash.

* Use Sparse Matrix instead of converting sparse data into dense matrices.
* BIRCH hierarchical clustering.
* EM clustering.
* kcluster auto-suggest cluster size.