Description: Clustering algorithms for Clojure
Homepage: http://github.com/tyler/clojure-cluster
Clone URL: git://github.com/tyler/clojure-cluster.git
type       name            age                             message
file       LICENSE         Sun Oct 12 01:05:42 -0700 2008  Initial commit. [tyler]
file       README.textile  Sun Oct 12 01:13:29 -0700 2008  To Do, Known Bugs, and typographic fixes. [tyler]
file       cluster.clj     Thu Nov 06 23:40:33 -0800 2008  Faster version of Pearson and use replicate not... [tyler]
directory  internal/       Fri Nov 07 09:59:34 -0800 2008  More correct Pearson. Returns nil if either ve... [tyler]
file       run-tests.clj   Thu Nov 06 23:41:14 -0800 2008  Tests! [tyler]
directory  test/           Thu Nov 06 23:41:14 -0800 2008  Tests! [tyler]
README.textile

Clustering algorithms for Clojure

Two clustering algorithms for Clojure: k-means and hierarchical.

Usage

  
    (ns my-namespace (:use cluster))    
  

k-means

Use the k-means algorithm like so:

  
    ;; kcluster -- arguments, passed positionally in this order:
    ;;   :vectors - a sequence of vectors which you want clustered
    ;;   :count - number of clusters to find
    ;;   :range-start - lower limit for the randomized cluster nodes
    ;;   :range-end - upper limit for the randomized cluster nodes

    (kcluster [[1 2 3] [3 4 5] [5 6 7]] 2 0 7)
  

So, range-start and range-end may need a bit of clarification. A k-means algorithm works by randomly
placing a number of cluster nodes amongst the vectors you want clustered, then moving those nodes until
each one sits at the center of a cluster. The random starting positions need upper and lower limits.
Usually these are just the highest and lowest possible values for numbers in the vectors which you're clustering.
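
The placement and movement described above can be sketched in a few lines. This is only an illustration, not the code in cluster.clj: the helper names (value-range, random-node, sq-dist, assign, move-nodes) are hypothetical, and squared Euclidean distance is an assumption standing in for whatever similarity measure the library actually uses (the Known Bugs section suggests Pearson).

```clojure
;; Sketch only: none of these helpers are part of the cluster library.

(defn value-range
  "Lowest and highest numbers found anywhere in the input vectors.
   One way to pick range-start and range-end from the data itself."
  [vectors]
  (let [xs (apply concat vectors)]
    [(apply min xs) (apply max xs)]))

(defn random-node
  "A random cluster node with `dims` coordinates, each in [lo, hi)."
  [dims lo hi]
  (vec (repeatedly dims #(+ lo (rand (- hi lo))))))

(defn sq-dist
  "Squared Euclidean distance (an assumption, see the note above)."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn assign
  "Group the indices of `vectors` by the index of their nearest node."
  [vectors nodes]
  (group-by (fn [i]
              (apply min-key
                     #(sq-dist (nth vectors i) (nth nodes %))
                     (range (count nodes))))
            (range (count vectors))))

(defn move-nodes
  "Move each node to the mean of its assigned vectors.
   Nodes with no assigned vectors stay where they are."
  [vectors nodes assignment]
  (vec (map-indexed
        (fn [k node]
          (if-let [idxs (seq (get assignment k))]
            (let [members (map #(nth vectors %) idxs)]
              (vec (apply map (fn [& xs] (/ (reduce + xs) (count xs))) members)))
            node))
        nodes)))

;; One assign-and-move step with fixed starting nodes:
(def vectors [[1 2 3] [3 4 5] [5 6 7]])
(def step-1 (assign vectors [[0 0 0] [7 7 7]]))
(def moved  (move-nodes vectors [[0 0 0] [7 7 7]] step-1))
```

A full k-means run would start from nodes produced by something like random-node and repeat the assign and move steps until the assignment stops changing.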

The return value of kcluster is a pair. The first element is a sequence of vectors, one per cluster, each
holding the indices of the input vectors assigned to that cluster. So if you passed in five vectors, the
first element might look like: [[0 3 4] [1 2]]. The second element contains the final positions of the cluster nodes.
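
Since the first element holds indices rather than the vectors themselves, you will usually want to map them back to your input. A minimal sketch, assuming the return shape described above; clusters->vectors is a hypothetical helper, and the data below is illustrative rather than real kcluster output:

```clojure
;; Hypothetical input: five vectors.
(def input [[1 1] [9 9] [8 9] [2 1] [1 2]])

;; A value shaped like kcluster's return: [index-groups node-positions].
;; The numbers are illustrative, not produced by the library.
(def result [[[0 3 4] [1 2]]
             [[4/3 4/3] [17/2 9]]])

(defn clusters->vectors
  "Replace each index in the first element of `result` with the
   corresponding vector from `input`."
  [input [index-groups _nodes]]
  (mapv (fn [idxs] (mapv #(nth input %) idxs)) index-groups))

(def grouped (clusters->vectors input result))
;; grouped is [[[1 1] [2 1] [1 2]] [[9 9] [8 9]]]
```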

Hierarchical

  
    ;; hcluster --
    ;;   :nodes - a sequence of maps in the form: { :vec [1 2 3] }

    (hcluster [{:vec [1 2 3]} {:vec [3 4 5]} {:vec [7 9 9]}])
  

The return value of hcluster is a tree of maps. For the above input, it might look something like this:

  
    {:vec (9/2 6 13/2)
     :right {:vec [7 9 9]},
     :left  {:right {:vec [3 4 5]}, 
             :left  {:vec [1 2 3]}, 
             :vec (2 3 4)}}
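
To work with a tree like this you typically walk it. The sketch below assumes the shape shown above: leaves carry only :vec, and merged nodes also carry :left and :right. The helpers leaves and midpoint are hypothetical and not part of the library. In this sample output each merged :vec equals the element-wise mean of its two children, which midpoint reproduces; whether the library always merges this way is an assumption.

```clojure
;; The sample tree from above, quoted so the lists are data.
(def tree
  {:vec   '(9/2 6 13/2)
   :right {:vec [7 9 9]}
   :left  {:right {:vec [3 4 5]}
           :left  {:vec [1 2 3]}
           :vec   '(2 3 4)}})

(defn leaves
  "The original input vectors, left to right.
   A leaf is a node with neither :left nor :right."
  [node]
  (if (or (:left node) (:right node))
    (concat (when (:left node) (leaves (:left node)))
            (when (:right node) (leaves (:right node))))
    [(:vec node)]))

(defn midpoint
  "Element-wise mean of two vectors."
  [a b]
  (map #(/ (+ %1 %2) 2) a b))

(def leaf-vecs (leaves tree))
;; leaf-vecs is ([1 2 3] [3 4 5] [7 9 9])
```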
     
  

Known Bugs

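Passing a vector whose elements are all the same number (for example [2 2 2]) to either clustering function
will cause a division-by-zero error. Such a vector has zero variance, which makes the denominator of the
Pearson correlation zero, and I haven't implemented Pearson correctly enough to handle that.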
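
For reference, here is one way the zero-variance case could be guarded. This is a standalone sketch and not the implementation in cluster.clj: the name pearson and the choice to return nil are assumptions, and it expects two non-empty sequences of equal length.

```clojure
(defn pearson
  "Pearson correlation of two equal-length, non-empty numeric sequences.
   Returns nil when either sequence has zero variance, which is the
   case that would otherwise divide by zero."
  [xs ys]
  (let [n     (count xs)
        mean  (fn [v] (/ (reduce + v) n))
        mx    (mean xs)
        my    (mean ys)
        dxs   (map #(- % mx) xs)            ; deviations from the mean
        dys   (map #(- % my) ys)
        cov   (reduce + (map * dxs dys))    ; sum of products of deviations
        ssx   (reduce + (map #(* % %) dxs)) ; sum of squared deviations
        ssy   (reduce + (map #(* % %) dys))
        denom (Math/sqrt (double (* ssx ssy)))]
    (when-not (zero? denom)
      (/ cov denom))))
```

With this guard, (pearson [2 2 2] [1 2 3]) yields nil instead of throwing, so a caller has to decide how to treat an undefined correlation.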

To Do

  • Fix Pearson
  • Add more similarity functions and allow the user to choose which to use