Skip to content

Hierarchical clustering implemented via priority-queue algorithm

License

Notifications You must be signed in to change notification settings

compute-io/hclust

Repository files navigation

hclust

NPM version Build Status Coverage Status Dependencies

Hierarchical Clusteringfor JS implemented via priority-queue algorithm

Installation

$ npm install compute-hclust

For use in the browser, use browserify.

Usage

var hclust = require( 'compute-hclust' );

hclust( data[, opts] )

Given an input data in the form of a two-dimensional array, this function carries out Hierarchical Clustering.

The function accepts the following options:

  • distance: string indicating the chosen distance metric. Default: euclidean.
  • linkage: string indicating the linkage criterion to be used, either single or complete.

The function returns an object which exports two function: getClusters(k) and getTree. The former takes one parameter, the number of clusters k, and returns an array of clusters, where each cluster is represented as an array holding the indices of data points assigned to it. The function getTree returns a binary tree representing the tree hierarchy generated by the clustering algorithm.

The algorithm makes use of a priority queue, making it more efficient than a naive implementation. It has been adapted from the book Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press. 2008 [p.386].

Examples

'use strict';

var hclust = require( 'compute-hclust' );
var util = require( 'util' );

var dat = [
	[1, 2, 1],
	[80 ,100, 98],
	[1, 1.9, 1],
	[80, 101, 99],
	[1, 3, 2],
	[2, 2, 1]
];

var clustering = hclust( dat );

console.log( clustering.getClusters( 2 ) );
// [ [ 0, 2, 5, 4 ], [ 1, 3 ] ]

console.log( clustering.getClusters( 3 ) );
// [ [ 0, 2, 5 ], [ 1, 3 ], [ 4 ] ]

console.log( util.inspect( clustering.getTree(), null, 5 ) );
/*
{ left:
   { left:
      { left:
         { left: { obs: [ 1, 2, 1 ], size: 1 },
           right: { obs: [ 1, 1.9, 1 ], size: 1 },
           size: 2 },
        right: { obs: [ 2, 2, 1 ], size: 1 },
        size: 3 },
     right: { obs: [ 1, 3, 2 ], size: 1 },
     size: 4 },
  right:
   { left: { obs: [ 80, 100, 98 ], size: 1 },
     right: { obs: [ 80, 101, 99 ], size: 1 },
     size: 2 },
  size: 6 }
*/

To run the example code from the top-level application directory,

$ node ./examples/index.js

Tests

Unit

Unit tests use the Mocha test framework with Chai assertions. To run the tests, execute the following command in the top-level application directory:

$ make test

All new feature development should have corresponding unit tests to validate correct functionality.

Test Coverage

This repository uses Istanbul as its code coverage tool. To generate a test coverage report, execute the following command in the top-level application directory:

$ make test-cov

Istanbul creates a ./reports/coverage directory. To access an HTML version of the report,

$ make view-cov

License

MIT license.

Copyright

Copyright © 2015. Philipp Burckhardt.

About

Hierarchical clustering implemented via priority-queue algorithm

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published