Hierarchical bloom classifier for tagging text with a structured word list
JavaScript
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
example
.gitignore
.npmignore
.travis.yml
README.md
hBloom.js
package.json

README.md

hBloom

Hierarchical bloom classifier for tagging text with a structured word list.

Build Status

This has prpbably been solved before by smater people than myself. Given a structured tree of categories and corresponding words/strings, hBloom will create a bloom filter for each level of depth in the data, which returns true/false for all sub categories, in the case of 'true' it will return the matching categories.

An example is available at example/index.js. The example tests 5000 tweets against 16000 tag words and takes on average 4000 milliseconds. (0.8 ms per tweet)

##Install

npm install hbloom

##Usage

var hBloom = require('../hbloom');

var myBloom = hBloom( {STRUCTURED DATA} );
var txt = "This post is about celtic and rangers, but mentions villa.";

myBloom.classifyText(txt, function(result){
	console.log(result);
});

// logs: ['football', 'celtic', 'rangers', 'aston villa']

##Methods

hBloom.classifyText( text, callback )

hBloom.classify( word );

##Structured Data?

The data passed to hBloom({DATA}) should follow the example below. Where keys are tags/categories and strings in arrays or matching words.

{
	"racing": {
		"asscot": ["asscot", "ass", "the big race"]
	},
	"football": {
		"manchester united": ["manu", "man united", "mufc", "manchester united", "manufc"],
		"aston villa": ["aston villa", "villa" , "villafc"],
		"manchester city": ["mancity", "manchester city", "cityfc", "man city", "mancityfc"],
		"scottish league": {
			"dundee united": ["dundee", "dundee united", "dundeefc"],
			"rangers": ["rangers","rangersfc"],
			"celtic": ["celtic", "celticfc"]
		}
	}
}

##License

MIT

Copyright 2012 Christopher de beer