Hierarchical bloom classifier for tagging text with a structured word list





This has probably been solved before by smarter people than myself. Given a structured tree of categories and their corresponding words/strings, hBloom creates a bloom filter for each level of depth in the data. Each filter returns true/false for all of its sub-categories; when a filter returns true, hBloom returns the matching categories.
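The idea can be illustrated with a minimal sketch. This is not hBloom's actual source: `BloomFilter`, `buildNode` and `classifyWord` are hypothetical names, the filter size (1024 bits, 3 hashes) is an arbitrary assumption, and the sketch keeps one filter per tree node as one possible reading of "a bloom filter for each level of depth".

```javascript
// Minimal Bloom filter: `hashes` seeded hash functions over a bit array.
// (Hypothetical helper, not part of hBloom's API.)
function BloomFilter(bits, hashes) {
  this.bits = bits;
  this.hashes = hashes;
  this.buckets = new Uint8Array(Math.ceil(bits / 8));
}

// FNV-1a style string hash, varied by a seed.
BloomFilter.prototype.hash = function (str, seed) {
  var h = (2166136261 ^ seed) >>> 0;
  for (var i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h % this.bits;
};

BloomFilter.prototype.add = function (str) {
  for (var s = 0; s < this.hashes; s++) {
    var pos = this.hash(str, s);
    this.buckets[pos >> 3] |= 1 << (pos & 7);
  }
};

BloomFilter.prototype.test = function (str) {
  for (var s = 0; s < this.hashes; s++) {
    var pos = this.hash(str, s);
    if (!(this.buckets[pos >> 3] & (1 << (pos & 7)))) return false;
  }
  return true; // "maybe present": false positives are possible
};

// Build a node whose filter holds every word found anywhere beneath it.
function buildNode(data) {
  var node = { filter: new BloomFilter(1024, 3), children: {}, words: [] };
  if (Array.isArray(data)) {
    node.words = data.slice(); // leaf: keep the exact word list
  } else {
    Object.keys(data).forEach(function (key) {
      var child = buildNode(data[key]);
      node.children[key] = child;
      node.words = node.words.concat(child.words);
    });
  }
  node.words.forEach(function (w) { node.filter.add(w); });
  return node;
}

// Descend only into subtrees whose filter says the word may be present.
function classifyWord(node, word, path, out) {
  if (!node.filter.test(word)) return out; // definite miss: prune the subtree
  var keys = Object.keys(node.children);
  if (keys.length === 0) {
    // Leaf: confirm against the exact list to rule out a false positive.
    if (node.words.indexOf(word) !== -1) {
      path.forEach(function (tag) {
        if (out.indexOf(tag) === -1) out.push(tag);
      });
    }
    return out;
  }
  keys.forEach(function (key) {
    classifyWord(node.children[key], word, path.concat(key), out);
  });
  return out;
}

var tree = buildNode({
  football: {
    "aston villa": ["aston villa", "villa", "villafc"],
    "scottish league": { celtic: ["celtic", "celticfc"] }
  }
});
var tags = classifyWord(tree, "celtic", [], []);
// tags: ['football', 'scottish league', 'celtic']
```

Because a Bloom filter never reports a false negative, a `false` at any node safely rules out the whole subtree; the exact check at the leaf is only there to discard the occasional false positive.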

An example is available at example/index.js. It tests 5000 tweets against 16000 tag words and takes 4000 milliseconds on average (0.8 ms per tweet).


	npm install hbloom


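	var hBloom = require('../hbloom'); // path inside the repository; typically require('hbloom') after npm install

	var myBloom = hBloom( {STRUCTURED DATA} ); // placeholder: pass your tag tree here
	var txt = "This post is about celtic and rangers, but mentions villa.";

	myBloom.classifyText(txt, function(result){
		console.log(result);
		// logs: ['football', 'celtic', 'rangers', 'aston villa']
	});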


hBloom.classifyText( text, callback )

hBloom.classify( word );
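The tag lists contain multi-word entries such as "aston villa" and "the big race", so a text classifier has to test phrases as well as single words. How hBloom splits text internally is not documented here; the sketch below is only an assumption, and `candidatePhrases` is a hypothetical helper, not part of the hBloom API.

```javascript
// Hypothetical helper: lower-case the text, replace punctuation with spaces,
// and emit every run of 1..maxWords consecutive words as a candidate phrase.
function candidatePhrases(text, maxWords) {
  var words = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean); // drop empty strings left by leading/trailing spaces
  var phrases = [];
  for (var start = 0; start < words.length; start++) {
    for (var len = 1; len <= maxWords && start + len <= words.length; len++) {
      phrases.push(words.slice(start, start + len).join(" "));
    }
  }
  return phrases;
}

var phrases = candidatePhrases("Celtic beat Aston Villa.", 2);
// phrases: ['celtic', 'celtic beat', 'beat', 'beat aston',
//           'aston', 'aston villa', 'villa']
```

Each candidate could then be passed to a single-word lookup such as `classify( word )`, with the results merged into one tag list.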

##Structured Data?

The data passed to hBloom({DATA}) should follow the example below. Keys are tags/categories, and the strings in the arrays are the matching words. Categories can be nested to any depth.

	{
		"racing": {
			"asscot": ["asscot", "ass", "the big race"]
		},
		"football": {
			"manchester united": ["manu", "man united", "mufc", "manchester united", "manufc"],
			"aston villa": ["aston villa", "villa", "villafc"],
			"manchester city": ["mancity", "manchester city", "cityfc", "man city", "mancityfc"],
			"scottish league": {
				"dundee united": ["dundee", "dundee united", "dundeefc"],
				"rangers": ["rangers", "rangersfc"],
				"celtic": ["celtic", "celticfc"]
			}
		}
	}
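
To make the shape of this data concrete, here is a small sketch that walks such a tree and records, for every matching word, the chain of tags above it. `flattenTags` is a hypothetical helper written for illustration; it is not part of hBloom and says nothing about how hBloom stores the data internally.

```javascript
// Hypothetical helper: map each matching word to the tag paths it belongs to.
// Objects are treated as nested categories, arrays as lists of matching words.
function flattenTags(data, path, index) {
  path = path || [];
  index = index || Object.create(null); // no inherited keys to collide with
  Object.keys(data).forEach(function (tag) {
    var value = data[tag];
    var tagPath = path.concat(tag);
    if (Array.isArray(value)) {
      value.forEach(function (word) {
        (index[word] = index[word] || []).push(tagPath);
      });
    } else {
      flattenTags(value, tagPath, index);
    }
  });
  return index;
}

var sample = {
  "football": {
    "aston villa": ["aston villa", "villa", "villafc"],
    "scottish league": {
      "rangers": ["rangers", "rangersfc"]
    }
  }
};

var wordIndex = flattenTags(sample);
console.log(JSON.stringify(wordIndex["rangers"]));
// prints [["football","scottish league","rangers"]]
```

A word may appear under more than one category, which is why each entry holds a list of paths rather than a single path.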



Copyright 2012 Christopher de beer