Hierarchical bloom classifier for tagging text with a structured word list





This has probably been solved before by smarter people than me. Given a structured tree of categories and their corresponding words/strings, hBloom creates a bloom filter for each level of depth in the data. Each filter answers true/false for all of its sub-categories; when the answer is 'true', hBloom returns the matching categories.
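To make the idea concrete, here is a minimal sketch of a hierarchical bloom lookup. It is not hBloom's actual implementation: the names `createBloom`, `buildNode`, `classify` and `classifyAll`, the filter size, the hash function and the choice to report intermediate categories are all assumptions made for illustration.

```javascript
// Illustrative sketch only; not the code inside hbloom.

// A tiny bloom filter: `hashes` seeded hash functions over a fixed bit array.
function createBloom(size, hashes) {
  var bits = new Uint8Array(size);
  // FNV-1a style hash, varied by a seed to get several bit positions.
  function hash(str, seed) {
    var h = 2166136261 ^ seed;
    for (var i = 0; i < str.length; i++) {
      h ^= str.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % size;
  }
  return {
    add: function (str) {
      for (var s = 0; s < hashes; s++) bits[hash(str, s)] = 1;
    },
    has: function (str) {
      for (var s = 0; s < hashes; s++) if (!bits[hash(str, s)]) return false;
      return true;
    }
  };
}

// One node per category; its filter holds every word found anywhere below it.
function buildNode(name, data) {
  var node = { name: name, filter: createBloom(1024, 3), children: [], words: [] };
  if (Array.isArray(data)) {
    node.words = data;
    data.forEach(function (w) { node.filter.add(w); });
  } else {
    Object.keys(data).forEach(function (key) {
      var child = buildNode(key, data[key]);
      node.children.push(child);
      child.words.forEach(function (w) { node.filter.add(w); });
      node.words = node.words.concat(child.words);
    });
  }
  return node;
}

// Descend only into branches whose filter reports a possible match.
function classify(node, word) {
  if (!node.filter.has(word)) return [];
  if (node.children.length === 0) {
    // Leaf: confirm exactly, because a bloom filter can give false positives.
    return node.words.indexOf(word) !== -1 ? [node.name] : [];
  }
  var hits = [];
  node.children.forEach(function (child) {
    hits = hits.concat(classify(child, word));
  });
  // Report the parent only when a descendant really matched.
  return hits.length ? [node.name].concat(hits) : [];
}

var data = {
  football: {
    'aston villa': ['aston villa', 'villa'],
    'scottish league': { celtic: ['celtic'], rangers: ['rangers'] }
  },
  racing: { ascot: ['ascot'] }
};
var roots = Object.keys(data).map(function (k) { return buildNode(k, data[k]); });

function classifyAll(word) {
  return roots.reduce(function (acc, r) { return acc.concat(classify(r, word)); }, []);
}

console.log(classifyAll('celtic'));
// logs: [ 'football', 'scottish league', 'celtic' ]
```

The point of the hierarchy is that a 'false' at a high level rules out the whole subtree with one filter check, so most words never reach the leaves.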

An example is available at example/index.js. It tests 5000 tweets against 16000 tag words and takes about 4000 milliseconds on average (0.8 ms per tweet).


npm install hbloom


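// Path as used from the example directory; after `npm install hbloom`, use require('hbloom').
var hBloom = require('../hbloom');

// {STRUCTURED DATA} is a placeholder for your category tree (see "Structured Data?").
var myBloom = hBloom( {STRUCTURED DATA} );
var txt = "This post is about celtic and rangers, but mentions villa.";

myBloom.classifyText(txt, function(result){
	console.log(result);
	// logs: ['football', 'celtic', 'rangers', 'aston villa']
});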


hBloom.classifyText( text, callback )

hBloom.classify( word );
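How the two methods relate is not documented here. As a rough sketch, a text-level classifier could be layered on a term-level one by trying every run of one to three words, so that multi-word entries such as "aston villa" can match. Everything below is hypothetical: `candidates`, `classifyTerm`, the `lookup` table (a plain object standing in for the bloom filters) and the three-word limit are assumptions, not hBloom's code.

```javascript
// Illustrative sketch only; not the code inside hbloom.

// Generate lowercase n-grams of 1..maxWords words from the text.
function candidates(text, maxWords) {
  var words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  var out = [];
  for (var i = 0; i < words.length; i++) {
    for (var n = 1; n <= maxWords && i + n <= words.length; n++) {
      out.push(words.slice(i, i + n).join(' '));
    }
  }
  return out;
}

// Stand-in for a term-level classifier: a plain lookup table.
var lookup = {
  'celtic': ['football', 'celtic'],
  'rangers': ['football', 'rangers'],
  'villa': ['football', 'aston villa'],
  'aston villa': ['football', 'aston villa']
};
function classifyTerm(term) {
  return Object.prototype.hasOwnProperty.call(lookup, term) ? lookup[term] : [];
}

// Collect the unique tags of every candidate term, then hand them to the callback.
function classifyText(text, callback) {
  var tags = [];
  candidates(text, 3).forEach(function (term) {
    classifyTerm(term).forEach(function (tag) {
      if (tags.indexOf(tag) === -1) tags.push(tag);
    });
  });
  callback(tags);
}

var lastResult;
classifyText('This post is about celtic and rangers, but mentions villa.', function (result) {
  lastResult = result;
  console.log(result);
  // logs: [ 'football', 'celtic', 'rangers', 'aston villa' ]
});
```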

##Structured Data?

The data passed to hBloom({DATA}) should follow the example below. Keys are tags/categories, and the strings in the arrays are the matching words.

{
	"racing": {
		"asscot": ["asscot", "ass", "the big race"]
	},
	"football": {
		"manchester united": ["manu", "man united", "mufc", "manchester united", "manufc"],
		"aston villa": ["aston villa", "villa", "villafc"],
		"manchester city": ["mancity", "manchester city", "cityfc", "man city", "mancityfc"],
		"scottish league": {
			"dundee united": ["dundee", "dundee united", "dundeefc"],
			"rangers": ["rangers", "rangersfc"],
			"celtic": ["celtic", "celticfc"]
		}
	}
}
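To check that a data file has the expected shape, it can help to flatten the tree into a list of words with their category paths. The helper below is a small sketch for that purpose; `flatten` is a hypothetical name and is not part of hBloom.

```javascript
// Illustrative helper, not part of hbloom: list every matching word
// together with the chain of categories that leads to it.
function flatten(data, path, out) {
  path = path || [];
  out = out || [];
  Object.keys(data).forEach(function (key) {
    var value = data[key];
    var here = path.concat(key);
    if (Array.isArray(value)) {
      // Array of strings: these are the matching words for category `key`.
      value.forEach(function (word) { out.push({ word: word, path: here }); });
    } else {
      // Nested object: a category that contains sub-categories.
      flatten(value, here, out);
    }
  });
  return out;
}

var sample = {
  "football": {
    "aston villa": ["aston villa", "villa", "villafc"],
    "scottish league": {
      "rangers": ["rangers", "rangersfc"],
      "celtic": ["celtic", "celticfc"]
    }
  }
};

var entries = flatten(sample);
console.log(entries.length);
// logs: 7
console.log(entries[entries.length - 1]);
// logs: { word: 'celticfc', path: [ 'football', 'scottish league', 'celtic' ] }
```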



Copyright 2012 Christopher de beer