I wanted the ability to remove a document from a category, so I added a removeDocument method. However, it looks like train is both incremental and additive-only, so the most straightforward way to support removal without a lot of rewriting seemed to be adding a retrain method that wipes the slate clean and starts over.
The only part I'm really dubious about is the ramifications of removing a text item from features when the same item also appears in documents with other classifications.
If this is completely crazy, I'd appreciate feedback on a better way to approach adding this feature.
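For illustration, here is a minimal sketch of the remove-then-retrain idea described above, written in Python for brevity. TinyBayes and its methods (add_document, remove_document, train, retrain, classify) are hypothetical stand-ins, not this library's API. Here train only folds in documents added since the last call (incremental and additive-only), remove_document only drops stored documents, and retrain wipes every count and rebuilds from what remains. Because features is rebuilt from the surviving documents, a token shared with another classification stays in the vocabulary as long as some remaining document still contains it.

```python
import math
from collections import Counter, defaultdict


class TinyBayes:
    """Hypothetical multinomial naive Bayes sketch, not this library's classifier."""

    def __init__(self):
        self.docs = []     # every (tokens, label) pair still in the training set
        self.pending = []  # pairs added since the last train() call
        self.reset_counts()

    def reset_counts(self):
        self.features = set()                     # vocabulary over trained docs
        self.class_counts = Counter()             # label -> number of docs
        self.token_counts = defaultdict(Counter)  # label -> token -> count

    def add_document(self, text, label):
        doc = (text.lower().split(), label)
        self.docs.append(doc)
        self.pending.append(doc)

    def remove_document(self, text, label):
        # Drop matching documents; counts stay stale until retrain() runs.
        doc = (text.lower().split(), label)
        self.docs = [d for d in self.docs if d != doc]
        self.pending = [d for d in self.pending if d != doc]

    def train(self):
        # Incremental and additive-only: fold pending docs into the counts.
        for tokens, label in self.pending:
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.features.update(tokens)
        self.pending = []

    def retrain(self):
        # Wipe the slate, then rebuild every count from the remaining docs.
        self.reset_counts()
        self.pending = list(self.docs)
        self.train()

    def classify(self, text):
        if not self.class_counts:
            return None
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.features)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                # Laplace-smoothed log likelihood of each token
                score += math.log((counts[t] + 1) / (total_tokens + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)


clf = TinyBayes()
clf.add_document("cheap pills buy now", "spam")
clf.add_document("meeting notes for monday", "ham")
clf.add_document("buy tickets for monday", "ham")
clf.train()

clf.remove_document("buy tickets for monday", "ham")
clf.retrain()
print("buy" in clf.features)        # True: the spam document still contains it
print("tickets" in clf.features)    # False: only the removed document had it
print(clf.classify("cheap pills"))  # spam
```

Rebuilding everything costs time proportional to the total number of documents on each removal. Decrementing counts in place would be cheaper, but then a token should only leave features once its count reaches zero under every classification.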
I've also added a test for this. The test for good/bad equality seems a little brittle, but as long as the classifications format doesn't change, it should work correctly. I'm also a little curious how the classify method chooses a value when all the classification scores are equal. Does it just pick the first one it finds? Is that desirable behavior? If something can't reasonably be categorized, would returning null be too weird?
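How classify actually breaks ties isn't confirmed here. As a rough illustration, a plain argmax in Python keeps the first maximal label it encounters (built-in max() returns the first maximal item), so with equal scores the result depends on insertion order. pick_label and its tie_returns_none flag are hypothetical; the flag shows one way to return null (None in Python) when the top scores are tied.

```python
import math


def pick_label(scores, tie_returns_none=False, tol=1e-9):
    """Hypothetical helper: choose a label from a {label: score} dict.

    With tie_returns_none=False this is a plain argmax: max() returns the
    first maximal key in dict insertion order. With tie_returns_none=True,
    a tie at the top score yields None instead.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    if tie_returns_none:
        top = scores[best]
        tied = [k for k, v in scores.items() if math.isclose(v, top, abs_tol=tol)]
        if len(tied) > 1:
            return None
    return best


scores = {"good": 0.5, "bad": 0.5}
print(pick_label(scores))                         # good (first inserted wins)
print(pick_label(scores, tie_returns_none=True))  # None
print(pick_label({"good": 0.7, "bad": 0.3}, tie_returns_none=True))  # good
```

Returning None pushes the "can't decide" case onto the caller, which is more explicit but means every call site has to handle it.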
Thanks for the work on this. I plan to use this in a Hack Day project at Yammer. :)
Added removeDocument and retrain.
Sounds reasonable so far. I'll look this over within the next 48 hours or so, as I'm a bit backed up right now. Thanks for the contribution regardless!
One year later
Yeah, I'm looking for someone to take over maintaining this project, as I don't have much time to attend to issues these days. In the meantime, I'll try to take a look in the coming days.
I'll have to review the logic and resolve the merge conflicts, so it won't be super quick.