Narnach / groupie
Wes Oldenbeuving (author)
Tue Jun 02 23:34:04 -0700 2009
commit b85a3cb1cd75e24046f0259423c5cb4f40d6a2f2
tree d957c4410bccafae9ff00758baf78066bd7dcfc4
parent 6db2b64980e70796c1b296180bc49baeff89e9fe
groupie /
| name | age | message |
|---|---|---|
| MIT-LICENSE | | |
| lib/ | | |
| readme.rdoc | | |
| test/ | | |
readme.rdoc
Groupie
Groupie is a simple way to group texts and to classify a new text as a likely member of one of the defined groups. Think of Bayesian spam filters.
The eventual goal is to have Groupie work as a sort of Bayesian spam filter: you feed it spam and ham (non-spam) and ask it to classify new texts as spam or ham. Applications include e-mail spam filtering and blog spam filtering. Other kinds of categorization might be interesting as well, such as finding suitable tags for a blog post or bookmark.
Goals
Groupie is a ‘fun’ project that has the following goals, in descending order of importance:
- Have fun playing with code
- Play with Bayesian-like (spam) filtering
- Check out the Testy BDD framework. It’s pretty good for 60 lines of code!
Current functionality
Current functionality includes:
- Tokenize an input text to prepare it for grouping.
- Strip XML and HTML tags.
- Keep certain infix characters, such as periods and commas.
- Add texts (as an Array of Strings) to any number of groups.
- Classify a single word to check the likelihood it belongs to each group.
- Do classification for complete (tokenized) texts.
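The features listed above can be sketched roughly as follows. This is a minimal illustration and not Groupie's actual code: `tokenize`, `WordGroups`, `add`, `classify_word` and `classify_text` are hypothetical names, and the scoring is a plain relative-frequency share per group, assumed here as a stand-in for whatever Bayesian-like formula the library uses.

```ruby
# Hypothetical sketch: none of these names are taken from Groupie's source.

# Split a text into lowercase tokens after stripping XML/HTML tags.
# Periods and commas are kept only when infix (between word characters),
# as in "example.com" or "3.14".
def tokenize(text)
  text
    .gsub(/<[^>]*>/, " ") # replace tags with a space
    .downcase
    .scan(/[a-z0-9]+(?:[.,][a-z0-9]+)*/)
end

class WordGroups
  def initialize
    # group name => (word => count); unseen words count as 0
    @counts = Hash.new { |groups, name| groups[name] = Hash.new(0) }
  end

  # Add a text (an Array of Strings) to the named group.
  def add(group, words)
    words.each { |word| @counts[group][word] += 1 }
    self
  end

  # Share of a word's occurrences found in each group (0.0 when unseen).
  def classify_word(word)
    total = @counts.values.sum { |words| words[word] }
    @counts.to_h do |group, words|
      [group, total.zero? ? 0.0 : words[word].to_f / total]
    end
  end

  # Average the per-word shares over all words of a tokenized text.
  def classify_text(words)
    scores = Hash.new(0.0)
    words.each do |word|
      classify_word(word).each { |group, share| scores[group] += share }
    end
    scores.transform_values { |sum| sum / words.size }
  end
end

groups = WordGroups.new
groups.add(:spam, tokenize("<b>Buy cheap pills</b>"))
groups.add(:ham, tokenize("Cheap lunch at noon"))

groups.classify_word("pills")                # spam share 1.0, ham share 0.0
groups.classify_text(tokenize("cheap pills")) # spam share 0.75, ham share 0.25
```

Averaging raw shares keeps the example short. A real filter would more likely combine per-word probabilities and smooth the counts for unseen words.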
License
As always, the code is licensed under the MIT license.
Wes Oldenbeuving

