text analysis software
This is a from-scratch rewrite of some text analysis software I wrote earlier this year. The old code was really hacky, so I'm only using it as a reference while I build this repo.
Below are some quick dev notes. They are probably a little cryptic; I will rewrite them later.
Namespaces:
- textome.token -- Will contain several kinds of text tokenizers. Planned: the ability to compose tokenizers into more powerful/abstract ones.
- textome.ngram -- Moved to a standalone library; may be reintegrated into this one.
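A rough sketch of what "composing tokenizers" could mean. It is written in Python purely for illustration, and every name in it (`whitespace_tokenizer`, `sentence_tokenizer`, `nest`) is hypothetical; none of this is the textome.token API, which does not exist yet. The idea is that a tokenizer is just a function from a string to a list of strings, so two of them can be combined into a more abstract one.

```python
import re
from typing import Callable, List

# Assumption: a tokenizer is any function from text to a list of string pieces.
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Split on runs of whitespace."""
    return text.split()


def sentence_tokenizer(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def nest(outer: Tokenizer, inner: Tokenizer) -> Callable[[str], List[List[str]]]:
    """Compose two tokenizers: run `outer`, then run `inner` on each piece."""
    def composed(text: str) -> List[List[str]]:
        return [inner(piece) for piece in outer(text)]
    return composed


# A "words grouped by sentence" tokenizer built from the two simple ones.
words_by_sentence = nest(sentence_tokenizer, whitespace_tokenizer)
result = words_by_sentence("Hello world. How are you?")
```

Here `result` is a list with one list of words per sentence. Punctuation stays attached to the words because the whitespace tokenizer does not strip it.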
Roadmap coming soon...
First steps:
- tokenize
- ngrams
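For reference, the two first steps fit together roughly like this. This is a minimal sketch in Python, not code from this repo or from the standalone ngram library; the function name `ngrams` and its signature are assumptions made for the example. An n-gram here is a tuple of n consecutive tokens.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens, in order.

    Returns an empty list when there are fewer than n tokens.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Step 1: tokenize (plain whitespace split, standing in for a real tokenizer).
tokens = "the quick brown fox".split()

# Step 2: build bigrams from the tokens.
bigrams = ngrams(tokens, 2)
```

With the four tokens above, `bigrams` holds three pairs: ("the", "quick"), ("quick", "brown"), and ("brown", "fox").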
Pre-alpha; there is no tutorial yet. Sorry.
Copyright © 2014 bitfl1pper
Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.