From 86d4612cbc058ec80898e84bc4992e1dd606af68 Mon Sep 17 00:00:00 2001
From: Lee Hinman
Date: Thu, 17 May 2012 08:55:30 -0600
Subject: [PATCH] update documentation for document categorization

---
 .gitignore        |  1 +
 .travis.yml       |  3 +++
 README.markdown   | 10 ++++++++++
 TRAINING.markdown | 30 ++++++++++++++++++++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/.gitignore b/.gitignore
index fc0a99c..0d4bfc4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,4 @@ parser-model/en-parser-chunking.bin
 .lein-failures
 multi-lib/*
 .lein-deps-sum
+target/*
diff --git a/.travis.yml b/.travis.yml
index 0791305..a7b2fec 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1 +1,4 @@
 language: clojure
+lein: lein2
+script: lein2 all test
+
diff --git a/README.markdown b/README.markdown
index 1c20b99..06323fb 100644
--- a/README.markdown
+++ b/README.markdown
@@ -135,6 +135,16 @@ And with just strings:
 ("The override system" "is meant to deactivate" "the accelerator" "when" "the brake pedal" "is pressed")
 ```
 
+Document Categorization:
+
+See opennlp.test.tools.train for better usage examples.
+
+```clojure
+(def doccat (make-document-categorizer "my-doccat-model"))
+
+(doccat "This is some good text")
+"Happy"
+```
 Probabilities of confidence
 ---------------------------
 
diff --git a/TRAINING.markdown b/TRAINING.markdown
index e678f19..5337f93 100644
--- a/TRAINING.markdown
+++ b/TRAINING.markdown
@@ -239,6 +239,36 @@ to ```train-pos-tagger```
 
     (def pos-tag (make-pos-tagger (train-pos-tagger "en" "postagger.train" tagdict)))
 
+Document Categorization
+-----------------------
+
+To train a document categorizing tool, provide text with the format
+similar to (as an example, a sentiment detector):
+
+    Happy squealed
+    Happy delight
+    Happy upbeat
+    Happy success
+    Happy dream
+    Happy smile
+    Happy smiles
+    Happy well
+    Happy enjoy
+    Happy sunny
+    Unhappy foreboding
+    Unhappy prisoner
+    Unhappy frowning
+    Unhappy confused
+    Unhappy disapproving
+    Unhappy upset
+
+You can then train a model with this file:
+
+    (def doccat-model (train-document-categorization "training/doccat.train"))
+    (def doccat (make-document-categorizer doccat-model))
+    (doccat "I like to smile.")
+    => "Happy"
+
 Notes
 -----
 If you get an Exception, you might just not have enough data.
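
Review note on the TRAINING.markdown hunk: the training file it describes holds one document per line, with the category first and the text after it. A minimal sketch of generating such a file from plain Clojure data follows. The namespace `doccat-example`, the `samples` data, and the helpers `training-line` and `write-training-file!` are hypothetical names used only for illustration; they are not part of the patch or of the library.

```clojure
(ns doccat-example
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

;; Hypothetical sample data: [category text] pairs, mirroring the
;; "Happy ..." / "Unhappy ..." lines in the TRAINING.markdown hunk.
(def samples
  [["Happy" "smile"]
   ["Happy" "enjoy"]
   ["Unhappy" "upset"]
   ["Unhappy" "confused"]])

(defn training-line
  "Formats one sample as `<category> <text>`."
  [[category text]]
  (str category " " text))

(defn write-training-file!
  "Writes one sample per line to `path`, creating parent directories
  first if they do not exist yet."
  [path samples]
  (io/make-parents path)
  (spit path (str (str/join "\n" (map training-line samples)) "\n")))

;; Same path the patch passes to train-document-categorization.
(write-training-file! "training/doccat.train" samples)
```

The resulting file could then be passed to `train-document-categorization` as in the hunk. Four samples are only enough to show the format; as the Notes context line says, too little data may produce an Exception during training.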