update documentation for document categorization

dakrone · May 17, 2012 · 86d4612
1 parent f244033
commit 86d4612

Showing 4 changed files with 44 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -6,3 +6,4 @@ parser-model/en-parser-chunking.bin
 .lein-failures
 multi-lib/*
 .lein-deps-sum
+target/*
diff --git a/.travis.yml b/.travis.yml
@@ -1 +1,4 @@
 language: clojure
+lein: lein2
+script: lein2 all test
+
diff --git a/README.markdown b/README.markdown
@@ -135,6 +135,16 @@ And with just strings:
 ("The override system" "is meant to deactivate" "the accelerator" "when" "the brake pedal" "is pressed")
 ```
 
+Document Categorization:
+
+See opennlp.test.tools.train for better usage examples.
+
+```clojure
+(def doccat (make-document-categorizer "my-doccat-model"))
+
+(doccat "This is some good text")
+"Happy"
+```
 
 Probabilities of confidence
 ---------------------------

diff --git a/TRAINING.markdown b/TRAINING.markdown
@@ -239,6 +239,36 @@ to ```train-pos-tagger```
      (def pos-tag (make-pos-tagger (train-pos-tagger "en" "postagger.train" tagdict)))
 
 
+Document Categorization
+-----------------------
+
+To train a document categorizer, provide a training file with one example
+per line: a category label followed by the text (here, a sentiment detector):
+
+    Happy squealed
+    Happy delight
+    Happy upbeat
+    Happy success
+    Happy dream
+    Happy smile
+    Happy smiles
+    Happy well
+    Happy enjoy
+    Happy sunny
+    Unhappy foreboding
+    Unhappy prisoner
+    Unhappy frowning
+    Unhappy confused
+    Unhappy disapproving
+    Unhappy upset
+
+You can then train a model with this file:
+
+    (def doccat-model (train-document-categorization "training/doccat.train"))
+    (def doccat (make-document-categorizer doccat-model))
+    (doccat "I like to smile.")
+    => "Happy"
+
 Notes
 -----
 If you get an Exception, you might just not have enough data.