
update documentation for document categorization

1 parent f244033 commit 86d4612cbc058ec80898e84bc4992e1dd606af68 @dakrone committed May 17, 2012
Showing with 44 additions and 0 deletions.
  1. +1 −0 .gitignore
  2. +3 −0 .travis.yml
  3. +10 −0 README.markdown
  4. +30 −0 TRAINING.markdown
1 .gitignore
@@ -6,3 +6,4 @@ parser-model/en-parser-chunking.bin
.lein-failures
multi-lib/*
.lein-deps-sum
+target/*
3 .travis.yml
@@ -1 +1,4 @@
language: clojure
+lein: lein2
+script: lein2 all test
+
10 README.markdown
@@ -135,6 +135,16 @@ And with just strings:
("The override system" "is meant to deactivate" "the accelerator" "when" "the brake pedal" "is pressed")
```
+Document Categorization:
+
+See `opennlp.test.tools.train` for more complete usage examples.
+
+```clojure
+(def doccat (make-document-categorizer "my-doccat-model"))
+
+(doccat "This is some good text")
+"Happy"
+```
Probabilities of confidence
---------------------------
30 TRAINING.markdown
@@ -239,6 +239,36 @@ to ```train-pos-tagger```
(def pos-tag (make-pos-tagger (train-pos-tagger "en" "postagger.train" tagdict)))
+Document Categorization
+-----------------------
+
+To train a document categorizer, provide a file with one document per
+line, category first, as in this example for a sentiment detector:
+
+ Happy squealed
+ Happy delight
+ Happy upbeat
+ Happy success
+ Happy dream
+ Happy smile
+ Happy smiles
+ Happy well
+ Happy enjoy
+ Happy sunny
+ Unhappy foreboding
+ Unhappy prisoner
+ Unhappy frowning
+ Unhappy confused
+ Unhappy disapproving
+ Unhappy upset
+
+You can then train a model with this file:
+
+ (def doccat-model (train-document-categorization "training/doccat.train"))
+ (def doccat (make-document-categorizer doccat-model))
+ (doccat "I like to smile.")
+ => "Happy"
+
Notes
-----
If you get an Exception, you might just not have enough data.
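A training file in this format can also be generated from Clojure data. The sketch below is not part of this commit or of clojure-opennlp: `training-lines` and `write-training-file!` are hypothetical helper names, and the code assumes only the one-document-per-line layout shown in the sentiment example (category, a space, then the text). It uses nothing beyond `clojure.core` and `clojure.string`.

```clojure
(ns doccat-train-file
  (:require [clojure.string :as str]))

;; Hypothetical helper, not part of clojure-opennlp.
;; Renders labelled examples as doccat training lines: one document per
;; line, category first, then a single space, then the text.
(defn training-lines
  "Takes a seq of [category texts] pairs (a map also works) and returns a
  seq of training lines, preserving the input order."
  [examples]
  (for [[category texts] examples
        text texts]
    ;; Collapse runs of whitespace (including newlines) to single spaces
    ;; and trim, so each document stays on exactly one line.
    (str category " " (str/trim (str/replace text #"\s+" " ")))))

;; Hypothetical helper: writes the lines newline-separated to `path`.
;; Note that `spit` does not create missing parent directories.
(defn write-training-file!
  "Writes the training lines for `examples` to `path` and returns `path`."
  [path examples]
  (spit path (str (str/join "\n" (training-lines examples)) "\n"))
  path)
```

For example, `(write-training-file! "training/doccat.train" [["Happy" ["smile" "sunny"]] ["Unhappy" ["upset"]]])` would produce a file whose path could then be passed to `train-document-categorization` as shown in the TRAINING example, provided the `training/` directory already exists.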
