update documentation for document categorization
dakrone committed May 17, 2012
1 parent f244033 commit 86d4612
Showing 4 changed files with 44 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ parser-model/en-parser-chunking.bin
.lein-failures
multi-lib/*
.lein-deps-sum
target/*
3 changes: 3 additions & 0 deletions .travis.yml
@@ -1 +1,4 @@
language: clojure
lein: lein2
script: lein2 all test

10 changes: 10 additions & 0 deletions README.markdown
@@ -135,6 +135,16 @@ And with just strings:
("The override system" "is meant to deactivate" "the accelerator" "when" "the brake pedal" "is pressed")
```
Document Categorization:
See opennlp.test.tools.train for better usage examples.
```clojure
(def doccat (make-document-categorizer "my-doccat-model"))
(doccat "This is some good text")
"Happy"
```
Probabilities of confidence
---------------------------
30 changes: 30 additions & 0 deletions TRAINING.markdown
@@ -239,6 +239,36 @@ to ```train-pos-tagger```
(def pos-tag (make-pos-tagger (train-pos-tagger "en" "postagger.train" tagdict)))


Document Categorization
-----------------------

To train a document categorizer, provide a training file with one
sample per line: the category name, followed by the sample text.
For example, for a sentiment detector:

Happy squealed
Happy delight
Happy upbeat
Happy success
Happy dream
Happy smile
Happy smiles
Happy well
Happy enjoy
Happy sunny
Unhappy foreboding
Unhappy prisoner
Unhappy frowning
Unhappy confused
Unhappy disapproving
Unhappy upset

You can then train a model with this file:

(def doccat-model (train-document-categorization "training/doccat.train"))
(def doccat (make-document-categorizer doccat-model))
(doccat "I like to smile.")
=> "Happy"
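
For larger training sets it may be easier to generate the training file
than to write it by hand. The following is a minimal sketch and not part
of clojure-opennlp: `samples`, `format-sample`, and `write-training-file!`
are hypothetical names, and it assumes the one-sample-per-line layout
shown above (category, whitespace, then text).

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical labeled samples: [category text] pairs.
(def samples
  [["Happy" "smile"]
   ["Happy" "sunny"]
   ["Unhappy" "upset"]])

(defn format-sample
  "Formats one [category text] pair as a training line (assumed layout)."
  [[category text]]
  (str category " " text))

(defn write-training-file!
  "Writes one training line per sample to path, creating parent
  directories if they do not exist yet."
  [path samples]
  (io/make-parents path)
  (spit path (str (str/join "\n" (map format-sample samples)) "\n")))

(write-training-file! "training/doccat.train" samples)
```

The resulting file can then be passed to `train-document-categorization`
as in the example above. A real training set would need far more samples
than the three used here.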

Notes
-----
If training throws an Exception, the training file may simply not contain enough data.