updated readme to include command line usage, version bump to 0.2.0

@eandrejko committed Apr 22, 2012 (commit a469823ba2e0baf3463584dce98c6fab179f4240, 1 parent 2083d62)
Showing with 71 additions and 39 deletions.
  1. +66 −7 README.md
  2. +1 −1 project.clj
  3. +1 −27 src/random_forests/core.clj
  4. +3 −4 src/random_forests/train.clj
README.md
@@ -1,10 +1,14 @@
# Random Forests in Clojure
-An _incomplete_ implementation of Random Forests in Clojure.
+A simple implementation of Random Forests for classification and regression in Clojure.
+
+Features:
+- Supports categorical, continuous and text features (as a bag of words)
+- Supports classification
+- Supports regression
+- Estimates out-of-sample error during training
Limitations:
-- Only supports categorical features
-- Only supports classification
- All training examples must fit into memory
A description of random forests can be found at: [http://www.stat.berkeley.edu/~breiman/RandomForests/](http://www.stat.berkeley.edu/~breiman/RandomForests/).
@@ -16,7 +20,7 @@ Decision trees are constructed recursively as anonymous functions choosing split
To use add to your `project.clj`:
```clojure
- [random-forests-clj "0.1.0"]
+ [random-forests-clj "0.2.0"]
```
## Example
@@ -26,18 +30,73 @@ Features are represented by the index in the training example. A tree can be bui
```clojure
(use 'random-forests.core)
- (def t (build-tree (list ["M" "<25" 1] ["M" "<40" 0] ["F" "<35" 1] ["F" "<30" 1] ) #{0 1}))
+ ;; target is in the last position
+ (def examples (list ["M" "<25" 1] ["M" "<40" 0] ["F" "<35" 1] ["F" "<30" 1]))
+
+ ;; features can be continuous, categorical or text
+ (def features (set (list (feature 0 :categorical) (feature 1 :categorical))))
+
+ ;; return a lazy sequence of decision trees with:
+ ;; - 1 random feature per splitting node
+ ;; - a bootstrap resample of 2 examples per tree
+ (def t (first (build-random-forest examples features 1 2)))
(meta t) ;; => {:tree "if(1=<40){0}else{1}"}
```
-The tree is a function, and new examples can classified by calling the function:
+Each tree is a function, and new examples can be classified by calling the function:
```clojure
(t ["M" "<20"]) ;; => 1
```
+## Command Line Usage
+
+Models can be built from the command line using `lein run`:
+```
+Usage:
+
+ Switches                   Default  Desc
+ --------                   -------  ----
+ -h, --no-help, --help      false    Show help
+ -f, --features             []       Features specification (matching CSV header): name=continuous,foo=text
+ -s, --size                 1000     Size of bootstrap sample per tree
+ -m, --split                100      Number of features to sample for each split
+ -o, --output                        Write detailed training error output in CSV format to output file
+ -t, --target                        Prediction target name
+ -b, --no-binary, --binary  false    Perform binary classification of target (measures AUC loss)
+ -l, --limit                100      Number of trees to build
+ ```
+
+To build a binary classifier on the provided test data set using a
+forest of 500 trees:
+
+```
+lein run -f V1=categorical,V2=categorical,V3=categorical,V4=categorical,V5=categorical,V6=categorical,V7=categorical,V8=categorical,V9=categorical -l 500 -t target=continuous -b test/data/cancer.csv
+```
+
+which will output the out-of-sample AUC loss for the entire forest as each
+tree is added to the forest:
+
+```
+1: 0.875000
+2: 0.843000
+3: 0.824000
+4: 0.798000
+5: 0.843000
+6: 0.855000
+7: 0.855000
+8: 0.878000
+9: 0.864000
+10: 0.883000
+11: 0.879000
+12: 0.892000
+13: 0.906000
+14: 0.906000
+15: 0.935000
+...
+```
+
## License
-Copyright (C) 2010 Erik Andrejko
+Copyright (C) 2010-2012 Erik Andrejko
Distributed under the Eclipse Public License, the same as Clojure.
project.clj
@@ -1,4 +1,4 @@
-(defproject random-forests-clj "0.1.0"
+(defproject random-forests-clj "0.2.0"
:description "An implementation of Random Forests for classification in Clojure"
:dependencies [[org.clojure/clojure "1.3.0"]
[clojure-csv/clojure-csv "2.0.0-alpha1"]
src/random_forests/core.clj
@@ -267,30 +267,4 @@
(stats/lazy-sample)
(take 1000))]
(float (avg (map #(if (< (first %) (last %)) 1 0)
- (map vector zero-scores one-scores))))))
-
-;; usage
-(comment
-
- (def data-file "test/data/cancer.csv")
-
- (def data (split-dataset-into-training-and-test
- ;; the target variable must be read as an integer to measure the auc
- (map
- #(vec (concat (butlast %) (list (Integer/parseInt (last %)))))
- (read-dataset data-file))))
-
- ;; everything but the last column is an input feature
- (def features (set (map #(feature (str "V" %) %) (range (dec (count (first (:training data))))))))
-
- (def features-with-interactions (set
- (concat
- features
- (for [a features b features :when (not (= a b))] [a b]))))
-
- (def forest (doall
- (take 250 (build-random-forest (:training data) features-with-interactions 3))))
-
- (println "AUC: " (auc forest (:test data)))
-
- )
+ (map vector zero-scores one-scores))))))
src/random_forests/train.clj
@@ -62,17 +62,16 @@
[& args]
(let [[options args banner] (cli args
["-h" "--help" "Show help" :default false :flag true]
- ["-f" "--features" "Features with types to use for prediction, comma separated with names matching header: name=continuous,foo=text" :parse-fn #(clojure.string/split % #",") :default []]
+ ["-f" "--features" "Features specification (matching CSV header): name=continuous,foo=text" :parse-fn #(clojure.string/split % #",") :default []]
["-s" "--size" "Size of bootstrap sample per tree" :parse-fn #(Integer/parseInt %) :default 1000]
["-m" "--split" "Number of features to sample for each split" :parse-fn #(Integer/parseInt %) :default 100]
["-o" "--output" "Write detailed training error output in CSV format to output file"]
["-t" "--target" "Prediction target name"]
["-b" "--binary" "Perform binary classification of target (measures AUC loss)" :default false :flag true]
- ["-l" "--limit" "Number of trees to build" :parse-fn #(Integer/parseInt %) :default 100])]
- (when (:help options)
+ ["-l" "--limit" "Number of trees to build" :parse-fn #(Integer/parseInt %) :default 100])]
+ (when (or (not (first args)) (:help options))
(println banner)
(System/exit 0))
-
(let [input (csv-from-path (first args))
[header & input] input
encoding (encoding-fns (conj (:features options) (:target options)))