Readme & examples update
ashenfad committed Jul 19, 2012
1 parent 926b0a3 commit 26ee372
Showing 2 changed files with 98 additions and 89 deletions.
178 changes: 89 additions & 89 deletions README.md
@@ -31,34 +31,34 @@ sequence of 100K samples from a normal distribution (mean 0, variance


```clojure ```clojure
user> (ns examples user> (ns examples
(:require (histogram [core :as hst]) (:use [histogram.core])
(histogram.test [examples :as ex]))) (:require (histogram.test [examples :as ex])))
examples> (def hist (reduce hst/insert! (hst/create) ex/normal-data)) examples> (def hist (reduce insert! (create) ex/normal-data))
``` ```


You can use the `sum` fn to find the approximate number of points less You can use the `sum` fn to find the approximate number of points less
than a given threshold: than a given threshold:


```clojure ```clojure
examples> (hst/sum hist 0) examples> (sum hist 0)
50044.02331806754 50044.02331
``` ```


The `density` fn gives us an estimate of the point density at the The `density` fn gives us an estimate of the point density at the
given location: given location:


```clojure ```clojure
examples> (hst/density hist 0) examples> (density hist 0)
39687.562791977114 39687.56279
``` ```


The `uniform` fn returns a list of points that separate the The `uniform` fn returns a list of points that separate the
distribution into equal population areas. Here's an example that distribution into equal population areas. Here's an example that
produces quartiles: produces quartiles:


```clojure ```clojure
examples> (hst/uniform hist 4) examples> (uniform hist 4)
(-0.6723425970050285 -0.0011145378611749357 0.6713314937601746) (-0.67234 -0.00111 0.67133)
``` ```


We can plot the sums and density estimates as functions. The red line We can plot the sums and density estimates as functions. The red line
@@ -71,7 +71,7 @@ function](http://en.wikipedia.org/wiki/Probability_density_function)
for the normal distribution. for the normal distribution.


```clojure ```clojure
examples> (ex/sum-density-chart hist) examples> (ex/sum-density-chart hist) ;; also see (ex/cdf-pdf-chart hist)
``` ```
![Histogram from normal distribution] ![Histogram from normal distribution]
(https://img.skitch.com/20120427-jhrhpshfm6pppu3t3bu4kt9g7e.png) (https://img.skitch.com/20120427-jhrhpshfm6pppu3t3bu4kt9g7e.png)
@@ -85,13 +85,13 @@ points are distributed evenly with half the points less than the mean
and half greater. This explains the fraction sum in the example below: and half greater. This explains the fraction sum in the example below:


```clojure ```clojure
examples> (def hist (-> (hst/create :bins 3) examples> (def hist (-> (create :bins 3)
(hst/insert! 1) (insert! 1)
(hst/insert! 2) (insert! 2)
(hst/insert! 3))) (insert! 3)))
examples> (hst/bins hist) examples> (bins hist)
({:mean 1.0, :count 1} {:mean 2.0, :count 1} {:mean 3.0, :count 1}) ({:mean 1.0, :count 1} {:mean 2.0, :count 1} {:mean 3.0, :count 1})
examples> (hst/sum hist 2) examples> (sum hist 2)
1.5 1.5
``` ```
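
The half-count behaviour above can be reproduced with a small stand-alone sketch. It is not the library's code: `approx-sum` is a hypothetical helper that assumes the trapezoid interpolation commonly used for streaming histograms, takes bins as plain maps sorted by `:mean`, and simply returns 0 for any point below the first bin mean.

```clojure
;; Hypothetical helper, not part of histogram.core.
;; Assumes `bins` is a sequence of {:mean m :count c} maps sorted by :mean.
(defn approx-sum
  "Approximate number of points less than or equal to p."
  [bins p]
  (let [[below above] (split-with #(<= (:mean %) p) bins)
        left  (last below)    ; nearest bin at or to the left of p
        right (first above)   ; nearest bin to the right of p
        prior (reduce + 0 (map :count (butlast below)))]
    (cond
      ;; p is left of every bin mean (simplification: no lower bound tracked)
      (nil? left) 0.0
      ;; p is at or beyond the last bin mean
      (nil? right) (if (== p (:mean left))
                     (+ prior (/ (:count left) 2.0))
                     (double (+ prior (:count left))))
      :else
      (let [frac (/ (- p (:mean left))
                    (- (:mean right) (:mean left)))
            ;; interpolated bin height at p
            m-p  (+ (:count left)
                    (* frac (- (:count right) (:count left))))]
        (+ prior
           (/ (:count left) 2.0)                      ; half of the left bin
           (* frac (/ (+ (:count left) m-p) 2.0)))))))  ; trapezoid up to p

(approx-sum [{:mean 1.0 :count 1} {:mean 2.0 :count 1} {:mean 3.0 :count 1}] 2)
;; => 1.5
```

The point 2 sits exactly on the second bin's mean, so the whole first bin and half of the second bin are counted.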


@@ -102,7 +102,7 @@ fourth unique value it will create a fourth bin and then merge the
nearest two. nearest two.


```clojure ```clojure
examples> (hst/bins (hst/insert! hist 0.5)) examples> (bins (insert! hist 0.5))
({:mean 0.75, :count 2} {:mean 2.0, :count 1} {:mean 3.0, :count 1}) ({:mean 0.75, :count 2} {:mean 2.0, :count 1} {:mean 3.0, :count 1})
``` ```
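
The insert-then-merge step can be sketched in plain Clojure. `merge-bins` and `insert-point` are hypothetical names used only for illustration; the sketch assumes that the two bins with the closest means are merged into their count-weighted mean, and it ignores details such as gap weighting, targets, and repeated values.

```clojure
;; Hypothetical helpers, not part of histogram.core.
(defn merge-bins
  "Combine two bins into one, using the count-weighted mean."
  [a b]
  (let [n (+ (:count a) (:count b))]
    {:mean  (/ (+ (* (:mean a) (:count a))
                  (* (:mean b) (:count b)))
               (double n))
     :count n}))

(defn insert-point
  "Add x as a new single-point bin; if that exceeds max-bins,
  merge the pair of adjacent bins whose means are closest."
  [bins max-bins x]
  (let [bins (vec (sort-by :mean (conj bins {:mean (double x) :count 1})))]
    (if (<= (count bins) max-bins)
      bins
      (let [;; index of the left bin in the closest adjacent pair
            i (apply min-key
                     #(- (:mean (bins (inc %))) (:mean (bins %)))
                     (range (dec (count bins))))]
        (vec (concat (subvec bins 0 i)
                     [(merge-bins (bins i) (bins (inc i)))]
                     (subvec bins (+ i 2))))))))

(insert-point [{:mean 1.0 :count 1} {:mean 2.0 :count 1} {:mean 3.0 :count 1}]
              3
              0.5)
;; => [{:mean 0.75, :count 2} {:mean 2.0, :count 1} {:mean 3.0, :count 1}]
```

Inserting 0.5 creates a fourth bin, and the closest pair (0.5 and 1.0) collapses into a single bin with mean 0.75 and count 2, which matches the output shown above.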


@@ -112,31 +112,31 @@ red line represents a histogram with 16 bins and the blue line
represents 64 bins. represents 64 bins.


```clojure ```clojure
examples> (ex/multi-density-chart examples> (ex/multi-pdf-chart
[(reduce hst/insert! (hst/create :bins 16) ex/normal-data) [(reduce insert! (create :bins 16) ex/normal-data)
(reduce hst/insert! (hst/create :bins 64) ex/normal-data)]) (reduce insert! (create :bins 64) ex/normal-data)])
``` ```
![64 and 32 bins histograms] ![64 and 32 bins histograms]
(https://img.skitch.com/20120427-1x2fdrd7k5ks4rr9w59wkks7g.png) (https://img.skitch.com/20120719-e9hhw8hu6ye74b8stg1fh8tyhf.png)


Another option when creating a histogram is to use *gap Another option when creating a histogram is to use *gap
weighting*. When `:gap-weighted?` is true, the histogram is encouraged weighting*. When `:gap-weighted?` is true, the histogram is encouraged
to spend more of its bins capturing the densest areas of the to spend more of its bins capturing the densest areas of the
distribution. For the normal distribution that means better resolution distribution. For the normal distribution that means better resolution
near the mean and less resolution near the tails. The chart below near the mean and less resolution near the tails. The chart below
shows a histogram without gap weighting in blue and with gap weighting shows a histogram without gap weighting in blue and with gap weighting
in red. Near the center of the distribution, red uses five bins in in red. Near the center of the distribution, red uses six bins in
roughly the same space that blue uses three. roughly the same space that blue uses three.


```clojure ```clojure
examples> (ex/multi-density-chart examples> (ex/multi-pdf-chart
[(reduce hst/insert! (hst/create :bins 16 :gap-weighted? true) [(reduce insert! (create :bins 16 :gap-weighted? true)
ex/normal-data) ex/normal-data)
(reduce hst/insert! (hst/create :bins 16 :gap-weighted? false) (reduce insert! (create :bins 16 :gap-weighted? false)
ex/normal-data)]) ex/normal-data)])
``` ```
![Gap weighting vs. No gap weighting] ![Gap weighting vs. No gap weighting]
(https://img.skitch.com/20120427-x7591npy3393iqs2k2cqfrr5hn.png) (https://img.skitch.com/20120719-e82yxgkph9te4fucc5yuktfy1m.png)


# Merging # Merging


@@ -146,11 +146,11 @@ combined to give a better overall picture.


```clojure ```clojure
examples> (let [samples (partition 1000 ex/normal-data) examples> (let [samples (partition 1000 ex/normal-data)
hist1 (reduce hst/insert! (hst/create :bins 16) (first samples)) hist1 (reduce insert! (create :bins 16) (first samples))
hist2 (reduce hst/insert! (hst/create :bins 16) (second samples)) hist2 (reduce insert! (create :bins 16) (second samples))
merged (-> (hst/create :bins 16) merged (-> (create :bins 16)
(hst/merge! hist1) (merge! hist1)
(hst/merge! hist2))] (merge! hist2))]
(ex/multi-density-chart [hist1 hist2 merged])) (ex/multi-density-chart [hist1 hist2 merged]))
``` ```
![Merged histograms] ![Merged histograms]
@@ -169,21 +169,21 @@ contain information summarizing the target. For numerics the targets
sums are tracked. For categoricals a map of counts is maintained. sums are tracked. For categoricals a map of counts is maintained.


```clojure ```clojure
examples> (-> (hst/create) examples> (-> (create)
(hst/insert! 1 9) (insert! 1 9)
(hst/insert! 2 8) (insert! 2 8)
(hst/insert! 3 7) (insert! 3 7)
(hst/insert! 3 6) (insert! 3 6)
(hst/bins)) (bins))
({:target {:sum 9.0, :missing-count 0.0}, :mean 1.0, :count 1} ({:target {:sum 9.0, :missing-count 0.0}, :mean 1.0, :count 1}
{:target {:sum 8.0, :missing-count 0.0}, :mean 2.0, :count 1} {:target {:sum 8.0, :missing-count 0.0}, :mean 2.0, :count 1}
{:target {:sum 13.0, :missing-count 0.0}, :mean 3.0, :count 2}) {:target {:sum 13.0, :missing-count 0.0}, :mean 3.0, :count 2})
examples> (-> (hst/create) examples> (-> (create)
(hst/insert! 1 :a) (insert! 1 :a)
(hst/insert! 2 :b) (insert! 2 :b)
(hst/insert! 3 :c) (insert! 3 :c)
(hst/insert! 3 :d) (insert! 3 :d)
(hst/bins)) (bins))
({:target {:counts {:a 1.0}, :missing-count 0.0}, :mean 1.0, :count 1} ({:target {:counts {:a 1.0}, :missing-count 0.0}, :mean 1.0, :count 1}
{:target {:counts {:b 1.0}, :missing-count 0.0}, :mean 2.0, :count 1} {:target {:counts {:b 1.0}, :missing-count 0.0}, :mean 2.0, :count 1}
{:target {:counts {:d 1.0, :c 1.0}, :missing-count 0.0}, :mean 3.0, :count 2}) {:target {:counts {:d 1.0, :c 1.0}, :missing-count 0.0}, :mean 3.0, :count 2})
@@ -192,9 +192,9 @@ examples> (-> (hst/create)
Mixing target types isn't allowed: Mixing target types isn't allowed:


```clojure ```clojure
examples> (-> (hst/create) examples> (-> (create)
(hst/insert! 1 :a) (insert! 1 :a)
(hst/insert! 2 999)) (insert! 2 999))
Can't mix insert types Can't mix insert types
[Thrown class com.bigml.histogram.MixedInsertException] [Thrown class com.bigml.histogram.MixedInsertException]
``` ```
@@ -203,22 +203,22 @@ Can't mix insert types
set explicitly: set explicitly:


```clojure ```clojure
examples> (-> (hst/create) examples> (-> (create)
(hst/insert-categorical! 1 1) (insert-categorical! 1 1)
(hst/insert-categorical! 1 2) (insert-categorical! 1 2)
(hst/bins)) (bins))
({:target {:counts {2 1.0, 1 1.0}, :missing-count 0.0}, :mean 1.0, :count 2}) ({:target {:counts {2 1.0, 1 1.0}, :missing-count 0.0}, :mean 1.0, :count 2})
``` ```


The `extended-sum` fn works similarly to `sum`, but returns a result The `extended-sum` fn works similarly to `sum`, but returns a result
that includes the target information: that includes the target information:


```clojure ```clojure
examples> (-> (hst/create) examples> (-> (create)
(hst/insert! 1 :a) (insert! 1 :a)
(hst/insert! 2 :b) (insert! 2 :b)
(hst/insert! 3 :c) (insert! 3 :c)
(hst/extended-sum 2)) (extended-sum 2))
{:sum 1.5, :target {:counts {:c 0.0, :b 0.5, :a 1.0}, :missing-count 0.0}} {:sum 1.5, :target {:counts {:c 0.0, :b 0.5, :a 1.0}, :missing-count 0.0}}
``` ```


@@ -230,29 +230,29 @@ plotting easier). The density is in red and the average target value
is in blue: is in blue:


```clojure ```clojure
examples> (def make-y (fn [x] (+ 10000 (* 10000 (Math/sin x))))) examples> (def make-y (fn [x] (Math/sin x)))
examples> (def hist (let [target-data (map (fn [x] [x (make-y x)]) examples> (def hist (let [target-data (map (fn [x] [x (make-y x)])
ex/normal-data)] ex/normal-data)]
(reduce (fn [h [x y]] (hst/insert! h x y)) (reduce (fn [h [x y]] (insert! h x y))
(hst/create) (create)
target-data))) target-data)))
examples> (ex/density-target-chart hist) examples> (ex/pdf-target-chart hist)
``` ```
![Numeric target] ![Numeric target]
(https://img.skitch.com/20120427-q2y753qwnt4x1mhbs3ri9ddgt.png) (https://img.skitch.com/20120719-tfjnabp7t7sanskf4iqecp751d.png)


Continuing with the same histogram, we can see that `average-target` Continuing with the same histogram, we can see that `average-target`
produces values close to original target: produces values close to original target:


```clojure ```clojure
examples> (def view-target (fn [x] {:actual (make-y x) examples> (def view-target (fn [x] {:actual (make-y x)
:approx (hst/average-target hist x)})) :approx (average-target hist x)}))
examples> (view-target 0) examples> (view-target 0)
{:actual 10000.0, :approx {:sum 9617.150788081583, :missing-count 0.0}} {:actual 0.0, :approx {:sum -0.04696, :missing-count 0.0}}
examples> (view-target (/ Math/PI 2)) examples> (view-target (/ Math/PI 2))
{:actual 20000.0, :approx {:sum 19967.590011881348, :missing-count 0.0}} {:actual 1.0, :approx {:sum 0.99698, :missing-count 0.0}}
examples> (view-target Math/PI) examples> (view-target Math/PI)
{:actual 10000.000000000002, :approx {:sum 9823.774137889975, :missing-count 0.0}} {:actual 1.22464E-16, :approx {:sum -0.04881, :missing-count 0.0}}
``` ```


# Missing Values # Missing Values
@@ -263,35 +263,35 @@ summarizing the instances with a missing input. For a basic histogram,
that is simply the count: that is simply the count:


```clojure ```clojure
examples> (-> (hst/create) examples> (-> (create)
(hst/insert! nil) (insert! nil)
(hst/insert! 7) (insert! 7)
(hst/insert! nil) (insert! nil)
(hst/missing-bin)) (missing-bin))
{:count 2} {:count 2}
``` ```


For a histogram with a target, the `missing-bin` includes target For a histogram with a target, the `missing-bin` includes target
information: information:


```clojure ```clojure
examples> (-> (hst/create) examples> (-> (create)
(hst/insert! nil :a) (insert! nil :a)
(hst/insert! 7 :b) (insert! 7 :b)
(hst/insert! nil :c) (insert! nil :c)
(hst/missing-bin)) (missing-bin))
{:target {:counts {:a 1.0, :c 1.0}, :missing-count 0.0}, :count 2} {:target {:counts {:a 1.0, :c 1.0}, :missing-count 0.0}, :count 2}
``` ```


Targets can also be missing, in which case the target `missing-count` Targets can also be missing, in which case the target `missing-count`
is incremented: is incremented:


```clojure ```clojure
examples> (-> (hst/create) examples> (-> (create)
(hst/insert! nil :a) (insert! nil :a)
(hst/insert! 7 :b) (insert! 7 :b)
(hst/insert! nil nil) (insert! nil nil)
(hst/missing-bin)) (missing-bin))
{:target {:counts {:a 1.0}, :missing-count 1.0}, :count 2} {:target {:counts {:a 1.0}, :missing-count 1.0}, :count 2}
``` ```


@@ -308,8 +308,8 @@ do this, set the `:categories` parameter:
examples> (def categories (map (partial str "c") (range 50))) examples> (def categories (map (partial str "c") (range 50)))
examples> (def data (vec (repeatedly 100000 examples> (def data (vec (repeatedly 100000
#(vector (rand) (str "c" (rand-int 50)))))) #(vector (rand) (str "c" (rand-int 50))))))
examples> (doseq [hist [(hst/create) (hst/create :categories categories)]] examples> (doseq [hist [(create) (create :categories categories)]]
(time (reduce (fn [h [x y]] (hst/insert! h x y)) (time (reduce (fn [h [x y]] (insert! h x y))
hist hist
data))) data)))
"Elapsed time: 1295.402 msecs" "Elapsed time: 1295.402 msecs"
@@ -325,12 +325,12 @@ when creating the histogram. Declaring the types on creation allows
the targets to be missing in the first insert: the targets to be missing in the first insert:


```clojure ```clojure
examples> (-> (hst/create :group-types [:categorical :numeric]) examples> (-> (create :group-types [:categorical :numeric])
(hst/insert! 1 [:a nil]) (insert! 1 [:a nil])
(hst/insert! 2 [:b 8]) (insert! 2 [:b 8])
(hst/insert! 3 [:c 7]) (insert! 3 [:c 7])
(hst/insert! 1 [:d 6]) (insert! 1 [:d 6])
(hst/bins)) (bins))
({:target ({:counts {:a 1.0, :d 1.0}, :missing-count 0.0} ({:target ({:counts {:a 1.0, :d 1.0}, :missing-count 0.0}
{:sum 6.0, :missing-count 1.0}), {:sum 6.0, :missing-count 1.0}),
:mean 1.0, :count 2} :mean 1.0, :count 2}
@@ -354,9 +354,9 @@ shift, inserts become computationally cheap. However the quality of
the histogram can suffer if the `:freeze` parameter is too small. the histogram can suffer if the `:freeze` parameter is too small.


```clojure ```clojure
examples> (time (reduce hst/insert! (hst/create) ex/normal-data)) examples> (time (reduce insert! (create) ex/normal-data))
"Elapsed time: 391.857 msecs" "Elapsed time: 391.857 msecs"
examples> (time (reduce hst/insert! (hst/create :freeze 1024) ex/normal-data)) examples> (time (reduce insert! (create :freeze 1024) ex/normal-data))
"Elapsed time: 99.92 msecs" "Elapsed time: 99.92 msecs"
``` ```


9 changes: 9 additions & 0 deletions test/histogram/test/examples.clj
@@ -16,6 +16,15 @@
(charts/function-plot (hst/pdf (first hists)) min max) (charts/function-plot (hst/pdf (first hists)) min max)
(next hists))))) (next hists)))))


(defn multi-density-chart [hists]
(let [min (reduce min (map (comp :min hst/bounds) hists))
max (reduce max (map (comp :max hst/bounds) hists))]
(core/view
(reduce (fn [c h]
(charts/add-function c #(hst/density h %) min max))
(charts/function-plot #(hst/density (first hists) %) min max)
(next hists)))))

(defn sum-density-chart [hist] (defn sum-density-chart [hist]
(let [{:keys [min max]} (hst/bounds hist true)] (let [{:keys [min max]} (hst/bounds hist true)]
(core/view (-> (charts/function-plot #(hst/sum hist %) min max) (core/view (-> (charts/function-plot #(hst/sum hist %) min max)