

Streaming Histograms for Clojure/Java



This project implements the streaming, one-pass histograms described in Ben-Haim's Streaming Parallel Decision Trees. It also includes the extension from Tyree's Parallel Boosted Regression Trees, which lets a histogram carry numeric targets (useful for regression trees). A similar approach supports categorical targets (useful for classification trees).
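The following minimal Java sketch roughly illustrates the one-pass update from Ben-Haim's paper. The class `MiniHistogram` and its methods are hypothetical and are not this library's API. Each new point becomes a bin with count 1. Whenever the number of bins exceeds the limit, the two bins with the closest means are replaced by a single bin at their count-weighted mean. Merging two histograms reuses the same step. Targets are omitted, and the bins are kept in a sorted list and searched with linear scans, which favors clarity over speed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Ben-Haim update procedure; not this library's code.
final class MiniHistogram {
    /** A bin summarizes `count` points by their mean. */
    static final class Bin {
        final double mean;
        final long count;
        Bin(double mean, long count) { this.mean = mean; this.count = count; }
    }

    private final int maxBins;
    private final List<Bin> bins = new ArrayList<>(); // kept sorted by mean

    MiniHistogram(int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        this.maxBins = maxBins;
    }

    /** Adds one point as a bin of count 1, then restores the bin limit. */
    void insert(double point) {
        addBin(new Bin(point, 1));
        compress();
    }

    /** Merges another histogram by pooling its bins and compressing. */
    void merge(MiniHistogram other) {
        for (Bin b : other.bins) addBin(b);
        compress();
    }

    List<Bin> bins() { return bins; }

    // Inserts a bin at its sorted position; an existing bin with the same mean absorbs the count.
    private void addBin(Bin bin) {
        int i = 0;
        while (i < bins.size() && bins.get(i).mean < bin.mean) i++;
        if (i < bins.size() && bins.get(i).mean == bin.mean) {
            bins.set(i, new Bin(bin.mean, bins.get(i).count + bin.count));
        } else {
            bins.add(i, bin);
        }
    }

    // While over the limit, replace the closest neighboring pair with one bin
    // at their count-weighted mean.
    private void compress() {
        while (bins.size() > maxBins) {
            int best = 0;
            for (int i = 1; i < bins.size() - 1; i++) {
                double gap = bins.get(i + 1).mean - bins.get(i).mean;
                if (gap < bins.get(best + 1).mean - bins.get(best).mean) best = i;
            }
            Bin a = bins.get(best), b = bins.get(best + 1);
            long count = a.count + b.count;
            double mean = (a.mean * a.count + b.mean * b.count) / count;
            bins.set(best, new Bin(mean, count));
            bins.remove(best + 1);
        }
    }
}
```

For example, with a limit of 2 bins, inserting 1, 2, and 10 merges the closest pair (1, 2) into a single bin with mean 1.5 and count 2.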

The histograms approximate the underlying dataset and can be used for learning, visualization, discretization, or analysis, including finding the median or any other percentile in one pass. Histograms may be built independently and then merged, which is convenient for parallel and distributed algorithms.
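In Ben-Haim's paper, percentile queries are built on an estimate of how many points fall at or below a value b. The hypothetical helper here follows that paper's sum procedure, which interpolates counts linearly between neighboring bins. The names `HistogramSum`, `sum`, and `median`, the parallel-array bin layout, and the treatment of values outside the outermost bins are all assumptions. To find the median, it bisects on `sum` rather than using the paper's closed-form uniform procedure.

```java
// Hypothetical sketch of the Ben-Haim "sum" estimate; not this library's API.
final class HistogramSum {
    private HistogramSum() {}

    /**
     * Approximate number of points <= b. means[i] is a bin mean (sorted
     * ascending) and counts[i] is the number of points in that bin.
     */
    static double sum(double[] means, double[] counts, double b) {
        int n = means.length;
        double total = 0;
        for (double c : counts) total += c;
        if (n == 0 || b < means[0]) return 0;     // assumption: nothing left of the first bin
        if (b >= means[n - 1]) return total;       // assumption: everything is <= the last bin
        int i = 0;
        while (means[i + 1] <= b) i++;              // find means[i] <= b < means[i + 1]
        double t = (b - means[i]) / (means[i + 1] - means[i]);
        double mb = counts[i] + (counts[i + 1] - counts[i]) * t; // interpolated count at b
        double s = (counts[i] + mb) / 2 * t;        // trapezoid area between means[i] and b
        for (int j = 0; j < i; j++) s += counts[j]; // all bins fully to the left
        return s + counts[i] / 2;                   // half of bin i lies left of its mean
    }

    /** Approximate median (needs at least one bin): bisect for sum(b) = total / 2. */
    static double median(double[] means, double[] counts) {
        double total = 0;
        for (double c : counts) total += c;
        double lo = means[0], hi = means[means.length - 1];
        for (int iter = 0; iter < 100; iter++) {
            double mid = (lo + hi) / 2;
            if (sum(means, counts, mid) < total / 2) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }
}
```

Take bins at -1, 0, and 1 with counts 1, 2, and 1. Here `sum(means, counts, 0)` returns 2, which is half of the 4 points, so the estimated median is about 0. Within each pair of neighboring bins the estimate is continuous and non-decreasing, so bisection converges.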


  1. Make sure you have Java 1.6 or newer
  2. Install Leiningen
  3. Check out the histogram project with git
  4. Run lein jar


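```java
import java.util.Random;
import com.bigml.histogram.Histogram;

long pointCount = 100000;
int histogramBins = 100;
Random random = new Random();
Histogram hist = new Histogram(histogramBins);

// insert standard normal points, centered at 0
for (long i = 0; i < pointCount; i++) {
  hist.insert(random.nextGaussian());
}

// sum(0) estimates how many points are <= 0; it should be about 50000
double sum = hist.sum(0);

// the split point between two uniform (equally populated) bins should be
// about 0; this is an approximate median
double split = hist.uniform(2).get(0);
```

```clojure
(require '[bigml.histogram.core :refer [create insert! median]])

;; 100000 uniform points in [0, 1); the median should be about 0.5
(let [data (repeatedly 100000 #(rand))
      hist (reduce insert! (create) data)]
  (median hist))
```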


Insert time scales as O(log n), where n is the number of bins in the histogram.
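A hypothetical way to get logarithmic inserts is to keep the bins in a sorted map and the gaps between neighboring bins in an ordered set. With that layout, placing a new point and finding the closest pair both take O(log n). This sketch is not necessarily how this library stores its bins, and it omits targets. `LogTimeHistogram`, `Gap`, and the method names are illustrative only.

```java
import java.util.Comparator;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: not necessarily how this library stores its bins.
final class LogTimeHistogram {
    /** The gap between two neighboring bin means. */
    static final class Gap {
        final double left, right, width;
        Gap(double left, double right) {
            this.left = left;
            this.right = right;
            this.width = right - left;
        }
    }

    // Narrowest gap first; ties broken by position so distinct gaps never collide.
    private static final Comparator<Gap> BY_WIDTH =
        Comparator.comparingDouble((Gap g) -> g.width).thenComparingDouble(g -> g.left);

    private final int maxBins;
    private final TreeMap<Double, Long> bins = new TreeMap<>(); // mean -> count
    private final TreeSet<Gap> gaps = new TreeSet<>(BY_WIDTH);

    LogTimeHistogram(int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        this.maxBins = maxBins;
    }

    void insert(double point) {
        addBin(point, 1);
        if (bins.size() > maxBins) mergeClosest(); // one insert adds at most one bin
    }

    TreeMap<Double, Long> bins() { return bins; }

    // O(log n): add (or bump) a bin and update the gaps that touch it.
    private void addBin(double mean, long count) {
        Long existing = bins.get(mean);
        if (existing != null) {
            bins.put(mean, existing + count);
            return;
        }
        Double lower = bins.lowerKey(mean), higher = bins.higherKey(mean);
        if (lower != null && higher != null) gaps.remove(new Gap(lower, higher));
        if (lower != null) gaps.add(new Gap(lower, mean));
        if (higher != null) gaps.add(new Gap(mean, higher));
        bins.put(mean, count);
    }

    // O(log n): remove a bin and join its neighbors with a single gap.
    private long removeBin(double mean) {
        Double lower = bins.lowerKey(mean), higher = bins.higherKey(mean);
        if (lower != null) gaps.remove(new Gap(lower, mean));
        if (higher != null) gaps.remove(new Gap(mean, higher));
        if (lower != null && higher != null) gaps.add(new Gap(lower, higher));
        return bins.remove(mean);
    }

    // O(log n): replace the closest pair with one bin at their count-weighted mean.
    private void mergeClosest() {
        Gap g = gaps.first();
        long ca = removeBin(g.left);
        long cb = removeBin(g.right);
        double mean = (g.left * ca + g.right * cb) / (ca + cb);
        mean = Math.min(g.right, Math.max(g.left, mean)); // guard against rounding drift
        addBin(mean, ca + cb);
    }
}
```

Because an insert adds at most one bin, it needs at most one merge. Each merge consists of a constant number of map and set operations, which keeps the whole insert logarithmic in the number of bins.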

[Figure: timing chart]
