# Histogrammar: a set of data aggregation primitives

Histogrammar is a set of data aggregation and statistical analysis primitives that integrates well with Scala, Spark and Spark SQL (oh, and Python).

Histogrammar is an open-source (Apache 2.0) project whose two main contributors, Jim Pivarski and Alexey Svyatkovskiy (me), are from Princeton University.

More details and tutorials are available here: http://histogrammar.org/docs/

## Installation instructions

Histogrammar is available on Maven Central, a publicly accessible Java/Scala repository with dependency management.

### Apache Spark

To use Histogrammar in the Spark shell, you don't have to download anything. Just start Spark with the `--packages` option (the same way we did with Databricks' spark-csv):

```bash
spark-shell --packages "org.diana-hep:histogrammar_2.11:1.0.3"
```

and call

```scala
import org.dianahep.histogrammar._
```

at the Spark prompt. For plotting with Bokeh, include `org.diana-hep:histogrammar-bokeh_2.11:1.0.3`, and for interaction with Spark SQL, include `org.diana-hep:histogrammar-sparksql_2.11:1.0.3`.

Use `_2.11` for compatibility with Spark 2.0.0 (Scala 2.11) and `_2.10` for compatibility with Spark 1.x (Scala 2.10).
Note: due to a dependency bug, Bokeh is incompatible with Spark 2.x (Scala 2.11).

### Java/Scala with Maven

To compile Histogrammar into a project with the Maven build tool, add

```xml
<dependency>
  <groupId>org.diana-hep</groupId>
  <artifactId>histogrammar_2.11</artifactId>
  <version>1.0.3</version>
</dependency>
```

to your `<dependencies>` section. Use `_2.11` for compatibility with Scala 2.11 and `_2.10` for compatibility with Scala 2.10.

### Scala with sbt

To use Histogrammar in the sbt console or to compile it into a project with the sbt build tool, add

```
libraryDependencies += "org.diana-hep" %% "histogrammar" % "1.0.3"
```

to your `build.sbt` file. The double percent sign (`%%`) selects the Histogrammar build that matches your version of Scala.
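
To make the `%%` shorthand concrete, here is a minimal sketch of what it expands to. The `scalaVersion` value is only an assumed example; any 2.11.x release would select the `_2.11` artifact in the same way.

```scala
// build.sbt (sketch; the scalaVersion value below is an assumed example)
scalaVersion := "2.11.8"

// With a 2.11.x scalaVersion, the double-percent form...
libraryDependencies += "org.diana-hep" %% "histogrammar" % "1.0.3"

// ...resolves the same artifact as spelling out the suffix with a single percent:
// libraryDependencies += "org.diana-hep" % "histogrammar_2.11" % "1.0.3"
```

The explicit form can be handy when you need to pin the `_2.10` build regardless of the project's Scala version.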

More specifics on the installation can be found here:

http://histogrammar.org/docs/install

## Basic aggregation and plotting with Histogrammar and Spark

In a separate terminal window, start an interactive spark-shell with the Histogrammar and Bokeh packages preloaded:

```bash
spark-shell --packages "org.diana-hep:histogrammar-bokeh_2.10:1.0.3"
```

### Filling and plotting a Histogram in Scala

The first example of plotting a histogram with scala-bokeh uses Scala and artificial data for the sake of simplicity.

Start by importing the Histogrammar package and the plotting library:

```scala
import org.dianahep.histogrammar._
import org.dianahep.histogrammar.bokeh._
```

Generate artificial data:

```scala
val simple = List(3.4, 2.2, -1.8, 0.0, 7.3, -4.7, 1.6, 0.0, -3.0, -1.7)
```

Book two histograms:

```scala
val one = Histogram(5, -5, 8, {x: Double => x})
val two = Histogram(5, -3, 7, {x: Double => x})
```

Fill both histograms in one line of code using the `Label` class:

```scala
val labeling = Label("one" -> one, "two" -> two)
simple.foreach(labeling.fill(_))
```
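
To see what the two filled histograms should contain, the binning can be reproduced by hand. The following is a plain-Scala sketch that does not use the Histogrammar API: `binCounts` is a hypothetical helper, and it assumes equal-width bins with an inclusive lower edge, an exclusive upper edge, and separate underflow and overflow counters for out-of-range values.

```scala
// Plain-Scala sketch of fixed-width binning; binCounts is a hypothetical helper,
// not part of the Histogrammar API.
val simple = List(3.4, 2.2, -1.8, 0.0, 7.3, -4.7, 1.6, 0.0, -3.0, -1.7)

// Returns (counts per bin, underflow count, overflow count) for `num` equal-width
// bins covering [low, high), with the lower edge of each bin inclusive (assumed).
def binCounts(data: Seq[Double], num: Int, low: Double, high: Double): (Vector[Int], Int, Int) = {
  val inRange = data.filter(x => x >= low && x < high)
  val indices = inRange.map(x => math.floor(num * (x - low) / (high - low)).toInt)
  val counts  = Vector.tabulate(num)(i => indices.count(_ == i))
  (counts, data.count(_ < low), data.count(_ >= high))
}

// Same bin specifications as histograms `one` and `two` above.
val (countsOne, underOne, overOne) = binCounts(simple, 5, -5, 8)
val (countsTwo, underTwo, overTwo) = binCounts(simple, 5, -3, 7)

println(s"one: $countsOne underflow=$underOne overflow=$overOne")
println(s"two: $countsTwo underflow=$underTwo overflow=$overTwo")
```

Under these assumptions, the bins of `one` hold 2, 4, 2, 1, 1 entries with nothing out of range, while the bins of `two` hold 3, 2, 2, 1, 0 entries with -4.7 counted as underflow and 7.3 as overflow.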

Now plot histogram `one` and save the result to an HTML file:

```scala
val plot_one = one.bokeh().plot()
save(plot_one,"scala_plot_one.html")
```

### Configuring Bokeh Glyph attributes

By default, the histogram is drawn as a black line glyph. To turn it into a bar plot filled with red, pass arguments to the `bokeh()` method as follows:

```scala
import io.continuum.bokeh._
val plot_one = one.bokeh(glyphType="histogram",fillColor=Color.Red).plot()
save(plot_one,"scala_plot_one.html")
```

### Superimposing multiple glyphs on one plot

To superimpose the two histograms booked and filled above on one plot, create and configure a glyph for each histogram, then call the `plot()` method, which accepts a variable-length argument list and can therefore take any number of glyphs.

```scala
val glyph_one = one.bokeh() //use default
val glyph_two = two.bokeh(glyphType="histogram",fillColor=Color.Red) //customize
val plot_both = plot(glyph_one,glyph_two)
save(plot_both,"scala_plot_both.html")
```

### Plotting a stack of Histograms

Here is an example of how to make a stacked plot of histograms. Let us generate more artificial data, different from the data used to fill `one` and `two`:

```scala
val extra = List(3.2, 3.2, -2.1, 1.0, 1.3, -3.4, 0.6, 0.0, -1.0, 1.7)
```

and book a third histogram:

```scala
val three = Histogram(5, -3, 7, {x: Double => x})
```

Note: only histograms with the same binning can be stacked!

Now, fill it:

```scala
extra.foreach(three.fill(_))
```

Prepare a stacked histogram using the dedicated `Stack.build()` method, and plot it:

```scala
val s = Stack.build(two,three)
val glyph_stack = s.bokeh() //use defaults
val plot_stack = plot(glyph_stack)
save(plot_stack,"scala_plot_stack.html")
```
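
To illustrate why stacking requires identical binning, here is a plain-Scala sketch of the arithmetic behind a stacked plot. It does not use the Histogrammar API: `counts` is a hypothetical helper hard-wired to the 5 bins on [-3, 7) used by `two` and `three`, and the layer order shown is an assumption rather than a statement of how `Stack.build` arranges its contents.

```scala
// Plain-Scala sketch; `counts` is a hypothetical helper, not part of Histogrammar.
val simple = List(3.4, 2.2, -1.8, 0.0, 7.3, -4.7, 1.6, 0.0, -3.0, -1.7)
val extra  = List(3.2, 3.2, -2.1, 1.0, 1.3, -3.4, 0.6, 0.0, -1.0, 1.7)

// Counts per bin for 5 bins of width 2 on [-3, 7); out-of-range values are ignored.
def counts(data: Seq[Double]): Vector[Int] =
  Vector.tabulate(5)(i => data.count(x => x >= -3 + 2 * i && x < -1 + 2 * i))

val countsTwo   = counts(simple) // same binning as histogram `two`
val countsThree = counts(extra)  // same binning as histogram `three`

// Each layer is the bin-by-bin sum of all histograms up to and including it.
// Adding bin i of one histogram to bin i of another is only meaningful
// when both histograms have the same bin edges.
val layers = List(countsTwo, countsThree)
  .scanLeft(Vector.fill(5)(0)) { (acc, h) => acc.zip(h).map { case (a, b) => a + b } }
  .tail

layers.foreach(println)
```

With the data above, the first layer is 3, 2, 2, 1, 0 and the second (the bin-wise total) is 4, 5, 5, 3, 0.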