# histbook

Numpy-native histogramming

Numpy, Pandas, and just about every statistical toolbox have a "histogram" command, but HEP physicists would find them underwhelming:

   * they usually go directly from raw data (in memory) to a plot: no interface for accumulating data from disk, accessing bin contents programmatically, or even setting bin ranges by hand
   * they sometimes lack weights or negative weights, and they never have profile plots
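
A profile plot shows, for each bin of one variable, the mean of a second variable with an error bar. As a rough idea of what has to be assembled by hand with plain Numpy, here is a minimal sketch assuming unweighted data and the standard error of the mean as the error bar; the function `profile` is a hypothetical helper, not part of histbook:

```python
import numpy

def profile(x, y, edges):
    """Mean of y and its standard error in each bin of x (hypothetical helper)."""
    x = numpy.asarray(x, dtype=float)
    y = numpy.asarray(y, dtype=float)
    nbins = len(edges) - 1
    # bin index for each x; values outside [edges[0], edges[-1]) are dropped
    index = numpy.digitize(x, edges) - 1
    inside = (index >= 0) & (index < nbins)
    index, y = index[inside], y[inside]
    # per-bin count, sum of y, and sum of y squared
    count = numpy.bincount(index, minlength=nbins)
    sumy = numpy.bincount(index, weights=y, minlength=nbins)
    sumy2 = numpy.bincount(index, weights=y**2, minlength=nbins)
    # empty bins come out as nan instead of raising warnings
    with numpy.errstate(divide="ignore", invalid="ignore"):
        mean = sumy / count
        variance = sumy2 / count - mean**2
        error = numpy.sqrt(numpy.maximum(variance, 0) / count)
    return mean, error
```

Every extra feature (weights, accumulation across files, merging) has to be bolted onto a helper like this by hand.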

We could use ROOT, but then we'd have to send the data *back* across the Python/ROOT divide.

Besides, histogram interfaces can be improved.

The interface is familiar to HEP users, except that we fill a whole array at once instead of one value at a time:
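
The idea behind array-at-a-time filling can be sketched in plain Numpy. This is not histbook's implementation: `ToyHist` is a hypothetical class assuming fixed, regular binning, whose `fill` takes a whole array in one vectorized pass and adds to the counts it already holds, so it can be called repeatedly like `h.fill` below:

```python
import numpy

class ToyHist:
    """Hypothetical fixed-binning 1-d histogram that accumulates whole arrays."""

    def __init__(self, num, low, high):
        self.edges = numpy.linspace(low, high, num + 1)
        self.counts = numpy.zeros(num, dtype=numpy.int64)

    def fill(self, values):
        # one vectorized pass over the array instead of a Python loop per value
        counts, _ = numpy.histogram(values, bins=self.edges)
        self.counts += counts
        return self

h = ToyHist(10, -5, 5)
h.fill(numpy.random.normal(-1.5, 1, 100000))   # fill with some data
h.fill(numpy.random.normal(1.5, 1, 100000))    # accumulate more data on top
```

This toy simply drops out-of-range values, whereas a HEP-style histogram typically keeps underflow and overflow bins.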

In [None]:
import numpy
from histbook import *
h = Hist(bin("x", 10, -5, 5))                           # 10 regular bins in "x" from -5 to 5

In [None]:
h.fill(x=numpy.random.normal(-1.5, 1, 1000000))        # fill with some data

In [None]:
h.fill(x=numpy.random.normal(1.5, 1, 1000000))         # fill with more data

In [None]:
from vega import VegaLite as canvas                    # plots in Jupyter
# import vegascope; canvas = vegascope.LocalCanvas()   # if you don't want to use Jupyter

In [None]:
h.step().to(canvas)

In [None]:
h.pandas()

Plays well with the scientific Python ecosystem:
   * Numpy for large data input
   * [Vega-Lite](https://vega.github.io/vega-lite/) for default plotting (PyROOT is an option)
   * Pandas for tabular access

Plays well with HEP expectations:
   * re-fillable, "hadd"-able, profile plots
   * weights with sumw2, HEP-style error handling
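
Weighted filling and merging both reduce to per-bin sums: each bin keeps the sum of weights (sumw) and the sum of squared weights (sumw2), and the statistical error on the bin content is sqrt(sumw2). A minimal Numpy sketch, with `fill_weighted` as a hypothetical helper rather than histbook's code, shows that merging two histograms with identical binning (what ROOT's `hadd` does for histograms in files) is just addition of these arrays:

```python
import numpy

def fill_weighted(edges, values, weights):
    """Per-bin sum of weights (sumw) and sum of squared weights (sumw2); hypothetical helper."""
    weights = numpy.asarray(weights, dtype=float)
    sumw, _ = numpy.histogram(values, bins=edges, weights=weights)
    sumw2, _ = numpy.histogram(values, bins=edges, weights=weights**2)
    return sumw, sumw2

edges = numpy.linspace(-5, 5, 11)
rng = numpy.random.default_rng(12345)

# two independent "jobs", each with its own data and weights (some weights negative)
xa, wa = rng.normal(-1.5, 1, 1000), rng.uniform(-0.5, 1.5, 1000)
xb, wb = rng.normal(1.5, 1, 1000), rng.uniform(-0.5, 1.5, 1000)
sumw_a, sumw2_a = fill_weighted(edges, xa, wa)
sumw_b, sumw2_b = fill_weighted(edges, xb, wb)

# "hadd": with identical binning, merging is bin-by-bin addition of sumw and sumw2
sumw = sumw_a + sumw_b
sumw2 = sumw2_a + sumw2_b

# statistical error on each bin content
errors = numpy.sqrt(sumw2)
```

Because only sums are stored, merging gives the same result as filling everything in one go, which is what makes a histogram "hadd"-able.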

But also:
   * all histograms are n-dimensional
   * plotting is a matter of slicing and projecting onto graphical facets (x, y, color, trellis)
   * the quantities to be plotted are reordered so that filling is optimal, minimizing the number of passes over the data
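
When every histogram is n-dimensional, projecting means summing over the axes that are not shown, and slicing means keeping a range of bins along an axis. A plain-Numpy illustration of that arithmetic on a 3-d array of counts (it shows the underlying operations only, not histbook's internals):

```python
import numpy

rng = numpy.random.default_rng(0)
x = rng.normal(0, 1, 10000)
y = rng.normal(0, 2, 10000)
z = rng.uniform(0, 1, 10000)

# a 3-d histogram: counts has shape (10, 20, 5), one axis per variable
counts, edges = numpy.histogramdd(
    numpy.column_stack([x, y, z]),
    bins=(10, 20, 5),
    range=((-5, 5), (-10, 10), (0, 1)),
)

# projecting onto x: sum over the y and z axes
x_projection = counts.sum(axis=(1, 2))

# slicing, then projecting: x distribution for the first two z bins only (z in [0, 0.4))
x_low_z = counts[:, :, :2].sum(axis=(1, 2))

# projecting onto (x, y), e.g. for the x and y facets of a 2-d plot
xy_projection = counts.sum(axis=2)
```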

N-dimensional histograms:

In [None]:
h = Hist(bin("Jet_pt", 100, 0, 100),                     # a 3-d histogram: 100 bins on each axis
         bin("Jet_eta", 100, -5, 5),
         bin("Jet_phi", 100, -numpy.pi, numpy.pi))

In [None]:
import uproot
tree = uproot.open("~/storage/data/NanoAOD-DYJetsToLL.root")["Events"]
h.fill(**tree.arrays(["Jet_pt", "Jet_eta", "Jet_phi"]))

In [None]:
h.step("Jet_eta").to(canvas)

In [None]:
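(h.rebin("Jet_eta", (-5, -4, -2, 2, 4, 5))    # merge Jet_eta bins onto irregular edges
  .overlay("Jet_eta")                          # overlay one Jet_pt distribution per Jet_eta bin
  .step("Jet_pt", yscale="log")                # step plot of Jet_pt with a log y-axis
  .to(canvas))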
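
Rebinning onto irregular edges such as `(-5, -4, -2, 2, 4, 5)` is simplest when every new edge coincides with an old one, so that whole bins are merged and no bin has to be split. A plain-Numpy sketch of that merge under exactly this assumption; `rebin` here is a hypothetical helper, not histbook's method:

```python
import numpy

def rebin(counts, old_edges, new_edges, axis=0):
    """Merge adjacent bins of `counts` along `axis` onto coarser, possibly irregular `new_edges`.

    Hypothetical helper for illustration: every new edge must coincide with an
    old edge, and the new edges must span the same range as the old ones.
    """
    old_edges = numpy.asarray(old_edges, dtype=float)
    new_edges = numpy.asarray(new_edges, dtype=float)
    # nearest old edge for each new edge (tolerates floating-point round-off)
    index = numpy.abs(old_edges[numpy.newaxis, :] - new_edges[:, numpy.newaxis]).argmin(axis=1)
    if (not numpy.allclose(old_edges[index], new_edges)
            or index[0] != 0
            or index[-1] != len(old_edges) - 1
            or numpy.any(numpy.diff(index) <= 0)):
        raise ValueError("new edges must be increasing old edges spanning the full range")
    # add up each run of old bins [index[i], index[i + 1]) into a single new bin
    return numpy.add.reduceat(counts, index[:-1], axis=axis)

old_edges = numpy.linspace(-5, 5, 101)   # 100 regular bins, as for Jet_eta above
counts = numpy.ones((100, 3))            # toy 2-d counts: 100 eta bins x 3 other bins
merged = rebin(counts, old_edges, (-5, -4, -2, 2, 4, 5), axis=0)
# merged.shape == (5, 3); each column is [10, 20, 40, 20, 10]
```

Merging only along bin boundaries keeps the result exact: no counts have to be apportioned between partially overlapping bins.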