Histograms or binned counters #21

Closed
veprbl opened this issue Jan 7, 2017 · 2 comments

veprbl commented Jan 7, 2017

Hi! This is really great work! A clear improvement over the TH* mess.

There is another library that attempts to tackle this issue: YODA. It has some interesting ideas, one of which is the realisation that ROOT histograms are not exactly histograms but rather "binned counters". A simple illustration of this can be made with a modified version of the 1d histogram example:

import histogram as hg
import numpy as np
import matplotlib.pyplot as plt

h = hg.histogram(hg.variable_axis(1.,2.,4.))
h.fill(1.5)
h.fill(2.5)

x = np.array(h.axis(0))
y = np.asarray(h)
y = y[:len(x)-1]
y = np.append(y, [0])

plt.plot(x, y, drawstyle="steps-post")
plt.ylim(0.5, 1.5)
plt.xlabel("x")
plt.ylabel("y")
plt.show()

This draws a straight line, suggesting that the filled values are uniformly distributed over the whole axis range from 1.0 to 4.0, which is a wrong estimate because the two bins have different widths. YODA, on the other hand, works with values normalised by the bin width by default, so it would draw a step instead of a straight line. I think it would make a lot of sense for boost_histogram to distinguish between bin width and sum of weights as well.
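
To make the difference concrete, here is a minimal sketch that uses plain NumPy instead of the histogram module from the example above (that substitution is an assumption, made only so the snippet runs without the library). It bins the same two values into the same edges and compares the raw counts with the counts divided by bin width.

```python
import numpy as np

edges = np.array([1.0, 2.0, 4.0])   # same bin edges as in the example above
values = [1.5, 2.5]                 # same two fills

# Raw per-bin counts: this is what a "binned counter" stores.
counts, _ = np.histogram(values, bins=edges)

# Dividing by the bin width gives a density estimate (entries per unit of x).
widths = np.diff(edges)
density = counts / widths

print("counts: ", counts)
print("density:", density)
```

Both bins hold one entry, so the counts are equal, but the second bin is twice as wide, so its density is half that of the first. Plotting `density` rather than `counts` with `drawstyle="steps-post"` would show the step described above.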

@HDembinski
Collaborator

Hi!
Thank you for the nice feedback! I am still working on getting this library into Boost. Please also check out the development version in the "static" branch, which features an even faster static_histogram type for C++ (you cannot use it from Python, though) and a more consistent interface on the C++ side.

Now to your point about histograms being binned counters. I know what you mean, but for technical reasons, the histogram has to store counters per bin internally. It is not safe to store densities directly, and it would seriously harm the performance of high-dimensional histograms.

However, on the C++ side it would be easy to add a .density(index) member, which would return the count in the bin divided by the bin width. On the Python side, there could be an .as_density attribute, which would generate a numpy array of densities on the fly from the stored counts.

I will add that to the static branch.
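
The proposal above can be sketched in Python roughly as follows. This is not the library's implementation: `CountHistogram` and its `fill` method are hypothetical stand-ins, and `density` and `as_density` only mirror the names suggested in this comment. The sketch stores integer counts and derives densities on demand, dividing by the bin width in one dimension or by the bin volume in several.

```python
import numpy as np

class CountHistogram:
    """Hypothetical stand-in: integer counts plus one edge array per axis."""

    def __init__(self, *edges):
        self.edges = [np.asarray(e, dtype=float) for e in edges]
        shape = tuple(len(e) - 1 for e in self.edges)
        self.counts = np.zeros(shape, dtype=np.int64)

    def fill(self, *coords):
        # Locate the bin on each axis. Values outside the axis range are
        # dropped here (this sketch has no underflow/overflow bins).
        index = []
        for x, e in zip(coords, self.edges):
            i = int(np.searchsorted(e, x, side="right")) - 1
            if i < 0 or i >= len(e) - 1:
                return
            index.append(i)
        self.counts[tuple(index)] += 1

    def density(self, *index):
        # Count in one bin divided by that bin's width (volume in N dimensions).
        volume = 1.0
        for i, e in zip(index, self.edges):
            volume *= e[i + 1] - e[i]
        return self.counts[tuple(index)] / volume

    @property
    def as_density(self):
        # Build the array of bin volumes by broadcasting the per-axis widths,
        # then divide the stored counts. Nothing is cached.
        volume = np.ones(self.counts.shape)
        for axis, e in enumerate(self.edges):
            shape = [1] * len(self.edges)
            shape[axis] = -1
            volume = volume * np.diff(e).reshape(shape)
        return self.counts / volume

h = CountHistogram([1.0, 2.0, 4.0])
h.fill(1.5)
h.fill(2.5)
print(h.counts, h.density(1), h.as_density)
```

Because only the counts are stored, filling stays an integer increment, and the division by width happens only when a density is requested.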

@HDembinski
Collaborator

PS: Wikipedia on histograms: "In a more general mathematical sense, a histogram is a function ... that counts the number of observations that fall into each of the disjoint categories (known as bins)." This is in line with the way ROOT and this library model histograms.
