Histograms or binned counters #21
Comments
Hi! Now to your point about histograms being binned counters. I know what you mean, but for technical reasons, the histogram has to store counters per bin internally. It is not safe to store densities directly, and it would seriously harm the performance of high-dimensional histograms. However, on the C++ side it would be easy to add a .density(index) member, which would return the count in the bin divided by the bin width. On the Python side, there could be an .as_density attribute, which would generate a numpy array of densities on the fly from the stored counts. I will add that to the static branch.
PS: Wikipedia on histograms: "In a more general mathematical sense, a histogram is a function ... that counts the number of observations that fall into each of the disjoint categories (known as bins)." This is in line with the way ROOT and this library model histograms.
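For illustration, the on-the-fly conversion from stored counts to densities could be sketched roughly as follows. This is not the library's implementation: the function name `as_density` is borrowed from the proposal above, while its signature, the `edges` argument, and the choice to divide by the bin volume (the product of the widths along each axis) for multi-dimensional histograms are assumptions made for this sketch.

```python
import numpy as np

def as_density(counts, edges):
    """Hypothetical helper: convert per-bin counts to densities.

    counts: N-dimensional array of bin counts.
    edges:  list with one 1-D array of bin edges per axis (assumed layout).
    """
    density = np.asarray(counts, dtype=float).copy()
    for axis, axis_edges in enumerate(edges):
        widths = np.diff(np.asarray(axis_edges, dtype=float))
        # Reshape the widths so they broadcast along the matching axis only.
        shape = [1] * density.ndim
        shape[axis] = widths.size
        density /= widths.reshape(shape)
    return density

# Made-up 2-D counts with non-uniform bins along the first axis.
counts = np.array([[2, 4], [6, 8]])
edges = [np.array([0.0, 1.0, 3.0]), np.array([0.0, 2.0, 4.0])]
dens = as_density(counts, edges)
```

The stored counts are left untouched; the densities are a derived view computed only when requested, which matches the idea of keeping counters as the internal representation.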
Hi! This is really great work! A clear improvement over the TH* mess.
There is another library which attempts to tackle this issue: YODA. They have some interesting ideas, and one of them is the realisation that ROOT histograms are not exactly histograms but more like "binned counters". A simple illustration of this can be done using a modified 1D histogram example:
This will draw a straight line, suggesting that the probability of a filled value is uniformly distributed over the range from 1.0 to 3.0, which is a wrong estimate. YODA, on the other hand, operates with values normalised by the bin width by default, so it will draw a step instead of a straight line. I think it makes a lot of sense for boost_histogram to distinguish between bin width and sum of weights as well.
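The example referred to above is not reproduced here. As a separate, hypothetical illustration of the same effect, the sketch below uses plain NumPy (not boost_histogram or YODA) with made-up bin edges and fill values: two bins of different width over the range 1.0 to 3.0 receive the same number of entries, so the raw counts look flat while the width-normalised values form a step.

```python
import numpy as np

# Hypothetical non-uniform binning over [1.0, 3.0]: widths 0.5 and 1.5.
edges = np.array([1.0, 1.5, 3.0])

# Made-up data: four values land in each bin.
values = [1.1, 1.2, 1.3, 1.4, 2.0, 2.2, 2.5, 2.8]

counts, _ = np.histogram(values, bins=edges)

# Normalising by bin width turns the "binned counter" into a density.
widths = np.diff(edges)
density = counts / widths

print("counts: ", counts)    # equal counts: plotted as-is, a flat line
print("density:", density)   # unequal densities: a step
```

Plotting `counts` directly would suggest a uniform distribution, even though the narrow bin holds the same number of entries in a third of the width.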