Request: Better integration with <numeric> #35
Comments
It would be an overstatement to say that it completely doesn't work:
Yes, it is less convenient, and one could provide an interface for iterating only over the values. I see that the real problem is that the example code inserts values back into the original histogram. This really feels like what ROOT users normally do. The ROOT developers and community have implemented all imaginable operations on histograms, which made it very tempting to use them as you would normally use, say, numpy arrays (I call this histogram-driven data analysis). I imagine I don't need to tell you how ugly it gets. And it applies to this case too. The cumulative distribution is not a histogram. It has a different meaning, and you won't be able to safely manipulate it with routines designed to handle histograms (e.g. think how the bin statistics are no longer independent). So my point is that a good histogram object should provide a means to reduce it to some other kind of value (a single number, or vector, or matrix, graph, ...), but the list of operations allowed to produce a histogram should be limited to the ones that actually produce a sensible histogram.
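To make the "reduce to some other kind of value" idea concrete, here is a minimal sketch that writes the cumulative sums into a separate vector instead of back into the histogram. It assumes the bin counts are already available as a plain `std::vector<double>`; the function name `cumulative` is made up for illustration and is not part of any library.

```cpp
#include <numeric>
#include <vector>

// Reduce bin counts to a cumulative distribution held in a separate vector.
// `counts` is a plain stand-in for the histogram's bin contents (assumption).
std::vector<double> cumulative(const std::vector<double>& counts) {
    std::vector<double> cdf(counts.size());
    // Running sum: cdf[i] = counts[0] + ... + counts[i]
    std::partial_sum(counts.begin(), counts.end(), cdf.begin());
    return cdf;
}
```

The input is left untouched, so whatever invariants the histogram maintains are not affected by the reduction.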
I absolutely agree with you. A histogram is not a general container of numbers; the values inside have a strong meaning. Giving write access to the values would potentially violate the invariants of the class, notably the consistency of the variance estimates. I will quote my answer to the boost mailing list. tl;dr: I think I should provide a conversion to Boost.MultiArray. Users can then use iterators of
Ok, I thought about it some more. Returning a Boost.MultiArray does not work very well for several reasons. I don't want to return a copy, because the copy could be very large. Returning a view is difficult. Manually specifying the histogram dimension in the call is also annoying. And if you just want to iterate over the values, it should be possible, so I am thinking of adding iterators for that. The code should then look like this:
The iterator just goes over all normal bins, excluding under/overflow bins, in implementation-defined order. For 1d-histograms it would just work, and if you need to compute something multi-dimensional, you will be able to query the index of the current element. Like so:
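As a rough illustration of the iterator idea, and not the library's actual interface, a toy one-dimensional histogram could expose only its regular bins to standard algorithms and let the caller ask for the index of the current element. The type `ToyHist1D`, its storage layout, and the helper `total` are all hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical toy type, NOT the Boost.Histogram interface.
// Assumed storage layout: [underflow, bin 0, ..., bin n-1, overflow].
struct ToyHist1D {
    std::vector<double> storage;

    explicit ToyHist1D(std::size_t nbins) : storage(nbins + 2, 0.0) {}

    // Read-only iteration over the regular bins, skipping under/overflow.
    const double* begin() const { return storage.data() + 1; }
    const double* end() const { return storage.data() + storage.size() - 1; }

    // Index of the regular bin that `it` points to.
    std::size_t index(const double* it) const {
        return static_cast<std::size_t>(it - begin());
    }
};

// Sum of all regular bins using a standard algorithm.
double total(const ToyHist1D& h) {
    return std::accumulate(h.begin(), h.end(), 0.0);
}
```

Because the iterators are read-only, algorithms such as `std::accumulate` work directly, while writing results back into the histogram is not possible through this interface.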
has been fixed in the meantime in master
On 18. Nov 2017, at 17:08, Bjorn Reese via Boost boost@lists.boost.org wrote:
I would like to see a better integration between boost.histogram and
standard algorithms, especially those from <numeric>. Today they do not
work together because the dereference operator of the axis iterator
returns a pair (index and content) rather than the content.
An example of where standard algorithms could be useful is calculating
the cumulative distribution of a one-dimensional histogram, which ought
to be as simple as:
auto& axis = h.axis();
std::partial_sum(axis.begin(), axis.end(), axis.begin());
Another example is ranking a set of histograms based on the cosine
similarity [1] using std::inner_product().
[1] https://en.wikipedia.org/wiki/Cosine_similarity
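For the cosine-similarity case, a minimal sketch with `std::inner_product` might look as follows. It assumes the bin contents of the two histograms have already been extracted into equally sized `std::vector<double>` objects; the function name `cosine_similarity` and the choice to return 0 for a zero-norm input are assumptions made for this example.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Cosine similarity of two equally sized vectors of bin counts.
// Returns 0 if either vector has zero norm (a convention assumed here).
double cosine_similarity(const std::vector<double>& a,
                         const std::vector<double>& b) {
    const double dot = std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
    const double na = std::inner_product(a.begin(), a.end(), a.begin(), 0.0);
    const double nb = std::inner_product(b.begin(), b.end(), b.begin(), 0.0);
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

Ranking a set of histograms then amounts to computing this value against a reference vector and sorting by the result.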