Add upper limit to number of bins in hist and refactor bin edge calculation #7991
Hi there @mwcraig 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃. Everything looks good from my point of view! 👍 If there are any issues with this message, please report them here.
Happy to add a changelog entry once a milestone is set...
@mwcraig - can you rebase?
…er of bins in hist
@astrofrog -- rebased -- does the changelog entry go in the section for 3.1 or 2.10?
@mwcraig - I think 2.0.10 makes sense as I think this was really a bug. Note that there are some real failures here in the doc build:
|
Looks like a reasonable change to me. Though I think that |
I made the new function public and added a docstring. A couple more changes are incoming because it turns out there is a bug when calculating the histogram with a
aka coding by random walk 🤦
@crawfordsm or @larrybradley - could you please review this?
Thanks, @mwcraig. I did find one typo in a docstring.
Codecov Report
@@ Coverage Diff @@
## master #7991 +/- ##
==========================================
+ Coverage 86.87% 86.92% +0.04%
==========================================
Files 384 383 -1
Lines 57860 57855 -5
Branches 1078 1056 -22
==========================================
+ Hits 50265 50288 +23
+ Misses 6957 6953 -4
+ Partials 638 614 -24
it will be (x.min(), x.max()). However, if bins is a list it is
returned unmodified regardless of the range argument.

weights : array_like, optional
Sorry for a random comment from the sidelines, but why include weights if it does not influence the calculation?
I think it was because of this statement in np.histogram_bin_edges, which takes a weights argument:
weights : array_like, optional
An array of weights, of the same shape as a. Each value in a only contributes its associated weight towards the bin count (instead of 1). This is currently not used by any of the bin estimators, but may be in the future.
Right now it does no harm, but if, in the future, there is some sort of way of finding histogram edges in numpy that uses weights we won't need to make any changes.
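To illustrate the point above, here is a minimal sketch (the sample values are made up, and it assumes an integer `bins` rather than a named estimator) showing that `weights` is accepted by `np.histogram_bin_edges` without changing the edges, while the same weights do change the counts from `np.histogram`:

```python
import numpy as np

# Made-up sample data and weights, for illustration only.
a = np.array([1.0, 2.0, 2.5, 4.0, 9.0])
w = np.array([0.5, 2.0, 1.0, 3.0, 0.1])

# With an integer number of bins the edges depend only on the data range,
# so passing weights leaves them unchanged.
edges_plain = np.histogram_bin_edges(a, bins=4)
edges_weighted = np.histogram_bin_edges(a, bins=4, weights=w)
same_edges = np.array_equal(edges_plain, edges_weighted)

# The weights only matter once the counts are accumulated.
counts, _ = np.histogram(a, bins=edges_plain, weights=w)
```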
OK, if we just mimic np.histogram_bin_edges, then that's totally fine. Maybe add the same sentence, that it might be used in future.
Looks good to me
Thank you @mwcraig!
Looking at the diffs here again, this now looks borderline between a slight API change (with the addition of a new kwarg, as well as a new function exposed to users) and a pure bugfix. If the tests all go well with the backport, I'm on the side of letting it end up in 2.0.10, but let me know if anyone would like to veto it.
This PR fixes #7758 by separating the calculation of the bin edges for a histogram from the calculation of the histogram itself, and by raising an exception if the number of bins is very large. The default value I chose for the maximum number of bins is ridiculously large (1e6), but I was afraid that setting the value too low (matplotlib starts to slow down at around 3000-4000 bins) might start to raise exceptions in code people are already running.
FWIW, I first hit #7758 when I tried to histogram the pixel values of a calibrated image which turned out to have some extreme values. That led to ~5e8 bins, followed by a computer freeze as all the memory was slowly consumed. That same image dropped into this fix generates an error instead.
Edit: Also fixes #8010.
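The approach described above can be sketched roughly as follows. This is not the code from this PR: `bounded_bin_edges`, its `bin_width` parameter, and the error message are hypothetical names chosen for illustration. The idea is to estimate the bin count from the data range before allocating any edges, and to raise if it exceeds `max_bins`.

```python
import numpy as np


def bounded_bin_edges(a, bin_width, max_bins=1_000_000):
    """Hypothetical helper: fixed-width bin edges with an upper limit on bin count."""
    a = np.asarray(a, dtype=float)
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    lo, hi = a.min(), a.max()
    # Estimate the number of bins *before* allocating the edge array,
    # so extreme values produce an error instead of exhausting memory.
    n_bins = max(int(np.ceil((hi - lo) / bin_width)), 1)
    if n_bins > max_bins:
        raise ValueError(
            f"Histogram would need {n_bins} bins, more than max_bins={max_bins}."
        )
    return lo + bin_width * np.arange(n_bins + 1)


# Edge calculation and histogram calculation are separate steps.
data = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
edges = bounded_bin_edges(data, bin_width=1.0)
counts, _ = np.histogram(data, bins=edges)
```

With data like the calibrated image mentioned above, where a few extreme pixel values stretch the range enormously, the same call raises `ValueError` instead of trying to build hundreds of millions of bins.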