
Should default hist() bins be changed in 2.0? #4487

Closed
jakevdp opened this issue Jun 1, 2015 · 9 comments

@jakevdp
Contributor

jakevdp commented Jun 1, 2015

I would advocate changing the default histogram bins to a data-adaptive choice in 2.0. For an example of how this might look, see info on the recently-merged astropy hist() function: http://astropy.readthedocs.org/en/latest/visualization/histogram.html#normal-reference-rules

Using the Freedman-Diaconis rule as the default would be a useful change, IMO (though it would require some modification in the case of weighted samples).
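For reference, the Freedman-Diaconis rule picks a bin width of 2 · IQR · n^(-1/3) and splits the data range into equal bins of that width. A minimal sketch for unweighted data only (the weighted case is the open question above); `fd_num_bins` is a hypothetical helper name, and `np.histogram_bin_edges` assumes NumPy >= 1.15:

```python
import numpy as np


def fd_num_bins(x):
    """Hypothetical helper: number of Freedman-Diaconis bins for 1-D data.

    Bin width h = 2 * IQR(x) * n**(-1/3); the data range is then split
    into ceil((max - min) / h) equal-width bins. Unweighted data only.
    """
    x = np.asarray(x, dtype=float)
    # Interquartile range: 75th minus 25th percentile (linear interpolation).
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    width = 2.0 * iqr * x.size ** (-1.0 / 3.0)
    if width == 0:
        # Degenerate data (IQR of zero): fall back to a single bin.
        return 1
    return int(np.ceil((x.max() - x.min()) / width))


rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(fd_num_bins(x))
# NumPy's built-in 'fd' estimator should agree for unweighted data.
print(len(np.histogram_bin_edges(x, bins="fd")) - 1)
```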

@tacaswell
Member

Probably not for 2.0; I am pretty adamant about keeping it to color/style changes only.

This came up recently (#4316) and it got pushed to the numpy mailing list (but I don't know what happened to it there).

hist is a giant function that already does too many things. I think the future of mpl hist should be to act as a dispatch mechanism to the N ways of drawing pre-binned histogram data, and to get out of the computation business altogether.

If we really want to maintain the adaptive binning code in mpl, we should add an auto_hist function rather than break the API.
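A hedged sketch of what such a split could look like: NumPy does the binning, matplotlib only draws the pre-binned result. `auto_hist` here is a hypothetical function (no such function exists in matplotlib), and `Axes.stairs` assumes matplotlib >= 3.4, which postdates this discussion:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def auto_hist(ax, x, bins="auto", **stairs_kwargs):
    """Hypothetical helper: bin with NumPy, then only *draw* in matplotlib.

    Binning is delegated entirely to np.histogram (data-adaptive 'auto'
    rule by default); matplotlib just renders the pre-binned counts.
    """
    counts, edges = np.histogram(x, bins=bins)
    artist = ax.stairs(counts, edges, **stairs_kwargs)
    return counts, edges, artist


rng = np.random.default_rng(1)
fig, ax = plt.subplots()
counts, edges, artist = auto_hist(ax, rng.normal(size=500), fill=True)
```

Keeping the binning in a separate function leaves the existing `hist` signature untouched, which is the point of adding a new function rather than breaking API.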

#4433 makes one common use case (filled steps) a lot easier to do with pre-binned data.
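As a separate, hedged illustration (not the change from #4433, whose details are not shown here), the existing `ax.hist` can already draw a filled step outline from pre-binned counts by passing one sample per bin, weighted by that bin's count:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Pre-binned data: counts and edges computed elsewhere (here with NumPy).
rng = np.random.default_rng(2)
counts, edges = np.histogram(rng.exponential(size=300), bins=20)

fig, ax = plt.subplots()
# Each left edge falls in its own bin, so re-binning those points with
# weights=counts onto the same edges reproduces `counts` exactly.
n, bins, patches = ax.hist(edges[:-1], bins=edges, weights=counts,
                           histtype="stepfilled")
```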

I also have some thoughts that I need to turn into a PR about providing a decorator in pyplot that lets users more-or-less register pyplot functions, which would make writing a small third-party auto_hist library much easier.
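One possible shape for such a decorator, as a purely hypothetical sketch (`pyplot_wrapper` is not a matplotlib API): it turns a helper written against an explicit `Axes` into a pyplot-style function that draws on the current Axes.

```python
import functools

import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt


def pyplot_wrapper(func):
    """Hypothetical decorator: expose ``func(ax, *args, **kwargs)`` as a
    pyplot-style function that defaults to the current Axes."""
    @functools.wraps(func)
    def wrapper(*args, ax=None, **kwargs):
        # Fall back to pyplot's implicit "current Axes" state.
        if ax is None:
            ax = plt.gca()
        return func(ax, *args, **kwargs)
    return wrapper


@pyplot_wrapper
def auto_hist(ax, x, **kwargs):
    # Third-party helper written against the Axes API only.
    return ax.hist(x, bins="auto", **kwargs)


n, bins, patches = auto_hist([1, 2, 2, 3, 3, 3])
```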

@tacaswell tacaswell added this to the next major release milestone Jun 1, 2015
@jakevdp
Contributor Author

jakevdp commented Jun 1, 2015

Fair enough – thanks!

@tacaswell tacaswell modified the milestones: next major release, Color overhaul Jul 11, 2015
@Tillsten
Contributor

@tacaswell, did you change your opinion? I would love to see this go into 2.0. Pushing it to the next major release afterwards would probably mean waiting 2-3 years for matplotlib.

@tacaswell
Member

Support for automatic bin width selection is on numpy master and will be in numpy 1.11.

If the user has a new enough version of numpy installed, ax.hist(..., bins='auto') will work.
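A minimal usage sketch, assuming NumPy >= 1.15 (for `np.histogram_bin_edges`; the `'auto'` estimator itself needs 1.11): string values of `bins` are handed through to NumPy, so the plotted edges should match what NumPy computes on its own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=2000)

fig, ax = plt.subplots()
# bins is passed through to NumPy, so its estimator names
# ('auto', 'fd', 'sturges', ...) are accepted here.
n, edges, patches = ax.hist(x, bins="auto")

# The same edges can be obtained without plotting.
expected = np.histogram_bin_edges(x, bins="auto")
```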

I am still skeptical of any 'automatic' data analysis (but if @jakevdp trusts it, I should probably stop worrying); however, I would not want to bump the mpl default until our minimum numpy version is > 1.11.

I think adding an rcparam which sets the default values of bins (and hence, can be controlled via the styles module) is a decent compromise and can go in for 2.0.
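In current matplotlib this setting is the `hist.bins` rcParam (default `10`). A short sketch of switching it to `'auto'` for a block of code; a style sheet or matplotlibrc could carry the same setting as a `hist.bins : auto` line:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1000)

# 'hist.bins' is consulted only when hist() is called without bins.
with plt.rc_context({"hist.bins": "auto"}):
    fig, ax = plt.subplots()
    n, edges, _ = ax.hist(x)  # no bins argument: the rcParam applies
```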

@nayyarv

nayyarv commented Sep 9, 2015

@jakevdp I'm trying to add support for weighted data for the binwidth selection procedures in numpy (which was overlooked in numpy/numpy#6029). See numpy/numpy#6288 for details.

We're currently unsure how to deal with the various possible interpretations of the weights, i.e. count-like (n=weights.sum()), probability-like (n=a.size), or neither (n=a.size), and were wondering if you had any thoughts or suggestions that might help us either guess automatically or get explicit instructions from the user.
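To make the distinction concrete, here is a hypothetical sketch (not the code from numpy/numpy#6288) of a weighted Freedman-Diaconis width where the caller states how the weights should be read. NumPy's string bin estimators do not accept `weights`, so both helpers below are illustrative names, and the inverted-CDF weighted quantile is just one of several reasonable choices:

```python
import numpy as np


def weighted_quantile(a, weights, q):
    """Inverted-CDF quantile of `a` where each value carries a weight."""
    order = np.argsort(a)
    a_sorted = np.asarray(a, dtype=float)[order]
    w_sorted = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w_sorted) / w_sorted.sum()
    # First sorted value whose cumulative weight reaches q.
    idx = np.searchsorted(cdf, q, side="left")
    return a_sorted[min(idx, a_sorted.size - 1)]


def weighted_fd_width(a, weights, kind="count"):
    """Freedman-Diaconis width with weights; `kind` picks the effective n.

    kind="count":       weights are counts, so n = weights.sum()
    any other value:    probability-like or neither, so n = a.size
    """
    a = np.asarray(a, dtype=float)
    weights = np.asarray(weights, dtype=float)
    iqr = (weighted_quantile(a, weights, 0.75)
           - weighted_quantile(a, weights, 0.25))
    n = weights.sum() if kind == "count" else a.size
    return 2.0 * iqr * n ** (-1.0 / 3.0)
```

The IQR is the same under either reading; only the effective sample size, and hence how fast the width shrinks, depends on `kind`.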

@tacaswell tacaswell modified the milestones: Color overhaul, next major release (2.0) Oct 26, 2015
@mdboom mdboom self-assigned this Nov 17, 2015
@mdboom mdboom closed this as completed in f1b89b2 Nov 18, 2015
tacaswell added a commit that referenced this issue Nov 18, 2015
Fix #4487: Take hist bins from rcParam
@tacaswell
Member

This was addressed by adding an rcparam.

@amueller
Contributor

amueller commented Feb 3, 2020

Is the numpy dependency high enough to change the default yet? I don't think an rcparam addresses this. We can and should try to educate users, but people will still call hist without arguments. I had students not do this even after I told them for over an hour that they shouldn't use the defaults and should use auto here. Given that sample, I'm pretty sure way too many people use the default unknowingly.

In other words: pretty please change the default for the next major version if possible.

@jklymak
Member

jklymak commented Feb 3, 2020

@amueller can you open a new issue with more details of what you are proposing? As much as possible, I think we should just pass arguments through to np.histogram.
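As a quick check of how much pass-through already happens: for a single dataset, the binning arguments given to `ax.hist` should yield the same counts and edges as calling `np.histogram` directly. A sketch, assuming NumPy's `'fd'` estimator and an explicit `range`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=800)

fig, ax = plt.subplots()
# Binning arguments given to Axes.hist ...
n, edges, _ = ax.hist(x, bins="fd", range=(-3, 3))
# ... match calling NumPy directly with the same arguments.
np_counts, np_edges = np.histogram(x, bins="fd", range=(-3, 3))
```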

@tacaswell
Member

tacaswell commented Feb 3, 2020

Yes, we are at numpy 1.15 on master, so yes. 👍 on a new issue for this. We currently don't have a target date for 4.0 (which is what we are roughly calling the major re-think), but there may be enough breaking but incremental changes that people have asked for to make a "small" major release worthwhile.

[edit to make run-on sentence a bit less run-on]


7 participants