Better document meaning of notches in boxplots #3631

Closed
rasbt opened this issue Oct 10, 2014 · 12 comments · Fixed by #6703
Labels
Difficulty: Easy https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues Documentation

Comments

@rasbt
Contributor

rasbt commented Oct 10, 2014

Hi,
I found a problem regarding the boxplot function when the notch shape is being used (matplotlib 1.4.0).

I uploaded a self-contained example as an IPython notebook with more details.


@tacaswell tacaswell added this to the v1.4.1 milestone Oct 10, 2014
@tacaswell
Member

cc @phobson

@tacaswell
Member

Digging into the code a bit, it looks like it uses the confidence interval on the median to set the size of the notch, and what is happening is that the interval extends past the 1st or 3rd quartile.

I can see two ways to fix this:

  1. draw the q{1,3} lines at their full widths and make the slopes of the notches shallower to make that work
  2. draw the q{1,3} lines at the width they have where they meet the notch lines

This is a relatively easy code fix (around L3426 in _axes.py), but I think we need a domain expert (as in someone who uses box plots!) to make the call as to which solution to use.
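The notch size described here can be reproduced outside matplotlib. This is a minimal sketch, assuming matplotlib's default (non-bootstrapped) notch, which spans median ± 1.57·IQR/√n; `statistics.quantiles(..., method="inclusive")` stands in for NumPy's linear-interpolation percentiles, and `notch_interval` is a hypothetical helper, not part of matplotlib's API.

```python
import math
import statistics


def notch_interval(data):
    """Return (q1, median, q3, notch_lo, notch_hi) for a 1-D sample.

    Hypothetical helper. Assumes matplotlib's default notch,
    median +/- 1.57 * IQR / sqrt(n); quartiles use linear interpolation
    ("inclusive"), which matches NumPy's default percentile method.
    """
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(data))
    return q1, med, q3, med - half_width, med + half_width


q1, med, q3, lo, hi = notch_interval([1, 2, 3, 4, 5, 6, 7])
print(f"box: [{q1}, {q3}]  notch: [{lo:.3f}, {hi:.3f}]")
# With only 7 points the notch (about 2.220 to 5.780) runs past both hinges (2.5 and 5.5).
```

Under this formula, whether the notch extends past a hinge depends only on n and on where the median sits inside the box, not on any drawing detail.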

@tacaswell tacaswell added the Release critical label Oct 10, 2014
@phobson
Member

phobson commented Oct 10, 2014

If that's where the confidence limits are, then there's nothing wrong. Mathematically it's perfectly plausible that the lower CL on the median is less than the first quartile. My boxplots appear like this all the time and seeing the notch inversion is a very useful visual key.
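A quick calculation shows how easily this happens. Assuming matplotlib's default notch half-width h = 1.57·IQR/√n and a box whose median M sits exactly midway between the quartiles, the notch reaches past a hinge whenever:

```latex
\begin{aligned}
h &= \frac{1.57\,\mathrm{IQR}}{\sqrt{n}}, \qquad \mathrm{IQR} = Q_3 - Q_1, \qquad M - Q_1 = Q_3 - M = \tfrac{1}{2}\,\mathrm{IQR} \\
h > \tfrac{1}{2}\,\mathrm{IQR} &\iff \sqrt{n} < 2 \times 1.57 = 3.14 \iff n < 3.14^2 \approx 9.86
\end{aligned}
```

So, under this assumption, a perfectly symmetric box drawn from fewer than 10 points always has notches that fold back past the hinges, and a skewed box can do so at larger n because one hinge sits closer to the median.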

@tacaswell
Member

Fair enough. I had not considered that this was the correct behaviour...

@phobson
Member

phobson commented Oct 10, 2014

Just for thoroughness, I'm going to jump into -- ::deep breath:: -- R and confirm

@WeatherGod
Member

Whatever decision is made, it might make sense to add an image test codifying the behavior. It certainly looked "wrong" to me at first. Perhaps documentation and examples would be prudent?


@phobson
Member

phobson commented Oct 10, 2014

Yup, same behavior with a different dataset in R

[image: r_bxp, notched boxplot drawn by R]

@phobson
Member

phobson commented Oct 10, 2014

Worth noting that R raises a warning:

Warning message:
In bxp(list(stats = c(0.67, 0.70, 0.76, 0.78,  :
  some notches went outside hinges ('box'): maybe set notch=FALSE

But I think that's a poor message as it essentially tells the user that they should hide the fact that the values of the medians are uncertain.

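@rasbt, you can use:

import matplotlib.pyplot as plt
bplot = plt.boxplot(data, notch=True, bootstrap=20000)  # bootstrap the medians' CIs with 20000 resamples

to bootstrap the confidence intervals on the medians, which may tighten them up.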
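As a rough sketch of what the `bootstrap=` option computes (a percentile bootstrap of the median: resample the data with replacement, take each resample's median, and report the 2.5th and 97.5th percentiles of those medians), here is a stdlib-only version. `bootstrap_median_ci` and its `seed` argument are hypothetical names for illustration, not matplotlib API.

```python
import random
import statistics


def bootstrap_median_ci(data, n_boot=20000, seed=0):
    """Percentile-bootstrap 95% CI for the median (hypothetical helper)."""
    rng = random.Random(seed)
    k = len(data)
    # Median of each resample drawn with replacement from the data.
    medians = [statistics.median(rng.choices(data, k=k)) for _ in range(n_boot)]
    # 39 cut points at 2.5 %, 5 %, ..., 97.5 %; keep the outermost pair.
    cuts = statistics.quantiles(medians, n=40, method="inclusive")
    return cuts[0], cuts[-1]


sample = [1, 2, 3, 4, 5, 6, 7]
lo, hi = bootstrap_median_ci(sample)
print(f"bootstrapped 95% CI for the median: [{lo}, {hi}]")
```

Because the bootstrapped interval only contains values the resampled medians actually take, it never extends beyond the range of the data; whether it ends up narrower than the 1.57·IQR/√n notch depends on the sample.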

@tacaswell tacaswell added status: won't or can't fix and removed Release critical labels Oct 10, 2014
@tacaswell tacaswell modified the milestones: unassigned, v1.4.1 Oct 10, 2014
@tacaswell tacaswell modified the milestones: v1.4.x, unassigned Oct 10, 2014
@tacaswell
Member

I am convinced that this is correct behaviour and just needs some documentation and a test to make sure that confused people like me don't try to 'fix' it in the future.

@rasbt
Contributor Author

rasbt commented Oct 11, 2014

Sorry for the confusion, you are absolutely right. This was actually a very stupid question in retrospect, and this "flipping" behavior is quite useful information, as @phobson pointed out :). Maybe it would be worthwhile to add this to the documentation, though. Or maybe add a warning that the median is very uncertain in cases where the upper CI > quartile 3 and the lower CI < quartile 1.
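The warning idea can be sketched without touching matplotlib internals. The helper is hypothetical; it assumes per-box statistics in the dict shape that `matplotlib.cbook.boxplot_stats` returns (keys `q1`, `q3`, `cilo`, `cihi`). By default it flags a box when either end of the notch leaves the box, which is closer to the wording of R's "some notches went outside hinges" message; `require_both=True` narrows it to the case described above, where both ends leave the box.

```python
import warnings


def warn_if_notches_outside_hinges(stats, require_both=False):
    """Warn when a notch extends past its box (hypothetical helper).

    ``stats`` is a list of dicts with keys 'q1', 'q3', 'cilo', 'cihi'.
    Returns the indices of the boxes that triggered the warning.
    """
    flagged = []
    for i, s in enumerate(stats):
        below = s["cilo"] < s["q1"]  # lower CI limit under the first quartile
        above = s["cihi"] > s["q3"]  # upper CI limit over the third quartile
        hit = (below and above) if require_both else (below or above)
        if hit:
            flagged.append(i)
    if flagged:
        warnings.warn(
            f"notch (median CI) extends past the box for boxes {flagged}; "
            "the median is poorly constrained",
            stacklevel=2,
        )
    return flagged
```

Returning the flagged indices as well as warning keeps the check usable in a test that codifies the behaviour.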

@tacaswell tacaswell modified the milestones: v1.4.x, v1.4.2 Oct 17, 2014
@tacaswell tacaswell modified the milestones: v1.4.x, v1.4.3 Jan 22, 2015
@tacaswell tacaswell modified the milestones: 1.5.0, v1.4.x Feb 7, 2015
@tacaswell tacaswell changed the title Boxplots are drawn incorrectly when using the notch shape Better document meaning of notches in boxplots Feb 7, 2015
@tacaswell tacaswell modified the milestones: next point release, proposed next point release Jul 17, 2015
@tacaswell tacaswell modified the milestones: proposed next point release, next point release Jul 17, 2015
@story645 story645 added the Difficulty: Easy https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues label Jun 1, 2016
@lofidevops

I think the "won't fix" label should be removed now that this is a documentation ticket. (Thanks for the thorough investigation here and at https://stackoverflow.com/questions/26291082, by the way.)

@WeatherGod
Member

Good point. Tag removed.

@QuLogic QuLogic modified the milestones: 2.0 (style change major release), 2.1 (next point release) Jul 12, 2016
7 participants