Boxplot with zero IQR sets whiskers to max and min and leaves no outliers #5331
Comments
Do you have a reference that this is not the correct behavior?
For reference here's a link to the section of
Looks like the (my) decision to do so was fairly intentional
I'm looking through all my references (including R docs) and I can't find any guidance on this specific scenario. At the time, I can only imagine myself thinking that:
I stand by the first thought. But now I'm on the fence about the second. The omission of guidance that you should do something different in an edge case like this makes me think that we shouldn't. However, I do think that a horizontal line with whiskers looks better than a horizontal line with dots floating around. But matters of the heart matter little in matters such as these.
Since boxplots have been tweaked and modified many times over the years I won't say your decision is wrong, but it's definitely not one I've run into before. Looking at this paper by Cox (http://www.stata-journal.com/sjpdf.html?articlenum=gr0039), the useful text is: ... Lines, often called whiskers, are drawn to span all data points within 1.5 IQR of

He notes the zero-length whisker possibility but doesn't worry about it. My assumption from this, other general reading, and working with R over the years is that the expected behaviour for a box-whisker plot is that the whiskers are drawn at 1.5 IQR. As such, the choice of whiskers at 1.5 IQR unless the IQR is zero, in which case they go to the 0th and 100th percentiles, throws me off a bit. I feel that it's a bit misleading, especially in a plot with many other features that have a nonzero IQR.

That being said, there is no canonical box-whisker plot. I've seen a number of alternate variations in the literature, such as drawing whiskers at 2% and 98%. But I feel that the interpretation which is truest to the original description by Tukey (who himself played with variations) would be to have zero-length whiskers, with any remaining values that differ from the median drawn as points. In the end it's definitely your decision. I'd at the very least recommend documenting this behaviour very explicitly in the boxplot documentation and perhaps even making it an option.
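To make the Tukey-style reading above concrete, here is a minimal pure-Python sketch of the 1.5 IQR rule. It is not matplotlib's implementation: the helper names `percentile` and `tukey_whiskers` are made up for illustration, the quartiles use simple linear interpolation (other quartile definitions exist), and the input is assumed to be non-empty.

```python
def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in [0, 100]) of a pre-sorted list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def tukey_whiskers(data, whis=1.5):
    """Hypothetical helper: whisker ends and outliers under the 1.5 IQR rule.

    No special case is made for IQR == 0, so the whiskers collapse onto
    the box and every other value is reported as an outlier.
    """
    vals = sorted(data)
    q1, q3 = percentile(vals, 25), percentile(vals, 75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in vals if lo_fence <= v <= hi_fence]
    fliers = [v for v in vals if v < lo_fence or v > hi_fence]
    return {"q1": q1, "q3": q3, "iqr": iqr,
            "whislo": inside[0], "whishi": inside[-1], "fliers": fliers}


# Seven identical values plus two stragglers: the IQR is zero.
stats = tukey_whiskers([5, 5, 5, 5, 5, 5, 5, 1, 9])
print(stats["whislo"], stats["whishi"], stats["fliers"])
```

With this data both quartiles equal 5, so both whiskers sit at 5 and the values 1 and 9 come out as outlier points, which is the behaviour described above for R.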
@jc-healy I agree with this. Thanks very much for the well-considered input. It's very valuable. See the PR references above.
Just to bump the conversation, as I encountered this issue today and am curious to know what the plans are for the next updates. When plotting multiple boxplots on the same plot, it can happen that one of the datasets has an IQR of 0, and I think it is very confusing to have 90% of the boxplots using 1.5 IQR whiskers and 10% of them using the "min/max" whiskers. Adding the possibility to choose what should happen to the whiskers when IQR=0 would add a bit more flexibility to the function and solve the issue. Thanks
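As a rough illustration of the kind of switch being asked for, the sketch below exposes the zero-IQR behaviour as an argument. The function `whisker_ends` and its `on_zero_iqr` parameter are hypothetical names invented for this example, not an existing or proposed matplotlib API; quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`, and at least two data points are assumed.

```python
from statistics import quantiles


def whisker_ends(data, whis=1.5, on_zero_iqr="fence"):
    """Return (whislo, whishi) for one dataset.

    on_zero_iqr is a hypothetical option:
      "fence"  - always apply the whis * IQR rule, even when IQR == 0
      "minmax" - extend the whiskers to the data extremes when IQR == 0
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0 and on_zero_iqr == "minmax":
        return min(data), max(data)
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return min(inside), max(inside)


groups = {
    "spread": [1, 2, 3, 4, 5],
    "zero_iqr": [1, 5, 5, 5, 5, 5, 5, 5, 9],
}
for name, values in groups.items():
    print(name,
          whisker_ends(values, on_zero_iqr="fence"),
          whisker_ends(values, on_zero_iqr="minmax"))
```

Only the zero-IQR group changes between the two settings, so a caller plotting many groups could pick one rule and have it apply consistently.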
Fixed in #5343
I believe the behaviour in matplotlib is deliberate (given lines 2018-2019 of cbook). However I don't believe this is the expected behaviour for a boxplot (for example R leaves the whiskers at the median and draws outlier points). I was hoping for clarification or a reference on the choice taken in matplotlib.