User-specified medians and conf. intervals in boxplots #906
I've been thinking of implementing something similar recently, but you've beaten me to it ;-) I notice your spacing is not looking so hot. What editor are you using? Does it put tabs in instead of 4 spaces? Would it be easy for you to correct this before we go much further in the review?
Yikes! Yes. I'll fix that. I cooked up a quick VM at home this weekend. Thanks for looking at the PR. On Tue, May 29, 2012 at 7:33 AM, Phil Elson
OK. This is looking good on my system now. Rebuilt matplotlib from scratch, ran my script, and everything worked as expected.
bsIndex = np.random.random_integers(0,M-1,M)
bsData = data[bsIndex]
estimate[n] = mlab.prctile(bsData, 50)
CI = mlab.prctile(estimate, percentile)
It just occurred to me that we should use numpy.percentile now (assuming it's available in the minimum version of numpy that MPL supports).
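For illustration, the bootstrap in the snippet under review could be written with numpy.percentile roughly as follows. This is a sketch under assumptions, not the code from the PR: the function name `bootstrap_median_ci`, its parameters, and the use of `np.random.default_rng` (which needs a reasonably recent NumPy) are all hypothetical choices made here.

```python
import numpy as np


def bootstrap_median_ci(data, n_boot=1000, percentiles=(2.5, 97.5), seed=None):
    """Bootstrap a confidence interval for the median of 1-D data.

    Hypothetical helper for illustration; not part of matplotlib.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 1 or data.size == 0:
        raise ValueError("data must be a non-empty 1-D array")

    rng = np.random.default_rng(seed)
    m = data.size
    estimates = np.empty(n_boot)
    for n in range(n_boot):
        # Resample with replacement: m indices drawn from 0..m-1.
        idx = rng.integers(0, m, size=m)
        estimates[n] = np.median(data[idx])

    # Percentiles of the bootstrap distribution give the interval bounds.
    lo, hi = np.percentile(estimates, percentiles)
    return lo, hi


if __name__ == "__main__":
    sample = np.random.default_rng(0).normal(size=200)
    print(bootstrap_median_ci(sample, n_boot=500, seed=1))
```

`rng.integers(0, m, size=m)` excludes the upper bound, so it draws from the same index range as the inclusive `random_integers(0, M-1, M)` call in the diff.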
This looks good. Can you add an example and a unit test?
text_transform= mtransforms.blended_transform_factory(ax.transData,
ax.transAxes)
ax.set_xlabel('treatment')
ax.set_ylabel('response')
ax.set_ylim(-0.2, 1.4)
#ax.set_ylim(-0.2, 1.4)
This comment serves no purpose other than to make it harder to read. Would you mind just nuking it?
Would you mind adding a simple unit test? Other than that, this change gets my thumbs up. Great work!
Phil, I will happily address all of these comments shortly. It has been a On Friday, June 15, 2012, Phil Elson wrote:
@phobson: No problems. Have a great break!
I addressed your comments. Sorry for the delay. In summary:
@pelson, has the OP addressed your concerns here?
msg2 = "usermedians' length must be compatible with x"
if usermedians is not None:
if hasattr(usermedians, 'shape'):
assert len(usermedians.shape) == 1, msg1
Don't do asserts for checking user inputs. Raise a ValueError instead.
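A minimal sketch of what that suggestion could look like is below. The helper name `check_boxplot_overrides` and the exact error messages are hypothetical, not the code that was merged. One reason to prefer exceptions for input checks is that `assert` statements are skipped entirely when Python runs with `-O`.

```python
def check_boxplot_overrides(n_boxes, usermedians=None, conf_intervals=None):
    """Validate user-supplied medians and confidence intervals.

    Hypothetical helper for illustration. n_boxes is the number of
    boxes that will be drawn. Entries may be None to mean "compute
    this one as usual".
    """
    if usermedians is not None:
        shape = getattr(usermedians, "shape", None)
        # Arrays must be one-dimensional; plain sequences skip this check.
        if shape is not None and len(shape) != 1:
            raise ValueError("usermedians must be a 1-D sequence")
        if len(usermedians) != n_boxes:
            raise ValueError("usermedians' length must be compatible with x")

    if conf_intervals is not None:
        if len(conf_intervals) != n_boxes:
            raise ValueError("conf_intervals' length must be compatible with x")
        for i, interval in enumerate(conf_intervals):
            # Each interval is either None or a (low, high) pair.
            if interval is not None and len(interval) != 2:
                raise ValueError(
                    "conf_intervals[%d] must have exactly two values" % i
                )


if __name__ == "__main__":
    check_boxplot_overrides(2, [1.0, None], [(0.5, 1.5), None])  # passes
    try:
        check_boxplot_overrides(3, usermedians=[1.0, 2.0])
    except ValueError as err:
        print("rejected:", err)
```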
Neat work. We are close to getting this merged in. One more very important thing to add is a note in doc/users/whats_new.rst. Also, you need to run boilerplate.py in matplotlib's source directory in order to regenerate the pyplot file.
# conf. intervals from user, if available
if conf_intervals is not None and \
conf_intervals[i] is not None:
notch_max = np.max(conf_intervals[i])
This indentation looks a little funny. Have tabs been used here?
I think it looked a little funny because the conditional was wrapped onto a second line. Python parsed it fine, but I've lined everything up to look nicer. You'll see it with my next commit/push.
Yes. I am finding it hard to assert that the logic/statistics are identical (thanks to the improvement of layout). But what I can see looks good. Gets my +1.
@pelson, good to know. Tasks that remain: get pyplot.py regenerated, odd indentation fixed (or explained), and the new entry added to whats_new.rst.
@WeatherGod Sorry to require so much hand-holding here. I ran boilerplate.py like this:
paul@flint ~/sources/matplotlib $ git status
On branch manual-boxplots-2
nothing to commit (working directory clean)
There don't seem to be any calls to sys.argv, so I'm not sure how to get this to behave properly. Any advice would be much appreciated. Thanks.
It may be that your branch is based on a version before the boilerplate script was updated. My advice at this stage would be to rebase your branch (the rebase is needed anyway), and then run the boilerplate.py script. The workflow goes something like:
…e confidence intervals
…g anything and there's no point in adjusting the subplot spacing.
@pelson thanks for the git-fu! Worked like a charm.
@@ -670,6 +670,7 @@ def test_hist_log():
ax.set_xticks([])
ax.set_yticks([])

<<<<<<< HEAD
Looks like you missed something while resolving conflicts. This means you didn't run the tests after rebasing. Please run the tests to make sure everything looks good. This may also impact the resulting test images because changes may have occurred to the rendering algorithms (line snapping, text anti-aliasing, etc.).
@WeatherGod: Just cleared that up. Sorry for the slow response. All tests just passed.
@pelson Thanks for squashing and merging! Really glad I could contribute back to matplotlib! I also really appreciate the feedback and guidance through the whole process from you and everyone else.
First, here's a test script: https://gist.github.com/2814818
I know I've submitted at least one other PR for this, but this time I think I've got it right (or at least much closer).
Basically, the user provides a list of medians and confidence intervals where (if using numpy arrays)
usermedians.shape = (N,)
conf_intervals.shape = (N,2)
and the data to be plotted has: data.shape = (M,N).
All of this allows the user to compute the medians and their confidence intervals outside of the boxplot function using more statistically robust methods of his or her choice.
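As a rough illustration of those shapes, the sketch below computes per-column medians and bootstrap intervals outside of boxplot and then passes them in. The helper names are hypothetical, the bootstrap is only one possible choice of method, and the `usermedians` and `conf_intervals` keywords are assumed to be spelled as proposed in this PR.

```python
import numpy as np


def column_medians_and_cis(data, n_boot=500, seed=None):
    """Return (usermedians, conf_intervals) for data of shape (M, N).

    Hypothetical helper: usermedians has shape (N,), conf_intervals (N, 2).
    """
    data = np.asarray(data, dtype=float)
    m, _ = data.shape
    rng = np.random.default_rng(seed)

    # Resample whole rows with replacement: idx has shape (n_boot, M),
    # so data[idx] has shape (n_boot, M, N).
    idx = rng.integers(0, m, size=(n_boot, m))
    boot_medians = np.median(data[idx], axis=1)  # shape (n_boot, N)

    # Percentiles along the bootstrap axis give shape (2, N); transpose to (N, 2).
    conf_intervals = np.percentile(boot_medians, [2.5, 97.5], axis=0).T
    usermedians = np.median(data, axis=0)
    return usermedians, conf_intervals


def plot_with_overrides(data, usermedians, conf_intervals):
    """Draw notched boxes using the externally computed statistics."""
    # Imported lazily so the statistics above do not require matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.boxplot(data, notch=True,
               usermedians=usermedians, conf_intervals=conf_intervals)
    return fig


if __name__ == "__main__":
    sample = np.random.default_rng(0).normal(size=(50, 3))
    medians, intervals = column_medians_and_cis(sample, seed=1)
    print(medians.shape, intervals.shape)
```

Because whole rows are resampled, the columns share the same bootstrap draws. Resampling each column independently would be an equally reasonable variant.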
I've used a lot of assert statements to verify that the usermedians and conf_intervals inputs are compatible with the data being plotted (x in the axes.boxplot call signature).
Hope this is useful and can be incorporated into the library.
Thanks for all of the hard work
-paul