Table.hist(bin_column=...) generates wrong x-axis label #337
Comments
@davidwagner I do agree about this. I can put up a change quite quickly (I've figured it out, I think) - but do we need the other instructors to agree to the proposed changes? |
I don't think we need to wait for the other instructors. This shouldn't change anything for existing usage (without bin_column), and for the rare cases with bin_column, it seems clear enough that the current behavior is broken. |
On second thought, I'm not so sure this is the wrong behavior. I think the original use case for bin_column was something like a 2-column table where one column is the bins and the other column is the count of a unit as related to the first column. For example, a table like this:
So in this case, a call to Table#hist (ex. |
However, I'm getting curious about one thing: If the bin_column parameter's column should be showing the lower end of the bins, in the example you've shown, why isn't the 3 bin extending all the way to 9? Are bins of default size 1 the only thing that's supported with this parameter? |
Oh. You've got a good point -- now that I look at it again, I think I agree with you. I ran across this when building the textbook; the example is taken from Chapter 14.1 of the textbook (it's in a hidden cell in the notebook). However, now that I read your comments I suspect this might be a misuse of `bin_column`. I found only two other places where we use `bin_column` in the course materials: Chapter 7.2 of the textbook, and Lab 5. Based on a superficial glance, both seem to use it in a way consistent with what you describe. So I'm now thinking the correct fix might be to preserve the existing behavior, and change the way we draw the chart in Chapter 14.1 of the textbook. Does this sound right to you? Perhaps @papajohn will have an opinion. |
Yeah, Chapter 7.2 was the reason `bin_column` was added, and the behavior
is as intended.
…On Thu, Jun 20, 2019 at 7:30 PM davidwagner ***@***.***> wrote:
Oh. You've got a good point -- now that I look at it again, I think I
agree with you.
I ran across this when building the textbook; the example is taken from Chapter
14.1 of the textbook
<https://www.inferentialthinking.com/chapters/14/1/Properties_of_the_Mean.html>
(it's in a hidden cell in the notebook
<https://github.com/data-8/textbook/blob/7a590fa192fbe8bfc1678661a016cd18aba6bab8/content/chapters/14/1/Properties_of_the_Mean.ipynb>).
However now that I read your comments I suspect this might be a misuse of
bin_column.
I found only two other places where we use bin_column in the course
materials:
-
Chapter 7.2 of the textbook
<https://www.inferentialthinking.com/chapters/07/2/Visualizing_Numerical_Distributions.html>
-
Lab 5
<http://datahub.berkeley.edu/hub/user-redirect/git-sync?repo=https://github.com/data-8/materials-sp19&subPath=materials/sp19/lab/lab05/lab05.ipynb>
Based on a superficial glance, both seem to be used in a way consistent to
how you mention.
So I'm now thinking the correct fix might be to preserve the existing
behavior, and change the way we draw the chart in Chapter 14.1 of the
textbook. Does this sound right to you?
Perhaps @papajohn <https://github.com/papajohn> will have an opinion.
Sounds good. I'll close this issue and perhaps we can open an issue to rewrite Chapter 14.1? |
If you pass the `bin_column` parameter to `Table.hist()`, the x-axis is labelled inappropriately. Consider, e.g.:

This produces the following histogram:

Notice how the x-axis got a label based on `Proportion`. Instead, the x-axis should be labelled with `Value`.

The bug is in `prepare_hist_with_bin_column()` in `Table.hist()`. In particular, it generates the x-axis label with the code `w.rstrip(' count')`, where `w` is the label on the column of counts (e.g., `Proportion` in this case). It seems that the code is expecting/assuming that the label will have a particular format (e.g., that it would be `Value count` in this case). Instead, the x-axis label should be generated from the column label you passed to `bin_column`. Thus, it looks like `w.rstrip(' count')` should be changed to `bin_column`.

@papajohn, @a-adhikari, do you agree?
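
For reference, here is a minimal sketch of what `rstrip(' count')` does to the two labels mentioned above. Python's `str.rstrip` treats its argument as a set of characters to strip, not as a suffix, so a label that does not end in ` count` can still lose trailing letters. The helper `strip_count_suffix` is hypothetical and only illustrates a suffix-aware alternative; it is not part of the `datascience` library, and it is not the change proposed above (which is to use `bin_column` directly).

```python
def strip_count_suffix(label):
    """Hypothetical helper: drop a trailing ' count' only if it is present."""
    suffix = ' count'
    if label.endswith(suffix):
        return label[:-len(suffix)]
    return label


# str.rstrip removes any trailing run of the characters ' ', c, o, u, n, t.
as_expected = 'Value count'.rstrip(' count')   # label in the assumed format
surprising = 'Proportion'.rstrip(' count')     # label from the example above

print(as_expected)   # Value
print(surprising)    # Proporti

# The suffix-aware version leaves an unrelated label untouched.
print(strip_count_suffix('Value count'))   # Value
print(strip_count_suffix('Proportion'))    # Proportion
```

This only explains how a label derived from `Proportion` can appear on the x-axis; whether the label should come from the counts column or from `bin_column` is the design question raised in this issue.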