Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table.hist(bin_column=...) generates wrong x-axis label #337

Closed
davidwagner opened this issue Oct 4, 2018 · 7 comments
Closed

Table.hist(bin_column=...) generates wrong x-axis label #337

davidwagner opened this issue Oct 4, 2018 · 7 comments
Labels

Comments

@davidwagner
Copy link
Member

If you pass the bin_column parameter to Table.hist(), the x-axis is labelled inappropriately. Consider, e.g.,:

t = Table().with_columns(
    'Value', make_array(2, 3, 9),
    'Proportion', make_array(0.25, 0.5, 0.25)
)

t.hist(bin_column='Value', bins=np.arange(1.5, 9.6, 1))

This produces the following histogram:

Notice how the x-axis got a label based on Proportion. Instead, the x-axis should be labelled with Value.

The bug is in prepare_hist_with_bin_column() in Table.hist(). In particular, it generates the x-axis label with the code w.rstrip(' count') where here w is the label on the column of counts (e.g., Proportion in this case). It seems that the code is expecting/assuming that the label will have a particular format (e.g., that it would be Value count in this case). Instead, the x-axis label should be generated from the column label you passed to bin_column. Thus, it looks like w.rstrip(' count') should be changed to bin_column.

@papajohn, @a-adhikari, do you agree?

@adnanhemani
Copy link
Member

@davidwagner I do agree about this. I can put up a change quite quickly (I've figured it out, I think) - but do we need the other instructors to agree to the proposed changes?

@davidwagner
Copy link
Member Author

I don't think we need to wait for the other instructors. This shouldn't change anything for existing usage (without bin_column), and for the rare cases with bin_column, it seems clear enough that the current behavior is broken.

@adnanhemani
Copy link
Member

On second thought, I'm not so sure this is the wrong behavior. I think the original use case for bin_column was for something like a 2-column table where one column is the bins and the other column being the count of a unit as related to the first column. For example, a table like this:

t = Table().with_columns("bins", np.arange(4), "cars_seen", [6, 4, 5, 9])

So in this case, a call to Table#hist (ex. t.hist(bin_column="bins")) should have the label "cars_seen" on the x-axis and bins drawn at 0, 1, 2, 3. In the example you've shown above, the "cars_seen" column from the example matches to the "Proportion" column and the "bins" column relates to the "Value" column. If I'm correct, then from this example you've identified a use-case that isn't what we expect from the library but a use-case that (I'd think) is something that the library doesn't support (and may not be able to support due to the expense on the simplicity of the library that supporting a feature like this may occur).

@adnanhemani
Copy link
Member

However, I'm getting curious about one thing: If the bin_column parameter's column should be showing the lower end of the bins, in the example you've shown, why isn't the 3 bin extending all the way to 9? Are bins of default size 1 the only thing that's supported with this parameter?

@davidwagner
Copy link
Member Author

Oh. You've got a good point -- now that I look at it again, I think I agree with you.

I ran across this when building the textbook; the example is taken from Chapter 14.1 of the textbook (it's in a hidden cell in the notebook). However now that I read your comments I suspect this might be a misuse of bin_column.

I found only two other places where we use bin_column in the course materials:

Based on a superficial glance, both seem to be used in a way consistent to how you mention.

So I'm now thinking the correct fix might be to preserve the existing behavior, and change the way we draw the chart in Chapter 14.1 of the textbook. Does this sound right to you?

Perhaps @papajohn will have an opinion.

@papajohn
Copy link
Contributor

papajohn commented Jun 23, 2019 via email

@adnanhemani
Copy link
Member

Sounds good. I'll close this issue and perhaps we can open an issue to rewrite Chapter 14.1?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants