[Feature Suggestion] Add histogram plot of residual errors to ResidualsPlot #264

Closed
ianozsvald opened this Issue Jun 21, 2017 · 12 comments

Contributor

ianozsvald commented Jun 21, 2017

The current ResidualsPlot shows training and testing residuals as a scatter plot, so by eye we can get an idea of whether more errors lie above or below the zero line. Adding a histogram of the testing errors might make it clearer whether the errors follow a Normal distribution.

In the following example I have some large positive and negative errors. From the histogram it looks as though I have a negatively skewed distribution, which might tell me something about my training examples:
image

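# imports needed by the snippet; clf, X_train, y_train, X_test and y_test are defined elsewhere
import matplotlib.pyplot as plt
import pandas as pd
from yellowbrick.regressor import ResidualsPlot

fig, ax = plt.subplots(figsize=(8, 6))
# clone_estimator is a helper not shown in this issue; presumably it returns an unfitted copy of clf
model = ResidualsPlot(clone_estimator(clf), ax=ax)
model.fit(X_train, y_train)
model.score(X_test, y_test)

# add histogram of residual errors as an inset axes (figure coordinates)
left, bottom, width, height = [0.65, 0.17, 0.2, 0.2]
ax2 = fig.add_axes([left, bottom, width, height])

# residuals on the test set: predicted minus actual
testing_residuals = pd.Series(model.predict(X_test) - y_test)
testing_residuals.plot(kind="hist", bins=50, title="Residuals on Predicted", ax=ax2)
ax2.vlines(0, ymin=0, ymax=ax2.get_ylim()[1])  # add x == 0 line

model.poof()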

It isn't obvious where the best location for the histogram would be. Annoyingly, I also couldn't get an alpha value to apply to ax2 (I'd hoped to make it semi-transparent so that location was less of an issue).
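On the transparency point, one approach that may work is to set the alpha on the inset axes' background patch and on the histogram bars, rather than on the axes object itself. The following is only a sketch: it uses synthetic data in place of the model's predictions and residuals, and the variable names and values are illustrative rather than taken from the original notebook.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
predicted = rng.uniform(0, 10, size=200)  # synthetic stand-in for model predictions
residuals = rng.normal(0, 1, size=200)    # synthetic stand-in for test residuals

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(predicted, residuals, s=10)
ax.axhline(0, color="black", linewidth=1)

# inset axes in figure coordinates: [left, bottom, width, height]
ax2 = fig.add_axes([0.65, 0.17, 0.2, 0.2])
ax2.patch.set_alpha(0.5)                 # translucent background for the inset
ax2.hist(residuals, bins=50, alpha=0.6)  # translucent bars
ax2.axvline(0, color="black", linewidth=1)
ax2.set_title("Residuals")

# render to an in-memory PNG; pass a filename instead to write to disk
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

With the background patch made translucent, the scatter points underneath the inset remain visible, so the exact placement matters less.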

Contributor

ndanielsen commented Jun 21, 2017

@ianozsvald thanks for this great feature enhancement suggestion. You're welcome to make a pull request with this enhancement =)

Contributor

ianozsvald commented Jun 21, 2017

Sorry, I'm only going as far as sharing some proof-of-concept code. I'm still recovering from running PyDataLondon 2017 and I'm not up to extending any libraries at the moment! Maybe someone will find time to take this a little further. Cheers!

Contributor

ndanielsen commented Jun 21, 2017

@ianozsvald thank you so much for your feature suggestions and the time that you've spent writing up the detailed issues with examples. It is really fantastic and helpful. I'm sure that someone will move forward with these.

Contributor

ianozsvald commented Jun 21, 2017

Well, thank you all too for putting this library together. I've got a bunch of my own hacky viz tools, but you've built something far more useful here. @rebeccabilbro's talk for us at the conference (and the book signing she joined me for) was ace :-)

Collaborator

rebeccabilbro commented Jun 21, 2017

Hey there @ianozsvald - thanks so much for all your work on PyData London - what a terrific conference! And thanks for checking out Yellowbrick -- I love this idea of plotting error distributions to make it easier to look for things like skew and heavy tails. As for how to best cope with subaxis locations, we might take a look at GridSpec, similar to what @pdamodaran did with JointPlot.

Contributor

pdamodaran commented Jun 22, 2017

I would be up for making the enhancement to the ResidualsPlot. Do you guys think the histogram should go on the top of the main plot or the bottom?

@bbengfort

This comment has been minimized.

Show comment
Hide comment
@bbengfort

bbengfort Jun 22, 2017

Member

@pdamodaran @ianozsvald my instinct actually says to put it on the right side, oriented vertically so that the histogram shares an access with the residuals (the y axis of the plot). In this orientation I think it might be easier to directly compare and would probably also have the effect of balancing the axes so that zero is in the middle, RE: the axes issues we had in #263 -- what do you guys think?

Member

bbengfort commented Jun 22, 2017

@pdamodaran @ianozsvald my instinct actually says to put it on the right side, oriented vertically so that the histogram shares an axis with the residuals (the y axis of the plot). In this orientation I think it might be easier to compare directly, and it would probably also have the effect of balancing the axes so that zero is in the middle, RE: the axes issues we had in #263 -- what do you guys think?
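To make the proposed layout concrete, here is a rough sketch of a right-hand histogram that shares its y axis with the residuals scatter, laid out with matplotlib's GridSpec as suggested earlier in the thread. It is not Yellowbrick's implementation; the data is synthetic and all names (`ax_scatter`, `ax_hist`, the width ratios) are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec

rng = np.random.default_rng(1)
predicted = rng.uniform(0, 10, size=300)  # synthetic stand-in for predictions
residuals = rng.normal(0, 1, size=300)    # synthetic stand-in for residuals

fig = plt.figure(figsize=(9, 6))
# one row, two columns: wide scatter on the left, narrow histogram on the right
gs = GridSpec(1, 2, width_ratios=[4, 1], wspace=0.05)
ax_scatter = fig.add_subplot(gs[0, 0])
ax_hist = fig.add_subplot(gs[0, 1], sharey=ax_scatter)

ax_scatter.scatter(predicted, residuals, s=10)
ax_scatter.axhline(0, color="black", linewidth=1)
ax_scatter.set_xlabel("Predicted value")
ax_scatter.set_ylabel("Residual")

# horizontal bars so the histogram bins line up with the residual axis
ax_hist.hist(residuals, bins=40, orientation="horizontal")
ax_hist.axhline(0, color="black", linewidth=1)
ax_hist.tick_params(labelleft=False)  # y tick labels already shown on the scatter
ax_hist.set_xlabel("Count")

# symmetric limits put zero in the middle; sharey propagates them to both axes
limit = np.abs(residuals).max() * 1.05
ax_scatter.set_ylim(-limit, limit)
```

Because the two axes share y, the bars sit at the same height as the points they summarise, which is what makes the side-by-side comparison direct.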

Member

bbengfort commented Jun 15, 2018

@ianozsvald sorry for the delay on this; we've been slammed - but I wanted to take a crack at this when you reminded me via the notebooks in the Slack channel the other day.

What do you think about this?

[image: residuals_hist_annotation]

It's an initial prototype, but I'm thinking we'll just make it an option for now (e.g. hist=True) that displays the histogram when enabled.

bbengfort added a commit to bbengfort/yellowbrick that referenced this issue Jun 15, 2018

Implements histogram alongside Residuals Plot
Adds a hist=True option to the ResidualsPlot visualizer, which plots the
histogram of the residuals as a barh on a vertical axes with sharey=True
to the right of the scatter plot axes. This allows the user an easier
method to identify the distribution of the errors (e.g. more positive or
more negative).

Note that matplotlib 2.0.2 or greater is required for this method.

Still to do:

- update documentation
- write matplotlib version tests

Fixes #264
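The "write matplotlib version tests" item above could start from a check like the following. This is a hypothetical sketch, not code from the commit: `parse_version` and `supports_hist` are illustrative names, and the only fact taken from the commit message is the 2.0.2 minimum.

```python
import re

MIN_MATPLOTLIB = (2, 0, 2)  # minimum version stated in the commit message


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def supports_hist(version):
    """True if the given matplotlib version string meets the stated minimum."""
    return parse_version(version) >= MIN_MATPLOTLIB


for candidate in ("1.5.3", "2.0.1", "2.0.2", "2.1", "3.8.0rc1"):
    print(candidate, supports_hist(candidate))
```

In practice the string passed in would be `matplotlib.__version__`, and a visualizer could raise or warn when `hist=True` is requested on an older install.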

bbengfort referenced this issue Jun 15, 2018

Merged

Implements histogram alongside ResidualsPlot #480

2 of 2 tasks complete

Contributor

ianozsvald commented Jun 17, 2018

Brilliant! This feels more informative than mine. I defo prefer your orientation for the histogram and the larger size is clearly better. I wonder if a normalised histogram makes more sense (as the test set is likely to be smaller than the training set, which won't aid comparison)?

Member

bbengfort commented Jun 18, 2018

@ianozsvald I'm happy to add a normalization argument here - so instead of a frequency, use a PDF and stack training and test rather than overlay test on top of training?

bbengfort closed this in #480 Jun 18, 2018

bbengfort added a commit that referenced this issue Jun 18, 2018

Implements histogram alongside ResidualsPlot (#480)

Contributor

ianozsvald commented Jun 18, 2018

Overlaying test and train probably looks fine (given the PDF), that should be more comparable than having one stacked on the other? The PDF should look lovely regardless. I look forward to seeing it :-)
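The normalisation point can be illustrated numerically. The sketch below uses synthetic residuals (an 800-sample "training" set and a 200-sample "test" set, both assumptions for illustration) and NumPy's `density=True` option to show that raw counts differ by roughly the ratio of set sizes, while the density-normalised histograms each integrate to 1 and so can be overlaid on a common scale.

```python
import numpy as np

rng = np.random.default_rng(42)
train_residuals = rng.normal(0.0, 1.0, size=800)  # synthetic, larger training set
test_residuals = rng.normal(0.0, 1.0, size=200)   # synthetic, smaller test set

# shared bin edges so the two histograms are directly comparable
combined = np.concatenate([train_residuals, test_residuals])
edges = np.histogram_bin_edges(combined, bins=30)
widths = np.diff(edges)

# raw frequencies: the training bars dwarf the test bars
train_counts, _ = np.histogram(train_residuals, bins=edges)
test_counts, _ = np.histogram(test_residuals, bins=edges)

# density normalisation: each histogram integrates to 1 regardless of sample size
train_density, _ = np.histogram(train_residuals, bins=edges, density=True)
test_density, _ = np.histogram(test_residuals, bins=edges, density=True)

print("peak counts :", train_counts.max(), test_counts.max())
print("peak density:", train_density.max(), test_density.max())
print("areas       :", (train_density * widths).sum(), (test_density * widths).sum())
```

For plotting, matplotlib's `Axes.hist` accepts the same `density=True` option along with an `alpha` value, which is one way to draw the overlaid train and test histograms discussed here.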
