Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Q-Q plot to the ResidualsPlot (#853) #1042

Merged
merged 10 commits into from
Apr 6, 2020

Conversation

VladSkripniuk
Copy link
Contributor

This PR fixes #853 which requested a feature to allow the user to create Q-Q plot beside the residual plot as a mean of normality checking of the residuals.

I have made the following changes:

  1. Plot Q-Q plot using scipy.stats.probplot
  2. Add tests.

Sample Code and Plot

from yellowbrick.regressor.residuals import ResidualsPlot
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from yellowbrick.datasets import load_concrete

X, y = load_concrete()

X_train, X_test, y_train, y_test = train_test_split(X, y)
viz = ResidualsPlot(LinearRegression(), hist=False, qqplot=True)

viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.show()

image

CHECKLIST

  • Is the commit message formatted correctly?
  • Have you noted the new functionality/bugfix in the release notes of the next release?
  • Included a sample plot to visually illustrate your changes?
  • Do all of your functions and methods have docstrings?
  • Have you added/updated unit tests where appropriate?
  • Have you updated the baseline images if necessary?
  • Have you run the unit tests using pytest?
  • Is your code style correct (are you using PEP8, pyflakes)?
  • Have you documented your new feature/functionality in the docs?
  • Have you built the docs using make html?

Adds functionality to plot Q-Q plot side by side with residuals plot and verifies that it works as expected
Copy link
Contributor

@lwgray lwgray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for submitting this PR. I had a question about your use of the probplot from scipy and whether it is valid for creating a Q-Q plot. Can you give your reason as to why you choose probplot

yellowbrick/regressor/residuals.py Show resolved Hide resolved
yellowbrick/regressor/residuals.py Show resolved Hide resolved
yellowbrick/regressor/residuals.py Show resolved Hide resolved
yellowbrick/regressor/residuals.py Show resolved Hide resolved
yellowbrick/regressor/residuals.py Show resolved Hide resolved
yellowbrick/regressor/residuals.py Show resolved Hide resolved
tests/test_regressor/test_residuals.py Show resolved Hide resolved
tests/test_regressor/test_residuals.py Show resolved Hide resolved
@lwgray
Copy link
Contributor

lwgray commented Mar 14, 2020

@VladSkripniuk For each visualizer there is a "Quick Method". It allows for the Visualizer to be called on a single line. Since you modified the ResidualsPlot Class you will need to alter the quick method also. The quickmethod begins as Line 766

Tasks

  1. Add the qqplot parameter Lines 771 to 786
  2. Add qqplot to the instantiation of ResidualsPlot Lines 878 to 889
  3. Add qqplot to the docstring

@lwgray
Copy link
Contributor

lwgray commented Mar 14, 2020

@VladSkripniuk Another, note... Your branch is out-of-date with the base branch... Please Update this branch before pushing any changes

Copy link
Contributor

@lwgray lwgray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VladSkripniuk Great job on this PR. There are just a couple of things that need to be done before I can test and merge this PR.

  1. Update Quick method - See comments on main PR page
  2. Update Documentation to showcase Q-Q Plot here

yellowbrick/regressor/residuals.py Show resolved Hide resolved
@VladSkripniuk
Copy link
Contributor Author

@lwgray thank you for your help with this PR!

@@ -719,6 +776,7 @@ def residuals_plot(
y_test=None,
ax=None,
hist=True,
qqplot=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this addition!

@@ -772,6 +830,12 @@ def residuals_plot(
If set to 'density', the probability density function will be plotted.
If set to True or 'frequency' then the frequency will be plotted.

qqplot : {True, False}, default: False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this here. We do our best to keep our docstring up-to-date because this is how our docs are generated.

@@ -52,6 +52,19 @@ Note that if the histogram is not desired, it can be turned off with the ``hist=

.. warning:: The histogram on the residuals plot requires matplotlib 2.0.2 or greater. If you are using an earlier version of matplotlib, simply set the ``hist=False`` flag so that the histogram is not drawn.

Histogram can be replaced with a Q-Q plot, which is a common way to check that residuals are normally distributed. If the residuals are normally distributed, then their quantiles when plotted against quantiles of normal distribution should form a straight line. The example below shows, how Q-Q plot can be drawn with a ``qqplot=True`` flag. Notice that ``hist`` has to be set to ``False`` in this case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great explanation. Awesome on notifying the user that hist has to be set to False

@lwgray
Copy link
Contributor

lwgray commented Mar 15, 2020

Your PR is currently blocked because of PR #1050 which resolves some bugs in the test but yours will be merged once #1050 is merged

@lwgray
Copy link
Contributor

lwgray commented Apr 4, 2020

We fixed the test failures.... i will do a final review and if test past then I will merge this PR. I have updated your branch so do a pull before you make anymore commits

@lwgray
Copy link
Contributor

lwgray commented Apr 4, 2020

@bbengfort or @rebeccabilbro Travis has Image Comparison failures for the Q-Q plot related tests although they are passing locally. I had a couple of questions:

  1. The local test for Q-Q plot xfail. Is it necessary to have the pytest xfail decorator with the Q-Qplot test ?

  2. Would an appropriate fix be to clear the actual & base images, run residual plots tests, then move the actual images to the baseline images (ie)

python -m tests.images -C tests/test_regressor/test_residuals.py
pytest -vs tests/test_regressor/test_residuals.py
python -m tests.images tests/test_regressor/test_residuals.py


visualizer.fit(self.data.X.train, self.data.y.train)
visualizer.score(self.data.X.test, self.data.y.test)
visualizer.finalize()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lwgray I recommend removing this call to finalize and seeing if that helps fix the tests on Travis - there is a known problem that the image similarity tests aren't removing text from the alternate axes (as you can see in the baseline image) - and text is one of the primary reasons we have a divergence between AppVeyor and Travis.

Were the baseline images created on a Windows machine or Mac/Linux? That might also contribute to this; we have an xfail for windows or conda, but if the image were generated on Windows than it would fail anyway on Travis.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VladSkripniuk can you remove the call to finalize as described in the above comment from @bbengfort ?
I updated your branch so make sure you pull before you commit and new changes.
Also, did you create the baseline images on Windows or Mac?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lwgray Sure, I just removed the call to finalize. I created tests/baseline_images/test_regressor/test_residuals/test_residuals_plot_QQ_plot.png in commit 71f83bb using Ubuntu 18.04.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @VladSkripniuk your baseline image is out-of-date. We have since merged fixes issues we were having with matplotlib, so you will need to recreate your baseline images. You can perform the following commands to reset your baseline images.

python -m tests.images -C tests/test_regressor/test_residuals.py
pytest -vs tests/test_regressor/test_residuals.py
python -m tests.images tests/test_regressor/test_residuals.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lwgray I executed the commands you mentioned:

python -m tests.images -C tests/test_regressor/test_residuals.py
pytest -vs tests/test_regressor/test_residuals.py
python -m tests.images tests/test_regressor/test_residuals.py

and ran python -m pytest tests/test_regressor/test_residuals.py and tests passed successfully on my machine. I upload the updated version of tests/baseline_images/test_regressor/test_residuals/test_residuals_plot_QQ_plot.png

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VladSkripniuk great call on upgrading mpl to 3.2.1 - I forgot to tell you to do that.

@lwgray lwgray merged commit bc2c316 into DistrictDataLabs:develop Apr 6, 2020
@lwgray
Copy link
Contributor

lwgray commented Apr 6, 2020

@VladSkripniuk Thank you for your patience and your contribution. I will start working on your other PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Q-Q plot to the yellowbrick.regressor.residuals class
3 participants