New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rocauc refactor #533

Merged
merged 39 commits into from Aug 1, 2018

Conversation

Projects
None yet
2 participants
@rebeccabilbro
Copy link
Collaborator

rebeccabilbro commented Jul 30, 2018

There is currently a bug in ROCAUC that triggers an error when a user attempts to use the visualizer for binary classification with a scikit-learn estimator that has only a decision_function and not also a predict_proba. Estimators with both are fine, and estimators with only predict_proba also seem to be fine. Multi-class classification also seems to work without error regardless of estimator. More documentation of the bug in this (notebook).

It seems like the bug occurs because the default params for ROCAUC are micro=True, macro=True, and per_class=True, however, in the case of an estimator with only a decision function being used for a binary classification problem, there will only be enough information to plot a single curve. Note: estimators that also have a predict_proba are safe because of the order in which the attrs are checked in the _get_y_scores function.

This PR attempts to address the bug by:

  • in score, checking if y_pred is of shape (n_samples, ), and if so, setting self._binary_decision to True
  • raising an error on score if self._binary_decision is True and micro, macro, and per_class are all also set to True (Note: I also moved the no curves will be drawn check inside the self._binary_decision = False logic inside score — it was originally in __init__)
  • in score, if self._binary_decision, computing the ROC curve and ROC area for the single class
  • in draw, if self._binary_decision, drawing the ROC curve and ROC area for the single class

Let me know what you think and if you have any suggestions!

Rebecca Bilbro and others added some commits Jun 12, 2018

Rebecca Bilbro
encountering some issues when trying to use the size parameter to set…
… the size of the figure on instantiation of the visualizer
fixes broken decision function test, switches to using datasets from …
…conftest, renames binary decision function test case and adds coverage for multiclass decision function test case
fixes xfailed test for ROCAUC without per-class curves, switching to …
…dataset from conftest, and replacing baseline image accordingly
fixes xfailed test for ROCAUC multiclass classification, switches to …
…multiclass dataset from conftest, and replaces baseline image accordingly
updates test for ROCAUC with classifiers that have no scoring method …
…to use binary classification dataset from conftest
adds increased Windows tolerance to rocauc image comparisons tests fo…
…r AppVeyor to resolve image not close errors

@bbengfort bbengfort added the review label Jul 30, 2018

@bbengfort
Copy link
Member

bbengfort left a comment

I think this is an excellent solution (it took me a few minutes to wrap my head around it) but it makes sense to me, and I think that it clearly communicates to the user what needs to be done in the case you discovered! Thank you also for adding the three tests for each of the three possibilities for methods having these scoring functions!

I had a few comments for you to review, but I think once we get the CI tests passing we can go ahead and merge this in.

We should probably also create an issue to document this situation in the ROCAUC docs, and follow up on that while its fresh in our heads.

Thank you so much @rebeccabilbro for taking on this challenging bug and sticking with it until you found a solution!

self.assertEqual(len(visualizer.roc_auc.keys()), 1)

# Compare the images
visualizer.poof()

This comment has been minimized.

@bbengfort

bbengfort Jul 31, 2018

Member

The test is failing on CI with an RMSE of 6.8 - I'm wondering if it's a font issue; removing the call to poof() simplifies the test and may get it to pass?

This comment has been minimized.

@rebeccabilbro

rebeccabilbro Jul 31, 2018

Collaborator

Ah ok, I removed poof and it seems to be working, at least on my machine. Guess we'll see what happens when I push ¯_(ツ)_/¯

ROCAUC(
LogisticRegression(), per_class=False, macro=False, micro=False
)
visualizer.score(self.binary.X.test, self.binary.y.test)

This comment has been minimized.

@bbengfort

bbengfort Jul 31, 2018

Member

Should we create a test that checks that an exception is raised when self._binary_decision is True and micro, macro, and per_class are all also set to True?

This comment has been minimized.

@rebeccabilbro

rebeccabilbro Jul 31, 2018

Collaborator

Great idea - I actually just added three to check for each possible combo, just in case.

@@ -88,6 +88,60 @@ def test_binary_rocauc(self):
visualizer.poof()
self.assert_images_similar(visualizer)

def test_binary_probability_decision(self):

This comment has been minimized.

@bbengfort

bbengfort Jul 31, 2018

Member

Stellar test cases for the models that have various combinations of decision_function and predict_proba! Very nice work!

This comment has been minimized.

@rebeccabilbro

rebeccabilbro Jul 31, 2018

Collaborator

Thanks!

positives and false positives across all classes.
positives and false positives across all classes. Micro is not defined
for binary classification problems with estimators with only a
decision_function method.

This comment has been minimized.

@bbengfort

bbengfort Jul 31, 2018

Member

Thank you for updating the doc string!

# # Classes may be label encoded so only use what's in y to compute.
# # The self.classes_ attribute will be used as names for labels.
# Check to see if shape of predictions signals a binary decision
if len(y_pred.shape) == 1:

This comment has been minimized.

@bbengfort

bbengfort Jul 31, 2018

Member

I was a bit confused about this for a while, I didn't understand how the dimensionality of y_pred determined if the decision was binary or not. I had to realize that _get_y_scores is calling either the decision_function or predict_proba, which should return a 2D array.

Would it be possible to explain this a bit more in the comment above for future us who are also likely to forget as I did?

Also, note that len(y_pred.shape) == 1 is equivalent to y_pred.ndim == 1, which may also help explain what we're checking in this condition.

This comment has been minimized.

@rebeccabilbro

rebeccabilbro Jul 31, 2018

Collaborator

Yep, it took a bit of work to figure this out for me, too -- I just added a (maybe too lengthy) description of what's happening here in a comment inside the code. Hopefully that helps others and future us! Also switched to ndim.

This comment has been minimized.

@bbengfort

bbengfort Jul 31, 2018

Member

Not too lengthy, I think it was very descriptive, thank you!

# Raise an error if it's a binary decision and user has set micro,
# macro, or per_class to True
if self.micro or self.macro or self.per_class:
raise ModelError(

This comment has been minimized.

@bbengfort

bbengfort Jul 31, 2018

Member

I love this exception message - it's not that there is a bug in ROCAUC, it's just that there are a set of conditions when the defaults don't work - this error explains how to fix it when that case is encountered and gives our users a clear path to success!

# If it's not a binary decision, at least one of micro, macro, or
# per_class must be True
if not self.micro and not self.macro and not self.per_class:
raise YellowbrickValueError(

This comment has been minimized.

@bbengfort

bbengfort Jul 31, 2018

Member

Absolutely the right place to move this to.

positives and false positives across all classes.
positives and false positives across all classes. Micro is not defined
for binary classification problems with estimators with only a
decision_function method.

This comment has been minimized.

@bbengfort

bbengfort Jul 31, 2018

Member

Good catch updating the quick method docs as well!

)
)

# If per-class plotting is requested, plot ROC curves for each class
if self.per_class:

This comment has been minimized.

@bbengfort

bbengfort Jul 31, 2018

Member

Just a quick check to make sure I'm following along - if self._binary_decision is True, then an exception will already have been raised if per_class, micro, or macro is true, which is why we can short-circuit the conditionals. I can't see the entirety of the function in the review; but if self._binary_decision is True can we return from there, or is there work at the bottom of the function after the conditionals that needs to be done?

This comment has been minimized.

@rebeccabilbro

rebeccabilbro Jul 31, 2018

Collaborator

There's a bit more logic that happens in the function, since we still have to check to make sure if it's not binary that at least one of the other curve options has been specified as True (there's another test for this). Here's the whole thing:

        if y_pred.ndim == 1:
            self._binary_decision = True

            # Raise an error if it's a binary decision and user has set micro,
            # macro, or per_class to True
            if self.micro or self.macro or self.per_class:
                raise ModelError(
                    "Micro, macro, and per-class scores are not defined for "
                    "binary classification for estimators with only "
                    "decision_function methods; set micro, macro, and "
                    "per-class params to False."
                )
        else:
            self._binary_decision = False
            # If it's not a binary decision, at least one of micro, macro, or
            # per_class must be True
            if not self.micro and not self.macro and not self.per_class:
                raise YellowbrickValueError(
                    "no curves will be drawn; specify micro, macro, or per_class"
                )

rebeccabilbro added some commits Jul 31, 2018

updates rocauc tests to make sure in the case of a binary decision, m…
…acro, micro, and per_class True raises an error. Also removes poof call to reduce image difference error and updates baseline image for test_binary_decision
@rebeccabilbro

This comment has been minimized.

Copy link
Collaborator

rebeccabilbro commented Jul 31, 2018

@bbengfort - ok, just pushed all those updates. One question on your suggestion to add a new issue for us to capture the binary decision case in the YB docs; #252 does have a bullet for docs, and this current PR won't close the issue because there are still a couple of todos associated with the issue. I think instead of opening a new issue, once this is merged in, I'll just open a new PR to wrap up the remaining stuff from the issue. Sound good?

@bbengfort

This comment has been minimized.

Copy link
Member

bbengfort commented Jul 31, 2018

@rebeccabilbro yes, opening a new PR to wrap up the remaining stuff sounds good to me; I'm going to clone your branch locally to see if I can help repair the image comparison test; not sure why it's still failing. Give me a few minutes and a whiskey!

@bbengfort

This comment has been minimized.

Copy link
Member

bbengfort commented Aug 1, 2018

@rebeccabilbro no joy - I tried all my tricks ... no idea why the tolerance is off. I just increased the tolerance, which should pass the tests. There is really nothing we can do until we have some way of seeing the diffs from Travis and AppVeyor, RE #379

@bbengfort

This comment has been minimized.

Copy link
Member

bbengfort commented Aug 1, 2018

@rebeccabilbro all the tests pass; am going to go ahead and merge -- thanks for working on this!

@bbengfort bbengfort merged commit 605b3e8 into DistrictDataLabs:develop Aug 1, 2018

4 checks passed

LGTM analysis: Python No alert changes
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.06%) to 79.596%
Details

@bbengfort bbengfort removed the review label Aug 1, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment