Bin class pipeline: use predictions for "true" class in score #798

Merged
merged 2 commits into from May 22, 2020

Collaborator

@dsherry dsherry commented May 22, 2020

Fixes #797, a bug introduced in #787. The problem is that binary classification pipelines no longer take the "true" class from the predicted probabilities and pass it into the score math.
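
To illustrate why the score math wants only the "true" class column, here is a minimal sketch in plain Python. It is not evalml code: `positive_class_column` and `auc_from_scores` are hypothetical names, and the assumption is that predicted probabilities arrive as rows of `[p_false, p_true]`. A rank-based AUC needs exactly one score per row, so the two-column output has to be reduced first.

```python
def positive_class_column(probabilities):
    """Return the second column (assumed to be the "true" class) of each row."""
    return [row[1] for row in probabilities]


def auc_from_scores(y_true, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, label in zip(scores, y_true) if label == 1]
    negatives = [s for s, label in zip(scores, y_true) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
probabilities = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]

# Reduce the two-column probabilities to one score per row before scoring.
true_class_scores = positive_class_column(probabilities)
auc = auc_from_scores(y_true, true_class_scores)
```

Passing the full two-column rows to `auc_from_scores` would compare lists rather than single scores, which is the kind of mismatch this PR removes from the binary pipeline's scoring path.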

@dsherry dsherry requested review from kmax12 and angela97lin May 22, 2020
def test_score_auc(X_y, lr_pipeline):
    X, y = X_y
    lr_pipeline.fit(X, y)
    lr_pipeline.score(X, y, ['auc'])
Collaborator Author

@dsherry dsherry May 22, 2020

This was my reproducer. I will expand on this coverage before merging.

Collaborator Author

@dsherry dsherry May 22, 2020

You know what, I'd like to merge this now to unblock master and then get another PR up with more coverage later today.


codecov bot commented May 22, 2020

Codecov Report

Merging #798 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #798   +/-   ##
=======================================
  Coverage   99.51%   99.51%           
=======================================
  Files         150      150           
  Lines        5718     5727    +9     
=======================================
+ Hits         5690     5699    +9     
  Misses         28       28           
| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| evalml/pipelines/binary_classification_pipeline.py | 100.00% <100.00%> (ø) | |
| evalml/tests/pipeline_tests/test_pipelines.py | 99.74% <100.00%> (+<0.01%) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b1024a...c226873.

"""
if predictions.ndim > 1:
    predictions = predictions[:, 1]
return ClassificationPipeline._score(X, y, predictions, objective)
Collaborator Author

@dsherry dsherry May 22, 2020

@angela97lin @kmax12 what do you think of this solution?

Pros: it fixes the bug. And it keeps the binary-classification-specific code in the binary classification pipeline definition.
Cons: there may be a cleaner way to organize this. For example, we do the same indexing in BinaryClassificationPipeline.predict above, and ideally we'd have one method for computing this. But I don't know if it's worth messing with that right now.
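
For reference, the "one method for computing this" idea could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code in this PR: `true_class_probabilities`, `predict_labels`, and `score_inputs` are hypothetical names, the "true" class is assumed to sit in column 1, and the `> threshold` comparison is an assumed convention.

```python
import numpy as np


def true_class_probabilities(predictions):
    """Return a 1-D array of "true" class probabilities.

    Hypothetical helper: 2-D input is assumed to have shape (n_samples, 2)
    with the "true" class in column 1; 1-D input is returned unchanged.
    """
    predictions = np.asarray(predictions)
    if predictions.ndim > 1:
        return predictions[:, 1]
    return predictions


def predict_labels(probabilities, threshold=0.5):
    """Threshold the "true" class probabilities into boolean labels."""
    return true_class_probabilities(probabilities) > threshold


def score_inputs(predictions):
    """Prepare predictions for an objective that expects one value per row."""
    return true_class_probabilities(predictions)


probabilities = np.array([[0.9, 0.1], [0.3, 0.7], [0.5, 0.5]])
labels = predict_labels(probabilities)
scores = score_inputs(probabilities)
```

With a single helper, the prediction path and the scoring path cannot drift apart on which column counts as the "true" class.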

Contributor

@angela97lin angela97lin May 22, 2020

I think I prefer this! It makes more sense to me to do it here since after all, we just need this indexing for score, so predict shouldn't need to handle it.

Contributor

@kmax12 kmax12 May 22, 2020

Would it be better for this to be PipelineBase._score, since that is where it is actually defined?

Contributor

@kmax12 kmax12 May 22, 2020

I think this organization is fine. My only thought is that ClassificationPipeline._score should be a utility rather than a static method. It just feels off.

Or maybe even define an ObjectiveBase.safe_score method that has this behavior.
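
A rough sketch of what a `safe_score`-style method might look like, purely as an illustration of the suggestion. `ObjectiveSketch` and `MeanAbsoluteErrorSketch` are hypothetical stand-ins and do not reflect evalml's actual `ObjectiveBase`. The assumption again is that two-column predictions carry the "true" class in column 1.

```python
import numpy as np


class ObjectiveSketch:
    """Hypothetical stand-in for an objective base class (not evalml's)."""

    def score(self, y_true, y_predicted):
        raise NotImplementedError

    def safe_score(self, y_true, y_predicted):
        """Reduce two-column probabilities to the "true" class, then score."""
        y_predicted = np.asarray(y_predicted)
        if y_predicted.ndim > 1:
            # Assumption: column 1 holds the "true" class probability.
            y_predicted = y_predicted[:, 1]
        return self.score(np.asarray(y_true), y_predicted)


class MeanAbsoluteErrorSketch(ObjectiveSketch):
    """Toy objective used only to exercise safe_score."""

    def score(self, y_true, y_predicted):
        return float(np.mean(np.abs(y_true - y_predicted)))


objective = MeanAbsoluteErrorSketch()
y_true = [0, 1]
two_column = [[0.75, 0.25], [0.25, 0.75]]
one_column = [0.25, 0.75]

score_from_two_column = objective.safe_score(y_true, two_column)
score_from_one_column = objective.safe_score(y_true, one_column)
```

Both calls give the same result, so callers would not need to know which shape the pipeline produced. The trade-off is that every objective inherits the column-1 convention, which may not suit non-binary problems.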

Collaborator Author

@dsherry dsherry May 22, 2020

Thanks @angela97lin @kmax12!

@kmax12, I agree this doesn't feel ideal yet. And yes, perhaps moving this functionality to a util or to the objectives would make more sense. I like those ideas.

I'll plan to update the tests and merge this fix, and then we can circle back and put something better in place later. This doesn't alter our public API so we have flexibility.

@dsherry dsherry merged commit 7786dd2 into master May 22, 2020
2 checks passed
@dsherry dsherry deleted the ds_797_fix_auc_score branch May 22, 2020
Successfully merging this pull request may close these issues.

AUC score fails during automl
3 participants