
Update AutoML to use objective decision function during scoring for custom objectives #1934

Merged
merged 27 commits into from Mar 17, 2021

Conversation

@angela97lin (Contributor) commented Mar 5, 2021

Closes #1868

  • Added BinaryClassificationPipelineMixin to share logic between binary classification / time series binary classification pipelines regarding the pipeline threshold.
  • Moved optimize_threshold from ClassificationPipeline to BinaryClassificationPipelineMixin. I believe this makes sense since only binary classification pipelines, not multiclass classification pipelines, know about thresholds.
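The threshold-sharing idea in the bullets above can be sketched roughly as follows. This is a minimal illustrative sketch, not evalml's actual implementation: the brute-force grid search inside `optimize_threshold` and the `objective(y_true, predictions)` callable signature are assumptions made for the example.

```python
import numpy as np

class BinaryClassificationPipelineMixin:
    """Shares decision-threshold logic between binary and time series
    binary classification pipelines (illustrative sketch)."""

    _threshold = None

    @property
    def threshold(self):
        """Decision threshold applied to positive-class probabilities."""
        return self._threshold

    @threshold.setter
    def threshold(self, value):
        self._threshold = value

    def optimize_threshold(self, y_true, ypred_proba, objective):
        """Pick the threshold that maximizes the objective.

        Sketch only: scans a fixed grid of candidate thresholds and keeps
        the best-scoring one, then stores it on the pipeline.
        """
        candidates = np.linspace(0, 1, 101)
        scores = [objective(y_true, ypred_proba[:, 1] >= t) for t in candidates]
        self.threshold = candidates[int(np.argmax(scores))]
        return self.threshold
```

Because only binary pipelines mix this in, multiclass pipelines never gain a `threshold` attribute, which matches the reasoning in the second bullet.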

@angela97lin angela97lin self-assigned this Mar 5, 2021
codecov bot commented Mar 5, 2021

Codecov Report

Merging #1934 (cf4a911) into main (11d1cff) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1934     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         273      274      +1     
  Lines       22381    22418     +37     
=========================================
+ Hits        22375    22412     +37     
  Misses          6        6             
Impacted Files Coverage Δ
evalml/pipelines/classification_pipeline.py 100.0% <ø> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 100.0% <ø> (ø)
evalml/pipelines/binary_classification_pipeline.py 100.0% <100.0%> (ø)
.../pipelines/binary_classification_pipeline_mixin.py 100.0% <100.0%> (ø)
evalml/pipelines/pipeline_base.py 100.0% <100.0%> (ø)
.../pipelines/time_series_classification_pipelines.py 100.0% <100.0%> (ø)
...ation_pipeline_tests/test_binary_classification.py 100.0% <100.0%> (ø)
...peline_tests/test_time_series_baseline_pipeline.py 100.0% <100.0%> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

predictions = self._predict_with_objective(X, ypred_proba, objective)
return infer_feature_types(predictions)

def _predict_with_objective(self, X, ypred_proba, objective):
angela97lin (Contributor, author):

Helper function that takes in predict_proba, so we don't have to recalculate it for each objective.

@bchen1116 (Contributor) left a comment

Left a question for my own understanding, but the implementation looks good to me.

evalml/pipelines/binary_classification_pipeline.py (outdated review thread; resolved)
@chukarsten (Collaborator) left a comment

I think there's something funky with the double anys on L77 of binary_classification_pipeline.py. Besides that, this looks pretty solid.

evalml/pipelines/binary_classification_pipeline.py (3 outdated review threads; resolved)
@freddyaboulton (Contributor) left a comment

@angela97lin This looks good to me! I left some scattered comments throughout. One thing that would be nice to address before merging: add a time series test to test_binary_predict_pipeline_use_objective.

problem_type = ProblemTypes.BINARY

@property
angela97lin (Contributor, author):

Moved all of this to BinaryClassificationPipelineMixin

@patch('evalml.pipelines.MulticlassClassificationPipeline.fit')
@patch('evalml.pipelines.MulticlassClassificationPipeline.score')
@patch('evalml.pipelines.MulticlassClassificationPipeline.predict')
def test_pipeline_thresholding_errors(mock_multi_predict, mock_multi_score, mock_multi_fit,
angela97lin (Contributor, author):

Removed the multiclass tests and moved them to test_binary_classification_pipelines.py
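The kind of guard those relocated tests cover can be sketched as follows. The stand-in class below is hypothetical, not evalml's MulticlassClassificationPipeline; it only illustrates the invariant that multiclass pipelines expose no decision threshold.

```python
class MulticlassPipelineStub:
    """Hypothetical stand-in: multiclass pipelines have no decision threshold."""

    @property
    def threshold(self):
        raise AttributeError("Multiclass pipelines do not support a threshold.")

def test_multiclass_thresholding_errors():
    # Accessing the threshold on a multiclass pipeline should raise.
    pipeline = MulticlassPipelineStub()
    try:
        _ = pipeline.threshold
    except AttributeError as err:
        assert "threshold" in str(err)
    else:
        raise AssertionError("expected AttributeError")
```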

@angela97lin angela97lin merged commit cf93929 into main Mar 17, 2021
@angela97lin angela97lin deleted the 1868_objective branch March 17, 2021 21:08
@dsherry dsherry mentioned this pull request Mar 24, 2021
Development

Successfully merging this pull request may close these issues.

Automl does not use objective decision function during scoring
4 participants