
API for refitting pipelines on entire training data #876

Merged: 18 commits from ds_719_automl_refit into master on Jun 30, 2020

Conversation

dsherry (Collaborator) commented Jun 24, 2020

Fix #719

Changes

  • Update AutoSearchBase.get_pipeline to return an untrained pipeline rather than the pipeline trained on the final CV fold
  • Refactor automl search tests: moved most into test_autobase.py, parametrized many to run on all problem types, and deleted some duplicates
  • Save the binary classification threshold from each CV fold in the results dict, for convenience and debugging
  • Update AutoSearchBase._evaluate to leave the input pipeline unmodified, using the new clone method to create copies for training
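The clone-based pattern from the last bullet can be sketched like this (`ToyPipeline` and `evaluate` are toy stand-ins for the real evalml classes, not its actual API):

```python
import copy

class ToyPipeline:
    """Hypothetical stand-in for an evalml pipeline, for illustration only."""
    def __init__(self, parameters):
        self.parameters = parameters
        self.is_fitted = False

    def clone(self):
        # A clone is a fresh, untrained pipeline with the same parameters.
        return ToyPipeline(copy.deepcopy(self.parameters))

    def fit(self, X, y):
        self.is_fitted = True
        return self

def evaluate(pipeline, X, y):
    """Train a clone for each CV fold so the input pipeline stays unmodified."""
    cv_pipeline = pipeline.clone()
    cv_pipeline.fit(X, y)
    return cv_pipeline

template = ToyPipeline({"n_estimators": 100})
trained = evaluate(template, X=None, y=None)
print(template.is_fitted, trained.is_fitted)  # False True
```

The key property is that the caller's pipeline object is never mutated; each fold trains its own copy.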

Future
After this PR, we should circle back and clean up how AutoSearchBase passes pipelines and parameters to and from the AutoMLAlgorithm class. Currently that code returns a pipeline instance holding the parameters, which is then passed to _evaluate. This may no longer be necessary: _evaluate and _add_result no longer require a trained pipeline, just the pipeline class and its parameters.
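The proposed cleanup could be sketched as follows (all names here are hypothetical, meant only to illustrate passing a class plus parameters instead of an instance):

```python
# Hypothetical sketch: the automl algorithm hands back (pipeline_class,
# parameters) pairs, and _evaluate instantiates a fresh pipeline itself.
class DummyPipeline:
    def __init__(self, parameters):
        self.parameters = parameters

def evaluate(pipeline_class, parameters):
    # A fresh, untrained instance is built here; no trained pipeline is
    # ever threaded through the automl bookkeeping.
    pipeline = pipeline_class(parameters)
    return pipeline

result = evaluate(DummyPipeline, {"max_depth": 3})
print(result.parameters)  # {'max_depth': 3}
```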

codecov bot commented Jun 24, 2020

Codecov Report

Merging #876 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff            @@
##           master     #876    +/-   ##
========================================
  Coverage   99.78%   99.79%            
========================================
  Files         195      195            
  Lines        8966     9085   +119     
========================================
+ Hits         8947     9066   +119     
  Misses         19       19            
Impacted Files Coverage Δ
...lml/automl/automl_algorithm/iterative_algorithm.py 100.00% <100.00%> (ø)
evalml/automl/automl_search.py 98.75% <100.00%> (+0.01%) ⬆️
evalml/exceptions/__init__.py 100.00% <100.00%> (ø)
evalml/exceptions/exceptions.py 100.00% <100.00%> (ø)
evalml/tests/automl_tests/test_automl.py 100.00% <100.00%> (ø)
.../automl_tests/test_automl_search_classification.py 100.00% <100.00%> (ø)
...ests/automl_tests/test_automl_search_regression.py 100.00% <100.00%> (ø)
...alml/tests/objective_tests/test_fraud_detection.py 100.00% <100.00%> (ø)
evalml/tests/objective_tests/test_lead_scoring.py 100.00% <100.00%> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 99.82% <100.00%> (+<0.01%) ⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cf6ad68...a10584d.

docs/source/changelog.rst (review thread outdated, resolved)
dsherry marked this pull request as ready for review on June 24, 2020, 22:34
dsherry (Collaborator, Author) commented Jun 24, 2020

What's still missing: to fix codecov, I need to add more test coverage for PipelineBase.describe_pipeline and a few minor spots in AutoSearchBase. I'll do that tomorrow!

@@ -388,17 +389,18 @@ def _evaluate(self, pipeline, X, y, raise_errors=True, pbar=None):

if self.optimize_thresholds and self.objective.problem_type == ProblemTypes.BINARY and self.objective.can_optimize_threshold:
X_train, X_threshold_tuning, y_train, y_threshold_tuning = train_test_split(X_train, y_train, test_size=0.2, random_state=self.random_state)
-            pipeline.fit(X_train, y_train)
+            cv_pipeline = pipeline.clone()
dsherry (Collaborator, Author):
@eccabay here's clone in action! 🎊😁

dsherry (Collaborator, Author):

I did this because there's no need to modify the input pipeline here. Once this PR is merged, we should update _evaluate to make this clear.


evaluation_entry = {"all_objective_scores": ordered_scores, "score": score}
if isinstance(cv_pipeline, BinaryClassificationPipeline) and cv_pipeline.threshold is not None:
evaluation_entry['binary_classification_threshold'] = cv_pipeline.threshold
dsherry (Collaborator, Author):

I thought it would be helpful to save the binary classification thresholds learned in each CV fold, for reference and debugging.

dsherry (Collaborator, Author) commented Jun 25, 2020:

Previously we were able to access the threshold value from the final CV fold via get_pipeline, but this was not correct behavior. The threshold should be (optionally) relearned when the model is retrained.

angela97lin (Contributor):

This creates an evaluation_entry that may or may not have an evaluation_entry['binary_classification_threshold'] key. Is this preferable to always including that key, with a value that is either None or numerical?

dsherry (Collaborator, Author):
@angela97lin yep your idea is simpler, will do that 😂 thanks
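The agreed-on simplification could look like this (`make_evaluation_entry` is a hypothetical helper for illustration; the real code builds the dict inline): always include the threshold key, defaulting to None for non-binary pipelines.

```python
def make_evaluation_entry(ordered_scores, score, threshold=None):
    # The threshold key is always present; callers never need to check
    # for its existence, only whether it is None.
    return {
        "all_objective_scores": ordered_scores,
        "score": score,
        "binary_classification_threshold": threshold,
    }

binary_entry = make_evaluation_entry({"Log Loss Binary": 0.31}, 0.31, threshold=0.42)
regression_entry = make_evaluation_entry({"R2": 0.88}, 0.88)
print(regression_entry["binary_classification_threshold"])  # None
```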

parameters = pipeline_results.get('parameters')
if pipeline_class is None or parameters is None:
raise PipelineNotFoundError("Pipeline class or parameters not found in automl results")
return pipeline_class(parameters)
dsherry (Collaborator, Author):
Oops, I forgot to pass in random_state here. Will do so.
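The fix noted above could look like this sketch (`ToyPipeline` and the standalone `get_pipeline` are illustrative stand-ins for the real method, not evalml's actual signatures):

```python
class PipelineNotFoundError(Exception):
    """Raised when automl results lack the data to rebuild a pipeline."""

def get_pipeline(pipeline_results, random_state=0):
    # Thread random_state through when reconstructing the untrained pipeline.
    pipeline_class = pipeline_results.get("pipeline_class")
    parameters = pipeline_results.get("parameters")
    if pipeline_class is None or parameters is None:
        raise PipelineNotFoundError("Pipeline class or parameters not found in automl results")
    return pipeline_class(parameters, random_state=random_state)

class ToyPipeline:
    def __init__(self, parameters, random_state=0):
        self.parameters = parameters
        self.random_state = random_state

results = {"pipeline_class": ToyPipeline, "parameters": {"eta": 0.1}}
pipeline = get_pipeline(results, random_state=42)
print(pipeline.random_state)  # 42
```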

@@ -53,6 +53,9 @@ def test_convert_to_seconds():
assert convert_to_seconds("10 hour") == 36000
assert convert_to_seconds("10 hours") == 36000

with pytest.raises(AssertionError, match="Invalid unit."):
convert_to_seconds("10 years")
dsherry (Collaborator, Author):
Moved this coverage over from the automl tests.

* Update `AutoSearchBase.get_pipelines` to return an untrained pipeline instance :pr:`876`
* Save learned binary classification thresholds in automl results cv data dict :pr:`876`
Contributor:

Ah, let's make these past tense per our convention (might be worth revisiting this convention and considering whether something else is more natural, since I've seen this across the team 🤔)

dsherry (Collaborator, Author):
Will do. Yeah, oops!

Contributor:
We should add this to our contribution guidelines, because I did not know of this convention!



@@ -569,6 +575,10 @@ def full_rankings(self):

@property
def best_pipeline(self):
-        """Returns the best model found"""
+        """Returns an untrained instance of the best pipeline and parameters found during automl search.
Contributor:
👍

assert not automl.optimize_thresholds


def test_init_default_binary():
Contributor:
Why are these tests here as opposed to in the test_automl_search_* classes?

dsherry (Collaborator, Author):
Good point, I'll move them back

assert not automl.optimize_thresholds


def test_init_custom_binary(dummy_binary_pipeline_class):
Contributor:
These init tests repeat a lot of code. Might be worth refactoring using parametrize?
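The suggested refactor might look like this sketch (the objective mapping and test name are hypothetical, not evalml's actual defaults):

```python
import pytest

# Hypothetical mapping from problem type to its default objective, used only
# to show how parametrize collapses several near-identical init tests.
DEFAULT_OBJECTIVES = {
    "binary": "Log Loss Binary",
    "multiclass": "Log Loss Multiclass",
    "regression": "R2",
}

@pytest.mark.parametrize("problem_type", sorted(DEFAULT_OBJECTIVES))
def test_init_default_objective(problem_type):
    # One parametrized body replaces three copy-pasted test functions.
    assert isinstance(DEFAULT_OBJECTIVES[problem_type], str)
```

pytest will run the body once per problem type, reporting each case separately.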

freddyaboulton (Contributor) left a comment:

@dsherry The implementation looks good to me - I just left some minor comments that don't need to be addressed in this PR.

dsherry merged commit 704fe03 into master on Jun 30, 2020 (2 checks passed)
dsherry deleted the ds_719_automl_refit branch on June 30, 2020, 13:24
angela97lin mentioned this pull request on Jun 30, 2020
Successfully merging this pull request may close these issues.

Automl: API to refit pipelines on entire training data