
Updating scikit-learn and scikit-optimize to latest version #1141

Merged
freddyaboulton merged 5 commits into main from update-sklearn-skopt-dependencies Sep 22, 2020


freddyaboulton (Contributor) commented Sep 4, 2020

Pull Request Description

Fixes #1121. With the release of scikit-optimize 0.8.0, we can update to scikit-learn >= 0.23 (Release Notes).

The only catch is that SKOptTuner can no longer propose a new pipeline if the pipeline has no parameters. To address this, I added dummy parameters to our estimator fixtures.
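To make the failure mode concrete, here is a minimal, stdlib-only sketch. `ToyTuner` is a hypothetical stand-in, not evalml's SKOptTuner or scikit-optimize's API, and it samples randomly rather than using a Bayesian surrogate model. It only assumes, as described above, that a tuner has nothing to propose when the search space is empty.

```python
import random


class ToyTuner:
    """Hypothetical stand-in for a tuner such as SKOptTuner.

    hyperparameter_ranges maps a parameter name to an inclusive (low, high)
    range; integer bounds give integer samples, float bounds give floats.
    """

    def __init__(self, hyperparameter_ranges, random_seed=0):
        self._ranges = dict(hyperparameter_ranges)
        self._rng = random.Random(random_seed)

    def propose(self):
        if not self._ranges:
            # The situation described in this PR: an estimator with no
            # parameters leaves the tuner with an empty search space.
            raise ValueError("Cannot propose parameters: search space is empty")
        proposal = {}
        for name, (low, high) in self._ranges.items():
            if isinstance(low, int) and isinstance(high, int):
                proposal[name] = self._rng.randint(low, high)
            else:
                proposal[name] = self._rng.uniform(low, high)
        return proposal


# Like the original fixtures: no parameters, so proposing fails.
try:
    ToyTuner({}).propose()
except ValueError as err:
    print(err)

# Dummy ranges, analogous to Integer(0, 10) and Real(0, 10) in the fixtures.
proposal = ToyTuner({"a": (0, 10), "b": (0.0, 10.0)}).propose()
print(sorted(proposal))  # ['a', 'b']
```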

We may want to add a check to AutoML that verifies every pipeline specified in allowed_pipelines has at least one parameter to tune. This can be done in #454 or in a separate issue.
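One possible shape for that check, sketched in plain Python. The pipeline classes and `check_pipelines_have_parameters` are hypothetical; the sketch assumes each pipeline exposes a `hyperparameters` mapping of component name to parameter ranges, which may not match evalml's actual attributes.

```python
class DummyPipeline:
    # Hypothetical stand-in: component name -> {parameter name: range}.
    name = "Dummy Pipeline"
    hyperparameters = {"Mock Classifier": {}}


class TunablePipeline:
    name = "Tunable Pipeline"
    hyperparameters = {"Mock Classifier": {"a": (0, 10), "b": (0.0, 10.0)}}


def check_pipelines_have_parameters(allowed_pipelines):
    """Raise ValueError naming every pipeline that has nothing to tune."""
    untunable = [
        pipeline.name
        for pipeline in allowed_pipelines
        # A pipeline is tunable if any of its components has a non-empty range dict.
        if not any(ranges for ranges in pipeline.hyperparameters.values())
    ]
    if untunable:
        raise ValueError(
            "These pipelines have no tunable parameters: " + ", ".join(untunable)
        )


check_pipelines_have_parameters([TunablePipeline])  # passes silently
try:
    check_pipelines_have_parameters([DummyPipeline, TunablePipeline])
except ValueError as err:
    print(err)  # These pipelines have no tunable parameters: Dummy Pipeline
```

Failing fast at search construction would surface the problem with a clear message instead of a tuner error midway through the search.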


After creating the pull request: to pass the release_notes_updated check, update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.

codecov bot commented Sep 4, 2020

Codecov Report

Merging #1141 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##             main    #1141   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         196      196           
  Lines       11999    12006    +7     
=======================================
+ Hits        11990    11997    +7     
  Misses          9        9           
Impacted Files                                    Coverage Δ
evalml/tests/conftest.py                          100.00% <100.00%> (ø)
evalml/tests/pipeline_tests/test_pipelines.py     100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc53514...478a23b.

freddyaboulton marked this pull request as ready for review Sep 4, 2020
freddyaboulton (Contributor, Author) commented Sep 4, 2020

Performance tests forthcoming

freddyaboulton force-pushed the update-sklearn-skopt-dependencies branch from d796b4d to 2160637 Sep 4, 2020
freddyaboulton (Contributor, Author) commented Sep 4, 2020

Performance test results here. Scores stayed the same, but fit times increased slightly. I think it's still OK to merge this.

freddyaboulton force-pushed the update-sklearn-skopt-dependencies branch from 2160637 to a96b34a Sep 9, 2020
bchen1116 (Contributor) left a comment

LGTM!

freddyaboulton force-pushed the update-sklearn-skopt-dependencies branch 6 times, most recently from 70dac87 to 384a765 Sep 16, 2020
dsherry (Collaborator) left a comment

Thank you for tackling this!!

I left a question about the tests but that's all :)

- scikit-learn>=0.21.3,!=0.22,<0.23.0
- scikit-optimize>=0.7,<=0.7.4
+ scikit-learn>=0.23
+ scikit-optimize>=0.8
dsherry (Collaborator) commented Sep 18, 2020

What a relief! 😁 This was getting ugly.
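For reference, a simplified sketch of what the old and new pins accept. It compares plain numeric release strings as tuples and ignores pre-release and local segments, so it only approximates PEP 440 matching as done by pip; the helper names are hypothetical. It shows that the old and new scikit-learn ranges do not overlap, so this change forces an upgrade.

```python
def parse_version(text):
    """Parse a plain release string like '0.23.2' into a 3-tuple of ints.

    Simplification: pre-release, post-release, and local segments
    (e.g. '0.24rc1') are not handled; PEP 440 tooling does that properly.
    """
    parts = [int(piece) for piece in text.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def old_sklearn_ok(version):
    v = parse_version(version)
    # scikit-learn>=0.21.3,!=0.22,<0.23.0
    return (0, 21, 3) <= v < (0, 23, 0) and v != (0, 22, 0)


def new_sklearn_ok(version):
    # scikit-learn>=0.23
    return parse_version(version) >= (0, 23, 0)


def old_skopt_ok(version):
    # scikit-optimize>=0.7,<=0.7.4
    return (0, 7, 0) <= parse_version(version) <= (0, 7, 4)


def new_skopt_ok(version):
    # scikit-optimize>=0.8
    return parse_version(version) >= (0, 8, 0)


for version in ["0.21.3", "0.22", "0.22.1", "0.23.2"]:
    print(version, old_sklearn_ok(version), new_sklearn_ok(version))
# 0.21.3 True False
# 0.22 False False
# 0.22.1 True False
# 0.23.2 False True
```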

@@ -199,10 +200,11 @@ class MockRegressor(Estimator):
     name = "Mock Regressor"
     model_family = ModelFamily.NONE
     supported_problem_types = [ProblemTypes.REGRESSION]
-    hyperparameter_ranges = {}
+    hyperparameter_ranges = {'a': Integer(0, 10),
+                             'b': Real(0, 10)}
dsherry (Collaborator) commented Sep 18, 2020

@freddyaboulton I have the same question as @bchen1116. You mentioned

the latest version of scikit-optimize won't optimize a pipeline without parameters

So which test(s) does setting these ranges affect? I don't think I follow yet.

freddyaboulton (Contributor, Author) commented Sep 18, 2020

In test_automl_search_classification.py and test_automl_search_regression.py, we pass dummy_regression_pipeline_class, dummy_binary_pipeline_class, or dummy_multiclass_pipeline_class as the allowed_pipelines parameter to AutoMLSearch. These pipelines have no parameters (because their estimator has none), so the SKOpt tuner can't propose any pipelines and the search fails. Since we mock the pipelines' fit and score methods, adding these parameters doesn't change the tests' behavior.

The other option is to mock SKOptTuner.propose in every test that would fail, but that seemed like a bigger change.
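The two options can be compared with a small stdlib sketch. `ToyTuner` and `toy_search` are hypothetical stand-ins for SKOptTuner and AutoMLSearch, not evalml code; in the real test suite the patch target would be SKOptTuner.propose, whose exact import path is not shown in this PR.

```python
from unittest.mock import patch


class ToyTuner:
    """Hypothetical stand-in for SKOptTuner (not the real evalml class)."""

    def __init__(self, hyperparameter_ranges):
        self.hyperparameter_ranges = hyperparameter_ranges

    def propose(self):
        if not self.hyperparameter_ranges:
            raise ValueError("Cannot propose parameters for an empty search space")
        # Deterministic choice (each range's lower bound) keeps the sketch simple.
        return {name: low for name, (low, high) in self.hyperparameter_ranges.items()}


def toy_search(hyperparameter_ranges, n_iterations=3):
    """Hypothetical search loop: ask the tuner for parameters each iteration."""
    tuner = ToyTuner(hyperparameter_ranges)
    return [tuner.propose() for _ in range(n_iterations)]


# Option taken in this PR: give the dummy estimator some ranges.
print(len(toy_search({"a": (0, 10), "b": (0.0, 10.0)})))  # 3

# Alternative: keep the ranges empty and mock propose in each affected test.
with patch.object(ToyTuner, "propose", return_value={}) as mock_propose:
    results = toy_search({})
print(results, mock_propose.call_count)  # [{}, {}, {}] 3
```

With the mocking approach, every affected test needs its own patch, whereas adding ranges to the shared fixtures is a single change in conftest.py.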

freddyaboulton force-pushed the update-sklearn-skopt-dependencies branch 2 times, most recently from e0eba46 to db9ad97 Sep 22, 2020
freddyaboulton force-pushed the update-sklearn-skopt-dependencies branch from db9ad97 to 478a23b Sep 22, 2020
freddyaboulton merged commit 0bc791f into main Sep 22, 2020
angela97lin mentioned this pull request Sep 29, 2020
freddyaboulton deleted the update-sklearn-skopt-dependencies branch Oct 22, 2020
Development

Successfully merging this pull request may close these issues.

Update our scikit-learn version requirements
3 participants