Add support for scikit-learn>=1.0 #3051

angela97lin · 2021-11-15T21:16:23Z

Opening up a branch, enabling scikit-learn v1.0, and seeing what breaks :)

codecov · 2021-11-15T21:20:42Z

Codecov Report

Merging #3051 (a8cea5f) into main (24779e0) will increase coverage by 0.1%.
The diff coverage is 100.0%.

```diff
@@           Coverage Diff           @@
##            main   #3051     +/-   ##
=======================================
+ Coverage   99.7%   99.8%   +0.1%
=======================================
  Files        312     312
  Lines      30344   30340      -4
=======================================
- Hits       30252   30249      -3
+ Misses        92      91      -1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| evalml/model_understanding/graphs.py | `100.0% <ø> (ø)` | |
| ...lml/tests/objective_tests/test_standard_metrics.py | `100.0% <ø> (ø)` | |
| ...lml/tests/model_understanding_tests/test_graphs.py | `100.0% <100.0%> (ø)` | |
| ...derstanding/prediction_explanations/_algorithms.py | `100.0% <0.0%> (+0.8%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 24779e0...a8cea5f.
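As a sanity check on the Coverage Diff above, line coverage is hits divided by lines. The sketch below recomputes both ratios. The rounding mode is an assumption, not something the report states: rounding *up* to one decimal place reproduces the displayed 99.7% and 99.8%, whereas ordinary rounding would show 99.7% for both (the raw values are about 99.697% and 99.700%). `coverage_pct` and `round_up_1dp` are illustrative helpers, not part of Codecov.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Return line coverage as a percentage (hits / lines * 100)."""
    return hits / lines * 100


def round_up_1dp(pct: float) -> float:
    """Round a percentage up to one decimal place (assumed Codecov rounding mode)."""
    return math.ceil(pct * 10) / 10


# Figures copied from the Coverage Diff: main vs. #3051.
before = coverage_pct(hits=30252, lines=30344)  # 92 misses
after = coverage_pct(hits=30249, lines=30340)   # 91 misses

print(f"{before:.3f} -> {after:.3f}")
print(round_up_1dp(before), round_up_1dp(after))
```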

angela97lin · 2021-11-16T19:43:46Z

evalml/tests/objective_tests/test_standard_metrics.py

```diff
@@ -662,9 +662,8 @@ def test_mse_linear_model():
 def test_mcc_catches_warnings():
     y_true = [1, 0, 1, 1]
     y_predicted = [0, 0, 0, 0]
-    with pytest.warns(RuntimeWarning) as record:
```


Between 0.24.0 and 1.0.0, the implementation changed to simply return a `matthews_corrcoef` score of 0.0 when the denominator is 0 (previously this caused an invalid divide by zero, which emitted the `RuntimeWarning` this test expected).
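To illustrate the behavior change, here is a minimal pure-Python sketch of the binary Matthews correlation coefficient with the zero-denominator guard described above. It is not scikit-learn's implementation (which works from a confusion matrix and also handles multiclass input), and the name `binary_mcc` is hypothetical. With the test's inputs every prediction is 0, so one factor of the denominator is zero and the guard returns 0.0 without attempting the division.

```python
import math
from typing import Sequence


def binary_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary MCC that returns 0.0 when the denominator is zero (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denominator = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denominator == 0:
        # Per the PR discussion, older scikit-learn divided anyway here and hit an
        # invalid divide (RuntimeWarning); this guard skips the division entirely.
        return 0.0
    return (tp * tn - fp * fn) / math.sqrt(denominator)


# Inputs from test_mcc_catches_warnings: the model never predicts the positive class.
print(binary_mcc([1, 0, 1, 1], [0, 0, 0, 0]))
```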

angela97lin · 2021-11-16T20:03:53Z

core-requirements.txt

```diff
@@ -2,7 +2,7 @@ numpy>=1.20.0
 numba==0.53
 pandas>=1.3.0
 scipy>=1.5.0
-scikit-learn>=0.24.0,<1.0
+scikit-learn>=0.24.0
```
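The practical effect of this change is that 1.x releases now satisfy the requirement. The simplified sketch below compares the old and new specifiers; it assumes plain numeric `X.Y.Z` versions only (real tooling such as `packaging.specifiers.SpecifierSet` implements the full PEP 440 rules, including pre-releases). `parse_version` and `satisfies` are hypothetical helpers.

```python
import operator
from typing import Tuple

# Comparison operators supported by this simplified checker.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain numeric version like '1.0' into (1, 0, 0)."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, specifier: str) -> bool:
    """Return True if version meets every comma-separated clause in specifier."""
    v = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators before their one-character prefixes.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                if not _OPS[symbol](v, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


old_spec = ">=0.24.0,<1.0"  # core-requirements.txt before this PR
new_spec = ">=0.24.0"       # core-requirements.txt after this PR
for version in ["0.24.2", "1.0.1"]:
    print(version, satisfies(version, old_spec), satisfies(version, new_spec))
```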


Note:
scikit-learn>=1.0 requires scikit-optimize>=0.9.0 because modules/objects were moved around in scikit-learn, which causes the following error with older scikit-optimize releases:

```
evalml/pipelines/components/estimators/classifiers/logistic_regression_classifier.py:4: in <module>
    from skopt.space import Real
../evalml_venv/lib/python3.7/site-packages/skopt/__init__.py:55: in <module>
    from .searchcv import BayesSearchCV
../evalml_venv/lib/python3.7/site-packages/skopt/searchcv.py:16: in <module>
    from sklearn.utils.fixes import MaskedArray
E   ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' (/Users/angela.lin/Desktop/evalml_venv/lib/python3.7/site-packages/sklearn/utils/fixes.py)
```

Our tests, which automatically install the latest version of scikit-optimize (0.9.0), will therefore not break, so maybe we don't need to require scikit-optimize>=0.9.0, but I'm open to discussion if it feels like we should.

How did you run into this?

My local environment uses scikit-optimize==0.8.1 so as soon as I tried using scikit-learn 1.0, things broke 🥲

I see. I'm wondering whether this will affect users in some way, or whether we need to update the requirements file in light of this. Did you just install scikit-learn, or did you run `pip install -r requirements.txt`?

I just installed scikit-learn! Our tests didn't break because they run `pip install -r requirements.txt`, and in a clean environment that installs scikit-optimize 0.9.0.

This is a known issue in the scikit-optimize repo where it seems like the community solution is to either use scikit-learn<=0.24.2 or use scikit-optimize>=0.9.0: scikit-optimize/scikit-optimize#569
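The workaround from that issue reduces to a simple rule: the combination works if scikit-learn is at most 0.24.2 or scikit-optimize is at least 0.9.0. Below is a sketch of that rule as a check, assuming plain numeric version strings; `skopt_compatible` is a hypothetical helper, not part of evalml or either library. In a real environment (Python 3.8+), the installed versions could be read with `importlib.metadata.version("scikit-learn")` and `importlib.metadata.version("scikit-optimize")`.

```python
from typing import Tuple


def _version_tuple(text: str) -> Tuple[int, ...]:
    """Parse a plain numeric version like '0.24.2' into (0, 24, 2)."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def skopt_compatible(sklearn_version: str, skopt_version: str) -> bool:
    """Rule from scikit-optimize#569: scikit-learn<=0.24.2 or scikit-optimize>=0.9.0.

    scikit-optimize 0.8.1 imports MaskedArray from sklearn.utils.fixes
    (see the traceback above), which scikit-learn 1.0 no longer provides.
    """
    return (
        _version_tuple(sklearn_version) <= (0, 24, 2)
        or _version_tuple(skopt_version) >= (0, 9, 0)
    )


# The local setup described above: scikit-learn 1.0 with scikit-optimize 0.8.1.
print(skopt_compatible("1.0", "0.8.1"))
# A clean install from requirements.txt picks up scikit-optimize 0.9.0.
print(skopt_compatible("1.0", "0.9.0"))
```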

Cool, makes sense to me!

Would it be simpler to make the core requirements use scikit-learn >= 1.0.0 versus having support for both but test coverage for the later version?

Hmm, I can be persuaded otherwise, but I think this aligns with what we've been doing so far: every time one of our dependencies releases a new version, we pin it until we can fix the tests to support that latest version, and that's the version we run on!

angela97lin · 2021-11-16T20:05:03Z

evalml/tests/model_understanding_tests/test_graphs.py

```diff
@@ -1403,7 +1403,7 @@ def test_t_sne_errors_marker_size(marker_size, has_minimal_dependencies):

 @pytest.mark.parametrize("data_type", ["np", "pd", "ww"])
 @pytest.mark.parametrize("perplexity", [0, 4.6, 100])
-@pytest.mark.parametrize("learning_rate", [100.0, -15, 0])
```


Negative and zero values for `learning_rate` are no longer accepted as of scikit-learn 1.0: https://github.com/scikit-learn/scikit-learn/blob/0d378913be6d7e485b792ea36e9268be31ed52d0/sklearn/manifold/_t_sne.py#L816
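A hedged sketch of the kind of check now applied to `learning_rate`: a hypothetical validator that rejects the values dropped from the test parametrization (`-15` and `0`) and accepts `100.0`. scikit-learn's actual check lives in the `_t_sne.py` file linked above; this sketch handles numeric values only and ignores any string options `TSNE` may also accept.

```python
def validate_learning_rate(learning_rate: float) -> float:
    """Hypothetical validator: accept only strictly positive learning rates."""
    # `not (x > 0)` rejects zero, negative values, and NaN in one comparison.
    if not (learning_rate > 0):
        raise ValueError(f"learning_rate must be positive, got {learning_rate!r}")
    return float(learning_rate)


# The values from the old parametrize list: only 100.0 passes the check.
for lr in [100.0, -15, 0]:
    try:
        print(lr, "accepted:", validate_learning_rate(lr))
    except ValueError as exc:
        print(lr, "rejected:", exc)
```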

freddyaboulton

@angela97lin This looks good to me!

bchen1116

LGTM! Is there a reason we chose to support >=0.24 versus >=1.0.0?

bchen1116 · 2021-11-17T15:35:19Z

core-requirements.txt



Would it be simpler to make the core requirements use scikit-learn >= 1.0.0 versus having support for both but test coverage for the later version?

evalml/tests/objective_tests/test_standard_metrics.py

init

05924f1

angela97lin self-assigned this Nov 15, 2021

clean up tests

fae0d09

angela97lin commented Nov 16, 2021


angela97lin added 2 commits November 16, 2021 15:07

clean up other tests

b6a26fa

fix conda recipe

8f24522

angela97lin changed the title ~~[SPIKE] Support scikit-learn 1.0~~ Add support for scikit-learn>=1.0 Nov 16, 2021

angela97lin marked this pull request as ready for review November 16, 2021 22:00

angela97lin requested review from freddyaboulton, bchen1116, chukarsten, dsherry, eccabay and jeremyliweishih November 16, 2021 22:01

freddyaboulton approved these changes Nov 16, 2021


Merge branch 'main' into 2843_scikit_learn_1.0

f6fe2f7

bchen1116 approved these changes Nov 17, 2021


angela97lin added 3 commits November 17, 2021 13:57

clean up comment and import

e04fc37

Merge branch 'main' into 2843_scikit_learn_1.0

5070894

Merge branch 'main' into 2843_scikit_learn_1.0

a8cea5f

angela97lin merged commit d6682a4 into main Nov 17, 2021

angela97lin deleted the 2843_scikit_learn_1.0 branch November 17, 2021 19:52

angela97lin mentioned this pull request Nov 18, 2021

Support scikit-learn 1.0.0 #2843

Closed

chukarsten mentioned this pull request Nov 29, 2021

Release v.0.38.0 #3102

Merged

