Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for scikit-learn>=1.0 #3051

Merged
merged 8 commits into from
Nov 17, 2021
Merged

Add support for scikit-learn>=1.0 #3051

merged 8 commits into from
Nov 17, 2021

Conversation

angela97lin
Copy link
Contributor

Opening up a branch, enabling scikit-learn v1.0, and seeing what breaks :)

@angela97lin angela97lin self-assigned this Nov 15, 2021
@codecov
Copy link

codecov bot commented Nov 15, 2021

Codecov Report

Merging #3051 (a8cea5f) into main (24779e0) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3051     +/-   ##
=======================================
+ Coverage   99.7%   99.8%   +0.1%     
=======================================
  Files        312     312             
  Lines      30344   30340      -4     
=======================================
- Hits       30252   30249      -3     
+ Misses        92      91      -1     
Impacted Files Coverage Δ
evalml/model_understanding/graphs.py 100.0% <ø> (ø)
...lml/tests/objective_tests/test_standard_metrics.py 100.0% <ø> (ø)
...lml/tests/model_understanding_tests/test_graphs.py 100.0% <100.0%> (ø)
...derstanding/prediction_explanations/_algorithms.py 100.0% <0.0%> (+0.8%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24779e0...a8cea5f. Read the comment docs.

@@ -662,9 +662,8 @@ def test_mse_linear_model():
def test_mcc_catches_warnings():
y_true = [1, 0, 1, 1]
y_predicted = [0, 0, 0, 0]
with pytest.warns(RuntimeWarning) as record:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between 0.24.0 and 1.0.0+, the implementation was changed to just return a matthews_corrcoef score of 0.0 if the denominator was 0 (previously caused invalid divide by 0)

@@ -2,7 +2,7 @@ numpy>=1.20.0
numba==0.53
pandas>=1.3.0
scipy>=1.5.0
scikit-learn>=0.24.0,<1.0
scikit-learn>=0.24.0
Copy link
Contributor Author

@angela97lin angela97lin Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note:
scikit-learn >1.0 will require scikit-optimize>=0.9.0 because modules / objects get shuffled around, causing the following:

evalml/pipelines/components/estimators/classifiers/logistic_regression_classifier.py:4: in <module>
    from skopt.space import Real
../evalml_venv/lib/python3.7/site-packages/skopt/__init__.py:55: in <module>
    from .searchcv import BayesSearchCV
../evalml_venv/lib/python3.7/site-packages/skopt/searchcv.py:16: in <module>
    from sklearn.utils.fixes import MaskedArray
E   ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' (/Users/angela.lin/Desktop/evalml_venv/lib/python3.7/site-packages/sklearn/utils/fixes.py)```

Our tests, which automatically install the latest version of scikit-optimize (0.9.0) will thus not break, so maybe we don't need to set our requirement of scikit-optimize>=0.9.0 but open to discussion if it feels like we should.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you run into this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My local environment uses scikit-optimize==0.8.1 so as soon as I tried using scikit-learn 1.0, things broke 🥲

Copy link
Contributor

@freddyaboulton freddyaboulton Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'm wondering if this will affect users in some way/if we need to update the requirements file in light of this. Did you just install scikit-learn or did you do pip install -r requirements.txt ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just installed scikit-learn! Our tests didn't break because they call pip install -r requirements.txt and in a clean environment, 0.9.0 will be installed.

This is a known issue in the scikit-optimize repo where it seems like the community solution is to either use scikit-learn<=0.24.2 or use scikit-optimize>=0.9.0: scikit-optimize/scikit-optimize#569

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, makes sense to me!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler to make the core requirements use scikit-learn >= 1.0.0 versus having support for both but test coverage for the later version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I can be persuaded otherwise but I think this aligns with what we've currently been doing aka every time our dependencies come out with a new version, we pin until we can fix the tests to support the latest version--and thats the version that we run on!

@@ -1403,7 +1403,7 @@ def test_t_sne_errors_marker_size(marker_size, has_minimal_dependencies):

@pytest.mark.parametrize("data_type", ["np", "pd", "ww"])
@pytest.mark.parametrize("perplexity", [0, 4.6, 100])
@pytest.mark.parametrize("learning_rate", [100.0, -15, 0])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angela97lin angela97lin changed the title [SPIKE] Support scikit-learn 1.0 Add support for scikit-learn>=1.0 Nov 16, 2021
@angela97lin angela97lin marked this pull request as ready for review November 16, 2021 22:00
Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angela97lin This looks good to me!

Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Is there a reason we chose to support >=0.24 versus >=1.0.0?

@@ -2,7 +2,7 @@ numpy>=1.20.0
numba==0.53
pandas>=1.3.0
scipy>=1.5.0
scikit-learn>=0.24.0,<1.0
scikit-learn>=0.24.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler to make the core requirements use scikit-learn >= 1.0.0 versus having support for both but test coverage for the later version?

evalml/tests/objective_tests/test_standard_metrics.py Outdated Show resolved Hide resolved
@angela97lin angela97lin merged commit d6682a4 into main Nov 17, 2021
@angela97lin angela97lin deleted the 2843_scikit_learn_1.0 branch November 17, 2021 19:52
@chukarsten chukarsten mentioned this pull request Nov 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants