Upgrade sklearn to 1.2 and switch tuner #3983

eccabay · 2023-02-08T16:17:52Z

Resolves #3984

Second attempt at #3909. This time our SKOptTuner is updated to work around the skopt issue, since the skopt repo hasn't been updated since October 2021.

Perf tests for the tuner estimator showed no significant changes apart from some small speedups.

codecov · 2023-02-08T16:31:15Z

Codecov Report

Merging #3983 (c4ff45c) into main (5f58354) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3983     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        347     347             
  Lines      36896   36906     +10     
=======================================
+ Hits       36775   36785     +10     
  Misses       121     121

Impacted Files	Coverage Δ
evalml/objectives/standard_metrics.py	`100.0% <ø> (ø)`
...s/estimators/regressors/decision_tree_regressor.py	`100.0% <ø> (ø)`
...ents/estimators/regressors/elasticnet_regressor.py	`100.0% <ø> (ø)`
evalml/pipelines/components/utils.py	`96.6% <ø> (ø)`
evalml/pipelines/regression_pipeline.py	`100.0% <ø> (ø)`
...valml/pipelines/time_series_regression_pipeline.py	`100.0% <ø> (ø)`
evalml/tests/component_tests/test_components.py	`99.0% <ø> (ø)`
evalml/tests/component_tests/test_en_regressor.py	`100.0% <ø> (ø)`
...valml/tests/pipeline_tests/test_component_graph.py	`99.9% <ø> (ø)`
evalml/tests/pipeline_tests/test_pipelines.py	`99.9% <ø> (ø)`
... and 7 more


chukarsten · 2023-02-08T22:20:58Z

evalml/objectives/standard_metrics.py

@@ -577,7 +577,7 @@ class LogLossBinary(BinaryClassificationObjective):
    Example:
        >>> y_true = pd.Series([0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1])
        >>> y_pred = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
-        >>> np.testing.assert_almost_equal(LogLossBinary().objective_function(y_true, y_pred), 18.8393325)
+        >>> np.testing.assert_almost_equal(LogLossBinary().objective_function(y_true, y_pred), 19.6601745)


Should this really change that much?

Agreed that this is weird, but I tested both versions and the number does in fact change. The release notes for version 1.2 mention a change to the epsilon parameter for log loss, which might be the cause.
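For reference, the two doctest values are consistent with a change in the clipping epsilon alone. The sketch below recomputes binary log loss by hand using only the standard library. It assumes predictions are clipped to [eps, 1 - eps] and that the old and new epsilons are 1e-15 and float64 machine epsilon respectively; both values are assumptions made for illustration, not taken from this PR, and `binary_log_loss` is a hypothetical helper rather than the scikit-learn or EvalML implementation.

```python
import math
import sys


def binary_log_loss(y_true, y_pred, eps):
    """Mean binary cross-entropy with predictions clipped to [eps, 1 - eps]."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Clip so that log() never sees exactly 0 or 1.
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Same inputs as the LogLossBinary docstring example.
y_true = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# Assumed old epsilon (1e-15) versus assumed new epsilon (float64 machine epsilon).
old_value = binary_log_loss(y_true, y_pred, eps=1e-15)
new_value = binary_log_loss(y_true, y_pred, eps=sys.float_info.epsilon)
print(old_value, new_value)
```

Six of the eleven predictions are confidently wrong, so each contributes -log(eps) to the sum. Under these assumptions the results are approximately 18.8393 and 19.6602, which lines up with the old and new doctest values.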

chukarsten · 2023-02-08T22:28:00Z

evalml/pipelines/time_series_regression_pipeline.py

-        ...                                                       parameters={"Linear Regressor": {"normalize": True},
+        ...                                                       parameters={"Simple Imputer": {"impute_strategy": "mean"},


Why did we have to do this?

Scikit-learn removed the normalize parameter from its linear models. It had been deprecated for a while before that; the older docs include a note about it.

chukarsten · 2023-02-08T22:33:14Z

evalml/tests/component_tests/test_oversampler.py

-    assert snc.categorical_features == [0, 1, 2]
+    assert snc.categorical_features == [20, 21, 2]


Why would this be the case?

It's due to how columns are sorted as strings vs. numbers: since the column names get converted to strings before the new columns are added, the new ones end up at the end instead of the beginning. I could remove the need for this change by converting the column names to strings after the new columns are added.
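As a general illustration of the ordering difference mentioned above (not the oversampler's actual code path), the sketch below sorts the same set of column labels once as integers and once as strings. The labels are made up for the example.

```python
# Hypothetical column labels, first as integers, then converted to strings.
int_labels = [0, 1, 2, 10, 20, 21]
str_labels = [str(label) for label in int_labels]

numeric_order = sorted(int_labels)        # compares by numeric value
lexicographic_order = sorted(str_labels)  # compares character by character

print(numeric_order)
print(lexicographic_order)
```

Because string comparison is character by character, "10" sorts before "2", so the position of a column can shift depending on whether the labels are converted before or after sorting.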

chukarsten · 2023-02-08T22:34:49Z

evalml/tests/pipeline_tests/test_component_graph.py

-        "IS_FREE_EMAIL_DOMAIN(email)_True": Boolean(),
+        "IS_FREE_EMAIL_DOMAIN(email)_1.0": Boolean(),


Uh oh, I don't know if this is desired, right? @Cmancuso

Yeah, I'm concerned about this as well, and I don't quite understand why it's happening. I spent some time digging into it and it seems to be stemming exclusively from scikit-learn. We correctly pass booleans into the encoder, but this came out as a float with no change on our end.
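To make the symptom concrete: if the category the encoder learns ends up as a float rather than a boolean, any column name built from that category changes with it. The sketch below uses a hypothetical `encoded_column_name` helper that joins the feature name and the category with an underscore. That naming pattern is an assumption based on the diff above, not the EvalML or scikit-learn implementation.

```python
def encoded_column_name(feature, category):
    """Hypothetical helper: build an encoded column name from a category value."""
    return f"{feature}_{category}"


feature = "IS_FREE_EMAIL_DOMAIN(email)"

# The category kept as a boolean versus the same category cast to float.
name_from_bool = encoded_column_name(feature, True)
name_from_float = encoded_column_name(feature, float(True))

print(name_from_bool)
print(name_from_float)
```

The underlying value is the same in both cases; only its dtype, and therefore its string form, differs.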

tamargrey · 2023-02-09T13:53:20Z

Just noting the relation to the nullable types epic for posterity: scikit-learn 1.2.0 and 1.2.1 introduce new support for nullable types, which will allow us to use the estimators mentioned in #3910 with nullable types.

eccabay added 4 commits February 8, 2023 10:29

Change optimizer in SKOptTuner

dd83010

Update sklearn min to 1.2.0 and sktime to 0.15.0

bd68d1d

Port over all previous upgrade changes

5569104

Merge branch 'main' into sklearn-with-skopt

2b7b858

Small test fixes and release notes

1392df1

eccabay mentioned this pull request Feb 8, 2023

Upgrade sklearn to 1.2 #3909

Closed

I hope that's all the tests now

d55fbb3

eccabay marked this pull request as ready for review February 8, 2023 19:12

auto-assign bot assigned eccabay Feb 8, 2023

eccabay requested review from christopherbunn, jeremyliweishih, chukarsten and tamargrey February 8, 2023 19:12

Bump minimum sklearn version to 1.2.1

05fc87f

chukarsten reviewed Feb 8, 2023


Merge branch 'main' into sklearn-with-skopt

c4ff45c

chukarsten approved these changes Feb 13, 2023


jeremyliweishih approved these changes Feb 13, 2023


eccabay merged commit 78bd72f into main Feb 13, 2023

eccabay deleted the sklearn-with-skopt branch February 13, 2023 15:39

christopherbunn mentioned this pull request Feb 15, 2023

Release v0.68.0 #4002

Merged
