Allow sklearn >=0.20 #47

elsander · 2019-10-04T18:17:26Z

Several tests relied on attributes and checks which were not compatible with sklearn 0.20 and above. This PR includes the following changes:

Removes the check for the .grid_scores_ attribute in hyperband
Uses updated estimator checks for DataFrameETL
Sets a stricter sklearn requirement in dev-requirements only, since the changes only affect the test suite
Replaces assert_true and assert_false checks with plain assert, squashing some deprecation warnings
Adds explicit default arguments to the LogisticRegression and RandomForest estimators, squashing some FutureWarnings about changing default args

Closes #45.

elsander · 2019-10-04T18:20:06Z

@mheilman Feel free to reassign the review to someone else; wasn't sure who to send it to.

mheilman

Just a few clarification questions

mheilman · 2019-10-04T20:49:17Z

requirements.txt

@@ -1,4 +1,4 @@
-scikit-learn>=0.18.1,<0.20
+scikit-learn>=0.18.1


Should we perhaps do >=0.18.1,<0.22 instead? I'm not 100% sure, but I feel like sklearn essentially treats the minor versions as major versions.

Also, did you confirm that 0.18.1 still works? Travis will just test with the most recent version of sklearn, right?

Tests pass on sklearn 0.19.2, except for the new estimator check, which does not exist in 0.19.2. 0.18.1 also fails the estimator check, but with a different error. In general, the estimator checks change frequently from version to version, so I don't expect this unit test to be very stable across versions. This PR doesn't include code changes outside of the test suite, so it shouldn't affect which versions work (although future changes could). If you want to play it safe and set 0.20 or 0.19 as the new lower bound, I can do that.
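To make the proposed bound concrete, here is a minimal sketch of what `>=0.18.1,<0.22` admits. The helpers `parse_version` and `in_range` are hypothetical names written for this illustration, and they only handle plain dotted numeric versions; a real project would more likely rely on pip itself or on the `packaging` library for this.

```python
def parse_version(text):
    """Turn a plain dotted version such as '0.19.2' into (0, 19, 2)."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower="0.18.1", upper="0.22"):
    """True if lower <= version < upper, mirroring '>=0.18.1,<0.22'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Versions mentioned in this thread, plus the first excluded release.
for candidate in ["0.18.1", "0.19.2", "0.21.3", "0.22"]:
    print(candidate, in_range(candidate))
```

With an upper bound like this, a new minor release is excluded until someone has run the test suite against it and raised the bound.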

mheilman · 2019-10-04T20:53:48Z

civismlext/test/test_stacking.py

-    sc = StackedClassifier([('rf', RandomForestClassifier()),
-                            ('lr', LogisticRegression()),
-                            ('metalr', LogisticRegression())],
+    sc = StackedClassifier([('rf', RandomForestClassifier(n_estimators=10)),


Setting these kwargs explicitly is to avoid warnings and to keep the old default values for the tests, right? (I'm looking at https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
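As a rough illustration of the mechanism being asked about, the sketch below uses a hypothetical `ToyForest` class (not sklearn's implementation) that emits a FutureWarning only when `n_estimators` is omitted. The sentinel `_UNSET` and the helper `count_future_warnings` are also assumptions made for this example.

```python
import warnings

_UNSET = object()  # sentinel: lets __init__ tell "omitted" from "passed 10"


class ToyForest:
    """Hypothetical stand-in for an estimator whose default is about to change."""

    def __init__(self, n_estimators=_UNSET):
        if n_estimators is _UNSET:
            warnings.warn(
                "default n_estimators will change; pass it explicitly",
                FutureWarning,
            )
            n_estimators = 10  # old default, kept until the switch
        self.n_estimators = n_estimators


def count_future_warnings(factory):
    """Call factory() and return how many FutureWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        factory()
    return sum(issubclass(w.category, FutureWarning) for w in caught)


print(count_future_warnings(ToyForest))                           # 1
print(count_future_warnings(lambda: ToyForest(n_estimators=10)))  # 0
```

Passing the old value explicitly does both things at once: the warning disappears, and the test keeps exercising the same configuration after the library's default changes.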

mheilman · 2019-10-04T20:55:07Z

civismlext/test/test_preprocessing.py

@@ -135,7 +133,7 @@ def levels_dict_numeric():
 def test_sklearn_api():
    name = DataFrameETL.__name__
    check_parameters_default_constructible(name, DataFrameETL)
-    check_no_fit_attributes_set_in_init(name, DataFrameETL)
+    # check_no_attributes_set_in_init(name, DataFrameETL)


Why is this commented out?

It was a mistake; I meant to remove this entirely. This specific check does not exist in sklearn>=0.20. It's not really necessary to keep the check anyway, since it only verifies that __init__ follows the convention of not setting attributes that end in _. This isn't actually a requirement as far as I know, which is probably why it was deprecated. See the source for sklearn 0.19.X: https://github.com/scikit-learn/scikit-learn/blob/f0ab589f1541b1ca4570177d93fd7979613497e3/sklearn/utils/estimator_checks.py#L1481
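For readers unfamiliar with the convention, a simplified sketch of this kind of check follows. `trailing_underscore_attrs` and both example classes are hypothetical and written for illustration; this is not the sklearn implementation linked above.

```python
def trailing_underscore_attrs(estimator_cls):
    """Return attributes ending in '_' that exist right after construction."""
    instance = estimator_cls()
    return sorted(
        name for name in vars(instance)
        if name.endswith("_") and not name.startswith("__")
    )


class FollowsConvention:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # only stores the hyperparameter

    def fit(self, X):
        self.mean_ = sum(X) / len(X)  # learned state is set in fit
        return self


class BreaksConvention:
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.mean_ = None  # fitted-style attribute created too early


print(trailing_underscore_attrs(FollowsConvention))  # []
print(trailing_underscore_attrs(BreaksConvention))   # ['mean_']
```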

mheilman · 2019-10-04T21:04:33Z

civismlext/test/test_hyperband.py

-
-
-def check_cv_results_grid_scores_consistency(search):
-    # TODO Remove for sklearn 0.20


This is removed because grid_scores_ was replaced by cv_results_, right?
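For context on how the two attributes relate, here is a small sketch that rebuilds grid_scores_-style rows from a cv_results_-style dict. The helper `to_grid_scores` is hypothetical, and the dict holds hand-written toy values rather than output from a real search; the key names (`params`, `mean_test_score`, `split0_test_score`, ...) follow the cv_results_ layout.

```python
from collections import namedtuple

# Field names follow the entries of the old grid_scores_ list.
GridScore = namedtuple(
    "GridScore", ["parameters", "mean_validation_score", "cv_validation_scores"]
)


def to_grid_scores(cv_results, n_splits):
    """Rebuild grid_scores_-style rows from a cv_results_-style dict."""
    rows = []
    for i, params in enumerate(cv_results["params"]):
        per_split = [cv_results[f"split{k}_test_score"][i] for k in range(n_splits)]
        rows.append(GridScore(params, cv_results["mean_test_score"][i], per_split))
    return rows


# Hand-written toy values, not output from a real search.
toy_results = {
    "params": [{"C": 0.1}, {"C": 1.0}],
    "mean_test_score": [0.5, 0.75],
    "split0_test_score": [0.25, 0.5],
    "split1_test_score": [0.75, 1.0],
}

for row in to_grid_scores(toy_results, n_splits=2):
    print(row.parameters, row.mean_validation_score, row.cv_validation_scores)
```

Since cv_results_ carries the same information in column form, a test that only compared the two attributes has nothing left to check once grid_scores_ is gone.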

mheilman

LGTM

Liz Sander added 3 commits October 3, 2019 16:29

TST update estimator checks

8b76c86

DEP Update tests to allow sklearn>=0.20

e4a54d9

DOC update changelog

cfef404

elsander requested a review from mheilman October 4, 2019 18:19

elsander assigned mheilman Oct 4, 2019

elsander added the Maintenance label Oct 4, 2019

mheilman reviewed Oct 4, 2019

mheilman assigned elsander and unassigned mheilman Oct 4, 2019

Liz Sander added 2 commits October 4, 2019 16:24

STY remove unnecessary commented line

703bb11

DEP set upper sklearn bound in library requirements

a441794

elsander assigned mheilman and unassigned elsander Oct 4, 2019

mheilman approved these changes Oct 7, 2019

elsander merged commit 5539ed2 into civisanalytics:master Oct 7, 2019

elsander deleted the 45-sklearn-version branch October 7, 2019 14:14
