Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow sklearn >=0.20 #47

Merged
merged 5 commits into from Oct 7, 2019
Merged

Conversation

elsander
Copy link
Contributor

@elsander elsander commented Oct 4, 2019

Several tests relied on attributes and checks which were not compatible with sklearn 0.20 and above. This PR includes the following changes:

  • Removes check for .grid_scores_ attribute in hyperband
  • Uses updated estimator checks for DataFrameETL
  • Sets stricter sklearn requirement in dev-requirements only, since changes only affected the test suite
  • Updated assert_true and assert_false checks to use assert, squashing some deprecation warnings
  • Added default arguments to LogisticRegression and RandomForest estimators, squashing some FutureWarnings about changing default args

Closes #45.

@elsander elsander requested a review from mheilman October 4, 2019 18:19
@elsander
Copy link
Contributor Author

elsander commented Oct 4, 2019

@mheilman Feel free to reassign the review to someone else; wasn't sure who to send it to.

Copy link
Contributor

@mheilman mheilman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few clarification questions

requirements.txt Outdated
@@ -1,4 +1,4 @@
scikit-learn>=0.18.1,<0.20
scikit-learn>=0.18.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perhaps do >=0.18.1,<0.22 instead? I'm not 100% sure, but I feel like sklearn essentially treats the minor versions as major versions.

Also, did you confirm that 0.18.1 still works? Travis will just test with the most recent version of sklearn, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests pass on sklearn 0.19.2 (except for the new estimator check, which does not exist in 0.19.2) 0.18.1 also fails the estimator check, but with a different error. In general, the estimator checks change frequently from version to version, so I expect this unit test to not be very stable across versions. This PR didn't include code changes outside of the test suite, so this specific PR shouldn't affect which versions work (although future changes could). If you want to play it safe and set 0.20 or 0.19 as the new lower bound, I can do that.

sc = StackedClassifier([('rf', RandomForestClassifier()),
('lr', LogisticRegression()),
('metalr', LogisticRegression())],
sc = StackedClassifier([('rf', RandomForestClassifier(n_estimators=10)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting these kwargs explicitly is to avoid warnings and to keep the old default values for the tests, right? (I'm looking at https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

@@ -135,7 +133,7 @@ def levels_dict_numeric():
def test_sklearn_api():
name = DataFrameETL.__name__
check_parameters_default_constructible(name, DataFrameETL)
check_no_fit_attributes_set_in_init(name, DataFrameETL)
# check_no_attributes_set_in_init(name, DataFrameETL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a mistake; I meant to remove this entirely. This specific check does not exist in sklearn>=0.20. But it's not really necessary to keep the check, since it's only checking that init follows the convention of not setting attributes that end in _ during init. This isn't actually a requirement as far as I know, which is probably why it was deprecated. See the source for sklearn 0.19.X https://github.com/scikit-learn/scikit-learn/blob/f0ab589f1541b1ca4570177d93fd7979613497e3/sklearn/utils/estimator_checks.py#L1481



def check_cv_results_grid_scores_consistency(search):
# TODO Remove for sklearn 0.20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is removed because grid_scores_ was replaced by cv_results_, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

@mheilman mheilman assigned elsander and unassigned mheilman Oct 4, 2019
@elsander elsander assigned mheilman and unassigned elsander Oct 4, 2019
Copy link
Contributor

@mheilman mheilman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@elsander elsander merged commit 5539ed2 into civisanalytics:master Oct 7, 2019
@elsander elsander deleted the 45-sklearn-version branch October 7, 2019 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compatibility with newest scikit-learn
2 participants