New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow sklearn >=0.20 #47
Conversation
@mheilman Feel free to reassign the review to someone else; wasn't sure who to send it to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few clarification questions
requirements.txt
Outdated
@@ -1,4 +1,4 @@ | |||
scikit-learn>=0.18.1,<0.20 | |||
scikit-learn>=0.18.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we perhaps do >=0.18.1,<0.22
instead? I'm not 100% sure, but I feel like sklearn essentially treats the minor versions as major versions.
Also, did you confirm that 0.18.1 still works? Travis will just test with the most recent version of sklearn, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests pass on sklearn 0.19.2 (except for the new estimator check, which does not exist in 0.19.2) 0.18.1 also fails the estimator check, but with a different error. In general, the estimator checks change frequently from version to version, so I expect this unit test to not be very stable across versions. This PR didn't include code changes outside of the test suite, so this specific PR shouldn't affect which versions work (although future changes could). If you want to play it safe and set 0.20 or 0.19 as the new lower bound, I can do that.
sc = StackedClassifier([('rf', RandomForestClassifier()), | ||
('lr', LogisticRegression()), | ||
('metalr', LogisticRegression())], | ||
sc = StackedClassifier([('rf', RandomForestClassifier(n_estimators=10)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting these kwargs explicitly is to avoid warnings and to keep the old default values for the tests, right? (I'm looking at https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
@@ -135,7 +133,7 @@ def levels_dict_numeric(): | |||
def test_sklearn_api(): | |||
name = DataFrameETL.__name__ | |||
check_parameters_default_constructible(name, DataFrameETL) | |||
check_no_fit_attributes_set_in_init(name, DataFrameETL) | |||
# check_no_attributes_set_in_init(name, DataFrameETL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a mistake; I meant to remove this entirely. This specific check does not exist in sklearn>=0.20. But it's not really necessary to keep the check, since it's only checking that init follows the convention of not setting attributes that end in _
during init. This isn't actually a requirement as far as I know, which is probably why it was deprecated. See the source for sklearn 0.19.X https://github.com/scikit-learn/scikit-learn/blob/f0ab589f1541b1ca4570177d93fd7979613497e3/sklearn/utils/estimator_checks.py#L1481
|
||
|
||
def check_cv_results_grid_scores_consistency(search): | ||
# TODO Remove for sklearn 0.20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is removed because grid_scores_
was replaced by cv_results_
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Several tests relied on attributes and checks which were not compatible with sklearn 0.20 and above. This PR includes the following changes:
.grid_scores_
attribute in hyperbandDataFrameETL
assert_true
andassert_false
checks to useassert
, squashing some deprecation warningsLogisticRegression
andRandomForest
estimators, squashing someFutureWarnings
about changing default argsCloses #45.