Add StandardScaler to ElasticNet pipelines. #1065
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1065 +/- ##
=======================================
Coverage 99.91% 99.91%
=======================================
Files 188 188
Lines 10286 10296 +10
=======================================
+ Hits 10277 10287 +10
Misses 9 9
Left a comment about adding a test but thanks for catching and adding this! 😁
(Side note: I might be abusing the perf tests, but I'm still curious if/how this changes scores :d)
@@ -56,7 +55,7 @@ def _get_preprocessing_components(X, y, problem_type, estimator_class):
     if (add_datetime_featurizer or len(categorical_cols.columns) > 0) and estimator_class not in {CatBoostClassifier, CatBoostRegressor}:
         pp_components.append(OneHotEncoder)

-    if estimator_class in {LinearRegressor, LogisticRegressionClassifier}:
+    if estimator_class.model_family == ModelFamily.LINEAR_MODEL:
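The change above swaps an explicit class set for a model-family check, so any linear-model estimator (ElasticNet included) picks up the scaler without further edits. A minimal sketch of that pattern, with stand-in classes rather than evalml's actual definitions:

```python
from enum import Enum


class ModelFamily(Enum):
    LINEAR_MODEL = "linear_model"
    RANDOM_FOREST = "random_forest"


# Stand-ins for estimator classes, each tagged with a model family.
class LogisticRegressionClassifier:
    model_family = ModelFamily.LINEAR_MODEL


class ElasticNetClassifier:
    model_family = ModelFamily.LINEAR_MODEL


class RandomForestClassifier:
    model_family = ModelFamily.RANDOM_FOREST


def needs_standard_scaler(estimator_class):
    # Family-based check: covers ElasticNet automatically, unlike the old
    # explicit {LinearRegressor, LogisticRegressionClassifier} membership test.
    return estimator_class.model_family == ModelFamily.LINEAR_MODEL
```

The family check is the design win here: new linear estimators inherit the scaling behavior by declaring their `model_family`, instead of requiring an edit to a hardcoded set.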
Nice! Could we add a test (in test_make_pipeline) to make sure :d
Certainly!
Great! Thanks for thinking of this. As mentioned in slack, I think all non-tree-based models should apply scaling, for now.
@freddyaboulton it would be great to see perf test results on this, perhaps 5 trials on datasets_small_0.yaml with model_family limited to linear. But since this is a small change, it's not required; if it did introduce a regression we'd catch it before release.
Wonderful, LGTM!
LGTM! thanks for adding the test!
@@ -114,6 +116,11 @@ def test_make_pipeline():
     assert isinstance(binary_pipeline, type(BinaryClassificationPipeline))
     assert binary_pipeline.component_graph == [DropNullColumns, Imputer, DateTimeFeaturizer, OneHotEncoder, StandardScaler, LogisticRegressionClassifier]

+    en_binary_pipeline = make_pipeline(X, y, ElasticNetClassifier, ProblemTypes.BINARY)
Nice, thanks for this!
@angela97lin @dsherry @eccabay I was able to run the performance tests for this change. The results are here. In short, introducing this change increases the number of times EN is picked as the best pipeline without introducing any regressions (fit time or average best pipeline score), so I think we're ok to merge this.
Pull Request Description
Adds StandardScaler to all ElasticNet pipelines.
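For context on why this matters (a sketch using scikit-learn directly, not evalml's own component graph): ElasticNet's L1/L2 penalties act on raw coefficient magnitudes, so features on larger scales are effectively penalized differently from small-scale ones; standardizing first puts all features on equal footing.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Two equally informative features on wildly different scales.
X = np.column_stack([rng.rand(200), rng.rand(200) * 1000.0])
y = X[:, 0] + X[:, 1] / 1000.0

# Standardizing before ElasticNet lets the penalty treat both
# features equally, regardless of their original units.
model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.01))
model.fit(X, y)
preds = model.predict(X)
```

The same reasoning extends to the other linear models in this PR: any estimator whose loss includes a norm penalty on the coefficients is sensitive to feature scale, which is why the reviewers suggest scaling for all non-tree-based models.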
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.