
Add StandardScaler to ElasticNet pipelines. #1065

Merged
3 commits merged into main on Aug 17, 2020

Conversation

freddyaboulton (Contributor)

Pull Request Description

Adds StandardScaler to all ElasticNet pipelines.
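For context on why this matters for ElasticNet: the L1/L2 penalties shrink coefficients by the same amount regardless of each feature's scale, so unscaled features are penalized unevenly. A minimal, self-contained sketch of what standard scaling does to a feature column (illustrative only, not EvalML's StandardScaler implementation, which wraps scikit-learn's):

```python
# Illustrative sketch: standardize a feature column to zero mean and unit
# variance before it reaches a regularized linear model such as ElasticNet,
# so the L1/L2 penalties treat all features on the same scale.
from statistics import mean, pstdev

def standard_scale(column):
    """Return the column shifted to mean 0 and scaled to (population) std 1."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]

raw = [10.0, 20.0, 30.0, 40.0]
scaled = standard_scale(raw)
```

After scaling, a unit change in any feature moves the penalty term by a comparable amount, which is the behavior this PR gives every ElasticNet pipeline.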


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.


codecov bot commented Aug 14, 2020

Codecov Report

Merging #1065 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1065   +/-   ##
=======================================
  Coverage   99.91%   99.91%           
=======================================
  Files         188      188           
  Lines       10286    10296   +10     
=======================================
+ Hits        10277    10287   +10     
  Misses          9        9           
Impacted Files Coverage Δ
evalml/pipelines/utils.py 100.00% <100.00%> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3fe1630...0dabc81. Read the comment docs.

@freddyaboulton freddyaboulton changed the title All linear models have a standard scaler in the pipeline. Add StandardScaler to ElasticNet pipelines. Aug 14, 2020
@freddyaboulton freddyaboulton marked this pull request as ready for review August 14, 2020 18:51
angela97lin (Contributor) left a comment

Left a comment about adding a test but thanks for catching and adding this! 😁

(Side note: I'm curious / abusing the perf tests but still curious if/how this changes scores :d)

@@ -56,7 +55,7 @@ def _get_preprocessing_components(X, y, problem_type, estimator_class):
     if (add_datetime_featurizer or len(categorical_cols.columns) > 0) and estimator_class not in {CatBoostClassifier, CatBoostRegressor}:
         pp_components.append(OneHotEncoder)
 
-    if estimator_class in {LinearRegressor, LogisticRegressionClassifier}:
+    if estimator_class.model_family == ModelFamily.LINEAR_MODEL:
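The diff above swaps an explicit set of estimator classes for a check on a shared model_family attribute, so new linear estimators like ElasticNet pick up the scaler automatically. A minimal sketch of that pattern (class and enum names are illustrative stand-ins, not EvalML's real definitions):

```python
# Sketch of gating a preprocessing step on a shared model_family attribute
# instead of enumerating estimator classes one by one. Illustrative only.
from enum import Enum

class ModelFamily(Enum):
    LINEAR_MODEL = "linear_model"
    CATBOOST = "catboost"

class LogisticRegressionClassifier:
    model_family = ModelFamily.LINEAR_MODEL

class ElasticNetClassifier:  # picked up by the check below without editing it
    model_family = ModelFamily.LINEAR_MODEL

class CatBoostClassifier:
    model_family = ModelFamily.CATBOOST

def needs_scaler(estimator_class):
    """StandardScaler is added for every linear-model estimator."""
    return estimator_class.model_family == ModelFamily.LINEAR_MODEL
```

The design benefit is that membership in the family is declared once on the estimator class, so the pipeline-building code never needs to be touched when a new linear model is added.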
angela97lin (Contributor)
Nice! Could we add a test (testing test_make_pipeline) to make sure :d

freddyaboulton (Contributor Author)
Certainly!

dsherry (Contributor) left a comment

Great! Thanks for thinking of this. As mentioned in slack, I think all non-tree-based models should apply scaling, for now.

@freddyaboulton it would be great to see perf test results on this, perhaps 5 trials on datasets_small_0.yaml with model_family limited to linear. But since this is a small change, not required--if it did introduce a regression we'd catch it before release.

eccabay (Contributor) left a comment

Wonderful, LGTM!

angela97lin (Contributor) left a comment

LGTM! thanks for adding the test!

@@ -114,6 +116,11 @@ def test_make_pipeline():
     assert isinstance(binary_pipeline, type(BinaryClassificationPipeline))
     assert binary_pipeline.component_graph == [DropNullColumns, Imputer, DateTimeFeaturizer, OneHotEncoder, StandardScaler, LogisticRegressionClassifier]
 
+    en_binary_pipeline = make_pipeline(X, y, ElasticNetClassifier, ProblemTypes.BINARY)
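The new test case presumably asserts that the ElasticNet pipeline's component graph now contains StandardScaler, mirroring the logistic-regression assertion above it. A toy, self-contained illustration of that kind of check (make_component_graph is a hypothetical stand-in, not EvalML's real make_pipeline):

```python
# Toy stand-in for component selection in make_pipeline (illustrative only;
# the real function also inspects the data and problem type).
def make_component_graph(estimator_name, is_linear):
    graph = ["Imputer", "OneHotEncoder"]
    if is_linear:  # the behavior this PR adds for ElasticNet estimators
        graph.append("StandardScaler")
    graph.append(estimator_name)
    return graph

en_graph = make_component_graph("ElasticNetClassifier", is_linear=True)
rf_graph = make_component_graph("RandomForestClassifier", is_linear=False)
```

The test-style assertions then verify the scaler appears for the linear model and not for the tree-based one.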
Contributor

Nice, thanks for this!

freddyaboulton (Contributor Author)

@angela97lin @dsherry @eccabay I was able to run the performance tests for this change. The results are here. In short, introducing this change increases the number of times EN is picked as the best pipeline without introducing any regressions (fit time or average best pipeline score) so I think we're ok to merge this.

@freddyaboulton freddyaboulton merged commit 2ecf3bd into main Aug 17, 2020
@dsherry dsherry mentioned this pull request Aug 25, 2020
@freddyaboulton freddyaboulton deleted the add-standard-scaler-elastic-net-pipelines branch October 22, 2020 18:28
4 participants