
Enable ensembling as a parameter for DefaultAlgorithm #3435

Merged
jeremyliweishih merged 16 commits into main on Apr 4, 2022

Conversation

jeremyliweishih
Collaborator

No description provided.

@codecov

codecov bot commented Mar 31, 2022

Codecov Report

Merging #3435 (cf4a1d4) into main (df13ed9) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3435     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        334     334             
  Lines      32959   32982     +23     
=======================================
+ Hits       32829   32852     +23     
  Misses       130     130             
Impacted Files Coverage Δ
...valml/automl/automl_algorithm/default_algorithm.py 100.0% <100.0%> (ø)
evalml/automl/automl_search.py 99.7% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py 99.5% <100.0%> (+0.1%) ⬆️
...valml/tests/automl_tests/test_default_algorithm.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df13ed9...cf4a1d4. Read the comment docs.

* Fixes
* Fix ``DefaultAlgorithm`` not handling Email and URL features :pr:`3419`
Collaborator Author

Didn't move this out in #3419.

@@ -208,7 +208,7 @@ def search(
if data_check_result["level"] == DataCheckMessageType.ERROR.value:
return None, data_check_results

- automl = AutoMLSearch(automl_algorithm="default", **automl_config)
+ automl = AutoMLSearch(automl_algorithm="default", ensembling=True, **automl_config)
Collaborator Author

Keeping the same behavior for the top level search method.
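For context, a minimal sketch of the behavior this hunk preserves, assuming the `AutoMLSearch` parameters shown in the diff and a demo dataset purely for illustration:

```python
import evalml
from evalml.automl import AutoMLSearch

# Any small demo dataset works here; breast cancer is just a binary example.
X, y = evalml.demos.load_breast_cancer()
automl_config = {"X_train": X, "y_train": y, "problem_type": "binary"}

# The top-level search() keeps ensembling on by passing the flag explicitly,
# so its behavior is unchanged even though AutoMLSearch itself now defaults
# to ensembling=False.
automl = AutoMLSearch(automl_algorithm="default", ensembling=True, **automl_config)
automl.search()
```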

@@ -300,11 +302,11 @@ def test_pipeline_limits(
automl.search()
out = caplog.text
if verbose:
assert "Using default limit of max_batches=4." in out
assert "Searching up to 4 batches for a total of" in out
assert "Using default limit of max_batches=3." in out
Collaborator Author

Had to change because ensembling is turned off by default in AutoMLSearch.

Contributor

Let's remember to update our perf test code! I think we still want the ability to test the ensembler pipeline in batch 4 when we kick off a job?
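A hypothetical illustration (not the actual AutoMLSearch internals) of the default-limit change explained in the author's comment above: three search batches by default now, with a fourth batch only when ensembling adds its own batch.

```python
def default_max_batches(ensembling: bool) -> int:
    """Hypothetical helper mirroring the test expectation:
    three search batches, plus one extra batch for the stacked
    ensemble when ensembling is enabled."""
    return 4 if ensembling else 3

assert default_max_batches(ensembling=False) == 3  # new AutoMLSearch default
assert default_max_batches(ensembling=True) == 4   # previous default behavior
```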

Contributor
@chukarsten left a comment

LGTM! Thanks!

@chukarsten
Contributor

One thing I forgot to check - do we need to update the documentation, since turning ensembling on/off for the iterative algorithm is called out there?

@@ -605,7 +605,7 @@
"### Stacking\n",
"[Stacking](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking) is an ensemble machine learning algorithm that involves training a model to best combine the predictions of several base learning algorithms. First, each base learning algorithms is trained using the given data. Then, the combining algorithm or meta-learner is trained on the predictions made by those base learning algorithms to make a final prediction.\n",
"\n",
"AutoML enables stacking using the `ensembling` flag during initalization; this is set to `False` by default. The stacking ensemble pipeline runs in its own batch after a whole cycle of training has occurred (each allowed pipeline trains for one batch). Note that this means __a large number of iterations may need to run before the stacking ensemble runs__. It is also important to note that __only the first CV fold is calculated for stacking ensembles__ because the model internally uses CV folds."
"AutoML enables stacking using the `ensembling` flag during initalization; this is set to `False` by default. How ensembling runs is defined by the AutoML algorithm you choose. In the `IterativeAlgorithm`, the stacking ensemble pipeline runs in its own batch after a whole cycle of training has occurred (each allowed pipeline trains for one batch). Note that this means __a large number of iterations may need to run before the stacking ensemble runs__. It is also important to note that __only the first CV fold is calculated for stacking ensembles__ because the model internally uses CV folds. See below in the AutoML Algorithms section to see how ensembling is run for `DefaultAlgorithm`."
Collaborator Author

Made some doc changes here @chukarsten.

Contributor

gucci
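As a rough usage sketch of the `IterativeAlgorithm` behavior described in the updated doc paragraph above; the dataset and the `max_batches` value are illustrative only:

```python
import evalml
from evalml.automl import AutoMLSearch

X, y = evalml.demos.load_breast_cancer()

# With the iterative algorithm, the stacking ensemble runs in its own batch
# only after every allowed pipeline has trained for one batch, so max_batches
# must be large enough to reach that extra batch.
automl = AutoMLSearch(
    X_train=X,
    y_train=y,
    problem_type="binary",
    automl_algorithm="iterative",
    ensembling=True,
    max_batches=10,  # illustrative; must exceed one full cycle of allowed pipelines
)
automl.search()
```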

@jeremyliweishih jeremyliweishih merged commit f05332d into main Apr 4, 2022
Contributor
@freddyaboulton left a comment

Looks good @jeremyliweishih! Saw you already merged but was halfway through the review so figured I'd finish it.

Nothing blocking, and if we decide to address these comments we can do so in a follow-up.

@@ -735,7 +735,7 @@
" a. For each of the previous top 3 estimators, sample 10 parameters from the tuner. Run all 30 in one batch\n",
" b. Run ensembling\n",
" \n",
"To this end, it is recommended to use the top level `search()` method to run `DefaultAlgorithm`. This allows users to specify running search with just the `mode` parameter, where `fast` is recommended for users who want a fast scan at how EvalML pipelines will perform on their problem and where `long` is reserved for a deeper dive into high performing pipelines. One can also specify `automl_algorithm='default'` using `AutoMLSearch` and it will default to using `fast` mode. Users are welcome to select `max_batches` according to the algorithm above (or other stopping criteria) but should be aware that results may not be optimal if the algorithm does not run for the full length of `fast` mode."
"To this end, it is recommended to use the top level `search()` method to run `DefaultAlgorithm`. This allows users to specify running search with just the `mode` parameter, where `fast` is recommended for users who want a fast scan at how EvalML pipelines will perform on their problem and where `long` is reserved for a deeper dive into high performing pipelines. If one needs finer control over AutoML parameters, one can also specify `automl_algorithm='default'` using `AutoMLSearch` and it will default to using `fast` mode. However, in this case ensembling will be defined by the `ensembling` flag (if `ensembling=False` the abovementioned ensembling batches will be skipped). Users are welcome to select `max_batches` according to the algorithm above (or other stopping criteria) but should be aware that results may not be optimal if the algorithm does not run for the full length of `fast` mode."
Contributor

Let's say ensembling for time series is not enabled
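A minimal sketch of the two entry points contrasted in the doc paragraph above, assuming the `search()` signature referenced earlier in this PR and a demo dataset for illustration:

```python
import evalml
from evalml.automl import AutoMLSearch, search

X, y = evalml.demos.load_breast_cancer()

# Recommended path: the top-level search() drives DefaultAlgorithm and only
# asks the user for a mode ("fast" for a quick scan, "long" for a deeper run).
automl, data_check_results = search(
    X_train=X, y_train=y, problem_type="binary", mode="fast"
)

# Finer-grained path: AutoMLSearch with automl_algorithm="default". Here the
# ensembling batches only run if the ensembling flag is set; leaving it at
# the False default skips them, per this PR.
automl = AutoMLSearch(
    X_train=X,
    y_train=y,
    problem_type="binary",
    automl_algorithm="default",
    ensembling=True,
)
automl.search()
```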

@@ -90,6 +90,7 @@ def __init__(
n_jobs=-1,
text_in_ensembling=False,
top_n=3,
+ ensembling=True,
Contributor

Should we set this to False to match the behavior in AutoMLSearch?

What I don't like about this is that we change the user's parameter value silently for time series if it's not False. Perhaps it would be better to raise an exception if ensembling is set to True for time series?

Collaborator Author

will do!
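A hypothetical validation sketch of the reviewer's suggestion above; this is not the merged implementation, and it assumes the `is_time_series` helper exposed by `evalml.problem_types`:

```python
from evalml.problem_types import is_time_series


def validate_ensembling_for_problem_type(problem_type, ensembling):
    """Hypothetical guard: fail loudly instead of silently flipping the
    user's ensembling flag for time series problems."""
    if ensembling and is_time_series(problem_type):
        raise ValueError(
            "Ensembling is not available for time series problems; "
            "pass ensembling=False."
        )
    return ensembling
```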

