Integrate ensemble methods in AutoML #1253
Conversation
Codecov Report

@@            Coverage Diff             @@
##             main    #1253      +/-   ##
==========================================
+ Coverage   99.95%   99.95%    +0.01%
==========================================
  Files         213      213
  Lines       13436    13555     +119
==========================================
+ Hits        13429    13548     +119
  Misses          7        7

Continue to review the full report at Codecov.
@dsherry Okay! I kicked off a test last night. It seems like there are no errors now, but I am getting a lot of warnings from the update. Do we want to wait for the tests to complete to see how performance changes with LightGBM, or is that a nice-to-have but not necessary for merging?
LGTM on my end - just a couple things to clean up.
@angela97lin I think this is great! I have some questions (mainly for my understanding lol)
if self.batch_number == 1:
    self._first_batch_results.append((score_to_minimize, pipeline.__class__))

if pipeline.model_family not in self._best_pipeline_params and score_to_minimize is not None:
How come we check `None` and not `np.nan` (the score that AutoML uses when a pipeline fails)?
Haha I think this came out of the `test_pipelines_in_batch_return_none` test, which checks what happens if the pipeline returns `None` 😅 When `None` is returned, we get a `TypeError: '<' not supported between instances of 'NoneType' and 'int'`; `np.nan` doesn't throw this error since you can compare it (though `score_to_minimize < current_best_score` will always be `False`, so we never update the best score to `np.nan`).
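(For illustration, a minimal standalone sketch of the comparison behavior being discussed; `current_best_score` here is just a placeholder value, not PR code.)

```python
import numpy as np

current_best_score = 0.5

# Comparing None with a number raises TypeError in Python 3:
try:
    None < current_best_score
except TypeError as err:
    print(err)  # '<' not supported between instances of 'NoneType' and 'float'

# np.nan is a float, so the comparison is legal but always False,
# which means a NaN score can never replace the current best:
print(np.nan < current_best_score)  # False
```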
@angela97lin so exciting that we're ready to 🚢 this, nice going!
As discussed, let's change the automl default to `ensembling=False`. Other than that, all that's blocking is one comment in the iterative algo and the log message in `_compute_cv_scores`. It would be great to add the missing docstrings as well, and I left a couple other suggestions.
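(For context, a minimal usage sketch of the opt-in behavior being requested here; the exact constructor arguments are an assumption based on the evalml API around the time of this PR, not a verified snippet.)

```python
from evalml.automl import AutoMLSearch

# Assumed API: with ensembling defaulting to False after this change,
# stacked ensemble batches only run when callers opt in explicitly.
automl = AutoMLSearch(problem_type="binary", ensembling=True)
```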
pipeline_params = pipeline_dict['parameters']
input_pipelines.append(pipeline_class(parameters=self._transform_parameters(pipeline_class, pipeline_params)))
ensemble = _make_stacked_ensemble_pipeline(input_pipelines, input_pipelines[0].problem_type)
next_batch.append(ensemble)
else:
    idx = (self._batch_number - 1) % len(self._first_batch_results)
I believe this should be updated to `(self._batch_number - 1) % (len(self._first_batch_results) + 1)`, to match the other modular arithmetic you added in the `elif` above.
This selects which pipeline class we should tune in the current batch, when we're not in the 0th batch or in a stacked ensemble batch.
This is an aside, but in the future I hope we can figure out how to represent the state here in a less confusing way! I think our requirements for this code have grown a little beyond its means, haha.
Ahh, really good catch. We have to update it if `self.ensembling` is True:

num_pipeline_classes = (len(self._first_batch_results) + 1) if self.ensembling else len(self._first_batch_results)
idx = (self._batch_number - 1) % num_pipeline_classes
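(To make the cycling concrete, a small standalone sketch with made-up pipeline names, not PR code, showing how the index behaves with and without the extra ensemble batch:)

```python
first_batch_results = ["RandomForest", "LightGBM", "XGBoost"]  # hypothetical batch-1 ranking

for ensembling in (False, True):
    # With ensembling, each cycle has one extra batch for the stacked ensemble.
    num_pipeline_classes = (len(first_batch_results) + 1) if ensembling else len(first_batch_results)
    for batch_number in range(1, 9):
        idx = (batch_number - 1) % num_pipeline_classes
        if ensembling and idx == len(first_batch_results):
            print(f"ensembling={ensembling} batch {batch_number}: stacked ensemble batch")
        else:
            print(f"ensembling={ensembling} batch {batch_number}: tune {first_batch_results[idx]}")
```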
assert any([p != dummy_binary_pipeline_classes[0]({}).parameters for p in all_parameters])

for i in range(1, 5):
    for _ in range(len(dummy_binary_pipeline_classes)):
This is totally a style nit-pick / me just confusing myself, haha, but why the double-for here?
Ah, the outer `for i in range(1, 5)` loop runs once per full cycle of batches (a cycle goes through every pipeline class), so it tests that the ensemble is called at the end of every cycle. The inner `for` loop then covers each pipeline class within a cycle. Really confusing stuff 😂
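(For anyone else reading along, a toy sketch of that control flow with placeholder names, not the actual test:)

```python
dummy_binary_pipeline_classes = ["MockPipelineA", "MockPipelineB"]  # placeholders

for i in range(1, 5):  # one iteration per full cycle of batches
    for _ in range(len(dummy_binary_pipeline_classes)):
        print(f"cycle {i}: batch tunes one pipeline class")
    # once every pipeline class has had a batch, the next batch is the ensemble
    print(f"cycle {i}: stacked ensemble batch")
```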
@dsherry @freddyaboulton Thanks for all of the great feedback! I've addressed all of your comments, so this PR should be good to merge :D If there's anything to follow up on, we can open a separate PR after!
Closes #1130