
Fix stacked ensemble and LightGBM errors in AutoMLSearch #1388

Merged: 16 commits from ange_fix_stacked_ensemble into main on Nov 9, 2020

Conversation

angela97lin
Contributor

Closes #1376

@angela97lin angela97lin added this to the November 2020 milestone Oct 31, 2020
@angela97lin angela97lin self-assigned this Oct 31, 2020
@angela97lin angela97lin marked this pull request as ready for review November 2, 2020 15:35
@angela97lin
Contributor Author

Ah @bchen1116, I see you put up #1369 for review. Our approaches to getting rid of the warnings are different, but perhaps you have more context about what to do or whether I'm missing something? I could back out the LightGBM changes in favor of #1369 :)

@alteryx alteryx deleted a comment from codecov bot Nov 2, 2020
@angela97lin
Contributor Author

angela97lin commented Nov 2, 2020

Codecov passes here: https://codecov.io/gh/alteryx/evalml/commit/d388e9a957332b5cdecf036633b523db8c2ebe4a/graphs

Edit: Looking at the regenerated codecov report, codecov fails because the total number of lines has decreased, so the percentage coverage is also decreasing. 😬

@codecov

codecov bot commented Nov 2, 2020

Codecov Report

Merging #1388 (dd49ed0) into main (d78d1f2) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1388     +/-   ##
=========================================
- Coverage   100.0%   100.0%   -0.0%     
=========================================
  Files         214      214             
  Lines       14107    14073     -34     
=========================================
- Hits        14100    14066     -34     
  Misses          7        7             
Impacted Files Coverage Δ
...lines/components/ensemble/stacked_ensemble_base.py 100.0% <ø> (ø)
...components/ensemble/stacked_ensemble_classifier.py 100.0% <ø> (ø)
...ents/estimators/classifiers/lightgbm_classifier.py 100.0% <100.0%> (ø)
evalml/pipelines/components/utils.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py 100.0% <100.0%> (ø)
...alml/tests/component_tests/test_lgbm_classifier.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d78d1f2...dd49ed0.

Contributor

@bchen1116 bchen1116 left a comment

LGTM! I didn't run into the errors when I tried to repro.

For LightGBM, we just have different implementations that result in the same end behavior, so I'm fine closing my PR in favor of this one. I did leave a comment about always leaving subsample=None.

# avoid lightgbm warnings having to do with parameter aliases
- if lg_parameters['bagging_freq']:
+ if lg_parameters['bagging_freq'] is not None:
bchen1116
Contributor

@angela97lin I think the if/elif sections are fine, since for rf, we need to do bagging, whereas for goss, lightgbm doesn't accept bagging. For the last check though, I would set subsample and subsample_freq to None by default since as long as bagging values are passed in, it'll throw the warning if subsample values aren't None.
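
For reference, a rough sketch of the structure under discussion, using hypothetical names and placeholder values rather than the actual evalml implementation:

# hypothetical sketch: lg_parameters is the dict of parameters handed to lightgbm
if lg_parameters['boosting_type'] == 'rf':
    # rf (random forest) mode requires bagging, so make sure it is enabled (placeholder values)
    lg_parameters['bagging_freq'] = 1
    lg_parameters['bagging_fraction'] = 0.9
elif lg_parameters['boosting_type'] == 'goss':
    # goss does not accept bagging, so disable it
    lg_parameters['bagging_freq'] = 0
    lg_parameters['bagging_fraction'] = 1.0
# the "last check" under discussion: clear the sklearn-style aliases so lightgbm
# does not warn about subsample / subsample_freq conflicting with the bagging parameters
if lg_parameters['bagging_freq'] is not None:
    lg_parameters['subsample'] = None
    lg_parameters['subsample_freq'] = None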

Contributor Author

@bchen1116 Just to make sure I'm understanding, are you suggesting that rather than checking if bagging_freq is None, always update subsample and subsample_freq to None? :o

dsherry
Contributor

Right, so I thought this was basically patching a bug in lightgbm. subsample is supposed to be an alias for bagging_fraction, and subsample_freq is supposed to be an alias for bagging_freq.

What I'm remembering is that @bchen1116 found that when you set bagging_fraction or bagging_freq, you have to set both subsample and subsample_freq to None in order to avoid a warning. (Is that right? I may be off by a little bit)

If I'm right, what @angela97lin has now should work, no? What case would that not cover? If I'm wrong, well, haha, we should do whatever avoids warnings and produces good performance from lightgbm.
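
As a concrete illustration of the alias behavior described above, a hypothetical standalone repro against lightgbm directly (not code from this PR; the toy dataset is made up):

import numpy as np
import lightgbm as lgb

# tiny toy dataset, just enough to trigger a fit
X = np.random.rand(100, 5)
y = np.random.randint(2, size=100)

# bagging_freq / bagging_fraction are passed through to lightgbm, while the sklearn-style
# aliases subsample / subsample_freq keep their defaults, so lightgbm warns, e.g.:
#   [LightGBM] [Warning] bagging_fraction is set=0.5, subsample=1.0 will be ignored. ...
model = lgb.LGBMClassifier(bagging_freq=1, bagging_fraction=0.5)
model.fit(X, y)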

Contributor Author

@dsherry @bchen1116 Oooo. Based on what you guys have said, does that mean I should be checking whether either bagging_freq or bagging_fraction is set (not None)? (currently just if lg_parameters['bagging_freq'] is not None:)

dsherry
Contributor

@angela97lin yep I think so!

Perhaps the quickest way to resolve this is to write a unit test where we try various combos of all these params, run LightGBMClassifier.fit on a tiny dataset each time, and assert that no warnings are captured by the warnings module or printed in the output?
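
A minimal sketch of the kind of test being suggested, assuming LightGBMClassifier and a small X, y fixture are available from the evalml test suite (as it turns out later in this thread, lightgbm's warnings are not actually catchable this way):

import warnings

# hypothetical sketch; LightGBMClassifier and a small X, y come from the evalml test fixtures
def test_lightgbm_fit_no_alias_warnings(X, y):
    combos = [{}, {'bagging_freq': 1}, {'bagging_fraction': 0.5},
              {'bagging_freq': 1, 'bagging_fraction': 0.5}]
    for params in combos:
        clf = LightGBMClassifier(**params)
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter('always')
            clf.fit(X, y)
        messages = [str(w.message) for w in caught]
        assert not any('ignored' in m or 'alias' in m for m in messages)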

Contributor

Right, that should do it. I think if bagging_freq is None, then it should ignore bagging_fraction, but I do think the safest way to do it would be to just set subsample/subsample_freq to None as long as either of the bagging args is set. Sorry, didn't see this earlier!
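
Per that suggestion, the final check would look roughly like this (a hypothetical sketch, assuming the same lg_parameters dict as in the snippet above; not the exact evalml code):

# clear the aliases whenever either bagging parameter is set, not just bagging_freq
if lg_parameters['bagging_freq'] is not None or lg_parameters['bagging_fraction'] is not None:
    lg_parameters['subsample'] = None
    lg_parameters['subsample_freq'] = None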

(Resolved review thread on evalml/tests/automl_tests/test_automl.py)
@bchen1116 bchen1116 mentioned this pull request Nov 2, 2020
Contributor

@freddyaboulton freddyaboulton left a comment

@angela97lin Looks good! Thanks for making the change :) I left a couple of questions but nothing blocking hehe.

Contributor

@dsherry dsherry left a comment

Cool! I left some questions, will approve when we resolve those conversations

(Resolved review thread on evalml/pipelines/components/utils.py)
@dsherry
Contributor

dsherry commented Nov 6, 2020

@angela97lin let's merge #1413 to tweak codecov and then retrigger the unit tests on this PR and see if that unblocks codecov. If not, I can definitely merge this 😅

@dsherry
Contributor

dsherry commented Nov 6, 2020

> @angela97lin let's merge #1413 to tweak codecov and then retrigger the unit tests on this PR and see if that unblocks codecov. If not, I can definitely merge this 😅

Looks like it worked 🎊

@angela97lin
Contributor Author

@dsherry Did some testing; it looks like it's not a pytest thing but rather LightGBM itself, since I'm not able to redirect the output from stdout:

import io
from contextlib import redirect_stdout, redirect_stderr

# X, y and LightGBMClassifier are already defined in the surrounding context
with io.StringIO() as buf, redirect_stdout(buf):
    print('redirected')
    clf = LightGBMClassifier(bagging_freq=1, bagging_fraction=0.5)
    clf.fit(X, y)
    output = buf.getvalue()  # is an empty string

(I also tried with redirect_stderr.)

Really odd; I'm not sure how the LightGBM warnings are being printed, but I'm going to remove the tests that used pytest.warns since that's not relevant here, and merge this.
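
This is consistent with the warnings being printed by the native LightGBM library straight to the process-level stdout/stderr, which redirect_stdout cannot intercept because it only swaps Python's sys.stdout. A capture at the OS file-descriptor level, for example via pytest's capfd fixture, should see that output; a hypothetical sketch, reusing the same assumed X, y and LightGBMClassifier:

# hypothetical sketch: capfd captures at the OS file-descriptor level,
# so it also sees output written directly by C/C++ code
def test_lightgbm_fit_prints_no_warnings(capfd, X, y):
    clf = LightGBMClassifier(bagging_freq=1, bagging_fraction=0.5)
    clf.fit(X, y)
    out, err = capfd.readouterr()
    assert '[Warning]' not in out + err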

Contributor

@dsherry dsherry left a comment

🚢 ! Thanks for giving the warning coverage a shot, too bad lightgbm won't work with that 🤷‍♂️

@angela97lin
Contributor Author

> 🚢 ! Thanks for giving the warning coverage a shot, too bad lightgbm won't work with that 🤷‍♂️

Yeah, I guess it's not a big issue for now--if we continue to see similar warnings, we can revisit this!

@angela97lin angela97lin merged commit cf50801 into main Nov 9, 2020
@angela97lin angela97lin deleted the ange_fix_stacked_ensemble branch November 9, 2020 18:56
@dsherry dsherry mentioned this pull request Nov 24, 2020
Successfully merging this pull request may close these issues:

LightGBM and stacked ensemble classifiers fail for AutoMLSearch