Fix stacked ensemble and LightGBM errors in AutoMLSearch#1388
Fix stacked ensemble and LightGBM errors in AutoMLSearch#1388angela97lin merged 16 commits intomainfrom
Conversation
|
Ah @bchen1116 I see you put up #1369 for review--our approach to getting rid of the warnings are different, but perhaps you have more context about what to do / whether I'm missing something? I could back out the LightGBM changes in favor for #1369 :) |
|
Codecov passes here: https://codecov.io/gh/alteryx/evalml/commit/d388e9a957332b5cdecf036633b523db8c2ebe4a/graphs Edit: Looking at the regenerated codecov report, codecov fails because the total number of lines has decreased, so the percentage coverage is also decreasing. 😬 |
Codecov Report
@@ Coverage Diff @@
## main #1388 +/- ##
=========================================
- Coverage 100.0% 100.0% -0.0%
=========================================
Files 214 214
Lines 14107 14073 -34
=========================================
- Hits 14100 14066 -34
Misses 7 7
Continue to review full report at Codecov.
|
bchen1116
left a comment
There was a problem hiding this comment.
LGTM! didn't run into the errors when I tried to repro!
For LightGBM, We just have different implementations that result in the same end behavior, so I'm fine closing my PR in exchange of this. I did leave a comment on always leaving subsample=None
| lg_parameters['bagging_fraction'] = 1 | ||
| # avoid lightgbm warnings having to do with parameter aliases | ||
| if lg_parameters['bagging_freq']: | ||
| if lg_parameters['bagging_freq'] is not None: |
There was a problem hiding this comment.
@angela97lin I think the if/elif sections are fine, since for rf, we need to do bagging, whereas for goss, lightgbm doesn't accept bagging. For the last check though, I would set subsample and subsample_freq to None by default since as long as bagging values are passed in, it'll throw the warning if subsample values aren't None.
There was a problem hiding this comment.
@bchen1116 Just to make sure I'm understanding, are you suggesting that rather than checking if bagging_freq is None, always update subsample and subsample_freq to None? :o
There was a problem hiding this comment.
Right, so I thought this was basically patching a bug in lightgbm. subsample is supposed to be an alias for bagging_fraction, and subsample_freq is supposed to be an alias for bagging_freq.
What I'm remembering is that @bchen1116 found that when you set bagging_fraction or bagging_freq, you have to set both subsample and subsample_freq to None in order to avoid a warning. (Is that right? I may be off by a little bit)
If I'm right, what @angela97lin has now should work, no? What case would that not cover? If I'm wrong, well, haha, we should do whatever avoids warnings and produces good performance from lightgbm.
There was a problem hiding this comment.
@dsherry @bchen1116 Oooo. Based on what you guys have said, does that mean I should be checking if either bagging_freq or bagging_fraction are set to None? (currently just if lg_parameters['bagging_freq'] is not None:)
There was a problem hiding this comment.
@angela97lin yep I think so!
Perhaps the quickest way to resolve this is to write a unit test where we try various combos of all these params, run LightGBMClassifier.fit on a tiny dataset each time, and assert there are no warnings in warnings or in the output?
There was a problem hiding this comment.
Right that should do it. I think if bagging_freq is None, then it should ignore bagging_fraction, but I do think the safest way to do it would be to just set subsample/subsample_freq to none as long as either of the bagging args are set. Sorry, didn't see this earlier!
freddyaboulton
left a comment
There was a problem hiding this comment.
@angela97lin Looks good! Thanks for making the change :) I left a couple of questions but nothing blocking hehe.
dsherry
left a comment
There was a problem hiding this comment.
Cool! I left some questions, will approve when we resolve those conversations
| lg_parameters['bagging_fraction'] = 1 | ||
| # avoid lightgbm warnings having to do with parameter aliases | ||
| if lg_parameters['bagging_freq']: | ||
| if lg_parameters['bagging_freq'] is not None: |
There was a problem hiding this comment.
Right, so I thought this was basically patching a bug in lightgbm. subsample is supposed to be an alias for bagging_fraction, and subsample_freq is supposed to be an alias for bagging_freq.
What I'm remembering is that @bchen1116 found that when you set bagging_fraction or bagging_freq, you have to set both subsample and subsample_freq to None in order to avoid a warning. (Is that right? I may be off by a little bit)
If I'm right, what @angela97lin has now should work, no? What case would that not cover? If I'm wrong, well, haha, we should do whatever avoids warnings and produces good performance from lightgbm.
|
@angela97lin let's merge #1413 to tweak codecov and then retrigger the unit tests on this PR and see if that unblocks codecov. If not, I can definitely merge this 😅 |
Looks like it worked 🎊 |
|
@dsherry Did some testing, looks like it's not a pytest thing but rather, LightGBM since I'm not able to redirect the output from stdout: (I also tried with Really odd, I'm not sure then how the LightGBM warnings are being printed then, but I'm going to remove the tests that were testing |
dsherry
left a comment
There was a problem hiding this comment.
🚢 ! Thanks for giving the warning coverage a shot, too bad lightgbm won't work with that 🤷♂️
Yeah, I guess it's not a big issue for now--if we continue to see similar warnings, we can revisit this! |
Closes #1376