-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix stacked ensemble and LightGBM errors in AutoMLSearch #1388
Conversation
Ah @bchen1116 I see you put up #1369 for review--our approach to getting rid of the warnings are different, but perhaps you have more context about what to do / whether I'm missing something? I could back out the LightGBM changes in favor for #1369 :) |
Codecov passes here: https://codecov.io/gh/alteryx/evalml/commit/d388e9a957332b5cdecf036633b523db8c2ebe4a/graphs Edit: Looking at the regenerated codecov report, codecov fails because the total number of lines has decreased, so the percentage coverage is also decreasing. 😬 |
Codecov Report
@@ Coverage Diff @@
## main #1388 +/- ##
=========================================
- Coverage 100.0% 100.0% -0.0%
=========================================
Files 214 214
Lines 14107 14073 -34
=========================================
- Hits 14100 14066 -34
Misses 7 7
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! didn't run into the errors when I tried to repro!
For LightGBM, We just have different implementations that result in the same end behavior, so I'm fine closing my PR in exchange of this. I did leave a comment on always leaving subsample=None
# avoid lightgbm warnings having to do with parameter aliases | ||
if lg_parameters['bagging_freq']: | ||
if lg_parameters['bagging_freq'] is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin I think the if/elif
sections are fine, since for rf
, we need to do bagging, whereas for goss
, lightgbm doesn't accept bagging. For the last check though, I would set subsample
and subsample_freq
to None by default since as long as bagging values are passed in, it'll throw the warning if subsample values aren't None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bchen1116 Just to make sure I'm understanding, are you suggesting that rather than checking if bagging_freq
is None, always update subsample
and subsample_freq
to None
? :o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so I thought this was basically patching a bug in lightgbm. subsample
is supposed to be an alias for bagging_fraction
, and subsample_freq
is supposed to be an alias for bagging_freq
.
What I'm remembering is that @bchen1116 found that when you set bagging_fraction
or bagging_freq
, you have to set both subsample
and subsample_freq
to None
in order to avoid a warning. (Is that right? I may be off by a little bit)
If I'm right, what @angela97lin has now should work, no? What case would that not cover? If I'm wrong, well, haha, we should do whatever avoids warnings and produces good performance from lightgbm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsherry @bchen1116 Oooo. Based on what you guys have said, does that mean I should be checking if either bagging_freq
or bagging_fraction
are set to None? (currently just if lg_parameters['bagging_freq'] is not None:
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin yep I think so!
Perhaps the quickest way to resolve this is to write a unit test where we try various combos of all these params, run LightGBMClassifier.fit
on a tiny dataset each time, and assert there are no warnings in warnings
or in the output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right that should do it. I think if bagging_freq
is None, then it should ignore bagging_fraction
, but I do think the safest way to do it would be to just set subsample/subsample_freq
to none as long as either of the bagging args are set. Sorry, didn't see this earlier!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin Looks good! Thanks for making the change :) I left a couple of questions but nothing blocking hehe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! I left some questions, will approve when we resolve those conversations
# avoid lightgbm warnings having to do with parameter aliases | ||
if lg_parameters['bagging_freq']: | ||
if lg_parameters['bagging_freq'] is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so I thought this was basically patching a bug in lightgbm. subsample
is supposed to be an alias for bagging_fraction
, and subsample_freq
is supposed to be an alias for bagging_freq
.
What I'm remembering is that @bchen1116 found that when you set bagging_fraction
or bagging_freq
, you have to set both subsample
and subsample_freq
to None
in order to avoid a warning. (Is that right? I may be off by a little bit)
If I'm right, what @angela97lin has now should work, no? What case would that not cover? If I'm wrong, well, haha, we should do whatever avoids warnings and produces good performance from lightgbm.
@angela97lin let's merge #1413 to tweak codecov and then retrigger the unit tests on this PR and see if that unblocks codecov. If not, I can definitely merge this 😅 |
Looks like it worked 🎊 |
@dsherry Did some testing, looks like it's not a pytest thing but rather, LightGBM since I'm not able to redirect the output from stdout:
(I also tried with Really odd, I'm not sure then how the LightGBM warnings are being printed then, but I'm going to remove the tests that were testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢 ! Thanks for giving the warning coverage a shot, too bad lightgbm won't work with that 🤷♂️
Yeah, I guess it's not a big issue for now--if we continue to see similar warnings, we can revisit this! |
Closes #1376