
Fix 'RF' error for LightGBM Classifier #1302

Merged
merged 12 commits into from Oct 20, 2020

Conversation


@bchen1116 bchen1116 commented Oct 13, 2020

Fixes #1251 and #1267

Add bagging_freq and bagging_fraction parameters to the LightGBM classifier
Set the num_leaves hyperparameter to start at 2 rather than 1, since LightGBM expects a value greater than 1
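The two changes above can be sketched roughly as follows. The class and attribute names here are simplified stand-ins for illustration, not evalml's actual component code:

```python
# Hypothetical sketch of the two changes described in this PR;
# names and the num_leaves upper bound are illustrative, not evalml's code.

class LightGBMClassifierSketch:
    # num_leaves now starts at 2, since LightGBM requires num_leaves > 1
    hyperparameter_ranges = {
        "num_leaves": (2, 100),  # previously started at 1; upper bound illustrative
    }

    def __init__(self, boosting_type="gbdt", num_leaves=31,
                 bagging_fraction=0.9, bagging_freq=0, **kwargs):
        # bagging_freq and bagging_fraction are now explicit parameters
        # instead of being silently left to LightGBM's defaults
        self.parameters = {
            "boosting_type": boosting_type,
            "num_leaves": num_leaves,
            "bagging_freq": bagging_freq,
            "bagging_fraction": bagging_fraction,
        }
```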

@bchen1116 bchen1116 self-assigned this Oct 13, 2020

codecov bot commented Oct 13, 2020

Codecov Report

Merging #1302 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1302   +/-   ##
=======================================
  Coverage   99.94%   99.94%           
=======================================
  Files         213      213           
  Lines       13357    13387   +30     
=======================================
+ Hits        13349    13379   +30     
  Misses          8        8           
Impacted Files Coverage Δ
...ents/estimators/classifiers/lightgbm_classifier.py 100.00% <100.00%> (ø)
evalml/tests/component_tests/test_components.py 100.00% <100.00%> (ø)
...alml/tests/component_tests/test_lgbm_classifier.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d254e31...b3e8b40.


@dsherry dsherry left a comment


@bchen1116 thanks for digging into this! I understand 1 of the 2 bug fixes. I left a comment asking for an explanation of the 2nd, just so I can follow along. I also left a couple questions/suggestions about how to set up the new default parameters. Approved pending resolution of those conversations.

@@ -30,7 +30,7 @@ class LightGBMClassifier(Estimator):
SEED_MIN = 0
SEED_MAX = SEED_BOUNDS.max_bound

-    def __init__(self, boosting_type="gbdt", learning_rate=0.1, n_estimators=100, max_depth=0, num_leaves=31, min_child_samples=20, n_jobs=-1, random_state=0, **kwargs):
+    def __init__(self, boosting_type="gbdt", learning_rate=0.1, n_estimators=100, max_depth=0, num_leaves=31, min_child_samples=20, n_jobs=-1, random_state=0, bagging_fraction=0.9, bagging_freq=0, **kwargs):
dsherry (Collaborator):

Why default bagging_freq to 0? Won't that cause the bug when boosting_type="rf"? What default does lightgbm choose for this parameter?

bchen1116 (Contributor, Author):

LightGBM defaults to 0 for bagging_freq. Users can set it to 1 and change bagging_fraction if they want to speed up computation and randomly select data for other boosting types, but it's required to be 1 for boosting_type=rf (along with 0 < bagging_fraction < 1.0).

dsherry (Collaborator):

Got it. This looks good. Is 0.9 the default bagging_fraction in lightgbm?

bchen1116 (Contributor, Author):

@dsherry it defaults to 1.0

-                  "n_jobs": n_jobs}
+                  "n_jobs": n_jobs,
+                  "bagging_freq": bagging_freq,
+                  "bagging_fraction": bagging_fraction}
dsherry (Collaborator):

@bchen1116 could you please explain why adding these two parameters fixed the bug?

bchen1116 (Contributor, Author):

As background, LightGBM has four boosting types: "gbdt", "dart", "goss", and "rf". bagging_freq controls the frequency of bagging: LightGBM bags every bagging_freq = k iterations (0 means no bagging). bagging_fraction is the fraction of the data randomly selected without resampling (1.0 selects all of it, 0 none); bagging can help speed up training.

The default bagging_freq LightGBM sets is 0, which works with gbdt, dart, and goss. However, for rf, since it's a random forest, LightGBM requires bagging, which means bagging_freq must be 1 and bagging_fraction must be below 1.0. By adding those two parameters and changing bagging_freq when boosting_type=rf, we apply a simple fix that avoids this bug.
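The constraint described above can be mirrored in a small sketch. The exact check lives inside LightGBM itself; this hypothetical helper just restates its documented rule for "rf":

```python
def valid_bagging_config(boosting_type, bagging_freq=0, bagging_fraction=1.0):
    """Hypothetical sketch mirroring LightGBM's documented rule:
    'rf' requires bagging, i.e. bagging_freq >= 1 (the fix sets it to 1)
    and 0 < bagging_fraction < 1.0. Other boosting types accept no bagging."""
    if boosting_type == "rf":
        return bagging_freq >= 1 and 0 < bagging_fraction < 1.0
    return True  # gbdt, dart, goss work with or without bagging

# With LightGBM's own defaults (bagging_freq=0, bagging_fraction=1.0):
valid_bagging_config("gbdt")                               # True
valid_bagging_config("rf")                                 # False -- the reported bug
valid_bagging_config("rf", bagging_freq=1,
                     bagging_fraction=0.9)                 # True -- the fix
```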

dsherry (Collaborator):

Thanks for the clear explanation! That makes sense.

Can we tweak the comment you left on line 48:

if the boosting type is random forest, bagging is required by lightgbm, so we set bagging_freq to 1 in order to avoid errors


@dsherry dsherry left a comment


@bchen1116 looks great!

bchen1116 commented Oct 20, 2020

LightGBM doesn't have smart defaults for bagging_freq and bagging_fraction, so when boosting_type=rf, we have to manually set bagging_fraction < 1.0 along with bagging_freq=1. We chose bagging_fraction = 0.9, although we didn't test for an 'ideal' value, nor does LightGBM recommend one, so this number can be updated whenever necessary.
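The override discussed in this thread could look roughly like the sketch below; the helper name is hypothetical, not evalml's actual code, and 0.9 is the untuned default the thread settles on:

```python
def params_for_lightgbm(boosting_type, bagging_freq=0, bagging_fraction=0.9):
    """Hypothetical sketch of the fix: return the parameters actually
    handed to lightgbm, forcing bagging on for random forest mode."""
    params = {
        "boosting_type": boosting_type,
        "bagging_freq": bagging_freq,
        "bagging_fraction": bagging_fraction,
    }
    # if the boosting type is random forest, bagging is required by
    # lightgbm, so we set bagging_freq to 1 in order to avoid errors
    if boosting_type == "rf":
        params["bagging_freq"] = 1
    return params
```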

dsherry commented Oct 20, 2020

Thanks @bchen1116. Yep, agreed: we may be able to find a better default than 0.9 for bagging_fraction. Having a fixed value for now is preferable to more "magic" behavior where the actual value passed to lightgbm changes.

@dsherry dsherry mentioned this pull request Oct 29, 2020
@freddyaboulton freddyaboulton deleted the bc_1251_rf branch May 13, 2022 14:58
Successfully merging this pull request may close these issues.

LightGBM errors out during AutoMLSearch