
Add LightGBM Regressor #1459

Merged
merged 52 commits into from
Dec 17, 2020

Conversation

bchen1116
Contributor

@bchen1116 bchen1116 commented Nov 23, 2020

fix #296

Perf tests here
Documentation here

@bchen1116 bchen1116 self-assigned this Nov 23, 2020
@codecov

codecov bot commented Nov 23, 2020

Codecov Report

Merging #1459 (6283a2c) into main (7e21b20) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1459     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         236      238      +2     
  Lines       16933    17136    +203     
=========================================
+ Hits        16925    17128    +203     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/pipelines/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/estimators/__init__.py 100.0% <ø> (ø)
...lines/components/estimators/regressors/__init__.py 100.0% <100.0%> (ø)
...onents/estimators/regressors/lightgbm_regressor.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_components.py 100.0% <100.0%> (ø)
...valml/tests/component_tests/test_lgbm_regressor.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_utils.py 100.0% <100.0%> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 100.0% <100.0%> (ø)
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7e21b20...6283a2c.

@bchen1116 bchen1116 marked this pull request as ready for review November 30, 2020 22:59
Contributor

@dsherry dsherry left a comment


@bchen1116 nice going! Great work debugging that weird looking glass error, haha.

I left some questions about the parameters, and some comments on the perf test doc. I think it would be worthwhile to change the default max_depth to something greater than 0 and see how that impacts the accuracy in the first batch. It'd also be interesting to perf test a couple of other tweaks, though that's not required.

SEED_MIN = 0
SEED_MAX = SEED_BOUNDS.max_bound

    def __init__(self, boosting_type="gbdt", learning_rate=0.1, n_estimators=100,
                 max_depth=0, num_leaves=31, min_child_samples=20, n_jobs=-1,
                 random_state=0, bagging_fraction=0.9, bagging_freq=0, **kwargs):
Contributor


@bchen1116 any particular reason for choosing these default values? It could be interesting to do some scanning on these parameters and see if they should be tweaked. max_depth=0 jumps out at me as too low. The default in lightgbm is -1, i.e. "no limit", but I don't think we should use that.

Contributor Author


Right, any max_depth <= 0 behaves the same as "no limit" on the depth. I can definitely tweak it, but the values I chose are LightGBM's own defaults, and I remember we discussed using the library's provided defaults rather than choosing arbitrary defaults on our end.
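As an illustration of that convention, LightGBM treats any max_depth <= 0 as "no depth limit" (its own default is -1). A tiny helper mirroring the rule (hypothetical resolve_max_depth, not evalml or lightgbm code) would look like:

```python
def resolve_max_depth(max_depth: int) -> int:
    """Mirror LightGBM's convention: any value <= 0 means 'no depth limit'.

    Illustrative only; in practice evalml passes max_depth straight through
    to lightgbm, which applies this rule internally.
    """
    return -1 if max_depth <= 0 else max_depth
```

So a default of max_depth=0 and lightgbm's default of -1 are behaviorally identical, which is why the value reads as "too low" but is really "unlimited".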

evalml/tests/component_tests/test_lgbm_regressor.py (outdated comment thread; resolved)
@bchen1116
Contributor Author

Perf docs updated with new tests that determine the ideal n_estimators and max_depth values.
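The kind of scan those perf tests run can be sketched generically: try each candidate (n_estimators, max_depth) pair and keep the best-scoring combination. This is a hypothetical harness (the train_eval callback and scan_defaults name are assumptions, not evalml's perf-test code):

```python
from itertools import product


def scan_defaults(train_eval, n_estimators_grid, max_depth_grid):
    """Grid-scan candidate defaults; `train_eval` is an assumed callback
    that trains/evaluates a model and returns a score to maximize."""
    best_score, best_params = None, None
    for n_est, depth in product(n_estimators_grid, max_depth_grid):
        score = train_eval(n_estimators=n_est, max_depth=depth)
        if best_score is None or score > best_score:
            best_score = score
            best_params = {"n_estimators": n_est, "max_depth": depth}
    return best_params
```

In the real perf tests, the callback would fit the LightGBM regressor on the benchmark datasets and return a validation metric.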

Contributor

@freddyaboulton freddyaboulton left a comment


@bchen1116 This is great! The performance tests to see the impact of different default parameter values are awesome.

There's a lot of overlap between this class and the lgbm classifier. It'd be nice if we could share at least the logic for overriding the parameters in __init__ and _encode_categories. What do you think? Not blocking, and probably best done in another issue anyway if you agree.
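One way the shared piece could look: both LightGBM components need object/text columns converted to pandas "category" dtype before fitting, so that logic could live in one helper. A rough sketch (hypothetical free function; evalml's actual _encode_categories may differ):

```python
import pandas as pd


def encode_categories(X: pd.DataFrame) -> pd.DataFrame:
    """Sketch of a shared category-encoding helper for the LightGBM
    classifier and regressor: convert object-dtype columns to pandas
    'category' dtype, which LightGBM handles natively."""
    X = X.copy()  # avoid mutating the caller's frame
    for col in X.select_dtypes(include=["object"]).columns:
        X[col] = X[col].astype("category")
    return X
```

With something like this extracted, both components' fit/predict paths could call the same helper instead of duplicating the conversion.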

@bchen1116
Contributor Author

@freddyaboulton definitely agree! It would be nice to run some perf tests on the LGBM Classifier and see if we can find better default values, especially since we have much more test data for classification. I think that is a good separate issue, which I've just submitted here

@bchen1116 bchen1116 merged commit a48ebfb into main Dec 17, 2020
@dsherry dsherry mentioned this pull request Dec 29, 2020
@freddyaboulton freddyaboulton deleted the bc_296_lgbm_regressor branch May 13, 2022 14:58
Successfully merging this pull request may close these issues.

Add LightGBM regression component
3 participants