Speed up catboost fit: change catboost default and automl parameter ranges #998

Merged
dsherry merged 5 commits into main from ds_979_catboost_ranges_uniform on Jul 31, 2020

@dsherry dsherry commented Jul 31, 2020

Fix #979

Two changes to speed up catboost fit times:

  • Update catboost estimator default value for n_estimators
  • Lower min/max automl ranges for n_estimators and max_depth

Performance results and discussion here.

Hopefully we can do more tuning after this and open these ranges up some more.

We have other features on the roadmap which will help with this too.

codecov bot commented Jul 31, 2020

Codecov Report

Merging #998 into main will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##             main     #998   +/-   ##
=======================================
  Coverage   99.86%   99.86%           
=======================================
  Files         179      179           
  Lines        9424     9424           
=======================================
  Hits         9411     9411           
  Misses         13       13           

Impacted Files | Coverage Δ
...ents/estimators/classifiers/catboost_classifier.py | 100.00% <100.00%> (ø)
...onents/estimators/regressors/catboost_regressor.py | 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 83cd394...20a7962.

@dsherry dsherry marked this pull request as ready for review July 31, 2020 13:30
  "eta": Real(0.000001, 1),
- "max_depth": Integer(1, 16),
+ "max_depth": Integer(4, 10),

@dsherry (Contributor, Author) commented on this change:

Interesting to note we already had the regressor set with this range, but not the classifier.
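
For a rough sense of why a narrower depth range can help fit time, the sketch below compares worst-case leaf counts per tree. It assumes the two `max_depth` lines above are the old (1 to 16) and new (4 to 10) classifier ranges, and that each tree is a full symmetric binary tree with `2**depth` leaves, an assumption that matches CatBoost's default symmetric trees. The `Integer` class is a hypothetical stand-in for `skopt.space.Integer`, defined locally so the snippet runs without extra dependencies; it is not the evalml code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Integer:
    """Hypothetical stand-in for skopt.space.Integer: an inclusive integer range."""
    low: int
    high: int


# Assumed reading of the diff above: old vs. new classifier max_depth range.
old_max_depth = Integer(1, 16)
new_max_depth = Integer(4, 10)


def max_leaves(depth: int) -> int:
    # A full symmetric binary tree of the given depth has 2**depth leaves.
    return 2 ** depth


worst_case_old = max_leaves(old_max_depth.high)
worst_case_new = max_leaves(new_max_depth.high)
leaf_ratio = worst_case_old // worst_case_new

print(worst_case_old, worst_case_new, leaf_ratio)
```

Under these assumptions, the deepest tree the search can draw shrinks from 65536 leaves to 1024, a factor of 64. Actual fit time also depends on `n_estimators` and the dataset, so this is only an upper-bound illustration.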

@jeremyliweishih jeremyliweishih left a comment

LGTM. It's going to be fun with more experimentation later on!

dsherry commented Jul 31, 2020

I was just testing #337 with the titanic dataset. On main, the first batch took 14 sec to fit, ~13 sec of which was catboost. On ds_979_catboost_ranges_uniform, the first batch took ~3 sec total. In both runs, catboost was the best model from the first batch, with a similar F1 score on the holdout: 0.953 before vs. 0.948 after. Since that dataset is small, I'd call that an even match.
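
The figures quoted above work out as follows. This is a small sketch that only redoes the arithmetic on the numbers in the comment; the `timed` helper is a hypothetical illustration of how such a wall-clock measurement could be taken, not the method actually used for this test.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Figures quoted in the comment (titanic dataset, first automl batch).
main_batch_sec = 14.0    # first batch on main
branch_batch_sec = 3.0   # first batch on the branch (approximate)
f1_main = 0.953
f1_branch = 0.948

speedup = main_batch_sec / branch_batch_sec
f1_drop = f1_main - f1_branch
relative_f1_drop = f1_drop / f1_main

print(f"speedup: {speedup:.1f}x")
print(f"F1 drop: {f1_drop:.3f} ({relative_f1_drop:.2%} relative)")
```

That is roughly a 4.7x faster first batch for an F1 difference of 0.005, about half a percent relative, on a single small dataset.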

@ctduffy ctduffy left a comment

Really like the analysis with the graphs and all of the datasets. Excited to see the perf testing being used to change parameters to speed up evalml!

@freddyaboulton freddyaboulton left a comment

@dsherry Very cool analysis!

@eccabay eccabay left a comment

LGTM!

@dsherry dsherry merged commit 86bf8c3 into main Jul 31, 2020
@angela97lin angela97lin mentioned this pull request Jul 31, 2020
@freddyaboulton freddyaboulton deleted the ds_979_catboost_ranges_uniform branch May 13, 2022 14:50

Successfully merging this pull request may close these issues.

Automl: first-batch catboost is slow to fit on dataset
5 participants