AutoSearchBase: refactor pipeline evaluation into evaluate method #762
Codecov Report
@@           Coverage Diff           @@
##           master     #762   +/-   ##
=======================================
  Coverage   99.38%   99.38%
=======================================
  Files         151      151
  Lines        5529     5532    +3
=======================================
+ Hits         5495     5498    +3
  Misses         34       34
ed81f9d to 2304757
@@ -20,7 +20,7 @@ class CatBoostClassifier(Estimator):
     hyperparameter_ranges = {
         "n_estimators": Integer(10, 1000),
         "eta": Real(0, 1),
-        "max_depth": Integer(1, 16),
+        "max_depth": Integer(4, 10),
In the past we've had some trouble with the performance of the catboost pipeline (#568). I think I know why: the max tree depth here was too large. Catboost recommends a range of 4 to 10. I think training time grows nonlinearly with this parameter, meaning a depth of 16 will take a lot longer to run.
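To put a rough number on that growth: assuming CatBoost's default symmetric (oblivious) trees, a tree of depth d has 2**d leaves. The sketch below only compares leaf counts at the old and new maximum depth. Leaf count is a crude proxy for cost, not a measurement of training time, and `leaf_count` is a helper made up for this illustration.

```python
# Assumption: symmetric (oblivious) trees, so a tree of depth d has 2**d leaves.
# Leaf count is used here only as a rough proxy for per-tree cost.

def leaf_count(depth: int) -> int:
    """Number of leaves in a full symmetric tree of the given depth."""
    return 2 ** depth

old_max, new_max = 16, 10
ratio = leaf_count(old_max) // leaf_count(new_max)

# Leaves at depth 16, leaves at depth 10, and how many times larger the former is.
print(leaf_count(old_max), leaf_count(new_max), ratio)  # 65536 1024 64
```

So a single tree at the old upper bound can have 64 times as many leaves as one at the new upper bound, which is consistent with the long run described here.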
Why this is relevant to this PR: for convenience and clarity, this PR changed the order of calls to the RNG. As a result, we now run catboost in the lead scoring notebook with depth=16, whereas before we happened not to run catboost in that notebook. Catboost took 9 min to train on that data at that depth. Updating the max to 10 has it training in a reasonable time. The specific change in this PR isn't critical, but once I figured out the problem I wanted to fix it.
Another argument for the perf tests being highly valuable to us! Not that we needed more evidence, haha.
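For anyone wondering how reordering RNG calls can change which pipeline runs: a minimal sketch using Python's standard `random` module, not the project's actual search code. The estimator list and both `propose_*` functions are hypothetical. With the same seed, drawing the depth before the estimator consumes the random stream in a different order than drawing the estimator first, so the two orderings can land on different (estimator, depth) pairs.

```python
import random

# Hypothetical estimator names, for illustration only.
ESTIMATORS = ["catboost", "random_forest", "logistic_regression"]

def propose_depth_first(seed):
    """Draw the depth first, then the estimator."""
    rng = random.Random(seed)
    depth = rng.randint(1, 16)
    estimator = rng.choice(ESTIMATORS)
    return estimator, depth

def propose_estimator_first(seed):
    """Same draws, but in the opposite order."""
    rng = random.Random(seed)
    estimator = rng.choice(ESTIMATORS)
    depth = rng.randint(1, 16)
    return estimator, depth

# Same seed, different call order: compare the proposals side by side.
for seed in range(3):
    print(seed, propose_depth_first(seed), propose_estimator_first(seed))
```

Each ordering is still fully reproducible for a given seed. Only the mapping from seed to proposal changes, which is how a notebook can start exercising a configuration it never hit before.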
LGTM
Fixes #752, fixes #568
Update AutoSearchBase to have a separate evaluate method, to make it easy to evaluate pipeline scores. This will replace AutoSearchBase._do_iteration, with the next pipeline and parameters now proposed in AutoSearchBase.search instead.
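
The shape of that split can be sketched roughly as follows. This is a toy stand-in, not the code in this PR: the class name `AutoSearchSketch`, the `_propose` helper, the method signatures, and the majority-class scoring are all assumptions made for illustration. The point is only that `search` owns proposing the next pipeline and parameters, while `evaluate` scores one given candidate and can be called on its own.

```python
import random
from collections import Counter

class AutoSearchSketch:
    """Hypothetical stand-in for AutoSearchBase; not the project's actual code."""

    def __init__(self, max_pipelines=3, random_state=0):
        self.max_pipelines = max_pipelines
        self.rng = random.Random(random_state)
        self.results = []

    def _propose(self):
        # Placeholder proposal step: pick a pipeline name and parameters at random.
        name = self.rng.choice(["pipeline_a", "pipeline_b"])
        params = {"max_depth": self.rng.randint(4, 10)}
        return name, params

    def evaluate(self, name, params, X, y):
        # Placeholder scoring: accuracy of always predicting the majority class.
        # A real implementation would fit the pipeline and score it on held-out data.
        majority_count = Counter(y).most_common(1)[0][1]
        return majority_count / len(y)

    def search(self, X, y):
        # Proposal happens here; evaluation is delegated to evaluate().
        for _ in range(self.max_pipelines):
            name, params = self._propose()
            score = self.evaluate(name, params, X, y)
            self.results.append({"pipeline": name, "parameters": params, "score": score})
        return self.results

X = [[0], [1], [2], [3]]
y = [1, 1, 1, 0]
automl = AutoSearchSketch(max_pipelines=3)
automl.search(X, y)

# evaluate() can also be called directly, outside the search loop.
print(automl.evaluate("pipeline_a", {"max_depth": 6}, X, y))  # 0.75
```

Keeping evaluation free of any proposal logic means a specific pipeline can be scored without running a full search.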