Add add_to_leaderboard() to AutoSearchBase #874
Conversation
Codecov Report
```diff
@@           Coverage Diff           @@
##           master     #874   +/-   ##
=======================================
  Coverage   99.75%   99.75%
=======================================
  Files         195      195
  Lines        8532     8585   +53
=======================================
+ Hits         8511     8564   +53
  Misses         21       21
```
Continue to review full report at Codecov.
@jeremyliweishih I think it's fine to merge this before #719, but we should plan to get another PR up once that's merged.
@jeremyliweishih looks good!

So now users could do:

```python
automl = AutoClassificationSearch(...)
automl.search(X_train, y_train)
custom_pipeline = CustomPipeline({'custom': 'parameters', ...})
automl.add_to_rankings(custom_pipeline, X_train, y_train)
```
@jeremyliweishih I have a few more thoughts we should address before merging:

- I left a comment about renaming the main method to `add_to_rankings` or similar, since we don't use the word "leaderboard" anywhere else. That name was my idea originally, haha, oops.
- We should add a check that the data provided here matches the training data passed to `search`, right? It feels inappropriate to allow other data to be used; otherwise the CV scores aren't fully comparable. We could check the input data shape; in particular, the number and names of the columns should be identical. We could also compute/store/compare a hash of the X and y data, but that may be too much to add for now. (A sketch of such a check follows this list.)
- Add a test to ensure that passing a trained vs. an untrained pipeline into this method doesn't produce different results.
- What happens when the user tries to add a pipeline identical to one which was already added? Will this create a new entry in the results structure and in `full_rankings`? Ideally it would not. Suggested change: check whether the pipeline class/params have already been evaluated and return early if so.
- As in `search`, what happens if `X` or `y` are np/list type? They should get converted to pandas.
Looks good! I left a comment on the impl return value and a proposal on how to simplify it. Let's close that conversation, but otherwise this is ready to go.
```python
for parameter in pipeline_rows['parameters']:
    if pipeline.parameters == parameter:
        return
self._evaluate(pipeline, X, y, raise_errors=True)
```
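This is the early-return duplicate check suggested above: if a pipeline with the same parameters already appears in the rankings rows, the method returns without re-evaluating; otherwise the pipeline is scored via `_evaluate`.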
👍 thanks
Fixes #628.