AutoMLSearch API update #871
Conversation
Codecov Report
@@ Coverage Diff @@
## master #871 +/- ##
==========================================
+ Coverage 99.75% 99.76% +0.01%
==========================================
Files 195 193 -2
Lines 8585 8603 +18
==========================================
+ Hits 8564 8583 +19
+ Misses 21 20 -1
Continue to review full report at Codecov.
@ctduffy this is awesome! I left a bunch of comments. LMK if you wanna talk about any of it.
I like the new tests you added. May have suggestions for more in the next round.
@clara reminder to please connect this with the parent issue and move to Review when in review! :)
@dsherry sorry about that! I thought I had connected it to the issue, but I suppose I was mistaken. I will be more diligent about updating the Zenhub status in the future. It should be ready for more review now.
        else:
            self.cv = StratifiedKFold(n_splits=3, random_state=random_state, shuffle=True)
    else:
        self.cv = cv
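The fallback above can be sketched as a small helper: if the caller does not supply a `cv` object, default to a shuffled 3-fold stratified split. This is a minimal sketch using scikit-learn's `StratifiedKFold` directly; the helper name is illustrative, not part of the PR.

```python
from sklearn.model_selection import StratifiedKFold


def make_default_cv(cv=None, random_state=0):
    # Mirror the fallback in the diff: use the caller's cv if given,
    # otherwise default to a shuffled 3-fold stratified split.
    if cv is None:
        return StratifiedKFold(n_splits=3, random_state=random_state, shuffle=True)
    return cv


cv = make_default_cv()
print(cv.get_n_splits())  # 3
```

Note that `random_state` only has an effect on `StratifiedKFold` when `shuffle=True`, which is why both are set together here.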
@@ -327,14 +393,16 @@ def _check_stopping_condition(self, start):
            return False
        return should_continue

    def _check_multiclass(self, y):
        if y.nunique() <= 2:
            return
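The guard being discussed boils down to a `nunique()` test on the target: two or fewer unique values means binary, so the multiclass check returns early. A standalone sketch (the function name is illustrative, not evalml's):

```python
import pandas as pd


def is_multiclass(y):
    # Sketch of the guard above: a target with two or fewer unique
    # values is treated as binary and skips multiclass validation.
    return pd.Series(y).nunique() > 2


print(is_multiclass([0, 1, 0, 1]))  # False
print(is_multiclass([0, 1, 2, 1]))  # True
```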
I'm in favor of removing this check in favor of our DataChecks API. But before we do, let's confirm that our DefaultDataChecks will catch the case when there's 1 unique value. @ctduffy could you please make sure that is currently true and that we have unit test coverage for it? It'd be ok to do so in a separate PR if changes are needed. @angela97lin would have context on this too
@dsherry here, do you mean the case where, for example, all of the y label values are one single value (say, 0), to check whether there will be an error when search is called on that (say, in a binary search)? If yes, I tried this and there is a strange error that comes from outside of evalml. Just want to clarify that this is what you mean before I put up an issue for it
@ctduffy, yep, the case here is: what happens if the target contains only a single unique value, i.e. np.array([42, 42, 42, 42])? I'd want us to add/augment a DataCheck to raise a good error for this case.
+1 for filing as a separate issue and doing it in a separate PR. But we should aim to get the fix for that in before releasing. Please add the issue to the June milestone.
The fix won't be too hard! We just need to figure out if we wanna add a new data check to our defaults, or update an existing one. We do have a target checker in place. We can discuss on the issue.
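A data check of the kind discussed here could look something like the sketch below. This is a hypothetical stand-alone function, not evalml's DataCheck API; the function name and error message are assumptions for illustration.

```python
import numpy as np


def check_target_has_variance(y):
    # Hypothetical data check sketched from the discussion above:
    # a target with a single unique value (e.g. np.array([42, 42, 42, 42]))
    # cannot be used for classification, so raise a clear error early
    # instead of letting search fail deep inside a downstream library.
    if len(np.unique(y)) < 2:
        raise ValueError("Target contains only one unique value; "
                         "at least two classes are required.")


try:
    check_target_has_variance(np.array([42, 42, 42, 42]))
except ValueError as e:
    print(e)
```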
@dsherry Cool, just created this as issue #889.
Now that it's filed, I believe all of your comments have been addressed and this is likely ready to merge. I will go through to double-check your recent comments, but let me know if you see anything or think it's good to go
        for pipeline in self.allowed_pipelines:
            if not pipeline.problem_type == self.problem_type:
                raise ValueError("Given pipeline {} is not compatible with problem_type {}.".format(pipeline.name, self.problem_type.value))
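The validation above can be exercised in isolation. A minimal sketch with a hypothetical `Pipeline` stand-in (the class and helper names are illustrative, not evalml's):

```python
class Pipeline:
    # Hypothetical stand-in for a pipeline with a problem_type attribute.
    def __init__(self, name, problem_type):
        self.name = name
        self.problem_type = problem_type


def validate_pipelines(allowed_pipelines, problem_type):
    # Every allowed pipeline must match the search's problem type,
    # otherwise raise the same style of ValueError as the diff above.
    for pipeline in allowed_pipelines:
        if pipeline.problem_type != problem_type:
            raise ValueError("Given pipeline {} is not compatible with problem_type {}.".format(
                pipeline.name, problem_type))


try:
    validate_pipelines([Pipeline("RF Regressor", "regression")], "binary")
except ValueError as e:
    print(e)
```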
Cool, thanks, these checks are helpful. Do we have unit test coverage for each of the ValueErrors here?
Looks like we do! 🎉
@ctduffy wow, great job on this!! The unit tests you added are 💯 👍 And the validation logic is simple and helpful!
I left a few comments. The only ones I think are blocking are the one about the unique check we're removing, and the one about moving get_default_objective to private/inside AutoMLSearch.
No prob :)
@ctduffy yeah good to go! Left one more cleanup comment
evalml/automl/automl_search.py
Outdated
    _MAX_NAME_LEN = 40

    # Necessary for "Plotting" documentation, since Sphinx does not work well with instance attributes.
    plot = PipelineSearchPlots

    _DEFAULT_OBJECTIVES = {'binary': get_objective('log_loss_binary'),
                           'multiclass': get_objective('log_loss_multi'),
                           'regression': get_objective('r2')}
No need to call get_objective in this dict since we do so already in __init__
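The suggested simplification is to store plain objective names in the class-level defaults and resolve them once at construction time. A minimal sketch under that assumption (the resolver is passed in here as a placeholder for the project's get_objective; names are illustrative):

```python
# Sketch: keep only names in the defaults dict and resolve them in
# __init__ via a get_objective-style lookup, instead of calling the
# resolver at class-definition time.
_DEFAULT_OBJECTIVES = {'binary': 'log_loss_binary',
                       'multiclass': 'log_loss_multi',
                       'regression': 'r2'}


def resolve_default_objective(problem_type, get_objective):
    # get_objective is assumed to map an objective name to an instance.
    return get_objective(_DEFAULT_OBJECTIVES[problem_type])


print(resolve_default_objective('regression', lambda name: name.upper()))  # R2
```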
Fixes #825; combines AutoRegressionSearch and AutoClassificationSearch into one call with an argument that dictates the problem type.
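The shape of the unified entry point described above can be sketched as follows. This is an illustrative stand-in class, not the released evalml API; the constructor arguments shown are assumptions based on the PR description.

```python
# Hypothetical sketch of the unified entry point: one AutoMLSearch
# constructor with a problem_type argument replaces the separate
# AutoRegressionSearch and AutoClassificationSearch classes.
class AutoMLSearch:
    _VALID_PROBLEM_TYPES = ('binary', 'multiclass', 'regression')

    def __init__(self, problem_type):
        if problem_type not in self._VALID_PROBLEM_TYPES:
            raise ValueError("Unknown problem_type: {}".format(problem_type))
        self.problem_type = problem_type


automl = AutoMLSearch(problem_type='binary')
print(automl.problem_type)  # binary
```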