Update automl search API: AutoMLSearch class #825

SydneyAyx · 2020-06-01T20:20:19Z

Instantiating AutoClassificationSearch() without specifying multiclass=True and without providing an objective results in an error when search is called on multiclass data.

ValueError                                Traceback (most recent call last)
<ipython-input-23-5d3db2adac13> in <module>
      1 automl = AutoClassificationSearch()
----> 2 automl.search(X, y)

~\AppData\Local\Continuum\anaconda3\envs\evalml\lib\site-packages\evalml\automl\auto_base.py in search(self, X, y, feature_types, raise_errors, show_iteration_plot)
    135 
    136         if self.problem_type != ProblemTypes.REGRESSION:
--> 137             self._check_multiclass(y)
    138 
    139         logger.log_title("Beginning pipeline search")

~\AppData\Local\Continuum\anaconda3\envs\evalml\lib\site-packages\evalml\automl\auto_base.py in _check_multiclass(self, y)
    230             return
    231         if self.objective.problem_type != ProblemTypes.MULTICLASS:
--> 232             raise ValueError("Given objective {} is not compatible with a multiclass problem.".format(self.objective.name))
    233         for obj in self.additional_objectives:
    234             if obj.problem_type != ProblemTypes.MULTICLASS:

ValueError: Given objective Log Loss Binary is not compatible with a multiclass problem.

Code to reproduce:

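import pandas as pd
from evalml import AutoClassificationSearch

# The iris target has more than two classes, so this is a multiclass problem
data = pd.read_csv("./iris.csv")
target = "class"
X = data.drop([target], axis=1)
y = data[target]
# No multiclass=True and no objective: the default objective is Log Loss Binary
automl = AutoClassificationSearch()
automl.search(X, y)  # raises the ValueError shown above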

The error message is clear and this is super easy to work around, but it is probably not the expected behavior for a user trying to use defaults and auto-model in easy mode.

dsherry · 2020-06-05T21:45:19Z

Thanks @SydneyAyx ! This is great feedback to have. I agree this is nonintuitive and that we can improve it.

The reason this usage triggers an error is that the provided data is multiclass but AutoClassificationSearch wasn't given the multiclass=True option, so it fell back to a binary default objective.
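To make the failure mode concrete, here is a stand-alone sketch that mimics the check visible in the traceback. It does not import evalml: the `ProblemTypes` enum, the `Objective` dataclass and the `check_multiclass` function are hypothetical stand-ins, and the early-return condition (two or fewer distinct target values) is an assumption, since the traceback doesn't show that line.

```python
from dataclasses import dataclass
from enum import Enum


class ProblemTypes(Enum):
    BINARY = "binary"
    MULTICLASS = "multiclass"


@dataclass
class Objective:
    # Hypothetical stand-in for an objective class
    name: str
    problem_type: ProblemTypes


def check_multiclass(y, objective):
    """Raise if the target is multiclass but the objective is not."""
    # Assumption: a target with at most two distinct values is not multiclass
    if len(set(y)) <= 2:
        return
    if objective.problem_type != ProblemTypes.MULTICLASS:
        raise ValueError(
            "Given objective {} is not compatible with a multiclass problem.".format(objective.name)
        )


default_objective = Objective("Log Loss Binary", ProblemTypes.BINARY)

# Binary target with the binary default: passes silently
check_multiclass([0, 1, 1, 0], default_objective)

# Three-class target with the binary default: raises
try:
    check_multiclass(["setosa", "versicolor", "virginica"], default_objective)
except ValueError as err:
    message = str(err)
```

The point of the sketch is that the default objective is fixed before the target is ever seen, so the mismatch can only surface once search receives y.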

The objective and problem type are set in AutoClassificationSearch.__init__ and AutoRegressionSearch.__init__. We don't use the objective directly until search, although it does appear in __str__. We need the problem_type in AutoSearchBase.__init__ so that we can compute self.allowed_pipelines.

Options which come to mind:

1. Define AutoMulticlassClassificationSearch and AutoBinaryClassificationSearch instead of having the multiclass flag. This would line up well with how we're organizing our pipelines and objectives.
2. Delete the multiclass flag and infer whether a problem is multiclass vs binary from the provided target. @SydneyAyx provided an example of this in Automl: infer problem type from target data #826
3. Move the multiclass flag and the computation of self.allowed_pipelines into search.
4. Do nothing.

I'm split between options 1 and 2. I don't feel great about options 3 or 4.
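For reference, the inference idea (deleting the flag and reading the problem type off the target) could look roughly like the sketch below. The helper name `infer_problem_type` and the string return values are hypothetical, not an existing evalml API, and this isn't the example from #826. It only counts distinct values and doesn't handle missing values.

```python
def infer_problem_type(y):
    """Guess "binary" or "multiclass" from the distinct values in a classification target."""
    n_classes = len(set(y))
    if n_classes < 2:
        raise ValueError(
            "Target must contain at least two classes, found {}.".format(n_classes)
        )
    return "binary" if n_classes == 2 else "multiclass"


# Works on any iterable of hashable labels, e.g. a list or a pandas Series
binary_guess = infer_problem_type([0, 1, 1, 0])
multiclass_guess = infer_problem_type(["setosa", "versicolor", "virginica"])
```

A caveat with this approach: it assumes the problem is already known to be classification. A numeric target with many distinct values could be either multiclass or regression, and counting values alone can't tell them apart.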

@kmax12 what do you think?

dsherry · 2020-06-12T15:52:04Z

Looking at this and #826 again, here's what I'd like us to do:

Go with option 1 from the list above. Define AutoMulticlassClassificationSearch and AutoBinaryClassificationSearch instead of having the multiclass flag. This would line up well with how we're organizing our pipelines and objectives. And then delete AutoClassificationSearch
After this issue is merged, we can use Automl: infer problem type from target data #826 to think about adding a helper method to infer the problem type from the target data. But I think for now, it's best if users determine the problem type in advance. I'll update that issue to match.

kmax12 · 2020-06-12T19:41:52Z

rather than have 3 different classes with such long names, what if we had one class with a required problem_type argument?

# could use enum instead, but i bet most users wouldn't
AutoMLSearch(problem_type="regression")
AutoMLSearch(problem_type="binary")
AutoMLSearch(problem_type="multiclass")

if a user has to look up to know the name of the complicated class, they can look up the parameter.

I also think this structure better presents what is going on. The searches are more similar than different, which was part of the motivation for lumping binary and multiclass together in the first place.
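to make the shape of this concrete, here's a rough stand-alone sketch of one class with a required problem_type that takes either a string or an enum member. Everything in it is hypothetical: `AutoMLSearchSketch` and this `ProblemTypes` enum are illustration only, not evalml's implementation, and the sketch does no searching at all.

```python
from enum import Enum


class ProblemTypes(Enum):
    REGRESSION = "regression"
    BINARY = "binary"
    MULTICLASS = "multiclass"


class AutoMLSearchSketch:
    """Hypothetical single entry point with a required problem_type."""

    def __init__(self, problem_type):
        # Accept either an enum member or its string value
        if isinstance(problem_type, str):
            try:
                problem_type = ProblemTypes(problem_type.lower())
            except ValueError:
                valid = ", ".join(pt.value for pt in ProblemTypes)
                raise ValueError(
                    "Unknown problem_type {!r}; expected one of: {}".format(problem_type, valid)
                ) from None
        elif not isinstance(problem_type, ProblemTypes):
            raise TypeError("problem_type must be a str or a ProblemTypes member")
        self.problem_type = problem_type


from_string = AutoMLSearchSketch(problem_type="multiclass")
from_enum = AutoMLSearchSketch(problem_type=ProblemTypes.BINARY)
```

because problem_type has no default, leaving it out fails at construction time with a TypeError instead of surfacing later inside search.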

this also sets us up better in the future if we don't want to make problem type required. The dynamic pipeline in #841 will make this change easier, since we don't have to determine the pipelines at init any more.

since we're trying to tackle this this month, lmk if talking live would be better

dsherry · 2020-06-12T22:13:09Z

That's a cool idea @kmax12 . That could be a nice simplification over what we have now. Yeah, since this API is the first thing most users will see, let's take some time and talk it over. I just sent you and @ctduffy an invite for Tues afternoon.

Worth noting that if the scope creeps on this, we may wanna get a short-term fix in for June and file the API update as a separate issue.

dsherry · 2020-06-16T19:26:48Z

@ctduffy, @kmax12 and I just met to discuss. Here are our notes.

Next steps

@ctduffy write a design doc, goal is to have a draft out EOD tomorrow (2020/06/17 Weds)
Confirm implementation plan and who will implement it
- We should aim to have steps 1 and 2 (tracked by this issue) in for the June release, at minimum.
- We can do steps 3 (Automl: infer problem type from target data #826) and 4 (more validation) in the future if needed.
- Theoretically, someone could do the implementation for step 3 in parallel, meaning we could get it in for the release too. So Clara could do 1+2, and Dylan could do 3, for example
- We have 1.5 weeks to make this happen.
- Question to be resolved: will @ctduffy have enough bandwidth to meet the release deadline while also working on notebooks with @gsheni ?

dsherry · 2020-06-19T19:48:23Z

@ctduffy and I synced an hour ago. The design doc is done! Next, @ctduffy is going to make an epic for this and #826, and we'll get this issue done for the June milestone and the rest for July.

We estimated this issue will take 6 days to complete. So we have just enough time to get it done before the June release on Tues the 30th.

SydneyAyx added the "bug" (Issues tracking problems with existing features.) and "good first issue" (Issues which would be a good starting point for new hires.) labels Jun 1, 2020

dsherry removed the "good first issue" (Issues which would be a good starting point for new hires.) label Jun 5, 2020

dsherry mentioned this issue Jun 5, 2020

Automl: infer problem type from target data #826

Closed

ctduffy self-assigned this Jun 12, 2020

dsherry added this to the June 2020 milestone Jun 12, 2020

ctduffy mentioned this issue Jun 16, 2020

Autoclassification divided into Binary and Multiclass #857

Closed

dsherry mentioned this issue Jun 19, 2020

AutoML Search API update #866

Closed

ctduffy mentioned this issue Jun 22, 2020

AutoMLSearch API update #871

Merged

dsherry changed the title ~~Default Objective in AutoClassificationSearch Fails for Multiclass~~ Update automl search API: AutoMLSearch class Jun 26, 2020

dsherry added the "enhancement" (An improvement to an existing feature.) label and removed the "bug" (Issues tracking problems with existing features.) label Jun 26, 2020

ctduffy closed this as completed in #871 Jun 26, 2020
