Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AutoMLSearch API update #871

Merged
merged 26 commits into from
Jun 26, 2020
Merged

AutoMLSearch API update #871

merged 26 commits into from
Jun 26, 2020

Conversation

ctduffy
Copy link
Contributor

@ctduffy ctduffy commented Jun 22, 2020

fixes #825, combines AutoRegressionSearch and AutoClassificationSearch into one call with an argument that dictates the problem type.

@codecov
Copy link

codecov bot commented Jun 22, 2020

Codecov Report

Merging #871 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #871      +/-   ##
==========================================
+ Coverage   99.75%   99.76%   +0.01%     
==========================================
  Files         195      193       -2     
  Lines        8585     8603      +18     
==========================================
+ Hits         8564     8583      +19     
+ Misses         21       20       -1     
Impacted Files Coverage Δ
evalml/automl/pipeline_search_plots.py 100.00% <ø> (ø)
...ts/estimators/classifiers/elasticnet_classifier.py 100.00% <ø> (ø)
...ents/estimators/classifiers/logistic_regression.py 100.00% <ø> (ø)
evalml/__init__.py 100.00% <100.00%> (ø)
evalml/automl/__init__.py 100.00% <100.00%> (ø)
evalml/automl/automl_search.py 98.32% <100.00%> (ø)
evalml/tests/automl_tests/test_automl.py 100.00% <100.00%> (ø)
.../automl_tests/test_automl_search_classification.py 100.00% <100.00%> (ø)
...ests/automl_tests/test_automl_search_regression.py 100.00% <100.00%> (ø)
...alml/tests/objective_tests/test_fraud_detection.py 100.00% <100.00%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52e0959...34958b5. Read the comment docs.

@ctduffy ctduffy marked this pull request as ready for review June 24, 2020 14:48
@ctduffy ctduffy requested review from dsherry, angela97lin and jeremyliweishih and removed request for angela97lin June 24, 2020 14:51
Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctduffy this is awesome! I left a bunch of comments. LMK if you wanna talk about any of it.

I like the new tests you added. May have suggestions for more in the next round.

docs/source/automl/search_results.ipynb Show resolved Hide resolved
docs/source/changelog.rst Outdated Show resolved Hide resolved
docs/source/demos/fraud.ipynb Outdated Show resolved Hide resolved
docs/source/demos/fraud.ipynb Outdated Show resolved Hide resolved
evalml/automl/__init__.py Outdated Show resolved Hide resolved
evalml/automl/auto_ml_search.py Outdated Show resolved Hide resolved
evalml/automl/auto_ml_search.py Outdated Show resolved Hide resolved
evalml/automl/auto_ml_search.py Outdated Show resolved Hide resolved
evalml/automl/auto_search_base.py Outdated Show resolved Hide resolved
evalml/automl/auto_search_base.py Outdated Show resolved Hide resolved
docs/source/changelog.rst Outdated Show resolved Hide resolved
docs/source/automl/pipeline_search.ipynb Outdated Show resolved Hide resolved
@ctduffy ctduffy requested a review from dsherry June 25, 2020 19:04
@dsherry
Copy link
Contributor

dsherry commented Jun 25, 2020

@clara reminder to please connect this with the parent issue and move to Review when in review! :)

@ctduffy
Copy link
Contributor Author

ctduffy commented Jun 26, 2020

@dsherry sorry about that! I thought I had connected it to the issue, but I suppose I was mistaken. I will be more diligent with updating the Zenhub status in the future. It should be ready for more review now.

else:
self.cv = StratifiedKFold(n_splits=3, random_state=random_state, shuffle=True)
else:
self.cv = cv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eccabay heads up this will cause a conflict in your PR #877 , but I think you can just delete it in favor of your changes to move CV to search :)

@@ -327,14 +393,16 @@ def _check_stopping_condition(self, start):
return False
return should_continue

def _check_multiclass(self, y):
if y.nunique() <= 2:
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of removing this check in favor of our DataChecks API. But before we do let's confirm that our DefaultDataChecks will catch the case when there's 1 unique value. @ctduffy could you please make sure that is currently true and that we have unit test coverage for that? It'd be ok to do so in a separate PR if changes are needed. @angela97lin would have context on this too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsherry here, do you mean the case where, for example, all of the y label values are all one single value (say, 0), to check if there will be an error when search is called on that (say, in a binary search)? If yes, I tried this and there is a strange error that occurs that comes from outside of evalml. Just want to clarify that this is what you mean before I put up an issue for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctduffy , yep, the case here is: what happens if the target contains only a single unique value, i.e. np.array([42, 42, 42, 42]) ? I'd want us to add/augment a DataCheck to raise a good error for this case.

+1 for filing as a separate issue and doing it in a separate PR. But we should aim to get the fix for that in before releasing. Please add the issue to the June milestone.

The fix won't be too hard! We just need to figure out if we wanna add a new data check to our defaults, or update an existing one. We do have a target checker in place. We can discuss on the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsherry Cool, just created this as issue #889.
I think now that this has been created, all of your comments have been addressed, and this is likely ready to merge. I will go through to make sure all of your recent comments have been addressed, but let me know if you see anything or think its good to go


for pipeline in self.allowed_pipelines:
if not pipeline.problem_type == self.problem_type:
raise ValueError("Given pipeline {} is not compatible with problem_type {}.".format(pipeline.name, self.problem_type.value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks, these checks are helpful. Do we have unit test coverage for each of the ValueErrors here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we do! 🎉

evalml/objectives/utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctduffy wow, great job on this!! The unit tests you added are 💯 👍 And the validation logic is simple and helpful!

I left a few comments. The only ones I think are blocking are the one about the unique check we're removing, moving get_default_objective to private/inside AutoMLSearch.

@dsherry
Copy link
Contributor

dsherry commented Jun 26, 2020

@dsherry sorry about that! I thought I had connected it to the issue, but I suppose I was mistaken. I will be more diligent with updating the Zenhub status in the future. It should be ready for more review now.

No prob :)

Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctduffy yeah good to go! Left one more cleanup comment

_MAX_NAME_LEN = 40

# Necessary for "Plotting" documentation, since Sphinx does not work well with instance attributes.
plot = PipelineSearchPlots

_DEFAULT_OBJECTIVES = {'binary': get_objective('log_loss_binary'),
'multiclass': get_objective('log_loss_multi'),
'regression': get_objective('r2')}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to call get_objective in this dict since we do so already in init

@ctduffy ctduffy merged commit 1e2fcb5 into master Jun 26, 2020
@ctduffy ctduffy deleted the 825-autosearchml branch June 26, 2020 17:45
@angela97lin angela97lin mentioned this pull request Jun 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update automl search API: AutoMLSearch class
4 participants