-
Notifications
You must be signed in to change notification settings - Fork 89
Delete data check code from AutoML search #1935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1935 +/- ##
=========================================
- Coverage 100.0% 100.0% -0.0%
=========================================
Files 273 273
Lines 22406 22288 -118
=========================================
- Hits 22400 22282 -118
Misses 6 6
Continue to review full report at Codecov.
|
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### High Variance Cross-Validation Scores\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, forgot to delete this before!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin The code changes look great! I think we should synch about whether or not we should delete the data checks section from the automl user guide before merging!
@@ -142,17 +142,6 @@ | |||
"print(regression_objective.name)" | |||
] | |||
}, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we should mention data checks in the automl user guide.
If I'm following the intention of the data check actions epic, we're moving from "AutoML will check your data for you" to "Evalml has functionality to check your data and recommend fixes before you start AutoML".
I think it's important to document the relationship between data checks and automl and that it makes sense to document this relationship here.
So I think we should:
- Add back the data checks section (changing it a bit to make it clear that
AutoMLSearch
is not running the checks) - Import
DefaultDataChecks
and run on the data. - Say that the results of the data checks don't reveal anything that would cause problems so we can run search.
- Link to the data checks user guide page for more information
Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I like how this sounds... my only qualm about this is it would make sense for me that the AutoML user guide currently focuses on the functionality within AutoMLSearch (we currently list out error callbacks, passing in custom pipelines, etc.). This seems more like a flow of how the user should use EvalML.
I guess TLDR is: for what our docs currently look like, I like this change. Maybe this also means we should separate out our docs and not shove everything into the AutoML guide; we could have a intro user guide that steps through the steps a user would take, and then a dedicated AutoML guide that deep-dives into AutoMLSearch specifics.
Will update this, thank you for the great feedback @freddyaboulton!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Agree this should be a breaking change and should discuss what happens with automl.ipynb
Addressed requested changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin Looks great! This change add two uncovered lines to our repo.
Can you please add coverage in test_data_checks.py
It should be quick but we can also handle that in a separate issue.
In my opinion, only the first uncovered line is important: raise ValueError(f"Parameter data_checks must be a list. Received {type(data_check_classes).__name__}.")
. We're probably going to get rid of the AutoMLDataChecks
class (I can file an issue) since AutoMLSearch no longer knows about data checks so the second uncovered line is not high priority.
"source": [ | ||
"### Data Checks\n", | ||
"\n", | ||
"Before calling `AutoMLSearch.search`, we should run some sanity checks on our data to ensure that the input data being passed will not run into some common issues before running a potentially time-consuming search. EvalML has various data checks that makes this easy. Each data check will return a collection of warnings and errors if it detects potential issues with the input data. This allows users to inspect their data to avoid confusing errors that may arise during the search process. You can learn about each of the data checks available through our [data checks guide](data_checks.ipynb) \n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton Thanks for catching this! Sucks that codecov let me slip this by but I added a test and just decided to remove |
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Since there were no warnings or errors returned, we can safely continue with the search process." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Gonna be really cool when we can replace this with the data check actions :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreeeeed 🥳
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
Closes #1882