-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update AutoBase with data_checks
parameter
#765
Conversation
Codecov Report
@@ Coverage Diff @@
## master #765 +/- ##
=======================================
Coverage 99.51% 99.51%
=======================================
Files 150 150
Lines 5781 5781
=======================================
Hits 5753 5753
Misses 28 28 Continue to review full report at Codecov.
|
this may be known, but wanted to chime in case it slipped through. I was looking at the docs and it appears the text doesn't match the code output. it says Beyond that, do we really want the the search to quit if the data check simply raises a warning? |
@kmax12 Yup, thanks for chiming in! I think when @dsherry reviewed, we came to the conclusion that it'd be better--now that we have some basic logger functionality in--to just warn for warnings and raise an error if there are any errors. This differs from our previous design, which was to raise a ValueError if any warnings or errors occurred, so I still need to go back and update the examples to use data that won't error out during the search! |
Yeah @kmax12 when I saw the code which said no tests will run if there's any warnings, I got cold feet, lol. I think part of the value of distinguishing between errors and warnings is that an error feels serious enough to interrupt the search, but a warning should just be surfaced to the user. As @angela97lin said we can always change our minds on this later without any breaking changes when we have user feedback. |
Thanks for the comments--I've addressed all of them, but would love if you guys could take one last pass before I merge it in. Here's the new doc I added in this PR: https://evalml.featurelabs.com/en/709_autobase/guardrails/data_checks.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything looks good, but 2 suggestions
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"If you'd prefer not to run any data checks before running search, you can provide an `EmptyDataChecks` instance to `search()` instead." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if you could disable data checks with data_checks=False
? Seem a bit weird you have to import a class to turn something off. and then just get rid of EmptyDataChecks
from our codebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kmax12 Yeah, I agree with this sentiment that it's weird to need to import to disable haha @dsherry and I discussed this previously and thought that it'd be best to introduce something like EmptyDataChecks
because its the most consistent (always either None / DataChecks object), rather than introducing bool/DataCheck object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the other option could to to default to data_checks=True
and then pass in data_checks=None
to disable. I'm inclined to go this way because it's less ambiguous. It's not intuitive that data_checks=None
would give you the default datachecks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's okay to have mixed types with the bool for clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm so are you suggesting:
data_checks = True
--> will runDefaultDataChecks
data_checks = SomeDataChecksClass
--> will runSomeDataChecksClass
data_checks = None
--> will not run anything at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, that is my suggestion. one amendment would be False
would also not run any thing.
before we finalize, let's see what @dsherry thinks since I know you two already spoke
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree there's probably a way to make this API easier to use. I'm not sure how I feel about mixing bool/None/class in one arg. If we were going to do that, we could also use string params instead of a bool, i.e. 'disabled' or 'default' or something.
The current interface satisfies the requirements, so let's merge this PR and discuss separately, perhaps in the epic #49
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed as #799 let's continue talking there
@@ -136,7 +141,7 @@ def _get_funct_name(function): | |||
|
|||
return search_desc + rankings_desc | |||
|
|||
def search(self, X, y, feature_types=None, raise_errors=True, show_iteration_plot=True): | |||
def search(self, X, y, data_checks=None, feature_types=None, raise_errors=True, show_iteration_plot=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning to this after our discussion with max on defaults, I see what he means about this being potentially misleading. Having None
as the default implies to me that nothing will happen, haha. We have this problem elsewhere too (with the tuner class after #793 ). Perhaps the way around this is to have the default be 'default'. Although at that point, it could simply be DefaultDataChecks
too.
I'll file it separately and let's get this PR merged :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ready to go to me! I answered a couple comments and left two testing suggestions. Also filed #799 for separate discussion. I say 🚢 ! 🥳
Closes #709
See https://evalml.featurelabs.com/en/709_autobase/guardrails/data_checks.html for updated docs.