Support parameterization of data checks; have InvalidTargetDataCheck validate target using problem_type #931
Comments
@angela97lin could you please describe the use-case for this? |
@dsherry Sure! In #929, @freddyaboulton and I were discussing how it'd be nice if the Alternatively, we could create data check classes for each problem type, such as BinaryClassificationInvalidTargetDataCheck but this could get pretty hairy too, when determining what DefaultDataChecks should include (or should this too be broken down to DefaultBinaryClassificationDataChecks?) |
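To make the trade-off concrete, here is a minimal sketch of a single target check that takes `problem_type` as a parameter instead of needing one class per problem type. `TargetCheckSketch`, its messages, and the `"binary"` / `"regression"` strings are hypothetical stand-ins for illustration, not evalml's actual `InvalidTargetDataCheck`.

```python
# Hypothetical stand-in for illustration; not evalml's InvalidTargetDataCheck.
class TargetCheckSketch:
    name = "Target Datatype Data Check"

    def __init__(self, problem_type):
        # The problem type is supplied at construction time, so one class
        # can cover every problem type.
        self.problem_type = problem_type

    def validate(self, y):
        """Return a list of error messages for the target values `y`."""
        messages = []
        if any(value is None for value in y):
            messages.append("Target contains missing values")
        n_unique = len({value for value in y if value is not None})
        # Only binary problems constrain the number of unique target values here.
        if self.problem_type == "binary" and n_unique != 2:
            messages.append(
                f"Binary target must have exactly 2 unique values, found {n_unique}"
            )
        return messages


binary_errors = TargetCheckSketch("binary").validate([0, 1, 2, 1])
regression_errors = TargetCheckSketch("regression").validate([0, 1, 2, 1])
```

With this shape, the same target passes for a regression problem but is flagged for a binary one, without a separate `BinaryClassificationInvalidTargetDataCheck`.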
Just discussed with @angela97lin @freddyaboulton We like the idea of mirroring the pattern we use for
Here's a sketch of how this could look in automl search:

    # today this helper standardizes the input to a list of `DataCheck` instances, and wraps that in a `DataChecks` instance
    # after this work, this would standardize the input to a `DataChecks` class.
    # if `data_checks` was already a `DataChecks` class, do nothing. else if `data_checks` is a list of `DataCheck` classes, define a `AutoMLDataChecks` class to wrap and return that
    data_checks_class = self._validate_data_checks(data_checks)
    # next we create the `DataChecks` instance by passing in data checks parameters
    data_check_parameters = {'Target Datatype Data Check': {'problem_type': self.problem_type}}
    data_checks = data_checks_class(data_check_parameters)
    data_check_results = data_checks.validate(X, y)

Direct usage would look similar.

Next steps
|
@dsherry The plan looks good to me! The only thing I would add is that I prefer to augment the already existing
|
You know what, @angela97lin @freddyaboulton let's use this issue to track both a) updating automl and the data checks API to support parameterization and b) updating Mentioning because I just filed a bug #970 and on closer look the issue would be fixed by the above. So this will close #970. |
@dsherry How timely! That sounds good to me 😊 |
Fixes #970.
Per the discussion with @freddyaboulton in #929, it would be nice if we could pass along extra information to DataChecks. This would require updating the DataCheck API and considering how it interacts with AutoML, since we do not instantiate an instance of DataChecks and only pass along a DataChecks class as a parameter to search().
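As a rough illustration of the request, the sketch below shows one way a `DataChecks`-style class could be passed around uninstantiated and later built from a parameters dictionary keyed by check name. Everything here (`DataChecksSketch`, `TargetCheck`, `make_data_checks_class`, the message text) is a hypothetical stand-in written for this example and assumes nothing about evalml's real classes beyond the names used in this thread.

```python
# Hypothetical stand-ins for illustration; not evalml's actual API.
class TargetCheck:
    name = "Target Datatype Data Check"

    def __init__(self, problem_type=None):
        self.problem_type = problem_type

    def validate(self, X, y):
        if self.problem_type == "binary" and len(set(y)) != 2:
            return ["Binary target must have exactly 2 unique values"]
        return []


class DataChecksSketch:
    # Subclasses list the check classes they bundle.
    data_check_classes = []

    def __init__(self, data_check_parameters=None):
        params = data_check_parameters or {}
        # Each check is instantiated with the parameters filed under its name.
        self.data_checks = [
            check_class(**params.get(check_class.name, {}))
            for check_class in self.data_check_classes
        ]

    def validate(self, X, y):
        messages = []
        for check in self.data_checks:
            messages.extend(check.validate(X, y))
        return messages


def make_data_checks_class(data_checks):
    """Return a DataChecksSketch subclass, wrapping a list of check classes if needed."""
    if isinstance(data_checks, type) and issubclass(data_checks, DataChecksSketch):
        return data_checks

    class AutoMLDataChecks(DataChecksSketch):
        data_check_classes = list(data_checks)

    return AutoMLDataChecks


data_checks_class = make_data_checks_class([TargetCheck])
data_checks = data_checks_class(
    {"Target Datatype Data Check": {"problem_type": "binary"}}
)
results = data_checks.validate(X=[[1], [2], [3]], y=[0, 1, 2])
```

Because the caller only hands over a class, the search code stays free to inject values it alone knows, such as the problem type, at instantiation time.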