New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1570 AutoML: pass number of CV folds to default data checks #1619
Conversation
@@ -328,7 +328,7 @@ def _validate_data_checks(self, data_checks): | |||
return AutoMLDataChecks(data_checks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this PR is for the DefaultDataChecks, I spent a while trying to figure out how we might pass it to DataChecks or AutoMLDataChecks. It becomes a bigger issue in which we have to check and see if ClassImbalanceDataCheck is one of the Data Checks, then determine what parameters were passed by the user, and if num_cv_folds
is one of them. Then overwriting that to reflect self.data_splitter.get_n_splits()
. There's probably an easier way I'm sure :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to only make the change for DefaultDataChecks
! If the user is providing their own data checks, it's assumed they init them with the appropriate parameters!
# Conflicts: # evalml/automl/automl_search.py # evalml/tests/automl_tests/test_automl.py
# Conflicts: # evalml/automl/automl_search.py # evalml/tests/automl_tests/test_automl.py
Codecov Report
@@ Coverage Diff @@
## main #1619 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 240 240
Lines 18247 18255 +8
=========================================
+ Hits 18239 18247 +8
Misses 8 8
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment about testing: We want to confirm that the data check got the appropriate parameter!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ParthivNaresh Looks good! I just have a small suggestion on how to improve the unit test. Implementation looks great :)
@@ -328,7 +328,7 @@ def _validate_data_checks(self, data_checks): | |||
return AutoMLDataChecks(data_checks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to only make the change for DefaultDataChecks
! If the user is providing their own data checks, it's assumed they init them with the appropriate parameters!
@@ -412,6 +413,15 @@ def test_automl_bad_data_check_parameter_type(): | |||
automl.search(data_checks=[MockDataCheckErrorAndWarning]) | |||
|
|||
|
|||
def test_validate_data_check_n_splits(X_y_multi): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a better way of testing this is to run search(data_checks='auto')
on a dataset where one of the classes only has 7 observations.
That way we don't have to call the private _validate_data_checks
or rely on the fact that the ClassImbalanceDataCheck
is the last one in the list or that cv_folds
actually corresponds to 2 * n_splits
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I got it, so you're recommending I validate that the parameter is being passed by triggering the ClassImbalanceDataCheck error, nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet, thanks for updating this! LGTM
Pull Request Description
Fixes #1570