Update validate() API #3142
…x/evalml into 3116_add_data_check_action_option
Codecov Report
@@                     Coverage Diff                      @@
##         update_data_check_action_API   #3142    +/-    ##
============================================================
  Coverage                        99.7%    99.7%
============================================================
  Files                             324      324
  Lines                           31232    31232
============================================================
  Hits                            31128    31128
  Misses                            104      104
ParthivNaresh left a comment
Changes look great! I think going from data_check_output["actions"] to data_check_output["actions"]["action_list"] is one more reason we need a cleaner way, for both us and users, to access warnings, errors, and actions from data checks, but that's another issue entirely. Great work!
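For code that indexes into the output directly, the change described here looks roughly like the sketch below. The helper `get_action_list` is hypothetical (it is not part of evalml), and the action dict is a trimmed, illustrative value modeled on the IMPUTE_COL doctest in this PR; the sketch only assumes the two output shapes quoted in this PR.

```python
from typing import Any, Dict, List


def get_action_list(data_check_output: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the actions from a data check output in either shape.

    Hypothetical helper (not part of evalml). It accepts the old shape,
    {"actions": [...]}, and the new shape from this PR,
    {"actions": {"action_list": [...], "default_action": None}}.
    """
    actions = data_check_output.get("actions", [])
    if isinstance(actions, dict):
        # New shape: the list moved under the "action_list" key.
        return actions.get("action_list", [])
    # Old shape: "actions" is already the list.
    return actions


# Trimmed, illustrative action modeled on the IMPUTE_COL doctest in this PR.
impute_action = {"code": "IMPUTE_COL", "metadata": {"is_target": True, "impute_strategy": "mean"}}

old_output = {"warnings": [], "errors": [], "actions": [impute_action]}
new_output = {
    "warnings": [],
    "errors": [],
    "actions": {"action_list": [impute_action], "default_action": None},
}

# Both shapes yield the same list of actions.
assert get_action_list(old_output) == get_action_list(new_output) == [impute_action]
```

A shape-tolerant accessor like this is one way to soften the extra level of indexing while the output format is still changing.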
- def test_n_splits_passed_to_ts_splitting_data_check(ts_data):
+ def test_n_splits_passed_to_ts_splitting_data_check():
bchen1116 left a comment
Left a nit that applies to multiple files, but otherwise LGTM!
Do we have a plan for filling in the default_action values? Currently they're all None, and I'm curious 1) when their values are expected to change and 2) what values they'll hold. They don't seem very useful as of now.
  ... }],
  ... "warnings": [],
- ... "actions": []}
+ ... "actions": {"action_list":[], "default_action": None}}
Big nit: sometimes there's a space between "action_list": and [] and sometimes there isn't. Can we standardize on having the space? This occurs in multiple files.
sobs in wishing there were a doc linter
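A doc linter for the spacing nit does not have to be elaborate. As a rough, hypothetical sketch (not an existing evalml, flake8, or doctest feature), a regular expression can flag doctest lines where a quoted key is followed by a colon with no space, which is the "action_list":[] inconsistency raised in review. Being regex-based, it can miss or misflag unusual string contents.

```python
import re
from typing import Iterator, Tuple

# A quoted key immediately followed by ":" and a non-space character,
# e.g. "action_list":[] (the inconsistency raised in review).
MISSING_SPACE = re.compile(r"""(["'])[^"']*\1:(?=\S)""")


def find_missing_spaces(docstring: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for doctest lines containing `"key":value`."""
    for number, line in enumerate(docstring.splitlines(), start=1):
        stripped = line.strip()
        # Only lines starting with a doctest prompt (>>> or ...) are checked.
        if stripped.startswith((">>>", "...")) and MISSING_SPACE.search(stripped):
            yield number, stripped


SAMPLE = '''
>>> assert check.validate(X, y) == {
...     "warnings": [],
...     "actions": {"action_list":[], "default_action": None}}
'''

for number, line in find_missing_spaces(SAMPLE):
    print(f"line {number}: missing space after colon: {line}")
```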
eccabay left a comment
LGTM, just left some style nitpicks!
Out of curiosity, is there currently a plan laid out for passing useful information through the "default_action" entries?
- ...
+ ...
Are these necessary? I'm not sure I understand the details of doctest syntax.
Don't think so, removing!
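Assuming the lines in question are bare `...` continuation lines at the end of a multi-line statement, a small experiment with Python's standard `doctest` module suggests they are harmless but unnecessary: the parser turns a bare `...` into an empty source line, so the parsed example is identical with or without it. The two sample docstrings below are illustrative, not taken from evalml.

```python
import doctest

# Two versions of the same example: one ends the multi-line statement
# with an extra bare "..." line, the other does not.
WITH_BARE_CONTINUATION = """
>>> result = {"warnings": [],
...           "errors": []}
...
>>> result["errors"]
[]
"""

WITHOUT_BARE_CONTINUATION = """
>>> result = {"warnings": [],
...           "errors": []}
>>> result["errors"]
[]
"""


def parse(text: str) -> doctest.DocTest:
    """Parse docstring-style text into a DocTest object."""
    return doctest.DocTestParser().get_doctest(text, {}, "example", None, 0)


def failures(text: str) -> int:
    """Run the examples in `text` and return the number of failures."""
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(parse(text)).failed


# The bare "..." only adds a trailing empty line, which the parser folds away.
print(parse(WITH_BARE_CONTINUATION).examples[0].source ==
      parse(WITHOUT_BARE_CONTINUATION).examples[0].source)  # True
print(failures(WITH_BARE_CONTINUATION), failures(WITHOUT_BARE_CONTINUATION))  # 0 0
```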
  ... 'pct_null_rows': 50.0},
  ... 'code': 'TARGET_HAS_NULL'}],
- ... 'actions': [{'code': 'IMPUTE_COL',
+ ... 'actions': {"action_list": [{'code': 'IMPUTE_COL',
sobs in wishing there were a doc linter
Just went through and changed everything to double quotes :))
  ... 'rows': None,
  ... 'is_target': True,
- ... 'impute_strategy': 'mean'}}]}
+ ... 'impute_strategy': 'mean'}}], "default_action": None}}
Another mega nit: can "default_action" go on a new line for readability? (The same applies in a couple of other places, but I think this is the most confusing one.)
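Because these doctests compare against the expected value with `assert ... == {...}`, moving "default_action" onto its own line (or changing quote style or spacing) only affects readability, never the result. A trimmed, illustrative check (the action is shortened from the doctest above):

```python
# Compact layout with mixed quotes, as in the doctest under review.
compact = {"warnings": [], "errors": [], "actions": {"action_list": [{'code': 'IMPUTE_COL'}], "default_action": None}}

# Same value with "default_action" on its own line and consistent quotes.
readable = {
    "warnings": [],
    "errors": [],
    "actions": {
        "action_list": [{"code": "IMPUTE_COL"}],
        "default_action": None,
    },
}

print(compact == readable)  # True
```

This holds because the comparison is on values; an example that instead relied on the printed repr of the output would be sensitive to layout.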
  >>> target_leakage_check = TargetLeakageDataCheck(pct_corr_threshold=0.8, method='pearson')
  >>> assert target_leakage_check.validate(X, y) == {
- ... 'warnings': [{'message': "Columns 'leak', 'x' are 80.0% or more correlated with the target",
+ ... "warnings": [{'message': "Columns 'leak', 'x' are 80.0% or more correlated with the target",
I love that you fixed this, but now the other quotes are inconsistent 😭
You're right, I must have done a replace-all somewhere--I'll go back and try to fix the other places 😭
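Mixed quotes left behind by a partial replace-all can be spotted mechanically. A minimal sketch, assuming the example code is complete Python (an unclosed fragment would make the tokenizer raise): Python's standard `tokenize` module reports each string literal as a STRING token, and the quote character that opens it shows which style was used. The function name `quote_styles` is illustrative only.

```python
import io
import tokenize
from typing import Set


def quote_styles(source: str) -> Set[str]:
    """Return the quote characters that open string literals in `source`.

    `source` must be complete Python code. On Python 3.12+ f-strings are
    tokenized as FSTRING_START/FSTRING_END rather than STRING, so they are
    not inspected here.
    """
    styles = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            # Drop prefixes such as r, b, u (or f on older Pythons) to reach the quote.
            literal = tok.string.lstrip("rRbBuUfF")
            styles.add(literal[0])
    return styles


mixed = """expected = {'warnings': [{'message': "Columns 'leak', 'x' are 80.0% or more correlated with the target"}]}"""
consistent = """expected = {"warnings": [{"message": "Columns 'leak', 'x' are 80.0% or more correlated with the target"}]}"""

print(sorted(quote_styles(mixed)))       # ['"', "'"]
print(sorted(quote_styles(consistent)))  # ['"']
```

Single quotes nested inside a double-quoted message (as in the target leakage warning) are part of the string's contents, not separate literals, so they do not count as an inconsistency.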
@bchen1116 @eccabay Good question about the default action! As you're probably noticing, this PR still uses DataCheckActions as the output. The next step will be to update the internals to use |
…d add functionality to suggest and take action on columns with null values (#3182) * Update `validate()` API (#3142) * init * init * start updating tests * add validation code for option * remove data check updates for now * start to clean up tests and add validate_parameter tests * revert highly null dc * add in more valueerror test checking * fix test and logic for column parameters * init * update some tests to new API * fix more tests * fix doc and sparsity test * fix integration tests * fix doctests * fix doctests * fix ts splitting test and impl * fix more ts splitting tests * fix doctest for ts data check * release notes * update target leakage data check docstring for consistency * doctest linting and cleanup * linting * fix merging main issues * Update `validate()` API to use `DataCheckActionOption` instead of `DataCheckAction` (#3152) * init * init * start updating tests * add validation code for option * remove data check updates for now * start to clean up tests and add validate_parameter tests * revert highly null dc * add in more valueerror test checking * fix test and logic for column parameters * init * update some tests to new API * fix more tests * fix doc and sparsity test * fix integration tests * fix doctests * init, update no variance * update highly null dc * fix id column dc * update target leakage dc * update sparsity dc * outliers and uniqueness dc * update target dis dc * update invalid target data check * fix doctests * fix ts splitting test and impl * fix dc validate and tests * fix more ts splitting tests * fix data check actions notebook * update to remove columns to drop * freeze * remove rows to drop * update dc tests * update doctests * fix integration tests * revert requirements * update parameters to set to empty dict * fix tests * fix doctests * retrigger * revert parameter for data check option * fix data check option test * fix more tests from updating default parameters * release notes * use empty instead of none * release 
notes * cleanup unnecessary code * move logic to data check option class and rename * update integration tests * add tests and some cleanup * fix tests * fix pipeline util test * remove unnecessary conditional * fix naming, need to fix tests * fix tests * fix doctest * add new enums * add logic for enums * update files to use enum * add in testing for invalid enum * fix doctest by updating to_dict impl * linting * try with different base * oops revert yaml * fix tests * remove outdated code * add more tests * Rename `HighlyNullDataCheck` to `NullDataCheck` and update data check to return impute action for non-highly null columns. (#3197) * init * init * start updating tests * add validation code for option * remove data check updates for now * start to clean up tests and add validate_parameter tests * revert highly null dc * add in more valueerror test checking * fix test and logic for column parameters * init * update some tests to new API * fix more tests * fix doc and sparsity test * fix integration tests * fix doctests * init, update no variance * update highly null dc * fix id column dc * update target leakage dc * update sparsity dc * outliers and uniqueness dc * update target dis dc * update invalid target data check * fix doctests * fix ts splitting test and impl * fix dc validate and tests * fix more ts splitting tests * fix data check actions notebook * update to remove columns to drop * freeze * remove rows to drop * update dc tests * update doctests * fix integration tests * revert requirements * update parameters to set to empty dict * fix tests * fix doctests * retrigger * revert parameter for data check option * fix data check option test * fix more tests from updating default parameters * release notes * use empty instead of none * release notes * cleanup unnecessary code * move logic to data check option class and rename * init rename * retrigger * revert release nots * add new logic for detecting null cols * update integration tests * add tests 
and some cleanup * fix tests * fix pipeline util test * remove unnecessary conditional * fix test and iteration for null data check * fix naming, need to fix tests * fix tests * fix doctest * add new enums * add logic for enums * update files to use enum * add in testing for invalid enum * fix doctest by updating to_dict impl * linting * try with different base * oops revert yaml * fix tests * remove outdated code * oops fix merge * add more tests * fix null data check tests * update to use per column strategy and fix tests * fix tests for data checks * fix tests and doctests * release notes * fix release notes * oops update action code * oops fix test * update wording of messages * move logic out of dict * update mode to most_frequent for impute strategies * oops fix linting and doctests * Flatten data check action ``validate`` API (#3244) * init * more cleanup * begin to clean up tests * fix more tests * updating naming to action_options and fixing more tests * fix no variance * fix another test * fixing automl tests * oops actually fix automl tests * fix the other automl tests * fixing notebook * fix data check tests * fix doctests and docs * integration tests and cleanup * cleanup based on comments * linting * clean up notebook and tests * Update `make_pipeline_from_actions` to handle null column imputation (#3237) * init * init * start updating tests * add validation code for option * remove data check updates for now * start to clean up tests and add validate_parameter tests * revert highly null dc * add in more valueerror test checking * fix test and logic for column parameters * init * update some tests to new API * fix more tests * fix doc and sparsity test * fix integration tests * fix doctests * init, update no variance * update highly null dc * fix id column dc * update target leakage dc * update sparsity dc * outliers and uniqueness dc * update target dis dc * update invalid target data check * fix doctests * fix ts splitting test and impl * fix dc 
validate and tests * fix more ts splitting tests * fix data check actions notebook * update to remove columns to drop * freeze * remove rows to drop * update dc tests * update doctests * fix integration tests * revert requirements * update parameters to set to empty dict * fix tests * fix doctests * retrigger * revert parameter for data check option * fix data check option test * fix more tests from updating default parameters * release notes * use empty instead of none * release notes * cleanup unnecessary code * move logic to data check option class and rename * init rename * retrigger * revert release nots * add new logic for detecting null cols * update integration tests * add tests and some cleanup * fix tests * fix pipeline util test * remove unnecessary conditional * fix test and iteration for null data check * fix naming, need to fix tests * fix tests * fix doctest * add new enums * add logic for enums * update files to use enum * add in testing for invalid enum * fix doctest by updating to_dict impl * linting * try with different base * oops revert yaml * fix tests * remove outdated code * oops fix merge * add more tests * fix null data check tests * update to use per column strategy and fix tests * fix tests for data checks * fix tests and doctests * release notes * fix release notes * init * init and fix testing * fix integration test * updates test * clean up docs * lint notebook * fix tests with types * linting * release notes * update wording * update impl for natural language and datetimes, remove old tests * fix tests * fix doctest * release notes * minor cleanup * update release notes * remove impute all * linting
Part of #3116. Note that this is not being merged into main but into a separate branch (update_data_check_action_API). It was separated out from #3152 to make review easier, and will be merged in afterwards to avoid multiple API changes. This PR only updates the returned "actions" key to be a dictionary that looks like
"actions": {"action_list": [], "default_action": None}
rather than a list of actions.
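The new "actions" value can be written down as a schema. The sketch below uses `typing.TypedDict` to describe it; the class name `ActionsEntry` and the migration helper `wrap_actions` are hypothetical, and the type of "default_action" is left open because this PR always sets it to None and does not specify what it will hold later.

```python
from typing import Any, Dict, List, Optional, TypedDict


class ActionsEntry(TypedDict):
    """Assumed schema for the new "actions" value described in this PR."""

    action_list: List[Dict[str, Any]]
    # Always None in this PR; its eventual type is not specified here.
    default_action: Optional[Any]


def wrap_actions(old_actions: List[Dict[str, Any]]) -> ActionsEntry:
    """Hypothetical migration helper: old list-style "actions" -> new dict."""
    return {"action_list": list(old_actions), "default_action": None}


old = {"warnings": [], "errors": [], "actions": []}
new = {**old, "actions": wrap_actions(old["actions"])}
print(new["actions"])  # {'action_list': [], 'default_action': None}
```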