Removes DateTimeNaNDataCheck and NaturalLanguageNaNDataCheck#3260

Merged
angela97lin merged 32 commits into main from 3242_remove_datetime_nl_dc on Jan 26, 2022

Conversation

@angela97lin
Contributor

@angela97lin angela97lin commented Jan 19, 2022

Closes #3242. This currently points to my branch; I will wait to merge into main until after #3182 is merged!

* init

* init

* start updating tests

* add validation code for option

* remove data check updates for now

* start to clean up tests and add validate_parameter tests

* revert highly null dc

* add in more valueerror test checking

* fix test and logic for column parameters

* init

* update some tests to new API

* fix more tests

* fix doc and sparsity test

* fix integration tests

* fix doctests

* fix doctests

* fix ts splitting test and impl

* fix more ts splitting tests

* fix doctest for ts data check

* release notes

* update target leakage data check docstring for consistency

* doctest linting and cleanup

* linting
…taCheckAction` (#3152)

* init

* init

* start updating tests

* add validation code for option

* remove data check updates for now

* start to clean up tests and add validate_parameter tests

* revert highly null dc

* add in more valueerror test checking

* fix test and logic for column parameters

* init

* update some tests to new API

* fix more tests

* fix doc and sparsity test

* fix integration tests

* fix doctests

* init, update no variance

* update highly null dc

* fix id column dc

* update target leakage dc

* update sparsity dc

* outliers and uniqueness dc

* update target dis dc

* update invalid target data check

* fix doctests

* fix ts splitting test and impl

* fix dc validate and tests

* fix more ts splitting tests

* fix data check actions notebook

* update to remove columns to drop

* freeze

* remove rows to drop

* update dc tests

* update doctests

* fix integration tests

* revert requirements

* update parameters to set to empty dict

* fix tests

* fix doctests

* retrigger

* revert parameter for data check option

* fix data check option test

* fix more tests from updating default parameters

* release notes

* use empty instead of none

* release notes

* cleanup unnecessary code

* move logic to data check option class and rename

* update integration tests

* add tests and some cleanup

* fix tests

* fix pipeline util test

* remove unnecessary conditional

* fix naming, need to fix tests

* fix tests

* fix doctest

* add new enums

* add logic for enums

* update files to use enum

* add in testing for invalid enum

* fix doctest by updating to_dict impl

* linting

* try with different base

* oops revert yaml

* fix tests

* remove outdated code

* add more tests
… to return impute action for non-highly null columns. (#3197)

* init

* init

* start updating tests

* add validation code for option

* remove data check updates for now

* start to clean up tests and add validate_parameter tests

* revert highly null dc

* add in more valueerror test checking

* fix test and logic for column parameters

* init

* update some tests to new API

* fix more tests

* fix doc and sparsity test

* fix integration tests

* fix doctests

* init, update no variance

* update highly null dc

* fix id column dc

* update target leakage dc

* update sparsity dc

* outliers and uniqueness dc

* update target dis dc

* update invalid target data check

* fix doctests

* fix ts splitting test and impl

* fix dc validate and tests

* fix more ts splitting tests

* fix data check actions notebook

* update to remove columns to drop

* freeze

* remove rows to drop

* update dc tests

* update doctests

* fix integration tests

* revert requirements

* update parameters to set to empty dict

* fix tests

* fix doctests

* retrigger

* revert parameter for data check option

* fix data check option test

* fix more tests from updating default parameters

* release notes

* use empty instead of none

* release notes

* cleanup unnecessary code

* move logic to data check option class and rename

* init rename

* retrigger

* revert release notes

* add new logic for detecting null cols

* update integration tests

* add tests and some cleanup

* fix tests

* fix pipeline util test

* remove unnecessary conditional

* fix test and iteration for null data check

* fix naming, need to fix tests

* fix tests

* fix doctest

* add new enums

* add logic for enums

* update files to use enum

* add in testing for invalid enum

* fix doctest by updating to_dict impl

* linting

* try with different base

* oops revert yaml

* fix tests

* remove outdated code

* oops fix merge

* add more tests

* fix null data check tests

* update to use per column strategy and fix tests

* fix tests for data checks

* fix tests and doctests

* release notes

* fix release notes

* oops update action code

* oops fix test

* update wording of messages

* move logic out of dict

* update mode to most_frequent for impute strategies

* oops fix linting and doctests
* init

* more cleanup

* begin to clean up tests

* fix more tests

* updating naming to action_options and fixing more tests

* fix no variance

* fix another test

* fixing automl tests

* oops actually fix automl tests

* fix the other automl tests

* fixing notebook

* fix data check tests

* fix doctests and docs

* integration tests and cleanup

* cleanup based on comments

* linting

* clean up notebook and tests
…3237)

* init

* init

* start updating tests

* add validation code for option

* remove data check updates for now

* start to clean up tests and add validate_parameter tests

* revert highly null dc

* add in more valueerror test checking

* fix test and logic for column parameters

* init

* update some tests to new API

* fix more tests

* fix doc and sparsity test

* fix integration tests

* fix doctests

* init, update no variance

* update highly null dc

* fix id column dc

* update target leakage dc

* update sparsity dc

* outliers and uniqueness dc

* update target dis dc

* update invalid target data check

* fix doctests

* fix ts splitting test and impl

* fix dc validate and tests

* fix more ts splitting tests

* fix data check actions notebook

* update to remove columns to drop

* freeze

* remove rows to drop

* update dc tests

* update doctests

* fix integration tests

* revert requirements

* update parameters to set to empty dict

* fix tests

* fix doctests

* retrigger

* revert parameter for data check option

* fix data check option test

* fix more tests from updating default parameters

* release notes

* use empty instead of none

* release notes

* cleanup unnecessary code

* move logic to data check option class and rename

* init rename

* retrigger

* revert release notes

* add new logic for detecting null cols

* update integration tests

* add tests and some cleanup

* fix tests

* fix pipeline util test

* remove unnecessary conditional

* fix test and iteration for null data check

* fix naming, need to fix tests

* fix tests

* fix doctest

* add new enums

* add logic for enums

* update files to use enum

* add in testing for invalid enum

* fix doctest by updating to_dict impl

* linting

* try with different base

* oops revert yaml

* fix tests

* remove outdated code

* oops fix merge

* add more tests

* fix null data check tests

* update to use per column strategy and fix tests

* fix tests for data checks

* fix tests and doctests

* release notes

* fix release notes

* init

* init and fix testing

* fix integration test

* updates test

* clean up docs

* lint notebook

* fix tests with types

* linting

* release notes

* update wording

* update impl for natural language and datetimes, remove old tests

* fix tests

* fix doctest
@angela97lin angela97lin self-assigned this Jan 19, 2022
@angela97lin angela97lin changed the base branch from main to update_data_check_action_API January 19, 2022 20:40
@codecov

codecov bot commented Jan 19, 2022

Codecov Report

Merging #3260 (a992cdb) into main (b6f290f) will decrease coverage by 0.1%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##            main   #3260     +/-   ##
=======================================
- Coverage   99.8%   99.8%   -0.0%     
=======================================
  Files        326     322      -4     
  Lines      31734   31611    -123     
=======================================
- Hits       31643   31520    -123     
  Misses        91      91             

| Impacted Files | Coverage Δ |
| --- | --- |
| evalml/data_checks/__init__.py | 100.0% <ø> (ø) |
| evalml/data_checks/default_data_checks.py | 100.0% <ø> (ø) |
| evalml/tests/data_checks_tests/test_data_checks.py | 100.0% <ø> (ø) |
| evalml/tests/pipeline_tests/test_pipeline_utils.py | 99.7% <ø> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b6f290f...a992cdb.

evalml.data_checks.NaturalLanguageNaNDataCheck
evalml.data_checks.DateTimeFormatDataCheck
evalml.data_checks.TimeSeriesParametersDataCheck
evalml.data_checks.TimeSeriesSplittingDataCheck
Contributor Author

Unrelated, just saw this was missing :)

@angela97lin angela97lin marked this pull request as ready for review January 20, 2022 19:35
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Looks good! Are we going to delete evalml/data_checks/datetime_nan_data_check.py as well?

Contributor

@eccabay eccabay left a comment

LGTM! 🚢

Contributor

@freddyaboulton freddyaboulton left a comment

Looks good to me, but I think we also need to remove the DateTimeNaNDataCheck implementation. Thanks @angela97lin!

@chukarsten
Contributor

@angela97lin I fiddled with the release notes conflict, so if you have any more work, you might want to pull first.

@angela97lin
Contributor Author

@freddyaboulton Oops, good catch! Mildly concerning that codecov didn't fail even though the entire file was not covered...

And thank you @chukarsten! I've decided not to add any more PRs to my branch since it already has a substantial number of approvals, so I'll wait until it's in main to merge this :)

Base automatically changed from update_data_check_action_API to main January 25, 2022 17:50

Development

Successfully merging this pull request may close these issues.

NullDataCheck vs DateTimeNaNDataCheck and NaturalLanguageNaNDataCheck

5 participants