New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include recommended actions for each data check's set of actions #1968
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1968 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 273 273
Lines 22356 22381 +25
=========================================
+ Hits 22350 22375 +25
Misses 6 6
Continue to review full report at Codecov.
|
|
||
# test 2D list | ||
assert highly_null_check.validate([[None, None, None, None, 0], [None, None, None, "hi", 5]]) == { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just cleaning up. We were using the same expected
value for each of these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I like that the data checks class handles removing duplicate columns names that we want to drop.
|
||
# test 2D list | ||
assert highly_null_check.validate([[None, None, None, None, 0], [None, None, None, "hi", 5]]) == { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
|
||
new_actions = messages_new["actions"] | ||
for new_action in new_actions: | ||
if new_action not in messages["actions"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks solid as is. If you want to add the explicit test to show the removal of anticipated redundant columns, that would be cool. But you probably have better things to do lol
@chukarsten There's no better way to spend time than to write more unit tests 😂 Thanks for the suggestion, added! |
First half of #1881.
Addresses:
DataChecks
UniquenessDataCheck
SparsityDataCheck
TargetLeakageDataCheck
HighlyNullDataCheck
IDColumnsDataCheck
NoVarianceDataCheck
These data checks are relatively simple as their recommended action is just to drop the problem column.
Currently, different data checks could result in the same data check action (drop a specific col). Should the
DataChecks
class address this and remove duplicates?In this PR, I address this question with "yes", because I imagine this will be useful when creating components--we don't want the second DropColumns component to error out when it can't find the column, but open to thoughts / concerns! Note that this doesn't affect different transformations on the same col (impute then drop).