-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize data check messages by adding default "rows" and "columns" metadata #2869
Conversation
…ml into 2792_standardize_dc_cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Left some nits, but I like that we have now simplified the returns and combined the column/rows when needed!
@@ -86,14 +86,14 @@ def validate(self, X, y): | |||
>>> y = pd.Series([10, 42, 31, 51, 40]) | |||
>>> target_leakage_check = TargetLeakageDataCheck(pct_corr_threshold=0.95) | |||
>>> assert target_leakage_check.validate(X, y) == { | |||
... "warnings": [{"message": "Column 'leak' is 95.0% or more correlated with the target", | |||
... "warnings": [{"message": "Columns 'leak' are 95.0% or more correlated with the target", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels weird that this is plural when there's only one column. Not a nit, but would be nice to fix this
…ml into 2792_standardize_dc_cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, it's definitely odd that the docstrings aren't failing in HighlyNullDataCheck
and elsewhere. Getting this compatible with tempo health shouldn't be a heavy lift from here for me, thanks for the changes!
).to_dict(), | ||
DataCheckWarning( | ||
message="Column 'd' is 80.0% or more correlated with the target", | ||
message="Columns 'a', 'b', 'c', 'd' are 80.0% or more correlated with the target", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The beauty of consolidation
results["actions"].append( | ||
DataCheckAction( | ||
DataCheckActionCode.DROP_ROWS, | ||
metadata={"indices": all_rows_with_indices}, | ||
metadata={"rows": all_rows_with_indices}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a follow up comment, do we want to repeat the all_rows_with_indices
over here? Maybe it should exist only in the DataCheckAction and not the Warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof I missed this comment before merging, but I'm pretty indifferent about this either way!
Closes #2792.
I still need to update docstring tests, will wait until #2933 is merged to avoid unnecessary changes and conflicts.