-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DROP_ROWS
to _make_component_list_from_actions
#2694
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2694 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 300 300
Lines 27448 27459 +11
=======================================
+ Hits 27404 27415 +11
Misses 44 44
Continue to review full report at Codecov.
|
@@ -330,15 +331,13 @@ def test_stacked_estimator_in_pipeline( | |||
def test_make_component_list_from_actions(): | |||
assert _make_component_list_from_actions([]) == [] | |||
|
|||
actions = [DataCheckAction(DataCheckActionCode.DROP_COL, {"columns": ["some col"]})] | |||
actions = [DataCheckAction(DataCheckActionCode.DROP_COL, {"column": "some col"})] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this to column
since that's what most of our data checks output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! just some small suggestions
metadata = action.metadata | ||
if metadata["is_target"]: | ||
components.append( | ||
TargetImputer(impute_strategy=metadata["impute_strategy"]) | ||
) | ||
elif action.action_code == DataCheckActionCode.DROP_ROWS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could there be multiple DROP_ROWS
like there can be multiple DROP_COL
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, looks good to me. Just a question about whether the order of the drop columns action matters! But nothing blocking. Do what you want with it lol.
elif action.action_code == DataCheckActionCode.DROP_ROWS: | ||
indices = action.metadata["indices"] | ||
components.append(DropRowsTransformer(indices_to_drop=indices)) | ||
if cols_to_drop: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason that this isn't just in the if action.action_code == DataCheckActionCode.DROP_COL:
branch on L329? Or is there an order you're trying to get in the components list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main reason was to just consolidate all of the columns to drop in one component, but otherwise no difference!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess for consistency with the rows, I could keep these separate. However, unlike rows, data checks currently dump each column to drop as a separate DROP_COL action, causing one data check to return multiple DROP_COL actions (unlike drop rows which returns one per data check).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will keep as is for now--I suspect perhaps we might want to return the actions returned by the data checks, rather than this code here.
Closes #2676