Data check actions: don't produce duplicate components #1998

Closed
angela97lin opened this issue Mar 19, 2021 · 3 comments · Fixed by #2883
Labels
enhancement An improvement to an existing feature.

Comments

@angela97lin
Contributor

Currently, _make_component_list_from_actions can add duplicate components when it receives duplicate actions. We should prune duplicate components where possible:

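# Imports assume the evalml module layout at the time of this issue
from evalml.data_checks import DataCheckAction, DataCheckActionCode
from evalml.pipelines.components import DropColumns
from evalml.pipelines.utils import _make_component_list_from_actions

# Current behavior: two identical DROP_COL actions yield two DropColumns components,
# which can cause errors because the pipeline tries to drop 'some col' twice.
actions = [DataCheckAction(DataCheckActionCode.DROP_COL, {"columns": ['some col']}),
           DataCheckAction(DataCheckActionCode.DROP_COL, {"columns": ['some col']})]
_make_component_list_from_actions(actions) == [DropColumns(columns=['some col']),
                                               DropColumns(columns=['some col'])]

# Desired behavior: the duplicate is pruned, so 'some col' is dropped only once.
actions = [DataCheckAction(DataCheckActionCode.DROP_COL, {"columns": ['some col']}),
           DataCheckAction(DataCheckActionCode.DROP_COL, {"columns": ['some col']})]
_make_component_list_from_actions(actions) == [DropColumns(columns=['some col'])]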

The DataChecks class has some support for not returning duplicate actions, but if a user runs multiple DataCheck objects separately, _make_component_list_from_actions will still produce duplicate components.
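One way to get the desired behavior is to drop repeated actions before mapping them to components, so it no longer matters how many DataCheck objects produced them. The following is a minimal, self-contained sketch, not evalml's implementation: Action, ActionCode, DropColumns, dedupe_actions and make_component_list_from_actions are hypothetical stand-ins for DataCheckAction, DataCheckActionCode, DropColumns and _make_component_list_from_actions. It assumes two actions are duplicates only when they have the same code and identical metadata.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class ActionCode(Enum):
    # Hypothetical stand-in for evalml's DataCheckActionCode
    DROP_COL = "drop_col"


@dataclass
class Action:
    # Hypothetical stand-in for evalml's DataCheckAction
    code: ActionCode
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DropColumns:
    # Hypothetical stand-in for evalml's DropColumns component
    columns: List[str]


def _freeze(value):
    # Recursively turn dicts/lists into tuples so metadata can serve as a set key
    # (assumes dict keys are mutually comparable, e.g. all strings)
    if isinstance(value, dict):
        return tuple(sorted((k, _freeze(v)) for k, v in value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(_freeze(v) for v in value)
    return value


def dedupe_actions(actions):
    # Keep the first occurrence of each (code, metadata) pair, preserving order
    seen = set()
    unique = []
    for action in actions:
        key = (action.code, _freeze(action.metadata))
        if key not in seen:
            seen.add(key)
            unique.append(action)
    return unique


def make_component_list_from_actions(actions):
    # Hypothetical sketch: dedupe first, then build one component per remaining action
    components = []
    for action in dedupe_actions(actions):
        if action.code == ActionCode.DROP_COL:
            components.append(DropColumns(columns=list(action.metadata["columns"])))
    return components


actions = [Action(ActionCode.DROP_COL, {"columns": ["some col"]}),
           Action(ActionCode.DROP_COL, {"columns": ["some col"]})]
print(make_component_list_from_actions(actions))  # [DropColumns(columns=['some col'])]
```

Because the comparison is on the frozen metadata, {"columns": ['a', 'b']} and {"columns": ['b', 'a']} count as different actions here; merging overlapping column lists would be a separate step.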

@angela97lin angela97lin added the enhancement An improvement to an existing feature. label Mar 19, 2021
@dsherry dsherry changed the title Update _make_component_list_from_actions to not produce duplicate components Data check actions: don't produce duplicate components Mar 25, 2021
@dsherry
Contributor

dsherry commented Mar 25, 2021

@angela97lin is this blocked on #1730?

@angela97lin
Contributor Author

@dsherry Nope, not necessarily; it's just an additional enhancement/cleanup!

@dsherry
Contributor

dsherry commented Apr 19, 2021

This is not blocked on #1929: we can merge the dropped columns into a single DropColumns component. However, it may make more sense for each action to generate its own component, which would require #1929 so that a pipeline can contain multiple DropColumns components.
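The first option, lumping all dropped columns into one DropColumns component, could look roughly like this. It is a minimal sketch using hypothetical stand-ins (Action, ActionCode, DropColumns, make_component_list_from_actions) rather than evalml's real classes, and it assumes DROP_COL metadata always carries a "columns" list; other action codes are ignored for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class ActionCode(Enum):
    # Hypothetical stand-in for evalml's DataCheckActionCode
    DROP_COL = "drop_col"


@dataclass
class Action:
    # Hypothetical stand-in for evalml's DataCheckAction
    code: ActionCode
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DropColumns:
    # Hypothetical stand-in for evalml's DropColumns component
    columns: List[str]


def make_component_list_from_actions(actions):
    # Collect every column named by any DROP_COL action, in first-seen order
    cols_to_drop = []
    for action in actions:
        if action.code == ActionCode.DROP_COL:
            for col in action.metadata.get("columns", []):
                if col not in cols_to_drop:  # skip columns already scheduled for dropping
                    cols_to_drop.append(col)
    # Emit at most one DropColumns component for the whole pipeline
    components = []
    if cols_to_drop:
        components.append(DropColumns(columns=cols_to_drop))
    return components


actions = [
    Action(ActionCode.DROP_COL, {"columns": ["some col"]}),
    Action(ActionCode.DROP_COL, {"columns": ["some col", "other col"]}),
]
print(make_component_list_from_actions(actions))
# [DropColumns(columns=['some col', 'other col'])]
```

This sidesteps the need for multiple DropColumns in one pipeline, at the cost of losing the one-component-per-action correspondence that the second option would keep.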

2 participants