Add DROP_ROWS to _make_component_list_from_actions #2694

Merged
merged 7 commits into from
Aug 26, 2021

angela97lin
Contributor

Closes #2676

@angela97lin angela97lin self-assigned this Aug 25, 2021

codecov bot commented Aug 25, 2021

Codecov Report

Merging #2694 (305d0cf) into main (1e32e05) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2694     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        300     300             
  Lines      27448   27459     +11     
=======================================
+ Hits       27404   27415     +11     
  Misses        44      44             
| Impacted Files | Coverage Δ |
| --- | --- |
| evalml/pipelines/utils.py | 99.2% <100.0%> (+0.1%) ⬆️ |
| evalml/tests/pipeline_tests/test_pipeline_utils.py | 100.0% <100.0%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -330,15 +331,13 @@ def test_stacked_estimator_in_pipeline(
 def test_make_component_list_from_actions():
     assert _make_component_list_from_actions([]) == []

-    actions = [DataCheckAction(DataCheckActionCode.DROP_COL, {"columns": ["some col"]})]
+    actions = [DataCheckAction(DataCheckActionCode.DROP_COL, {"column": "some col"})]
Contributor Author

Updated this to column since that's what most of our data checks output.

Collaborator

@jeremyliweishih jeremyliweishih left a comment

Nice! Just some small suggestions.

        metadata = action.metadata
        if metadata["is_target"]:
            components.append(
                TargetImputer(impute_strategy=metadata["impute_strategy"])
            )
    elif action.action_code == DataCheckActionCode.DROP_ROWS:
Collaborator

Could there be multiple DROP_ROWS actions, just as there can be multiple DROP_COL actions?
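
For context on this question: the DROP_ROWS branch added in this PR appends one row-dropping component per action, so several DROP_ROWS actions would produce several components. Below is a minimal sketch of that behavior. It does not import evalml; `ActionCode`, `Action`, `RowDropper`, and `row_components` are hypothetical stand-ins for `DataCheckActionCode`, `DataCheckAction`, `DropRowsTransformer`, and the relevant branch of `_make_component_list_from_actions`.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionCode(Enum):
    """Hypothetical stand-in for DataCheckActionCode."""
    DROP_COL = "drop_col"
    DROP_ROWS = "drop_rows"


@dataclass
class Action:
    """Hypothetical stand-in for DataCheckAction."""
    action_code: ActionCode
    metadata: dict = field(default_factory=dict)


@dataclass
class RowDropper:
    """Hypothetical stand-in for DropRowsTransformer."""
    indices_to_drop: list


def row_components(actions):
    """Build one row-dropping component per DROP_ROWS action, in input order."""
    return [
        RowDropper(indices_to_drop=action.metadata["indices"])
        for action in actions
        if action.action_code == ActionCode.DROP_ROWS
    ]


actions = [
    Action(ActionCode.DROP_ROWS, {"indices": [1, 2]}),
    Action(ActionCode.DROP_ROWS, {"indices": [5]}),
]
components = row_components(actions)
print(len(components))  # 2: nothing is merged across actions
```

Under these assumptions, two DROP_ROWS actions are kept as two separate components rather than being merged into one.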

Contributor

@chukarsten chukarsten left a comment

Cool, looks good to me. Just a question about whether the order of the drop columns action matters! But nothing blocking. Do what you want with it lol.

    elif action.action_code == DataCheckActionCode.DROP_ROWS:
        indices = action.metadata["indices"]
        components.append(DropRowsTransformer(indices_to_drop=indices))
if cols_to_drop:
Contributor

Is there a reason that this isn't just in the if action.action_code == DataCheckActionCode.DROP_COL: branch on L329? Or is there an order you're trying to get in the components list?

Contributor Author

Main reason was to just consolidate all of the columns to drop in one component, but otherwise no difference!

Contributor Author

I guess for consistency with the rows, I could keep these separate. However, data checks currently emit each column to drop as its own DROP_COL action, so a single data check can return multiple DROP_COL actions, whereas it returns only one DROP_ROWS action.
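
To illustrate the consolidation discussed in this thread, here is a rough sketch under stated assumptions. It does not use evalml: `Action`, `DropRowsStub`, `DropColumnsStub`, and `make_component_list` are hypothetical stand-ins, and the string action codes and the `"column"` / `"indices"` metadata keys are taken from the test and diff shown in this PR. The column-dropping component in particular is an assumption, since the diff only shows the `if cols_to_drop:` line.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical stand-in for DataCheckAction."""
    action_code: str
    metadata: dict


@dataclass
class DropRowsStub:
    """Hypothetical stand-in for a row-dropping component."""
    indices_to_drop: list


@dataclass
class DropColumnsStub:
    """Hypothetical stand-in for a column-dropping component."""
    columns: list


def make_component_list(actions):
    """Collect DROP_COL actions into one component; keep DROP_ROWS one per action."""
    components = []
    cols_to_drop = []
    for action in actions:
        if action.action_code == "DROP_COL":
            # Only record the column here; the component is built after the loop.
            cols_to_drop.append(action.metadata["column"])
        elif action.action_code == "DROP_ROWS":
            components.append(
                DropRowsStub(indices_to_drop=action.metadata["indices"])
            )
    if cols_to_drop:
        # A single component covers every column collected above.
        components.append(DropColumnsStub(columns=cols_to_drop))
    return components


result = make_component_list(
    [
        Action("DROP_COL", {"column": "a"}),
        Action("DROP_ROWS", {"indices": [0]}),
        Action("DROP_COL", {"column": "b"}),
    ]
)
print(result)
```

In this sketch the two DROP_COL actions collapse into one column-dropping component that is appended last, which is also why the order of DROP_COL actions relative to other actions is not preserved in the resulting list.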

Contributor Author

Will keep as is for now. I suspect we might want to revisit the actions returned by the data checks, rather than this code here.

@angela97lin angela97lin merged commit aa57a39 into main Aug 26, 2021
@angela97lin angela97lin deleted the 2676_drop_rows branch August 26, 2021 20:25
@chukarsten chukarsten mentioned this pull request Sep 1, 2021
Development

Successfully merging this pull request may close these issues.

Add DROP_ROWS to _make_component_list_from_actions
3 participants