Add method to convert actions to a preprocessing pipeline #2968
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2968 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 307 307
Lines 29265 29283 +18
=======================================
+ Hits 29174 29192 +18
Misses 91 91
Continue to review full report at Codecov.
@@ -23,7 +23,10 @@ def __init__(self, indices_to_drop=None, random_seed=0):
        ):
            raise ValueError("All input indices must be unique.")
        self.indices_to_drop = indices_to_drop
        super().__init__(parameters=None, component_obj=None, random_seed=random_seed)
        parameters = {"indices_to_drop": self.indices_to_drop}
Adding indices to parameters so they can be accessed when creating a pipeline :)
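A minimal sketch of the change being described, using a stand-in base class (the real evalml `ComponentBase` differs in detail; class and argument names here are illustrative):

```python
class ComponentBase:
    """Stand-in for a component base class that records parameters."""

    def __init__(self, parameters=None, component_obj=None, random_seed=0):
        self.parameters = parameters or {}
        self.random_seed = random_seed


class DropRowsTransformer(ComponentBase):
    """Drops the given row indices and exposes them via `parameters`."""

    def __init__(self, indices_to_drop=None, random_seed=0):
        if indices_to_drop is not None and len(set(indices_to_drop)) != len(
            indices_to_drop
        ):
            raise ValueError("All input indices must be unique.")
        self.indices_to_drop = indices_to_drop
        # Storing the indices in `parameters` (instead of passing parameters=None)
        # makes them retrievable later, when the component is assembled
        # into a pipeline.
        parameters = {"indices_to_drop": self.indices_to_drop}
        super().__init__(
            parameters=parameters, component_obj=None, random_seed=random_seed
        )
```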
@@ -389,10 +413,14 @@ def _make_component_list_from_actions(actions):
                TargetImputer(impute_strategy=metadata["impute_strategy"])
            )
        elif action.action_code == DataCheckActionCode.DROP_ROWS:
            indices = action.metadata["indices"]
            components.append(DropRowsTransformer(indices_to_drop=indices))
            indices_to_drop.extend(action.metadata["indices"])
Some cleanup here: updating code to just return one Drop Rows Transformer, similar to Drop Columns.
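The consolidation can be sketched like this: merge the indices from every `DROP_ROWS` action first, then build one transformer from the merged list (actions are modeled as plain dicts here for illustration; the real `DataCheckAction` objects differ):

```python
from enum import Enum


class DataCheckActionCode(Enum):
    """Toy subset of the action codes; only DROP_ROWS matters here."""

    DROP_ROWS = "drop_rows"


def collect_drop_rows_indices(actions):
    """Merge the indices from every DROP_ROWS action into one sorted list,
    so a single row-dropping transformer can be built instead of one per action."""
    indices_to_drop = []
    for action in actions:
        if action["action_code"] == DataCheckActionCode.DROP_ROWS:
            indices_to_drop.extend(action["metadata"]["indices"])
    # Deduplicate and sort so repeated indices across actions collapse cleanly.
    return sorted(set(indices_to_drop))
```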
@@ -491,8 +491,7 @@ def test_invalid_target_data_check_initialize_with_none_objective():
    )


@pytest.mark.parametrize("problem_type", ["regression"])
def test_invalid_target_data_check_regression_problem_nonnumeric_data(problem_type):
def test_invalid_target_data_check_regression_problem_nonnumeric_data():
No need to parametrize one value 😛
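The before/after shape of that cleanup, with toy test bodies standing in for the real data-check assertions:

```python
import pytest


# Before: parametrizing over a single value adds indirection
# without adding any extra coverage...
@pytest.mark.parametrize("problem_type", ["regression"])
def test_regression_nonnumeric_data_parametrized(problem_type):
    assert problem_type == "regression"


# ...after: a plain test runs the same single case more directly.
def test_regression_nonnumeric_data():
    problem_type = "regression"
    assert problem_type == "regression"
```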
@@ -35,11 +35,11 @@
)
from evalml.pipelines.utils import (
    _get_pipeline_base_class,
    _make_component_list_from_actions,
Replacing tests of private method with our new method!
@@ -166,6 +166,7 @@ def _get_preprocessing_components(


def _get_pipeline_base_class(problem_type):
    """Returns pipeline base class for problem_type."""
    problem_type = handle_problem_types(problem_type)
Interesting catch. Did this not cause us problems before?
Nope--but only because it's a private method that we use in tests 😂
It just happened that everywhere we used it, we passed the ProblemTypes enum, so it never caused problems before; I only noticed because I tried using strings!
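The normalization being discussed can be sketched like this (a simplified stand-in; evalml's real `handle_problem_types` and `ProblemTypes` have more members and checks):

```python
from enum import Enum


class ProblemTypes(Enum):
    """Toy subset of evalml's problem-type enum."""

    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_types(problem_type):
    """Return the matching ProblemTypes member, accepting either a member
    or a (case-insensitive) string such as "regression"."""
    if isinstance(problem_type, ProblemTypes):
        return problem_type
    try:
        return ProblemTypes(problem_type.lower())
    except ValueError:
        raise ValueError(f"Unknown problem type: {problem_type!r}")
```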
for component in component_list:
    parameters[component.name] = component.parameters
component_dict = PipelineBase._make_component_dict_from_component_list(
    [component.name for component in component_list]
Is there a specific reason we can't pass the `component_list` in directly here? `_make_component_dict_from_component_list` calls `handle_component_class`, so if I understand correctly it should be able to handle the list directly.
The slight difference, I think, is that passing in a list of components directly will create a pipeline where the component class is the key value, whereas this uses the name instead!
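The keying difference can be shown with a toy example (this is an illustration of the distinction, not the real `_make_component_dict_from_component_list`):

```python
class DropRowsTransformer:
    """Toy component with a human-readable `name`, like evalml components."""

    name = "Drop Rows Transformer"


component_list = [DropRowsTransformer]

# Passing the classes themselves would key the resulting graph by class...
keyed_by_class = {component: [component] for component in component_list}

# ...while passing names keys it by the human-readable component name.
keyed_by_name = {component.name: [component.name] for component in component_list}
```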
Hey Angela! Love this PR, as I love when we use our own code to do things, and do them intuitively, which I think this does. I'm approving even though the description mentions merging the preprocessing pipeline with the standard pipelines; I'm assuming that work got split out somewhere else. If it were something we wanted to tackle in this PR, though, I'd expect tests showing the intended functionality of the helper function, plus documentation in the user guide showing how we intend users to use DataCheckActions. If this is happening in a subsequent PR, cool beans!
)
@pytest.mark.parametrize("problem_type", ["binary", "multiclass", "regression"]) |
Is this PR supposed to encompass the helper function to merge the two pipelines together?
Good catch! I filed #2997 for this :')
…ml into 2058_preprocessing_pipeline
Closes #2058