Add method to convert actions to a preprocessing pipeline #2968

angela97lin merged 11 commits into main.
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##            main    #2968   +/-   ##
=======================================
+ Coverage    99.7%    99.7%   +0.1%
=======================================
  Files         307      307
  Lines       29265    29283     +18
=======================================
+ Hits        29174    29192     +18
  Misses         91       91
```
Continue to review full report at Codecov.
```python
raise ValueError("All input indices must be unique.")
self.indices_to_drop = indices_to_drop
super().__init__(parameters=None, component_obj=None, random_seed=random_seed)
parameters = {"indices_to_drop": self.indices_to_drop}
```
Adding indices to parameters so they can be accessed when creating a pipeline :)
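As a rough illustration of the point above, here is a minimal sketch (hypothetical, not evalml's actual `DropRowsTransformer` implementation) of how storing the indices in a `parameters` dict makes them readable off the component when building a pipeline:

```python
# Hypothetical, simplified stand-in for evalml's DropRowsTransformer.
class DropRowsTransformerSketch:
    name = "Drop Rows Transformer"

    def __init__(self, indices_to_drop=None, random_seed=0):
        indices = indices_to_drop or []
        if len(set(indices)) != len(indices):
            raise ValueError("All input indices must be unique.")
        self.indices_to_drop = indices_to_drop
        # Exposing the indices through `parameters` is what lets
        # pipeline-building code read them back off the component later.
        self.parameters = {"indices_to_drop": self.indices_to_drop}


transformer = DropRowsTransformerSketch(indices_to_drop=[1, 3])
```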
```python
elif action.action_code == DataCheckActionCode.DROP_ROWS:
    indices = action.metadata["indices"]
    components.append(DropRowsTransformer(indices_to_drop=indices))
indices_to_drop.extend(action.metadata["indices"])
```
Some cleanup here: updating code to just return one Drop Rows Transformer, similar to Drop Columns.
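The cleanup can be sketched roughly like this (plain dicts and tuples stand in for evalml's `DataCheckAction` objects and `DropRowsTransformer`; this is an assumed simplification, not the real helper):

```python
# Hypothetical sketch: gather indices from every DROP_ROWS action and emit a
# single drop-rows component, mirroring how Drop Columns is handled.
def make_component_list_from_actions(actions):
    components = []
    indices_to_drop = []
    for action in actions:
        if action["action_code"] == "DROP_ROWS":
            # Accumulate across actions instead of appending one
            # transformer per action.
            indices_to_drop.extend(action["metadata"]["indices"])
    if indices_to_drop:
        components.append(
            ("Drop Rows Transformer", {"indices_to_drop": indices_to_drop})
        )
    return components
```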
```diff
-@pytest.mark.parametrize("problem_type", ["regression"])
-def test_invalid_target_data_check_regression_problem_nonnumeric_data(problem_type):
+def test_invalid_target_data_check_regression_problem_nonnumeric_data():
```
No need to parametrize one value 😛
| ) | ||
| from evalml.pipelines.utils import ( | ||
| _get_pipeline_base_class, | ||
| _make_component_list_from_actions, |
Replacing tests of private method with our new method!
```python
def _get_pipeline_base_class(problem_type):
    """Returns pipeline base class for problem_type."""
    problem_type = handle_problem_types(problem_type)
```
Interesting catch. Did this not cause us problems before?
Nope--but only because it's a private method that we use in tests 😂
It just happened that everywhere we used it, we passed the ProblemTypes enum, so this never caused problems before. I only noticed it because I tried using strings!
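A stripped-down sketch of the idea (an assumed simplification, not evalml's actual `handle_problem_types` implementation) shows why routing the argument through the handler makes both strings and enum members work:

```python
from enum import Enum


class ProblemTypes(Enum):
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_types(problem_type):
    # Accept either an enum member or its string value, so callers can
    # pass "regression" as well as ProblemTypes.REGRESSION.
    if isinstance(problem_type, ProblemTypes):
        return problem_type
    return ProblemTypes(problem_type.lower())
```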
```python
for component in component_list:
    parameters[component.name] = component.parameters
component_dict = PipelineBase._make_component_dict_from_component_list(
    [component.name for component in component_list]
```
Is there a specific reason we can't pass the component_list in directly here? _make_component_dict_from_component_list calls handle_component_class, so if I understand correctly it should be able to handle the list directly.
The slight difference, I think, is that passing in a list of components directly will create a pipeline where the component class is the key value, whereas this uses the name instead!
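A toy example of the distinction (hypothetical classes and a hypothetical `make_component_dict` helper; evalml's actual component-graph format differs) — the dict keys come from whatever is passed in, so names give string keys while classes give class keys:

```python
# Hypothetical classes standing in for evalml components.
class Imputer:
    name = "Imputer"


class DropRowsTransformer:
    name = "Drop Rows Transformer"


def make_component_dict(items):
    # Keys are taken from the items themselves: strings stay strings,
    # classes stay classes.
    return {item: position for position, item in enumerate(items)}


components = [Imputer, DropRowsTransformer]
by_name = make_component_dict([c.name for c in components])  # keyed by name
by_class = make_component_dict(components)                   # keyed by class
```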
chukarsten left a comment:
hey Angela! Love this PR, as I love when we use our own code to do things and do them intuitively, which I think this does. I am approving even though the description mentions merging the preprocessing pipeline with the standard pipelines; I am assuming that work got split out somewhere else! If it was something we wanted to tackle in this PR, though, I would expect some tests indicating the intended functionality of the helper function, plus documentation in the user guide showing how we intend users to use DataCheckActions. If this is happening in a subsequent PR, cool beans!
| ) | ||
|
|
||
|
|
||
| @pytest.mark.parametrize("problem_type", ["binary", "multiclass", "regression"]) |
Is this PR supposed to encompass the helper function to merge the two pipelines together?
…ml into 2058_preprocessing_pipeline
Closes #2058