Update DateTimeFormatDataCheck with actions and make pipeline from actions #3454

ParthivNaresh · 2022-04-08T15:42:45Z

codecov · 2022-04-08T15:53:34Z

Codecov Report

Merging #3454 (9b17265) into main (eebacf1) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3454     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        336     336             
  Lines      33297   33375     +78     
=======================================
+ Hits       33165   33243     +78     
  Misses       132     132

Impacted Files	Coverage Δ
evalml/data_checks/default_data_checks.py	`100.0% <ø> (ø)`
evalml/data_checks/data_check_action_code.py	`100.0% <100.0%> (ø)`
evalml/data_checks/data_check_message_code.py	`100.0% <100.0%> (ø)`
evalml/data_checks/datetime_format_data_check.py	`100.0% <100.0%> (ø)`
...nsformers/preprocessing/time_series_regularizer.py	`100.0% <100.0%> (ø)`
evalml/pipelines/utils.py	`99.5% <100.0%> (+0.1%)`	⬆️
...ts/component_tests/test_time_series_regularizer.py	`100.0% <100.0%> (ø)`
...ta_checks_tests/test_datetime_format_data_check.py	`100.0% <100.0%> (ø)`
..._tests/test_data_checks_and_actions_integration.py	`100.0% <100.0%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
ParthivNaresh · 2022-04-11T15:24:58Z

evalml/pipelines/utils.py

+                    TimeSeriesRegularizer(time_index=parameters["time_index"]),
+                    TimeSeriesImputer(),
+                ]
+            )


Open question: should we have a break statement or something similar here? If we're adding the TS regularizer and imputer, I'm not sure how relevant the rest of the actions would be.

freddyaboulton

Thanks @ParthivNaresh! Code looks good, but I left some comments on the UX implications, plus a refactor to avoid inferring the frequency twice.

freddyaboulton · 2022-04-11T17:22:13Z

evalml/pipelines/utils.py

+                    TimeSeriesRegularizer(time_index=parameters["time_index"]),
+                    TimeSeriesImputer(),
+                ]
+            )


I think it's best if we keep this "dumb" (spit out pipeline from actions) and have the caller of this function "smart" (knowing which datacheck actions are relevant for time series).
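A minimal sketch of the "dumb builder, smart caller" split described above. The action codes, the mapping table, and both function names are illustrative assumptions, not EvalML's actual API; the point is only that the builder emits components for every action it is given, while any time-series-specific filtering lives in the caller.

```python
from typing import Dict, List

# Hypothetical mapping from action codes to component names (assumed for
# illustration; this is not EvalML's real lookup table).
ACTION_TO_COMPONENTS: Dict[str, List[str]] = {
    "REGULARIZE_AND_IMPUTE_DATASET": ["TimeSeriesRegularizer", "TimeSeriesImputer"],
    "DROP_COL": ["DropColumns"],
    "IMPUTE_COL": ["PerColumnImputer"],
}


def components_from_actions(action_codes: List[str]) -> List[str]:
    # "Dumb" builder: translate every action, with no filtering and no early exit.
    components: List[str] = []
    for code in action_codes:
        components.extend(ACTION_TO_COMPONENTS[code])
    return components


def relevant_actions_for_time_series(action_codes: List[str]) -> List[str]:
    # "Smart" caller-side policy (hypothetical): if regularization is requested,
    # keep only that action and drop the rest.
    if "REGULARIZE_AND_IMPUTE_DATASET" in action_codes:
        return ["REGULARIZE_AND_IMPUTE_DATASET"]
    return action_codes


actions = ["DROP_COL", "REGULARIZE_AND_IMPUTE_DATASET"]
unfiltered = components_from_actions(actions)
filtered = components_from_actions(relevant_actions_for_time_series(actions))
```

With this split, no break statement is needed inside the builder; a caller that knows it is working with time series data decides which actions to pass in.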

freddyaboulton · 2022-04-11T17:52:54Z

evalml/data_checks/datetime_format_data_check.py

+            )
+        else:
+            messages.append(
+                DataCheckError(


We're adding this new error, instead of attaching the action to every one of the existing data check errors, to avoid having duplicate data check actions, right?

I think this may be confusing UX for users, because they'll see multiple errors but only "DATETIME_HAS_UNEVEN_INTERVALS" will appear "fixable" via an action, even though this action will fix all the other errors.

This may be the best we can do for now. Tagging @Cmancuso so we can discuss further.

freddyaboulton · 2022-04-11T17:55:31Z

evalml/data_checks/datetime_format_data_check.py

+                                    "default_value": col_name,
+                                }
+                            },
+                            metadata={"is_target": True},


We're not using is_target anywhere, right?

An EvalML consumer might check for is_target when running data check actions, to determine whether the target has been passed and to raise an error if it hasn't when the target is being modified. I felt that case needed to be covered, but if it doesn't, I have no problem taking it out.

Happy to keep it! Just wondering why, since I didn't see it being "used".
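The consumer-side check described in this thread could look roughly like the sketch below. The action dictionaries and the `apply_actions` function are hypothetical stand-ins, not EvalML objects; the sketch only assumes that each action carries a `metadata` dict that may include `is_target`.

```python
from typing import Any, Dict, List, Optional


def apply_actions(
    actions: List[Dict[str, Any]], X: list, y: Optional[list] = None
) -> List[str]:
    # Hypothetical consumer: refuse to run a target-modifying action
    # when no target was supplied.
    applied: List[str] = []
    for action in actions:
        if action.get("metadata", {}).get("is_target") and y is None:
            raise ValueError(
                f"Action {action['code']} modifies the target, but no target was passed."
            )
        applied.append(action["code"])
    return applied


# Illustrative action payload; the code and metadata keys are assumptions.
actions = [{"code": "REGULARIZE_AND_IMPUTE_DATASET", "metadata": {"is_target": True}}]

applied_with_target = apply_actions(actions, X=[1, 2, 3], y=[0, 1, 0])

try:
    apply_actions(actions, X=[1, 2, 3])
    raised = False
except ValueError:
    raised = True
```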

eccabay

This is awesome @ParthivNaresh, thanks for doing it! I just left a couple nits. I also agree with Freddy about the potential confusion with how we tie the errors to actions. It might be helpful to discuss the best way to do this before moving forward.

chukarsten

LGTM, thanks for the follow up Parthiv

chukarsten · 2022-04-12T20:05:38Z

evalml/tests/component_tests/test_time_series_regularizer.py

@@ -152,10 +160,24 @@ def test_ts_regularizer_no_issues(ts_data):


 @pytest.mark.parametrize("y_passed", [True, False])
-def test_ts_regularizer_X_only(y_passed, combination_of_faulty_datetime):
+def test_ts_regularizer_X_only_equal_payload(y_passed, combination_of_faulty_datetime):


Just curious what you mean by "equal_payload"?

This verifies that when a payload is passed explicitly through the class parameters, the output is equivalent to the output produced when the payload is inferred in fit.
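As a rough illustration of that equivalence, here is a toy sketch. `ToyRegularizer` and `infer_step` are hypothetical and much simpler than the real TimeSeriesRegularizer: the "payload" here is assumed to be a plain integer step between timestamps, and timestamps are plain integers.

```python
from collections import Counter
from typing import List, Optional


def infer_step(time_index: List[int]) -> int:
    # Infer the most common gap between consecutive timestamps.
    gaps = [b - a for a, b in zip(time_index, time_index[1:])]
    return Counter(gaps).most_common(1)[0][0]


class ToyRegularizer:
    # Toy stand-in: rebuilds an evenly spaced index from a faulty one.
    def __init__(self, frequency_payload: Optional[int] = None):
        self.frequency_payload = frequency_payload

    def fit_transform(self, time_index: List[int]) -> List[int]:
        # Use the explicit payload if given; otherwise infer it during fit.
        step = (
            self.frequency_payload
            if self.frequency_payload is not None
            else infer_step(time_index)
        )
        return list(range(min(time_index), max(time_index) + 1, step))


faulty_index = [0, 2, 4, 5, 8, 10]  # 5 is misaligned and 6 is missing
inferred = ToyRegularizer().fit_transform(faulty_index)
explicit = ToyRegularizer(frequency_payload=infer_step(faulty_index)).fit_transform(
    faulty_index
)
```

Passing the payload in avoids inferring the frequency a second time, and the test's job is to confirm that both paths give the same result.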

eccabay

🚢

freddyaboulton

Thanks @ParthivNaresh !

freddyaboulton · 2022-04-14T15:02:20Z

evalml/data_checks/datetime_format_data_check.py

+            )
+        else:
+            messages.append(
+                DataCheckError(


@ParthivNaresh and I talked about this - errors will be consolidated in the future.

ParthivNaresh added 2 commits April 8, 2022 11:40

Rebase commit

a40a9d8

Update release notes

81bb862

Merge branch 'main' into dfdc_actions

d84e8ac

ParthivNaresh commented Apr 11, 2022

ParthivNaresh marked this pull request as ready for review April 11, 2022 15:29

auto-assign bot assigned ParthivNaresh Apr 11, 2022

ParthivNaresh requested review from chukarsten, eccabay, christopherbunn, freddyaboulton and jeremyliweishih April 11, 2022 15:33

freddyaboulton reviewed Apr 11, 2022

ParthivNaresh requested review from bchen1116 and jeff-hernandez April 11, 2022 19:48

ParthivNaresh added 3 commits April 12, 2022 10:41

Add frequency_payload as parameter to ts_regularizer

6f30b96

Lint

3ab5932

Lint

03b1ad6

eccabay reviewed Apr 12, 2022

evalml/tests/data_checks_tests/test_datetime_format_data_check.py (outdated, resolved)

evalml/data_checks/data_check_message_code.py (outdated, resolved)

ParthivNaresh added 4 commits April 12, 2022 11:18

test updates

f687fd5

address changes

1d4d00b

Merge branch 'main' into dfdc_actions

37d08c5

changes

0eb1df0

chukarsten approved these changes Apr 12, 2022

ParthivNaresh added 2 commits April 12, 2022 16:18

changes

dbdecc4

Merge branch 'main' into dfdc_actions

599f2d6

eccabay approved these changes Apr 14, 2022

evalml/pipelines/components/transformers/preprocessing/time_series_regularizer.py (outdated, resolved)

freddyaboulton approved these changes Apr 14, 2022

ParthivNaresh added 2 commits April 14, 2022 11:26

changes

132863f

Merge branch 'dfdc_actions' of https://github.com/alteryx/evalml into…

a61abd4

… dfdc_actions

ParthivNaresh added 2 commits April 14, 2022 11:28

Merge branch 'main' into dfdc_actions

3bfc669

Merge branch 'main' into dfdc_actions

9b17265

ParthivNaresh merged commit 6829737 into main Apr 14, 2022

chukarsten mentioned this pull request Apr 29, 2022

Release v0.51.0. #3489

Merged
