Update DateTimeFormatDataCheck with actions and make pipeline from actions #3454

Merged
ParthivNaresh merged 16 commits into main from dfdc_actions on Apr 14, 2022

Conversation

Contributor

@ParthivNaresh ParthivNaresh commented Apr 8, 2022

Fixes #3437

codecov Bot commented Apr 8, 2022

Codecov Report

Merging #3454 (9b17265) into main (eebacf1) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3454     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        336     336             
  Lines      33297   33375     +78     
=======================================
+ Hits       33165   33243     +78     
  Misses       132     132             
| Impacted Files | Coverage Δ |
|---|---|
| evalml/data_checks/default_data_checks.py | 100.0% <ø> (ø) |
| evalml/data_checks/data_check_action_code.py | 100.0% <100.0%> (ø) |
| evalml/data_checks/data_check_message_code.py | 100.0% <100.0%> (ø) |
| evalml/data_checks/datetime_format_data_check.py | 100.0% <100.0%> (ø) |
| ...nsformers/preprocessing/time_series_regularizer.py | 100.0% <100.0%> (ø) |
| evalml/pipelines/utils.py | 99.5% <100.0%> (+0.1%) ⬆️ |
| ...ts/component_tests/test_time_series_regularizer.py | 100.0% <100.0%> (ø) |
| ...ta_checks_tests/test_datetime_format_data_check.py | 100.0% <100.0%> (ø) |
| ..._tests/test_data_checks_and_actions_integration.py | 100.0% <100.0%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment thread evalml/pipelines/utils.py
TimeSeriesRegularizer(time_index=parameters["time_index"]),
TimeSeriesImputer(),
]
)
Contributor Author

Open question: should we have a break statement or something similar here? If we're adding the TS regularizer and imputer, I'm not sure how relevant the rest of the actions might be.

Contributor

I think it's best if we keep this "dumb" (spit out pipeline from actions) and have the caller of this function "smart" (knowing which datacheck actions are relevant for time series).
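
To make the "dumb" function / "smart" caller split concrete, here is a minimal sketch. Everything in it is a hypothetical stand-in: the `Action` class, the action code strings, the component names, and the `select_actions` policy are assumptions for illustration and are not EvalML's actual API or this PR's implementation.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration; these are not EvalML's real
# action codes, component names, or function signatures.
@dataclass
class Action:
    code: str
    parameters: dict = field(default_factory=dict)

ACTION_TO_COMPONENTS = {
    "regularize_and_impute": ["ts_regularizer", "ts_imputer"],
    "drop_col": ["drop_columns"],
}

def components_from_actions(actions):
    """'Dumb' step: translate every action it is given, in order, with no filtering."""
    components = []
    for action in actions:
        components.extend(ACTION_TO_COMPONENTS[action.code])
    return components

def select_actions(actions, is_time_series):
    """'Smart' caller: decide which actions are relevant before translating.

    The policy here (keep only the regularize/impute action for time series)
    is an assumed example, not a rule taken from this PR.
    """
    if is_time_series:
        ts_actions = [a for a in actions if a.code == "regularize_and_impute"]
        if ts_actions:
            return ts_actions
    return list(actions)

actions = [Action("drop_col", {"columns": ["a"]}), Action("regularize_and_impute")]
all_components = components_from_actions(actions)
ts_components = components_from_actions(select_actions(actions, is_time_series=True))
```

With this split, no `break` is needed inside the translation loop: any filtering happens in the caller before the actions are passed in.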

Contributor

@freddyaboulton freddyaboulton left a comment

Thanks @ParthivNaresh! The code looks good, but I left some comments on the UX implications, plus a suggested refactor to avoid inferring the frequency twice.

Comment thread evalml/pipelines/utils.py Outdated
)
else:
messages.append(
DataCheckError(
Contributor

We're adding this new error instead of adding it to every one of the already existing data check errors to avoid having duplicate data check actions, right?

I think this may be confusing UX for users: they'll see multiple errors, but only "DATETIME_HAS_UNEVEN_INTERVALS" will appear "fixable" via an action, even though this action will fix all the other errors.

This may be the best we can do for now. Tagging @Cmancuso so we can discuss further.

Contributor

@ParthivNaresh and I talked about this - errors will be consolidated in the future.

"default_value": col_name,
}
},
metadata={"is_target": True},
Contributor

We're not using is_target anywhere, right?

Contributor Author

@ParthivNaresh ParthivNaresh Apr 12, 2022

An EvalML consumer might check for is_target when running data check actions to determine whether the target has been passed, and raise an error if it hasn't when the target is being modified. I felt that case needed to be covered, but if it doesn't, I have no problem taking it out.
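
A consumer-side check of this kind might look like the sketch below. The function name `require_target_if_needed` and its error message are hypothetical; only the `{"is_target": True}` metadata shape comes from the diff above.

```python
def require_target_if_needed(metadata, y):
    """Raise if an action's metadata says it modifies the target but no target was given.

    `metadata` is assumed to be a plain dict such as {"is_target": True}.
    This helper is illustrative and is not part of EvalML.
    """
    if metadata.get("is_target", False) and y is None:
        raise ValueError("This action modifies the target, but y was not passed.")
    return y

# Target supplied: the check passes and hands the target back unchanged.
target = require_target_if_needed({"is_target": True}, [1.0, 2.0, 3.0])

# Target missing: the consumer fails early instead of silently skipping the target.
try:
    require_target_if_needed({"is_target": True}, None)
    missing_target_error = None
except ValueError as err:
    missing_target_error = str(err)
```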

Contributor

Happy to keep it! Just wondering why, since I didn't see it being "used".

Contributor

@eccabay eccabay left a comment

This is awesome @ParthivNaresh, thanks for doing it! I just left a couple of nits. I also agree with Freddy about the potential confusion with how we tie the errors to actions. It might be helpful to discuss the best way to do this before moving forward.

Comment thread evalml/tests/data_checks_tests/test_datetime_format_data_check.py Outdated
Comment thread evalml/data_checks/data_check_message_code.py Outdated
Contributor

@chukarsten chukarsten left a comment

LGTM, thanks for the follow-up, Parthiv.

Comment thread evalml/data_checks/data_check_message_code.py Outdated

@pytest.mark.parametrize("y_passed", [True, False])
def test_ts_regularizer_X_only(y_passed, combination_of_faulty_datetime):
def test_ts_regularizer_X_only_equal_payload(y_passed, combination_of_faulty_datetime):
Contributor

Just curious what you mean by "equal_payload"?

Contributor Author

This is verifying that if a payload is explicitly passed in through the parameters to the class, it provides an equivalent output to the payload inferred in fit.
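
As a rough illustration of that equivalence check, here is a toy version. `ToyRegularizer`, its `frequency_payload` parameter, and the integer timestamps are all simplifying assumptions; the real TimeSeriesRegularizer and its payload are more involved and are not reproduced here.

```python
from collections import Counter

class ToyRegularizer:
    """Toy stand-in: the 'payload' is just the most common gap between timestamps."""

    def __init__(self, frequency_payload=None):
        self.frequency_payload = frequency_payload

    def fit(self, timestamps):
        # Infer the payload only when the caller did not pass one explicitly.
        if self.frequency_payload is None:
            gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
            self.frequency_payload = Counter(gaps).most_common(1)[0][0]
        return self

    def transform(self, timestamps):
        # Rebuild an evenly spaced index covering the observed range.
        step = self.frequency_payload
        return list(range(timestamps[0], timestamps[-1] + 1, step))

timestamps = [0, 2, 4, 8, 10]  # the value 6 is missing

inferred = ToyRegularizer().fit(timestamps)
explicit = ToyRegularizer(frequency_payload=inferred.frequency_payload).fit(timestamps)

# The explicit-payload path should match the inferred-payload path.
outputs_match = inferred.transform(timestamps) == explicit.transform(timestamps)
```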

Contributor

@eccabay eccabay left a comment

🚢

Comment thread evalml/pipelines/components/transformers/preprocessing/time_series_regularizer.py Outdated
Contributor

@freddyaboulton freddyaboulton left a comment

Thanks @ParthivNaresh!

@ParthivNaresh ParthivNaresh merged commit 6829737 into main Apr 14, 2022
@chukarsten chukarsten mentioned this pull request Apr 29, 2022

Successfully merging this pull request may close these issues.

Integrate Data Check Actions for DateTimeFormatDataCheck to support regularizer and imputer work
