Update DateTimeFormatDataCheck with actions and make pipeline from actions#3454
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3454 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 336 336
Lines 33297 33375 +78
=======================================
+ Hits 33165 33243 +78
Misses 132 132
Continue to review full report at Codecov.
|
| TimeSeriesRegularizer(time_index=parameters["time_index"]), | ||
| TimeSeriesImputer(), | ||
| ] | ||
| ) |
There was a problem hiding this comment.
Open question, should we have a break statement or something similar here? If we're adding the ts regularizer and imputer I'm not sure how relevant the rest of the actions might be.
There was a problem hiding this comment.
I think it's best if we keep this "dumb" (spit out pipeline from actions) and have the caller of this function "smart" (knowing which datacheck actions are relevant for time series).
freddyaboulton
left a comment
There was a problem hiding this comment.
Thanks @ParthivNaresh ! Code looks good but left some comments on the UX implications + a refactor to not have to infer frequency twice.
| TimeSeriesRegularizer(time_index=parameters["time_index"]), | ||
| TimeSeriesImputer(), | ||
| ] | ||
| ) |
There was a problem hiding this comment.
I think it's best if we keep this "dumb" (spit out pipeline from actions) and have the caller of this function "smart" (knowing which datacheck actions are relevant for time series).
| ) | ||
| else: | ||
| messages.append( | ||
| DataCheckError( |
There was a problem hiding this comment.
We're adding this new error instead of adding it to everyone of the already existing data check errors to avoid having duplicate data check actions right?
I think this may be confusing UX to users because they'll see multiple errors but only the "DATETIME_HAS_UNEVEN_INTERVALS" will appear "fixable" via an action even though this action will fix all other errors.
This may be the best we can do for now. Tagging @Cmancuso so we can discuss further.
There was a problem hiding this comment.
@ParthivNaresh and I talked about this - errors will be consolidated in the future.
| "default_value": col_name, | ||
| } | ||
| }, | ||
| metadata={"is_target": True}, |
There was a problem hiding this comment.
We're not using is_target anywhere right?
There was a problem hiding this comment.
An EvalML consumer might check for is_target when running data check actions to determine if the target has been passed and to raise an error if it hasn't when the target is being modified. I felt like that case needed to be covered but if it doesn't I have no problem taking that out.
There was a problem hiding this comment.
Happy to keep it! just wondering why since it didn't see it being "used"
eccabay
left a comment
There was a problem hiding this comment.
This is awesome @ParthivNaresh, thanks for doing it! I just left a couple nits. I also agree with Freddy about the potential confusion with how we tie the errors to actions. It might be helpful to discuss the best way to do this before moving forward.
chukarsten
left a comment
There was a problem hiding this comment.
LGTM, thanks for the follow up Parthiv
|
|
||
| @pytest.mark.parametrize("y_passed", [True, False]) | ||
| def test_ts_regularizer_X_only(y_passed, combination_of_faulty_datetime): | ||
| def test_ts_regularizer_X_only_equal_payload(y_passed, combination_of_faulty_datetime): |
There was a problem hiding this comment.
Just curious what you mean by "equal_payload"
There was a problem hiding this comment.
This is verifying that if a payload is explicitly passed in through the parameters to the class, it provides an equivalent output to the payload inferred in fit.
freddyaboulton
left a comment
There was a problem hiding this comment.
Thanks @ParthivNaresh !
| ) | ||
| else: | ||
| messages.append( | ||
| DataCheckError( |
There was a problem hiding this comment.
@ParthivNaresh and I talked about this - errors will be consolidated in the future.
Fixes #3437