-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update DateTimeFormatDataCheck with actions and make pipeline from actions #3454
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3454 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 336 336
Lines 33297 33375 +78
=======================================
+ Hits 33165 33243 +78
Misses 132 132
Continue to review full report at Codecov.
|
TimeSeriesRegularizer(time_index=parameters["time_index"]), | ||
TimeSeriesImputer(), | ||
] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open question, should we have a break statement or something similar here? If we're adding the ts regularizer and imputer I'm not sure how relevant the rest of the actions might be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best if we keep this "dumb" (spit out pipeline from actions) and have the caller of this function "smart" (knowing which datacheck actions are relevant for time series).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ParthivNaresh ! Code looks good but left some comments on the UX implications + a refactor to not have to infer frequency twice.
TimeSeriesRegularizer(time_index=parameters["time_index"]), | ||
TimeSeriesImputer(), | ||
] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best if we keep this "dumb" (spit out pipeline from actions) and have the caller of this function "smart" (knowing which datacheck actions are relevant for time series).
) | ||
else: | ||
messages.append( | ||
DataCheckError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're adding this new error instead of adding it to everyone of the already existing data check errors to avoid having duplicate data check actions right?
I think this may be confusing UX to users because they'll see multiple errors but only the "DATETIME_HAS_UNEVEN_INTERVALS" will appear "fixable" via an action even though this action will fix all other errors.
This may be the best we can do for now. Tagging @Cmancuso so we can discuss further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ParthivNaresh and I talked about this - errors will be consolidated in the future.
"default_value": col_name, | ||
} | ||
}, | ||
metadata={"is_target": True}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not using is_target anywhere right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An EvalML consumer might check for is_target
when running data check actions to determine if the target has been passed and to raise an error if it hasn't when the target is being modified. I felt like that case needed to be covered but if it doesn't I have no problem taking that out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to keep it! just wondering why since it didn't see it being "used"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome @ParthivNaresh, thanks for doing it! I just left a couple nits. I also agree with Freddy about the potential confusion with how we tie the errors to actions. It might be helpful to discuss the best way to do this before moving forward.
evalml/tests/data_checks_tests/test_datetime_format_data_check.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the follow up Parthiv
@@ -152,10 +160,24 @@ def test_ts_regularizer_no_issues(ts_data): | |||
|
|||
|
|||
@pytest.mark.parametrize("y_passed", [True, False]) | |||
def test_ts_regularizer_X_only(y_passed, combination_of_faulty_datetime): | |||
def test_ts_regularizer_X_only_equal_payload(y_passed, combination_of_faulty_datetime): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious what you mean by "equal_payload"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is verifying that if a payload is explicitly passed in through the parameters to the class, it provides an equivalent output to the payload inferred in fit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
evalml/pipelines/components/transformers/preprocessing/time_series_regularizer.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ParthivNaresh !
) | ||
else: | ||
messages.append( | ||
DataCheckError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ParthivNaresh and I talked about this - errors will be consolidated in the future.
Fixes #3437