-
Notifications
You must be signed in to change notification settings - Fork 91
DatetimeFormatDataCheck for equal interval and sorting #2603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2603 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 295 297 +2
Lines 26895 27027 +132
=======================================
+ Hits 26851 26983 +132
Misses 44 44
Continue to review full report at Codecov.
|
| y_train = infer_feature_types(y_train) | ||
| problem_type = handle_problem_types(problem_type) | ||
|
|
||
| datetime_column = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the outcome of our time series discussions, we can change this if we want to consider the index as the default placement of datetime data for time series. Until then, we'd need the user to pass in a minimum of date_index for time series default objectives.
| y = infer_feature_types(y) | ||
|
|
||
| if self.datetime_column != "index": | ||
| datetime_values = X[self.datetime_column] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not trying to extract every datetime column, just the one that indexes the datetime information
| datetime_values = X.index | ||
| if not isinstance(datetime_values, pd.DatetimeIndex): | ||
| datetime_values = y.index | ||
| if not isinstance(datetime_values, pd.DatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows the user to just specify index, and we can check the X index first followed by the y index.
| def __init__(self, problem_type, objective, n_splits=3): | ||
| def __init__(self, problem_type, objective, n_splits=3, datetime_column="index"): | ||
| default_checks = self._DEFAULT_DATA_CHECK_CLASSES | ||
| data_check_params = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the layout of this to make it cleaner and more easily understandable as to what data checks and params are being passed across different problem_types.
|
|
||
| @pytest.mark.parametrize("input_type", ["pd", "ww"]) | ||
| @pytest.mark.parametrize( | ||
| "uneven,type_errors", [(True, False), (False, True), (False, False)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checks across uneven frequencies and incorrect datetime types
evalml/tests/data_checks_tests/test_datetime_format_data_check.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Left a couple of nit-picking comments :)
evalml/tests/data_checks_tests/test_datetime_format_data_check.py
Outdated
Show resolved
Hide resolved
evalml/tests/data_checks_tests/test_datetime_format_data_check.py
Outdated
Show resolved
Hide resolved
freddyaboulton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ParthivNaresh This looks great! I think there are two changes I'd like to make before merge though. The first is disallowing monotonic decreasing columns, the second is returning DataCheckErrors rather than raising TypeErrors in the data check.
…yx/evalml into DataCheck-For-Equal-Interval
Fixes: #2124