@ParthivNaresh
Contributor

Fixes: #2124

@codecov

codecov bot commented Aug 6, 2021

Codecov Report

Merging #2603 (f182076) into main (06d05ed) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2603     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        295     297      +2     
  Lines      26895   27027    +132     
=======================================
+ Hits       26851   26983    +132     
  Misses        44      44             
Impacted Files Coverage Δ
evalml/data_checks/data_checks.py 100.0% <ø> (ø)
evalml/automl/automl_search.py 99.9% <100.0%> (+0.1%) ⬆️
evalml/data_checks/__init__.py 100.0% <100.0%> (ø)
evalml/data_checks/data_check_message_code.py 100.0% <100.0%> (ø)
evalml/data_checks/datetime_format_data_check.py 100.0% <100.0%> (ø)
evalml/data_checks/default_data_checks.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_search.py 100.0% <100.0%> (ø)
evalml/tests/data_checks_tests/test_data_checks.py 100.0% <100.0%> (ø)
...ta_checks_tests/test_datetime_format_data_check.py 100.0% <100.0%> (ø)
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06d05ed...f182076. Read the comment docs.

@ParthivNaresh ParthivNaresh self-assigned this Aug 9, 2021
y_train = infer_feature_types(y_train)
problem_type = handle_problem_types(problem_type)

datetime_column = None
Contributor Author

Depending on the outcome of our time series discussions, we can change this if we decide to treat the index as the default location of datetime data for time series. Until then, the user needs to pass at least date_index for time series default objectives.

y = infer_feature_types(y)

if self.datetime_column != "index":
    datetime_values = X[self.datetime_column]
Contributor Author

We're not trying to extract every datetime column, just the one that indexes the datetime information.

datetime_values = X.index
if not isinstance(datetime_values, pd.DatetimeIndex):
    datetime_values = y.index
if not isinstance(datetime_values, pd.DatetimeIndex):
Contributor Author

This allows the user to just specify "index"; we then check the X index first, followed by the y index.
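For illustration, the lookup order described in this comment could be sketched as below. This is a minimal sketch, not the PR's actual code: `resolve_datetime_values` is a hypothetical helper name, and raising `TypeError` at the end is an assumption based on the review discussion.

```python
import pandas as pd


def resolve_datetime_values(X, y, datetime_column="index"):
    """Return the datetime values to validate.

    Hypothetical helper: mirrors the lookup order described in the
    comment above, not evalml's actual implementation.
    """
    if datetime_column != "index":
        # A named column was requested, so read it straight from X.
        return X[datetime_column]
    # Otherwise prefer X's index and fall back to y's index.
    for candidate in (X.index, y.index):
        if isinstance(candidate, pd.DatetimeIndex):
            return candidate
    # Assumption: neither index is usable, so signal a type problem.
    raise TypeError("Neither X nor y has a DatetimeIndex.")


dates = pd.date_range("2021-01-01", periods=5, freq="D")
X = pd.DataFrame({"feature": range(5)})  # default RangeIndex, no datetimes
y = pd.Series(range(5), index=dates)     # the datetime index lives on y
print(resolve_datetime_values(X, y).equals(dates))
```

With `datetime_column="index"` the caller does not need to know whether the dates sit on X or on y; passing a column name skips the index lookup entirely.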

- def __init__(self, problem_type, objective, n_splits=3):
+ def __init__(self, problem_type, objective, n_splits=3, datetime_column="index"):
      default_checks = self._DEFAULT_DATA_CHECK_CLASSES
      data_check_params = {}
Contributor Author

Restructured this so it is clearer which data checks and params are passed for each problem_type.
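A rough sketch of the kind of layout this comment describes: start from a shared list of checks and an empty params dict, then extend both per problem type. Everything here is an assumption for illustration: the check names are plain strings, `build_default_checks` is a hypothetical function, and the grouping of problem types is not taken from the diff.

```python
# Hypothetical stand-ins: check names are plain strings here, and the
# grouping below is an assumption, not evalml's actual default list.
COMMON_CHECKS = ["HighlyNullDataCheck", "IDColumnsDataCheck"]
TIME_SERIES_TYPES = {
    "time series regression",
    "time series binary",
    "time series multiclass",
}


def build_default_checks(problem_type, datetime_column="index"):
    """Return (checks, params) for a problem type."""
    checks = list(COMMON_CHECKS)  # copy so the shared list is never mutated
    params = {}
    if problem_type in TIME_SERIES_TYPES:
        # Only time series problems get the datetime format check,
        # and only that check receives the datetime_column parameter.
        checks.append("DateTimeFormatDataCheck")
        params["DateTimeFormatDataCheck"] = {"datetime_column": datetime_column}
    return checks, params


print(build_default_checks("time series regression", datetime_column="date"))
print(build_default_checks("binary"))
```

Keeping the checks and their params side by side in one branch per problem type makes it easy to see which parameters belong to which check.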


@pytest.mark.parametrize("input_type", ["pd", "ww"])
@pytest.mark.parametrize(
    "uneven,type_errors", [(True, False), (False, True), (False, False)]
Contributor Author

Checks across uneven frequencies and incorrect datetime types
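As a sketch of what the three parametrized cases might feed into the check, the snippet below builds one column per `(uneven, type_errors)` pair. `make_datetime_column` is a hypothetical helper, and the way each case is constructed (dropping one date, substituting integers) is an assumption, not the test's actual setup.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# (uneven, type_errors) pairs, copied from the parametrize list above.
CASES = [(True, False), (False, True), (False, False)]


def make_datetime_column(uneven, type_errors, periods=10):
    """Build a column for one test case (hypothetical helper)."""
    if type_errors:
        # Integers stand in for a column that is not datetime-typed.
        return pd.Series(range(periods))
    dates = pd.date_range("2021-01-01", periods=periods, freq="D")
    if uneven:
        # Dropping one date leaves a two-day gap in an otherwise daily series.
        dates = dates.delete(3)
    return pd.Series(dates)


for uneven, type_errors in CASES:
    column = make_datetime_column(uneven, type_errors)
    is_datetime = is_datetime64_any_dtype(column)
    # infer_freq returns None when no single frequency fits the values.
    freq = pd.infer_freq(pd.DatetimeIndex(column)) if is_datetime else None
    print(uneven, type_errors, is_datetime, freq)
```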

@ParthivNaresh ParthivNaresh marked this pull request as ready for review August 9, 2021 19:16
Contributor

@angela97lin angela97lin left a comment

LGTM! Left a couple of nit-picking comments :)

Contributor

@freddyaboulton freddyaboulton left a comment

@ParthivNaresh This looks great! I think there are two changes I'd like to make before merge though. The first is disallowing monotonic decreasing columns, the second is returning DataCheckErrors rather than raising TypeErrors in the data check.
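To make the two requested changes concrete, here is a minimal sketch of a check that collects errors instead of raising and that rejects columns which are not monotonically increasing. It is not the merged implementation: `check_datetime_format`, the dict layout, and the code strings are placeholders, not evalml's `DataCheckError` or its message codes.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def check_datetime_format(datetime_values):
    """Return a list of error dicts instead of raising.

    Hypothetical sketch: the dict layout and code strings are placeholders.
    Expects a pandas Series or Index.
    """
    errors = []
    if not is_datetime64_any_dtype(datetime_values):
        # Report the type problem as data rather than raising TypeError.
        errors.append({
            "level": "error",
            "code": "DATETIME_HAS_WRONG_TYPE",
            "message": "Values are not datetime-typed.",
        })
        return errors
    values = pd.DatetimeIndex(datetime_values)
    if not values.is_monotonic_increasing:
        # Decreasing (or unsorted) datetimes are disallowed.
        errors.append({
            "level": "error",
            "code": "DATETIME_NOT_MONOTONIC_INCREASING",
            "message": "Datetime values must be increasing.",
        })
        return errors
    # infer_freq needs at least three values and returns None when
    # the spacing between them is not consistent.
    if len(values) < 3 or pd.infer_freq(values) is None:
        errors.append({
            "level": "error",
            "code": "DATETIME_HAS_UNEVEN_INTERVALS",
            "message": "No consistent frequency could be inferred.",
        })
    return errors


even = pd.date_range("2021-01-01", periods=6, freq="D")
for label, values in [
    ("even", even),
    ("uneven", even.delete(2)),
    ("decreasing", even[::-1]),
    ("wrong type", pd.Series([1, 2, 3])),
]:
    print(label, [error["code"] for error in check_datetime_format(values)])
```

Returning errors lets the caller gather every data check result in one pass, whereas an exception would stop at the first failing check.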

@ParthivNaresh ParthivNaresh merged commit ee9aabd into main Aug 11, 2021
@chukarsten chukarsten mentioned this pull request Aug 12, 2021
@freddyaboulton freddyaboulton deleted the DataCheck-For-Equal-Interval branch May 13, 2022 15:34
Development

Successfully merging this pull request may close these issues.

DataCheck for TimeSeries problems - Equal interval data

4 participants