
Add TimeSeriesParametersDataCheck to DefaultDataChecks #3139

Merged: ParthivNaresh merged 19 commits into main from the AddTSDataCheckToDefaultDC branch on Dec 14, 2021.

Conversation

ParthivNaresh (Contributor):

Fixes #3125
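
For readers skimming the thread, here is a rough sketch of the kind of change this PR makes. The helper name build_default_checks is purely illustrative, and the exact constructor arguments of TimeSeriesParametersDataCheck and DataChecks in evalml may differ from the actual diff.

# Hypothetical sketch, not the PR's diff: append the time series parameters
# check to the default check collection when a problem configuration is given.
from evalml.data_checks import DataChecks, TimeSeriesParametersDataCheck


def build_default_checks(existing_checks, problem_configuration=None, n_splits=3):
    """Return a DataChecks collection, adding TimeSeriesParametersDataCheck
    whenever a time series problem_configuration (gap, max_delay,
    forecast_horizon, time_index) is supplied."""
    checks = list(existing_checks)
    if problem_configuration is not None:
        checks.append(
            TimeSeriesParametersDataCheck(
                problem_configuration=problem_configuration,
                n_splits=n_splits,
            )
        )
    return DataChecks(data_checks=checks)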



codecov bot commented on Dec 9, 2021:

Codecov Report

Merging #3139 (15d5d80) into main (2ea5644) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #3139     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        318     318             
  Lines      30816   30818      +2     
=======================================
+ Hits       30712   30714      +2     
  Misses       104     104             
Impacted Files Coverage Δ
evalml/automl/automl_search.py 99.9% <100.0%> (-<0.1%) ⬇️
evalml/data_checks/default_data_checks.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py 99.5% <100.0%> (ø)
evalml/tests/automl_tests/test_search.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_search_iterative.py 100.0% <100.0%> (ø)
evalml/tests/data_checks_tests/test_data_checks.py 100.0% <100.0%> (ø)
...data_checks_tests/test_ts_parameters_data_check.py 100.0% <100.0%> (ø)
evalml/tests/utils_tests/test_gen_utils.py 100.0% <100.0%> (ø)
evalml/utils/gen_utils.py 98.7% <100.0%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ea5644...15d5d80. Read the comment docs.

# Conflicts:
#	docs/source/release_notes.rst
#	evalml/automl/automl_search.py
#	evalml/tests/automl_tests/test_automl.py
#	evalml/tests/automl_tests/test_search.py
#	evalml/tests/automl_tests/test_search_iterative.py
#	evalml/utils/gen_utils.py
@@ -150,21 +150,6 @@ def search(
y_train = infer_feature_types(y_train)
problem_type = handle_problem_types(problem_type)

datetime_column = None
ParthivNaresh (Contributor, Author) commented:

I incorporated this time_index check into contains_all_ts_parameters to reduce repeat code
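
For context, a minimal sketch of what folding the time_index check into contains_all_ts_parameters could look like. The real helper lives in evalml/utils/gen_utils.py; the return shape (a validity flag plus a message) and the message text below are assumptions, not the actual implementation.

def contains_all_ts_parameters(problem_configuration):
    """Sketch: report whether the problem configuration dict holds every
    required time series parameter, including a non-None time_index."""
    required = ("gap", "max_delay", "forecast_horizon", "time_index")
    msg = (
        "problem_configuration must be a dict containing values for "
        "time_index, gap, max_delay, and forecast_horizon."
    )
    if not problem_configuration or any(p not in problem_configuration for p in required):
        return False, msg
    if problem_configuration["time_index"] is None:
        return False, msg
    return True, ""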

freddyaboulton (Contributor) commented:

Shouldn't we call contains_all_ts_parameters here to keep the same behavior, i.e. validating the problem config?

Because with this change, if a user passes in a dict without time_index, for example, they get a KeyError instead of us telling them the problem_configuration is not valid.
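
To make the concern concrete, here is a sketch of how search() could validate the configuration before indexing into it. The helper name _resolve_time_index is hypothetical, and it assumes the (is_valid, message) return shape sketched above.

def _resolve_time_index(problem_configuration):
    # Hypothetical guard: without validation, problem_configuration["time_index"]
    # raises a bare KeyError when the key is missing; validating first surfaces
    # a descriptive error about the problem configuration instead.
    is_valid, msg = contains_all_ts_parameters(problem_configuration)
    if not is_valid:
        raise ValueError(msg)
    return problem_configuration["time_index"]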

chukarsten (Collaborator) left a comment:

Looks good, Parthiv. Thanks for doing this. Left a minor comment that you can take or leave!

automl, data_check_results = search_iterative(
X_train=X,
y_train=y,
problem_type="time series regression",
chukarsten (Collaborator) commented:

As much as I like deleting code, shouldn't these tests still pass?

@@ -7,12 +7,33 @@
)


def test_time_series_param_data_check_raises_value_error():
@pytest.mark.parametrize(
chukarsten (Collaborator) commented:

You could parametrize all 4 variables individually across the values [1, None, "missing"] if you wanted complete coverage!
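
A hedged illustration of that suggestion follows; the test name and the validity bookkeeping are placeholders rather than the project's actual test file, which would assert against TimeSeriesParametersDataCheck itself.

import pytest


# Vary each of the four time series parameters independently across
# [1, None, "missing"], giving 3**4 = 81 cases.
@pytest.mark.parametrize("gap", [1, None, "missing"])
@pytest.mark.parametrize("max_delay", [1, None, "missing"])
@pytest.mark.parametrize("forecast_horizon", [1, None, "missing"])
@pytest.mark.parametrize("time_index", [1, None, "missing"])
def test_ts_parameters_combinations(gap, max_delay, forecast_horizon, time_index):
    # Build the problem configuration, dropping any parameter marked "missing".
    config = {
        name: value
        for name, value in (
            ("gap", gap),
            ("max_delay", max_delay),
            ("forecast_horizon", forecast_horizon),
            ("time_index", time_index),
        )
        if value != "missing"
    }
    # Stand-in check: a configuration is complete only when all four keys are
    # present and not None; the real test would feed config to the data check.
    config_is_valid = len(config) == 4 and None not in config.values()
    assert config_is_valid == all(
        config.get(name) is not None
        for name in ("gap", "max_delay", "forecast_horizon", "time_index")
    )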

freddyaboulton (Contributor) left a comment:

Thank you @ParthivNaresh! This looks good to me. The only thing I think we need to address before merge is validating the problem config in the lower-case search methods like we currently do on main. I think that will mean we don't have to delete the tests @chukarsten asked about.

ParthivNaresh merged commit 2732ca6 into main on Dec 14, 2021.
angela97lin mentioned this pull request on Dec 22, 2021.
freddyaboulton deleted the AddTSDataCheckToDefaultDC branch on May 13, 2022.