
Gap separated training and test #3160

Merged
merged 24 commits into from Jan 5, 2022


ParthivNaresh
Contributor

Fixes #3078


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.


codecov bot commented Dec 20, 2021

Codecov Report

Merging #3160 (8ccd12f) into main (f3472c2) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #3160     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        324     324             
  Lines      31248   31302     +54     
=======================================
+ Hits       31144   31198     +54     
  Misses       104     104             
Impacted Files                                         Coverage Δ
evalml/demos/weather.py                                100.0% <100.0%> (ø)
evalml/pipelines/time_series_pipeline_base.py          100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_engine_base.py          100.0% <100.0%> (ø)
evalml/tests/conftest.py                               96.2% <100.0%> (ø)
evalml/tests/demo_tests/test_datasets.py               100.0% <100.0%> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py  99.9% <100.0%> (+0.1%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f3472c2...8ccd12f.

@@ -15,4 +17,18 @@ def load_weather():
+ evalml.__version__
)
X, y = load_data(filename, index=None, target="Temp")
Contributor Author


Two dates were missing from this dataset, which prevented the new implementation in this PR from working properly. Because we no longer rely on the index to determine the validity of the training/test separation, the time_index column in X has to match what we expect (datetime-based, ordered, and with a detectable interval) for the new tools in woodwork 0.10.0 to be usable.
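
To see this failure mode in isolation, here is a minimal sketch that uses pandas' `pd.infer_freq` as a stand-in for the woodwork frequency inference. The toy dates and variable names are made up for illustration and are not taken from the weather dataset. A complete daily column yields a frequency, while the same column with two dates dropped does not.

```python
import pandas as pd

# Ten consecutive daily timestamps: ordered, datetime-based, regular interval.
complete = pd.Series(pd.date_range("2021-01-01", periods=10, freq="D"))

# The same column with two dates removed (illustrative only).
with_gaps = complete.drop([3, 7]).reset_index(drop=True)

# pandas can infer a frequency only when the spacing is regular.
freq_complete = pd.infer_freq(complete)
freq_with_gaps = pd.infer_freq(with_gaps)
print(freq_complete)   # a daily frequency string
print(freq_with_gaps)  # None, because the interval is irregular

# Locate the missing dates by comparing against the full expected range.
expected = pd.date_range(with_gaps.min(), with_gaps.max(), freq="D")
missing = expected.difference(pd.DatetimeIndex(with_gaps))
print(list(missing))
```

Once the missing dates are filled in, the interval becomes detectable again, which is the kind of fix this change to the demo dataset makes.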

@ParthivNaresh ParthivNaresh marked this pull request as ready for review December 22, 2021 16:26
temporal_columns=[train_copy.ww.time_index]
)
freq = X_frequency_dict[test_copy.ww.time_index]
if freq is None:
Contributor Author


This was added after a conversation with @freddyaboulton. He raised the valid point that we don't want AutoMLSearch to pass only to fail later when a user tries to use predict. With this check, AutoMLSearch will raise this error up front if the frequency of the time_index cannot be inferred.
I'm up for a discussion about this, since I know it touches a lot of the time series code and is essentially the first time that our whole algorithm will be directly exposed to a proper interval frequency check (outside of DateTimeFormatDataCheck).
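
As a rough illustration of the idea (not the code in this PR), the sketch below checks that a test set starts exactly `gap + 1` intervals after the training set ends, and raises early when no frequency can be inferred. The helper name `is_gap_separated`, the use of `pd.infer_freq` instead of woodwork, and the `gap + 1` offset convention are all assumptions made for the example.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def is_gap_separated(train_dates, test_dates, gap):
    """Hypothetical helper: is the test set `gap` intervals past the training set?"""
    freq = pd.infer_freq(train_dates)
    if freq is None:
        # Fail up front rather than later at predict time.
        raise ValueError("Could not infer a frequency from the training dates.")
    # Assumed convention: with gap=0 the test set starts on the very next interval.
    expected_first_test = train_dates.iloc[-1] + (gap + 1) * to_offset(freq)
    return test_dates.iloc[0] == expected_first_test


# Training data ends on 2021-01-10 (toy data for illustration).
train = pd.Series(pd.date_range("2021-01-01", periods=10, freq="D"))
# With gap=2, the test set is expected to start on 2021-01-13.
good_test = pd.Series(pd.date_range("2021-01-13", periods=5, freq="D"))
bad_test = pd.Series(pd.date_range("2021-01-11", periods=5, freq="D"))

print(is_gap_separated(train, good_test, gap=2))
print(is_gap_separated(train, bad_test, gap=2))
```

Raising inside the helper mirrors the intent described above: an uninferrable frequency is surfaced during the search instead of at prediction time.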

Contributor

@bchen1116 bchen1116 left a comment


Great work here! I left some questions about test coverage that I think need to be addressed first.

evalml/tests/conftest.py (resolved)
evalml/tests/demo_tests/test_datasets.py (resolved)
Contributor

@bchen1116 bchen1116 left a comment


Thanks for answering my questions! This looks good to me!

@ParthivNaresh ParthivNaresh merged commit 47d4867 into main Jan 5, 2022
@ParthivNaresh ParthivNaresh mentioned this pull request Jan 10, 2022
@freddyaboulton freddyaboulton deleted the Gap_Separated_Training_Test branch May 13, 2022 15:34

Successfully merging this pull request may close these issues.

Make sure date_index of test set is separated by gap from the training set