Gap separation #3208

Merged
ParthivNaresh merged 56 commits into main from Gap_Separation
Jan 18, 2022

Conversation

ParthivNaresh
Contributor

@ParthivNaresh ParthivNaresh commented Jan 10, 2022

Fixes: #3078

@codecov

codecov bot commented Jan 10, 2022

Codecov Report

Merging #3208 (b176569) into main (ca835e0) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3208     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        326     326             
  Lines      31405   31467     +62     
=======================================
+ Hits       31314   31376     +62     
  Misses        91      91             
Impacted Files                                         Coverage Δ
evalml/exceptions/__init__.py                          100.0% <ø> (ø)
.../pipelines/time_series_classification_pipelines.py  100.0% <ø> (ø)
evalml/tests/demo_tests/test_datasets.py               100.0% <ø> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py  99.9% <ø> (-<0.1%) ⬇️
evalml/demos/weather.py                                100.0% <100.0%> (ø)
evalml/exceptions/exceptions.py                        100.0% <100.0%> (ø)
evalml/pipelines/time_series_pipeline_base.py          100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_engine_base.py          100.0% <100.0%> (ø)
evalml/tests/conftest.py                               97.0% <100.0%> (+0.1%) ⬆️
evalml/tests/utils_tests/test_gen_utils.py             100.0% <100.0%> (ø)
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Contributor

@freddyaboulton freddyaboulton left a comment

@ParthivNaresh Thank you for this! I think this is looking good. I made a suggestion to move away from PartialDependenceError and PartialDependenceErrorCode since this isn't being used in partial dependence.

Pinging @fjlanasa for visibility as well.

            X_train, X, pipeline_params
        )
        if not (right_length and X_separated_by_gap):
            raise PartialDependenceError(
Contributor

Maybe it'll be better if rather than raising an exception, we return a tuple of bool and List[ValidationErrorCode]?

  • If the dataset is valid, return True, [].
  • If the dataset does not have the right length but is separated by gap, return False, [NotRightLength].
  • If the dataset has the right length but is not separated by gap, return False, [NotSeparatedByGap].
  • If the dataset does not have the right length and is not separated by gap, return False, [NotRightLength, NotSeparatedByGap].

If we do it this way, it might be easier to communicate which of the two criteria was not met.
What do you think? FYI @fjlanasa
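
A minimal sketch of the tuple-returning approach suggested above. The names `ValidationErrorCode`, `NOT_RIGHT_LENGTH`, `NOT_SEPARATED_BY_GAP`, and `validate_holdout` are hypothetical and chosen only for illustration; they are not taken from the evalml codebase, and the two boolean inputs stand in for whatever checks the pipeline actually performs.

```python
from enum import Enum
from typing import List, Tuple


class ValidationErrorCode(Enum):
    """Hypothetical error codes; illustrative names, not evalml's API."""

    NOT_RIGHT_LENGTH = "not_right_length"
    NOT_SEPARATED_BY_GAP = "not_separated_by_gap"


def validate_holdout(
    right_length: bool, separated_by_gap: bool
) -> Tuple[bool, List[ValidationErrorCode]]:
    """Return (is_valid, error_codes) instead of raising an exception.

    The caller can inspect the list to see which criteria failed.
    """
    error_codes: List[ValidationErrorCode] = []
    if not right_length:
        error_codes.append(ValidationErrorCode.NOT_RIGHT_LENGTH)
    if not separated_by_gap:
        error_codes.append(ValidationErrorCode.NOT_SEPARATED_BY_GAP)
    # An empty list means every check passed.
    return not error_codes, error_codes


# Example: right length, but not separated by gap.
is_valid, codes = validate_holdout(right_length=True, separated_by_gap=False)
```

With this shape, a caller that still wants an exception can raise one itself and build the message from `codes`, so both failures can be reported at once.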

    def _add_training_data_to_X_Y(self, X, y, X_train, y_train):
        """Append the training data to the holdout data.

        Need to do this so that we have all the data we need to compute lagged features on the holdout set.
        """
        from evalml.pipelines.utils import (
Contributor

Can we move the import to the top of the file?

Contributor Author

Unfortunately, we end up running into a circular dependency issue.

Contributor

Gotcha. Let's move it to gen_utils then? That's where are_ts_parameters_valid_for_split lives, so I think it's sensible to include it there.

Contributor

@chukarsten chukarsten left a comment

This looks good! Just have a question about why we seem to have duplicate modifications of the weather data set and a hypernitpick!

evalml/tests/demo_tests/test_datasets.py (outdated, resolved)
evalml/utils/gen_utils.py (outdated, resolved)
@ParthivNaresh ParthivNaresh merged commit af36015 into main Jan 18, 2022
@chukarsten chukarsten mentioned this pull request Jan 18, 2022
@freddyaboulton freddyaboulton deleted the Gap_Separation branch May 13, 2022 15:03
Development

Successfully merging this pull request may close these issues.

Make sure date_index of test set is separated by gap from the training set
4 participants