Gap separation #3208

Merged: ParthivNaresh merged 56 commits into main from Gap_Separation on Jan 18, 2022
Conversation

@ParthivNaresh (Contributor) commented Jan 10, 2022

Fixes: #3078

@codecov bot commented Jan 10, 2022

Codecov Report

Merging #3208 (b176569) into main (ca835e0) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3208     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        326     326             
  Lines      31405   31467     +62     
=======================================
+ Hits       31314   31376     +62     
  Misses        91      91             
Impacted Files Coverage Δ
evalml/exceptions/__init__.py 100.0% <ø> (ø)
.../pipelines/time_series_classification_pipelines.py 100.0% <ø> (ø)
evalml/tests/demo_tests/test_datasets.py 100.0% <ø> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py 99.9% <ø> (-<0.1%) ⬇️
evalml/demos/weather.py 100.0% <100.0%> (ø)
evalml/exceptions/exceptions.py 100.0% <100.0%> (ø)
evalml/pipelines/time_series_pipeline_base.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_engine_base.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 97.0% <100.0%> (+0.1%) ⬆️
evalml/tests/utils_tests/test_gen_utils.py 100.0% <100.0%> (ø)
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ca835e0...b176569.

@freddyaboulton (Contributor) left a comment

@ParthivNaresh Thank you for this! I think this is looking good. I made a suggestion to move away from PartialDependenceError and PartialDependenceErrorCode since this isn't being used in partial dependence.

Pinging @fjlanasa for visibility as well.

Comment thread: evalml/pipelines/utils.py (Outdated)

        X_train, X, pipeline_params
    )
    if not (right_length and X_separated_by_gap):
        raise PartialDependenceError(

Contributor:
Maybe it would be better if, rather than raising an exception, we return a tuple of bool and List[ValidationErrorCode]?

  • If the dataset is valid, return True, []
  • If the dataset does not have the right length but is separated by gap, return False, [NotRightLength]
  • If the dataset has the right length but is not separated by gap, return False, [NotSeparatedByGap]
  • If the dataset has neither the right length nor separation by gap, return False, [NotRightLength, NotSeparatedByGap]

If we do it this way, it might be easier to communicate which of the two criteria was not met.
What do you think? FYI @fjlanasa
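The return shape suggested above could be sketched as follows. This is a minimal illustration, not the merged implementation: the function name `validate_holdout_datasets`, the enum `ValidationErrorCode`, and its members are hypothetical stand-ins for whatever the PR ultimately adopted.

```python
from enum import Enum
from typing import List, Tuple


class ValidationErrorCode(Enum):
    # Hypothetical codes mirroring the two criteria discussed above.
    NOT_RIGHT_LENGTH = "not_right_length"
    NOT_SEPARATED_BY_GAP = "not_separated_by_gap"


def validate_holdout_datasets(
    right_length: bool, separated_by_gap: bool
) -> Tuple[bool, List[ValidationErrorCode]]:
    """Return (is_valid, error_codes) instead of raising an exception."""
    errors = []
    if not right_length:
        errors.append(ValidationErrorCode.NOT_RIGHT_LENGTH)
    if not separated_by_gap:
        errors.append(ValidationErrorCode.NOT_SEPARATED_BY_GAP)
    return (len(errors) == 0, errors)
```

A caller can then report exactly which criterion failed, rather than parsing an exception message.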


Need to do this so that we have all the data we need to compute lagged features on the holdout set.
"""
from evalml.pipelines.utils import (
Contributor:

Can we move the import to the top of the file?

@ParthivNaresh (Contributor, Author):

We end up running into a circular dependency issue, unfortunately.

Contributor:

Gotcha. Let's move it to gen_utils then? That's where are_ts_parameters_valid_for_split lives, so I think it's sensible to include it there.
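The layout being agreed on here follows a common cure for circular imports: put the shared helper in a low-level utilities module that imports nothing from the higher-level packages. The module contents below are illustrative only; `are_ts_data_valid_for_holdout` is a hypothetical name, not the actual evalml function.

```python
# evalml/utils/gen_utils.py (sketch): this module imports nothing from
# evalml.pipelines, so both pipelines/utils.py and anything else can
# import the helper from here without creating an import cycle.
def are_ts_data_valid_for_holdout(right_length, separated_by_gap):
    """Hypothetical helper living alongside are_ts_parameters_valid_for_split."""
    return right_length and separated_by_gap

# evalml/pipelines/utils.py would then do:
#     from evalml.utils.gen_utils import are_ts_data_valid_for_holdout
# instead of importing from a module that imports pipelines back.
```

Keeping validation helpers in gen_utils means the dependency arrows all point one way, which is why the reviewer suggests it over a top-of-file import in pipelines/utils.py.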

@chukarsten (Contributor) left a comment

This looks good! I just have a question about why we seem to have duplicate modifications of the weather dataset, plus a hyper-nitpick.

Comment thread: evalml/tests/demo_tests/test_datasets.py (Outdated)
Comment thread: evalml/utils/gen_utils.py (Outdated)
@ParthivNaresh ParthivNaresh merged commit af36015 into main Jan 18, 2022
@chukarsten chukarsten mentioned this pull request Jan 18, 2022
@freddyaboulton freddyaboulton deleted the Gap_Separation branch May 13, 2022 15:03

Development

Successfully merging this pull request may close these issues.

Make sure date_index of test set is separated by gap from the training set

4 participants