Gap separation #3208

Merged: 56 commits merged into main on Jan 18, 2022

Conversation

@ParthivNaresh (Contributor) commented Jan 10, 2022

Fixes: #3078

@codecov bot commented Jan 10, 2022

Codecov Report

Merging #3208 (b176569) into main (ca835e0) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #3208     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        326     326             
  Lines      31405   31467     +62     
=======================================
+ Hits       31314   31376     +62     
  Misses        91      91             
| Impacted Files | Coverage Δ |
|---|---|
| evalml/exceptions/__init__.py | 100.0% <ø> (ø) |
| .../pipelines/time_series_classification_pipelines.py | 100.0% <ø> (ø) |
| evalml/tests/demo_tests/test_datasets.py | 100.0% <ø> (ø) |
| .../tests/pipeline_tests/test_time_series_pipeline.py | 99.9% <ø> (-<0.1%) ⬇️ |
| evalml/demos/weather.py | 100.0% <100.0%> (ø) |
| evalml/exceptions/exceptions.py | 100.0% <100.0%> (ø) |
| evalml/pipelines/time_series_pipeline_base.py | 100.0% <100.0%> (ø) |
| evalml/tests/automl_tests/test_engine_base.py | 100.0% <100.0%> (ø) |
| evalml/tests/conftest.py | 97.0% <100.0%> (+0.1%) ⬆️ |
| evalml/tests/utils_tests/test_gen_utils.py | 100.0% <100.0%> (ø) |
| ... and 1 more | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ca835e0...b176569.

@freddyaboulton (Contributor) left a comment:

@ParthivNaresh Thank you for this! I think this is looking good. I made a suggestion to move away from PartialDependenceError and PartialDependenceErrorCode since this isn't being used in partial dependence.

Pinging @fjlanasa for visibility as well.

        X_train, X, pipeline_params
    )
    if not (right_length and X_separated_by_gap):
        raise PartialDependenceError(
Review comment (Contributor):

Maybe it'd be better if, rather than raising an exception, we return a tuple of bool and List[ValidationErrorCode]?

  • If the dataset is valid, return True, []
  • If the dataset does not have the right length but is separated by gap, return False, [NotRightLength]
  • If the dataset has the right length but is not separated by gap, return False, [NotSeparatedByGap]
  • If the dataset does not have the right length and is not separated by gap, return False, [NotRightLength, NotSeparatedByGap]

If we do it this way, it might be easier to communicate which of the two criteria was not met (see the sketch after this comment).
What do you think? FYI @fjlanasa
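
A minimal sketch of this return-codes approach. ValidationErrorCode, NotRightLength, and NotSeparatedByGap are the names proposed in this thread, not confirmed evalml API, and the function shape is illustrative:

```python
from enum import Enum
from typing import List, Tuple


class ValidationErrorCode(Enum):
    # Hypothetical codes mirroring the two criteria discussed above.
    NotRightLength = "not_right_length"
    NotSeparatedByGap = "not_separated_by_gap"


def validate_holdout_dataset(
    right_length: bool, separated_by_gap: bool
) -> Tuple[bool, List[ValidationErrorCode]]:
    """Return (is_valid, error_codes) rather than raising an exception."""
    errors = []
    if not right_length:
        errors.append(ValidationErrorCode.NotRightLength)
    if not separated_by_gap:
        errors.append(ValidationErrorCode.NotSeparatedByGap)
    return len(errors) == 0, errors


is_valid, codes = validate_holdout_dataset(right_length=False, separated_by_gap=True)
# -> (False, [ValidationErrorCode.NotRightLength])
```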

def _add_training_data_to_X_Y(self, X, y, X_train, y_train):
    """Append the training data to the holdout data.

    Need to do this so that we have all the data we need to compute lagged features on the holdout set.
    """
    from evalml.pipelines.utils import (
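
For context on that docstring, a small pandas illustration (not the pipeline's actual code) of why lagged features on a short holdout set need the training rows prepended:

```python
import pandas as pd

# Training data ends where the holdout data begins.
X_train = pd.DataFrame(
    {"feature": range(10)}, index=pd.date_range("2022-01-01", periods=10)
)
X_holdout = pd.DataFrame(
    {"feature": range(10, 13)}, index=pd.date_range("2022-01-11", periods=3)
)

# On the holdout set alone, a lag-3 feature is NaN for the first 3 rows,
# which here is the entire holdout set.
print(X_holdout["feature"].shift(3))

# Prepending the training data lets the lags be filled from training rows.
combined = pd.concat([X_train, X_holdout])
lagged = combined["feature"].shift(3).loc[X_holdout.index]
print(lagged)  # 7.0, 8.0, 9.0
```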
Review comment (Contributor):

Can we move the import to the top of the file?

@ParthivNaresh (Contributor, Author):

Unfortunately, we end up running into a circular dependency issue.

Review comment (Contributor):

Gotcha. Let's move it to gen_utils then? That's where are_ts_parameters_valid_for_split lives, so I think it's sensible to include it there.
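
A sketch of what such a helper in gen_utils might look like. The function name and the gap semantics are assumptions for illustration; only gen_utils and are_ts_parameters_valid_for_split come from this thread. The point of the relocation is that gen_utils imports nothing from evalml.pipelines, which breaks the cycle:

```python
# Hypothetical addition to evalml/utils/gen_utils.py. Because gen_utils does not
# import from evalml.pipelines, pipeline modules can import this helper freely.
import pandas as pd


def are_datasets_separated_by_gap(train_index, test_index, gap):
    """Assumed check: the test set must start exactly `gap` periods after the
    training set ends (gap=0 means the sets are contiguous)."""
    freq = pd.tseries.frequencies.to_offset(
        train_index.freq or pd.infer_freq(train_index)
    )
    return test_index[0] == train_index[-1] + (gap + 1) * freq


train = pd.date_range("2022-01-01", periods=10, freq="D")  # ends 2022-01-10
test = pd.date_range("2022-01-12", periods=3, freq="D")    # starts after a 1-day gap
print(are_datasets_separated_by_gap(train, test, gap=1))   # True
```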

@chukarsten (Collaborator) left a comment:

This looks good! I just have a question about why we seem to have duplicate modifications of the weather data set, and a hypernitpick!

Resolved review threads (outdated):
  evalml/tests/demo_tests/test_datasets.py
  evalml/utils/gen_utils.py
@ParthivNaresh ParthivNaresh merged commit af36015 into main Jan 18, 2022
@chukarsten chukarsten mentioned this pull request Jan 18, 2022
@freddyaboulton freddyaboulton deleted the Gap_Separation branch May 13, 2022 15:03
Development

Successfully merging this pull request may close these issues:

Make sure date_index of test set is separated by gap from the training set (#3078)

4 participants