-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gap separation #3208
Gap separation #3208
Conversation
…ing data doesn't have valid frequency
# Conflicts: # evalml/tests/dependency_update_check/latest_dependency_versions.txt
# Conflicts: # evalml/tests/dependency_update_check/minimum_core_requirements.txt
# Conflicts: # docs/source/release_notes.rst
Codecov Report
@@ Coverage Diff @@
## main #3208 +/- ##
=======================================
+ Coverage 99.8% 99.8% +0.1%
=======================================
Files 326 326
Lines 31405 31467 +62
=======================================
+ Hits 31314 31376 +62
Misses 91 91
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ParthivNaresh Thank you for this! I think this is looking good. I made a suggestion to move away from PartialDependenceError
and PartialDependenceErrorCode
since this isn't being used in partial dependence.
Pinging @fjlanasa for visibility as well.
evalml/pipelines/utils.py
Outdated
X_train, X, pipeline_params | ||
) | ||
if not (right_length and X_separated_by_gap): | ||
raise PartialDependenceError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it'll be better if rather than raising an exception, we return a tuple of bool
and List[ValidationErrorCode]
?
- If the dataset is valid, return
True, []
- If the dataset does not have right length but is separated by gap, return
False, [NotRightLength]
, - If the dataset has right length but is not separated by gap, return
False, [NotSeparatedByGap]
- If the dataset is not right length and not separated by gap, return
False, [NotRightLength, Not SeparatedByGap]
If we do it this way, it might be easier to communicate which of the two criteria was not met.
What do you think? FYI @fjlanasa
def _add_training_data_to_X_Y(self, X, y, X_train, y_train): | ||
"""Append the training data to the holdout data. | ||
|
||
Need to do this so that we have all the data we need to compute lagged features on the holdout set. | ||
""" | ||
from evalml.pipelines.utils import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move the import to top of file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We end up running into a circular dependency issue unfortunately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Let's move it to gen_utils
then? That's where are_ts_parameters_valid_for_split
so I think its sensible to include it there.
…to Gap_Separation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! Just have a question about why we seem to have duplicate modifications of the weather data set and a hypernitpick!
Fixes: #3078