Internal target check to ensure no class missing from train/val #1226
Codecov Report
@@ Coverage Diff @@
## main #1226 +/- ##
==========================================
+ Coverage 91.52% 99.92% +8.40%
==========================================
Files 200 200
Lines 12293 12339 +46
==========================================
+ Hits 11251 12330 +1079
+ Misses 1042 9 -1033
Of the original issues, this PR addresses issues 1 and 3. Issue 2 has been addressed with PR 1135.
Doc for points 4 and 5 is here
evalml/automl/automl_search.py
Outdated
@@ -607,10 +607,18 @@ def _compute_cv_scores(self, pipeline, X, y):
start = time.time()
cv_data = []
logger.info("\tStarting cross validation")
warnings.filterwarnings("ignore", lineno=665)
Add filter to catch and suppress SKLearn's warning for having too few cases of a target given n_splits=3.
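A filter keyed on `lineno` depends on where the warning happens to be raised in the installed library, so it may stop matching after an upgrade. As a rough sketch of one alternative (not the code in this PR), the filter could be scoped to a single call and matched on the message text instead. The message prefix below is an assumption about what scikit-learn emits, and `run_quietly` and `fake_split` are hypothetical names used only for illustration.

```python
import warnings

# Assumed prefix of the warning text; verify it against the message your
# installed scikit-learn version actually emits.
TOO_FEW_MEMBERS = r"The least populated class in y has only"


def run_quietly(fn, *args, **kwargs):
    """Call fn, ignoring only warnings whose message starts with the pattern.

    The filter is active only inside the `with` block, so warnings raised
    elsewhere in the program are unaffected.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=TOO_FEW_MEMBERS)
        return fn(*args, **kwargs)


def fake_split():
    # Hypothetical stand-in for a cross-validation split call that warns.
    warnings.warn(
        "The least populated class in y has only 1 members, "
        "which is less than n_splits=3.",
        UserWarning,
    )
    return "split done"
```

Calling `run_quietly(fake_split)` returns the function's result without surfacing the matched warning, while any unrelated warning raised inside the call still propagates.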
@bchen1116 I have some minor suggestions to improve the implementation and a question regarding your comment about changing this for TrainingValidationSplit. Otherwise, looks great!
evalml/automl/automl_search.py
Outdated
@@ -607,10 +607,18 @@ def _compute_cv_scores(self, pipeline, X, y):
start = time.time()
cv_data = []
logger.info("\tStarting cross validation")
warnings.filterwarnings("ignore", lineno=665)
@freddyaboulton about my comment for |
fix #760
Throws an error if the data split results in missing target values in either the train or validation sets
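
The check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation merged in this PR: `check_target_splits` is a hypothetical helper, `y` is assumed to be a plain sequence of labels, and `splits` is assumed to be an iterable of `(train_indices, validation_indices)` pairs such as a cross-validation splitter would yield.

```python
def check_target_splits(y, splits):
    """Raise ValueError if any split lacks a target value in train or validation.

    y      -- sequence of target labels (assumed indexable by integer position)
    splits -- iterable of (train_indices, validation_indices) pairs
    """
    all_classes = set(y)
    for fold, (train_idx, val_idx) in enumerate(splits):
        for name, indices in (("training", train_idx), ("validation", val_idx)):
            present = {y[i] for i in indices}
            missing = all_classes - present
            if missing:
                # Sort by string form so mixed label types do not break sorting.
                labels = sorted(missing, key=str)
                raise ValueError(
                    f"Fold {fold}: the {name} set is missing target values {labels}"
                )


# Usage: every class appears on both sides, so this passes silently.
y = [0, 0, 1, 1, 2, 2]
check_target_splits(y, [([0, 2, 4], [1, 3, 5])])
```

Raising early with the fold number and the missing labels makes the failure easier to diagnose than a downstream scoring error on an unseen class.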