-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add easier way to determine whether data splitter is CV #3297
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3297 +/- ##
=======================================
+ Coverage 99.8% 99.8% +0.1%
=======================================
Files 322 324 +2
Lines 31714 31764 +50
=======================================
+ Hits 31624 31674 +50
Misses 90 90
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling, Bryan. I think we might need to rethink dynamically adding the attribute to the sklearn object, though. Perhaps a subclass of KFold/StratifiedKFold?
evalml/automl/utils.py
Outdated
return KFold(n_splits=n_splits, random_state=random_seed, shuffle=shuffle) | ||
kfold = KFold(n_splits=n_splits, random_state=random_seed, shuffle=shuffle) | ||
# can set this to true directly since k-fold requires >1 splits | ||
kfold.is_cv = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of worrisome. The KFold class is an sklearn object. There's not really much reason for contributors or other devs to expect this attribute added to the standard sklearn object if they don't know about this code segment that modifies it. Maybe we should consider a simple class wrapper with the same name the inherits from KFold
and StratifiedKFold
but defines the property as the other splitters do. Curious what others think...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chukarsten I added a quick fix to this where we define our own classes and add is_cv
as a property to that! The performance shouldn't change otherwise though. Let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much @bchen1116 !! Good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left a comment about making is_cv abstract for our base class but not blocking
fix #3098