Add easier way to determine whether data splitter is CV#3297
Codecov Report
@@           Coverage Diff           @@
##            main   #3297   +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%
=======================================
  Files        322     324     +2
  Lines      31714   31764    +50
=======================================
+ Hits       31624   31674    +50
  Misses        90      90
chukarsten
left a comment
Thanks for tackling this, Bryan. I think we might need to rethink dynamically adding the attribute to the sklearn object, though. Perhaps a subclass of KFold/StratifiedKFold?
evalml/automl/utils.py
Outdated
return KFold(n_splits=n_splits, random_state=random_seed, shuffle=shuffle)
kfold = KFold(n_splits=n_splits, random_state=random_seed, shuffle=shuffle)
# can set this to true directly since k-fold requires >1 splits
kfold.is_cv = True
This is kind of worrisome. The KFold class is an sklearn object. Contributors and other devs have little reason to expect this attribute on a standard sklearn object unless they know about the code segment that adds it. Maybe we should consider a simple wrapper class with the same name that inherits from KFold (and likewise for StratifiedKFold) but defines the property the way the other splitters do. Curious what others think...
@chukarsten I added a quick fix for this: we now define our own classes and add is_cv as a property on them! The performance shouldn't change otherwise, though. Let me know what you think.
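For readers following the thread, here is a minimal sketch of the subclass approach discussed above, assuming scikit-learn is installed. The class names `KFoldWithCV` and `StratifiedKFoldWithCV` are hypothetical and are not necessarily the names used in this PR; the only idea taken from the discussion is that `is_cv` becomes a read-only property on our own subclasses instead of an attribute attached to an sklearn instance.

```python
from sklearn.model_selection import KFold, StratifiedKFold


class KFoldWithCV(KFold):  # hypothetical name, for illustration only
    """KFold subclass that declares itself a cross-validation splitter."""

    @property
    def is_cv(self):
        # k-fold always requires at least 2 splits, so it is always CV.
        return True


class StratifiedKFoldWithCV(StratifiedKFold):  # hypothetical name
    """StratifiedKFold subclass that declares itself a CV splitter."""

    @property
    def is_cv(self):
        return True


# Constructor arguments are inherited unchanged from sklearn.
splitter = KFoldWithCV(n_splits=3, random_state=0, shuffle=True)
stratified = StratifiedKFoldWithCV(n_splits=3, random_state=0, shuffle=True)
```

Because the property lives on the subclass, a plain `KFold` instance is left untouched, which is the concern raised in the review.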
chukarsten
left a comment
Thanks so much @bchen1116 !! Good to go.
angela97lin
left a comment
LGTM. I left a comment about making is_cv abstract on our base class, but it's not blocking.
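One way the non-blocking suggestion could look, as a rough sketch only: the names `SplitterBase`, `SingleSplit`, and `IncompleteSplitter` are hypothetical stand-ins and do not reflect the actual base class in the codebase. Declaring `is_cv` as an abstract property means a splitter that forgets to define it cannot be instantiated.

```python
from abc import ABC, abstractmethod


class SplitterBase(ABC):  # hypothetical stand-in for the project's base class
    @property
    @abstractmethod
    def is_cv(self):
        """Return True if the splitter performs cross-validation."""


class SingleSplit(SplitterBase):  # hypothetical single train/validation split
    @property
    def is_cv(self):
        # A single holdout split is not cross-validation.
        return False


class IncompleteSplitter(SplitterBase):  # deliberately omits is_cv
    pass
```

With this layout, `SingleSplit()` can be created normally, while `IncompleteSplitter()` raises `TypeError` at instantiation because the abstract property is not implemented.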
fix #3098