Add easier way to determine whether data splitter is CV #3297

bchen1116 · 2022-02-01T17:17:40Z

codecov · 2022-02-01T17:22:09Z

Codecov Report

Merging #3297 (e25b87c) into main (4fdcf63) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3297     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        322     324      +2     
  Lines      31714   31764     +50     
=======================================
+ Hits       31624   31674     +50     
  Misses        90      90

Impacted Files	Coverage Δ
evalml/automl/utils.py	`100.0% <ø> (ø)`
evalml/preprocessing/data_splitters/__init__.py	`100.0% <100.0%> (ø)`
evalml/preprocessing/data_splitters/no_split.py	`100.0% <100.0%> (ø)`
...valml/preprocessing/data_splitters/sk_splitters.py	`100.0% <100.0%> (ø)`
.../preprocessing/data_splitters/time_series_split.py	`96.7% <100.0%> (+0.4%)`	⬆️
...essing/data_splitters/training_validation_split.py	`100.0% <100.0%> (ø)`
evalml/tests/automl_tests/test_automl_utils.py	`100.0% <100.0%> (ø)`
evalml/tests/preprocessing_tests/test_no_split.py	`100.0% <100.0%> (ø)`
...lml/tests/preprocessing_tests/test_sk_splitters.py	`100.0% <100.0%> (ø)`
...processing_tests/test_training_validation_split.py	`100.0% <100.0%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

chukarsten

Thanks for tackling, Bryan. I think we might need to rethink dynamically adding the attribute to the sklearn object, though. Perhaps a subclass of KFold/StratifiedKFold?

chukarsten · 2022-02-01T18:09:02Z

evalml/automl/utils.py

-        return KFold(n_splits=n_splits, random_state=random_seed, shuffle=shuffle)
+        kfold = KFold(n_splits=n_splits, random_state=random_seed, shuffle=shuffle)
+        # can set this to true directly since k-fold requires >1 splits
+        kfold.is_cv = True


This is kind of worrisome. The KFold class is an sklearn object. Contributors and other devs have little reason to expect this attribute on the standard sklearn object unless they know about the code segment that adds it. Maybe we should consider simple wrapper classes with the same names that inherit from KFold and StratifiedKFold but define the property the way the other splitters do. Curious what others think...

bchen1116

@chukarsten I added a quick fix for this: we now define our own classes and add is_cv as a property on them! Performance shouldn't change otherwise, though. Let me know what you think.
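For readers following the thread, here is a minimal sketch of the wrapper approach discussed above. It is not the code merged in this PR: the class names `CVKFold` and `CVStratifiedKFold` are hypothetical, and the only assumption is that scikit-learn is installed. The idea is that `is_cv` becomes part of the class definition instead of being attached to an sklearn instance at runtime.

```python
from sklearn.model_selection import KFold, StratifiedKFold


class CVKFold(KFold):
    """Hypothetical wrapper around sklearn's KFold that declares is_cv."""

    @property
    def is_cv(self):
        # K-fold needs at least 2 splits, so it is always cross-validation.
        return True


class CVStratifiedKFold(StratifiedKFold):
    """Hypothetical wrapper around sklearn's StratifiedKFold that declares is_cv."""

    @property
    def is_cv(self):
        return True


# The wrappers behave like the sklearn splitters they extend.
splitter = CVKFold(n_splits=3, shuffle=True, random_state=0)
plain = KFold(n_splits=3)
```

With this shape, callers can read `splitter.is_cv` on any splitter the project constructs, while a plain sklearn `KFold` is left unmodified and carries no `is_cv` attribute.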

chukarsten

Thanks so much @bchen1116 !! Good to go.

angela97lin

LGTM, left a comment about making is_cv abstract for our base class but not blocking
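As an illustration of the non-blocking suggestion above, the sketch below shows one way an abstract `is_cv` could look. The names `SplitterBase`, `SingleSplit`, and `IncompleteSplit` are hypothetical and do not come from evalml; the point is only that an abstract property forces every concrete splitter to state whether it is cross-validation.

```python
from abc import ABC, abstractmethod


class SplitterBase(ABC):
    """Hypothetical base class; not evalml's actual base splitter."""

    @property
    @abstractmethod
    def is_cv(self):
        """Return whether this splitter performs cross-validation."""


class SingleSplit(SplitterBase):
    """Hypothetical splitter with one train/validation split."""

    @property
    def is_cv(self):
        return False


class IncompleteSplit(SplitterBase):
    """Forgets to define is_cv, so it cannot be instantiated."""


try:
    IncompleteSplit()
    instantiation_failed = False
except TypeError:
    # ABC refuses to instantiate classes with unimplemented abstract members.
    instantiation_failed = True
```

The trade-off is that a missing `is_cv` surfaces as a `TypeError` at construction time instead of as an `AttributeError` later, when the value is first read.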

evalml/preprocessing/data_splitters/no_split.py

add is_cv

60546fd

bchen1116 self-assigned this Feb 1, 2022

update release notes

ffb0ecb

Merge branch 'main' into bc_3098_cv

a0a55cc

bchen1116 marked this pull request as ready for review February 1, 2022 17:39

bchen1116 requested review from eccabay, freddyaboulton, chukarsten, christopherbunn, jeremyliweishih and ParthivNaresh February 1, 2022 17:39

chukarsten suggested changes Feb 1, 2022


bchen1116 added 3 commits February 1, 2022 14:39

update impl to not dynamically define is_cv

81a5c46

Merge branch 'bc_3098_cv' of github.com:alteryx/evalml into bc_3098_cv

e2a6e23

linting and adding new files

c50fd95

bchen1116 requested a review from chukarsten February 1, 2022 20:05

chukarsten approved these changes Feb 2, 2022


Merge branch 'main' into bc_3098_cv

e003c9a

angela97lin approved these changes Feb 3, 2022



bchen1116 added 3 commits February 7, 2022 10:25

Merge branch 'main' into bc_3098_cv

a0bec60

update release notes

e457a47

Merge branch 'main' into bc_3098_cv

e25b87c

bchen1116 merged commit 465ae93 into main Feb 7, 2022

chukarsten mentioned this pull request Feb 18, 2022

Release v0.45.0 #3344

Merged

freddyaboulton deleted the bc_3098_cv branch May 13, 2022 15:06
