Add easier way to determine whether data splitter is CV #3297

Merged: 10 commits, Feb 7, 2022

bchen1116
Contributor

fix #3098

@bchen1116 bchen1116 self-assigned this Feb 1, 2022

codecov bot commented Feb 1, 2022

Codecov Report

Merging #3297 (e25b87c) into main (4fdcf63) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3297     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        322     324      +2     
  Lines      31714   31764     +50     
=======================================
+ Hits       31624   31674     +50     
  Misses        90      90             
Impacted Files Coverage Δ
evalml/automl/utils.py 100.0% <ø> (ø)
evalml/preprocessing/data_splitters/__init__.py 100.0% <100.0%> (ø)
evalml/preprocessing/data_splitters/no_split.py 100.0% <100.0%> (ø)
...valml/preprocessing/data_splitters/sk_splitters.py 100.0% <100.0%> (ø)
.../preprocessing/data_splitters/time_series_split.py 96.7% <100.0%> (+0.4%) ⬆️
...essing/data_splitters/training_validation_split.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_automl_utils.py 100.0% <100.0%> (ø)
evalml/tests/preprocessing_tests/test_no_split.py 100.0% <100.0%> (ø)
...lml/tests/preprocessing_tests/test_sk_splitters.py 100.0% <100.0%> (ø)
...processing_tests/test_training_validation_split.py 100.0% <100.0%> (ø)


Contributor

@chukarsten chukarsten left a comment

Thanks for tackling, Bryan. I think we might need to rethink dynamically adding the attribute to the sklearn object, though. Perhaps a subclass of KFold/StratifiedKFold?

- return KFold(n_splits=n_splits, random_state=random_seed, shuffle=shuffle)
+ kfold = KFold(n_splits=n_splits, random_state=random_seed, shuffle=shuffle)
+ # can set this to true directly since k-fold requires >1 splits
+ kfold.is_cv = True
Contributor

This is kind of worrisome. KFold is an sklearn class, so contributors and other devs have little reason to expect this attribute on a standard sklearn object unless they know about the code segment that adds it. Maybe we should consider simple wrapper classes with the same names that inherit from KFold and StratifiedKFold but define the property as the other splitters do. Curious what others think...

Contributor Author

@chukarsten I added a quick fix for this where we define our own classes and add is_cv as a property to them! The performance shouldn't change otherwise though. Let me know what you think
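
For readers following along, here is a minimal sketch of the subclass approach discussed in this thread, assuming scikit-learn is installed. The aliases `SkKFold` and `SkStratifiedKFold` are illustrative, and the class bodies are an assumption about the general shape rather than the code merged in this PR.

```python
from sklearn.model_selection import KFold as SkKFold
from sklearn.model_selection import StratifiedKFold as SkStratifiedKFold


class KFold(SkKFold):
    """Thin wrapper around sklearn's KFold that exposes is_cv."""

    @property
    def is_cv(self):
        # k-fold requires at least 2 splits, so it always counts as CV.
        return True


class StratifiedKFold(SkStratifiedKFold):
    """Thin wrapper around sklearn's StratifiedKFold that exposes is_cv."""

    @property
    def is_cv(self):
        return True


# The wrappers accept the same constructor arguments as the sklearn classes.
splitter = KFold(n_splits=3, shuffle=True, random_state=0)
stratified = StratifiedKFold(n_splits=3)
```

Because `is_cv` lives on the wrapper classes, a plain sklearn `KFold` instance is left untouched, which addresses the concern about silently modifying a third-party object.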

Contributor

@chukarsten chukarsten left a comment

Thanks so much @bchen1116 !! Good to go.

Contributor

@angela97lin angela97lin left a comment

LGTM, left a comment about making is_cv abstract for our base class but not blocking
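
A rough sketch of what an abstract `is_cv` on a base class could look like, using only the standard library. The names `BaseSplitter`, `SingleSplit`, and `IncompleteSplitter` are hypothetical and are not taken from the evalml codebase.

```python
from abc import ABC, abstractmethod


class BaseSplitter(ABC):  # hypothetical base class name
    @property
    @abstractmethod
    def is_cv(self):
        """Whether the splitter produces more than one train/validation split."""


class SingleSplit(BaseSplitter):  # hypothetical concrete splitter
    @property
    def is_cv(self):
        # A single train/validation split is not cross-validation.
        return False


class IncompleteSplitter(BaseSplitter):  # hypothetical; forgets to define is_cv
    pass


single = SingleSplit()

try:
    IncompleteSplitter()
    instantiation_failed = False
except TypeError:
    # ABC refuses to instantiate a class with unimplemented abstract members.
    instantiation_failed = True
```

Making the property abstract means a new splitter that omits `is_cv` fails at instantiation time instead of raising an `AttributeError` later, when the attribute is first read.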

@bchen1116 bchen1116 merged commit 465ae93 into main Feb 7, 2022
@chukarsten chukarsten mentioned this pull request Feb 18, 2022
@freddyaboulton freddyaboulton deleted the bc_3098_cv branch May 13, 2022 15:06
Development

Successfully merging this pull request may close these issues.

Easy way to determine whether or not a data splitter counts as a CV splitter
3 participants