Skip to content

Add NoSplit data splitter#2958

Merged
eccabay merged 9 commits intomainfrom
2956_nosplit-cv
Nov 1, 2021
Merged

Add NoSplit data splitter#2958
eccabay merged 9 commits intomainfrom
2956_nosplit-cv

Conversation

@eccabay
Copy link
Copy Markdown
Contributor

@eccabay eccabay commented Oct 25, 2021

Closes #2956

First step of adding unsupervised learning to evalml, this code is lifted directly from my innovation days branch.

@codecov
Copy link
Copy Markdown

codecov bot commented Oct 25, 2021

Codecov Report

Merging #2958 (1c6aa23) into main (ffb7334) will decrease coverage by 0.8%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2958     +/-   ##
=======================================
- Coverage   99.7%   99.0%   -0.7%     
=======================================
  Files        307     309      +2     
  Lines      29288   29312     +24     
=======================================
- Hits       29197   29003    -194     
- Misses        91     309    +218     
Impacted Files Coverage Δ
evalml/preprocessing/__init__.py 100.0% <100.0%> (ø)
evalml/preprocessing/data_splitters/__init__.py 100.0% <100.0%> (ø)
evalml/preprocessing/data_splitters/no_split.py 100.0% <100.0%> (ø)
evalml/tests/preprocessing_tests/test_no_split.py 100.0% <100.0%> (ø)
evalml/automl/pipeline_search_plots.py 17.9% <0.0%> (-82.1%) ⬇️
...l/tests/automl_tests/test_pipeline_search_plots.py 25.6% <0.0%> (-74.4%) ⬇️
...ests/automl_tests/test_automl_search_regression.py 84.6% <0.0%> (-15.4%) ⬇️
.../automl_tests/test_automl_search_classification.py 89.7% <0.0%> (-10.3%) ⬇️
evalml/tests/automl_tests/test_automl_utils.py 90.6% <0.0%> (-9.4%) ⬇️
...lml/tests/automl_tests/test_iterative_algorithm.py 94.7% <0.0%> (-5.3%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ffb7334...1c6aa23. Read the comment docs.

@eccabay eccabay marked this pull request as ready for review October 25, 2021 20:42
@eccabay eccabay requested a review from a team October 25, 2021 20:42
Copy link
Copy Markdown
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eccabay Curious, is there a way to incorporate the idea of no splitting without introducing another splitter? That is, is there a way to change AutoMLSearch s.t. to take in a splitter, and if the splitter is None then we don't split?

@eccabay
Copy link
Copy Markdown
Contributor Author

eccabay commented Oct 26, 2021

@angela97lin great question! It is possible, I just spent some time playing around with my innovation days branch and proving that it could work. It involves a little refactoring of engine_base.py::train_and_score_pipeline. I originally went with the data splitter object because it seemed simpler at the time, but if it makes more sense to do it the other way, I'm more than happy to update this PR!

Copy link
Copy Markdown
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data splitter that doesn't split! What functionality! So brave. Wow. All joking aside, this looks like this does what you want to do! Definitely would be good to see where this fits into an unsupervised learning epic, but I certainly trust you. I agree with Freddy that returning an empty array better conveys what this splitter is doing. I think the get_n_splits() should also probably return 1!

@angela97lin
Copy link
Copy Markdown
Contributor

@eccabay Gotcha, that makes sense. I think this is great, and love how it makes it simple to not split data within our current structure without much refactoring. If this us to unsupervised learning support more quickly, I'm all on board! We can always support not having a splitter and refactor afterwards 😁

@eccabay eccabay requested a review from chukarsten November 1, 2021 12:34
Copy link
Copy Markdown
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes!

@eccabay eccabay merged commit 64c81d9 into main Nov 1, 2021
@eccabay eccabay deleted the 2956_nosplit-cv branch November 1, 2021 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove requirement of data splitting in search

4 participants