Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NoSplit data splitter #2958

Merged
merged 9 commits into from
Nov 1, 2021
Merged

Add NoSplit data splitter #2958

merged 9 commits into from
Nov 1, 2021

Conversation

eccabay
Copy link
Contributor

@eccabay eccabay commented Oct 25, 2021

Closes #2956

First step of adding unsupervised learning to evalml, this code is lifted directly from my innovation days branch.

@codecov
Copy link

codecov bot commented Oct 25, 2021

Codecov Report

Merging #2958 (1c6aa23) into main (ffb7334) will decrease coverage by 0.8%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2958     +/-   ##
=======================================
- Coverage   99.7%   99.0%   -0.7%     
=======================================
  Files        307     309      +2     
  Lines      29288   29312     +24     
=======================================
- Hits       29197   29003    -194     
- Misses        91     309    +218     
Impacted Files Coverage Δ
evalml/preprocessing/__init__.py 100.0% <100.0%> (ø)
evalml/preprocessing/data_splitters/__init__.py 100.0% <100.0%> (ø)
evalml/preprocessing/data_splitters/no_split.py 100.0% <100.0%> (ø)
evalml/tests/preprocessing_tests/test_no_split.py 100.0% <100.0%> (ø)
evalml/automl/pipeline_search_plots.py 17.9% <0.0%> (-82.1%) ⬇️
...l/tests/automl_tests/test_pipeline_search_plots.py 25.6% <0.0%> (-74.4%) ⬇️
...ests/automl_tests/test_automl_search_regression.py 84.6% <0.0%> (-15.4%) ⬇️
.../automl_tests/test_automl_search_classification.py 89.7% <0.0%> (-10.3%) ⬇️
evalml/tests/automl_tests/test_automl_utils.py 90.6% <0.0%> (-9.4%) ⬇️
...lml/tests/automl_tests/test_iterative_algorithm.py 94.7% <0.0%> (-5.3%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ffb7334...1c6aa23. Read the comment docs.

@eccabay eccabay marked this pull request as ready for review October 25, 2021 20:42
@eccabay eccabay requested a review from a team October 25, 2021 20:42
Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eccabay Curious, is there a way to incorporate the idea of no splitting without introducing another splitter? That is, is there a way to change AutoMLSearch s.t. to take in a splitter, and if the splitter is None then we don't split?

@eccabay
Copy link
Contributor Author

eccabay commented Oct 26, 2021

@angela97lin great question! It is possible, I just spent some time playing around with my innovation days branch and proving that it could work. It involves a little refactoring of engine_base.py::train_and_score_pipeline. I originally went with the data splitter object because it seemed simpler at the time, but if it makes more sense to do it the other way, I'm more than happy to update this PR!

Copy link
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data splitter that doesn't split! What functionality! So brave. Wow. All joking aside, this looks like this does what you want to do! Definitely would be good to see where this fits into an unsupervised learning epic, but I certainly trust you. I agree with Freddy that returning an empty array better conveys what this splitter is doing. I think the get_n_splits() should also probably return 1!

@angela97lin
Copy link
Contributor

@eccabay Gotcha, that makes sense. I think this is great, and love how it makes it simple to not split data within our current structure without much refactoring. If this us to unsupervised learning support more quickly, I'm all on board! We can always support not having a splitter and refactor afterwards 😁

Copy link
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes!

@eccabay eccabay merged commit 64c81d9 into main Nov 1, 2021
@eccabay eccabay deleted the 2956_nosplit-cv branch November 1, 2021 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove requirement of data splitting in search
4 participants