Added support for splitting regression sets #112

Merged: 7 commits, Oct 9, 2019


@christopherbunn christopherbunn commented Oct 1, 2019

Resolves #41


codecov bot commented Oct 2, 2019

Codecov Report

Merging #112 into master will increase coverage by 0.71%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #112      +/-   ##
==========================================
+ Coverage   92.74%   93.45%   +0.71%     
==========================================
  Files          49       50       +1     
  Lines        1268     1299      +31     
==========================================
+ Hits         1176     1214      +38     
+ Misses         92       85       -7

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...valml/tests/preprocessing_tests/test_split_data.py | 100% <100%> (ø) | |
| evalml/preprocessing/utils.py | 88.63% <100%> (+17.2%) | ⬆️ |
| evalml/tests/conftest.py | 100% <100%> (ø) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 698a80a...3f03314.


@jeremyliweishih jeremyliweishih left a comment


One small thing, but looks good to me. @kmax12 @rwedge

@@ -43,24 +43,29 @@ def load_data(path, index, label, drop=None, verbose=True, **kwargs):
     return X, y


-def split_data(X, y, test_size=.2, random_state=None):
+def split_data(X, y, regression=False, test_size=.2, random_state=None):

After discussion with Chris, we couldn't find a way to automatically detect the problem type (regression vs. classification) that would handle all edge cases. We proposed adding a flag so there's a single all-in-one function for users to split data. We saw this as more convenient than creating two separate functions, which could be ambiguous, especially when the function names don't indicate their use cases.
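
For readers following the thread, here is a rough sketch of what a single function with a `regression` flag could look like. It is not the code merged in this PR: `split_data_sketch` is a hypothetical name, it works on plain Python lists, and the stratify-by-class behaviour for classification is an assumption made for illustration. Only the parameter names come from the diff above.

```python
import random
from collections import defaultdict


def split_data_sketch(X, y, regression=False, test_size=0.2, random_state=None):
    """Hypothetical stand-in for split_data (not the PR's implementation).

    Returns X_train, X_test, y_train, y_test as plain lists.
    """
    rng = random.Random(random_state)
    if regression:
        # Continuous targets: treat all rows as one group, plain shuffled split.
        groups = [list(range(len(y)))]
    else:
        # Class labels (assumed behaviour): split each class separately so the
        # test set keeps roughly the same class proportions.
        by_label = defaultdict(list)
        for i, label in enumerate(y):
            by_label[label].append(i)
        groups = list(by_label.values())

    test_idx = set()
    for group in groups:
        rng.shuffle(group)
        n_test = round(len(group) * test_size)
        test_idx.update(group[:n_test])

    train = [i for i in range(len(y)) if i not in test_idx]
    test = sorted(test_idx)
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


# Usage: the same call handles both problem types; only the flag changes.
X = [[i] for i in range(10)]
y_class = [0] * 5 + [1] * 5            # two balanced classes
y_reg = [0.5 * i for i in range(10)]   # continuous target, all values unique

_, _, y_train_cls, y_test_cls = split_data_sketch(X, y_class, random_state=0)
_, _, y_train_reg, y_test_reg = split_data_sketch(
    X, y_reg, regression=True, random_state=0)

# Leaving the flag off for a continuous target: every value is its own
# one-row "class", so this sketch puts nothing in the test set.
_, _, _, y_test_wrong = split_data_sketch(X, y_reg, random_state=0)
```

In this sketch the last call shows why the problem type has to be stated somewhere: a per-class split has nothing sensible to do with a continuous target. Integer targets are also ambiguous, since they could be counts or class labels, which is one of the edge cases that makes automatic detection unreliable.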

evalml/tests/preprocessing_tests/test_split_data.py (outdated, resolved)
@christopherbunn christopherbunn merged commit 3f82885 into master Oct 9, 2019
@christopherbunn christopherbunn deleted the split_data_reg branch October 9, 2019 22:08
@angela97lin angela97lin mentioned this pull request Oct 29, 2019
Linked issue that merging this pull request may close: Split data doesn't support regression
3 participants