Added support for splitting regression sets #112

christopherbunn · 2019-10-01T21:31:52Z

Resolves #41

codecov · 2019-10-02T14:45:46Z

Codecov Report

Merging #112 into master will increase coverage by 0.71%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #112      +/-   ##
==========================================
+ Coverage   92.74%   93.45%   +0.71%     
==========================================
  Files          49       50       +1     
  Lines        1268     1299      +31     
==========================================
+ Hits         1176     1214      +38     
+ Misses         92       85       -7

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...valml/tests/preprocessing_tests/test_split_data.py | `100% <100%> (ø)` | |
| evalml/preprocessing/utils.py | `88.63% <100%> (+17.2%)` | ⬆️ |
| evalml/tests/conftest.py | `100% <100%> (ø)` | ⬆️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 698a80a...3f03314. Read the comment docs.

jeremyliweishih

One small thing but looks good to me. @kmax12 @rwedge

jeremyliweishih · 2019-10-08T15:48:33Z

evalml/preprocessing/utils.py

@@ -43,24 +43,29 @@ def load_data(path, index, label, drop=None, verbose=True, **kwargs):
    return X, y


-def split_data(X, y, test_size=.2, random_state=None):
+def split_data(X, y, regression=False, test_size=.2, random_state=None):


After discussion with Chris, there didn't seem to be a way to automatically detect the problem type (regression vs. classification) that would satisfy all edge cases. We proposed setting a flag so that there's one all-in-one function for users to split data. We saw this as more convenient than creating two separate functions, which could be ambiguous, especially when the function names don't indicate their use cases.
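
To make the trade-off concrete, here is a minimal, stdlib-only sketch of what an explicit `regression` flag could switch between: a plain shuffled split for continuous targets and a per-class (stratified) split otherwise. The helper name `split_indices` and its index-based return value are assumptions made for illustration; this is not the code in `evalml/preprocessing/utils.py`.

```python
import random
from collections import defaultdict


def split_indices(y, regression=False, test_size=0.2, random_state=None):
    """Return (train_idx, test_idx) for the targets in y.

    Hypothetical helper for illustration only; not evalml's split_data.
    """
    rng = random.Random(random_state)
    if regression:
        # Continuous targets: no classes to balance, so shuffle everything once.
        groups = [list(range(len(y)))]
    else:
        # Classification: split each class separately to preserve class ratios.
        by_class = defaultdict(list)
        for i, label in enumerate(y):
            by_class[label].append(i)
        groups = list(by_class.values())

    train_idx, test_idx = [], []
    for group in groups:
        rng.shuffle(group)
        n_test = round(len(group) * test_size)
        test_idx.extend(group[:n_test])
        train_idx.extend(group[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

In this sketch, calling `split_indices(y, regression=False)` on targets that are all distinct puts every value in its own one-element group, so nothing is ever assigned to the test set. That is one example of why guessing the problem type from the data is fragile and an explicit flag is simpler.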

evalml/tests/preprocessing_tests/test_split_data.py


Added support for splitting regression sets

d1c7afd

Fixed lint errors

0b772a2

christopherbunn requested a review from jeremyliweishih October 2, 2019 15:06

christopherbunn and others added 3 commits October 7, 2019 16:38

Require regression split to be explicitly enabled with param

71c8732

Updated tests to use new split method

31f638b

Merge branch 'master' into split_data_reg

33f5593

jeremyliweishih reviewed Oct 8, 2019

christopherbunn added 2 commits October 8, 2019 12:26

Removed hardcoded test values

f824f05

Merge branch 'split_data_reg' of github.com:FeatureLabs/evalml into split_data_reg

3f03314

kmax12 approved these changes Oct 9, 2019

christopherbunn merged commit 3f82885 into master Oct 9, 2019

christopherbunn deleted the split_data_reg branch October 9, 2019 22:08

angela97lin mentioned this pull request Oct 29, 2019

v0.5.0 #163

Merged
