Update cost-benefit tutorial to use a holdout/test set #1159

angela97lin · 2020-09-10T19:13:38Z

Closes #1123 by updating the tutorial to use a holdout set. This changed the performance of both pipelines being compared, so I updated the results and the accompanying analysis (especially for the confusion matrix, where the difference in performance is now less black-and-white).

Unfortunately, I was not able to get consistent numbers for the scores, even after setting random_state. Hence, I updated the docs to calculate the profit difference dynamically instead of hard-coding it in the text. (I briefly looked into embedding the calculated value in markdown, but it looks like that would require another package.)

Updated docs here: https://evalml.alteryx.com/en/1123_holdout/demos/cost_benefit_matrix.html
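For readers skimming this thread, here is a minimal sketch of what "calculate the profit difference dynamically" could look like. It is not the notebook's actual code: the payoff values, the labels, the two prediction lists, and the helper names (`Payoffs`, `confusion_counts`, `average_profit`) are all hypothetical, and it uses only the standard library rather than evalml's objectives.

```python
from dataclasses import dataclass


@dataclass
class Payoffs:
    """Hypothetical per-outcome payoffs for a binary cost-benefit matrix."""
    true_positive: float
    true_negative: float
    false_positive: float
    false_negative: float


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual and predicted:
            tp += 1
        elif not actual and not predicted:
            tn += 1
        elif predicted:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def average_profit(y_true, y_pred, payoffs):
    """Average payoff per sample, weighting each outcome by its payoff."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = (tp * payoffs.true_positive + tn * payoffs.true_negative
             + fp * payoffs.false_positive + fn * payoffs.false_negative)
    return total / len(y_true)


# Made-up holdout labels and predictions from two pipelines (assumptions).
y_holdout = [1, 0, 1, 1, 0, 0, 1, 0]
preds_a = [1, 0, 0, 1, 0, 1, 1, 0]
preds_b = [1, 0, 1, 1, 0, 0, 0, 0]
payoffs = Payoffs(true_positive=100, true_negative=0,
                  false_positive=-50, false_negative=-200)

profit_a = average_profit(y_holdout, preds_a, payoffs)
profit_b = average_profit(y_holdout, preds_b, payoffs)

# The difference is computed at run time, so the text never goes stale
# when the underlying scores shift between runs.
print(f"Profit difference per sample: {profit_b - profit_a:.2f}")
```

Because the difference is computed in a cell and printed, the surrounding text can refer to "the value above" without quoting a number that may change from run to run.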

codecov · 2020-09-10T19:18:38Z

Codecov Report

Merging #1159 into main will increase coverage by 8.41%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1159      +/-   ##
==========================================
+ Coverage   91.52%   99.93%   +8.41%     
==========================================
  Files         210      210              
  Lines       13247    13247              
==========================================
+ Hits        12124    13239    +1115     
+ Misses       1123        8    -1115

| Impacted Files | Coverage Δ | |
|---|---|---|
| evalml/preprocessing/utils.py | `100.00% <ø> (ø)` | |
| evalml/automl/automl_search.py | `99.59% <0.00%> (+0.40%)` | ⬆️ |
| ...s/prediction_explanations_tests/test_algorithms.py | `100.00% <0.00%> (+1.11%)` | ⬆️ |
| evalml/tests/component_tests/test_components.py | `100.00% <0.00%> (+1.16%)` | ⬆️ |
| evalml/utils/gen_utils.py | `100.00% <0.00%> (+1.76%)` | ⬆️ |
| evalml/tests/component_tests/test_utils.py | `100.00% <0.00%> (+1.85%)` | ⬆️ |
| evalml/tests/pipeline_tests/test_pipelines.py | `100.00% <0.00%> (+3.81%)` | ⬆️ |
| ...derstanding/prediction_explanations/_algorithms.py | `97.14% <0.00%> (+4.28%)` | ⬆️ |
| evalml/pipelines/pipeline_base.py | `100.00% <0.00%> (+6.14%)` | ⬆️ |
| evalml/tests/utils_tests/test_dependencies.py | `100.00% <0.00%> (+6.25%)` | ⬆️ |

... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update cf8df40...3449a1d.

angela97lin · 2020-09-11T21:54:00Z

evalml/preprocessing/utils.py

@@ -58,7 +58,7 @@ def split_data(X, y, regression=False, test_size=.2, random_state=None):
    if regression:
        CV_method = ShuffleSplit(n_splits=1,
                                 test_size=test_size,
-                                 random_state=0)


Unrelated but I think small enough change!
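To illustrate why the hard-coded seed matters, here is a rough standard-library stand-in for a single shuffled split. It is not scikit-learn's `ShuffleSplit` and not the evalml implementation; the function name `shuffle_split` and its rounding rule are assumptions made for this sketch. The point is only that the caller's `random_state` reaches the shuffler instead of being replaced by a constant.

```python
import random


def shuffle_split(n_samples, test_size=0.2, random_state=None):
    """Return (train_indices, test_indices) for one shuffled split.

    Hypothetical helper: the caller's random_state seeds the shuffle,
    so passing the same seed reproduces the same split.
    """
    rng = random.Random(random_state)  # None falls back to a system seed
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Keep at least one sample in the test set.
    n_test = max(1, round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = shuffle_split(100, test_size=0.2, random_state=42)
print(len(train_idx), len(test_idx))
```

With a constant seed inside the function, every caller would get the same split regardless of the `random_state` they passed, which is what the removed line did for the regression case.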


docs/source/demos/cost_benefit_matrix.ipynb

bchen1116

Great documentation and explanations in the notebook! LGTM

init

037e107

angela97lin added this to the September 2020 milestone Sep 10, 2020

angela97lin self-assigned this Sep 10, 2020

release note

bda71f0

angela97lin added 11 commits September 10, 2020 15:28

Merge branch 'main' into 1123_holdout

63c7c09

Merge branch 'main' into 1123_holdout

337118c

Merge branch 'main' into 1123_holdout

a869bb0

updating numbers

cf20df7

Merge branch 'main' into 1123_holdout

8ad4969

attempt fix

6250442

remove numerics

bda880e

update to calculate in cell

3046c6e

add another cell for profit diff

d5377ab

toggle scrolling

d01b736

remove metadata scrolled value

bdf5c0b

angela97lin commented Sep 11, 2020


angela97lin requested review from freddyaboulton, dsherry, bchen1116, eccabay, jeremyliweishih and christopherbunn and removed request for freddyaboulton September 11, 2020 21:54

angela97lin added 3 commits September 11, 2020 17:54

Merge branch 'main' into 1123_holdout

db34494

remove default params

50c228d

Merge branch '1123_holdout' of github.com:FeatureLabs/evalml into 1123_holdout

fbd5acb

angela97lin marked this pull request as ready for review September 11, 2020 21:56

Merge branch 'main' into 1123_holdout

194a805

bchen1116 reviewed Sep 16, 2020


docs/source/demos/cost_benefit_matrix.ipynb (resolved)

bchen1116 reviewed Sep 16, 2020


docs/source/demos/cost_benefit_matrix.ipynb (outdated, resolved)

bchen1116 approved these changes Sep 16, 2020


angela97lin added 5 commits September 17, 2020 13:26

Merge branch 'main' into 1123_holdout

fed5aca

Merge branch 'main' into 1123_holdout

939ef4d

move release notes

5575126

empty for circleci

fdf6a09

Merge branch 'main' into 1123_holdout

b15811b

angela97lin modified the milestones: September 2020, October 2020 Sep 30, 2020

angela97lin added 4 commits October 1, 2020 17:00

Merge branch 'main' into 1123_holdout

fa9c62f

Merge branch 'main' into 1123_holdout

5a104fe

Merge branch 'main' into 1123_holdout

e917270

revert python version

3449a1d

angela97lin merged commit 6ed418e into main Oct 9, 2020

angela97lin deleted the 1123_holdout branch October 9, 2020 19:22

dsherry mentioned this pull request Oct 29, 2020

Release v0.15.0 #1370

Merged
