Update cost-benefit tutorial to use a holdout/test set #1159

Merged
angela97lin merged 26 commits into main from 1123_holdout on Oct 9, 2020


angela97lin commented Sep 10, 2020

Closes #1123 by updating the tutorial to use a holdout set. This changed the performance of both pipelines being compared, so I updated the results and the accompanying analysis (especially the confusion matrix, which is now less black-and-white in terms of performance).

Unfortunately, I was not able to get consistent numbers for the scores, even after setting random_state. Hence, I updated the docs to calculate the profit difference dynamically instead of writing it in text. (I briefly looked into embedding the calculated value in markdown, but it looks like that would require another package.)

Updated docs here: https://evalml.alteryx.com/en/1123_holdout/demos/cost_benefit_matrix.html
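
For readers who want a concrete picture of "calculating the profit difference dynamically", here is a minimal sketch in plain Python. It is not the notebook's code and does not use the EvalML API. The payoff values, the holdout labels, and both prediction lists are made-up illustration data, and `confusion_counts` and `total_profit` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred):
    """Count confusion-matrix cells for binary labels (1 = positive class)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def total_profit(counts, payoff):
    """Weight each confusion-matrix cell by its payoff and sum the result."""
    return sum(counts[cell] * payoff[cell] for cell in counts)


# Hypothetical payoffs per outcome (assumption, not taken from the tutorial).
payoff = {"tp": 100, "fp": -50, "tn": 0, "fn": -200}

# Made-up holdout labels and predictions from two pipelines being compared.
y_holdout = [1, 0, 1, 1, 0, 0, 1, 0]
preds_a = [1, 0, 0, 1, 0, 1, 1, 0]
preds_b = [1, 1, 1, 1, 0, 1, 1, 0]

profit_a = total_profit(confusion_counts(y_holdout, preds_a), payoff)
profit_b = total_profit(confusion_counts(y_holdout, preds_b), payoff)
profit_difference = profit_b - profit_a

# The reported figure is derived from the data at run time, so the text
# cannot drift out of sync with the scores when they change between runs.
print(f"Pipeline B earns {profit_difference} more than pipeline A on the holdout set.")
```

Because the number is computed from the holdout predictions rather than typed into the prose, rerunning the notebook with slightly different scores still produces a consistent statement.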

angela97lin added this to the September 2020 milestone Sep 10, 2020
angela97lin self-assigned this Sep 10, 2020

codecov bot commented Sep 10, 2020

Codecov Report

Merging #1159 into main will increase coverage by 8.41%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1159      +/-   ##
==========================================
+ Coverage   91.52%   99.93%   +8.41%     
==========================================
  Files         210      210              
  Lines       13247    13247              
==========================================
+ Hits        12124    13239    +1115     
+ Misses       1123        8    -1115     
Impacted Files                                          Coverage Δ
evalml/preprocessing/utils.py                           100.00% <ø> (ø)
evalml/automl/automl_search.py                          99.59% <0.00%> (+0.40%) ⬆️
...s/prediction_explanations_tests/test_algorithms.py   100.00% <0.00%> (+1.11%) ⬆️
evalml/tests/component_tests/test_components.py         100.00% <0.00%> (+1.16%) ⬆️
evalml/utils/gen_utils.py                               100.00% <0.00%> (+1.76%) ⬆️
evalml/tests/component_tests/test_utils.py              100.00% <0.00%> (+1.85%) ⬆️
evalml/tests/pipeline_tests/test_pipelines.py           100.00% <0.00%> (+3.81%) ⬆️
...derstanding/prediction_explanations/_algorithms.py   97.14% <0.00%> (+4.28%) ⬆️
evalml/pipelines/pipeline_base.py                       100.00% <0.00%> (+6.14%) ⬆️
evalml/tests/utils_tests/test_dependencies.py           100.00% <0.00%> (+6.25%) ⬆️
... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cf8df40...3449a1d.

@@ -58,7 +58,7 @@ def split_data(X, y, regression=False, test_size=.2, random_state=None):
    if regression:
        CV_method = ShuffleSplit(n_splits=1,
                                 test_size=test_size,
                                 random_state=0)

angela97lin (author) commented:

Unrelated but I think small enough change!
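
The hunk shows `split_data` accepting a `random_state` argument while the `ShuffleSplit` call uses a literal `0`. Which side of the diff that line belongs to is not visible here, so the following is only a sketch of why the seed matters for a single shuffle split. It is a pure-Python stand-in, not scikit-learn's `ShuffleSplit`, and `single_shuffle_split` is a hypothetical name.

```python
import random


def single_shuffle_split(n_samples, test_size=0.2, random_state=None):
    """Return (train_indices, test_indices) for one shuffled split.

    Stand-in for a one-split shuffle (assumption: not scikit-learn's
    implementation). A fixed random_state makes the split reproducible;
    None draws a fresh seed on every call.
    """
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)
    n_test = max(1, round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]


# The same seed yields the same split on every call.
train_a, test_a = single_shuffle_split(10, test_size=0.2, random_state=42)
train_b, test_b = single_shuffle_split(10, test_size=0.2, random_state=42)
print(test_a == test_b)
```

If the seed is hard-coded inside the function, callers cannot vary or control the split through the function's own `random_state` parameter, which is the kind of mismatch the hunk appears to touch.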

angela97lin marked this pull request as ready for review September 11, 2020 21:56

bchen1116 left a comment:

Great documentation and explanations in the notebook! LGTM

angela97lin merged commit 6ed418e into main Oct 9, 2020
angela97lin deleted the 1123_holdout branch October 9, 2020 19:22
dsherry mentioned this pull request Oct 29, 2020