Update cost-benefit tutorial to use a holdout/test set #1159
Codecov Report
@@            Coverage Diff             @@
##             main    #1159      +/-   ##
==========================================
+ Coverage   91.52%   99.93%    +8.41%
==========================================
  Files         210      210
  Lines       13247    13247
==========================================
+ Hits        12124    13239     +1115
+ Misses       1123        8     -1115
Continue to review full report at Codecov.
@@ -58,7 +58,7 @@ def split_data(X, y, regression=False, test_size=.2, random_state=None):
    if regression:
        CV_method = ShuffleSplit(n_splits=1,
                                 test_size=test_size,
                                 random_state=0)
Unrelated but I think small enough change!
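For context on the hunk above, here is a minimal sketch of how a single-split ShuffleSplit can act as a holdout splitter with the caller's seed passed through rather than hard-coded. The helper name holdout_split and the toy arrays are hypothetical, and this is an assumption about the intent of the change, not the actual patched body of split_data.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit


def holdout_split(X, y, test_size=0.2, random_state=None):
    """Hypothetical helper: one shuffled train/holdout split.

    The seed is forwarded to ShuffleSplit instead of being fixed,
    so callers control reproducibility.
    """
    splitter = ShuffleSplit(n_splits=1,
                            test_size=test_size,
                            random_state=random_state)
    # n_splits=1 yields exactly one (train_indices, test_indices) pair.
    train_idx, test_idx = next(splitter.split(X, y))
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy data, purely illustrative.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = holdout_split(X, y, test_size=0.2, random_state=0)
```

Calling the helper twice with the same random_state returns the same partition, while random_state=None reshuffles on every call.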
Great documentation and explanations in the notebook! LGTM
Closes #1123 by updating the tutorial to use a holdout set. This changed the performance of both pipelines being compared, so I updated the results and added analysis of the updated results (especially the confusion matrix, which is now less black-and-white in terms of performance).
Unfortunately, I was not able to get consistent numbers for the scores, even after setting random_state. Hence, I updated the docs to calculate the profit difference dynamically instead of writing it in text. (I briefly looked into embedding the calculated value in markdown, but it looks like it would require another package.)
Updated docs here: https://evalml.alteryx.com/en/1123_holdout/demos/cost_benefit_matrix.html
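As a rough illustration of the approach described above, the following sketch scores two sets of predictions on a holdout set and computes the profit difference in code rather than hard-coding it in text. It uses scikit-learn only, not the EvalML API. The function total_profit, the payoff values, the synthetic data, and the two decision thresholds standing in for the two pipelines are all assumptions made for this example, not the tutorial's actual code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def total_profit(y_true, y_pred, tp=100, tn=0, fp=-30, fn=-75):
    """Hypothetical cost-benefit score: sum of per-outcome payoffs.

    The payoff values are made up for illustration.
    """
    # For binary labels [0, 1], ravel() returns counts as tn, fp, fn, tp.
    tn_n, fp_n, fn_n, tp_n = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp * tp_n + tn * tn_n + fp * fp_n + fn * fn_n


# Synthetic data; the holdout set is never seen during fitting.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_holdout)[:, 1]

# Two decision thresholds stand in for the two pipelines being compared.
profit_default = total_profit(y_holdout, (proba >= 0.5).astype(int))
profit_low_threshold = total_profit(y_holdout, (proba >= 0.3).astype(int))

# Computed at run time, so the reported number always matches the scores.
profit_difference = profit_low_threshold - profit_default
print(f"Profit difference on holdout: {profit_difference}")
```

Because the difference is derived from the scores at run time, the reported value stays consistent with the results even when the scores vary between runs.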