Fix binary classification pipeline thresholding #3280
Codecov Report
@@ Coverage Diff @@
## main #3280 +/- ##
=======================================
+ Coverage 99.8% 99.8% +0.1%
=======================================
Files 322 322
Lines 31614 31630 +16
=======================================
+ Hits 31524 31540 +16
Misses 90 90
"source": [
"lead_scoring_objective = LeadScoring(\n",
"    true_positives=100,\n",
"    true_positives=50,\n",
This change was needed to get the assert in this file to pass, likely because of the new thresholding method.
I'm confused by this. If this method finds the global max, and the other found a local max, shouldn't this method give the same score or higher with the same parameters?
- optimal = minimize_scalar(
-     cost, bracket=(0, 1), method="Golden", options={"maxiter": 250}
- )
+ optimal = differential_evolution(cost, bounds=[(0, 1)], seed=0)
This method is a global minimizer, whereas minimize_scalar was a local minimizer. It will take a little longer, but should let us avoid most cases where the optimized threshold doesn't align with our expectations.
Is this supported by perf tests? Would be nice to know before going into the release next week what to expect.
Feel free to run with n_trials=1 if you do them.
Ran them yesterday, will post results shortly!
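To illustrate the local-versus-global point in this thread, here is a minimal pure-Python sketch. It is not SciPy's implementation: a hand-written golden-section search stands in for `minimize_scalar(method="Golden")`, a dense grid scan stands in for `differential_evolution`, and the `cost` curve is invented, with a narrow deep basin near 0.25 and a broad shallow one near 0.7.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # golden-ratio conjugate, about 0.618


def cost(threshold):
    """Made-up cost curve: narrow deep basin near 0.25, broad shallow basin near 0.7."""
    narrow = -1.0 * math.exp(-(((threshold - 0.25) / 0.03) ** 2))
    broad = -0.6 * math.exp(-(((threshold - 0.70) / 0.20) ** 2))
    return narrow + broad


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Local search: shrink [lo, hi] toward the lower of two interior probes."""
    while hi - lo > tol:
        c = hi - INV_PHI * (hi - lo)
        d = lo + INV_PHI * (hi - lo)
        if f(c) < f(d):
            hi = d  # keep the left part of the interval
        else:
            lo = c  # discard everything left of c for good
    return (lo + hi) / 2


def grid_minimize(f, lo, hi, steps=1000):
    """Exhaustive scan over evenly spaced thresholds (stand-in for a global optimizer)."""
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=f)


local_t = golden_section_minimize(cost, 0.0, 1.0)
global_t = grid_minimize(cost, 0.0, 1.0)
print(f"local search:  threshold={local_t:.3f}, cost={cost(local_t):.3f}")
print(f"global search: threshold={global_t:.3f}, cost={cost(global_t):.3f}")
```

On this toy curve the first pair of probes (about 0.382 and 0.618) favours the right-hand side, so the local search never looks below 0.382 again and settles near 0.7, while the scan finds the deeper basin near 0.25.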
jeremyliweishih left a comment
Looks great to me!
- optimal = minimize_scalar(
-     cost, bracket=(0, 1), method="Golden", options={"maxiter": 250}
- )
+ optimal = differential_evolution(cost, bounds=[(0, 1)], seed=0)
Might be interesting to play with some of the parameters to see if we can get the same performance but quicker. Def not blocking though!
My impression is that the default values should perform fairly similarly in speed to the other optimization methods available for this. I can try running a few other perf tests to double-check though!
freddyaboulton left a comment
@bchen1116 I'm confused as to why we need to change the parameters in the lead scoring doc. Doesn't that mean this method found a threshold that's less optimal than the previous method?
@freddyaboulton, it took me a while to track down why the scores changed.

[Figure: thresholding param and lead scoring value, this branch. The threshold and score of the AUC/F1-oriented run are lightly highlighted.]
[Figure: thresholding param and F1 value, this branch.]
[Figure: thresholding param and lead scoring value, main branch.]

I've highlighted the optimal objective values at the optimal thresholds on this new branch. With the current parameters, the F1 score is highest at a threshold of ~0.44, while lead scoring is highest at a threshold of ~0.40. Because of this slight deviation, we see a drop in the actual lead scoring value on the holdout data, which is why I needed to change the lead scoring objective parameters.

On the main branch, we never evaluate the objective at thresholds below 0.42. That would explain the difference: the local optimization method on main doesn't search as thoroughly as the global method here. It does find a local minimum, but not one as good as the minimum this new branch finds.

Although this leads to a lower holdout score on this new branch, that is likely due to the way the lead scoring objective's cost function itself is structured. I believe it's fine to move forward with this PR.
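For a concrete feel of how the F1-optimal and lead-scoring-optimal thresholds can differ, here is a small sketch on invented data. The `SCORED` pairs, the candidate `THRESHOLDS`, and the `lead_score` helper with its `tp_value`/`fp_value` payoffs are all hypothetical; this is not EvalML's `LeadScoring` implementation, just a payoff-per-lead calculation in the same spirit.

```python
# (predicted probability, true label) pairs, invented for illustration.
SCORED = [
    (0.90, 1), (0.80, 1), (0.70, 1), (0.60, 1), (0.50, 0),
    (0.45, 0), (0.40, 0), (0.35, 1), (0.30, 0), (0.20, 0),
]
THRESHOLDS = [0.25, 0.33, 0.42, 0.55, 0.65, 0.75]


def confusion_counts(threshold):
    """Count TP, FP, FN when predicting positive for probability > threshold."""
    tp = sum(1 for p, y in SCORED if p > threshold and y == 1)
    fp = sum(1 for p, y in SCORED if p > threshold and y == 0)
    fn = sum(1 for p, y in SCORED if p <= threshold and y == 1)
    return tp, fp, fn


def f1(threshold):
    tp, fp, fn = confusion_counts(threshold)
    return 2 * tp / (2 * tp + fp + fn)


def lead_score(threshold, tp_value=50, fp_value=-5):
    """Hypothetical payoff per lead: reward true positives, penalise false positives."""
    tp, fp, _ = confusion_counts(threshold)
    return (tp * tp_value + fp * fp_value) / len(SCORED)


best_f1_t = max(THRESHOLDS, key=f1)
best_lead_t = max(THRESHOLDS, key=lead_score)
print(f"F1-optimal threshold:     {best_f1_t} (payoff {lead_score(best_f1_t)})")
print(f"payoff-optimal threshold: {best_lead_t} (payoff {lead_score(best_lead_t)})")
```

On this toy data F1 peaks at 0.55 while the payoff peaks at 0.33, and the payoff at the F1-chosen threshold is lower (20.0 versus 23.5), so comparing the two objectives depends heavily on the payoff parameters.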
Thanks for the explanation, @bchen1116!



fix #3086
Perf tests here
report_threshold.html.zip
Tests for two other methods here:
report_threshold_binexp.html.zip
report_threshold_randexp.html.zip
The current defaults are faster, so we will continue with those.
Writeup under name "Global Binary Pipeline Thresholding Perf Tests"