
Fix binary classification pipeline thresholding #3280

Merged: bchen1116 merged 25 commits from bc_3086_threshold into main on Feb 1, 2022
Conversation

bchen1116 (Contributor) commented on Jan 25, 2022:

Fixes #3086 (Optimize_threshold not working as expected).

Perf tests are attached: report_threshold.html.zip

Tests for two other methods:
report_threshold_binexp.html.zip
report_threshold_randexp.html.zip

The current defaults are faster, so we will continue with them.

A writeup is available under the name "Global Binary Pipeline Thresholding Perf Tests".

bchen1116 self-assigned this on Jan 25, 2022
bchen1116 changed the title from "simple impl to fix threhsolding" to "simple impl to fix thresholding" on Jan 25, 2022
codecov bot commented on Jan 25, 2022:

Codecov Report

Merging #3280 (bc2d466) into main (a18efa8) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #3280     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        322     322             
  Lines      31614   31630     +16     
=======================================
+ Hits       31524   31540     +16     
  Misses        90      90             
Impacted Files | Coverage Δ
...alml/objectives/binary_classification_objective.py | 100.0% <100.0%> (ø)
...odel_understanding_tests/test_decision_boundary.py | 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a18efa8...bc2d466.

bchen1116 changed the title from "simple impl to fix thresholding" to "Fix binary classification pipeline thresholding" on Jan 25, 2022
@@ -41,7 +41,7 @@
 "outputs": [],
 "source": [
 "lead_scoring_objective = LeadScoring(\n",
-"    true_positives=100,\n",
+"    true_positives=50,\n",
bchen1116 (Contributor, Author):

This change was needed to get the assert in this file to pass, likely due to the new thresholding method.

freddyaboulton (Contributor):

I'm confused by this. If this method finds the global max, and the other found a local max, shouldn't this method give the same score or higher with the same parameters?

-    optimal = minimize_scalar(
-        cost, bracket=(0, 1), method="Golden", options={"maxiter": 250}
-    )
+    optimal = differential_evolution(cost, bounds=[(0, 1)], seed=0)
bchen1116 (Contributor, Author):

This method is a global minimum optimizer, whereas minimize_scalar was a local minimizer. It will take a little longer, but it should let us avoid most of the cases where the optimized threshold doesn't align with our expectations.
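
A minimal, self-contained sketch of the difference (not the evalml code; the bimodal cost function is invented for illustration):

import numpy as np
from scipy.optimize import differential_evolution, minimize_scalar

def cost(threshold):
    # Invented bimodal cost with two valleys inside [0, 1].
    return 0.5 * np.cos(6 * np.pi * threshold) + 2 * (threshold - 0.4) ** 2

# Old approach: Golden-section search. The bracket is only a starting hint,
# not a hard bound, and the search is local: it settles into one valley,
# which may not be the best one (and may even lie outside [0, 1]).
local = minimize_scalar(
    cost, bracket=(0, 1), method="Golden", options={"maxiter": 250}
)

# New approach: differential evolution treats bounds as a hard constraint
# and samples the whole interval, so it can escape a suboptimal valley.
best = differential_evolution(cost, bounds=[(0, 1)], seed=0)
print(local.x, best.x)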

Contributor:

Is this supported by perf tests? Would be nice to know before going into the release next week what to expect.

Contributor:

Feel free to run with n_trials=1 if you do them.

bchen1116 (Contributor, Author):

Ran them yesterday, will post results shortly!

bchen1116 marked this pull request as draft on January 27, 2022 at 16:05.
jeremyliweishih (Contributor) left a comment:

Looks great to me!

-    optimal = minimize_scalar(
-        cost, bracket=(0, 1), method="Golden", options={"maxiter": 250}
-    )
+    optimal = differential_evolution(cost, bounds=[(0, 1)], seed=0)
jeremyliweishih (Contributor):

It might be interesting to play with some of the parameters to see if we can get the same performance but quicker. Definitely not blocking, though!

bchen1116 (Contributor, Author):

From my impression, the default values should perform fairly similarly in speed to the other optimization methods available here. I can run a few other perf tests to double-check, though!
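
For reference, a few scipy knobs one could experiment with if speed ever became a concern (the values below are illustrative, not what this PR ships, and the cost function is a placeholder):

from scipy.optimize import differential_evolution

def cost(threshold):
    # Placeholder objective standing in for the pipeline's real cost function.
    return (threshold - 0.3) ** 2

# A smaller population, looser tolerance, and a lower iteration cap all trade
# accuracy for speed; workers=-1 evaluates the population in parallel.
optimal = differential_evolution(
    cost,
    bounds=[(0, 1)],
    seed=0,
    popsize=10,           # default is 15
    tol=0.05,             # default is 0.01
    maxiter=100,          # default is 1000
    workers=-1,           # use all cores; requires cost to be picklable
    updating="deferred",  # required when workers != 1
)
print(optimal.x)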

freddyaboulton (Contributor) left a comment:

@bchen1116 I'm confused as to why we need to change the parameters in the lead scoring doc. Doesn't that mean this method found a threshold that's less optimal than the previous method?


bchen1116 (Contributor, Author):
@freddyaboulton, took me a while to track down why the scores changed.
First, the thresholding is optimized on a subset of the training data, while the holdout data is used for scoring; this can lead to deviations in holdout performance. Along with that, here are screenshots of the thresholding parameter and the associated objective score:

Thresholding param and Lead scoring value [This branch] (the threshold and score of the AUC/F1-oriented run are lightly highlighted): [screenshot]

Thresholding param and F1 value [This branch]: [screenshot]

Thresholding param and Lead scoring [Main branch]: [screenshot]

I've highlighted the optimal objectives for the optimal thresholds on this new branch. Using the current parameters, the F1 score is highest at a threshold of ~0.44, while lead scoring is highest at a threshold of ~0.40. Because of this slight deviation, we see a drop in the actual lead scoring value on the holdout data, which is why I needed to change the lead scoring objective parameters.

On the main branch, we don't even search for objective scores when the threshold is below 0.42. This explains the difference: the local optimization method on main doesn't search as thoroughly as the global method here. It does find a local minimum, just not one as good as the minimum found on this new branch. Although this leads to a lower holdout score on the new branch, that is likely due to how the Lead Scoring objective's cost function itself is structured; the toy scan below illustrates this.
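
A toy version of the scan behind those screenshots (the data and the LeadScoring-style payoff are invented, only to show that two objectives can peak at different thresholds):

import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=2000)
# Invented probability scores: informative but noisy.
y_proba = np.clip(0.35 * y_true + 0.65 * rng.random(2000), 0.0, 1.0)

def lead_payoff(y_true, y_pred, tp_value=50, fp_cost=-5):
    # Toy stand-in for a LeadScoring-style payoff: reward true positives,
    # penalize false positives. The weights are illustrative.
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return (tp * tp_value + fp * fp_cost) / len(y_true)

grid = np.linspace(0.01, 0.99, 99)
for name, objective in [("F1", f1_score), ("lead payoff", lead_payoff)]:
    scores = [objective(y_true, (y_proba >= t).astype(int)) for t in grid]
    print(name, "peaks at threshold", round(float(grid[np.argmax(scores)]), 2))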

I believe it's fine to move forward with this PR.

freddyaboulton (Contributor):
Thanks for the explanation, @bchen1116!

chukarsten (Contributor) left a comment:

LGTM!

bchen1116 merged commit e0ca517 into main on Feb 1, 2022.
chukarsten mentioned this pull request on Feb 4, 2022.
freddyaboulton deleted the bc_3086_threshold branch on May 13, 2022.