Add a “cost-benefit curve” util method to graph objective score vs binary classification threshold #1081
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##             main    #1081      +/-   ##
==========================================
+ Coverage   99.89%   99.91%   +0.01%
==========================================
  Files         192      191       -1
  Lines       10981    10701     -280
==========================================
+ Hits        10970    10692     -278
+ Misses         11        9       -2
==========================================
```
Continue to review full report at Codecov.
@angela97lin Looks good! I left some comments that I think would be good to resolve before merge. I also left a not-blocking comment to hear your thoughts on an issue we may run into in the near future lol. No need to resolve that comment in this PR.
```diff
@@ -319,3 +319,65 @@ def graph_permutation_importance(pipeline, X, y, objective, show_all_features=Fa
     fig = go.Figure(data=data, layout=layout)
     return fig


 def binary_objective_vs_threshold(pipeline, X, y, objective, steps=100):
```
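For concreteness, here is a minimal sketch of the kind of sweep a function like this performs, using a plain array of positive-class probabilities and a standalone objective function in place of a fitted pipeline (everything below is illustrative, not the PR's actual implementation):

```python
import numpy as np

def f1(y_true, y_pred):
    # Standalone F1 for illustration; evalml objectives wrap this kind of logic.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def objective_vs_threshold(proba, y_true, objective_fn, steps=100):
    # Evaluate the objective at evenly spaced thresholds over [0, 1].
    thresholds = np.linspace(0, 1, steps + 1)
    scores = np.array([objective_fn(y_true, proba >= t) for t in thresholds])
    return thresholds, scores

proba = np.array([0.1, 0.4, 0.6, 0.9])
y_true = np.array([0, 0, 1, 1])
thresholds, scores = objective_vs_threshold(proba, y_true, f1, steps=10)
```

The (threshold, score) pairs are exactly the data the curve plots.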
Not blocking, but is there a way to unify this function with the threshold optimization happening in `AutoMLSearch`? I think we're going to start computing model understanding methods within the CV folds soon-ish, and it'd be nice to not optimize the same function twice.

This is outside the scope of this PR, but I want to start the conversation and hear what you guys think.
That's a really good question, and something I was thinking about too. Right now, automl does a maximum of 100 iterations of trying to minimize our cost function (`optimize_threshold` in `binary_classification_objective.py`). Although this method currently just steps through and calculates the objective function at each threshold, I can see us then finding the threshold step that gives us the best score and setting that as the threshold for our pipeline.

I think it'd be a really neat idea to support a `threshold_optimization_method` parameter to specify how we should optimize for the threshold, which could then allow us to use either our current implementation or this implementation (or anything else we come up with later down the road). That's my two cents!
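A sketch of that "sweep, then pick the best step" idea (the `pipeline.threshold` assignment mirrors the suggestion above; the pipeline class and scores here are made up):

```python
import numpy as np

# Illustrative sweep output: objective scores at evenly spaced thresholds.
thresholds = np.linspace(0, 1, 101)
scores = -np.abs(thresholds - 0.42)  # toy objective peaking near 0.42

# Choose the threshold step with the best score...
best_threshold = thresholds[int(np.argmax(scores))]

# ...and set it on the pipeline, as suggested above (hypothetical stand-in class).
class DummyPipeline:
    threshold = None

pipeline = DummyPipeline()
pipeline.threshold = best_threshold
```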
@freddyaboulton @angela97lin we do support threshold optimization. Are you talking about doing that inside this graph data method? I do think it could be a nice addition to compute and plot the optimal threshold, and display the numeric threshold value in the plot. And include that in the plot data of course.
We don't currently expose the optimization method in the API but we could.
@dsherry Yup, we currently support threshold optimization, but we use `minimize_scalar` to do so; my comment was just suggesting that we could expose the `threshold_optimization_method` and allow users to choose whether we optimize via `minimize_scalar` or what we've done here (sweep through possible threshold values and choose the one that gives the best score). All future work of course!
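For reference, a toy illustration of the `minimize_scalar` approach, which searches the threshold range without evaluating a fixed grid (this is a sketch with a made-up accuracy objective, not evalml's actual implementation):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

proba = np.array([0.1, 0.2, 0.8, 0.9])
y_true = np.array([0, 0, 1, 1])

# minimize_scalar minimizes, so negate the objective to maximize it.
result = minimize_scalar(
    lambda t: -accuracy(y_true, proba >= t),
    bounds=(0, 1),
    method="bounded",
)
best_threshold = result.x
```

The trade-off versus the sweep: fewer objective evaluations, but on a flat, step-shaped objective it can settle anywhere inside an optimal plateau rather than on a predictable grid point.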
Great stuff! I liked the writeups. The plot looks nice :)

Stuff which I think should be addressed before merge:

- Resolve the discussion on the plot title. I left a suggested change.
- Left a question about `test_estimator_needs_fitting_false`.
- I left a recommendation for adding a section to the Objectives guide and linking to it in the Model Understanding guide. I guess this isn't required before merge, but it feels important to get to!
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Graphing Utilities" |
Let's delete this header? There isn't another "`##`"-level header in this file, just "`###`".
@dsherry The other "`##`" header is `## Explaining Predictions`; I wanted to separate out the Graphing Utilities from the "Explaining Predictions" section 😁
"## Explaining Individual Predictions\n", | ||
"### Binary Objective Score vs. Threshold Graph\n", | ||
"\n", | ||
"For binary classification objectives that we can tune the decision threshold for (objectives that have `score_needs_proba` set to False), we can obtain and graph the scores for thresholds from zero to one, calculated at evenly-spaced intervals determined by `steps`." |
I like it.
I see we don't mention threshold optimization in the guide yet. I think we should add something about that.
Let's add something like this to the Objectives guide, Core Objectives section, under a sub-heading:

> Some binary classification objectives, like log loss and AUC, are unaffected by the choice of binary classification threshold, because they score based on the predicted probabilities or examine a range of threshold values. These metrics are defined with `score_needs_proba` set to `True`. For all other binary classification objectives, we can compute the optimal binary classification threshold from the predicted probabilities and the target.

```python
import evalml

class RFBinaryClassificationPipeline(evalml.pipelines.BinaryClassificationPipeline):
    component_graph = ['Simple Imputer', 'Random Forest Classifier']

X, y = evalml.demos.load_breast_cancer()

pipeline = RFBinaryClassificationPipeline({})
pipeline.fit(X, y)
print(pipeline.threshold)
print(pipeline.score(X, y, objectives=['F1']))

pipeline.threshold = evalml.objectives.F1.optimize_threshold(pipeline.predict_proba(X), y)
print(pipeline.threshold)
print(pipeline.score(X, y, objectives=['F1']))
```
Then, on this page we can link to that part of the guide.
Also, one suggestion here: "For binary classification objectives that we can tune the decision threshold for" --> "For binary classification objectives which are sensitive to the decision threshold"
Closes #1026, continuation and update from #1066
(From https://evalml.alteryx.com/en/1026_cost_curve/user_guide/model_understanding.html#Graphing-Utilities)