
Add a “cost-benefit curve” util method to graph objective score vs binary classification threshold #1081

Merged
merged 40 commits into main from 1026_cost_curve Aug 25, 2020

Conversation

@angela97lin (Contributor) commented Aug 20, 2020

@angela97lin added this to the August 2020 milestone Aug 20, 2020
@angela97lin self-assigned this Aug 20, 2020
codecov bot commented Aug 20, 2020

Codecov Report

Merging #1081 into main will increase coverage by 0.01%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main    #1081      +/-   ##
==========================================
+ Coverage   99.89%   99.91%   +0.01%     
==========================================
  Files         192      191       -1     
  Lines       10981    10701     -280     
==========================================
- Hits        10970    10692     -278     
+ Misses         11        9       -2     
Impacted Files                                            Coverage Δ
evalml/demos/breast_cancer.py                             100.00% <ø> (ø)
evalml/model_understanding/__init__.py                    100.00% <ø> (ø)
evalml/objectives/cost_benefit_matrix.py                  100.00% <ø> (ø)
evalml/utils/__init__.py                                  100.00% <ø> (ø)
evalml/model_understanding/graphs.py                      100.00% <100.00%> (ø)
evalml/tests/component_tests/test_components.py           100.00% <100.00%> (+0.39%) ⬆️
...lml/tests/model_understanding_tests/test_graphs.py    100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@angela97lin changed the title from “Add a ‘cost-benefit curve’ util method to graph cost-benefit matrix score vs binary classification threshold” to “Add a ‘cost-benefit curve’ util method to graph objective score vs binary classification threshold” Aug 20, 2020
@angela97lin requested review from dsherry, freddyaboulton and jeremyliweishih and removed the request for dsherry and freddyaboulton August 20, 2020 15:53
@angela97lin marked this pull request as ready for review August 20, 2020 15:53
@freddyaboulton (Contributor) left a comment


@angela97lin Looks good! I left some comments that I think would be good to resolve before merge. I also left a not-blocking comment to hear your thoughts on an issue we may run into in the near future lol. No need to resolve that comment in this PR.

Resolved review threads (outdated): evalml/model_understanding/graphs.py, evalml/tests/model_understanding_tests/test_graphs.py
@@ -319,3 +319,65 @@ def graph_permutation_importance(pipeline, X, y, objective, show_all_features=Fa

fig = go.Figure(data=data, layout=layout)
return fig


def binary_objective_vs_threshold(pipeline, X, y, objective, steps=100):
@freddyaboulton (Contributor) commented:

Not-blocking but is there a way to unify this function with the threshold optimization happening in AutoMLSearch? I think we're going to start computing model understanding methods within the CV folds soon-ish and it'd be nice to not optimize the same function twice.

This is outside the scope of this PR but I want to start the conversation and hear what you guys think.

@angela97lin (Contributor, Author) commented:

That's a really good question, and something I was thinking about too. Right now, automl runs at most 100 iterations when minimizing our cost function (optimize_threshold in binary_classification_objective.py). Although this new method currently just steps through and calculates the objective at each threshold, I can see us then finding the threshold step that gives us the best score and setting that as the threshold for our pipeline.

I think it'd be a really neat idea to support a threshold_optimization_method parameter to specify how we should optimize for the threshold, which can then allow us to either use our current implementation or this implementation (or anything else we come up with later down the road). That's my two cents!
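
For illustration only, a minimal sketch of the sweep-and-pick-best idea described above. This is not the PR's actual implementation; the objective_function(y_true, y_predicted) signature and greater_is_better attribute are assumptions about the evalml objective API:

import numpy as np

def sweep_threshold(ypred_proba, y_true, objective, steps=100):
    """Score a binary objective at evenly spaced thresholds and return the best.

    Assumes ypred_proba holds positive-class probabilities and that the
    objective exposes objective_function(y_true, y_predicted) and a
    greater_is_better flag (assumed API, not confirmed by this PR).
    """
    thresholds = np.linspace(0, 1, steps + 1)
    scores = [objective.objective_function(y_true, ypred_proba >= t)
              for t in thresholds]
    # Pick the threshold with the best score, respecting objective direction.
    best = int(np.argmax(scores)) if objective.greater_is_better else int(np.argmin(scores))
    return thresholds[best], scores[best]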

@dsherry (Contributor) commented:

@freddyaboulton @angela97lin we do support threshold optimization. Are you talking about doing that inside this graph data method? I do think it could be a nice addition to compute and plot the optimal threshold, and display the numeric threshold value in the plot. And include that in the plot data of course.

We don't currently expose the optimization method in the API but we could.

@angela97lin (Contributor, Author) commented:

@dsherry Yup, we currently support threshold optimization, but we use minimize_scalar to do so; my comment was just suggesting that we could expose a threshold_optimization_method and let users choose whether we optimize via minimize_scalar or via what we've done here (sweeping through candidate threshold values and choosing the one that gives the best score). All future work, of course!
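
For comparison, the minimize_scalar approach mentioned above can be sketched roughly like this. It is a simplified stand-in for evalml's optimize_threshold, not its actual code, and makes the same objective API assumptions as the sweep sketch earlier in this thread:

from scipy.optimize import minimize_scalar

def optimize_threshold_scalar(ypred_proba, y_true, objective):
    """Pick a threshold with scipy's bounded scalar minimizer.

    Negates the score for greater-is-better objectives so we always minimize.
    """
    def cost(threshold):
        score = objective.objective_function(y_true, ypred_proba >= threshold)
        return -score if objective.greater_is_better else score

    result = minimize_scalar(cost, bounds=(0, 1), method="bounded")
    return result.x

The tradeoff between the two: minimize_scalar needs fewer objective evaluations, but score-vs-threshold is a step function, so a local minimizer can stall on a flat region, while the sweep is exhaustive at its chosen resolution.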

@dsherry (Contributor) left a comment

Great stuff! I liked the writeups. The plot looks nice :)

Stuff which I think should be addressed before merge:

  • Resolve the discussion on the plot title. I left a suggested change.
  • Left a question about test_estimator_needs_fitting_false.
  • I left a recommendation for adding a section to the Objectives guide and linking to it in the Model Understanding guide. I guess this isn't required before merge, but it feels important to get to!

"cell_type": "markdown",
"metadata": {},
"source": [
"## Graphing Utilities"
@dsherry (Contributor) commented:

Let's delete this header? There isn't another "##"-level header in this file, just "###".

@angela97lin (Contributor, Author) commented:

@dsherry The other "##" header is for ## Explaining Predictions; I wanted to separate out the Graphing Utilities from the "Explaining Predictions" section 😁

"## Explaining Individual Predictions\n",
"### Binary Objective Score vs. Threshold Graph\n",
"\n",
"For binary classification objectives that we can tune the decision threshold for (objectives that have `score_needs_proba` set to False), we can obtain and graph the scores for thresholds from zero to one, calculated at evenly-spaced intervals determined by `steps`."
@dsherry (Contributor) commented:

I like it.

I see we don't mention threshold optimization in the guide yet. I think we should add something about that.

Let's add something like this to the Objectives guide, Core Objectives section, under a sub-heading

Some binary classification objectives, like log loss and AUC, are unaffected by the choice of binary classification threshold, because they score based on the predicted probabilities or examine a range of threshold values. These metrics are defined with score_needs_proba set to True. For all other binary classification objectives, we can compute the optimal binary classification threshold from the predicted probabilities and the target.

import evalml

class RFBinaryClassificationPipeline(evalml.pipelines.BinaryClassificationPipeline):
    component_graph = ['Simple Imputer', 'Random Forest Classifier']

X, y = evalml.demos.load_breast_cancer()

pipeline = RFBinaryClassificationPipeline({})
pipeline.fit(X, y)
print(pipeline.threshold)
print(pipeline.score(X, y, objectives=['F1']))

# optimize_threshold is an instance method and expects positive-class probabilities
pipeline.threshold = evalml.objectives.F1().optimize_threshold(pipeline.predict_proba(X)[:, 1], y)
print(pipeline.threshold)
print(pipeline.score(X, y, objectives=['F1']))

Then, on this page we can link to that part of the guide.

Also, one suggestion here: "For binary classification objectives that we can tune the decision threshold for" --> "For binary classification objectives which are sensitive to the decision threshold"
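
Tying this together, a hedged usage sketch of the utility this PR adds: binary_objective_vs_threshold, its module path, and its steps parameter come from the diff above, while the CostBenefitMatrix parameter names, the dollar values, and the assumption that a dataframe of threshold/score pairs is returned are illustrative guesses, not confirmed by this PR.

import evalml
from evalml.model_understanding.graphs import binary_objective_vs_threshold
from evalml.objectives import CostBenefitMatrix

class RFBinaryClassificationPipeline(evalml.pipelines.BinaryClassificationPipeline):
    component_graph = ['Simple Imputer', 'Random Forest Classifier']

X, y = evalml.demos.load_breast_cancer()

pipeline = RFBinaryClassificationPipeline({})
pipeline.fit(X, y)

# Hypothetical dollar values assigned to each confusion-matrix cell
cbm = CostBenefitMatrix(true_positive=100, true_negative=10,
                        false_positive=-25, false_negative=-50)

# Score the objective at evenly spaced thresholds between 0 and 1
scores = binary_objective_vs_threshold(pipeline, X, y, cbm, steps=100)
print(scores.head())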


Additional resolved review threads: evalml/model_understanding/graphs.py (×2, outdated), evalml/objectives/cost_benefit_matrix.py, evalml/tests/utils_tests/test_graph_utils.py, evalml/tests/model_understanding_tests/test_graphs.py (outdated)
@angela97lin merged commit 4870c83 into main Aug 25, 2020
@angela97lin deleted the 1026_cost_curve branch August 25, 2020 00:59
@dsherry mentioned this pull request Aug 25, 2020

Successfully merging this pull request may close these issues.

Add “cost-benefit curve” to graph metric score vs binary classification threshold