Add pipeline threshold to confusion matrix returns #3080
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3080 +/- ##
=======================================
+ Coverage 99.8% 99.8% +0.1%
=======================================
Files 313 313
Lines 30470 30483 +13
=======================================
+ Hits 30380 30393 +13
Misses 90 90
Continue to review full report at Codecov.
This looks good to me!
Interesting, thanks for outlining these two concerns! Is the first issue because of the way pandas internally handles things? aka it just rounds up rather than printing the actual value? If so, I'd say it's not a big deal given the actual index is still what we expect it to be.
@angela97lin yep, that was the first concern! Just an aesthetic issue. Filed an issue to address the second here.
Looks good man, thanks for filing an issue for the second point. I think we're fine with the aesthetic issue for now.
Fixes #3079
There are two potential issues I want to raise, although neither is severe enough to block this PR from being merged.

The first is the aesthetics of the index when the threshold is very granular, as can be seen here:

Note that the threshold is `0.99999...`, but looking at the dataframe, it just shows as `1.0`, which could be confusing for OS users. Grabbing the index itself shows the actual value. We could stringify the index to have the full values appear, but this might be annoying for our internal use.
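For illustration, here's a minimal, hypothetical pandas sketch (toy data, not this PR's code) of how a float index close to 1.0 can print as `1.0` while the underlying value is preserved:

```python
import pandas as pd

# Hypothetical thresholds; the last one is very close to, but not exactly, 1.0
thresholds = [0.25, 0.5, 0.9999999999]
df = pd.DataFrame({"accuracy": [0.7, 0.8, 0.6]}, index=thresholds)

print(df)            # the last index value may print as 1.0 under pandas'
                     # default display precision, even though it isn't 1.0
print(df.index[-1])  # grabbing the index shows the actual value: 0.9999999999

df.index = df.index.map(str)  # stringifying the index keeps the full value
print(df)                     # visible in the printed output
```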
The other issue is that our optimized pipeline threshold doesn't match the best threshold chosen through this method. This can be seen in the problem above, but can also be seen here:

Above, the pipeline is optimized using `accuracy binary`, and we see the pipeline threshold `0.360321` actually has a worse performance value than `0.5` with accuracy, which is the ideal value `find_confusion_matrix_per_thresholds` finds. The difference is likely in how we find the optimal threshold: `optimize_thresholds` uses gradient descent, while the current method uses a simple linear scan. Is this disparity an issue for our users? Discussing with @freddyaboulton, it could be confusing when our optimal thresholds don't match up. However, what would be the best approach to fix this, if necessary?
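To make the disparity concrete, here's a rough, self-contained sketch (toy labels/probabilities and a hypothetical 0.01-step grid, not evalml's actual implementation) showing how a linear scan's best threshold can score better on the scanned objective than a threshold chosen by a separate optimizer:

```python
import numpy as np

# Toy labels and predicted probabilities (illustrative only)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_proba = np.array([0.1, 0.38, 0.42, 0.45, 0.55, 0.7, 0.8, 0.9])

def accuracy_at(threshold):
    """Binary accuracy when predicting 1 for probabilities above the threshold."""
    y_pred = (y_proba > threshold).astype(int)
    return (y_pred == y_true).mean()

# Simple linear scan over a grid of candidate thresholds
candidates = np.linspace(0, 1, 101)
scan_best = candidates[np.argmax([accuracy_at(t) for t in candidates])]

# A threshold chosen separately (e.g. by a gradient-based optimizer) can land
# elsewhere and score worse on this metric; 0.360321 echoes the value above
pipeline_threshold = 0.360321

print(scan_best, accuracy_at(scan_best))                    # scan's best (~0.45 here), accuracy 1.0
print(pipeline_threshold, accuracy_at(pipeline_threshold))  # 0.360321, accuracy 0.625
```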
Again, these two issues shouldn't block the merge of this PR, but both are things I wanted to bring up for discussion.