Added support for multiclass classification to `roc_curve` #1164

christopherbunn · 2020-09-14T15:10:10Z

Moved the label binarization code from `graph_roc_curve` to `roc_curve` to enable support for multiclass classification. Also updated the API docs and the model understanding section to include a multiclass example.

This is a breaking API change: `roc_curve` now returns a list of dicts, one per class, each holding the ROC data for that class. Previously, it returned a single dict with the ROC data for the binary case.
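
To illustrate the impact on callers, here is a minimal sketch of a compatibility shim that accepts either return shape. The helper name `as_curve_list` and the dict keys `fpr_rates`, `tpr_rates`, and `auc_score` are assumptions made for illustration and are not confirmed by this description; the sample values are made up.

```python
def as_curve_list(roc_data):
    """Return ROC data as a list of per-class dicts.

    Accepts the old return shape (a single dict for the binary case)
    or the new one (a list of dicts, one per class).
    """
    if isinstance(roc_data, dict):
        return [roc_data]
    return list(roc_data)


# Hypothetical data: key names and numbers are assumptions, not real output.
old_style = {"fpr_rates": [0.0, 0.5, 1.0], "tpr_rates": [0.0, 1.0, 1.0], "auc_score": 0.75}
new_style = [
    old_style,
    {"fpr_rates": [0.0, 0.0, 1.0], "tpr_rates": [0.0, 1.0, 1.0], "auc_score": 1.0},
]

# Caller code can now iterate the same way regardless of which shape it received.
for class_index, curve in enumerate(as_curve_list(new_style)):
    print(class_index, curve["auc_score"])
```

Code that previously indexed the returned dict directly would need a change along these lines, or would need to index the first list element for binary problems.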

Resolves #1063

codecov · 2020-09-14T15:25:52Z

Codecov Report

Merging #1164 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1164   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         196      196           
  Lines       11729    11780   +51     
=======================================
+ Hits        11720    11771   +51     
  Misses          9        9

Impacted Files	Coverage Δ
evalml/model_understanding/graphs.py	`100.00% <100.00%> (ø)`
...lml/tests/model_understanding_tests/test_graphs.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ccc7e05...db65ea9.

freddyaboulton

@christopherbunn This looks good to me! My one comment is that I think it would be nice if we could give users the option of passing in the predict_proba dataframe rather than forcing them to pick out the column for the positive class for binary problems.

freddyaboulton · 2020-09-14T22:08:17Z

evalml/model_understanding/graphs.py


    Arguments:
        y_true (pd.Series or np.array): true labels.
-        y_pred_proba (pd.Series or np.array): predictions from a classifier, before thresholding has been applied. Note that 1 dimensional input is expected.
-
+        y_pred_proba (pd.Series or np.array): predictions from a classifier, before thresholding has been applied.


Nit-pick: I think we need to update the docstring, because y_pred_proba can now be a DataFrame.

freddyaboulton · 2020-09-15T19:21:16Z

evalml/model_understanding/graphs.py

+    if isinstance(y_pred_proba, (pd.Series, pd.DataFrame)):
+        y_pred_proba = y_pred_proba.to_numpy()
+
+    if y_pred_proba.ndim == 1:


Maybe we should also check for the binary case, like so:

if y_pred_proba.shape[1] == 2: y_pred_proba = y_pred_proba[:, 1].reshape(-1, 1)

My reasoning is that it would be nice if the API were the same for binary and multiclass classification. As it stands now, a user has to manually pick out the positive-class column from the predict_proba dataframe for binary problems, but passes in the entire dataframe for multiclass.

To be clear, this wouldn't be a breaking change, because a user could still pass only the probabilities for the positive class, and the y_pred_proba.ndim == 1 check would catch that.

christopherbunn · 2020-09-16

Good catch! I intended for the binary and multiclass API to be the same, but I missed that my original implementation handled the binary predict_proba case incorrectly. I've added your code snippet.
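
The shape handling discussed in this thread can be sketched as a standalone helper. This is a sketch under assumptions, not the merged code: `normalize_pred_proba` is a hypothetical name, and it assumes that the second column of a two-column input holds the positive-class probabilities.

```python
import numpy as np


def normalize_pred_proba(y_pred_proba):
    """Return probabilities as a 2-D array with one column per ROC curve."""
    # Convert lists (or anything array-like) to a NumPy array first.
    y_pred_proba = np.asarray(y_pred_proba)

    if y_pred_proba.ndim == 1:
        # 1-D input: treated as positive-class probabilities already.
        return y_pred_proba.reshape(-1, 1)

    if y_pred_proba.shape[1] == 2:
        # Two-column binary output: keep only the second column
        # (assumed here to be the positive class).
        return y_pred_proba[:, 1].reshape(-1, 1)

    # Three or more columns: multiclass, one column per class.
    return y_pred_proba


binary_1d = normalize_pred_proba([0.1, 0.8])
binary_2d = normalize_pred_proba([[0.9, 0.1], [0.2, 0.8]])
multiclass = normalize_pred_proba([[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]])
print(binary_1d.shape, binary_2d.shape, multiclass.shape)
```

With this ordering of checks, passing a single positive-class column and passing the full two-column binary output both end up as one column, so existing binary callers are unaffected.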

christopherbunn marked this pull request as ready for review September 14, 2020 17:32

christopherbunn requested a review from dsherry September 15, 2020 14:41

freddyaboulton approved these changes Sep 15, 2020

christopherbunn added 5 commits September 16, 2020 11:11

Added multiclass support to roc_curve

408a925

Updated docs to include mentions of multiclass and new multiclass ROC…

cc8010a

… example

Fixed graph data errors

efcd016

Updated release notes

154f95f

Removed dict checking logic in graph_roc_curve

1f7df5c

christopherbunn force-pushed the 1063_roc_multiclass branch from 87c881a to e1e904e Compare September 16, 2020 15:13

Updated doc string and added support for binary pred_proba

9e4b66d

christopherbunn force-pushed the 1063_roc_multiclass branch from e1e904e to 9e4b66d Compare September 16, 2020 15:15

christopherbunn and others added 2 commits September 16, 2020 11:34

Added pred_proba output test for test_roc_curve_binary

64df3b0

Merge branch 'main' into 1063_roc_multiclass

db65ea9

christopherbunn merged commit 9d1303c into main Sep 16, 2020

christopherbunn deleted the 1063_roc_multiclass branch September 16, 2020 21:46

This was referenced Sep 17, 2020

Release v0.14.0 #1191

Closed

Release v0.13.2 #1192

Merged

tamargrey mentioned this pull request Mar 15, 2023

Remove nullable handlings where possible from sklearn 1.2.2 upgrade #4072

Merged
