Adds ROC multi-class plotting capability #832
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master     #832    +/-   ##
=========================================
  Coverage   99.68%   99.68%
=========================================
  Files         193      193
  Lines        7530     7598    +68
=========================================
+ Hits         7506     7574    +68
  Misses         24       24
Continue to review full report at Codecov.
@ctduffy this is great! I'll take a look. Could you post an example of what this looks like on a multiclass dataset? A screenshot would be enough.
evalml/pipelines/graph_utils.py
Outdated
thresholds = dict()
tpr_rates = dict()
auc_scores = dict()
if n_classes == 1:
Is there a reason to have this be user-defined rather than something we detect from the target? It feels problematic to me to ask the user to provide this. We could do something like `if len(y_true.dropna().unique()) > 2` then multiclass. I may not have the right syntax for `dropna` here!
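A minimal sketch of that detection idea, assuming a pandas-compatible target; the helper name `infer_n_classes` is made up for illustration and is not evalml API:

```python
import pandas as pd

# Sketch only: infer the number of classes from the target itself rather than
# asking the caller to pass it in.
def infer_n_classes(y_true):
    return pd.Series(y_true).dropna().nunique()

# More than 2 unique non-NaN values would mean multiclass.
print(infer_n_classes([0, 1, 2, 1, None]) > 2)  # True
```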
evalml/pipelines/graph_utils.py
Outdated
- def graph_roc_curve(y_true, y_pred_proba, title_addition=None):
+ def graph_roc_curve(y_true, y_pred_proba, labels=None, title_addition=None):
Cool, I like the idea of providing a way for people to override what labels will appear for each class. Perhaps we could call this `custom_class_labels`? The term "labels" means something different to me in the context of classification.
Ah true, good point.
evalml/pipelines/graph_utils.py
Outdated
roc_curve_data = roc_curve(y_true, y_pred_proba)

if len(y_true.shape) > 1:
    n_classes = y_true.shape[1]
I don't think this works. `y_true` is the target column. Did you mean `y_pred_proba`?
If so, I think this is good.

A warning: our existing unit test coverage appears to assume `y_pred_proba` has only one column in the binary case. That's technically incorrect, since binary classifiers should output two columns of predicted probabilities, one for the "false" class and one for the "true" class. Including both in our pipelines' `predict_proba` output is redundant, because each row would sum to 1, but we do so because it helps keep consistency with multiclass. So, you'll have to update those tests to set up `y_pred_proba` with 2 columns for the binary case.

Relatedly, we have a decision to make. If someone passes in a single column here for `y_pred_proba`, we can either assume that's for the "true" class and document that assumption, or we can throw an error. What do you think? I like the idea of assuming "true".
So this works right now because the algo is set up to take in `y_true` values that are one-hot encoded, so `y_true` and `y_pred_proba` have the same dimensions. I can change this if need be (i.e. putting the one-hot encoding into this function instead of having the user do it).
I think we should assume it is for the `True` class; that makes the most sense to me. Then, when there is a multi-class classification, I can add the one-hot encoding to the function so that it does not have to be passed in that way by the user, and the number of unique values can be used as the number of classes.
Ah, got it. Yeah, I don't think we should expect or add any one-hot encoding. It's a useful piece of feature engineering for getting good performance from models, but outside of that context I don't think it's necessary.

Let's maintain the assumption that `y_true` is always a 1d vector of length `n_rows`. In the multiclass case, `y_pred_proba` will have a number of columns equal to the number of classes; in the binary case, `y_pred_proba` could have either 2 columns or just 1 (which we can assume is for the "true" class, i.e. the 2nd column). Does all that make sense?
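A rough sketch of that shape convention, assuming NumPy inputs; the helper name `normalize_pred_proba` is hypothetical, not evalml API:

```python
import numpy as np

# Sketch of the convention described above: y_true stays a 1d vector, while
# y_pred_proba ends up with one column per class. A lone binary column is
# assumed to hold P(class == "true"), i.e. what would be the 2nd column.
def normalize_pred_proba(y_pred_proba):
    y_pred_proba = np.asarray(y_pred_proba, dtype=float)
    if y_pred_proba.ndim == 1:
        y_pred_proba = y_pred_proba.reshape(-1, 1)
    if y_pred_proba.shape[1] == 1:
        y_pred_proba = np.hstack([1 - y_pred_proba, y_pred_proba])
    return y_pred_proba

print(normalize_pred_proba([0.9, 0.2]).shape)  # (2, 2)
```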
Ah... I know what you mean. In order to generate the ROC curve for each class, you need to treat that class as "true" and all others as "false". Doesn't look like that's in this PR yet, right?
Ah ok, I was totally missing this before! 😂 So in that case, I have a suggestion. How about you back out the changes you've made to `roc_curve`, so that it still only supports the binary case? Then in `graph_roc_curve`, you can add code to implement that sort of one-hot encoding. Does that make sense?
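A minimal sketch of that structure, assuming scikit-learn is available; the helper name `per_class_roc_data` and passing `roc_curve` in as an argument are just for illustration, not part of the PR:

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Sketch: graph_roc_curve binarizes the target one-vs-rest and calls the
# binary roc_curve helper once per class column. (For a binary target,
# label_binarize returns a single column, so the loop runs once.)
def per_class_roc_data(y_true, y_pred_proba, roc_curve):
    classes = np.unique(y_true)
    y_one_hot = label_binarize(y_true, classes=classes)
    return [roc_curve(y_one_hot[:, i], y_pred_proba[:, i])
            for i in range(y_one_hot.shape[1])]
```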
I will try it that way
Thanks, sounds good
I just committed a version that follows that structure. It seems less redundant and more elegant.
Left some suggestions, great start!
Nice work! Just had a suggestion on testing :)
evalml/pipelines/graph_utils.py
Outdated
"""Generate and display a Receiver Operating Characteristic (ROC) plot. | ||
|
||
Arguments: | ||
y_true (pd.Series or np.array): true binary labels. | ||
y_pred_proba (pd.Series or np.array): predictions from a binary classifier, before thresholding has been applied. Note this should be the predicted probability for the "true" label. | ||
custom_class_labels (list or None): if not None, custom lables for classes. Default None. |
typo: lables --> labels
evalml/pipelines/graph_utils.py
Outdated
y_true (pd.Series or np.array): true binary labels.
y_pred_proba (pd.Series or np.array): predictions from a binary classifier, before thresholding has been applied. Note this should be the predicted probability for the "true" label.
Can we remove "binary" from the docstrings now? :D
@ctduffy looking good! Left one more round of feedback, should be ready to merge after that
evalml/pipelines/graph_utils.py
Outdated
for i in range(n_classes):
    roc_curve_data = roc_curve(y_true[:, i], y_pred_proba[:, i])
    data.append(_go.Scatter(x=roc_curve_data['fpr_rates'], y=roc_curve_data['tpr_rates'],
                            name='ROC (AUC {:06f}) of Class {name}'
Now that we're listing multiple classes, perhaps we could reorganize this. The title of the plot says Receiver Operating Characteristic, so perhaps we can just say `'Class {name} (AUC {:06f})'` here, to keep the legend a bit shorter.
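For illustration only, building that shorter legend entry could look roughly like this; the values and variable names below are stand-ins, not the PR's code:

```python
# Sketch: the shorter per-class legend entry suggested above, with stand-in data.
roc_curve_data = {'auc_score': 0.834512}
custom_class_labels = ['setosa', 'versicolor', 'virginica']
i = 0
class_name = custom_class_labels[i] if custom_class_labels else i
legend_name = 'Class {name} (AUC {auc:06f})'.format(name=class_name,
                                                    auc=roc_curve_data['auc_score'])
print(legend_name)  # Class setosa (AUC 0.834512)
```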
evalml/pipelines/graph_utils.py
Outdated
@@ -2,6 +2,7 @@

import numpy as np
import pandas as pd
from sklearn import preprocessing
Style: can you please do `from sklearn.preprocessing import LabelBinarizer`? Weird that lint didn't say to do this! I thought it would have. 🤷‍♂️
evalml/pipelines/graph_utils.py
Outdated
lb = preprocessing.LabelBinarizer()
lb.fit(np.unique(y_true))
y_true = lb.transform(y_true)
Since you're changing the shape and meaning of the originally inputted `y_true` here, for readability I'd suggest using a new variable name.
evalml/pipelines/graph_utils.py
Outdated
lb = preprocessing.LabelBinarizer()
lb.fit(np.unique(y_true))
y_true = lb.transform(y_true)
n_classes = y_true.shape[1]
Could `y_true` ever be one-dimensional? If there's only one unique value in `y_true`, will `LabelBinarizer.transform` output a 1d or 2d vector? If the answer is 1d, trying to access index 1 would fail.

This would be a good unit test case to add regardless.
It outputs a 2d vector! But I am adding testing around that.
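A quick standalone check of that edge case (an illustration, not code from the PR):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# With a single unique value, LabelBinarizer.transform still returns a 2d array
# with one column, so indexing shape[1] does not raise an IndexError.
lb = LabelBinarizer()
lb.fit(np.unique(np.array([0])))
print(lb.transform(np.array([0])).shape)  # (1, 1)
```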
roc_curve_data = roc_curve(y_true, y_pred_proba)

lb = preprocessing.LabelBinarizer()
lb.fit(np.unique(y_true))
What should the behavior of this graph be if any of the target rows, or predicted probabilities, are `np.nan`? I hope those rows would simply be ignored in `roc_curve`, but I don't know what its behavior is right now.

I suggest you add a test case for this and see what happens. Ideally, `roc_curve` handles `nan`s, and all you need to do is update this line to `np.unique(y_true).dropna()` to avoid treating `np.nan` as a valid class. If not, you could drop `nan`s from the input, with something like:

nan_indices = np.isnan(y_true) | np.isnan(y_pred_proba.sum(axis=1))
y_true = y_true[~nan_indices]
y_pred_proba = y_pred_proba[~nan_indices, :]

Otherwise, if it's too hard to update the impl here to handle `np.nan`s in the input, we can detect them and throw an error. Does that sound good?
@ctduffy do we have unit test coverage yet for when the target or predicted probabilities are `nan`?
@dsherry yes, here:
Though I realize that this is mostly just a test to make sure that no error is thrown; I will go in and add a few assertions after the graphing functions.
Ah, got it. Thanks! Yeah, agreed. In fact, you could split the `nan`s part out into a separate unit test, since it's independent of the single-value case you're testing.
I just changed the name, but kept it in the same test, as the second part of the single-value testing does involve `nan`s.
go = pytest.importorskip('plotly.graph_objects', reason='Skipping plotting test because plotly not installed')
X, y_true = X_y_multi
rs = np.random.RandomState(42)
y_tr = label_binarize(y_true, classes=[0, 1, 2])
Any particular reason we use `label_binarize` here but `LabelBinarizer` in the function impl?
`LabelBinarizer` doesn't require prior knowledge of the classes, whereas `label_binarize` does. So in the testing environment we know the classes, and in the graph utils we don't know the labels at that particular time.
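A small standalone illustration of that difference (not code from the PR):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, label_binarize

y = np.array([0, 2, 1, 2])

# label_binarize needs the class list up front, which is fine in tests where it's known.
known = label_binarize(y, classes=[0, 1, 2])

# LabelBinarizer learns the classes during fit, which suits graph_utils, where
# the labels aren't known ahead of time.
learned = LabelBinarizer().fit_transform(y)

assert np.array_equal(known, learned)
```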
@@ -62,18 +63,19 @@ def graph_precision_recall_curve(y_true, y_pred_proba, title_addition=None):

def roc_curve(y_true, y_pred_proba):
    """
    Given labels and binary classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
    Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
Thank you!
evalml/pipelines/graph_utils.py
Outdated
Returns:
    dict: Dictionary containing metrics used to generate an ROC plot, with the following keys:
    dict: Dictionary containing metrics used to generate an ROC plot, with the following keys, each of which contains a dictionary for each classification class:
👍 awesome
evalml/pipelines/graph_utils.py
Outdated
    * `fpr_rates`: False positive rates.
    * `tpr_rates`: True positive rates.
    * `thresholds`: Threshold values used to produce each pair of true/false positive rates.
    * `auc_score`: The area under the ROC curve.
    * `auc_scores`: The area under the ROC curve. (One vs Rest)
I don't think we should say "One vs Rest" here, because `roc_curve` doesn't know anything about multiclass, right?
evalml/pipelines/graph_utils.py
Outdated
y_true (pd.Series or np.array): true binary labels.
y_pred_proba (pd.Series or np.array): predictions from a binary classifier, before thresholding has been applied. Note this should be the predicted probability for the "true" label.
y_true (pd.Series or np.array): true labels.
y_pred_proba (pd.Series or np.array): predictions from a classifier, before thresholding has been applied. Note this should be the predicted probability for the "true" label in the binary case.
I think the changes you're making would mean we can delete "Note this should be the predicted probability for the "true" label in the binary case." Right? Or perhaps we can just add a note saying that we expect 2d input, and if it's 1d it should be for the "true" case. What do you think?
@ctduffy if you're able to get these docs comments in that would be great. I think I left one more which is still open. OK to do a separate PR too.
Yep, sorry I missed this, just pushed up the changes.
@ctduffy that screenshot looks great!! Thanks for sharing
rs = np.random.RandomState(42)
y_tr = label_binarize(X_y_multi[1], classes=[0, 1, 2])
y_pred_proba = y_tr * rs.random(y_tr.shape)
return X_y_multi[1], y_tr, y_pred_proba
What is `X_y_multi[1]`? I suggest you add `_, y = X_y_multi` at the top of this method and then refer directly to `y`.

Also, under our naming convention you could call the binarized output `y_true`.
@@ -13,6 +15,14 @@
)


@pytest.fixture
def Binarized_ys(X_y_multi):
Please make this name lowercase, for our code style standards.
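Putting the fixture suggestions above together might look roughly like this; a sketch only, assuming the existing `X_y_multi` fixture, not the PR's final code:

```python
import numpy as np
import pytest
from sklearn.preprocessing import label_binarize

@pytest.fixture
def binarized_ys(X_y_multi):
    # Unpack the existing fixture instead of indexing X_y_multi[1],
    # and give the binarized output the conventional name y_true.
    _, y = X_y_multi
    rs = np.random.RandomState(42)
    y_true = label_binarize(y, classes=[0, 1, 2])
    y_pred_proba = y_true * rs.random(y_true.shape)
    return y, y_true, y_pred_proba
```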
go = pytest.importorskip('plotly.graph_objects', reason='Skipping plotting test because plotly not installed')
one_val_y_zero = np.array([0])
with pytest.warns(UndefinedMetricWarning):
    fig = graph_roc_curve(one_val_y_zero, one_val_y_zero)
Why does this case throw an `UndefinedMetricWarning`?
Because the training set only has one sample, the metric is meaningless, so the sklearn implementation raises an `UndefinedMetricWarning`.
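For reference, a minimal standalone reproduction of that warning using sklearn's own `roc_curve` (an illustration, not part of the PR):

```python
import numpy as np
import pytest
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import roc_curve

# With a single negative sample there are no positives at all, so sklearn warns
# that the true positive rate is meaningless.
with pytest.warns(UndefinedMetricWarning):
    roc_curve(np.array([0]), np.array([0.0]))
```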
Looks good!
@ctduffy I left a few more comments which we should round off before merging. In particular, please add unit test coverage for when the target or predicted probabilities are `nan`.
Closes #777, adds ROC multi-class plotting capability. Expands the current `roc_curve` and `graph_roc_curve` functionality to handle multiclass classification. Default behavior doesn't change; no modifications are needed for it to work as it originally did with binary classification.