
Adds ROC multi-class plotting capability #832

Merged: 26 commits, Jun 9, 2020

Conversation

@ctduffy (Contributor) commented Jun 3, 2020

Closes #777: adds ROC multi-class plotting capability. Expands the current roc_curve and graph_roc_curve functionality to handle multiclass classification. Default behavior doesn't change: the functions work as they originally did with binary classification, with no modifications needed.

@codecov codecov bot commented Jun 3, 2020

Codecov Report

Merging #832 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #832   +/-   ##
=======================================
  Coverage   99.68%   99.68%           
=======================================
  Files         193      193           
  Lines        7530     7598   +68     
=======================================
+ Hits         7506     7574   +68     
  Misses         24       24           
Impacted Files Coverage Δ
evalml/pipelines/graph_utils.py 100.00% <100.00%> (ø)
evalml/tests/pipeline_tests/test_graph_utils.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ed8545...d56242d. Read the comment docs.

@ctduffy ctduffy marked this pull request as ready for review Jun 3, 2020
@ctduffy ctduffy requested review from angela97lin and dsherry Jun 3, 2020
@dsherry (Collaborator) commented Jun 3, 2020

@ctduffy this is great! I'll take a look.

Could you post an example of what this looks like on a multiclass dataset? A screenshot would be enough.

thresholds = dict()
tpr_rates = dict()
auc_scores = dict()
if n_classes == 1:
@dsherry (Collaborator) commented Jun 3, 2020:

Is there a reason to have this be user-defined rather than something we detect from the target? It feels problematic to me to ask the user to provide this.

We could do something like if len(y_true.dropna().unique()) > 2 then multiclass. I may not have the right syntax for dropna here!
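That detection might look like the following runnable sketch (the helper name infer_problem_type is hypothetical; wrapping in a pd.Series supplies a real dropna):

```python
import numpy as np
import pandas as pd

def infer_problem_type(y_true):
    # Hypothetical helper: count distinct non-null target values and
    # treat anything beyond two classes as multiclass.
    n_classes = pd.Series(y_true).dropna().nunique()
    return "multiclass" if n_classes > 2 else "binary"

print(infer_problem_type([0, 1, 1, 0]))       # binary
print(infer_problem_type([0, 1, 2, np.nan]))  # multiclass
```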



def graph_roc_curve(y_true, y_pred_proba, title_addition=None):
def graph_roc_curve(y_true, y_pred_proba, labels=None, title_addition=None):
@dsherry (Collaborator) commented Jun 3, 2020:

Cool, I like the idea of providing a way for people to override what labels will appear for each class. Perhaps we could call this custom_class_labels ? The term "labels" means something different to me in the context of classification.

@ctduffy (Contributor, author) replied Jun 3, 2020:

ah true good point

roc_curve_data = roc_curve(y_true, y_pred_proba)

if len(y_true.shape) > 1:
n_classes = y_true.shape[1]
@dsherry (Collaborator) commented Jun 3, 2020:

I don't think this works. y_true is the target column. Did you mean y_pred_proba?

@dsherry (Collaborator) commented Jun 3, 2020:

If so, I think this is good.

A warning: our existing unit test coverage appears to assume y_pred_proba has only one column in the binary case. That's technically incorrect, since binary classifiers should output two columns for predicted probabilities, one for the "false" class and one for the "true" class. Including both in our pipelines' predict_proba output is redundant, because each row would sum to 1, but we do so because it helps keep consistency with multiclass. So, you'll have to update those tests to set up y_pred_proba with 2 columns for the binary case.

Relatedly, we have a decision to make. If someone passes in a single column here for y_pred_proba, we can either assume that's for the "true" class, and document that assumption, or we can throw an error. What do you think? I like the idea of assuming "true".
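The 1-column-versus-2-column convention under discussion could be normalized with something like this sketch (the helper name positive_class_proba is hypothetical):

```python
import numpy as np

def positive_class_proba(y_pred_proba):
    # Hypothetical helper: reduce binary predicted probabilities to a 1d
    # vector for the "true" class. With 2 columns, keep column 1; with a
    # single column (or a 1d input), assume it is already the "true" class.
    y_pred_proba = np.asarray(y_pred_proba)
    if y_pred_proba.ndim == 2 and y_pred_proba.shape[1] == 2:
        return y_pred_proba[:, 1]
    return y_pred_proba.ravel()

two_col = np.array([[0.9, 0.1], [0.2, 0.8]])
print(positive_class_proba(two_col))  # [0.1 0.8]
```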

@ctduffy (Contributor, author) replied Jun 3, 2020:

So this works right now because the algorithm is set up to take y_true values that are one-hot encoded, so y_true and y_pred_proba have the same dimensions. I can change this if need be (i.e. doing the one-hot encoding inside this function instead of having the user do it).

@ctduffy (Contributor, author) replied Jun 3, 2020:

I think we should assume it is for the True class, that makes the most sense to me. Then, when there is a multi-class classification, I can add the one-hot encoding to the function so that it does not have to be passed in that way by the user, and the number of unique objects can be used as the number of classes.

@dsherry (Collaborator) commented Jun 3, 2020:

Ah, got it. Yeah, I don't think we should expect or add any one-hot encoding. It's a useful piece of feature engineering for getting good performance from models, but outside of that context I don't think it's necessary.

Let's maintain the assumption that y_true is always a 1d vector of length n_rows. In the multiclass case, y_pred_proba will have a number of columns equal to the number of classes; in the binary case, y_pred_proba could have either 2 columns or just 1 (which we can assume is for the "true" class, i.e. the 2nd column). Does all that make sense?

@dsherry (Collaborator) commented Jun 3, 2020:

Ah... I know what you mean. In order to generate the ROC curve for each class, you need to treat that class as "true" and all others as "false". Doesn't look like that's in this PR yet, right?

@dsherry (Collaborator) commented Jun 3, 2020:

Ah ok I was totally missing this before! 😂 So in that case, I have a suggestion. How about you back out the changes you've made to roc_curve, so that it still only supports the binary case. Then in graph_roc_curve, you can add code to implement that sort of one-hot encoding. Does that make sense?
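That suggestion might look like the following sketch (scikit-learn's roc_curve stands in for evalml's binary roc_curve here, and the data is made up):

```python
import numpy as np
from sklearn.metrics import roc_curve as sk_roc_curve
from sklearn.preprocessing import LabelBinarizer

# Keep roc_curve binary; binarize the 1d target one-vs-rest inside the
# graphing code and call the binary routine once per class.
y_true = np.array([0, 1, 2, 1, 0, 2])
y_pred_proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],
    [0.1, 0.3, 0.6],
])
y_one_hot = LabelBinarizer().fit_transform(y_true)  # shape (n_rows, n_classes)
for i in range(y_one_hot.shape[1]):
    fpr, tpr, thresholds = sk_roc_curve(y_one_hot[:, i], y_pred_proba[:, i])
```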

@ctduffy (Contributor, author) replied Jun 3, 2020:

I will try it that way

@dsherry (Collaborator) commented Jun 3, 2020:

Thanks, sounds good

@ctduffy (Contributor, author) replied Jun 3, 2020:

I just committed a version that follows that structure. It seems less redundant and more elegant.

@dsherry (Collaborator) requested changes Jun 3, 2020:

Left some suggestions, great start!

@ctduffy (Contributor, author) commented Jun 4, 2020:

> @ctduffy this is great! I'll take a look.
>
> Could you post an example of what this looks like on a multiclass dataset? A screenshot would be enough.

@dsherry Here is a screenshot of a simple multiclass ROC curve using a random forest classifier on the Iris dataset from sklearn:

[Screenshot: multiclass ROC curve on the Iris dataset]

@ctduffy ctduffy requested a review from dsherry Jun 4, 2020
@angela97lin (Contributor) left a comment:

Nice work! Just had a suggestion on testing :)

"""Generate and display a Receiver Operating Characteristic (ROC) plot.

Arguments:
y_true (pd.Series or np.array): true binary labels.
y_pred_proba (pd.Series or np.array): predictions from a binary classifier, before thresholding has been applied. Note this should be the predicted probability for the "true" label.
custom_class_labels (list or None): if not None, custom lables for classes. Default None.
@angela97lin (Contributor) commented Jun 4, 2020:

typo: lables --> labels

y_true (pd.Series or np.array): true binary labels.
y_pred_proba (pd.Series or np.array): predictions from a binary classifier, before thresholding has been applied. Note this should be the predicted probability for the "true" label.
@angela97lin (Contributor) commented Jun 4, 2020:

Can we remove "binary" from the docstrings now? :D

@ctduffy ctduffy requested a review from angela97lin Jun 5, 2020
@ctduffy ctduffy dismissed dsherry’s stale review Jun 5, 2020

changes changed!

@dsherry (Collaborator) left a comment:

@ctduffy looking good! Left one more round of feedback, should be ready to merge after that

for i in range(n_classes):
roc_curve_data = roc_curve(y_true[:, i], y_pred_proba[:, i])
data.append(_go.Scatter(x=roc_curve_data['fpr_rates'], y=roc_curve_data['tpr_rates'],
name='ROC (AUC {:06f}) of Class {name}'
@dsherry (Collaborator) commented Jun 5, 2020:

Now that we're listing multiple classes, perhaps we could reorganize this. The title of the plot says Receiver Operating Characteristic, so perhaps we can just say 'Class {name} (AUC {:06f})' here, to keep the legend a bit shorter
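For reference, the shortened legend label might be built like this (the class name and AUC value here are made up):

```python
# Hypothetical values, just to show the shortened legend format:
name, auc_score = "versicolor", 0.987654
label = "Class {name} (AUC {score:.6f})".format(name=name, score=auc_score)
print(label)  # Class versicolor (AUC 0.987654)
```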

@@ -2,6 +2,7 @@

import numpy as np
import pandas as pd
from sklearn import preprocessing
@dsherry (Collaborator) commented Jun 5, 2020:

Style: can you please do from sklearn.preprocessing import LabelBinarizer?

Weird that lint didn't say to do this! I thought it would have. 🤷‍♂️


lb = preprocessing.LabelBinarizer()
lb.fit(np.unique(y_true))
y_true = lb.transform(y_true)
@dsherry (Collaborator) commented Jun 5, 2020:

Since you're changing the shape and meaning of the original input y_true here, I'd suggest using a new variable name for readability.

lb = preprocessing.LabelBinarizer()
lb.fit(np.unique(y_true))
y_true = lb.transform(y_true)
n_classes = y_true.shape[1]
@dsherry (Collaborator) commented Jun 5, 2020:

Could y_true ever be one-dimensional? If there's only one unique value in y_true, will LabelBinarizer.transform output a 1d or 2d vector? If the answer is 1d, trying to access index 1 would fail.

This would be a good unit test case to add regardless.

@ctduffy (Contributor, author) replied Jun 5, 2020:

It outputs a 2d array! But I am adding testing around that.
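A quick check of that edge case (behavior of scikit-learn's LabelBinarizer; this is an illustration, not the PR's test):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# With a single unique label, LabelBinarizer still returns a 2d array of
# shape (n_rows, 1), so indexing shape[1] stays safe.
y_single = np.array([0, 0, 0])
lb = LabelBinarizer()
lb.fit(np.unique(y_single))
out = lb.transform(y_single)
print(out.shape)  # (3, 1)
```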

roc_curve_data = roc_curve(y_true, y_pred_proba)

lb = preprocessing.LabelBinarizer()
lb.fit(np.unique(y_true))
@dsherry (Collaborator) commented Jun 5, 2020:

What should the behavior of this graph be if any of the target rows, or predicted probabilities, are np.nan?

I hope those rows would simply be ignored in roc_curve, but idk what the behavior of that is right now.

I suggest you add a test case for this and see what happens. Ideally, roc_curve handles nans, and all you need to do is update this line to np.unique(y_true).dropna() to avoid treating np.nan as a valid class. If not, you could drop nans from the input, with something like:

nan_indices = np.isnan(y_true) | np.isnan(y_pred_proba.sum(axis=1))
y_true = y_true[~nan_indices]
y_pred_proba = y_pred_proba[~nan_indices, :]

Otherwise, if it's too hard to update the implementation here to handle np.nans in the input, we can detect them and throw an error. Does that sound good?
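A runnable version of that nan-filtering sketch (using np.isnan(...).any(axis=1) in place of the sum trick; assumes y_true is 1d and y_pred_proba is 2d, with made-up data):

```python
import numpy as np

y_true = np.array([0.0, 1.0, np.nan, 1.0])
y_pred_proba = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
    [np.nan, 0.7],
])
# Drop any row where the target or any predicted probability is nan.
nan_indices = np.isnan(y_true) | np.isnan(y_pred_proba).any(axis=1)
y_true_clean = y_true[~nan_indices]
y_pred_proba_clean = y_pred_proba[~nan_indices, :]
print(y_true_clean)              # [0. 1.]
print(y_pred_proba_clean.shape)  # (2, 2)
```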

@dsherry (Collaborator) commented Jun 9, 2020:

@ctduffy do we have unit test coverage yet for when the target or predicted probabilities are nan?

@ctduffy (Contributor, author) replied Jun 9, 2020:

@dsherry yes, here:
[Screenshot: unit test exercising nan targets and predicted probabilities]
Though I realize this is mostly just a test to make sure that no error is thrown; I will go in and add a few assertions after the graphing calls.

@dsherry (Collaborator) commented Jun 9, 2020:

Ah, got it. Thanks! Yeah, agreed. In fact, you could split the nans part out into a separate unit test, since it's independent of the single-value case you're testing.

@ctduffy (Contributor, author) replied Jun 9, 2020:

I just changed the name, but kept it in the same test, as the second part of the single-value testing does involve nans.

go = pytest.importorskip('plotly.graph_objects', reason='Skipping plotting test because plotly not installed')
X, y_true = X_y_multi
rs = np.random.RandomState(42)
y_tr = label_binarize(y_true, classes=[0, 1, 2])
@dsherry (Collaborator) commented Jun 5, 2020:

Any particular reason we use label_binarize here but LabelBinarizer in the function impl?

@ctduffy (Contributor, author) replied Jun 5, 2020:

The LabelBinarizer doesn't require prior knowledge of the classes, whereas label_binarize does. In the testing environment we know the classes, but in graph_utils we don't know the labels at that point.
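The distinction can be seen directly; when the class list passed to label_binarize matches the classes in the data, both produce the same encoding:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, label_binarize

y = np.array([0, 2, 1, 2])
# label_binarize needs the classes up front, which is fine in a test...
a = label_binarize(y, classes=[0, 1, 2])
# ...while LabelBinarizer infers them from the data, as graph_utils must.
b = LabelBinarizer().fit_transform(y)
print(np.array_equal(a, b))  # True
```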

@@ -62,18 +63,19 @@ def graph_precision_recall_curve(y_true, y_pred_proba, title_addition=None):

def roc_curve(y_true, y_pred_proba):
"""
Given labels and binary classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
@dsherry (Collaborator) commented Jun 5, 2020:

Thank you!


Returns:
dict: Dictionary containing metrics used to generate an ROC plot, with the following keys:
dict: Dictionary containing metrics used to generate an ROC plot, with the following keys, each of which contains a dictionary for each classification class:
@dsherry (Collaborator) commented Jun 5, 2020:

👍 awesome

* `fpr_rates`: False positive rates.
* `tpr_rates`: True positive rates.
* `thresholds`: Threshold values used to produce each pair of true/false positive rates.
* `auc_score`: The area under the ROC curve.
* `auc_scores`: The area under the ROC curve. (One vs Rest)
@dsherry (Collaborator) commented Jun 5, 2020:

I don't think we should say One vs Rest here, because roc_curve doesn't know anything about multiclass, right?

y_true (pd.Series or np.array): true binary labels.
y_pred_proba (pd.Series or np.array): predictions from a binary classifier, before thresholding has been applied. Note this should be the predicted probability for the "true" label.
y_true (pd.Series or np.array): true labels.
y_pred_proba (pd.Series or np.array): predictions from a classifier, before thresholding has been applied. Note this should be the predicted probability for the "true" label in the binary case.
@dsherry (Collaborator) commented Jun 5, 2020:

I think the changes you're making would mean we can delete

Note this should be the predicted probability for the "true" label in the binary case.
Right? Or perhaps we can just add a note saying that we expect 2d input, and if it's 1d it should be for the "true" class. What do you think?

@dsherry (Collaborator) commented Jun 9, 2020:

@ctduffy if you're able to get these docs comments in that would be great. I think I left one more which is still open. OK to do a separate PR too.

@ctduffy (Contributor, author) replied Jun 9, 2020:

Yep, sorry I missed this, just pushed up the changes.

@dsherry (Collaborator) commented Jun 5, 2020

@ctduffy that screenshot looks great!! Thanks for sharing

@ctduffy ctduffy requested a review from dsherry Jun 5, 2020
rs = np.random.RandomState(42)
y_tr = label_binarize(X_y_multi[1], classes=[0, 1, 2])
y_pred_proba = y_tr * rs.random(y_tr.shape)
return X_y_multi[1], y_tr, y_pred_proba
@dsherry (Collaborator) commented Jun 9, 2020:

What is X_y_multi[1]? I suggest you add _, y = X_y_multi at the top of this method and then refer directly to y.

Also, under our naming convention, you could call the binarized output y_true.

@@ -13,6 +15,14 @@
)


@pytest.fixture
def Binarized_ys(X_y_multi):
@dsherry (Collaborator) commented Jun 9, 2020:

Please make this name lowercase, for our code style standards.

go = pytest.importorskip('plotly.graph_objects', reason='Skipping plotting test because plotly not installed')
one_val_y_zero = np.array([0])
with pytest.warns(UndefinedMetricWarning):
fig = graph_roc_curve(one_val_y_zero, one_val_y_zero)
@dsherry (Collaborator) commented Jun 9, 2020:

Why does this case throw an UndefinedMetricWarning?

@ctduffy (Contributor, author) replied Jun 9, 2020:

Because the input has only one sample, the ROC metrics are meaningless, and the sklearn implementation raises this warning.
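That behavior can be reproduced directly with scikit-learn's own roc_curve (it is a warning, which is why the test above uses pytest.warns):

```python
import warnings
import numpy as np
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import roc_curve

# With a single negative sample there are no positive examples, so the
# true positive rate is undefined and sklearn emits UndefinedMetricWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    roc_curve(np.array([0]), np.array([0.0]))
print(any(issubclass(w.category, UndefinedMetricWarning) for w in caught))
```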

@dsherry (Collaborator) approved these changes Jun 9, 2020:

Looks good!

@ctduffy I left a few more comments which we should wrap up before merging. In particular, please add unit test coverage for when the target or predicted probabilities are nan.

@ctduffy ctduffy merged commit c0d7f41 into master Jun 9, 2020
2 checks passed
@ctduffy ctduffy deleted the 777-roc branch Jun 9, 2020
@angela97lin angela97lin mentioned this pull request Jun 30, 2020

Successfully merging this pull request may close these issues.

Add one-vs-rest multiclass ROC plot