Added labels for the row index of confusion matrix #1154
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1154 +/- ##
==========================================
+ Coverage 99.72% 99.91% +0.19%
==========================================
Files 195 195
Lines 11554 11596 +42
==========================================
+ Hits 11522 11586 +64
+ Misses 32 10 -22
Continue to review full report at Codecov.
from evalml.model_understanding.graphs import confusion_matrix
# binary booleans
y_true = [True, False, True, True, False, False]
y_pred = [False, False, True, True, False, False]
conf_mat = confusion_matrix(y_true=y_true, y_predicted=y_pred)
conf_mat
# binary integers
y_true = [0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 1, 1]
conf_mat = confusion_matrix(y_true=y_true, y_predicted=y_pred)
conf_mat
# binary strings
y_true = ['blue', 'red', 'blue', 'red']
y_pred = ['blue', 'red', 'red', 'red']
conf_mat = confusion_matrix(y_true=y_true, y_predicted=y_pred)
conf_mat
# multiclass strings
y_true = ['blue', 'red', 'red', 'red', 'orange', 'orange']
y_pred = ['red', 'blue', 'blue', 'red', 'orange', 'orange']
conf_mat = confusion_matrix(y_true=y_true, y_predicted=y_pred)
conf_mat
# multiclass integers
y_true = [0, 1, 2, 1, 2, 1, 2, 3]
y_pred = [0, 1, 1, 1, 1, 1, 3, 3]
conf_mat = confusion_matrix(y_true=y_true, y_predicted=y_pred)
conf_mat
@@ -145,7 +145,18 @@
 "source": [
 "### Confusion Matrix\n",
 "\n",
-"For binary or multiclass classification, we can view a [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) of the classifier's predictions"
+"For binary or multiclass classification, we can view a [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) of the classifier's predictions. In the DataFrame output of `confusion_matrix()`, the column header represents the predicted labels while the row header represents the actual labels."
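The row/column orientation described above can be illustrated with a small stand-alone sketch. Note this is not evalml's implementation; `labeled_confusion_matrix` is a hypothetical helper written here only to demonstrate the convention that rows hold actual labels and columns hold predicted labels:

```python
import pandas as pd

def labeled_confusion_matrix(y_true, y_pred):
    """Build a confusion matrix DataFrame with rows = actual labels,
    columns = predicted labels (illustrative stand-in, not evalml's code)."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = pd.DataFrame(0, index=labels, columns=labels)
    for actual, predicted in zip(y_true, y_pred):
        # Row is selected by the actual label, column by the predicted label.
        counts.loc[actual, predicted] += 1
    return counts

cm = labeled_confusion_matrix(['blue', 'red', 'blue', 'red'],
                              ['blue', 'red', 'red', 'red'])
# cm.loc[actual_label, predicted_label] gives the count for that cell.
```

With this orientation, `cm.loc['blue', 'red']` counts the samples whose actual label was `'blue'` but were predicted `'red'`.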
Nice. The issue mentions possibly updating the row and column labels but I think it's ok to not do that since we updated the documentation and docstring.
@christopherbunn This looks good to me! I had just one suggestion on the tests.
conf_mat = confusion_matrix(y_true, y_predicted, normalize_method=None)
conf_mat_expected = np.array([[2, 0, 0], [0, 0, 1], [1, 0, 2]])
assert np.array_equal(conf_mat_expected, conf_mat)
assert isinstance(conf_mat, pd.DataFrame)
if data_type == 'pd':
    labels = [0, 1, 2]
I think we should do this check even if `data_type` is `np`. From what I understand, the `data_type` parameter in this test determines the type of the input and not the output.
I think there is also value in turning the code @gsheni posted in this PR into a unit test that checks the labels and column names are set correctly for different kinds of inputs (binary string, multiclass string, bool, etc).
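A hedged sketch of what such a parametrized label check might look like. `confusion_matrix_stub` is a stand-in defined here so the sketch is self-contained; the real test would call evalml's `confusion_matrix` instead. The cases mirror the binary-string and multiclass-integer inputs posted earlier in this thread, and each case is exercised with list, numpy, and pandas inputs per the review suggestion:

```python
import numpy as np
import pandas as pd

def confusion_matrix_stub(y_true, y_pred):
    """Stand-in for evalml's confusion_matrix, used only to sketch the
    shape of the test: rows = actual labels, columns = predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    cm = pd.DataFrame(0, index=labels, columns=labels)
    for actual, predicted in zip(y_true, y_pred):
        cm.loc[actual, predicted] += 1
    return cm

# (y_true, y_pred, expected row/column labels) for different input kinds.
cases = [
    (['blue', 'red', 'blue', 'red'], ['blue', 'red', 'red', 'red'], ['blue', 'red']),
    ([0, 1, 2, 1, 2, 1, 2, 3], [0, 1, 1, 1, 1, 1, 3, 3], [0, 1, 2, 3]),
]
for y_true, y_pred, expected in cases:
    # Check list, numpy, and pandas inputs, since the labels should be
    # set correctly regardless of the input container type.
    for wrap in (list, np.array, pd.Series):
        cm = confusion_matrix_stub(wrap(y_true), wrap(y_pred))
        assert list(cm.index) == expected
        assert list(cm.columns) == expected
```

In a real evalml unit test this loop body would likely become a `pytest.mark.parametrize` over the cases and input types.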
I ended up separating out the conf matrix labels into their own unit test and I incorporated @gsheni's code into it. I also set the new unit tests to check both pandas and numpy inputs. Let me know what you think 😄 .
@@ -384,9 +395,9 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.8"
+"version": "3.8.5"
We try to not change the notebook version.
df6fe2a to 5fd9fcc
docs/source/release_notes.rst
Outdated
@@ -7,6 +7,8 @@ Release Notes
 * Modified `get_objective` and `get_objectives` to be able to return any objective in `evalml.objectives` :pr:`1132`
 * Added a `return_instance` boolean parameter to `get_objective` :pr:`1132`
 * Added label encoder to lightGBM for binary classification :pr:`1152`
+* Added labels for the row index of confusion matrix :pr: `1154`
Duplicate line?
@christopherbunn This looks great! Thanks for modifying the tests. I think this is good to merge once we fix the duplicate line in the release notes.
Added labels to the row index of the pandas confusion matrix. Updated the documentation and API reference accordingly.
Changes proposed
Fixes #1059