
Added labels for the row index of confusion matrix #1154

Merged: 7 commits from 1059_conf_matrix_labels into main, Sep 11, 2020

Conversation

@christopherbunn (Contributor) commented Sep 9, 2020

Added labels for the row index of the pandas confusion matrix. Updated the documentation and API accordingly.

Changes proposed

  • The row index and column index of the DF returned from confusion_matrix should be identical, and should both use the target values, not integers or booleans which map to the target values. I thought our pipeline class was already taking care of this for us, since it handles this mapping internally.
  • Verify that graph_confusion_matrix still works properly after any changes are made.
  • Update API docs, and the confusion matrix section of the model understanding user guide.

Fixes #1059
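The intended labeling described above can be sketched outside evalml with scikit-learn and pandas. This is a minimal illustration of the desired output shape, not evalml's actual implementation: both axes carry the original target values, with rows for actual labels and columns for predicted labels.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix as sk_confusion_matrix

# Binary string targets: the returned DataFrame should use the target
# values themselves on both axes, not integer codes.
y_true = ['blue', 'red', 'blue', 'red']
y_pred = ['blue', 'red', 'red', 'red']

labels = sorted(set(y_true) | set(y_pred))
conf_mat = pd.DataFrame(
    sk_confusion_matrix(y_true, y_pred, labels=labels),
    index=labels,    # row index: actual target values
    columns=labels,  # column index: predicted target values
)
print(conf_mat)
```

With identical row and column labels, `conf_mat.loc[actual, predicted]` reads off a count directly.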

@christopherbunn christopherbunn changed the title Added labels for the row and column index of confusion matrix Added labels for the row index of confusion matrix Sep 9, 2020
@codecov codecov bot commented Sep 9, 2020

Codecov Report

Merging #1154 into main will increase coverage by 0.19%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main    #1154      +/-   ##
==========================================
+ Coverage   99.72%   99.91%   +0.19%     
==========================================
  Files         195      195              
  Lines       11554    11596      +42     
==========================================
+ Hits        11522    11586      +64     
+ Misses         32       10      -22     
Impacted Files Coverage Δ
evalml/model_understanding/graphs.py 100.00% <100.00%> (ø)
...lml/tests/model_understanding_tests/test_graphs.py 100.00% <100.00%> (+0.17%) ⬆️
evalml/automl/automl_search.py 99.58% <0.00%> (+0.41%) ⬆️
.../automl_tests/test_automl_search_classification.py 100.00% <0.00%> (+0.45%) ⬆️
evalml/tests/component_tests/test_components.py 100.00% <0.00%> (+0.76%) ⬆️
evalml/tests/pipeline_tests/test_pipelines.py 100.00% <0.00%> (+0.88%) ⬆️
...ests/automl_tests/test_automl_search_regression.py 100.00% <0.00%> (+1.06%) ⬆️
evalml/utils/gen_utils.py 98.94% <0.00%> (+2.10%) ⬆️
evalml/tests/component_tests/test_utils.py 100.00% <0.00%> (+3.57%) ⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c5239a8...ddaa601

@christopherbunn christopherbunn marked this pull request as ready for review Sep 10, 2020
@gsheni (Member) commented Sep 10, 2020

  • This function looks good to me given that now it is able to handle the different scenarios of binary/multiclass with strings, integers, and booleans (binary only).
  • It also addresses the weird 0/1 vs True/False confusion matrix I saw in the original issue.
from evalml.model_understanding.graphs import confusion_matrix

# binary booleans
y_true = [True, False, True, True, False, False]
y_pred = [False, False, True, True, False, False]
conf_mat = confusion_matrix(y_true=y_true, y_predicted=y_pred)
conf_mat

# binary integers
y_true = [0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 1, 1]
conf_mat = confusion_matrix(y_true=y_true, y_predicted=y_pred)
conf_mat

# binary strings
y_true = ['blue', 'red', 'blue', 'red']
y_pred = ['blue', 'red', 'red', 'red']
conf_mat = confusion_matrix(y_true=y_true, y_predicted=y_pred)
conf_mat

# multiclass strings
y_true = ['blue', 'red', 'red', 'red', 'orange', 'orange']
y_pred = ['red', 'blue', 'blue', 'red', 'orange', 'orange']
conf_mat = confusion_matrix(y_true=y_true, y_predicted=y_pred)
conf_mat

# multiclass integers
y_true = [0, 1, 2, 1, 2, 1, 2, 3]
y_pred = [0, 1, 1, 1, 1, 1, 3, 3]
conf_mat = confusion_matrix(y_true=y_true, y_predicted=y_pred)
conf_mat
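The boolean case in the snippet above is the one that previously produced the 0/1 vs True/False mismatch. The expected post-fix behavior can be checked directly on the returned axes; the sketch below uses scikit-learn and pandas as a stand-in for evalml's `confusion_matrix`, purely to illustrate the labeling:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix as sk_confusion_matrix

# Binary boolean targets from the snippet above: after the fix, both
# axes should read [False, True], never 0/1 on one axis only.
y_true = [True, False, True, True, False, False]
y_pred = [False, False, True, True, False, False]

labels = sorted(set(y_true))  # [False, True]
conf_mat = pd.DataFrame(
    sk_confusion_matrix(y_true, y_pred, labels=labels),
    index=labels, columns=labels,
)
assert list(conf_mat.index) == list(conf_mat.columns) == [False, True]
print(conf_mat)
```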

@@ -145,7 +145,18 @@
"source": [
"### Confusion Matrix\n",
"\n",
- "For binary or multiclass classification, we can view a [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) of the classifier's predictions"
+ "For binary or multiclass classification, we can view a [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) of the classifier's predictions. In the DataFrame output of `confusion_matrix()`, the column header represents the predicted labels while the row header represents the actual labels."
@freddyaboulton (Contributor) commented Sep 10, 2020

Nice. The issue mentions possibly updating the row and column labels but I think it's ok to not do that since we updated the documentation and docstring.

@freddyaboulton (Contributor) left a comment

@christopherbunn This looks good to me! I had just one suggestion on the tests.

conf_mat = confusion_matrix(y_true, y_predicted, normalize_method=None)
conf_mat_expected = np.array([[2, 0, 0], [0, 0, 1], [1, 0, 2]])
assert np.array_equal(conf_mat_expected, conf_mat)
assert isinstance(conf_mat, pd.DataFrame)
if data_type == 'pd':
    labels = [0, 1, 2]
@freddyaboulton (Contributor) commented Sep 10, 2020

I think we should do this check even if data_type is np. From what I understand, the data_type parameter in this test determines the type of the input and not the output.

I think there is also value in turning the code @gsheni posted in this PR into a unit test that checks the labels and column names are set correctly for different kinds of inputs (binary string, multiclass string, bool, etc).
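The suggestion above could look roughly like the following, looping over input types and checking the axis labels. This is a sketch, not the test as merged: the `labeled_confusion_matrix` helper is a hypothetical sklearn-backed stand-in for evalml's `confusion_matrix`, and the case list mirrors the binary/multiclass string, integer, and boolean inputs from the earlier snippet.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix as sk_confusion_matrix

def labeled_confusion_matrix(y_true, y_pred):
    """Stand-in for evalml's confusion_matrix: a DataFrame with the
    target values on both axes (rows = actual, columns = predicted)."""
    labels = sorted(set(y_true) | set(y_pred))
    mat = sk_confusion_matrix(y_true, y_pred, labels=labels)
    return pd.DataFrame(mat, index=labels, columns=labels)

# Each case: (y_true, y_pred, expected axis labels). Inputs are wrapped
# as both numpy arrays and pandas Series, mirroring the data_type
# parameter in the existing tests.
cases = [
    ([True, False, True], [False, False, True], [False, True]),
    ([0, 1, 2, 1], [0, 1, 1, 1], [0, 1, 2]),
    (['blue', 'red', 'red'], ['red', 'red', 'red'], ['blue', 'red']),
]
for y_true, y_pred, expected_labels in cases:
    for wrap in (np.array, pd.Series):
        conf_mat = labeled_confusion_matrix(wrap(y_true), wrap(y_pred))
        assert list(conf_mat.index) == expected_labels
        assert list(conf_mat.columns) == expected_labels
print("all label checks passed")
```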

@christopherbunn (Contributor, Author) commented Sep 11, 2020

I ended up separating out the conf matrix labels into their own unit test and I incorporated @gsheni's code into it. I also set the new unit tests to check both pandas and numpy inputs. Let me know what you think 😄 .

@@ -384,9 +395,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.8"
+ "version": "3.8.5"
@freddyaboulton (Contributor) commented Sep 10, 2020

We try not to change the notebook version.

@@ -7,6 +7,8 @@ Release Notes
* Modified `get_objective` and `get_objectives` to be able to return any objective in `evalml.objectives` :pr:`1132`
* Added a `return_instance` boolean parameter to `get_objective` :pr:`1132`
* Added label encoder to lightGBM for binary classification :pr:`1152`
* Added labels for the row index of confusion matrix :pr:`1154`
@gsheni (Member) commented Sep 11, 2020

Duplicate line?

@freddyaboulton (Contributor) left a comment

@christopherbunn This looks great! Thanks for modifying the tests. I think this is good to merge once we fix the duplicate line in the release notes.

@christopherbunn christopherbunn merged commit 706b9f0 into main Sep 11, 2020
@christopherbunn christopherbunn deleted the 1059_conf_matrix_labels branch Sep 11, 2020
Successfully merging this pull request may close these issues.

Update confusion_matrix labels
3 participants