
Explain Predictions #1016

Merged
merged 21 commits into main from 986-explain-best-worst-random-k on Aug 10, 2020

Conversation

@freddyaboulton (Contributor) commented Aug 3, 2020

Pull Request Description

Closes #986 #955

Implements the design for explain_predictions and explain_predictions_best_worst described in this design document.
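
For orientation, a rough sketch of how the two new functions might be called from user code; the argument names (input_features, y_true, num_to_explain) are assumptions based on this description and the docs changes, not a guaranteed API.

from evalml.pipelines.prediction_explanations import (
    explain_predictions,
    explain_predictions_best_worst,
)

# Assumes `pipeline` is an already-fitted evalml pipeline and X, y are the
# pandas feature matrix and target it was trained on.

# Explain a few specific rows.
print(explain_predictions(pipeline, X.iloc[:3]))

# Rank all rows by an error metric and explain the best/worst predictions.
print(explain_predictions_best_worst(pipeline, X, y_true=y, num_to_explain=2))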

Demo of what the user sees

Regression - Boston Housing Dataset: [screenshot of the explanation report]

Binary - Breast Cancer Dataset: [screenshot of the explanation report]

Multiclass - Iris Dataset: [screenshot of the explanation report]

For more examples and the complete output per pipeline, see this PR.

Changes to docs

See the updates here.


After creating the pull request: in order to pass the release_notes_updated check, you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.

codecov bot commented Aug 3, 2020

Codecov Report

Merging #1016 into main will increase coverage by 0.18%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1016      +/-   ##
==========================================
+ Coverage   99.72%   99.90%   +0.18%     
==========================================
  Files         181      181              
  Lines        9748     9998     +250     
==========================================
+ Hits         9721     9989     +268     
+ Misses         27        9      -18     
Impacted Files Coverage Δ
evalml/pipelines/classification_pipeline.py 100.00% <100.00%> (ø)
...alml/pipelines/prediction_explanations/__init__.py 100.00% <100.00%> (ø)
...pelines/prediction_explanations/_user_interface.py 100.00% <100.00%> (ø)
...ml/pipelines/prediction_explanations/explainers.py 100.00% <100.00%> (ø)
...assification_pipeline_tests/test_classification.py 100.00% <100.00%> (ø)
...peline_tests/explanations_tests/test_explainers.py 100.00% <100.00%> (ø)
...ne_tests/explanations_tests/test_user_interface.py 100.00% <100.00%> (ø)
evalml/tests/component_tests/test_components.py 99.59% <0.00%> (+0.40%) ⬆️
evalml/automl/automl_search.py 99.55% <0.00%> (+0.44%) ⬆️
.../automl_tests/test_automl_search_classification.py 100.00% <0.00%> (+0.45%) ⬆️
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bbc315f...d9b0efb.

@freddyaboulton freddyaboulton force-pushed the 986-explain-best-worst-random-k branch 2 times, most recently from ac9355a to 2410871 Compare August 3, 2020 18:58
@freddyaboulton freddyaboulton marked this pull request as ready for review August 3, 2020 21:32
@@ -132,9 +132,9 @@
"outputs": [],
"source": [
"# get the predicted probabilities associated with the \"true\" label\n",
"y = y.map({'malignant': 0, 'benign': 1})\n",
"y_encoded = y.map({'malignant': 0, 'benign': 1})\n",
freddyaboulton (Contributor, author):
Doing this so that the calls to explain_predictions_best_worst don't break (the pipeline was fit on string labels, but we would then be passing in int labels).

Reviewer (Contributor):
Got it. How about:

  • Move this y_encoded = y.map({'malignant': 0, 'benign': 1}) down into the chunk about prediction explanations
  • Don't change what's passed to graph_precision_recall_curve and graph_roc_curve. Not necessary, right?

freddyaboulton (Contributor, author):

graph_precision_recall_curve and graph_roc_curve need ints, and the prediction explanations need strings (since that is what the pipeline was originally fit on). I think we have to change what gets passed to precision_recall_curve and roc_curve unless we use only int labels in the entire notebook (the original labels are strings)?

@freddyaboulton freddyaboulton added this to the August 2020 milestone Aug 3, 2020

# Classification
if isinstance(shap_values, list):
freddyaboulton (Contributor, author):

This logic got split into _SHAPMultiClassTableMaker, _SHAPBinaryTableMaker, and _SHAPRegressionTableMaker. I think this is more maintainable and easier to understand.

return table_maker(shap_values, normalized_shap_values, top_k, include_shap_values)


def _abs_error(y_true, y_pred):
freddyaboulton (Contributor, author):

_abs_error and _cross_entropy don't have to be defined in this file, but I didn't want to create a new module since they would only be used here.

Reviewer (Contributor):

Hmmm, makes me wonder if they're useful in standard_metrics 🤔

freddyaboulton (Contributor, author):

Yeah, that's what I was thinking too, but the difference between these functions and the ones in standard_metrics is that these return a float for each data point, while the standard_metrics objectives return a single float for the entire CV fold.
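
A minimal sketch of the distinction (illustrative definitions only, not necessarily the exact ones in the PR): the ranking metrics return one score per data point, while a standard_metrics objective collapses everything into a single score for the fold.

import numpy as np

def _abs_error(y_true, y_pred):
    # One score per data point, used to rank individual predictions.
    return np.abs(y_true - y_pred)

def _cross_entropy(y_true_one_hot, y_pred_proba, eps=1e-15):
    # Per-row negative log-likelihood of the true class.
    proba = np.clip(y_pred_proba, eps, 1 - eps)
    return -np.sum(y_true_one_hot * np.log(proba), axis=1)

# By contrast, an objective in standard_metrics would reduce to one float for
# the whole CV fold, e.g. np.mean(np.abs(y_true - y_pred)) for MAE.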



@pytest.mark.parametrize("problem_type", ["binary", "multi"])
def test_pipeline_has_classes_property(logistic_regression_binary_pipeline_class,
freddyaboulton (Contributor, author):

Had to add this test because I added a _classes property to the ClassificationPipeline class.

Reviewer (Contributor):

Got it.

This makes me think: a) we should just bite the bullet and make that property public, and b) can you also do a similar test with an int-type target instead of a str-type one, since those could be treated differently by the encoder if we alter that code in the future?

freddyaboulton (Contributor, author):

I added the test for an int-type target!
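
A hedged sketch of what the int-target variant might look like; X_small is a hypothetical fixture (a small feature matrix with matching rows), and the exact assertion in the PR may differ.

import pandas as pd
import pytest

@pytest.mark.parametrize("target", [
    pd.Series(["malignant", "benign", "benign", "malignant"]),  # str labels
    pd.Series([0, 1, 1, 0]),                                    # int labels
])
def test_classes_property_roundtrips_labels(target, X_small,
                                            logistic_regression_binary_pipeline_class):
    # X_small is a hypothetical fixture; the pipeline fixture appears in the diff above.
    pipeline = logistic_regression_binary_pipeline_class(parameters={})
    pipeline.fit(X_small, target)
    # _classes should report the original label values, whatever their dtype.
    assert set(pipeline._classes) == set(target.unique())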

(ProblemTypes.MULTICLASS, multiclass_no_best_worst_answer)])
@patch("evalml.pipelines.prediction_explanations.explainers._DEFAULT_METRICS")
@patch("evalml.pipelines.prediction_explanations.explainers.explain_prediction")
def test_explain_predictions_custom_index(explain_prediction_mock, mock_default_metrics,
@freddyaboulton (Contributor, author) commented Aug 3, 2020:

I wanted to add a test to make sure the way we index the features dataframe doesn't rely on the dataframe's index. I was afraid custom indexes would break future refactorings of the code, but maybe I'm being paranoid lol.

Reviewer (Contributor):

Amazing. Great thinking. I respect your paranoia 🤣
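
For illustration, a tiny standalone example of the concern (not the PR's actual test): with a non-default index, positional and label-based selection diverge, so the explainer code should select rows positionally.

import pandas as pd

features = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[10, 20, 30])

# Positional selection keeps working regardless of the custom index...
assert features.iloc[0]["a"] == 1

# ...while label-based selection with positional habits would fail:
# features.loc[0]  # KeyError: 0 is not a label in the index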



class _ReportSectionMaker:
"""Make a prediction explanation report.
freddyaboulton (Contributor, author):

I think this docstring helps explain how the following classes fit together:

  • _HeadingMaker
  • _SHAPTableMaker
  • _EmptyPredictedValuesMaker
  • _ClassificationPredictedValuesMaker
  • _RegressionPredictedValuesMaker

I originally wrote a working implementation without this structure, but there were so many slight differences in the expected output of explain_predictions and explain_predictions_best_worst, depending on whether the problem is regression or classification, that I decided breaking those differences out into their own classes was the best way to go.
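
Roughly, the composition described here might look like the following; method and constructor names are assumptions, not the exact implementation.

class _HeadingMaker:
    # One section of the report: a heading for each explained prediction.
    def make_rows(self, index, pipeline, input_features):
        return [f"Prediction for row {index}", ""]

class _ReportSectionMaker:
    # Assembles the report by delegating each section to its maker; the other
    # makers (predicted values, SHAP table) would follow the same make_rows interface.
    def __init__(self, section_makers):
        self.section_makers = section_makers

    def make_report(self, indices, pipeline, input_features):
        rows = []
        for index in indices:
            for maker in self.section_makers:
                rows.extend(maker.make_rows(index, pipeline, input_features))
        return "\n".join(rows)

# Usage sketch: a classification report would swap in the classification
# predicted-values maker; regression would use the regression one.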

@angela97lin (Contributor) left a comment:

Left some tiny comments for now, but not quite done looking through yet :'D Looking good though!

"\n",
"For regression problems, the default metric is the absolute difference between the prediction and target value. We can specify our own ranking function by passing in a function to the `metric` parameter. This function will be called on `y_true` and `y_pred`. By convention, lower scores are better.\n",
"\n",
"At the top of each table, you can see the predicted probabilities, target value, and error on that prediction.\n"
Reviewer (Contributor):

Minor nitpick: At the top of each table, you can see --> At the top of each table, we can see

Just to be consistent with previous paragraphs' use of "us" and "our"?

freddyaboulton (Contributor, author):

Sounds good!

"\n",
"For regression problems, the default metric is the absolute difference between the prediction and target value. We can specify our own ranking function by passing in a function to the `metric` parameter. This function will be called on `y_true` and `y_pred`. By convention, lower scores are better.\n",
"\n",
"At the top of each table, you can see the predicted probabilities, target value, and error on that prediction.\n"
Reviewer (Contributor):

I'm a tiny bit confused since we're jumping back and forth between classification and regression. We talk about regression problems, but then mention that in the table we can see predicted probabilities, which we can only see for classification problems, right?

freddyaboulton (Contributor, author):

The behavior is slightly different between regression and classification, so I'm trying to give as much detail as possible. But you're right that since this example is classification, it can be confusing to talk about regression.

Reviewer (Contributor):

Yup, that makes sense! Maybe it would just help to restructure (e.g., move the regression paragraph after the example)?
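
For reference, a hedged sketch of the custom ranking function described in the quoted doc cell; the metric parameter name comes from that text, while the surrounding call is an assumption.

import numpy as np

# Lower is better, by convention; called on y_true and y_pred.
def squared_error(y_true, y_pred):
    return (np.asarray(y_true) - np.asarray(y_pred)) ** 2

# Hypothetical call, assuming a fitted `pipeline` and pandas X, y:
# report = explain_predictions_best_worst(pipeline, X, y_true=y,
#                                         num_to_explain=2, metric=squared_error)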

Comment on lines 50 to 55
dtypes = ["t", "t"]
alignment = ["c", "c"]

if include_shap_values:
dtypes.append("f")
alignment.append("c")
Reviewer (Contributor):

(Nitpick/style)
Could do something like (syntax not guaranteed :p)

dtypes = ["t", "t", "f"] if include_shap_values else ["t", "t"]
alignment = ["t", "t", "f"] if include_shap_values else ["c", "c"]

freddyaboulton (Contributor, author):

👍

@freddyaboulton freddyaboulton force-pushed the 986-explain-best-worst-random-k branch from 634e1bc to 6ea9d6a Compare August 5, 2020 20:43
@dsherry (Contributor) left a comment:

@freddyaboulton this is really great. The guide reads really easily, and the APIs and the tests look solid.

I left a few suggestions on the guide/docs.

I left some comments on organization in explainers.py. I think it's fine to keep most of the impl classes and methods private-named, but perhaps then explainers.py should contain only public-named stuff.

I also wonder if there's a way to have explain_predictions_best_worst call explain_predictions; I left a couple of comments about that.

I left a discussion about the classification pipeline _classes accessor and adding more unit testing. I think it's fine to keep it private-named for now.

Nothing blocking merge IMO. We should resolve the code organization / naming discussions though.


@freddyaboulton freddyaboulton force-pushed the 986-explain-best-worst-random-k branch from 1f05fa2 to 1e104b6 Compare August 7, 2020 21:41
@freddyaboulton (Contributor, author) commented Aug 7, 2020

@dsherry I think I addressed all of your feedback!

Regarding code organization:

  • I created a _TableMaker base class. _SHAPRegressionTableMaker, _SHAPBinaryTableMaker, and _SHAPMultiClassTableMaker now inherit from _TableMaker.
  • I moved the implementation code that was in explainers to _user_interface.py. The only slight hiccup is that, before, _SHAPTableMaker called explain_prediction. To avoid circular imports, I created a new function called _make_single_prediction_shap_table. Both explain_prediction and _SHAPTableMaker now call this function (see the sketch after this list).
  • I created a _SectionMaker base class. _HeadingMaker, _EmptyPredictedValuesMaker, _ClassificationPredictedValuesMaker, _RegressionPredictedValuesMaker, and _SHAPTableMaker now inherit from _SectionMaker.
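
A loose sketch of that organization, with assumed signatures rather than the exact ones in the PR:

# _user_interface.py (sketch)

def _make_single_prediction_shap_table(pipeline, features, top_k=3,
                                       include_shap_values=False):
    # Shared helper so both explain_prediction and _SHAPTableMaker can build a
    # table without importing each other (this is what avoids the circular import).
    ...

class _TableMaker:
    # Base class: format SHAP values for one problem type into a table.
    def make_table(self, shap_values, normalized_shap_values, top_k, include_shap_values):
        raise NotImplementedError

class _SHAPRegressionTableMaker(_TableMaker): ...
class _SHAPBinaryTableMaker(_TableMaker): ...
class _SHAPMultiClassTableMaker(_TableMaker): ...


# explainers.py (sketch) -- public API only; it imports the helper from
# _user_interface rather than the other way around.
def explain_prediction(pipeline, input_features, top_k=3, include_shap_values=False):
    return _make_single_prediction_shap_table(pipeline, input_features,
                                              top_k=top_k,
                                              include_shap_values=include_shap_values)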

@dsherry (Contributor) left a comment:

LGTM!

@freddyaboulton freddyaboulton merged commit 49556bb into main Aug 10, 2020
@freddyaboulton freddyaboulton deleted the 986-explain-best-worst-random-k branch August 10, 2020 19:17
@dsherry dsherry mentioned this pull request Aug 25, 2020
Labels: none yet
Projects: none yet
Development

Successfully merging this pull request may close these issues.

Explain Best/Worst K Predictions
4 participants