Explain Predictions (#1016)
* Working implementation of explain_predictions and explain_predictions_best_worst.

* Refactoring explainers so that differences between report types are more modular.

* Updating release notes for PR 1016.

* Adding tests for error metrics.

* Moving release note for PR 1016 to upcoming release.

* Adding test for classification pipeline _classes property.

* Adding section to model understanding user guide about explain_predictions and explain_predictions_best_worst.

* Adding test for custom metric. Replacing names of default error metrics with more user-friendly output.

* Adding explain_predictions and explain_predictions_best_worst to api reference and fixing docs.

* Adding Predicted Value as a field in the classification reports.

* Making metrics public.

* Reducing the text output in the prediction explanations section of the model understanding user guide.

* If a user tries to access pipeline._classes before fitting the pipeline, a helpful ValueError will be raised.

* Fixing typo in docs.

* Moving the ReportSectionMakers to _user_interface and adding some base classes to make the structure more clear.

* Adding a test for pipeline._classes for problems where the labels are ints.

* Modifying changelog for 1016.

* Adding docstring for _make_single_prediction_shap_table.

* Making _TableMaker private.

* Fixing lint.

* Not mocking shap values in test_explain_prediction_value_error.
freddyaboulton committed Aug 10, 2020
1 parent bbc315f commit 49556bb
Showing 10 changed files with 877 additions and 96 deletions.
2 changes: 2 additions & 0 deletions docs/source/api_reference.rst
@@ -237,6 +237,8 @@ Prediction Explanations
:nosignatures:

explain_prediction
explain_predictions
explain_predictions_best_worst


.. currentmodule:: evalml.objectives
1 change: 1 addition & 0 deletions docs/source/release_notes.rst
@@ -4,6 +4,7 @@ Release Notes
**Future Releases**
* Enhancements
* Split `fill_value` into `categorical_fill_value` and `numeric_fill_value` for Imputer :pr:`1019`
* Added `explain_predictions` and `explain_predictions_best_worst` for explaining multiple predictions with SHAP :pr:`1016`
* Fixes
* Changes
* Documentation Changes
79 changes: 75 additions & 4 deletions docs/source/user_guide/model_understanding.ipynb
@@ -132,9 +132,9 @@
"outputs": [],
"source": [
"# get the predicted probabilities associated with the \"true\" label\n",
"y = y.map({'malignant': 0, 'benign': 1})\n",
"y_encoded = y.map({'malignant': 0, 'benign': 1})\n",
"y_pred_proba = pipeline.predict_proba(X)[\"benign\"]\n",
"evalml.pipelines.graph_utils.graph_precision_recall_curve(y, y_pred_proba)"
"evalml.pipelines.graph_utils.graph_precision_recall_curve(y_encoded, y_pred_proba)"
]
},
{
@@ -154,7 +154,7 @@
"source": [
"# get the predicted probabilities associated with the \"benign\" label\n",
"y_pred_proba = pipeline.predict_proba(X)[\"benign\"]\n",
"evalml.pipelines.graph_utils.graph_roc_curve(y, y_pred_proba)"
"evalml.pipelines.graph_utils.graph_roc_curve(y_encoded, y_pred_proba)"
]
},
{
@@ -163,7 +163,7 @@
"source": [
"## Explaining Individual Predictions\n",
"\n",
"We can explain why the model made an individual prediction with the `explain_prediction` function. This will use the [Shapley Additive Explanations (SHAP)](https://github.com/slundberg/shap) algorithms to identify the top features that explain the predicted value. \n",
"We can explain why the model made an individual prediction with the [explain_prediction](../generated/evalml.pipelines.prediction_explanations.explain_prediction.ipynb) function. This will use the [Shapley Additive Explanations (SHAP)](https://github.com/slundberg/shap) algorithms to identify the top features that explain the predicted value. \n",
"\n",
"This function can explain both classification and regression models - all you need to do is provide the pipeline, the input features (must correspond to one row of the input data) and the training data. The function will return a table that you can print summarizing the top 3 most positive and negative contributing features to the predicted value.\n",
"\n",
@@ -191,6 +191,77 @@
"\n",
"This functionality is currently **not supported** for **XGBoost** models or **CatBoost multiclass** classifiers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explaining Multiple Predictions\n",
"\n",
"When debugging machine learning models, it is often useful to analyze the best and worst predictions the model made. The [explain_predictions_best_worst](../generated/evalml.pipelines.prediction_explanations.explain_predictions_best_worst.ipynb) function can help us with this.\n",
"\n",
"This function will display the output of [explain_prediction](../generated/evalml.pipelines.prediction_explanations.explain_prediction.ipynb) for the best 2 and worst 2 predictions. By default, the best and worst predictions are determined by the absolute error for regression problems and [cross entropy](https://en.wikipedia.org/wiki/Cross_entropy) for classification problems.\n",
"\n",
"We can specify our own ranking function by passing in a function to the `metric` parameter. This function will be called on `y_true` and `y_pred`. By convention, lower scores are better.\n",
"\n",
"At the top of each table, we can see the predicted probabilities, target value, and error on that prediction. For a regression problem, we would see the predicted value instead of predicted probabilities.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from evalml.pipelines.prediction_explanations import explain_predictions_best_worst\n",
"\n",
"report = explain_predictions_best_worst(pipeline=pipeline, input_features=X, y_true=y,\n",
" include_shap_values=True, num_to_explain=2)\n",
"\n",
"print(report)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use a custom metric ([hinge loss](https://en.wikipedia.org/wiki/Hinge_loss)) for selecting the best and worst predictions. See this example:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def hinge_loss(y_true, y_pred_proba):\n",
" \n",
" probabilities = np.clip(y_pred_proba.iloc[:, 1], 0.001, 0.999)\n",
" y_true[y_true == 0] = -1\n",
" \n",
" return np.clip(1 - y_true * np.log(probabilities / (1 - probabilities)), a_min=0, a_max=None)\n",
"\n",
"report = explain_predictions_best_worst(pipeline=pipeline, input_features=X, y_true=y,\n",
" include_shap_values=True, num_to_explain=5, metric=hinge_loss)\n",
"\n",
"print(report)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also manually explain predictions on any subset of the training data with the [explain_predictions](../generated/evalml.pipelines.prediction_explanations.explain_predictions.ipynb) function. Below, we explain the predictions on the first, fifth, and tenth row of the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from evalml.pipelines.prediction_explanations import explain_predictions\n",
"\n",
"report = explain_predictions(pipeline=pipeline, input_features=X.iloc[[0, 4, 9]], include_shap_values=True)\n",
"print(report)"
]
}
],
"metadata": {
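The code cell that actually calls `explain_prediction` is collapsed in the hunk above. Below is a minimal sketch of the call the markdown describes, assuming the fitted `pipeline` and feature matrix `X` from earlier in the notebook; the keyword names follow the prose and are an assumption, not a confirmed signature:

```python
from evalml.pipelines.prediction_explanations import explain_prediction

# Explain one prediction: input_features must correspond to a single row
# of the input data. The result is a printable table of the top positive
# and negative SHAP contributions to the predicted value.
table = explain_prediction(pipeline=pipeline, input_features=X.iloc[1:2],
                           training_data=X)
print(table)
```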
7 changes: 7 additions & 0 deletions evalml/pipelines/classification_pipeline.py
@@ -59,6 +59,13 @@ def _decode_targets(self, y):
originally had integer targets."""
return self._encoder.inverse_transform(y.astype(int))

@property
def _classes(self):
"""Gets the class names for the problem."""
if not hasattr(self._encoder, "classes_"):
raise AttributeError("Cannot access class names before fitting the pipeline.")
return self._encoder.classes_

def _predict(self, X, objective=None):
"""Make predictions using selected features.
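A quick sketch of how the new `_classes` property behaves, using a hypothetical binary pipeline (the class name is illustrative; the error message comes from the diff above):

```python
pipeline = MyBinaryPipeline(parameters={})  # hypothetical pipeline class

try:
    pipeline._classes  # the label encoder has no classes_ attribute before fit
except AttributeError as e:
    print(e)  # Cannot access class names before fitting the pipeline.

pipeline.fit(X, y)
print(pipeline._classes)  # e.g. ['benign' 'malignant'] for the cancer dataset
```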
2 changes: 1 addition & 1 deletion evalml/pipelines/prediction_explanations/__init__.py
@@ -1,2 +1,2 @@
# flake8:noqa
-from .explainers import explain_prediction
+from .explainers import explain_prediction, explain_predictions_best_worst, explain_predictions
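
With this change, all three explainers are importable directly from the subpackage:

```python
from evalml.pipelines.prediction_explanations import (
    explain_prediction,
    explain_predictions,
    explain_predictions_best_worst,
)
```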
