
Adding Feature Value column to SHAP table. #1064

Merged

Conversation

@freddyaboulton (Contributor) commented Aug 14, 2020

Pull Request Description

Closes #1035

Docs Changes

Here

Demo for Regression Problem

[screenshot: example report output showing the new Feature Value column]


@codecov (bot) commented Aug 14, 2020

Codecov Report

Merging #1064 into main will not change coverage (+0.00%).
The diff coverage is 100.00%.


@@           Coverage Diff            @@
##             main    #1064    +/-   ##
========================================
  Coverage   99.91%   99.91%            
========================================
  Files         188      191     +3     
  Lines       10296    10852   +556     
========================================
+ Hits        10287    10843   +556     
  Misses          9        9            
Impacted Files Coverage Δ
..._understanding/prediction_explanations/__init__.py 100.00% <ø> (ø)
evalml/pipelines/__init__.py 100.00% <ø> (ø)
evalml/tests/pipeline_tests/test_graphs.py 100.00% <ø> (ø)
evalml/utils/__init__.py 100.00% <ø> (ø)
evalml/automl/automl_search.py 99.55% <100.00%> (+<0.01%) ⬆️
evalml/exceptions/exceptions.py 100.00% <100.00%> (ø)
evalml/model_understanding/__init__.py 100.00% <100.00%> (ø)
evalml/model_understanding/graphs.py 100.00% <100.00%> (ø)
...derstanding/prediction_explanations/_algorithms.py 100.00% <100.00%> (ø)
...tanding/prediction_explanations/_user_interface.py 100.00% <100.00%> (ø)
... and 26 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2ecf3bd...184c87c.

@angela97lin (Contributor) left a comment:

This was a really good suggestion from the demos and this LGTM! 😁

@dsherry (Contributor) left a comment:

Great! I left a question about how the call to PipelineBase._transform is happening now, a suggestion to use a pandas dataframe instead of a dict for table_maker, a suggestion about rounding, and a suggested simplification for some test code. None of these are blocking, though.

@@ -47,14 +47,13 @@ def _compute_shap_values(pipeline, features, training_data=None):
if estimator.model_family == ModelFamily.BASELINE:
raise ValueError("You passed in a baseline pipeline. These are simple enough that SHAP values are not needed.")

pipeline_features = pipeline._transform(features)
@dsherry:
How are you able to delete this? Are you now calling PipelineBase._transform one step up the stack?

@freddyaboulton (Author):
Yea, moved it to _make_single_prediction_shap_table to only call it once.

row = [feature_name, display_text]
feature_value = pipeline_features[feature_name]
if pd.api.types.is_number(feature_value):
feature_value = np.round(feature_value, 2)
@dsherry:
Is the rounding necessary? I think '{0:.2f}'.format(feature_value) would be better. That way, you're guaranteed the output shows exactly two digits after the decimal point.

@freddyaboulton (Author):
You're 100% right! Much better than calling round.
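For reference, a quick stdlib comparison of the two approaches (independent of the PR code): rounding trims to at most two decimals but keeps a float, while a `.2f` format spec always renders exactly two decimal places as a string.

```python
value = 7.5

# Rounding keeps the value a float; trailing zeros are not preserved.
rounded = round(value, 2)            # still 7.5

# A fixed-point format spec renders exactly two decimal places as a string.
formatted = "{0:.2f}".format(value)  # "7.50"
```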

dtypes = ["t", "t", "f"] if include_shap_values else ["t", "t"]
alignment = ["c", "c", "c"] if include_shap_values else ["c", "c"]
dtypes = ["t", "f", "t", "f"] if include_shap_values else ["t", "t", "t"]
alignment = ["c", "c", "c", "c"] if include_shap_values else ["c", "c", "c"]
@dsherry:
What's this doing?

@freddyaboulton (Author):
These lines set the per-column formatting. Now that we round via string formatting rather than numpy, every column can be text, which will simplify them.
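The simplification being described might look like this sketch (the flag name comes from the diff; the column count is illustrative): once every value is pre-formatted to a string, the per-column special cases collapse to uniform lists.

```python
include_shap_values = True

# With all values formatted to strings up front, every column is plain text
# ("t"), so the dtype/alignment lists no longer need per-column cases.
n_cols = 4 if include_shap_values else 3
dtypes = ["t"] * n_cols
alignment = ["c"] * n_cols
```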

@@ -136,14 +142,18 @@ def _make_single_prediction_shap_table(pipeline, input_features, top_k=3, traini
shap_values = _compute_shap_values(pipeline, input_features, training_data)
normalized_shap_values = _normalize_shap_values(shap_values)

# We need a dict of type {column_name: feature value}
pipeline_features = pipeline._transform(input_features)
features_dict = dict(zip(pipeline_features.columns, *pipeline_features.values))
@dsherry:
Why do you need a dict? Why not pass in the pandas df and index into it? The fewer datastructures we have floating around, the better.

@freddyaboulton (Author):
I made this change!

@dsherry:
Thanks, great!
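A minimal sketch of the dataframe-indexing approach the reviewer suggested (column names hypothetical), assuming the frame has exactly one row by this point in the code:

```python
import pandas as pd

# Hypothetical single-row frame of transformed pipeline features.
pipeline_features = pd.DataFrame({"a": [10], "b": [20], "c": [30]})

# Instead of flattening to a {column: value} dict, index the frame directly:
# .iloc[0] selects the single row as a Series keyed by column name.
row = pipeline_features.iloc[0]
feature_value = row["b"]
```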

@@ -18,7 +18,11 @@
def compare_two_tables(table_1, table_2):
assert len(table_1) == len(table_2)
for row, row_answer in zip(table_1, table_2):
assert row.strip().split() == row_answer.strip().split()
# To make it easier to compare header underline
@dsherry:
Oh so this isn't fixed-length? Or something which you could easily calculate in the test?

@freddyaboulton (Author):
I got lazy but I fixed it now by setting the correct length in the answer and removing this line 😬
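For context, the helper under discussion (reconstructed from the diff) compares two rendered text tables line by line while ignoring spacing differences:

```python
def compare_two_tables(table_1, table_2):
    # Same number of lines, and the same whitespace-separated tokens per line.
    assert len(table_1) == len(table_2)
    for row, row_answer in zip(table_1, table_2):
        assert row.strip().split() == row_answer.strip().split()


# Passes: spacing differs between the tables, but the tokens match.
compare_two_tables(["a   b", "1 2"], ["a b", "1   2"])
```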

@@ -96,6 +100,7 @@ def test_explain_prediction(mock_normalize_shap_values,
pipeline = MagicMock()
pipeline.problem_type = problem_type
pipeline._classes = ["class_0", "class_1", "class_2"]
pipeline._transform.return_value = pd.DataFrame({"a": [10], "b": [20], "c": [30], "d": [40]})
@dsherry:
Cool. Could be nice to have the DF have more than one row, to make sure indexing works. Not sure if that's covered elsewhere

@freddyaboulton (Author):
By the time we get to this point in the code, the df is guaranteed to only have one row. I'll add a comment to the test to explain!

@dsherry:
Ah ok thanks


if include_string_features:
pipeline_features["a"] = "foo-feature"
pipeline_features["b"] = np.datetime64("2020-08-14")
@dsherry:
Nice, great that you included datetimes / something other than numeric and string

if include_shap_values:
new_answer = copy.deepcopy(answer)
for row in new_answer:
row.append(values[row[0]][0])
else:
new_answer = answer

assert _make_rows(values, values, top_k, include_shap_values) == new_answer
if include_string_features:
new_answer = copy.deepcopy(new_answer)
@dsherry:
This is a no-op, right?

@freddyaboulton (Author):
Yea, deleted this since I went with your suggestion below!

if row[0] == "a":
row[1] = "foo-feature"
elif row[0] == "b":
row[1] = "2020-08-14"
@dsherry:
I had to wrack my brain to remember if this would work in python!

Suggested simplification:

filtered_answer = []
for row in new_answer:
    val = row[1]
    if row[0] == "a":
        val = "foo-feature"
    elif row[0] == "b":
        val = "2020-08-14"
    filtered_answer.append((row[0], val))

@freddyaboulton (Author):
I went with your suggestion! Although it looks a little bit different because row has four values and so we want to append four values to filtered_answer, not just two values.
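Adapted to four-value rows, the loop might look like this sketch (the row contents are illustrative, not the actual test fixtures):

```python
# Hypothetical rows: [feature_name, feature_value, contribution_symbol, shap_value]
new_answer = [
    ["a", "1.00", "+", 0.50],
    ["b", "2.00", "-", -0.30],
    ["c", "3.00", "+", 0.10],
]

filtered_answer = []
for row in new_answer:
    val = row[1]
    if row[0] == "a":
        val = "foo-feature"
    elif row[0] == "b":
        val = "2020-08-14"
    # Carry all four values through, not just the first two.
    filtered_answer.append([row[0], val, row[2], row[3]])
```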

c 1.000 + 0.000
d -1.560 -- -2.560
e -1.800 -- -2.800
f -1.900 -- -2.900""".splitlines()
@dsherry:
These reports look good!

@freddyaboulton (Author):

@dsherry Thanks for the feedback - I addressed your comments!

@freddyaboulton freddyaboulton merged commit 2ae172a into main Aug 19, 2020
@dsherry dsherry mentioned this pull request Aug 25, 2020
@freddyaboulton freddyaboulton deleted the 1035-add-feature-values-column-to-explanation-reports branch October 22, 2020 18:28
Closes: Add a "Feature Values" column for prediction explanation reports (#1035)
3 participants