
Aggregate prediction explanations for derived features #1901

Conversation

@freddyaboulton (Contributor) commented Feb 25, 2021

Pull Request Description

Fixes #1347

We only aggregate values for features whose provenance is known; otherwise, no aggregation happens, which matches the current behavior.
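The aggregation step described above can be sketched as follows. This is a hypothetical helper, not the actual evalml implementation: it inverts the provenance mapping (original feature → derived feature names) and sums the SHAP values of derived features back onto their parent, leaving features without known provenance untouched.

```python
def aggregate_shap_by_provenance(shap_values, provenance):
    """Sum SHAP values of derived features back onto their parent features.

    shap_values: dict mapping feature name -> SHAP value
    provenance:  dict mapping original feature -> list of derived feature names
    Features with no known provenance pass through unchanged.
    """
    # Invert provenance: derived feature name -> original (parent) feature name.
    derived_to_parent = {
        derived: parent
        for parent, derived_list in provenance.items()
        for derived in derived_list
    }
    aggregated = {}
    for feature, value in shap_values.items():
        parent = derived_to_parent.get(feature, feature)
        aggregated[parent] = aggregated.get(parent, 0.0) + value
    return aggregated


# Illustrative values loosely based on the titanic sample below.
provenance = {"Name": ["LSA(Name)[0]", "LSA(Name)[1]"]}
shap = {"LSA(Name)[0]": -0.13, "LSA(Name)[1]": 0.02, "Age": 0.28}
result = aggregate_shap_by_provenance(shap, provenance)
print({k: round(v, 2) for k, v in result.items()})
# {'Name': -0.11, 'Age': 0.28}
```

Summing is valid here because SHAP values are additive contributions to a single prediction, so the parent feature's aggregated value is exactly the combined contribution of everything derived from it.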

Sample output on titanic dataset

	Best 1 of 5

		Predicted Probabilities: [0: 0.996, 1: 0.004]
		Predicted Value: 0
		Target Value: 0
		Cross Entropy: 0.004
		Index ID: 322

		Feature Name                Feature Value             Contribution to Prediction   SHAP Value
		=============================================================================================
		Age                         20.00                     +                            0.28
		Fare                        69.55                     +                            0.04
		Parents/Children Aboard     2.00                      -                           -0.16
		Name                        Mr. George John Jr Sage   -                           -0.31
		Pclass                      3.00                      -                           -1.10
		Sex                         male                      --                          -1.42
		Siblings/Spouses Aboard     8.00                      ---                         -2.50

Example of drill_down dict on titanic dataset

pred['explanations'][0]['explanations'][0]['drill_down']

{'Name': {'feature_names': ['POLARITY_SCORE(Name)',
   'LSA(Name)[1]',
   'DIVERSITY_SCORE(Name)',
   'LSA(Name)[0]',
   'MEAN_CHARACTERS_PER_WORD(Name)'],
  'feature_values': [0.0, -0.010237950687552993, 1.0, 0.3208657470990783, 3.6],
  'qualitative_explanation': ['+', '+', '+', '-', '-'],
  'quantitative_explanation': [0.06198793040637707,
   0.01567506508862504,
   0.0,
   -0.13351634379165223,
   -0.25702767905947055]},
 'Sex': {'feature_names': ['Sex_male', 'Sex_female'],
  'feature_values': [1.0, 0.0],
  'qualitative_explanation': ['-', '-'],
  'quantitative_explanation': [-0.36807136387713535, -1.0517515360270118]}}
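Each `drill_down` entry is a plain dict of parallel lists, so it can be flattened into per-derived-feature rows with `zip`. A minimal sketch, using a dict literal that mirrors the `'Sex'` entry from the sample above (values abbreviated):

```python
# Sketch: flattening a drill_down structure into per-derived-feature rows.
# The dict literal below mirrors the 'Sex' entry from the sample output.
drill_down = {
    "Sex": {
        "feature_names": ["Sex_male", "Sex_female"],
        "feature_values": [1.0, 0.0],
        "qualitative_explanation": ["-", "-"],
        "quantitative_explanation": [-0.368, -1.052],
    }
}

rows = [
    (parent, name, value, sign, shap)
    for parent, detail in drill_down.items()
    for name, value, sign, shap in zip(
        detail["feature_names"],
        detail["feature_values"],
        detail["qualitative_explanation"],
        detail["quantitative_explanation"],
    )
]
for parent, name, value, sign, shap in rows:
    print(f"{parent}: {name}={value} contributes {sign} ({shap:.3f})")
```

The four lists are index-aligned (the i-th name, value, sign, and SHAP value all describe the same derived feature), which is what makes the `zip` traversal safe.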

Sample output of explain_predictions_best_worst on fraud dataset

	Best 1 of 5

		Predicted Probabilities: [False: 0.988, True: 0.012]
		Predicted Value: False
		Target Value: False
		Cross Entropy: 0.012
		Index ID: 754

		Feature Name      Feature Value      Contribution to Prediction   SHAP Value
		============================================================================
		  currency             SDG                       +                   0.00   
		  datetime     2019-04-05 11:16:41               -                  -0.00   
		  provider          Discover                     -                  -0.00   
		   amount             73.00                    -----                -2.36   


After creating the pull request: to pass the release_notes_updated check, you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`1901`.

codecov bot commented Feb 25, 2021

Codecov Report

Merging #1901 (f8ffba7) into main (c499006) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@            Coverage Diff            @@
##             main    #1901     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         267      267             
  Lines       21536    21715    +179     
=========================================
+ Hits        21530    21709    +179     
  Misses          6        6             
Impacted Files Coverage Δ
...derstanding/prediction_explanations/_algorithms.py 97.8% <100.0%> (+0.7%) ⬆️
...tanding/prediction_explanations/_user_interface.py 100.0% <100.0%> (ø)
...nderstanding/prediction_explanations/explainers.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 100.0% <100.0%> (ø)
...s/prediction_explanations_tests/test_algorithms.py 100.0% <100.0%> (ø)
...s/prediction_explanations_tests/test_explainers.py 100.0% <100.0%> (ø)
...ediction_explanations_tests/test_user_interface.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@freddyaboulton freddyaboulton marked this pull request as ready for review February 26, 2021 16:26
@chukarsten (Collaborator) left a comment: Wow, that's a great addition. Nice job.

provenance (dict): A mapping from a feature in the original data to the names of the features that were created
from that feature
Returns:
dict
Collaborator: Love this docstring. Very clear. I think the return just needs a description.

Contributor (author): Done!

Arguments:
values (dict): A mapping of feature names to a list of SHAP values for each data point.
provenance (dict): A mapping from a feature in the original data to the names of the features that were created
from that feature
Collaborator: supernit: period?

json_rows = _rows_to_dict(rows)
drill_down = self.make_drill_down_dict(self.provenance, shap_values[1], normalized_values[1],
pipeline_features, original_features, self.include_shap_values)
json_rows["drill_down"] = drill_down
Collaborator: Is "json_rows" a copy-paste carryover? It reads a bit oddly in make_dict().

Contributor (author): Changed the name to dict_rows!

@freddyaboulton freddyaboulton force-pushed the 1347-aggregate-prediction-explanations-for-categorical-text-features branch from edf28c1 to 0bebd6a Compare March 1, 2021 21:21
@bchen1116 (Contributor) left a comment: LGTM! I left a few comments on docstrings and some nitpicks, but nothing blocking.



@pytest.fixture
def fraud_100():
Contributor: nice! haha


table_maker = table_maker.make_text if output_format == "text" else table_maker.make_dict

table = table_maker(values, normalized_values, pipeline_features, top_k=3, include_shap_values=include_shap)
table = table_maker(values, normalized_values, values, normalized_values, pipeline_features, pipeline_features)
Contributor: nitpick: I was really confused when I saw these input params repeated. Any chance you can add the keys, like:

table = table_maker(aggregated_shap_values=values,
                    aggregated_normalized_values=normalized_values,
                    shap_values=values,
                    normalized_values=normalized_values,
                    pipeline_features=pipeline_features,
                    original_features=pipeline_features)

just to make it a little clearer?

@abc.abstractmethod
def make_text(self, shap_values, normalized_values, pipeline_features, top_k, include_shap_values=False):
def make_text(self, aggregated_shap_values, aggregated_normalized_values,
shap_values, normalized_values, pipeline_features, orignal_features):
Contributor: typo: original_features

json_output_for_class["class_name"] = _make_json_serializable(class_name)
json_output.append(json_output_for_class)
return {"explanations": json_output}


def _make_single_prediction_shap_table(pipeline, pipeline_features, index_to_explain, top_k=3,
def _make_single_prediction_shap_table(pipeline, pipeline_features, input_features, index_to_explain, top_k=3,
Contributor: Should we update this docstring to include input_features and index_to_explain?

Contributor (author): Yes!

@@ -395,7 +473,7 @@ def __init__(self, top_k_features, include_shap_values):
self.top_k_features = top_k_features
self.include_shap_values = include_shap_values

def make_text(self, index, pipeline, pipeline_features):
def make_text(self, index, pipeline, pipeline_features, input_features):
Contributor: update docstring

@freddyaboulton freddyaboulton force-pushed the 1347-aggregate-prediction-explanations-for-categorical-text-features branch 3 times, most recently from f681308 to e497258 Compare March 3, 2021 15:08
@freddyaboulton freddyaboulton force-pushed the 1347-aggregate-prediction-explanations-for-categorical-text-features branch from e497258 to f8ffba7 Compare March 3, 2021 18:45
@freddyaboulton freddyaboulton merged commit 3b01866 into main Mar 3, 2021
@freddyaboulton freddyaboulton deleted the 1347-aggregate-prediction-explanations-for-categorical-text-features branch March 3, 2021 19:21
@dsherry dsherry mentioned this pull request Mar 11, 2021
Linked issue: Prediction explanations should aggregate contributions across all levels of categorical features (#1347)