
Add LIME explainability #2905

Merged — 43 commits into main from 1052_lime on Oct 21, 2021

Conversation

@eccabay (Contributor) commented Oct 13, 2021

Closes #1052

codecov bot commented Oct 13, 2021

Codecov Report

Merging #2905 (bca99c3) into main (abbb7f3) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #2905     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        302     302             
  Lines      28637   28842    +205     
=======================================
+ Hits       28544   28751    +207     
+ Misses        93      91      -2     
Impacted Files Coverage Δ
evalml/model_understanding/force_plots.py 100.0% <ø> (ø)
...derstanding/prediction_explanations/_algorithms.py 99.3% <100.0%> (+0.3%) ⬆️
...prediction_explanations/_report_creator_factory.py 100.0% <100.0%> (ø)
...tanding/prediction_explanations/_user_interface.py 100.0% <100.0%> (ø)
...nderstanding/prediction_explanations/explainers.py 100.0% <100.0%> (ø)
...s/prediction_explanations_tests/test_algorithms.py 100.0% <100.0%> (+1.8%) ⬆️
...s/prediction_explanations_tests/test_explainers.py 100.0% <100.0%> (ø)
...ediction_explanations_tests/test_user_interface.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update abbb7f3...bca99c3.

@eccabay self-assigned this Oct 15, 2021
Comment on lines 522 to -482
@pytest.mark.parametrize(
"problem_type,output_format,answer,explain_predictions_answer,custom_index",
Contributor Author
This test looks like it was deleted because I refactored it into three separate tests for regression/multiclass/binary. This allowed the parameterization based on the itertools.product of the parameters rather than having to type everything out individually, making it a lot simpler to add an extra algorithm parameter to parametrize against.
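The refactor described above can be sketched as follows. The parameter lists here are hypothetical stand-ins (the real tests also cross problem types, expected answers, and custom indexes), but they show why `itertools.product` makes adding a new algorithm a one-line change.

```python
import itertools

# Hypothetical parameter lists; the real tests parametrize over more
# dimensions (problem type, answers, custom indexes).
output_formats = ["text", "dict", "dataframe"]
algorithms = ["shap", "lime"]

# itertools.product yields every combination, so each case no longer
# has to be typed out individually. In the tests, these tuples would
# feed @pytest.mark.parametrize on the regression/multiclass/binary
# test functions.
cases = list(itertools.product(output_formats, algorithms))
```

Adding a third algorithm would only require appending to `algorithms`; the cross product picks up all new combinations automatically.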

@freddyaboulton (Contributor) left a comment

@eccabay Thanks for this! I appreciate the test refactor and I'm glad 90% of this diff is just replacing shap with explainer lol.

This is almost ready, but I think we can improve _compute_lime_values before merging.

I also recommend we parametrize the following tests over shap and lime:

test_categories_aggregated_linear_pipeline
test_categories_aggregated_text
test_categories_aggregated_date_ohe
test_categories_aggregated_pca_dag
test_categories_aggregated_but_not_those_that_are_dropped
test_categories_aggregated_when_some_are_dropped
test_explain_predictions_oversampler
test_explain_predictions_url_email
test_explain_predictions_report_shows_original_value_if_possible
test_explain_predictions_best_worst_report_shows_original_value_if_possible

@@ -144,7 +147,7 @@ class ExplainPredictionsStage(Enum):
PREPROCESSING_STAGE = "preprocessing_stage"
PREDICT_STAGE = "predict_stage"
COMPUTE_FEATURE_STAGE = "compute_feature_stage"
COMPUTE_SHAP_VALUES_STAGE = "compute_shap_value_stage"
COMPUTE_EXPLAINER_VALUES_STAGE = "compute_explainer_value_stage"
Contributor

I think this will break the progress update code some of our users have set up. Maybe we should file an issue to their repo to get ahead of it.

Contributor Author

Which repo is this?

mode = "regression"

def array_predict(row):
row = pd.DataFrame(row, columns=feature_names)
Contributor

Do we need pd.DataFrame? I wonder why we can't get rid of array_predict and just pass pipeline.estimator.predict or predict_proba

Contributor Author

The necessity of array_predict is twofold:

  • The LIME implementation depends on the prediction function returning a numpy array and breaks otherwise, since all of our predictions return DataFrames. The function is called array_predict because at first all it did was call predict and return the result converted to a numpy array.
  • The LIME implementation also converts our input data to numpy, so once it calls predict with our pipeline we lose the feature names, which causes issues with some of our components (for example, the imputer). The test you asked about below checks this case: the example in the docs was failing before I added the conversion back to a DataFrame with the feature names.

This second reason may be mitigated with your point about calling estimator.predict instead of pipeline.predict, but unfortunately the first still holds.
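A minimal sketch of the wrapper described above. `FakePipeline`, `feature_names`, and the column values are hypothetical stand-ins for an evalml pipeline; only the shape of the conversion matters.

```python
import numpy as np
import pandas as pd

feature_names = ["amount", "age"]  # hypothetical column names


class FakePipeline:
    """Stand-in for an evalml pipeline: predict expects a labeled
    DataFrame (components like the imputer rely on column names)
    and returns a pandas Series rather than a numpy array."""

    def predict(self, X):
        return pd.Series(X["amount"] * 2.0)


pipeline = FakePipeline()


def array_predict(rows):
    # LIME hands us a plain numpy array with no column labels, so we
    # restore the feature names before calling the pipeline...
    frame = pd.DataFrame(rows, columns=feature_names)
    # ...and convert the Series prediction back to the numpy array
    # that LIME requires from its prediction function.
    return pipeline.predict(frame).to_numpy()
```

Both conversions are needed: the DataFrame on the way in preserves feature names for the pipeline's components, and `.to_numpy()` on the way out satisfies LIME's array contract.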

@bchen1116 (Contributor) left a comment

Great work on this! Gonna post my comments for now, but I have 1 more file to go through.

Just left some nits/suggestions

@bchen1116 (Contributor) left a comment

Can you also add tests to ensure that LIME can cover XGBoost/CatBoost?

@bchen1116 (Contributor) left a comment

Thanks for the changes! LGTM, just left one nit

@freddyaboulton (Contributor) left a comment

Looks great to me @eccabay ! 🎉

Thank you for making all the changes and adding all the tests. The one thing is that I think lime doesn't actually support the kinds of catboost pipelines AutoMLSearch creates. I say we leave that for another issue and merge this in!

.github/meta.yaml — resolved
), f"A SHAP value must be computed for every data point to explain!"


def test_lime_catboost_xgboost(X_y_multi):
Contributor

Unfortunately I do not think lime supports the pipelines with catboost estimators that do not have a one hot encoder (like those created by AutoMLSearch).

from evalml.demos import load_fraud
from evalml.pipelines import BinaryClassificationPipeline
from evalml.model_understanding import explain_predictions_best_worst
import pytest

X, y = load_fraud(100)
X = X.ww[["region", "provider", "card_id"]]

pl = BinaryClassificationPipeline({"Label Encoder": ["Label Encoder", "X", "y"],
                                   "RF": ["CatBoost Classifier", "X", "Label Encoder.y"]})
pl.fit(X, y)
with pytest.raises(ValueError, match="could not convert string to float: 'Fairview Heights'"):
    print(explain_predictions_best_worst(pl, X, y, algorithm="lime"))

I think this is the problem @christopherbunn was having with shap + ensembles where these libraries assume all the features are numeric at this point. We don't run into this problem for the other estimators because all the features are engineered to doubles.

I suggest we temporarily disable catboost for lime and file an issue to figure out how to support this case. Thoughts?

Contributor Author

Issue filed! #2942

@eccabay merged commit 8a0c4ea into main Oct 21, 2021
@eccabay deleted the 1052_lime branch October 21, 2021 18:13
@chukarsten mentioned this pull request Oct 27, 2021