
Add LIME explainability #2905

Merged — 43 commits into main from 1052_lime on Oct 21, 2021

Conversation

@eccabay (Contributor) commented Oct 13, 2021

Closes #1052

codecov bot commented Oct 13, 2021

Codecov Report

Merging #2905 (bca99c3) into main (abbb7f3) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #2905     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        302     302             
  Lines      28637   28842    +205     
=======================================
+ Hits       28544   28751    +207     
+ Misses        93      91      -2     
Impacted Files Coverage Δ
evalml/model_understanding/force_plots.py 100.0% <ø> (ø)
...derstanding/prediction_explanations/_algorithms.py 99.3% <100.0%> (+0.3%) ⬆️
...prediction_explanations/_report_creator_factory.py 100.0% <100.0%> (ø)
...tanding/prediction_explanations/_user_interface.py 100.0% <100.0%> (ø)
...nderstanding/prediction_explanations/explainers.py 100.0% <100.0%> (ø)
...s/prediction_explanations_tests/test_algorithms.py 100.0% <100.0%> (+1.8%) ⬆️
...s/prediction_explanations_tests/test_explainers.py 100.0% <100.0%> (ø)
...ediction_explanations_tests/test_user_interface.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update abbb7f3...bca99c3.

@eccabay self-assigned this Oct 15, 2021
Comment on lines 522 to -482
@pytest.mark.parametrize(
"problem_type,output_format,answer,explain_predictions_answer,custom_index",
Contributor Author
This test looks like it was deleted because I refactored it into three separate tests for regression/multiclass/binary. This allowed the parameterization based on the itertools.product of the parameters rather than having to type everything out individually, making it a lot simpler to add an extra algorithm parameter to parametrize against.
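The refactor described above can be sketched as follows. The parameter lists here are hypothetical stand-ins (the real tests also cross problem types, expected answers, and custom indexes), but they show why `itertools.product` makes adding a new algorithm a one-line change.

```python
import itertools

# Hypothetical parameter lists; the real tests parametrize over more
# dimensions (problem type, answers, custom indexes).
output_formats = ["text", "dict", "dataframe"]
algorithms = ["shap", "lime"]

# itertools.product yields every combination, so each case no longer
# has to be typed out individually. In the tests, these tuples would
# feed @pytest.mark.parametrize on the regression/multiclass/binary
# test functions.
cases = list(itertools.product(output_formats, algorithms))
```

Adding a third algorithm would only require appending to `algorithms`; the cross product picks up all new combinations automatically.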

@freddyaboulton (Contributor) left a comment

@eccabay Thanks for this! I appreciate the test refactor and I'm glad 90% of this diff is just replacing shap with explainer lol.

This is almost ready, but I think we can improve _compute_lime_values before merging.

I also recommend we parametrize the following tests over shap and lime:

test_categories_aggregated_linear_pipeline
test_categories_aggregated_text
test_categories_aggregated_date_ohe
test_categories_aggregated_pca_dag
test_categories_aggregated_but_not_those_that_are_dropped
test_categories_aggregated_when_some_are_dropped
test_explain_predictions_oversampler
test_explain_predictions_url_email
test_explain_predictions_report_shows_original_value_if_possible
test_explain_predictions_best_worst_report_shows_original_value_if_possible

@@ -144,7 +147,7 @@ class ExplainPredictionsStage(Enum):
PREPROCESSING_STAGE = "preprocessing_stage"
PREDICT_STAGE = "predict_stage"
COMPUTE_FEATURE_STAGE = "compute_feature_stage"
COMPUTE_SHAP_VALUES_STAGE = "compute_shap_value_stage"
COMPUTE_EXPLAINER_VALUES_STAGE = "compute_explainer_value_stage"
Contributor

I think this will break the progress update code some of our users have set up. Maybe we should file an issue to their repo to get ahead of it.

Contributor Author

Which repo is this?

mode = "regression"

def array_predict(row):
row = pd.DataFrame(row, columns=feature_names)
Contributor

Do we need pd.DataFrame? I wonder why we can't get rid of array_predict and just pass pipeline.estimator.predict or predict_proba

Contributor Author

The necessity of array_predict is twofold:

  • The LIME implementation depends on the prediction function returning a numpy array and breaks otherwise, since all of our predictions return DataFrames. The function is called array_predict because at first all it did was call predict and return the result converted to a numpy array.
  • The LIME implementation also converts our input data to numpy, so once it calls predict with our pipeline we lose the feature names, which causes issues with some of our components (for example, the imputer). The test you asked about below checks this case: the example in the docs was failing before I added the conversion back to a DataFrame with the feature names.

This second reason may be mitigated with your point about calling estimator.predict instead of pipeline.predict, but unfortunately the first still holds.
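A minimal sketch of the wrapper described above. `FakePipeline`, `feature_names`, and the column values are hypothetical stand-ins for an evalml pipeline; only the shape of the conversion matters.

```python
import numpy as np
import pandas as pd

feature_names = ["amount", "age"]  # hypothetical column names


class FakePipeline:
    """Stand-in for an evalml pipeline: predict expects a labeled
    DataFrame (components like the imputer rely on column names)
    and returns a pandas Series rather than a numpy array."""

    def predict(self, X):
        return pd.Series(X["amount"] * 2.0)


pipeline = FakePipeline()


def array_predict(rows):
    # LIME hands us a plain numpy array with no column labels, so we
    # restore the feature names before calling the pipeline...
    frame = pd.DataFrame(rows, columns=feature_names)
    # ...and convert the Series prediction back to the numpy array
    # that LIME requires from its prediction function.
    return pipeline.predict(frame).to_numpy()
```

Both conversions are needed: the DataFrame on the way in preserves feature names for the pipeline's components, and `.to_numpy()` on the way out satisfies LIME's array contract.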

@bchen1116 (Contributor) left a comment

Great work on this! Gonna post my comments for now, but I have 1 more file to go through.

Just left some nits/suggestions

@bchen1116 (Contributor) left a comment

Can you also add tests to ensure that LIME can cover XGBoost/CatBoost?

@bchen1116 (Contributor) left a comment

Thanks for the changes! LGTM, just left one nit

@freddyaboulton (Contributor) left a comment

Looks great to me @eccabay ! 🎉

Thank you for making all the changes and adding all the tests. The one thing is that I think lime doesn't actually support the kinds of catboost pipelines AutoMLSearch creates. I say we leave that for another issue and merge this in!

.github/meta.yaml — resolved
), f"A SHAP value must be computed for every data point to explain!"


def test_lime_catboost_xgboost(X_y_multi):
Contributor

Unfortunately I do not think lime supports the pipelines with catboost estimators that do not have a one hot encoder (like those created by AutoMLSearch).

from evalml.demos import load_fraud
from evalml.pipelines import BinaryClassificationPipeline
from evalml.model_understanding import explain_predictions_best_worst
import pytest

X, y = load_fraud(100)
X = X.ww[["region", "provider", "card_id"]]

pl = BinaryClassificationPipeline({"Label Encoder": ["Label Encoder", "X", "y"],
                                   "RF": ["CatBoost Classifier", "X", "Label Encoder.y"]})
pl.fit(X, y)
with pytest.raises(ValueError, match="could not convert string to float: 'Fairview Heights'"):
    print(explain_predictions_best_worst(pl, X, y, algorithm="lime"))

I think this is the problem @christopherbunn was having with shap + ensembles where these libraries assume all the features are numeric at this point. We don't run into this problem for the other estimators because all the features are engineered to doubles.

I suggest we temporarily disable catboost for lime and file an issue to figure out how to support this case. Thoughts?

Contributor Author

Issue filed! #2942

@eccabay merged commit 8a0c4ea into main Oct 21, 2021
@eccabay deleted the 1052_lime branch October 21, 2021 18:13
@chukarsten mentioned this pull request Oct 27, 2021