Add LIME explainability #2905
Conversation
…into three shorter tests, adding algorithm param
Codecov Report
@@ Coverage Diff @@
## main #2905 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 302 302
Lines 28637 28842 +205
=======================================
+ Hits 28544 28751 +207
+ Misses 93 91 -2
Continue to review full report at Codecov.
@pytest.mark.parametrize(
    "problem_type,output_format,answer,explain_predictions_answer,custom_index",
This test looks like it was deleted, but I actually refactored it into three separate tests for regression/multiclass/binary. That let me build the parameterization from the itertools.product of the parameters rather than typing every combination out individually, which made it much simpler to add an extra algorithm parameter to parametrize against. A rough sketch of the pattern follows.
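For illustration only (the parameter values and test body below are placeholders, not the exact ones from this PR):

import itertools

import pytest

output_formats = ["text", "dict", "dataframe"]
algorithms = ["shap", "lime"]

@pytest.mark.parametrize(
    "output_format,algorithm",
    list(itertools.product(output_formats, algorithms)),
)
def test_explain_predictions_regression(output_format, algorithm):
    # One test per problem type; the body would build a regression pipeline and
    # call explain_predictions(..., algorithm=algorithm) with these parameters.
    ...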
@eccabay Thanks for this! I appreciate the test refactor and I'm glad 90% of this diff is just replacing shap with explainer, lol.
I think this is almost ready, but we can improve _compute_lime_values before merge.
I also recommend we parametrize the following tests over shap and lime (see the sketch after the list):
test_categories_aggregated_linear_pipeline
test_categories_aggregated_text
test_categories_aggregated_date_ohe
test_categories_aggregated_pca_dag
test_categories_aggregated_but_not_those_that_are_dropped
test_categories_aggregated_when_some_are_dropped
test_explain_predictions_oversampler
test_explain_predictions_url_email
test_explain_predictions_report_shows_original_value_if_possible
test_explain_predictions_best_worst_report_shows_original_value_if_possible
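One lightweight way to do that (a sketch with a placeholder body, not the actual diff) is to stack an extra parametrize decorator on each listed test:

import pytest

@pytest.mark.parametrize("algorithm", ["shap", "lime"])
def test_categories_aggregated_linear_pipeline(algorithm):
    # Existing test body, with the explain_predictions call switched to
    # explain_predictions(..., algorithm=algorithm).
    ...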
@@ -144,7 +147,7 @@ class ExplainPredictionsStage(Enum):
    PREPROCESSING_STAGE = "preprocessing_stage"
    PREDICT_STAGE = "predict_stage"
    COMPUTE_FEATURE_STAGE = "compute_feature_stage"
-   COMPUTE_SHAP_VALUES_STAGE = "compute_shap_value_stage"
+   COMPUTE_EXPLAINER_VALUES_STAGE = "compute_explainer_value_stage"
I think this will break the progress update code some of our users have set up. Maybe we should file an issue to their repo to get ahead of it.
Which repo is this?
mode = "regression"


def array_predict(row):
    row = pd.DataFrame(row, columns=feature_names)
Do we need pd.DataFrame? I wonder why we can't get rid of array_predict and just pass pipeline.estimator.predict or predict_proba.
The necessity of array_predict is twofold:
- The LIME implementation depends on the prediction function returning a numpy array and breaks otherwise, whereas we return dataframes from all our predictions. This function is called array_predict because at first all it did was call predict and then return the result converted to a numpy array.
- The LIME implementation converts our input data to numpy as well, so once it calls predict with our pipeline predict we lose the feature names, which causes issues with some of our components (for example, the imputer). The test you asked about below checks this case, as the example in the docs was failing before I added the conversion to DataFrame with the feature names.

This second reason may be mitigated by your point about calling estimator.predict instead of pipeline.predict, but unfortunately the first still holds.
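To make the shape of that wrapper concrete, here is a rough sketch under my own naming (make_array_predict, pipeline_predict, and feature_names are illustrative, not necessarily what _compute_lime_values uses):

import numpy as np
import pandas as pd

def make_array_predict(pipeline_predict, feature_names):
    def array_predict(rows):
        # LIME passes the perturbed rows as a bare numpy array, so re-attach the
        # column names before calling the pipeline; components like the imputer
        # rely on them.
        frame = pd.DataFrame(rows, columns=feature_names)
        # LIME also expects a numpy array back rather than a DataFrame/Series.
        return np.asarray(pipeline_predict(frame))
    return array_predict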
Great work on this! Gonna post my comments for now, but I have 1 more file to go through.
Just left some nits/suggestions
Can you also add tests to ensure that LIME can cover XGBoost/CatBoost?
Thanks for the changes! LGTM, just left one nit
Looks great to me @eccabay! 🎉
Thank you for making all the changes and adding all the tests. The one thing is that I think lime doesn't actually support the kinds of catboost pipelines AutoMLSearch creates. I say we leave that for another issue and merge this in!
), f"A SHAP value must be computed for every data point to explain!"


def test_lime_catboost_xgboost(X_y_multi):
Unfortunately I do not think lime supports the pipelines with catboost estimators that do not have a one hot encoder (like those created by AutoMLSearch).
from evalml.demos import load_fraud
from evalml.pipelines import BinaryClassificationPipeline
from evalml.model_understanding import explain_predictions_best_worst
import pytest

X, y = load_fraud(100)
X = X.ww[["region", "provider", "card_id"]]
pl = BinaryClassificationPipeline({"Label Encoder": ["Label Encoder", "X", "y"],
                                   "RF": ["CatBoost Classifier", "X", "Label Encoder.y"]})
pl.fit(X, y)
with pytest.raises(ValueError, match="could not convert string to float: 'Fairview Heights'"):
    print(explain_predictions_best_worst(pl, X, y, algorithm="lime"))
I think this is the same problem @christopherbunn was having with shap + ensembles: these libraries assume all the features are numeric at this point. We don't run into this problem for the other estimators because all the features are engineered to doubles.
I suggest we temporarily disable catboost for lime and file an issue to figure out how to support this case. Thoughts?
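For concreteness, a minimal sketch of such a guard (the function and attribute names here are illustrative, not part of this diff):

def _raise_if_lime_unsupported(pipeline, algorithm):
    # Temporary guard: lime currently fails on CatBoost pipelines whose
    # categorical features have not been encoded to numeric values.
    if algorithm == "lime" and "CatBoost" in pipeline.estimator.name:
        raise ValueError(
            "algorithm='lime' does not currently support CatBoost estimators."
        )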
Issue filed! #2942
Closes #1052