
Time Series: Explain Predictions #1818

Merged
merged 16 commits into from Feb 12, 2021

Conversation

chukarsten
Collaborator

Addresses: #1504

Oh my goodness. You've been waiting for it, so here it is.

@chukarsten force-pushed the 1504-ts_predictions_take_2 branch 2 times, most recently from 9c85412 to db9d8c2 on February 11, 2021 16:38

codecov bot commented Feb 11, 2021

Codecov Report

Merging #1818 (cb0e9cc) into main (2747fbc) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #1818     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        255     255             
  Lines      20571   20608     +37     
=======================================
+ Hits       20549   20586     +37     
  Misses        22      22             
Impacted Files Coverage Δ
...derstanding/prediction_explanations/_algorithms.py 97.2% <100.0%> (ø)
...prediction_explanations/_report_creator_factory.py 100.0% <100.0%> (ø)
...tanding/prediction_explanations/_user_interface.py 100.0% <100.0%> (ø)
...nderstanding/prediction_explanations/explainers.py 100.0% <100.0%> (ø)
...s/prediction_explanations_tests/test_explainers.py 100.0% <100.0%> (ø)

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2747fbc...cb0e9cc.

@@ -82,7 +82,7 @@ def _compute_shap_values(pipeline, features, training_data=None):

     # More than 100 datapoints can negatively impact runtime according to SHAP
     # https://github.com/slundberg/shap/blob/master/shap/explainers/kernel.py#L114
-    sampled_training_data_features = pipeline.compute_estimator_features(shap.sample(training_data, 100)).to_dataframe()
+    sampled_training_data_features = shap.sample(training_data, 100)
Contributor
Doing this to avoid having to compute the features twice.
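For context, the change passes the raw sampled rows along instead of computing estimator features on them here, since the features are computed once downstream anyway. The subsampling idea itself (capping the SHAP background data at 100 rows) can be sketched with plain numpy; `subsample_rows` below is a hypothetical stand-in for `shap.sample`, not the real helper:

```python
import numpy as np

def subsample_rows(data, nsamples=100, seed=0):
    """Return at most `nsamples` randomly chosen rows of `data`.

    Hypothetical stand-in for shap.sample: caps the background
    dataset so the downstream SHAP computation stays cheap.
    """
    if len(data) <= nsamples:
        return data  # already small enough; pass through unchanged
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=nsamples, replace=False)
    return data[idx]

training_data = np.arange(500.0).reshape(250, 2)  # 250 rows, 2 columns
sampled = subsample_rows(training_data, 100)
print(sampled.shape)  # (100, 2)
```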

Contributor

@freddyaboulton freddyaboulton left a comment

@chukarsten Looks good to me! It was a pleasure getting this done together. I just have some minor clean-up style comments.

@@ -316,7 +326,7 @@ def make_dict(self, index, y_pred, y_true, scores, dataframe_index):
     "predicted_value": _make_json_serializable(self.predicted_values[index]),
     "target_value": _make_json_serializable(y_true.iloc[index]),
     "error_name": self.error_name,
-    "error_value": _make_json_serializable(scores.iloc[index]),
+    "error_value": _make_json_serializable(scores[index]),
Contributor
This is because the error metrics are now stored in a numpy array.
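The distinction is purely about positional access: a pandas Series needs `.iloc[index]`, while a plain numpy array takes `[index]` directly. A minimal illustration (numpy only; the values are made up):

```python
import numpy as np

# Error metrics now live in a numpy array, so plain integer indexing
# works; a pandas Series would have required scores.iloc[index].
scores = np.array([0.12, 0.45, 0.33])
index = 1
print(scores[index])            # 0.45
# numpy arrays have no .iloc accessor:
print(hasattr(scores, "iloc"))  # False
```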


# Check that the computed features to be explained aren't NaN.
for exp_idx in range(len(exp["explanations"])):
    assert not np.isnan(np.array(exp["explanations"][exp_idx]["explanations"][0]["feature_values"])).any()
Contributor
We're not checking the output against a specified value. The other tests verify that the logic for selecting the rows to explain and for displaying the right features/shap values is correct, so I think running real data through and verifying it completes without producing NaNs is sufficient.
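The NaN check is a standard numpy idiom; a self-contained sketch of the pattern used in the test, where the nested `exp` dict below is a made-up stand-in for the real explanation report:

```python
import numpy as np

# Hypothetical report structure mirroring the test's exp["explanations"].
exp = {
    "explanations": [
        {"explanations": [{"feature_values": [1.0, 2.5, 3.0]}]},
        {"explanations": [{"feature_values": [0.5, 4.2, 7.7]}]},
    ]
}

# Fail loudly if any explained feature value came out as NaN.
for entry in exp["explanations"]:
    values = np.array(entry["explanations"][0]["feature_values"])
    assert not np.isnan(values).any()
print("no NaNs found")
```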

@chukarsten force-pushed the 1504-ts_predictions_take_2 branch 3 times, most recently from 6a5f70c to b556cee on February 12, 2021 00:08
@chukarsten chukarsten marked this pull request as ready for review February 12, 2021 00:09
@chukarsten chukarsten merged commit 2e561c7 into main Feb 12, 2021
@chukarsten chukarsten deleted the 1504-ts_predictions_take_2 branch February 12, 2021 16:50
@chukarsten chukarsten mentioned this pull request Feb 23, 2021
@dsherry dsherry mentioned this pull request Mar 10, 2021