1434 prediction explanations dataframe output #1781
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1781 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 247 247
Lines 19573 19618 +45
=========================================
+ Hits 19564 19609 +45
Misses 9 9
Continue to review full report at Codecov.
This looks good to me!
@@ -399,30 +491,51 @@ def test_output_format_checked():
@pytest.mark.parametrize("problem_type,output_format,answer,explain_predictions_answer,custom_index",
This is an epic parameterization.
input_features, y_true = pd.DataFrame(data=range(15)), pd.Series(range(15))
with pytest.raises(ValueError, match="Parameter output_format must be either text or dict. Received foo"):
with pytest.raises(ValueError, match="Parameter output_format must be either text, dict, or dataframe. Received foo"):
explain_predictions_best_worst(None, input_features, y_true=y_true, output_format="foo")
I think we can probably get rid of the foo and bar tests, right? They don't add anything beyond what the "xml" one covers.
100%
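To make that concrete, here's a hedged sketch of what a single consolidated check could look like (the import path mirrors the docs cell shown further down in this PR; this is not the merged test):

```python
from evalml.model_understanding.prediction_explanations import explain_predictions_best_worst

import pandas as pd
import pytest


def test_invalid_output_format_raises():
    # One representative invalid value is enough; "foo" and "bar" exercised the same branch.
    input_features, y_true = pd.DataFrame(data=range(15)), pd.Series(range(15))
    with pytest.raises(ValueError, match="Parameter output_format must be either text, dict, or dataframe. Received xml"):
        explain_predictions_best_worst(None, input_features, y_true=y_true, output_format="xml")
```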
"quantitative_explanation": [0], | ||
}) | ||
# Use side effect so that we always get a new copy of the dataframe | ||
mock_make_table.side_effect = lambda *args, **kwargs: shap_table.copy() |
This is fancy and cool. I'm going to need to go figure out what this does.
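In case it helps other readers: setting `side_effect` to a callable makes the mock run that callable on every call and return its result, so each call hands back an independent copy of the table. A minimal, self-contained sketch with made-up data:

```python
from unittest.mock import MagicMock

import pandas as pd

# A template table that several calls will be based on.
shap_table = pd.DataFrame({"feature_names": ["a"], "quantitative_explanation": [0]})

# With return_value, every call would share the *same* object, so a mutation in one
# place would leak into every other use of the mock's result. With a callable
# side_effect, the callable runs on each call and its return value is used instead,
# so every caller gets a fresh copy.
mock_make_table = MagicMock(side_effect=lambda *args, **kwargs: shap_table.copy())

first = mock_make_table()
second = mock_make_table()
first["rank"] = 0  # mutate one result...

assert "rank" not in second.columns  # ...and the other copy is unaffected
```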
elif output_format == "dataframe":
    # Check dataframes equal without caring about column order
    assert sorted(best_worst_report.columns.tolist()) == sorted(answer.columns.tolist())
    pd.testing.assert_frame_equal(best_worst_report, answer[best_worst_report.columns])
Won't assert_frame_equal compare the columns as well? L597 might be redundant?
Yes, but in the event `best_worst_report` has fewer columns than expected, the check will pass because `answer[best_worst_report.columns]` (needed to reorder the columns) will subset to the columns in `best_worst_report`.
The root cause is that I modify the `index_id` column at the beginning of the test to add the expected custom index, and this unfortunately messes up the column order. I don't think there's a way to get `assert_frame_equal` to ignore the column order, so this two-step check verifies that the same columns are present in both and that they contain the same values. I think this is ok given how many different scenarios we're testing. The alternative would be to manually specify the answer for each test, but that would be hairy.
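For reference, here is the two-step check in isolation, with made-up data; step 1 is what catches a report with missing or extra columns, which step 2 alone would let slip through because of the subsetting:

```python
import pandas as pd

best_worst_report = pd.DataFrame({"rank": [0, 1], "index_id": [4, 9]})
answer = pd.DataFrame({"index_id": [4, 9], "rank": [0, 1]})  # same data, different column order

# Step 1: both sides have exactly the same set of columns.
assert sorted(best_worst_report.columns.tolist()) == sorted(answer.columns.tolist())

# Step 2: same values, after reordering the answer's columns to match the report.
pd.testing.assert_frame_equal(best_worst_report, answer[best_worst_report.columns])
```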
@freddyaboulton this is looking great!
I left some comments and questions on the implementation. Nothing blocking, but it would be great to close out those conversations soon.
@@ -246,6 +253,10 @@ def make_text(self, *args, **kwargs):
    def make_dict(self, *args, **kwargs):
        """Makes the report section formatted as a dictionary."""

    @abc.abstractmethod
    def make_dataframe(self, *args, **kwargs):
        """Makes the report section formatted as a dictionary."""
Do you mean "formatted as a dataframe"?
Yes!
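For readers skimming the diff, the surrounding pattern is an abstract base class for report sections; a simplified, hedged sketch (class and method names here are illustrative, not evalml's actual internals) of why declaring `make_dataframe` as an abstract method matters: every section maker is then forced to support the new output format or fail loudly at instantiation.

```python
import abc

import pandas as pd


class SectionMaker(abc.ABC):
    """Each report section must know how to render itself in every supported output format."""

    @abc.abstractmethod
    def make_text(self, *args, **kwargs):
        """Makes the report section formatted as text."""

    @abc.abstractmethod
    def make_dict(self, *args, **kwargs):
        """Makes the report section formatted as a dictionary."""

    @abc.abstractmethod
    def make_dataframe(self, *args, **kwargs):
        """Makes the report section formatted as a dataframe."""


class HeadingSection(SectionMaker):
    """Toy concrete section: omit any one make_* method and instantiation raises TypeError."""

    def make_text(self, rank):
        return "Best 1 of 1\n" if rank == 0 else "Worst 1 of 1\n"

    def make_dict(self, rank):
        return {"prefix": "best" if rank == 0 else "worst", "index": rank}

    def make_dataframe(self, rank):
        return pd.DataFrame([self.make_dict(rank)])
```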
for key, value in heading.items():
    if key == "probabilities":
        for class_name, probability in value.items():
            shap_table[f'label_{class_name}_probability'] = probability
Can we replace this `for` with
labels = [f'label_{class_name}_probability' for class_name in value.keys()]
shap_table[labels] = value.values()
I guess that code would rely on `value.values()` being ordered the same way as `value.keys()`, and I don't see this set up as an `OrderedDict`, so maybe what you have is fine :)
Yeah, I think we could add another comprehension to make sure the order is the same, but at that point the main benefit of that implementation over this one is saving an indent, so I'll stick with this one!
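For the record, a hedged sketch of the alternative being discussed, written so it cannot depend on `keys()`/`values()` alignment at all (plain dicts do preserve insertion order on Python 3.7+, but iterating `items()` sidesteps the question entirely). Names mirror the diff above; this is illustrative, not the merged implementation:

```python
import pandas as pd

# Stand-ins for the objects in the diff above: the per-prediction shap table and
# heading["probabilities"], which maps class name -> predicted probability.
shap_table = pd.DataFrame({"feature_names": ["a", "b"], "quantitative_explanation": [0, 1]})
value = {"malignant": 0.9, "benign": 0.1}

# Build every label_<class>_probability column at once; scalar values broadcast to all rows.
# Iterating items() keeps each class name paired with its own probability.
shap_table = shap_table.assign(
    **{f"label_{class_name}_probability": probability for class_name, probability in value.items()}
)
```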
shap_table['rank'] = heading['index']
shap_table['prefix'] = heading['prefix']
report.append(shap_table)
df = pd.concat(report).reset_index(drop=True)
So this code will create one df for each prediction, and then concatenate them all together? How do users know which rows correspond to which prediction?
To resolve: they'll use the `prediction_number` column we talked about above!
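As a usage note, once the report carries the `prediction_number` column agreed on above, pulling out one prediction's rows is straightforward; a hedged sketch assuming the returned dataframe is called `report_df`:

```python
# prediction_number counts 0..n-1 in the order the rows were passed as input_features.
second_prediction = report_df[report_df["prediction_number"] == 1]

# Or iterate over every explained prediction in turn.
for prediction_number, rows in report_df.groupby("prediction_number"):
    print(f"prediction {prediction_number}: {len(rows)} feature rows")
```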
@@ -423,3 +451,25 @@ def make_dict(self, data):
data.input_features)["explanations"]
report.append(section)
return {"explanations": report}

def make_dataframe(self, data):
Is there some way we can have this implementation use `_make_single_prediction_shap_table` to make the tables per-prediction, and then concatenate them together? This would avoid confusion and bugs stemming from current or future differences in the implementations.
I think that's what we're doing! `self.table_maker.make_dataframe(index, data.pipeline, data.input_features)` calls `_make_single_prediction_shap_table` to make the shap table per prediction. Then we add metadata to each table (prediction probabilities, etc.) and concat all the tables at the end.
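Spelled out, the flow described in this exchange is roughly the following; a simplified sketch with a stand-in `make_single_table` argument in place of `_make_single_prediction_shap_table`, not the actual evalml implementation:

```python
import pandas as pd


def make_dataframe_report(pipeline, input_features, make_single_table):
    """Build one shap table per prediction, tag it with metadata, and stack the results."""
    tables = []
    for prediction_number in range(len(input_features)):
        # One explanation table per input row, produced by the shared single-prediction helper.
        table = make_single_table(pipeline, input_features.iloc[[prediction_number]])
        # Metadata column so each row of the stacked report traces back to its prediction.
        table["prediction_number"] = prediction_number
        tables.append(table)
    # Stack vertically and reset so the final report has a clean 0..N-1 index.
    return pd.concat(tables).reset_index(drop=True)
```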
"report = explain_predictions(pipeline=pipeline, input_features=X.iloc[[0, 4, 9]], include_shap_values=True)\n", | ||
"print(report)" | ||
"report = explain_predictions(pipeline=pipeline, input_features=X.iloc[[0, 4, 9]], include_shap_values=True,\n", | ||
" output_format='dataframe')\n", |
@freddyaboulton could you explain how to read the output here?
Each grouping of rows corresponds to a different prediction explanation, yes? And that's indicated by the "rank" column. What confuses me is that "rank" ranges from 0 to n-1, whereas the indices you passed to `input_features` are 0, 4, and 9. I'm concerned someone looking at the dataframe output will have trouble matching those row groups up with the original dataframe row that the info corresponds to.
If I'm following correctly here, I suggest we either find a way to get the original dataframe index into this dataframe report, or we change our language a bit and call this "prediction number" instead of "rank". The latter option seems more appealing.
What do you think?
@dsherry You're spot on about how to read the table based on the grouping! I agree with the rename of `rank` to `prediction_number`. Adding the original dataframe index is possible, but we don't do that in `explain_prediction` for the other output formats.
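If a particular workflow does need that mapping, it can be recovered on the caller's side, since the prediction numbering follows the order of the rows passed in; a hedged sketch reusing the docs example's slice and assuming the report dataframe is called `report_df`:

```python
explained_rows = X.iloc[[0, 4, 9]]  # the rows passed as input_features in the docs example

# prediction_number 0, 1, 2 correspond positionally to original index labels 0, 4, 9.
original_labels = dict(enumerate(explained_rows.index))

report_df["original_index"] = report_df["prediction_number"].map(original_labels)
```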
@@ -448,8 +448,18 @@
"source": [
"from evalml.model_understanding.prediction_explanations import explain_predictions\n",
@freddyaboulton I see the updates below are for the "Explaining Multiple Predictions" section. Should we also update the "Explaining Individual Predictions" section? I see that example still uses `print(table)`.
I covered changing the output format for `explain_prediction` in the "Changing output formats" section!
Pull Request Description
Fixes #1434
After creating the pull request: in order to pass the release_notes_updated check, you will need to update the "Future Release" section of `docs/source/release_notes.rst` to include this pull request by adding :pr:123.