Add Dictionary Output Format for Explain Predictions #1107
Conversation
Codecov Report

```
@@           Coverage Diff            @@
##             main     #1107   +/-  ##
========================================
  Coverage   99.91%    99.91%
========================================
  Files         194       195     +1
  Lines       10959     11126   +167
========================================
+ Hits        10950     11117   +167
  Misses          9         9
========================================
```

Continue to review full report at Codecov.
@freddyaboulton looks good!

Some comments:

- Please update the prediction explanations user guide with an example of how to use this to generate JSON.
- For the JSON schema, perhaps you should rename "explanation" to "explanations", since the value there is a list of explanations, right?
- I left some thoughts on how to reorganize the code so that you don't need to add a separate class for dict vs text. I think I got a bit confused and was suggesting `to_json` methods, but I believe `to_dict` would be more appropriate, given that our top-level methods now return a dict, not a JSON string.
- I left a comment about adding a unit test which runs the method with dict format, writes the dict to JSON, reads it back into dict format, and checks that it equals the expected value. Having this will serve as a helpful reference to devs on what the full schema format is, and will make it easy to change.
- There's a lot of code getting changed. I do wonder if there's a way to delete some code in the implementation(s) here.
- Food for thought: backwards compatibility.
  - Should we include a "schema version" tag somewhere, which we'd increment every time we make a breaking change? I actually don't think we should do this now, but we may consider it.
  - In general, it would be nice if we ended up with a design pattern for serialization/deserialization where deserializing (JSON to dict) always results in conversion to the newest format/version, regardless of how old the stored version was. The way to do it is to add `if`s to the deserializer, ordered by legacy version from oldest to newest; then every time you make a schema change, you add another `if` to get from the previous version to the new one. I believe we're eventually going to need a pattern like this for our components and pipelines.
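To make the "ordered `if`s" idea concrete, here is a minimal hypothetical sketch. The `schema_version` field, the `deserialize` helper, and the specific migrations (renaming `explanation` to `explanations`, adding a `rank` field) are all illustrative assumptions, not evalml's actual schema:

```python
import json

CURRENT_VERSION = 3  # hypothetical current schema version

def deserialize(json_str):
    """Load a report dict and upgrade it one legacy version at a time."""
    d = json.loads(json_str)
    version = d.get("schema_version", 1)  # payloads predating the tag are v1
    # Each `if` upgrades exactly one step. They run in order, oldest first,
    # so a v1 payload flows through every migration and lands on the
    # current version. A new schema change only requires one more `if`.
    if version == 1:
        # v1 -> v2: hypothetical rename of "explanation" to "explanations"
        d["explanations"] = d.pop("explanation")
        version = 2
    if version == 2:
        # v2 -> v3: hypothetical new top-level "rank" field; default it
        d.setdefault("rank", None)
        version = 3
    d["schema_version"] = version
    return d
```

The key property is that old payloads never need their own code path at read time: every consumer sees only the newest shape.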
```python
        value = float(value)
    elif pd.api.types.is_bool(value):
        value = bool(value)
    return value
```
@freddyaboulton could you use the `JSONEncoder` abstraction instead of this? I've found it can be helpful. You basically just need to write something like the example below: if there are any numpy types in the input, you convert them to a list or whatever is appropriate. The advantage is that we'd handle not only booleans/ints/floats but anything else that comes up.
https://pynative.com/python-serialize-numpy-ndarray-into-json/
Yes, you could do something like this:

```python
from datetime import date, datetime

import numpy as np
import pandas as pd
import simplejson as json

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        elif isinstance(obj, np.bool_):
            return bool(obj)
        elif isinstance(obj, (datetime, date)):
            return obj.isoformat()
        elif isinstance(obj, pd.Interval):
            return str(obj)
        elif isinstance(obj, type) and issubclass(obj, PipelineBase):
            # Guard with isinstance(obj, type): issubclass raises a
            # TypeError when given an instance rather than a class.
            return obj.name
        elif hasattr(obj, "name") and callable(obj.name):
            return obj.name
        else:
            return super(CustomEncoder, self).default(obj)

with open(output_filepath, "w") as fp:
    json.dump(d, fp, allow_nan=True, ignore_nan=True, cls=CustomEncoder)
```
- This can be extended as more custom classes are added (see `PipelineBase`).
- In addition, I would recommend using `simplejson` instead of `json`. The default JSON library will not convert NaN to null (NaN is not JSON compliant).
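A small stdlib-only illustration of the NaN problem described above; the `scrub_nan` helper is a hypothetical workaround, not an evalml function:

```python
import json
import math

# The standard library emits the literal NaN, which strict JSON
# parsers reject, because NaN is not part of the JSON spec:
serialized = json.dumps({"shap_value": float("nan")})

# simplejson's ignore_nan=True would emit null instead. With the
# stdlib alone you have to scrub NaNs yourself, e.g.:
def scrub_nan(value):
    return None if isinstance(value, float) and math.isnan(value) else value

clean = json.dumps({"shap_value": scrub_nan(float("nan"))})
```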
Good suggestion! I think this is out of scope for this PR though because we want to return a dictionary rather than a json string to the user.
```python
                       convert_numeric_to_string=False)
json_rows = _rows_to_dict(rows)
json_rows["class_name"] = _make_json_serializable(self.class_names[1])
return {"explanation": [json_rows]}
```
@freddyaboulton why not do

```python
return jsonify({
    "prediction_explanation": {
        "class_name": self.class_names[1],
        "rows": _make_rows(...)
    }
})
```

Or I guess you'd have to replace `jsonify` with the `JSONEncoder` abstraction I mentioned elsewhere.
Also, for the schema, I suggest that you use dicts rather than lists, except for the rows themselves of course, which I'm assuming are a list of stuff.
Thanks for the tip about the JSON encoder! But just to be sure: we're sticking with the current API and adding a test to make sure our dict reports are JSON serializable, right?

Regarding the schema, I am using a list instead of

```python
"prediction_explanation": {
    "class_name": self.class_names[1],
    "rows": _make_rows(...)
}
```

because I want to keep the output format the same between regression, binary, and multiclass. For multiclass, we need to explain multiple predictions, so I'm using a list of one element for regression and binary.
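To illustrate the uniform-list idea, here are two minimal hypothetical payloads; the field names and values are made up for illustration and are not the PR's exact schema:

```python
# Regression and binary reports carry a single-element list, while a
# multiclass report carries one entry per class, so consumers can
# iterate over "explanations" without branching on problem type.
regression_report = {
    "explanations": [
        {"class_name": None, "feature_names": ["age"]}
    ]
}
multiclass_report = {
    "explanations": [
        {"class_name": "setosa", "feature_names": ["petal width"]},
        {"class_name": "versicolor", "feature_names": ["petal width"]},
        {"class_name": "virginica", "feature_names": ["petal width"]},
    ]
}

def n_explanations(report):
    # Identical access pattern for every problem type.
    return len(report["explanations"])
```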
Yeah I think so. I wrote this when I thought we'd want to return a JSON string, but it makes more sense to return a dict, and then convert to JSON string elsewhere.
```python
("text", ProblemTypes.MULTICLASS): _TextMultiClassSHAPTable(class_names),
("dict", ProblemTypes.REGRESSION): _DictRegressionSHAPTable(),
("dict", ProblemTypes.BINARY): _DictBinarySHAPTable(class_names),
("dict", ProblemTypes.MULTICLASS): _DictMultiClassSHAPTable(class_names)}
```
I'm guessing this is quick to run. But if not you could wrap each of the values here in a future to avoid evaluating on this line.
Hmm, you know what, I think that comment counts as premature optimization 😂 can ignore
```python
def __call__(self, rank):
    prefix = self.prefixes[(rank // self.n_indices)]
    rank = rank % self.n_indices
    return {"prefix": prefix, "index": rank + 1}
```
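For reference, the `__call__` above maps a flat rank to a section prefix and a 1-based index within that section. A standalone re-creation of the same logic, with hypothetical prefixes (the `_RankMapper` name and values are illustrative, not from the PR):

```python
class _RankMapper:
    def __init__(self, prefixes, n_indices):
        self.prefixes = prefixes
        self.n_indices = n_indices

    def __call__(self, rank):
        # Integer division selects the section; the remainder gives the
        # 1-based position inside that section.
        prefix = self.prefixes[rank // self.n_indices]
        return {"prefix": prefix, "index": rank % self.n_indices + 1}

# With 2 indices per section, ranks 0-1 fall under the first prefix
# and ranks 2-3 under the second.
mapper = _RankMapper(prefixes=["Best ", "Worst "], n_indices=2)
```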
@freddyaboulton I don't know how all of these classes fit together. But I do wonder if they should follow a different pattern:

- `__init__`: takes all parameters needed to use the class. Here I think that would include `rank`?
- `__str__`: convert to string
- `to_json`: convert to JSON

Then your code which renders all this can call `str(inst)` on the instances, and your code which computes JSON can call `inst.to_json()` on each.

If this is a bad idea for some reason I'm not seeing, or requires significant refactoring, no worries.
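A minimal sketch of that pattern, using `to_dict` (which the thread later settles on) rather than `to_json`; the `_Heading` class, its fields, and the text layout are hypothetical stand-ins, not the PR's actual section makers:

```python
class _Heading:
    def __init__(self, prefix, index):
        # Everything the section needs is supplied at construction time.
        self.prefix = prefix
        self.index = index

    def __str__(self):
        # Text rendering path.
        return f"\t{self.prefix}{self.index}\n\n"

    def to_dict(self):
        # Dict rendering path; same data, different shape.
        return {"prefix": self.prefix, "index": self.index}

heading = _Heading("Best ", 1)
text_output = str(heading)
dict_output = heading.to_dict()
```

With this shape, the report assembler holds one list of section objects and picks the rendering method, instead of holding format-specific section classes.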
```python
    table = table.splitlines()
    # Indent the rows of the table to match the indentation of the entire report.
    return ["\t\t" + line + "\n" for line in table] + ["\n\n"]


class _DictSHAPTable(_SectionMaker):
```
RE my previous comment, I don't understand why you need separate "Text" vs "Dict" objects. Why not define `__str__` and `to_json` methods on each object, add the relevant code for each, and call it a day?
Good question and something I thought about when I was sketching out the implementation!
I thought the main benefit of having separate classes for each output format is that we can ensure all the `_SectionMaker`s and `_ReportMaker`s have the same API. We can delegate the logic for which class to create for which output to the `_report_creator_factory`, and this way `explain_predictions` and `explain_predictions_best_worst` look the same regardless of the output format:

```python
report_maker = _report_creator_factory(...)
return report_maker.make_report(...)
```

If we handle different output formats as different methods in the same class, I think the code would look something like this:

```python
class _UniversalReportMaker:
    """Make a prediction explanation report, formatted as text or as a dict."""

    def __init__(self, heading_maker, predicted_values_maker, table_maker):
        self.heading_maker = heading_maker
        self.make_predicted_values_maker = predicted_values_maker
        self.table_maker = table_maker

    def make_text_report(self, data):
        """Make a report and format it as text."""

    def make_dict_report(self, data):
        """Make a report and format it as a dict."""
```

and `explain_predictions` and `explain_predictions_best_worst` would look like this:

```python
report = _report_maker_factory(...)
if output_format == "text":
    report.make_text_report()
elif output_format == "dict":
    report.make_dict_report()
```

I guess the main difference is that my implementation minimizes the number of places where we need to check the output format (only in `_report_creator_factory`) at the expense of more classes, while your suggestion minimizes the number of classes at the expense of having to check the output format in more places.

I thought only having to check the output format in one place is better than having to add a check wherever we may need to create a report, but happy to go either way!
Ah, got it. So you're treating this text vs dict as a config option which is provided to the objects which do the assembly. That makes sense.
But with that strategy, somewhere you'll have to have code which joins the text together, vs for dicts it adds the results to another dict or something. If you're gonna add that code, how much more painful is it to define two separate paths to generate the results as text vs as dict? Because you're already halfway there.
I don't feel strongly about this, just a suggestion.
Good point - we're basically halfway to your suggestion! I think we can get the best of both proposals with the following:

1. Consolidate all `_SectionMaker` and `_ReportMaker` classes into one class with `make_text` and `make_dict` methods.
2. Modify the `_report_maker_factory` function to return the right method for the output format instead of the right class for the output format. Example:

   ```python
   return _UniversalReportMaker(...).make_text
   ```

   instead of

   ```python
   return _UniversalReportMaker(...)
   ```

3. Change `explain_predictions_best_worst` and `explain_predictions` like so:

   ```python
   report_maker = _report_maker_factory(...)
   return report_maker(data)
   ```

   instead of

   ```python
   report_maker = _report_maker_factory(...)
   return report_maker.make_report(data)
   ```

This way we'll condense the number of classes and only check the output format in one place!
```python
                         data.y_true, data.errors)
section["explanation"] = self.table_maker(index, data.pipeline, data.input_features)["explanation"]
report.append(section)
return {"explanations": report}
```
Same question here as for the others. Why define a separate class `DictReportMaker` for this? Why not have one class which can convert to str or to JSON?

An advantage of using `to_json` methods is that it makes it hard to miss what the JSON schema is and how to change it. That'd make it easy for us to update in the future.
```python
    """
    return _make_single_prediction_shap_table(pipeline, input_features, top_k, training_data, include_shap_values)
if output_format not in {"text", "dict"}:
```
Should we do "str" and "json" instead? Ah, I haven't read to the end of your PR yet--I'm betting "dict" makes sense since you compute the dict first and then do the conversion to JSON in one line... will circle back if necessary
Hmm, nope, I don't see that. But I think that could be a nice alternative.

Previously I had suggested adding `to_json` methods to your lower-level classes. Now I'm thinking those should be `to_dict`, and then you can have one place where you do something like:

```python
class PandasNumpyEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # I may not have the exact type checks right, but you get the idea
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        ...
        return JSONEncoder.default(self, obj)

report_dict = inst.to_dict()
encoded_data = json.dumps(report_dict, cls=PandasNumpyEncoder)
return encoded_data
```

I guess it depends where we need to do this final conversion.
Ok, sorry for the verbosity here... I like the API you have in place, where you can get either string or dict output.
Do we have a unit test which writes this stuff to JSON, reads it back in from string to dict and verifies the contents match what's expected? We should add that. Why: to ensure that all the pandas/numpy types are handled, and so we have some unit tests which write out the full expected JSON schema (in one line), which will make this easier to understand and maintain.
We should also add example usage to the prediction explanation user guide. Perhaps your addition of that will clarify my confusion here about how this would be used.
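The roundtrip test suggested above could look something like this sketch; the report contents are made-up illustrative values, not the PR's actual schema:

```python
import json

# A report dict as the dict output format might produce it. In the real
# test this would come from explain_predictions_best_worst(...).
report = {
    "explanations": [
        {"rank": {"prefix": "Best ", "index": 1},
         "explanation": [{"feature_names": ["age"], "class_name": None}]}
    ]
}

# Dump to a JSON string and load it back; the comparison only passes if
# every value in the report is a JSON-native type (no stray numpy or
# pandas scalars), which is exactly what the test is meant to catch.
roundtripped = json.loads(json.dumps(report))
assert roundtripped == report
```

Writing the full expected dict in the test also documents the schema in one place, as the comment above suggests.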
Good suggestion for the test to write stuff to json and read it back! And I'll add an example to the user guide as well!
Nice, the output looks great! I agree with @dsherry that there were a few places where I got confused between the string and dict classes, and I do wonder if a `to_json` method would clear up some of that confusion. Otherwise, this is some great stuff. I imagine we'll want something similar for pipeline serialization soon :D
```diff
@@ -19,6 +20,7 @@ def _make_rows(shap_values, normalized_values, pipeline_features, top_k, include
     normalized_values (dict): Normalized SHAP values. Same structure as shap_values parameter.
     top_k (int): How many of the highest/lowest features to include in the table.
     include_shap_values (bool): Whether to include the SHAP values in their own column.
+    convert_numeric_to_string (bool): Whether numeric values should be converted to strings for numeric
```
should be converted to strings for numeric...? :o
Good catch! Fixed.
@dsherry I've made the following changes:
Let me know if you want to see further changes!
```diff
@@ -100,6 +101,9 @@ def calculate_shap_for_test(training_data, y, pipeline_class, n_points_to_explai
                          product(interpretable_estimators, all_problems, all_n_points_to_explain))
 def test_shap(estimator, problem_type, n_points_to_explain, X_y_binary, X_y_multi, X_y_regression):

+    if estimator == LightGBMClassifier:
```
Had to add this check because I couldn't fit a lightgbm classifier with latest changes pushed to master (how lightgbm handles categoricals). This check is only temporary since lightgbm classifier will be cleaned up soon.
Nvm, I think we can get rid of this check. I ran into this locally because I had an older version of pandas. Noted in #1114
```json
"import json\n",
"report = explain_predictions_best_worst(pipeline=pipeline, input_features=X, y_true=y,\n",
"                                        num_to_explain=1, include_shap_values=True, output_format=\"dict\")\n",
"print(json.dumps(report, indent=2))"
```
I don't think printing out the report adds too much length to the page, but let me know if you'd rather I not print the dictionary report.
Pull Request Description

Fixes #1036 by adding an `output_format` parameter to `explain_prediction`, `explain_predictions`, and `explain_predictions_best_worst` that can take the values `text` or `dict`.

Also refactors the prediction explanations module a bit:

- Adds `Text` or `Dict` to the class names of the section/report creators
- Adds a `_report_creator_factory` function to instantiate the right report creator class given the output format and problem data
- Renames `_make_table` to `_make_text_table`

Docs changes here

I'm providing sample output for one row of the entire report for the `explain_predictions_best_worst` function. The `explain_predictions` output would look the same except that the `rank` and `predicted_values` fields would be empty.

Example Regression Output (sample of entire report)

Example Binary Output (sample of entire report)

Example Multiclass Output (sample of entire report)

After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of `docs/source/release_notes.rst` to include this pull request by adding ``:pr:`123``.