
Add Dictionary Output Format for Explain Predictions #1107

Merged
merged 28 commits into main from 1036-json-output-for-prediction-explanations on Aug 28, 2020

Conversation

Contributor

@freddyaboulton freddyaboulton commented Aug 26, 2020

Pull Request Description

Fixes #1036 by adding an output_format parameter to explain_prediction, explain_predictions, and
explain_predictions_best_worst that can take the values "text" or "dict".

Also refactors the prediction explanations module a bit:

  • Adds Text or Dict to the class names of the section/report creators
  • Introduces a _report_creator_factory function to instantiate the right report creator class given the output format and problem data.
  • Renames _make_table to _make_text_table

Docs changes here

I'm providing sample output for one row of the entire report for the explain_predictions_best_worst function. The explain_predictions output would look the same except that the rank and predicted_values fields would be empty.
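For reference, a minimal usage sketch mirroring the docs example added in this PR (pipeline, X, and y are assumed to be a fitted pipeline and its data; import path inferred from the module layout):

import json

from evalml.model_understanding.prediction_explanations import explain_predictions_best_worst

# output_format="dict" returns a plain dictionary instead of the default text report.
report = explain_predictions_best_worst(pipeline=pipeline, input_features=X, y_true=y,
                                        num_to_explain=1, include_shap_values=True,
                                        output_format="dict")
print(json.dumps(report, indent=2))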

Example Regression Output (sample of entire report)

   {
      "rank": {
        "prefix": "best",
        "index": 1
      },
      "predicted_values": {
        "probabilities": null,
        "predicted_value": 14.519,
        "target_value": 14.5,
        "error_name": "Absolute Difference",
        "error_value": 0.019
      },
      "explanation": [
        {
          "feature_names": [
            "NOX",
            "ZN",
            "AGE",
            "PTRATIO",
            "TAX",
            "LSTAT"
          ],
          "feature_values": [
            0.584,
            0.0,
            86.1,
            20.2,
            666.0,
            17.64
          ],
          "qualitative_explanation": [
            "+",
            "-",
            "-",
            "-",
            "-",
            "---"
          ],
          "quantitative_explanation": [
            0.7127328882949426,
            -0.01669613357334446,
            -0.07604936424119804,
            -0.9936399945421127,
            -1.1764128703290715,
            -3.8682602903151064
          ],
          "class_name": null
        }
      ]
   }

Example Binary Output (sample of entire report)

   {
      "rank": {
        "prefix": "best",
        "index": 1
      },
      "predicted_values": {
        "probabilities": {
          "benign": 0.0,
          "malignant": 1.0
        },
        "predicted_value": "malignant",
        "target_value": "malignant",
        "error_name": "Cross Entropy",
        "error_value": 0.00031254883829995967
      },
      "explanation": [
        {
          "feature_names": [
            "worst concave points",
            "worst perimeter",
            "worst radius",
            "smoothness error",
            "mean fractal dimension",
            "symmetry error"
          ],
          "feature_values": [
            0.265,
            184.6,
            25.74,
            0.006522,
            0.07016,
            0.02324
          ],
          "qualitative_explanation": [
            "+",
            "+",
            "+",
            "+",
            "+",
            "-"
          ],
          "quantitative_explanation": [
            0.08693785207503062,
            0.059386962447437616,
            0.0579943788906821,
            0.0003669556779688147,
            0.000273189782787694,
            -0.0004114281592120522
          ],
          "class_name": "malignant"
        }
      ]
   }

Example Multiclass Output (sample of entire report)

   {
      "rank": {
        "prefix": "worst",
        "index": 5
      },
      "predicted_values": {
        "probabilities": {
          "setosa": 0.0,
          "versicolor": 0.466,
          "virginica": 0.534
        },
        "predicted_value": "virginica",
        "target_value": "virginica",
        "error_name": "Cross Entropy",
        "error_value": 0.6278103406544334
      },
      "explanation": [
        {
          "feature_names": [
            "sepal length (cm)",
            "sepal width (cm)",
            "petal width (cm)",
            "petal length (cm)"
          ],
          "feature_values": [
            6.1,
            2.6,
            1.4,
            5.6
          ],
          "qualitative_explanation": [
            "-",
            "-",
            "--",
            "---"
          ],
          "quantitative_explanation": [
            -0.020997624703161553,
            -0.028462009883359413,
            -0.12072158775665602,
            -0.15481877765682317
          ],
          "class_name": "setosa"
        },
        {
          "feature_names": [
            "petal width (cm)",
            "sepal width (cm)",
            "sepal length (cm)",
            "petal length (cm)"
          ],
          "feature_values": [
            1.4,
            2.6,
            6.1,
            5.6
          ],
          "qualitative_explanation": [
            "+++",
            "+",
            "+",
            "--"
          ],
          "quantitative_explanation": [
            0.2898933751710956,
            0.03141550010385456,
            0.0009566071268076427,
            -0.16435808907332083
          ],
          "class_name": "versicolor"
        },
        {
          "feature_names": [
            "petal length (cm)",
            "sepal length (cm)",
            "sepal width (cm)",
            "petal width (cm)"
          ],
          "feature_values": [
            5.6,
            6.1,
            2.6,
            1.4
          ],
          "qualitative_explanation": [
            "++++",
            "+",
            "-",
            "--"
          ],
          "quantitative_explanation": [
            0.3191768667301441,
            0.020041017576353896,
            -0.0029534902204951447,
            -0.1691717874144393
          ],
          "class_name": "virginica"
        }
      ]
   }


@codecov

codecov bot commented Aug 26, 2020

Codecov Report

Merging #1107 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff            @@
##             main    #1107    +/-   ##
========================================
  Coverage   99.91%   99.91%            
========================================
  Files         194      195     +1     
  Lines       10959    11126   +167     
========================================
+ Hits        10950    11117   +167     
  Misses          9        9            
Impacted Files Coverage Δ
...prediction_explanations/_report_creator_factory.py 100.00% <100.00%> (ø)
...tanding/prediction_explanations/_user_interface.py 100.00% <100.00%> (ø)
...nderstanding/prediction_explanations/explainers.py 100.00% <100.00%> (ø)
...s/prediction_explanations_tests/test_explainers.py 100.00% <100.00%> (ø)
...ediction_explanations_tests/test_user_interface.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f85f95...763672c.

@freddyaboulton freddyaboulton marked this pull request as ready for review August 26, 2020 18:56
Contributor

@dsherry dsherry left a comment


@freddyaboulton looks good!

Some comments:

  • Please update the prediction explanations user guide with an example of how to use this to generate JSON.
  • For the JSON schema, perhaps you should call "explanation" "explanations", since the value there is a list of explanations, right?
  • I left some thoughts on how to reorganize the code so that you don't need to add a separate class for dict vs text. I think I got a bit confused and was suggesting to_json methods--I believe to_dict would be more appropriate, given that our top-level methods are now returning dict, not a JSON string.
  • I left a comment about adding a unit test which runs the method with dict format, writes the dict to JSON, reads it back into dict format, and checks that it equals the expected value. I think having this will serve as a helpful reference to devs on what the full schema format is, and will make it easy to change.
  • There's a lot of code getting changed. I do wonder if there's a way to delete some code in the implementation(s) here.
  • Food for thought: backwards compatibility.
    • Should we include a "schema version" tag somewhere, which we'd increment every time we make a breaking change? I actually don't think we should do this now, but we may consider it
    • In general, it would be nice if we ended up with a design pattern for serialization/deserialization where deserializing (JSON to dict) always converts to the newest format/version, regardless of how old the serialized version is. The way to do it is to order the deserialization ifs by legacy version from oldest to newest; every time you make a schema change, you add another if that gets from the previous version to the new one (see the sketch after this list).
    • I believe we're eventually going to need a pattern like this for our components and pipelines.
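A minimal sketch of that migration pattern (entirely hypothetical: the schema_version key and deserialize_report function are not part of this PR; the v1 -> v2 step mirrors this PR's "explanation" -> "explanations" rename):

LATEST_SCHEMA_VERSION = 2


def deserialize_report(data):
    """Upgrade a legacy report dict to the latest schema, one version at a time."""
    data = dict(data)
    version = data.get("schema_version", 1)
    if version == 1:
        # v1 -> v2: the "explanation" key was renamed to "explanations".
        data["explanations"] = data.pop("explanation")
        version = 2
    # Each future breaking change appends another `if version == N:` block here.
    data["schema_version"] = LATEST_SCHEMA_VERSION
    return data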

        value = float(value)
    elif pd.api.types.is_bool(value):
        value = bool(value)
    return value
Contributor

@freddyaboulton could you use the JSONEncoder abstraction instead of this? I've found it can be helpful. You basically just need to write something like the example linked below: if there are any np types in the input, you convert them to a list or whatever. The advantage would be that we'd handle not only booleans/ints/floats but anything else that comes up.

https://pynative.com/python-serialize-numpy-ndarray-into-json/

Contributor

@gsheni gsheni Aug 27, 2020


Yes, you could do something like this:

from datetime import date, datetime

import numpy as np
import pandas as pd
import simplejson as json

from evalml.pipelines import PipelineBase


class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        elif isinstance(obj, np.bool_):
            return bool(obj)
        elif isinstance(obj, (datetime, date)):
            return obj.isoformat()
        elif isinstance(obj, pd.Interval):
            return str(obj)
        elif isinstance(obj, PipelineBase):
            return obj.name
        elif hasattr(obj, "name") and callable(obj.name):
            return obj.name
        else:
            return super().default(obj)


with open(output_filepath, "w") as fp:
    json.dump(d, fp, ignore_nan=True, cls=CustomEncoder)
  • This can be extended as more custom classes are added (see PipelineBase).
  • In addition, I would recommend using simplejson instead of json. The default json library will not convert NaN to null (NaN is not JSON compliant); the example below demonstrates the difference.
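To illustrate the NaN point, a self-contained comparison:

import json

import simplejson

data = {"value": float("nan")}
print(json.dumps(data))                         # {"value": NaN} -- not valid JSON
print(simplejson.dumps(data, ignore_nan=True))  # {"value": null}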

Contributor Author


Good suggestion! I think this is out of scope for this PR though because we want to return a dictionary rather than a json string to the user.

convert_numeric_to_string=False)
json_rows = _rows_to_dict(rows)
json_rows["class_name"] = _make_json_serializable(self.class_names[1])
return {"explanation": [json_rows]}
Contributor

@freddyaboulton why not do

return jsonify({
    "prediction_explanation": {
        "class_name": self.class_names[1],
        "rows": _make_rows(...)
    }
})

Or I guess you'd have to replace jsonify with the JSONEncoder abstraction I mentioned elsewhere.

Contributor

Also, for the schema, I suggest that you use dicts rather than lists, except for the rows themselves, of course, which I'm assuming are a list of stuff.

Contributor Author

Thanks for the tip about the json encoder! But just to be sure: we're sticking with the current API and adding a test to make sure our dict reports are json serializable, right?

Regarding the schema, I am using a list instead of

"prediction_explanation": {
        "class_name": self.class_names[1],
        "rows": _make_rows(...)
    })

because I want to keep the output format the same between regression, binary, and multiclass. For multiclass, we need to explain multiple predictions, so I'm using a list of one element for regression and binary.

Contributor

Yeah I think so. I wrote this when I thought we'd want to return a JSON string, but it makes more sense to return a dict, and then convert to JSON string elsewhere.

("text", ProblemTypes.MULTICLASS): _TextMultiClassSHAPTable(class_names),
("dict", ProblemTypes.REGRESSION): _DictRegressionSHAPTable(),
("dict", ProblemTypes.BINARY): _DictBinarySHAPTable(class_names),
("dict", ProblemTypes.MULTICLASS): _DictMultiClassSHAPTable(class_names)}
Contributor

I'm guessing this is quick to run. But if not you could wrap each of the values here in a future to avoid evaluating on this line.

Contributor

Hmm, you know what, I think that comment counts as premature optimization 😂 can ignore

def __call__(self, rank):
    prefix = self.prefixes[(rank // self.n_indices)]
    rank = rank % self.n_indices
    return {"prefix": prefix, "index": rank + 1}
Contributor

@freddyaboulton I don't know how all of these classes fit together. But I do wonder if they should follow a different pattern:

  • __init__: takes all parameters needed to use the class. Here I think that would include rank?
  • __str__: convert to string
  • to_json: convert to JSON

Then your code which renders all this can call str(inst) on the instances, and your code which computes JSON can call inst.to_json() on each.

If this is a bad idea for some reason I'm not seeing, or requires significant refactoring, no worries.

table = table.splitlines()
# Indent the rows of the table to match the indentation of the entire report.
return ["\t\t" + line + "\n" for line in table] + ["\n\n"]


class _DictSHAPTable(_SectionMaker):
Contributor

RE my previous comment, I don't understand why you need separate "Text" vs "Dict" objects. Why not define __str__ and to_json methods on each object, add the relevant code for each, and call it a day?

Contributor Author

Good question and something I thought about when I was sketching out the implementation!

I thought the main benefit of having separate classes for each output format is that we can ensure all the _SectionMakers and _ReportMakers have the same api. We can delegate the logic for which class to create for which output to the _report_creator_factory and this way explain_predictions and explain_predictions_best_worst look the same regardless of the output format:

report_maker = _report_creator_factory(...)
return report_maker.make_report(...)

If we handle different output formats as different methods in the same class, I think the code would look something like this:

class _UniversalReportMaker:
    """Make a prediction explanation report that is formatted as text."""

    def __init__(self, heading_maker, predicted_values_maker, table_maker):
        self.heading_maker = heading_maker
        self.predicted_values_maker = predicted_values_maker
        self.table_maker = table_maker

    def make_text_report(self, data):
        """Make a report and format as text.""" 

    def make_dict_report(self, data):
        """Make a report and format as a dict. """

and explain_predictions and explain_predictions_best_worst would look like this:

report = _report_maker_factory(..)
if output_format == "text":
    report.make_text_report()
elif output_format == "dict":
    report.make_dict_report()

I guess the main difference is that my implementation minimizes the number of places where we need to check the output format (only in _report_creator_factory) at the expense of more classes while your suggestion minimizes the number of classes at the expense of having to check the output format in more places.

I thought only having to check the output format in one place is better than having to add a check for the output format wherever we may need to create a report but happy to go either way!

Contributor

@dsherry dsherry Aug 27, 2020


Ah, got it. So you're treating this text vs dict as a config option which is provided to the objects which do the assembly. That makes sense.

But with that strategy, somewhere you'll have to have code which joins the text together, vs for dicts it adds the results to another dict or something. If you're gonna add that code, how much more painful is it to define two separate paths to generate the results as text vs as dict? Because you're already halfway there.

I don't feel strongly about this, just a suggestion.

Contributor Author

Good point - we're basically half-way to your suggestion! I think we can get the best of both proposals with the following:

  1. Consolidate all _SectionMaker and _ReportMaker classes into one class with make_text and make_dict methods.
  2. Modify the _report_maker_factory function to return the right method for the output format instead of the right class for the output format. Example:
return _UniversalReportMaker(...).make_text

instead of

return _UniversalReportMaker(...)
  3. Change explain_predictions_best_worst and explain_predictions like so:
report_maker = _report_maker_factory(...)
return report_maker(data)

instead of

report_maker = _report_maker_factory(...)
return report_maker.make_report(data)

This way we'll condense the number of classes and only check the output format in one place!
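Putting those pieces together, a hedged sketch of the consolidated factory (illustrative only; it reuses the _UniversalReportMaker names from the sketch above, not the merged code):

def _report_maker_factory(output_format, heading_maker, predicted_values_maker, table_maker):
    """Return the bound report-making method for the requested output format."""
    maker = _UniversalReportMaker(heading_maker, predicted_values_maker, table_maker)
    return maker.make_text_report if output_format == "text" else maker.make_dict_report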

data.y_true, data.errors)
section["explanation"] = self.table_maker(index, data.pipeline, data.input_features)["explanation"]
report.append(section)
return {"explanations": report}
Contributor

Same question here for the others. Why define a separate class DictReportMaker for this? Why not have one class which can convert to str or to json?

An advantage of using to_json methods is that it makes it hard to miss what the JSON schema is and how to change it. That'd make it easy for us to update in the future.

"""
return _make_single_prediction_shap_table(pipeline, input_features, top_k, training_data, include_shap_values)
if output_format not in {"text", "dict"}:
Contributor

Should we do "str" and "json" instead? Ah, I haven't read to the end of your PR yet--I'm betting "dict" makes sense since you compute the dict first and then do the conversion to JSON in one line... will circle back if necessary

Contributor

Hmm, nope, I don't see that. But I think that could be a nice alternative.

Previously I had suggested adding "to_json" methods to your lower-level classes. Now I'm thinking those should be "to_dict", and then you can have one place where you do something like

import json

import numpy as np


class PandasNumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        # ...and so on for the other pandas/numpy types -- you get the idea
        return json.JSONEncoder.default(self, obj)

report_dict = inst.to_dict()
encoded_data = json.dumps(report_dict, cls=PandasNumpyEncoder)
return encoded_data

I guess it depends where we need to do this final conversion

Contributor

Ok, sorry for the verbosity here... I like the API you have in place, where you can get either string or dict output.

Do we have a unit test which writes this stuff to JSON, reads it back in from string to dict and verifies the contents match what's expected? We should add that. Why: to ensure that all the pandas/numpy types are handled, and so we have some unit tests which write out the full expected JSON schema (in one line), which will make this easier to understand and maintain.
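A rough sketch of such a round-trip test (the test name and fixtures are illustrative, not the PR's actual test code; import path inferred from the module layout):

import json

from evalml.model_understanding.prediction_explanations import explain_predictions_best_worst


def test_report_survives_json_round_trip(pipeline, X, y):
    # Hypothetical fixtures: a fitted pipeline plus the data to explain.
    report = explain_predictions_best_worst(pipeline=pipeline, input_features=X, y_true=y,
                                            num_to_explain=1, output_format="dict")
    # json.dumps fails if numpy/pandas types leak through, so this also checks serializability.
    round_tripped = json.loads(json.dumps(report))
    assert round_tripped == report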

We should also add example usage to the prediction explanation user guide. Perhaps your addition of that will clarify my confusion here about how this would be used.

Contributor Author

Good suggestion for the test to write stuff to json and read it back! And I'll add an example to the user guide as well!

Contributor

@angela97lin angela97lin left a comment


Nice, the output looks great! I agree with @dsherry that there were a few places I got confused between the string and dict classes and do wonder if a to_json method would clear up some of that confusion. Otherwise, this is some great stuff. I imagine we'll want something similar for pipeline serialization soon :D

@@ -19,6 +20,7 @@ def _make_rows(shap_values, normalized_values, pipeline_features, top_k, include
normalized_values (dict): Normalized SHAP values. Same structure as shap_values parameter.
top_k (int): How many of the highest/lowest features to include in the table.
include_shap_values (bool): Whether to include the SHAP values in their own column.
convert_numeric_to_string (bool): Whether numeric values should be converted to strings for numeric
Contributor

should be converted to strings for numeric...? :o

Contributor Author

Good catch! Fixed.

@freddyaboulton
Contributor Author

@dsherry I've made the following changes:

  • Added a section to the user guide on how to get dictionary output here
  • Consolidated all the differences between the Text and Dict versions of _SectionMakers and _ReportMakers
  • Renamed explanation to explanations in the dictionary output
  • Added a test to check if we can write our dictionaries to json and read them back.

Let me know if you want to see further changes!

@@ -100,6 +101,9 @@ def calculate_shap_for_test(training_data, y, pipeline_class, n_points_to_explai
product(interpretable_estimators, all_problems, all_n_points_to_explain))
def test_shap(estimator, problem_type, n_points_to_explain, X_y_binary, X_y_multi, X_y_regression):

if estimator == LightGBMClassifier:
Contributor Author

Had to add this check because I couldn't fit a lightgbm classifier with the latest changes pushed to master (a change in how lightgbm handles categoricals). This check is only temporary since the lightgbm classifier will be cleaned up soon.

Contributor Author

Nvm, I think we can get rid of this check. I ran into this locally because I had an older version of pandas. Noted in #1114

"import json\n",
"report = explain_predictions_best_worst(pipeline=pipeline, input_features=X, y_true=y,\n",
" num_to_explain=1, include_shap_values=True, output_format=\"dict\")\n",
"print(json.dumps(report, indent=2))"
Contributor Author

I don't think printing out the report adds too much length to the page but let me know if you want me to not print the dictionary report.

@freddyaboulton freddyaboulton merged commit f16ea6a into main Aug 28, 2020
@freddyaboulton freddyaboulton deleted the 1036-json-output-for-prediction-explanations branch October 22, 2020 18:28