Added support for Individual Conditional Expectation plots to graph_partial_dependencies() #2386
christopherbunn merged 13 commits into main
Codecov Report
@@ Coverage Diff @@
## main #2386 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 283 283
Lines 25316 25478 +162
=======================================
+ Hits 25216 25378 +162
Misses 100 100
christopherbunn left a comment:
Some implementation details to consider when reviewing:
      def partial_dependence(
    -     pipeline, X, features, percentiles=(0.05, 0.95), grid_resolution=100
    +     pipeline, X, features, percentiles=(0.05, 0.95), grid_resolution=100, kind="average"
We could potentially make kind="both" the default, but it would throw an error when doing a two-way PD. If we changed that behavior to a silent fallback or a warning, I would be okay with making it the default.
          y=part_dep["partial_dependence"],
          name="Partial Dependence",
    -     line=dict(width=3),
    +     line=dict(width=3, color="rgb(99,110,250)"),
The RGB value corresponds to the default blue, which I extracted with the default macOS color picker. I don't think there's an alias for this exact color in plotly, but I could be missing it in the docs.
      with pytest.warns(None) as graph_valid:
          graph_partial_dependence(clf, X, features=0, grid_resolution=20)
    - assert len(graph_valid) == 1
    + assert len(graph_valid) == 0
Is there a particular reason that graph_partial_dependence was set to 1 here while the other graphing functions were 0?
I think we should move the grid_resolution back to 5. That was part of @freddyaboulton 's unit test speed up MR! But sorry, don't know the answer to that question.
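One way to investigate the open question above is to record which warning was actually raised, not just how many. The following is a stdlib-only sketch; `collect_warnings` and `noisy_graph` are hypothetical names used for illustration, and `noisy_graph` merely stands in for a graphing call.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, [(category name, message), ...]) for every warning raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func(*args, **kwargs)
    return result, [(w.category.__name__, str(w.message)) for w in caught]


# Hypothetical stand-in for a graphing call that emits one warning.
def noisy_graph():
    warnings.warn("example warning", UserWarning)
    return "figure"


result, seen = collect_warnings(noisy_graph)
# seen == [("UserWarning", "example warning")]
```

Inspecting the recorded category and message would show whether the single warning came from the function under test or from a dependency.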
    _go.Scatter(
        x=x,
        y=ice_data[sample],
        line=dict(width=0.5, color="gray"),
There's some leeway to adjust this but from tweaking it a bit this is the best color and thickness combo for the ICE plot lines.
    if kind == "individual" or kind == "both":
        raise ValueError(
            "Individual conditional expectation plot can only be created with a one-way partial dependence plot"
        )
Currently, I'm leaving out the ICE plot when plotting a two-way PD. I think the current behavior of raising an error is the best move, although I've considered putting in a warning message in the log. Open to suggestions for this behavior 🙂
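The two options discussed here (raise an error, or warn and fall back) can be compared side by side in a small sketch. `resolve_kind` and its `strict` flag are hypothetical and not part of the PR; the sketch assumes a two-way request is passed as a list or tuple of two features.

```python
import warnings

VALID_KINDS = ("average", "individual", "both")


def resolve_kind(features, kind="average", strict=True):
    """Hypothetical helper: return the kind to use for the given features."""
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}, got {kind!r}")
    # Assumption: two features in a list/tuple means a two-way partial dependence.
    is_two_way = isinstance(features, (list, tuple)) and len(features) == 2
    if is_two_way and kind in ("individual", "both"):
        message = (
            "Individual conditional expectation plot can only be created "
            "with a one-way partial dependence plot"
        )
        if strict:
            raise ValueError(message)  # behavior described in this PR
        warnings.warn(message)  # alternative floated in the comment above
        return "average"  # fall back to the plain partial dependence
    return kind


one_way = resolve_kind(0, kind="both")  # "both"
two_way_avg = resolve_kind((0, 1), kind="average")  # "average"
```

With `strict=False`, a two-way request for ICE data degrades to `"average"` with a warning, which is what would make `kind="both"` usable as a default.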
christopherbunn left a comment:
Some more testing-specific notes:
    data = np.concatenate([fig_dict["data"][i]["y"][:window_length],
                           fig_dict["data"][i + len(X) + 1]["y"][:window_length],
                           fig_dict["data"][i + 2 * len(X) + 2]["y"][:window_length]])
For this particular test, I ended up doing some weird matrix math to wrangle the plotly data. For each sample, this snippet essentially combines the ICE data for each class into one array and compares it to the data for that particular sample in the partial_dependence() output. The +1 and +2 account for the additional PD trace that is in each plot.
    data = np.concatenate([fig_dict["data"][i]["y"][:window_length],
                           fig_dict["data"][i + len(X)]["y"][:window_length],
                           fig_dict["data"][i + 2 * len(X)]["y"][:window_length]])
Same as the other comment, but the +1 and +2 aren't here since we're only plotting the ICE data.
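The index arithmetic in the two test snippets can be written as a single formula. This is a sketch under an assumption inferred from the offsets: the figure's traces are laid out class by class, with each class contributing `len(X)` ICE traces followed by one PD trace when `kind="both"`. `ice_trace_index` is a hypothetical helper, not code from the PR.

```python
def ice_trace_index(class_idx, sample_idx, n_samples, kind="both"):
    """Position of one sample's ICE trace in a flat, class-by-class trace list.

    Assumes each class contributes n_samples ICE traces, followed by one
    partial dependence trace when kind == "both".
    """
    traces_per_class = n_samples + (1 if kind == "both" else 0)
    return class_idx * traces_per_class + sample_idx


n_samples = 4  # stands in for len(X)
i = 2
both = [ice_trace_index(c, i, n_samples, "both") for c in range(3)]
only_ice = [ice_trace_index(c, i, n_samples, "individual") for c in range(3)]
# both == [2, 7, 12], i.e. i, i + len(X) + 1, i + 2 * len(X) + 2
# only_ice == [2, 6, 10], i.e. i, i + len(X), i + 2 * len(X)
```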
      assert len(data["x"]) == 5
      assert len(data["y"]) == 5
    - assert data["name"] == label
    + assert data["name"] == "Partial Dependence: " + label
Ended up changing the PD trace name to mention that it is the partial dependence line to differentiate it from the ICE plots.
chukarsten left a comment:
Awesome, great work. I need to do some more due diligence as I can't run this at the Pentagon and visualize it, but the code and tests look good. I would just say let's stick with the lower grid resolutions, if possible, to try and reduce partial dependence run time. If it can all be tied to one variable, that would be better but I can do that in another PR. Also, I think shying away from the "if not" logic would be a bit more readable. Really great work though for your first week back!
chukarsten left a comment:
Thanks for the changes, friendo!
This implementation extends graph_partial_dependencies() by adding support for plotting Individual Conditional Expectations (ICE) behind a Partial Dependency plot. In addition, partial_dependence() has been updated to be able to generate the ICE data (as graph_partial_dependencies() uses this function to get the plot data).

Single Class:

Multiclass:

Multiclass ICE only:

By default, ICE plots are not shown when partial dependency plots are created. Setting kind='individual' or kind='both' will show the ICE plot.

Caveats:
partial_dependence() is able to calculate the data behind the two-way partial dependency plot and return it as a list of DataFrames (with each DataFrame representing one sample). For now, a ValueError is thrown when an ICE plot is attempted for a two-way partial dependency.

Once the implementation clears, I can update the documentation example to show an ICE plot.
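
For readers unfamiliar with the two kinds of output, the relationship between ICE curves and the partial dependence curve can be sketched in plain Python. This illustrates the general technique only, not the code in this PR; `ice_curves`, `partial_dependence_from_ice`, and `toy_predict` are hypothetical names, and the toy model is an assumption chosen so the numbers are easy to check.

```python
from statistics import mean


def ice_curves(predict, rows, feature, grid):
    """Return one ICE curve per row: predictions with `feature` forced to each grid value."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)  # copy so the original sample is untouched
            modified[feature] = value  # overwrite only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence_from_ice(curves):
    """Average the ICE curves point-wise to get the partial dependence curve."""
    return [mean(points) for points in zip(*curves)]


# Hypothetical toy model: the effect of "a" depends on "b".
def toy_predict(row):
    return row["a"] * row["b"]


rows = [{"a": 0.0, "b": 1.0}, {"a": 0.0, "b": 3.0}]
grid = [0.0, 1.0, 2.0]

ice = ice_curves(toy_predict, rows, "a", grid)
pd_curve = partial_dependence_from_ice(ice)
# ice == [[0.0, 1.0, 2.0], [0.0, 3.0, 6.0]]
# pd_curve == [0.0, 2.0, 4.0]
```

The two ICE curves have different slopes, which the averaged curve alone would hide; that per-sample variation is what plotting ICE lines behind the PD line is meant to expose.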
Resolves #2025