
Added support for Individual Conditional Expectation plots to graph_partial_dependencies() #2386

Merged
christopherbunn merged 13 commits into main from 2025_ice_plots
Jun 24, 2021

Conversation

@christopherbunn (Contributor) commented Jun 16, 2021

This implementation extends graph_partial_dependencies() by adding support for plotting Individual Conditional Expectation (ICE) curves behind a partial dependence plot. In addition, partial_dependence() has been updated so that it can generate the ICE data, since graph_partial_dependencies() uses that function to get the plot data.

Single class: [screenshot]

Multiclass: [screenshot]

Multiclass, ICE only: [screenshot]

By default, ICE plots are not shown when partial dependence plots are created. Setting kind='individual' or kind='both' will show the ICE plot.
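For intuition, the relationship between the two kinds can be sketched in plain numpy: each ICE curve traces one sample's predictions across the feature grid, and the partial dependence curve (kind='average') is their pointwise mean. The model and names below are a hypothetical stand-in, not evalml's internals:

```python
import numpy as np

# Toy stand-in for a fitted pipeline's predictions: output depends on the
# feature of interest (x0) plus a per-sample offset. (Hypothetical model
# for illustration; evalml computes this via the pipeline's predictions.)
def predict(x0, offset):
    return x0 ** 2 + offset

rng = np.random.default_rng(0)
offsets = rng.normal(size=5)          # five "training samples"
grid = np.linspace(-1.0, 1.0, 100)    # grid_resolution=100, the default

# kind="individual": one ICE curve per sample over the grid
ice = np.array([predict(grid, o) for o in offsets])   # shape (5, 100)

# kind="average": the partial dependence curve is the mean of the ICE curves
pd_curve = ice.mean(axis=0)

# Here the model is linear in the offset, so the mean ICE curve equals
# the prediction at the mean offset.
assert np.allclose(pd_curve, grid ** 2 + offsets.mean())
```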

Caveats:

  • Does not work for two-way partial dependence plots, as there is no clean way to display ICE curves over a contour plot. partial_dependence() can still calculate the ICE data behind a two-way partial dependence plot and return it as a list of DataFrames (each DataFrame representing one sample). For now, a ValueError is raised when an ICE plot is requested for a two-way partial dependence.

Once the implementation clears, I can update the documentation example to show an ICE plot.

Resolves #2025

@CLAassistant commented Jun 16, 2021

CLA assistant check
All committers have signed the CLA.

@codecov bot commented Jun 16, 2021

Codecov Report

Merging #2386 (c7932f9) into main (f5a17c4) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #2386     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        283     283             
  Lines      25316   25478    +162     
=======================================
+ Hits       25216   25378    +162     
  Misses       100     100             
Impacted Files Coverage Δ
evalml/model_understanding/graphs.py 100.0% <100.0%> (ø)
...lml/tests/model_understanding_tests/test_graphs.py 100.0% <100.0%> (ø)
...del_understanding_tests/test_partial_dependence.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@christopherbunn (Contributor, Author) left a comment
Some implementation details to consider when reviewing:


 def partial_dependence(
-    pipeline, X, features, percentiles=(0.05, 0.95), grid_resolution=100
+    pipeline, X, features, percentiles=(0.05, 0.95), grid_resolution=100, kind="average"

We could potentially make kind="both" the default, but it would throw an error when doing a two-way PD. If we changed that behavior to a silent fallback or a warning, I would be okay with making it the default.

     y=part_dep["partial_dependence"],
     name="Partial Dependence",
-    line=dict(width=3),
+    line=dict(width=3, color="rgb(99,110,250)"),

The RGB value corresponds to the default blue that I extracted with the macOS color picker. I don't think there's an alias for this exact color in plotly, but I could be missing it in the docs.
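For what it's worth, that triple appears to match plotly's default categorical blue, which plotly exposes as the hex string "#636EFA" (the first entry of the plotly express qualitative palette) — a hedged observation, not something the PR relies on. A quick stdlib check of the equivalence:

```python
# Convert plotly's default blue "#636EFA" to an RGB triple and compare it
# to the hard-coded "rgb(99,110,250)" used in the diff above.
hex_blue = "636EFA"
rgb = tuple(int(hex_blue[i:i + 2], 16) for i in (0, 2, 4))
assert rgb == (99, 110, 250)
```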

 with pytest.warns(None) as graph_valid:
     graph_partial_dependence(clf, X, features=0, grid_resolution=20)
-assert len(graph_valid) == 1
+assert len(graph_valid) == 0

Is there a particular reason that graph_partial_dependence was set to 1 here while the other graphing functions were 0?

Contributor:
I think we should move the grid_resolution back to 5. That was part of @freddyaboulton's unit-test speed-up MR! But sorry, I don't know the answer to that question.
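The assertion pattern in the diff above — pytest.warns(None) plus a length check — just verifies the call emits no warnings. The same check can be sketched without pytest using the stdlib warnings module (the graph call is stubbed here, since this is only an illustration of the recording mechanism):

```python
import warnings

# Stub standing in for graph_partial_dependence; the real call is omitted.
def graph_stub():
    return {"data": []}

# Record any warnings emitted during the call; asserting len(caught) == 0
# is equivalent to `pytest.warns(None)` followed by `len(...) == 0`.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    graph_stub()

assert len(caught) == 0
```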

_go.Scatter(
    x=x,
    y=ice_data[sample],
    line=dict(width=0.5, color="gray"),

There's some leeway to adjust this, but from tweaking it a bit, this is the best color and thickness combination I found for the ICE plot lines.

Comment on lines +879 to +878
if kind == "individual" or kind == "both":
    raise ValueError(
        "Individual conditional expectation plot can only be created with a one-way partial dependence plot"
    )

Currently, I'm leaving the ICE plot out when plotting a two-way PD. I think the current behavior of raising an error is the best move, although I've considered emitting a warning in the log instead. Open to suggestions on this behavior 🙂

@christopherbunn (Contributor, Author) left a comment

Some more testing specific notes:

Comment on lines +1134 to +1136
data = np.concatenate([fig_dict["data"][i]["y"][:window_length],
                       fig_dict["data"][i + len(X) + 1]["y"][:window_length],
                       fig_dict["data"][i + 2 * len(X) + 2]["y"][:window_length]])

For this particular test, I ended up doing some weird index math to wrangle the plotly data. For each sample, this snippet essentially combines the ICE data for each class into one array and compares it to the data for that sample in the partial_dependence() output. The +1 and +2 account for the additional PD trace in each class's plot.
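To make the stride concrete: if each class contributes len(X) ICE traces followed by one PD trace, sample i's ICE trace for class c sits at index c * (len(X) + 1) + i, which is exactly the "+ len(X) + 1" stepping above. A small mock of the trace list (trace names are illustrative only, not what plotly stores):

```python
# Mock the figure's trace order for 3 classes: per class, n_samples ICE
# traces followed by one partial dependence trace.
n_samples, n_classes = 4, 3
traces = []
for c in range(n_classes):
    traces += [f"ice_c{c}_s{s}" for s in range(n_samples)] + [f"pd_c{c}"]

# Sample i's ICE trace across the three classes, using the same indexing
# as the test snippet above (with len(X) == n_samples).
i = 2
picked = [traces[i], traces[i + n_samples + 1], traces[i + 2 * n_samples + 2]]
assert picked == ["ice_c0_s2", "ice_c1_s2", "ice_c2_s2"]
```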

Comment on lines +1166 to +1168
data = np.concatenate([fig_dict["data"][i]["y"][:window_length],
                       fig_dict["data"][i + len(X)]["y"][:window_length],
                       fig_dict["data"][i + 2 * len(X)]["y"][:window_length]])

Same as the other comment, but the +1 and +2 aren't needed here since we're only plotting the ICE data.

 assert len(data["x"]) == 5
 assert len(data["y"]) == 5
-assert data["name"] == label
+assert data["name"] == "Partial Dependence: " + label

Ended up changing the PD trace name to mention that it is the partial dependence line to differentiate it from the ICE plots.

@christopherbunn christopherbunn force-pushed the 2025_ice_plots branch 2 times, most recently from cc15700 to a662a6d Compare June 18, 2021 17:49
@christopherbunn christopherbunn marked this pull request as ready for review June 18, 2021 17:49
@christopherbunn christopherbunn added enhancement An improvement to an existing feature. and removed enhancement An improvement to an existing feature. labels Jun 18, 2021
@chukarsten (Contributor) left a comment

Awesome, great work. I need to do some more due diligence since I can't run this at the Pentagon and visualize it, but the code and tests look good. I would just say let's stick with the lower grid resolutions, if possible, to try to reduce partial dependence run time. If it can all be tied to one variable, that would be better, but I can do that in another PR. Also, I think shying away from the "if not" logic would be a bit more readable. Really great work for your first week back!


@christopherbunn christopherbunn force-pushed the 2025_ice_plots branch 2 times, most recently from 2c1b70a to 490af18 Compare June 22, 2021 01:16
@chukarsten (Contributor) left a comment

Thanks for the changes, friendo!

@christopherbunn christopherbunn merged commit 1f285fc into main Jun 24, 2021
@christopherbunn christopherbunn deleted the 2025_ice_plots branch June 24, 2021 14:13
@dsherry dsherry mentioned this pull request Jul 2, 2021

Development

Successfully merging this pull request may close these issues:

  • Adding Individual Conditional Expectation (ICE) Plots to model_understanding