Partial dependence plots #721

Merged
merged 119 commits into from Sep 9, 2022

RobertSamoilescu (Collaborator) commented Jul 21, 2022

Implementation of partial dependence (PD) and individual conditional expectation (ICE), leveraging the sklearn implementation.
It includes the following functionality:

  • PD and ICE for numerical features
  • PD and ICE for categorical features
  • PD and ICE for combinations of numerical and/or categorical features
  • Plots for all the above cases
  • Custom grids
  • Usage of any black-box model (i.e. not only restricted to sklearn estimators)
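For readers unfamiliar with the two quantities, here is a minimal sketch of brute-force PD and ICE for a single numerical feature and an arbitrary black-box predictor. It is not the code added in this PR: the helper name `brute_force_pd`, the toy `predict` function, and the grid are all made up for illustration.

```python
import numpy as np


def brute_force_pd(predict, X, feature, grid):
    """Return (ice, pd) for one numerical feature (hypothetical helper).

    ice has shape (n_samples, n_grid); pd has shape (n_grid,).
    """
    X = np.asarray(X, dtype=float)
    ice = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value   # fix the feature at the grid value for every row
        ice[:, j] = predict(X_mod)  # one ICE point per instance
    return ice, ice.mean(axis=0)    # PD is the average of the ICE curves


# Toy black-box model: any callable mapping an array to predictions works.
def predict(X):
    return 2.0 * X[:, 0] + X[:, 1] ** 2


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
grid = np.linspace(-1.0, 1.0, 5)  # a custom grid for feature 0
ice, pd_values = brute_force_pd(predict, X, feature=0, grid=grid)
```

Because the toy model is additive, the PD curve for feature 0 is the linear term `2 * grid` shifted by the average contribution of feature 1.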

TODOs:

  • Method description notebook
  • Example usage notebook

Comment on lines +255 to +257
logger.warning('The length of `target_names` does not match the number of predicted outputs. '
               'Ensure that the lengths match, otherwise a call to the `plot_pd` method might '
               'raise an error or produce undesired labeling.')
RobertSamoilescu (Collaborator, Author) commented:

I don't think it is possible to check whether the number of targets for which we computed the PD/ICE matches the length of target_names without a dummy call. Although the user should be aware of this, I decided to include a warning message in case it happens. Shall we go even further and raise an error?
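For context, the "dummy call" mentioned here could look roughly like the sketch below: predict on a single row and compare the number of outputs with `len(target_names)`. The helper names are hypothetical and this is not the check implemented in the PR, which only logs a warning.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def n_outputs_from_dummy_call(predictor, X):
    """Infer the number of outputs by predicting on a single row (hypothetical helper)."""
    out = np.asarray(predictor(X[:1]))
    return 1 if out.ndim == 1 else out.shape[1]


def target_names_match(predictor, X, target_names):
    """Return True if `target_names` has one entry per predicted output."""
    n_outputs = n_outputs_from_dummy_call(predictor, X)
    if len(target_names) != n_outputs:
        logger.warning('Got %d target names for %d predicted outputs.',
                       len(target_names), n_outputs)
        return False
    return True


# Toy predictor returning two columns, like `predict_proba` for a binary classifier.
def predict_proba(X):
    p = 1.0 / (1.0 + np.exp(-X[:, 0]))
    return np.column_stack([1.0 - p, p])


X = np.zeros((3, 2))
ok = target_names_match(predict_proba, X, ['negative', 'positive'])
mismatch = target_names_match(predict_proba, X, ['positive'])
```

The cost is one extra prediction on a single instance, which may or may not be acceptable depending on the model.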

jklaise (Member) commented Sep 8, 2022

Following an offline discussion, it was decided to simplify both the implementation and the user interface by splitting the explainer into two distinct classes:

  • PartialDependence for use with black-box models, calculating PD using a brute-force approach
  • TreePartialDependence for use with white-box models (currently only a small selection of sklearn estimators) that support a recursive algorithm for calculating PD, which is faster than the brute-force approach

This allows us to remove all of the slightly confusing arguments discussed previously.

Also, @RobertSamoilescu checked that the recursive algorithm returns slightly different values than the brute-force PD on the same estimators, which further justifies splitting the implementation into two public classes (similar to KernelShap and TreeShap).
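One way to see the discrepancy is to call sklearn's own `partial_dependence` with both methods on the same tree ensemble. This is only an illustrative sketch: the dataset, model, and grid resolution are arbitrary choices, and it calls sklearn directly rather than the classes introduced in this PR.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

# Synthetic data and a small forest; both are arbitrary choices for the demo.
X, y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Brute force: replace feature 0 by each grid value and average the predictions.
brute = partial_dependence(model, X, features=[0], method="brute",
                           kind="average", grid_resolution=10)
# Recursion: traverse the trees, weighting branches by training sample counts.
recursion = partial_dependence(model, X, features=[0], method="recursion",
                               kind="average", grid_resolution=10)

pd_brute = brute["average"][0]          # shape (n_grid,)
pd_recursion = recursion["average"][0]  # shape (n_grid,)
max_abs_diff = np.max(np.abs(pd_brute - pd_recursion))
print(f"max abs difference between methods: {max_abs_diff:.4f}")
```

The size of the gap depends on the estimator and the data, so the printed value should be read as a diagnostic rather than a fixed quantity.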

RobertSamoilescu (Collaborator, Author) commented:

@jklaise, check the note at the end of this section which confirms that the two methods differ in the values they return

jklaise (Member) left a comment:

Nice work, looks great!

@jklaise jklaise merged commit 11c3dd4 into SeldonIO:master Sep 9, 2022
RobertSamoilescu added a commit to RobertSamoilescu/alibi that referenced this pull request Sep 9, 2022
* Partial API & docs

* included sanity checks, grid construction for both numerical and categorical features.

* Finalized explain and build_explanation. Included one feature numerical plots.

* included kwargs for all kinds of plots

* Implemented plotting functionality for every feature types and combinations.

* Fixed share y axis

* Minor plot fixes.

* Minor refactoring. Docstring for PartialDependence class

* Docstring for plotting function. Minor grammar corrections.

* Included test for sanity check. To be cleaned and optimized.

* Partial cleaning of the tests.

* Finalized params sanity checks tests.

* Included test for number of features.

* Test explanation shapes for numerical features.

* Included black-box wrapper for classification and regression & corresponding tests

* Included a PD simple example.

* Included Adult example. Minor one way numerical plot fix for labels.

* Solved flake8 warnings

* Solved mypy errors.

* Improved docs for plots.

* Some comments and TODOs.

* included list of TODOs

* Consider by default all single features to compute the pd for.

* Included custom grid_points.

* Updated categorical graph from barplot to lineplot.

* Included two target outputs for binary classification when response_method falls to predict_proba.

* Solved flake8 warnings.

* Solved mypy errors.

* isort and mypy

* Fixed fitted check for older versions of sklearn.

* Cleaned the docs for partial_dependence.py

* Tuple in numpy/list inconsistency

* Reintroducing seaborn because of the heatmap

* Introduced contour plots levels args.

* minor changes to the docs

* Started pdp bike dataset example

* Included argument to display custom number of ice curves.

* Unfinished example PD bike dataset

* Finalized partial dependence bike example.

* Draft method description. Some variable renaming.

* Refactoring - explanation object.

* Updated docs entries.

* Readme indentation

* setup.py minor correction

* Minor documentation corrections

* Removed initial pd example notebook

* Changed links to Introduction section

* Revert "Changed links to Introduction section"

This reverts commit 67dce6f.

* Fix problem with equation in method docs

* Included test for pd computation against the sklearn implementation for numerical and adapted categorical features.

* Addressed comments - part 1

* Addressed comments - part 2

* Removed seed.

* Included progress bar while explaining features

* A few changes to the pdp example notebook.

* Removed ipynb for examples.

* Addressed notebook comments.

* Addressed comments regarding the plots.

* Removed meta from data field.

* Literal for some arguments

* Allow 2 way PDP for kind both. Improved method description. Cleaned example.

* Minor cleaning.

* Updated docs in the method description.

* Corrections to the example text.

* Minor punctuation correction in the method description.

* Removed features_list from method notebook

* Update link: latest with stable

* Changed kernel env back to Python.

* Minor docstrings correction.

* Fixed spelling error in example notebook.

* Replaced centered with center. Improved docs for center flag.

* Removed seaborn from dependencies. Implemented on matplotlib heatmap.

* Included clarification for the response_method.

* Replaced sns heatmap plot with the matplotlib heatmap in method description.

* Removed wget.

* Fixed links. Moved model sanity check to constructor. Removed pandas installation. Updated tests.

* Integrate sklearn pd functions. In progress...

* Removed sklearn private methods

* Refactored blackbox case.

* Updated method page.

* Minor clarifications.

* Solved mypy issues.

* Addressed minor comments.

* Minor correction for error messages and comments conventions.

* Fixed test from previous commit.

* isort on utils/visualizations.py

* Fixed deciles computation.

* Fixed  for binary classification.

* Add process pd and ice for plotting.

* Included warning for the  length.

* Removed auto options

* Removed auto. Fixed tests and docs.

* Minor corrections.

* Revert to deciles on full dataset

* Changed a few error messages. Included one test for unknown kind.

* Improved error messages and included a warning for target_names

* Removed blank lines.

* Fixed tests.

* Split initial implementation into PartialDependence and TreePartialDependence.

* Fixed tests.

* Isort and removed unnecessary fixture from conftest.py

* Updated PD example.

* Updated method description.

* Removed unnecessary classes and imports.

* Minor doc updates. Minor sanity checks refactoring.

* Improved docs for TreePartialDependence.

* Included ABC for the base class and removed for the derived one.

* Fixed deciles display

* Improved plots by adding pd limits

* Fixed min-max pd plots for share_y.

* Improved plots: zoom in for num and cat plots, add decile ticks for num-cat, sharey num-cats, updated images.

* Solved minor display bug in num-cat for deciles.

* Updated linear regression plots.

* Minor docstrings correction: pairs of features -> tuples of features.

* Removed unnecessary argument in _compute_pd_limits and updated return docs for helper plotting functions.

Co-authored-by: Ashley Scillitoe <ashley.scillitoe@seldon.io>
3 participants