Model-agnostic Feature Importance and Effects with Dependent Features -- A Conditional Subgroup Approach
This repository contains the code, files and data for the JMLR submission "Model-agnostic Feature Importance and Effects with Dependent Features -- A Conditional Subgroup Approach".
Partial dependence plots (PDP) and permutation feature importance (PFI) are popular interpretation methods for machine learning models. When features are dependent, both methods extrapolate to feature areas with low data density, which can cause misleading interpretations. To reduce extrapolation, conditional variants of PDP and PFI have been suggested, which sample from the conditional instead of the marginal distribution. We propose the cs-PDP and the cs-PFI, which are conditional variants based on perturbations in subgroups that are constructed using decision trees. Using a novel data fidelity measure, we show that perturbation in subgroups preserves the joint distribution of the data better than the state of the art. In a simulation, we show that cs-PFI recovers the ground-truth conditional PFI. The subgroups are described with decision rules that make the conditioning interpretable. They also provide nuanced interpretations of the feature dependence structure and allow the computation of additional feature effects and importance values within the subgroups. We demonstrate these more nuanced and richer explanations in an application.
All experiments are implemented with the R language and the paper is written with LaTeX.
Assuming you have R installed on your system, you can install the package dependencies with R via:
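One possible way to do this is sketched below. It is an assumption, not necessarily the command the authors use: it relies on the `remotes` package to read the DESCRIPTION file in the repository root, and the helper name `install_r_deps` is hypothetical.

```shell
# Hypothetical helper (not part of this repository): installs the R packages
# listed in DESCRIPTION. Assumption: the `remotes` package is used as installer.
install_r_deps() {
  # R must be available on the PATH.
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH; install R first." >&2
    return 127
  fi
  # The DESCRIPTION file lives in the repository root.
  if [ ! -f DESCRIPTION ]; then
    echo "No DESCRIPTION file in $(pwd); run this from the repository root." >&2
    return 1
  fi
  # Install `remotes` if it is missing, then every dependency in DESCRIPTION.
  Rscript \
    -e 'if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes", repos = "https://cloud.r-project.org")' \
    -e 'remotes::install_deps(dependencies = TRUE)'
}
```

After defining the function, call `install_r_deps` from the repository root. The two guards make it fail early with a message instead of installing from the wrong directory.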
To reproduce the paper, go to the folder paper/ and run the following commands in the shell:
    cd paper
    make paper
./: Contains this README and the DESCRIPTION file, which specifies the R package dependencies.
./data: Stores data used in experiments and application.
./experiments: Contains the scripts to produce the figures and results.
./paper: Contains the tex files for the paper and the Makefile.
./R: Contains custom R functions used in the experiments.
./results: Stores intermediate results.
© 2020 Christoph Molnar
The code of this repository is distributed under the MIT license. See below for details:
The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
The content of the paper is distributed under a Creative Commons license CC BY 4.0.