Connecting Interpretability and Robustness in Decision Trees through Separation

This repository contains the code for the experiments in the paper "Connecting Interpretability and Robustness in Decision Trees through Separation".

Authors: Michal Moshkovitz, Yao-Yuan Yang, Kamalika Chaudhuri

Abstract

Recent research has recognized interpretability and robustness as essential properties of trustworthy classification. Curiously, a connection between robustness and interpretability was empirically observed, but the theoretical reasoning behind it remained elusive. In this paper, we rigorously investigate this connection. Specifically, we focus on interpretation using decision trees and robustness to $l_\infty$-perturbation. Previous works defined the notion of $r$-separation as a sufficient condition for robustness. We prove upper and lower bounds on the tree size in case the data is $r$-separated. We then show that a tighter bound on the size is possible when the data is linearly separated. We provide the first algorithm with provable guarantees both on robustness, interpretability, and accuracy in the context of decision trees. Experiments confirm that our algorithm yields classifiers that are both interpretable and robust and have high accuracy.

Installation

pip install -r requirements.txt

Install LP, QP Solvers

Install Gurobi: https://www.cvxpy.org/install/index.html#install-with-gurobi-support
Install GLPK: https://www.gnu.org/software/glpk/
Install CVXOPT with GLPK support:

export CVXOPT_BUILD_GLPK=1
export CVXOPT_GLPK_LIB_DIR=/path/to/glpk-X.X/lib
export CVXOPT_GLPK_INC_DIR=/path/to/glpk-X.X/include
pip install --upgrade cvxopt

Submodules

git submodule init
git submodule update

Install LCPA

cd risk-slim
pip install -r requirements.txt
pip install .

For more LCPA installation instructions, please visit https://github.com/ustunb/risk-slim

Install robust decision tree (RobDT)

cd RobustTrees
git submodule update --init --recursive
./build.sh
cd python-package
python setup.py install

For more RobDT installation instructions, please visit https://github.com/chenhongge/RobustTrees

Scripts

Dataset process scripts

Experiment scripts

params.py: lists all parameter settings that were run
rsep_explain/datasets/__init__.py: load datasets
experiments/lin_sep_bbm_rob_3.py: run experiment for BBM-RS
experiments/dt_interpret_rob_3.py: run experiment for DT (Breiman et al., 1984)
experiments/xgboostrobdt_interpret_rob.py: run experiment for RobDT (Chen et al., 2019)
experiments/risk_slim_3.py: run experiment for LCPA (Ustun & Rudin, 2019)
experiments/calc_lin_separation.py: estimate the linear separateness of each dataset
experiments/calc_separation.py: estimate the $r$-separateness of each dataset

Figure/Table generation scripts

notebooks/case_study.ipynb: generate Table 1
notebooks/separation.ipynb: generate Table 2
notebooks/risk_score_3.ipynb: generate Table 3 and 4
notebooks/tradeoff.ipynb: generate images in Figure 1 and 4
notebooks/plot_bbm.ipynb: generate images in Figure 5

Parameters

usage: main.py [-h] [--no-hooks] --experiment
               {lin_sep_bbm_rob_3,risk_slim_3,dt_interpret_rob_3,xgboostrobdt_interpret_rob,calc_lin_separation,calc_separation}
               --dataset DATASET --preprocessor PREPROCESSOR --random_seed
               RANDOM_SEED --rsep RSEP

Datasets: {risk_ionosphere, risk_diabetes, risk_breastcancer, risk_adult, risk_mushroom, risk_mammo, risk_spambase, risk_bank, risk_careval, risk_compasbin, risk_ficobin, risk_bank_2, risk_heart}

Algorithm implementations

Boosting by majority (BBM)

Examples

The result of each example is written to a joblib pickle file named temp.pkl.
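Once one of the examples below has finished, the result file can be opened from Python. The following is a minimal sketch that assumes only what is stated above, namely that temp.pkl is a joblib pickle. The structure of the stored object is not documented here, so the helper names `load_result` and `describe` are hypothetical and simply summarize whatever object they are given.

```python
from pprint import pformat


def load_result(path="temp.pkl"):
    """Load a result file written by one of the example runs."""
    # joblib is imported here so describe() can be used without it installed.
    import joblib

    return joblib.load(path)


def describe(result, max_chars=200):
    """Return a short text summary of a loaded result object.

    Assumption: the result may be a dict; any other type is
    pretty-printed as a whole and truncated to max_chars.
    """
    if isinstance(result, dict):
        lines = []
        for key in sorted(result, key=str):
            text = pformat(result[key])
            if len(text) > max_chars:
                text = text[:max_chars] + "..."
            lines.append(f"{key}: {text}")
        return "\n".join(lines)
    return pformat(result)[:max_chars]
```

After a run, `print(describe(load_result("temp.pkl")))` shows the stored contents. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.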

Run BBM-RS with $\tau = 0.05$ on the bank dataset.

python main.py --no-hooks --experiment lin_sep_bbm_rob_3 \
  --dataset risk_bank --preprocessor rminmax \
  --rsep 0.05 \
  --random_seed 0

Run RobDT with a robust radius of $0.1$ on the mammo dataset.

python main.py --no-hooks --experiment xgboostrobdt_interpret_rob \
  --dataset risk_mammo --preprocessor rminmax \
  --rsep 0.1 \
  --random_seed 0
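To make the notion of a robust radius concrete, here is a small sketch of what it means for a decision tree's prediction at a point to be robust to $l_\infty$-perturbations of radius $r$. It is not taken from RobDT or from this repository. The nested-dict tree format and the names `reachable_labels` and `is_robust` are assumptions made for illustration. The check treats the perturbation box as closed, so it can be slightly conservative when a perturbed value lands exactly on a threshold.

```python
def reachable_labels(node, lo, hi):
    """Labels of all leaves reachable by some point in the box [lo, hi].

    Assumed node format (hypothetical): a leaf is {"label": c}; an internal
    node is {"feature": i, "threshold": t, "left": ..., "right": ...},
    where points with x[i] <= t go left and all others go right.
    """
    if "label" in node:
        return {node["label"]}
    f, t = node["feature"], node["threshold"]
    labels = set()
    if lo[f] <= t:
        # Some point in the box goes left; shrink the box accordingly.
        new_hi = list(hi)
        new_hi[f] = min(hi[f], t)
        labels |= reachable_labels(node["left"], lo, new_hi)
    if hi[f] > t:
        # Some point in the box goes right; shrink the box accordingly.
        new_lo = list(lo)
        new_lo[f] = max(lo[f], t)
        labels |= reachable_labels(node["right"], new_lo, hi)
    return labels


def is_robust(tree, x, r):
    """True if every x' with ||x' - x||_inf <= r gets the same label as x."""
    lo = [v - r for v in x]
    hi = [v + r for v in x]
    return len(reachable_labels(tree, lo, hi)) == 1


# A one-split tree on feature 0 with threshold 0.5.
stump = {"feature": 0, "threshold": 0.5,
         "left": {"label": 0}, "right": {"label": 1}}
print(is_robust(stump, [0.2], 0.1))   # box stays left of the threshold
print(is_robust(stump, [0.45], 0.1))  # box straddles the threshold
```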

Run LCPA on the mammo dataset.

python main.py --no-hooks --experiment risk_slim_3 \
  --dataset risk_mammo --preprocessor rminmax \
  --random_seed 0

Run DT on the heart dataset.

python main.py --no-hooks --experiment dt_interpret_rob_3 \
  --dataset risk_heart --preprocessor rminmax \
  --random_seed 0

Calculate the $r$-separateness of the heart dataset.

python main.py --no-hooks --experiment calc_separation \
  --dataset risk_heart --preprocessor rminmax \
  --random_seed 0
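For intuition about the quantity involved, the sketch below computes the largest $r$ for which a small dataset is $r$-separated. It assumes the definition from prior work, in which data is $r$-separated when any two differently labeled points are at $l_\infty$ distance at least $2r$. This is a brute-force illustration, not the implementation in experiments/calc_separation.py, and the function names are hypothetical.

```python
def linf_distance(a, b):
    """l_infinity distance between two equal-length feature vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def r_separation(X, y):
    """Half the smallest l_inf distance between differently labeled points.

    Returns float("inf") when all points share one label.
    Runs in O(n^2 * d) time, so it is only meant for small examples.
    """
    best = float("inf")
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if y[i] != y[j]:
                best = min(best, linf_distance(X[i], X[j]))
    return best / 2


# Toy data: three points in two dimensions with binary labels.
X = [[0.0, 0.0], [1.0, 0.5], [0.25, 1.0]]
y = [0, 1, 0]
print(r_separation(X, y))
```

In the toy data the closest differently labeled pair is the second and third point, at $l_\infty$ distance 0.75, so the data is $r$-separated for any $r$ up to 0.375.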

Calculate the linear separateness of the heart dataset.

python main.py --no-hooks --experiment calc_lin_separation \
  --dataset risk_heart --preprocessor rminmax \
  --random_seed 0
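The commands above all follow the same pattern, so a sweep over datasets and random seeds can be generated programmatically. The sketch below only builds and prints command lines using the flags shown in the usage text. The helper name `build_commands` is hypothetical, and this is not how run_tasks.py in this repository works.

```python
import itertools
import shlex


def build_commands(experiment, datasets, seeds, rsep=None, preprocessor="rminmax"):
    """Build one main.py argument list per (dataset, seed) pair."""
    commands = []
    for dataset, seed in itertools.product(datasets, seeds):
        cmd = [
            "python", "main.py", "--no-hooks",
            "--experiment", experiment,
            "--dataset", dataset,
            "--preprocessor", preprocessor,
            "--random_seed", str(seed),
        ]
        if rsep is not None:
            # Only added when a value is given, mirroring the examples above.
            cmd += ["--rsep", str(rsep)]
        commands.append(cmd)
    return commands


for cmd in build_commands("dt_interpret_rob_3", ["risk_heart", "risk_mammo"], seeds=[0, 1]):
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)`. Since every example writes its result to temp.pkl, it may be necessary to move or rename that file between consecutive runs.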
