yangarbiter/interpretable-robust-trees

Connecting Interpretability and Robustness in Decision Trees through Separation

This repository contains the code of the experiments in the paper

Connecting Interpretability and Robustness in Decision Trees through Separation

Authors: Michal Moshkovitz, Yao-Yuan Yang, Kamalika Chaudhuri

Abstract

Recent research has recognized interpretability and robustness as essential properties of trustworthy classification. Curiously, a connection between robustness and interpretability was empirically observed, but the theoretical reasoning behind it remained elusive. In this paper, we rigorously investigate this connection. Specifically, we focus on interpretation using decision trees and robustness to $l_\infty$-perturbation. Previous works defined the notion of $r$-separation as a sufficient condition for robustness. We prove upper and lower bounds on the tree size in case the data is $r$-separated. We then show that a tighter bound on the size is possible when the data is linearly separated. We provide the first algorithm with provable guarantees both on robustness, interpretability, and accuracy in the context of decision trees. Experiments confirm that our algorithm yields classifiers that are both interpretable and robust and have high accuracy.

Installation

pip install -r requirements.txt

Install LP, QP Solvers

export CVXOPT_BUILD_GLPK=1
export CVXOPT_GLPK_LIB_DIR=/path/to/glpk-X.X/lib
export CVXOPT_GLPK_INC_DIR=/path/to/glpk-X.X/include
pip install --upgrade cvxopt

Submodules

git submodule init
git submodule update

Install LCPA

cd risk-slim
pip install -r requirements.txt
pip install .

For more LCPA installation instructions, please visit https://github.com/ustunb/risk-slim

Install robust decision tree (RobDT)

cd RobustTrees
git submodule update --init --recursive
./build.sh
cd python-package
python setup.py install

For more RobDT installation instructions, please visit https://github.com/chenhongge/RobustTrees
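
Once the installation steps have finished, a quick check that the packages can be located may save a failed experiment run. The sketch below is one way to do that. The import names (`cvxopt`, `cvxopt.glpk`, `riskslim`, `xgboost`) are assumptions based on how these projects are usually packaged, not something this repository documents, and `has_module` and `report` are hypothetical helpers.

```python
import importlib.util


def has_module(name):
    """Return True if the import system can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return False


# Assumed import names; adjust them if your installation differs.
ASSUMED_MODULES = ["cvxopt", "cvxopt.glpk", "riskslim", "xgboost"]


def report(names=ASSUMED_MODULES):
    """Map each module name to whether it was found."""
    return {name: has_module(name) for name in names}


if __name__ == "__main__":
    for name, found in report().items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

A missing `cvxopt.glpk` entry would suggest that CVXOPT was installed without GLPK support, in which case the environment variables in the solver step are worth rechecking.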

Scripts

Dataset processing scripts

Experiment scripts

Figure/Table generation scripts

Parameters

usage: main.py [-h] [--no-hooks] --experiment
               {lin_sep_bbm_rob_3,risk_slim_3,dt_interpret_rob_3,xgboostrobdt_interpret_rob,calc_lin_separation,calc_separation}
               --dataset DATASET --preprocessor PREPROCESSOR --random_seed
               RANDOM_SEED --rsep RSEP

Datasets: {risk_ionosphere, risk_diabetes, risk_breastcancer, risk_adult, risk_mushroom, risk_mammo, risk_spambase, risk_bank, risk_careval, risk_compasbin, risk_ficobin, risk_bank_2, risk_heart}
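
These flags combine naturally into sweeps over datasets and seeds. The following is a minimal sketch that uses only the flags shown in the usage message and assumes `main.py` is run from the repository root. The helper name `build_command` and the chosen datasets and seeds are illustrative, not part of the repository. `--rsep` is treated as optional here because some of the examples in this README omit it.

```python
import itertools
import shlex


def build_command(experiment, dataset, seed, rsep=None, preprocessor="rminmax"):
    """Build the argv list for one run of main.py (hypothetical helper)."""
    cmd = [
        "python", "main.py", "--no-hooks",
        "--experiment", experiment,
        "--dataset", dataset,
        "--preprocessor", preprocessor,
        "--random_seed", str(seed),
    ]
    if rsep is not None:
        cmd += ["--rsep", str(rsep)]
    return cmd


if __name__ == "__main__":
    datasets = ["risk_bank", "risk_mammo"]  # illustrative subset
    seeds = [0, 1]                          # illustrative seeds
    for dataset, seed in itertools.product(datasets, seeds):
        cmd = build_command("lin_sep_bbm_rob_3", dataset, seed, rsep=0.05)
        # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to launch it.
        print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting issues if the list is later handed to `subprocess.run`.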

Algorithm implementations

Examples

The result of each example is written to a joblib pickle file named temp.pkl.
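
To look at a result afterwards, the file can be read back with `joblib.load`. The layout of the stored object is not described in this README, so the sketch below only summarizes whatever it finds. Both `summarize` and `load_and_summarize` are hypothetical helpers, not functions from this repository.

```python
def summarize(obj, max_items=10):
    """Return a short, human-readable description of a loaded result."""
    if isinstance(obj, dict):
        keys = list(obj)[:max_items]
        return f"dict with {len(obj)} keys, first keys: {keys}"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__} of length {len(obj)}"
    return f"object of type {type(obj).__name__}"


def load_and_summarize(path="temp.pkl"):
    """Load a joblib pickle from `path` and describe its top-level structure."""
    import joblib  # third-party package, imported lazily so summarize() works without it

    return summarize(joblib.load(path))
```

Calling `print(load_and_summarize())` from the directory where an example was run prints the summary. As with any pickle, only load files you produced yourself.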

Run BBM-RS with $\tau = 0.05$ on the bank dataset.

python main.py --no-hooks --experiment lin_sep_bbm_rob_3 \
  --dataset risk_bank --preprocessor rminmax \
  --rsep 0.05 \
  --random_seed 0

Run RobDT with robust radius $0.1$ on the mammo dataset.

python main.py --no-hooks --experiment xgboostrobdt_interpret_rob \
  --dataset risk_mammo --preprocessor rminmax \
  --rsep 0.1 \
  --random_seed 0

Run LCPA on the mammo dataset.

python main.py --no-hooks --experiment risk_slim_3 \
  --dataset risk_mammo --preprocessor rminmax \
  --random_seed 0

Run DT on the heart dataset.

python main.py --no-hooks --experiment dt_interpret_rob_3 \
  --dataset risk_heart --preprocessor rminmax \
  --random_seed 0

Calculate the $r$-separation of the heart dataset.

python main.py --no-hooks --experiment calc_separation \
  --dataset risk_heart --preprocessor rminmax \
  --random_seed 0
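
For intuition about the quantity involved, assume the usual definition: a dataset is $r$-separated when every two points with different labels are at $l_\infty$ distance at least $2r$, so the largest such $r$ is half the smallest cross-label distance. The brute-force sketch below illustrates that definition in plain Python. It is not the repository's `calc_separation` code, and the function names and toy points are made up.

```python
from itertools import combinations


def linf_distance(a, b):
    """l_infinity distance: the largest coordinate-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def max_separation_radius(points, labels):
    """Largest r for which the data is r-separated, under the assumed
    definition that differently labeled points are at distance >= 2r.

    Returns None when all points share one label.
    """
    cross = [
        linf_distance(p, q)
        for (p, lp), (q, lq) in combinations(zip(points, labels), 2)
        if lp != lq
    ]
    return min(cross) / 2 if cross else None


if __name__ == "__main__":
    # Toy data, made up for illustration.
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 4.0)]
    y = [0, 1, 1]
    print(max_separation_radius(X, y))
```

The loop visits every pair of points, so the cost grows quadratically with the dataset size. That is fine for a small illustration but may be slow on the larger datasets listed above.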

Calculate the linear separation of the heart dataset.

python main.py --no-hooks --experiment calc_lin_separation \
  --dataset risk_heart --preprocessor rminmax \
  --random_seed 0