# davidinouye / automated-dependence-plots

Code for automated dependence plots developed in our UAI 2020 paper.

# Automated Dependence Plots

This repository contains the code for the following paper:

Automated Dependence Plots
David I. Inouye, Liu Leqi, Joon Sik Kim, Bryon Aragam, Pradeep Ravikumar
Uncertainty in Artificial Intelligence (UAI), 2020.

If you use this code, please cite this paper:

```
@inproceedings{inouye2020adp,
author = {Inouye, David I and Leqi, Liu and Kim, Joon Sik and Aragam, Bryon and Ravikumar, Pradeep},
booktitle = {Uncertainty in Artificial Intelligence},
title = {{Automated Dependence Plots}},
year = {2020}
}
```

## Introduction

How can we audit black-box machine learning models to detect undesirable behaviours?

Visualizing the output of a model via dependence plots is a classical technique to understand how model predictions change as we vary the inputs. Automated dependence plots (ADPs) are a way to automate the manual selection of interesting or relevant dependence plots by optimizing over the space of dependence plots.

The basic idea is to define a utility function that quantifies how "interesting" or "relevant" a plot is. For example, the utility could favor directions along which the model changes abruptly (`LeastLipschitzUtility`), is non-monotonic (`LeastMonotonicUtility`), or deviates the most from a constant function (`LeastConstantUtility`). The steps are as follows:

1. Define a plot utility measure (or use a pre-defined utility)
2. Optimize over directions in feature space to find plots with the highest utility
3. Visualize the dependence plot in this direction
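
The three steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch and does not use the `adp` API: the toy `model`, the `non_monotonic_utility` measure (total variation minus net change), and the random-search `best_direction` helper are all hypothetical names chosen for this example, and the repository's own utilities and optimizer may work differently.

```python
import math
import random


def model(x):
    """Toy black-box model (illustrative assumption): non-monotonic in x[0]."""
    return math.sin(3.0 * x[0]) + 0.1 * x[1]


def curve_outputs(f, x0, v, ts):
    """Model outputs along the line x0 + t * v for each t in ts."""
    return [f([a + t * b for a, b in zip(x0, v)]) for t in ts]


def non_monotonic_utility(ys):
    """Step 1 (hypothetical utility): total variation minus net change.

    The value is 0 for a monotone curve and grows as the curve oscillates.
    """
    total_variation = sum(abs(b - a) for a, b in zip(ys, ys[1:]))
    return total_variation - abs(ys[-1] - ys[0])


def best_direction(f, x0, ts, n_candidates=200, seed=0):
    """Step 2: random search over unit directions, plus the coordinate axes."""
    rng = random.Random(seed)
    dim = len(x0)
    candidates = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(n_candidates):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in g))
        candidates.append([c / norm for c in g])
    return max(
        candidates,
        key=lambda v: non_monotonic_utility(curve_outputs(f, x0, v, ts)),
    )


x0 = [0.0, 0.0]                                  # target point
ts = [-2.0 + 4.0 * i / 100 for i in range(101)]  # positions along the curve
v_best = best_direction(model, x0, ts)
# Step 3: plotting ts against ys_best gives the dependence plot for v_best.
ys_best = curve_outputs(model, x0, v_best, ts)
```

Random search is used here only because it is short; any optimizer over directions could be substituted in step 2.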

For example, the following figure highlights the combination of two features over which a model exhibits the most non-monotonic behaviour, and displays the output of the model as you vary these features:

A more interesting example finds interesting directions in the latent space of a generative model (in this case, a VAE trained on the MNIST dataset):

## Basic structure of module

This `adp` module contains four main submodules that can be used together or separately:

1. `adp.curve` - This submodule handles creating directional curves (defined by directional vector `v`) in the input space centered on a target point usually denoted by `x0`. In particular, this module handles bounds for the curve based on a training dataset. There is also some simple support for handling categorical variables.
2. `adp.utility` - This submodule defines multiple concrete utility measures for evaluating the interestingness or usefulness of various directional curves.
3. `adp.optimize` - This submodule searches over possible curves to automatically maximize the specified utility.
4. `adp.plot` - This submodule handles plotting of curves including handling curves that vary in more than one dimension and showing the alternative models that depend on the utility selected.
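
To make the idea of bounding a directional curve by a training dataset concrete, here is a minimal sketch. It assumes the bounds are the per-feature minimum and maximum of the training data and that `x0` lies inside that box; the function name `curve_bounds` is hypothetical, and the actual logic in `adp.curve` may differ.

```python
def curve_bounds(x0, v, X_train):
    """Range of t for which x0 + t * v stays inside the per-feature
    [min, max] box of X_train (assumes x0 itself is inside the box)."""
    t_lo, t_hi = float("-inf"), float("inf")
    for j, (x, d) in enumerate(zip(x0, v)):
        if d == 0.0:
            continue  # this feature does not move along the curve
        column = [row[j] for row in X_train]
        lo, hi = min(column), max(column)
        # Values of t at which feature j reaches its lower and upper bound.
        a, b = (lo - x) / d, (hi - x) / d
        t_lo = max(t_lo, min(a, b))
        t_hi = min(t_hi, max(a, b))
    return t_lo, t_hi


X_train = [[0.0, 10.0], [4.0, 30.0], [2.0, 20.0]]
x0 = [2.0, 20.0]
bounds_axis = curve_bounds(x0, [1.0, 0.0], X_train)  # vary feature 0 only
bounds_both = curve_bounds(x0, [2.0, 5.0], X_train)  # vary both features
```

In this sketch the curve is then evaluated only for `t` between the two returned values, so the plotted points never leave the range covered by the training data.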

## Quickstart

To set up an environment (via conda), download data and pretrained models, and run notebooks to generate figures, simply run the following commands:

```
make conda-env
source activate adp-env || conda activate adp-env
make data
make models
make test
```

## Examples

We have provided two Jupyter notebook tutorials to illustrate the use of ADPs.

These tutorials showcase the basic functionality and structure of the code. From here, users can extend these examples to custom plot utility measures and more complex datasets.

## Installation

### Requirements

We use a recent version of scikit-learn (0.23) that includes partial dependence plots (these might also be included in sklearn 0.22). To set up a conda environment and install the requirements:

```
make conda-env
conda activate adp-env
```

Or to do it manually:

```
conda env create -f environment.yml
conda activate adp-env
```

To remove this environment:

```
conda env remove --name adp-env
```

### Data and pretrained models setup

For simplicity, it's probably best to download all data and pretrained models before going through the tests. The longest setup is for the GTSRB sign dataset, but it should only take a few minutes. Run the following make commands to set up both the models and the data:

```
make models
make data
```

### Notebooks

Notebooks should be run by starting Jupyter from inside the `notebooks` folder. This ensures that the relative paths for loading the module, data, and models resolve correctly.

```
cd notebooks/
jupyter notebook
```

### Figures

NOTE: Figures may differ slightly from those in the original paper because the code now uses sklearn 0.23 instead of the 0.19 version used for the paper. If you want to reproduce the exact figures, please see the tag v0.0.1 (note that its environment is also different, so you will need to create a new environment).

Each figure can be reproduced by running the following notebooks:

1. Figure 1 - figure-loan-optimize.ipynb
2. Figure 2 - figure-contrast-with-local-approximation.ipynb
3. Figure 3 - figure-lipschitz-bounded.ipynb
4. Figure 4 - figure-loan-model-comparison.ipynb
5. Figure 5 - figure-selection-bias.ipynb
6. Figure 6 - figure-streetsign.ipynb
7. Figure 7 - figure-vae-mnist.ipynb
8. One appendix figure - figure-domain-mismatch-loan.ipynb

### To run notebooks from command line

`jupyter nbconvert --ExecutePreprocessor.timeout=-1 --to notebook --execute notebooks/NOTEBOOK_NAME.ipynb`

We have provided a Makefile for running all the notebooks. Run the following command to execute all notebooks (output goes into `notebooks/results/NOTEBOOKNAME.out`). An `*.error` file will be generated if a notebook fails, and a `*.success` file will be generated if it runs successfully.

`make test`
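
After `make test` finishes, the marker files can be summarized with a short script. The sketch below assumes the markers are named `NOTEBOOKNAME.success` and `NOTEBOOKNAME.error` and live in `notebooks/results/`; the helper `summarize_results` is a hypothetical name and is not part of this repository.

```python
from pathlib import Path


def summarize_results(results_dir):
    """Return (succeeded, failed) notebook names based on marker files.

    Assumes one NOTEBOOKNAME.success or NOTEBOOKNAME.error file per notebook.
    """
    results = Path(results_dir)
    succeeded = sorted(p.stem for p in results.glob("*.success"))
    failed = sorted(p.stem for p in results.glob("*.error"))
    return succeeded, failed
```

For example, calling `summarize_results("notebooks/results")` from the repository root returns two sorted lists of notebook names, which makes it easy to see which `.out` logs are worth inspecting.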
