Commit 7659269: added the README for the replication package
brunyuriy committed Sep 2, 2017 (1 parent: 907f2f6)
Showing 2 changed files with 114 additions and 2 deletions.

`ESEC.FSE.2017.Experimental.Replication/README.md` (112 additions, 0 deletions):

# ESEC/FSE 2017 Experimental Replication Package

Our paper [Fairness Testing: Testing Software for
Discrimination](http://people.cs.umass.edu/~brun/pubs/pubs/Galhotra17fse.pdf)
was published in [ESEC/FSE 2017](http://esec-fse17.uni-paderborn.de/).
This directory contains the replication package for repeating the paper's
experiments and reproducing Figures 1 and 2. (Figure 3 is a theoretical
result.)

## Requirements

The code has been tested with Python 2.7 and the versions of the required
libraries for Python 2.7. This section describes both our code's dependencies
and those of the underlying subject systems our code uses in the evaluation.

The `Figure1/GenerateFigure1TestSuites.sh` and
`Figure2/GenerateFigure2TestSuites.sh` scripts require three Python libraries:
`matplotlib`, `scikit-learn`, and `numpy`. There are many ways to install
them. For example, if you use [MacPorts](https://www.macports.org/), run the
following to make sure Python and the three packages are installed:

```shell
sudo port install python27
sudo port install py27-matplotlib
sudo port install py27-scikit-learn
sudo port install py27-numpy
```
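Before running the scripts, you can check that the three libraries import under the interpreter you plan to use. The sketch below is not part of the replication package: `missing_modules` is a hypothetical helper, and it assumes the MacPorts interpreter is on your `PATH` as `python2.7`. Note that `scikit-learn` is imported under the module name `sklearn`.

```shell
# Hypothetical helper (not part of the replication package): print each
# module that the given interpreter cannot import, one per line.
# Usage: missing_modules INTERPRETER MODULE...
missing_modules() {
  py=$1
  shift
  for m in "$@"; do
    # Each import runs in a child process; only its exit status matters.
    if ! "$py" -c "import $m" >/dev/null 2>&1; then
      echo "$m"
    fi
  done
}

# Assumes MacPorts installed the interpreter as python2.7.
# scikit-learn's import name is sklearn.
missing=$(missing_modules python2.7 matplotlib sklearn numpy)
if [ -n "$missing" ]; then
  echo "Missing Python modules:" $missing
else
  echo "All required Python modules are importable."
fi
```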

## The results

This replication package reproduces the results of two experiments.

Experiment 1 produces the data for Figure 1 in [Fairness Testing: Testing
Software for
Discrimination](http://people.cs.umass.edu/~brun/pubs/pubs/Galhotra17fse.pdf).
This experiment computes the group and causal discrimination scores for a
total of 20 instances of the eight subject software systems.
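The two scores can be illustrated with a small sketch. Informally, the group discrimination score of a characteristic is the difference between the highest and lowest fraction of positive outputs across the characteristic's values, and the causal discrimination score is the fraction of inputs whose output changes when only that characteristic's value is changed (the paper gives the precise definitions). The `awk` functions below compute both from hypothetical CSV files; the file layouts, the names `group_score` and `causal_score`, and the sample data are illustrative assumptions, not Themis's output format or implementation.

```shell
# Hypothetical helpers illustrating the two scores; not Themis code.

# group_score FILE: FILE has lines "value,output" (output is 1 or 0).
# Prints max minus min, over values, of the fraction of outputs equal to 1.
group_score() {
  awk -F, '
    { n[$1]++; pos[$1] += ($2 == 1) }
    END {
      first = 1
      for (v in n) {
        f = pos[v] / n[v]
        if (first || f > max) max = f
        if (first || f < min) min = f
        first = 0
      }
      printf "%.4f\n", max - min
    }' "$1"
}

# causal_score FILE: FILE has lines "input_id,value,output", where rows that
# share an input_id differ only in the characteristic's value.
# Prints the fraction of inputs whose output is not the same for all values.
causal_score() {
  awk -F, '
    !($1 in seen) { seen[$1] = $3; total++; next }
    $3 != seen[$1] && !($1 in changed) { changed[$1] = 1; nchanged++ }
    END { printf "%.4f\n", (total ? nchanged / total : 0) }' "$1"
}

tmp=$(mktemp -d)
printf '%s\n' male,1 male,1 male,0 male,1 \
              female,1 female,0 female,0 female,1 > "$tmp/group.csv"
group_score "$tmp/group.csv"    # male 3/4, female 2/4: prints 0.2500
printf '%s\n' 1,male,1 1,female,1 2,male,1 2,female,0 \
              3,male,0 3,female,0 4,male,1 4,female,0 > "$tmp/causal.csv"
causal_score "$tmp/causal.csv"  # inputs 2 and 4 change: prints 0.5000
rm -r "$tmp"
```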

Experiment 2 produces the data for Figure 2 in [Fairness Testing: Testing
Software for
Discrimination](http://people.cs.umass.edu/~brun/pubs/pubs/Galhotra17fse.pdf).
For each of the 20 subject instances, this experiment computes the sets of
sensitive characteristics that the instance causally discriminates against
at least 5% and that contribute to subsets of characteristics that are
discriminated against at least 75%.

## Reproducing the results

Reproducing each figure consists of two steps:

1. Using Themis to generate a test suite for each of the 20 instances of the
   eight subject systems (this process also executes the test suites).
2. Post-processing the results.

For both figures, step 1 takes a long time to execute, so the replication
package ships with the generated test suites. You can therefore skip step 1
and run step 2 directly; step 2 is fast and produces the data shown in
Figures 1 and 2.

The replication package contains four scripts, one for each step of each of
the two figures:

### `Figure1/GenerateFigure1TestSuites.sh`

This script produces the necessary test suites for Figure 1. It generates
multiple test suites for each of the 20 subject system instances, and each
instance has to be trained on training data before it can be executed. As a
result, this script takes a very long time to run.

This script populates the `Figure1/Scripts` directory. (Recall that the
replication package ships with this directory already populated, so you can
skip this script to save time.)
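Because regenerating the test suites yields slightly different suites on each run (see the note on nondeterminism below), you may want to keep a copy of the shipped directory before rerunning this script. The helper below is a hypothetical sketch, not part of the replication package; it assumes the generation script rewrites the directory's contents in place, and it works equally for `Figure2/Scripts`.

```shell
# Hypothetical helper (not part of the replication package): copy a
# directory to a timestamped sibling, e.g. Figure1/Scripts.bak-20170902-120000,
# and print the copy's path.
backup_dir() {
  src=${1%/}
  if [ ! -d "$src" ]; then
    echo "backup_dir: no such directory: $src" >&2
    return 1
  fi
  dest="$src.bak-$(date +%Y%m%d-%H%M%S)"
  if [ -e "$dest" ]; then
    echo "backup_dir: $dest already exists" >&2
    return 1
  fi
  # -R copies recursively; -p preserves modes and modification times.
  cp -Rp "$src" "$dest" && echo "$dest"
}
```

For example, run `backup_dir Figure1/Scripts` from the package root before regenerating the Figure 1 test suites.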

### `Figure1/GenerateFigure1.sh`

This script processes the data in the `Figure1/Scripts` directory (which
either ships with the replication package or is generated by
`Figure1/GenerateFigure1TestSuites.sh`) to produce the tabulated data in
Figure 1.

### `Figure2/GenerateFigure2TestSuites.sh`

This script produces the necessary test suites for Figure 2. As for Figure 1,
it generates multiple test suites for each of the 20 subject system
instances, and each instance has to be trained on training data before it can
be executed, so this script also takes a very long time to run.

This script populates the `Figure2/Scripts` directory. (Recall that the
replication package ships with this directory already populated, so you can
skip this script to save time.)
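Because both generation scripts run for a very long time, it can help to run them in the background with their output captured in a log. The wrapper below is a hypothetical sketch, not part of the replication package: `run_logged` appends a start line, the command's output, and an end line with the exit status to a log file of your choosing, and returns the command's exit status.

```shell
# Hypothetical wrapper: run a command, appending its stdout/stderr plus
# start time, end time, and exit status to LOGFILE. Returns the command's
# exit status.
# Usage: run_logged LOGFILE COMMAND [ARG...]
run_logged() {
  log=$1
  shift
  echo "[$(date)] start: $*" >> "$log"
  "$@" >> "$log" 2>&1
  status=$?
  echo "[$(date)] end (exit status $status): $*" >> "$log"
  return "$status"
}
```

For example, `run_logged fig2-suites.log sh Figure2/GenerateFigure2TestSuites.sh &` runs the generation in the background (the log file name is arbitrary).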

### `Figure2/GenerateFigure2.sh`

This script processes the data in the `Figure2/Scripts` directory (which
either ships with the replication package or is generated by
`Figure2/GenerateFigure2TestSuites.sh`) to produce the tabulated data in
Figure 2.
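The two steps for both figures can be scripted as below. This is a sketch under assumptions, not part of the replication package: it is meant to be saved as a file such as `reproduce.sh` (a hypothetical name) in the package root, it assumes each of the four scripts can be run with `sh` from there, and `REGENERATE` and `DRY_RUN` are conventions of this sketch only, not options the replication scripts understand. By default it skips step 1 and reuses the shipped test suites.

```shell
#!/bin/sh
# Sketch (not part of the replication package): reproduce the figures named
# on the command line, e.g. `sh reproduce.sh 1 2`, from the package root.
# REGENERATE=1 also reruns step 1 (very slow); DRY_RUN=1 only prints the
# commands. Both variables are conventions of this sketch only.
REGENERATE=${REGENERATE:-0}
DRY_RUN=${DRY_RUN:-0}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# reproduce_figure N: optional step 1, then step 2, for Figure N.
reproduce_figure() {
  n=$1
  if [ "$REGENERATE" = 1 ]; then
    # Step 1: generate and execute the test suites (populates FigureN/Scripts).
    run sh "Figure$n/GenerateFigure${n}TestSuites.sh" || return
  fi
  # Step 2: post-process FigureN/Scripts into the figure's tabulated data.
  run sh "Figure$n/GenerateFigure$n.sh"
}

for fig in "$@"; do
  reproduce_figure "$fig" || exit 1
done
```

For example, `DRY_RUN=1 sh reproduce.sh 1 2` prints the two step-2 commands without running them. If the scripts expect to be run from inside their own `Figure1/` or `Figure2/` directory, adjust the paths accordingly.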

## Note on nondeterminism

Our replication package goes to great lengths to eliminate sources of
nondeterminism. While Themis uses randomness in its test suite generation, it
uses a seed parameter to make the randomness deterministic. These seeds are
encoded in the above scripts. However, the underlying subject systems also
exhibit nondeterminism. We cannot control this nondeterminism, which is
typical when using real-world, off-the-shelf software. Because Themis is
adaptive and its test suite generation depends on the underlying system's
outputs on the inputs Themis generates, the subject systems' nondeterminism
affects the test suites. As a result, running
`Figure1/GenerateFigure1TestSuites.sh` and
`Figure2/GenerateFigure2TestSuites.sh` produces slightly different test
suites each time, and these differences may lead to small differences in the
final, processed data.
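If you regenerate the test suites, a numeric comparison with a tolerance can help confirm that the resulting processed data differs only slightly from the shipped results, which a plain `diff` cannot show. The sketch below assumes you have saved each run's processed data as a plain-text table of whitespace-separated fields with the same layout; the helper `max_numeric_diff`, its default tolerance of 0.01, and any file names are hypothetical, and the scripts' actual output format may differ.

```shell
# Hypothetical helper (not part of the replication package): compare two
# whitespace-separated tables of the same shape field by field. Numeric
# fields (optionally ending in %) may differ by at most TOL (default 0.01);
# other fields must match exactly. Prints the largest numeric difference and
# returns nonzero on any mismatch or if that difference exceeds TOL.
# Usage: max_numeric_diff FILE_A FILE_B [TOL]
max_numeric_diff() {
  awk -v tol="${3:-0.01}" '
    function isnum(s) {
      return s ~ /^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][-+]?[0-9]+)?%?$/
    }
    BEGIN { max = 0 }
    # First file: remember every field and the number of fields per line.
    NR == FNR {
      for (i = 1; i <= NF; i++) a[FNR, i] = $i
      nf[FNR] = NF
      n = FNR
      next
    }
    # Second file: compare against the remembered line.
    {
      m = FNR
      if (NF != nf[FNR]) {
        print "field count differs at line " FNR > "/dev/stderr"
        bad = 1
        exit
      }
      for (i = 1; i <= NF; i++) {
        x = a[FNR, i]
        y = $i
        if (isnum(x) && isnum(y)) {
          # awk converts "5%" to 5, so percentages compare numerically.
          d = x - y
          if (d < 0) d = -d
          if (d > max) max = d
        } else if (x != y) {
          print "line " FNR ", field " i ": " x " vs " y > "/dev/stderr"
          bad = 1
        }
      }
    }
    END {
      if (m != n) {
        print "files have different numbers of lines" > "/dev/stderr"
        bad = 1
      }
      printf "max difference: %g\n", max
      exit (bad || max > tol)
    }' "$1" "$2"
}
```

For example, `max_numeric_diff run1/figure1.txt run2/figure1.txt 0.02` (hypothetical paths) reports the largest difference and fails if it exceeds 0.02.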

`README.md` (2 additions, 2 deletions), updated passage:

… system. For the best explanation of the underlying problem Themis solves,
Themis algorithms, and an evaluation of Themis, read our paper [Fairness
Testing: Testing Software for
Discrimination](http://people.cs.umass.edu/~brun/pubs/pubs/Galhotra17fse.pdf).
This work won an ACM Distinguished Paper Award at the
[ESEC/FSE 2017](http://esec-fse17.uni-paderborn.de/) conference.

This repository contains Themis (in `Themis1.0/`), instructions for using
Themis (this `README.md`), and subject systems on which Themis has been …