added the README for the replication package

LASER-UMASS · Sep 2, 2017 · 7659269
1 parent 907f2f6
commit 7659269
Showing 2 changed files with 114 additions and 2 deletions.
diff --git a/ESEC.FSE.2017.Experimental.Replication/README.md b/ESEC.FSE.2017.Experimental.Replication/README.md
@@ -0,0 +1,112 @@
+# ESEC/FSE 2017 Experimental Replication Package
+
+Our paper [Fairness Testing: Testing Software for
+Discrimination](http://people.cs.umass.edu/~brun/pubs/pubs/Galhotra17fse.pdf) 
+was published in [ESEC/FSE 2017](http://esec-fse17.uni-paderborn.de/). 
+This page contains the replication package to repeat the experiments
+described in the paper, and reproduce Figures 1 and 2. (Figure 3 is a
+theoretical result.)
+
+## Requirements
+
+The code has been tested with Python v.2.7 and the associated libraries for that
+version of Python. This requirement covers both our code's dependencies and the
+dependencies of the underlying subject system our code uses in the evaluation.
+
+The `GenerateFigure1TestSuites.sh` and `GenerateFigure2TestSuites.sh` scripts
+require three Python libraries: `matplotlib`, `scikit-learn`, and `numpy`.
+There are many ways to install these. For example, if you use
+[MacPorts](https://www.macports.org/), run the following to make sure Python
+and the three packages are installed:
+
+```
+port install python27
+port install py27-matplotlib
+port install py27-scikit-learn
+port install py27-numpy
+```
+
+## The results
+
+This replication package reproduces the results of two experiments.
+
+Experiment 1 produces the data for Figure 1 in [Fairness Testing: Testing
+Software for
+Discrimination](http://people.cs.umass.edu/~brun/pubs/pubs/Galhotra17fse.pdf).
+This experiment computes the group and causal discrimination scores for a
+total of 20 instances of the eight subject software systems.
+
+Experiment 2 produces data for Figure 2 in [Fairness Testing: Testing
+Software for
+Discrimination](http://people.cs.umass.edu/~brun/pubs/pubs/Galhotra17fse.pdf).
+This experiment computes the sets of sensitive characteristics that the 20
+subject instances discriminate against causally at least 5% and that
+contribute to subsets of characteristics that are discriminated against at
+least 75%.
+
+## Reproducing the results
+
+Reproducing each figure consists of two steps:
+1. Using Themis to produce a test suite for each of the 20 instances of the
+eight subject systems (this process also executes the test suites).
+2. Post-processing the results.  
+
+Step 1, for both figures, takes a long time to execute, so the replication
+package ships with the produced test suites. It is thus possible to skip step
+1 and run step 2 straight away. Step 2 is very fast and produces the
+data you see in Figures 1 and 2.
+
+There are four scripts in the replication package (one for each step for each
+of the two figures):
+
+### `Figure1/GenerateFigure1TestSuites.sh`
+
+This script produces the necessary test suites for Figure 1. There are
+multiple test suites per subject system instance, and 20 subject system
+instances. Each instance has to be trained on training data before being
+executed. Thus, this script takes a very long time to execute.
+
+This script populates the `Figure1/Scripts` directory. (Recall that this
+directory is already pre-populated with the scripts so that it is possible to
+skip this script to save time.)
+
+### `Figure1/GenerateFigure1.sh`
+
+This script processes the data in the `Figure1/Scripts` directory (that
+either comes with the replication package or is generated by
+`Figure1/GenerateFigure1TestSuites.sh`) to produce the tabulated data in
+Figure 1.
+
+### `Figure2/GenerateFigure2TestSuites.sh`
+
+This script produces the necessary test suites for Figure 2. Again, there are
+multiple test suites per subject system instance, and 20 subject system
+instances, and each instance has to be trained on training data before being
+executed. Thus, this script takes a very long time to execute.
+
+This script populates the `Figure2/Scripts` directory. (Recall that this
+directory is already pre-populated with the scripts so that it is possible to
+skip this script to save time.)
+
+### `Figure2/GenerateFigure2.sh`
+
+This script processes the data in the `Figure2/Scripts` directory (that
+either comes with the replication package or is generated by
+`Figure2/GenerateFigure2TestSuites.sh`) to produce the tabulated data in
+Figure 2.
+
+## Note on nondeterminism
+
+Our replication package goes to great lengths to eliminate sources of
+nondeterminism. While Themis uses randomness in its test suite generation, it
+uses a seed parameter to make the randomness deterministic. These seeds are
+encoded in the above scripts. However, the underlying subject systems also
+exhibit nondeterminism. We cannot control this nondeterminism, which is
+typical when using real-world, off-the-shelf software. Because Themis is
+adaptive and its test suite generation depends on the underlying system's
+outputs on the inputs Themis generates, the subject systems' nondeterminism
+affects the test suites. As such, running
+`Figure1/GenerateFigure1TestSuites.sh` and
+`Figure2/GenerateFigure2TestSuites.sh` will produce slightly different test
+suites each time. These differences may result in small differences in the
+final, processed data.
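
The effect described in the README's note on nondeterminism can be illustrated with a small sketch. This is not Themis code: `generate_suite`, `deterministic_system`, `make_noisy_system`, and the adaptive rule (step up or down depending on the last output) are hypothetical names and choices made up for illustration. The sketch only shows that a fixed seed makes the generator's own random choices repeatable, while an adaptive generator can still produce different suites when the subject system's outputs vary.

```python
import random


def generate_suite(system, seed, size=10):
    """Hypothetical adaptive generator (not the Themis algorithm)."""
    rng = random.Random(seed)  # fixed seed: the generator's own draws are repeatable
    suite = []
    x = rng.randint(0, 100)  # the first input depends only on the seed
    for _ in range(size):
        y = system(x)
        suite.append((x, y))
        # Adaptive step: the observed output decides where the next input goes,
        # so any variation in the system's output changes the rest of the suite.
        delta = rng.randint(1, 10)
        x = x + delta if y else x - delta
    return suite


def deterministic_system(x):
    """Stand-in subject system whose output depends only on its input."""
    return x % 2 == 0


def make_noisy_system(noise_rng, flip_prob=0.3):
    """Stand-in subject system that sometimes flips its output."""
    def system(x):
        y = deterministic_system(x)
        return (not y) if noise_rng.random() < flip_prob else y
    return system


if __name__ == "__main__":
    # Deterministic subject system: same seed, same suite.
    print(generate_suite(deterministic_system, seed=42) ==
          generate_suite(deterministic_system, seed=42))
    # Noisy subject system (unseeded noise): same generator seed,
    # but the two suites may differ from each other and between runs.
    noisy = make_noisy_system(random.Random())
    print(generate_suite(noisy, seed=42))
    print(generate_suite(noisy, seed=42))
```

In this sketch the first test input is always identical for a given seed, because it is drawn before the subject system is ever called. Divergence can only start at the second input, which mirrors the README's point that seeding alone cannot make an adaptive test generator fully reproducible.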
diff --git a/README.md b/README.md
@@ -5,8 +5,8 @@ system. For the best explanation of the underlying problem Themis solves,
 Themis algorithms, and an evaluation of Themis, read our paper [Fairness
 Testing: Testing Software for
 Discrimination](http://people.cs.umass.edu/~brun/pubs/pubs/Galhotra17fse.pdf).
- This work won an ACM Distinguished Paper Award at the ESEC/FSE 2017
-conference.
+This work won an ACM Distinguished Paper Award at the 
+[ESEC/FSE 2017](http://esec-fse17.uni-paderborn.de/) Conference.
 
 This repository contains Themis (in `Themis1.0/`), instructions for using
 Themis (this `README.md`), and subject systems on which Themis has been