ESEC/FSE 2017 Experimental Replication Package

Our paper Fairness Testing: Testing Software for Discrimination was published in ESEC/FSE 2017. This page contains the replication package for repeating the experiments described in the paper and reproducing Figures 1 and 2. (Figure 3 is a theoretical result.)

Requirements

The code has been tested with Python v2.7 and the versions of the associated libraries for that Python release. The requirements below cover both our code's dependencies and the dependencies of the underlying subject systems that our code uses in the evaluation.

The GenerateFigure1TestSuites.sh and GenerateFigure2TestSuites.sh scripts require three Python libraries: matplotlib, scikit-learn, and numpy. There are many ways to install them. For example, if you use MacPorts, the following commands make sure that Python 2.7 and the three packages are installed:

```sh
port install python27
port install py27-matplotlib
port install py27-scikit-learn
port install py27-numpy
```
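To check that the three libraries are importable by the Python interpreter that will run the scripts, a small check such as the one below can help. It is a hypothetical helper, not part of the replication package; it assumes scikit-learn is imported under its usual module name, `sklearn`, and it is written to run under both Python 2.7 and Python 3.

```python
from __future__ import print_function

import importlib

# Module names as they are imported in Python; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["matplotlib", "sklearn", "numpy"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All required modules are importable.")
```

Run it with the same `python` that the scripts will use; any module it reports as missing still needs to be installed.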

The results

This replication package reproduces results of two experiments.

Experiment 1 produces the data for Figure 1 in Fairness Testing: Testing Software for Discrimination. This experiment computes the group and causal discrimination scores for a total of 20 instances of the eight subject software systems.
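The two scores can be made concrete with a small sketch. As we read the paper, a system's group discrimination score with respect to a characteristic is the difference between the highest and lowest rates of positive outputs across that characteristic's values, and its causal discrimination score with respect to a set of characteristics is the fraction of inputs whose output changes when only those characteristics are altered. Themis generates test inputs rather than enumerating them; the hypothetical code below instead enumerates a tiny, made-up input space exhaustively. `DOMAINS`, `toy_system`, and the function names are illustrative assumptions, not part of Themis.

```python
from __future__ import division, print_function

import itertools

# Hypothetical toy input space: each characteristic maps to its possible values.
DOMAINS = {
    "gender": ["male", "female"],
    "race": ["group_a", "group_b"],
    "income": [0, 1, 2, 3],
}


def all_inputs(domains):
    """Yield every input as a dict; only feasible for tiny toy domains."""
    names = sorted(domains)
    for values in itertools.product(*(domains[n] for n in names)):
        yield dict(zip(names, values))


def group_score(system, domains, characteristic):
    """Highest minus lowest rate of True outputs across one characteristic's values."""
    rates = []
    for value in domains[characteristic]:
        outputs = [system(x) for x in all_inputs(domains) if x[characteristic] == value]
        rates.append(sum(outputs) / len(outputs))
    return max(rates) - min(rates)


def causal_score(system, domains, characteristics):
    """Fraction of inputs whose output changes when only `characteristics` are altered."""
    inputs = list(all_inputs(domains))
    changed = 0
    for x in inputs:
        # Try every alternative assignment to the given characteristics.
        for values in itertools.product(*(domains[c] for c in characteristics)):
            y = dict(x)
            y.update(zip(characteristics, values))
            if system(y) != system(x):
                changed += 1
                break
    return changed / len(inputs)


def toy_system(x):
    """A deliberately biased decision rule, used only for illustration."""
    if x["gender"] == "female":
        return x["income"] >= 2
    return x["income"] >= 1
```

For this toy system, `group_score(toy_system, DOMAINS, "gender")` and `causal_score(toy_system, DOMAINS, ["gender"])` both come out to 0.25, while both scores for `"race"` are 0.0, since the toy rule never looks at race.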

Experiment 2 produces the data for Figure 2 in Fairness Testing: Testing Software for Discrimination. For each of the 20 subject instances, this experiment computes the sets of sensitive characteristics that the instance discriminates against causally at least 5% and that contribute to subsets of characteristics discriminated against at least 75%.
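Figure 2 concerns subsets of sensitive characteristics rather than single ones. As a rough illustration only, and not Themis's actual search (which works from generated test inputs), the sketch below exhaustively scores every subset of a hypothetical list of sensitive characteristics on a tiny toy input space and reports the minimal subsets whose causal discrimination score meets a threshold. The toy data, `toy_system`, and all function names are assumptions made for this example.

```python
from __future__ import division, print_function

import itertools

# Hypothetical toy input space (illustrative only).
DOMAINS = {
    "gender": ["male", "female"],
    "race": ["group_a", "group_b"],
    "income": [0, 1, 2, 3],
}


def all_inputs(domains):
    """Yield every input as a dict; only feasible for tiny toy domains."""
    names = sorted(domains)
    for values in itertools.product(*(domains[n] for n in names)):
        yield dict(zip(names, values))


def causal_score(system, domains, characteristics):
    """Fraction of inputs whose output changes when only `characteristics` are altered."""
    inputs = list(all_inputs(domains))
    changed = 0
    for x in inputs:
        for values in itertools.product(*(domains[c] for c in characteristics)):
            y = dict(x)
            y.update(zip(characteristics, values))
            if system(y) != system(x):
                changed += 1
                break
    return changed / len(inputs)


def minimal_discriminating_subsets(system, domains, sensitive, threshold):
    """Exhaustively find minimal subsets of `sensitive` with causal score >= threshold."""
    found = []
    for size in range(1, len(sensitive) + 1):
        for subset in itertools.combinations(sensitive, size):
            # A superset of an already-reported subset is not minimal; skip it.
            if any(set(f) <= set(subset) for f in found):
                continue
            if causal_score(system, domains, list(subset)) >= threshold:
                found.append(subset)
    return found


def toy_system(x):
    # Rejects low-income inputs only for one combination of gender and race.
    if x["gender"] == "female" and x["race"] == "group_b":
        return x["income"] >= 3
    return True
```

Here neither `gender` nor `race` alone reaches a 50% causal score (each scores 0.375), but together they score 0.75, so `minimal_discriminating_subsets(toy_system, DOMAINS, ["gender", "race"], 0.5)` returns `[("gender", "race")]`; with a threshold of 0.3 it returns `[("gender",), ("race",)]` instead. Processing subsets in order of increasing size and skipping supersets of already-reported subsets is what keeps the results minimal.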

Reproducing the results

Reproducing each figure consists of two steps:

  1. Using Themis to produce the test suites for each of the 20 instances of the eight subject systems (this process also executes the test suites).
  2. Post-processing the results.

For both figures, step 1 takes a long time to execute, so the replication package ships with the already-generated test suites. You can therefore skip step 1 and run step 2 straight away. Step 2 is very fast and produces the data shown in Figures 1 and 2.

The replication package contains four scripts, one for each step of each of the two figures:

Figure1/GenerateFigure1TestSuites.sh

This script produces the test suites needed for Figure 1. There are multiple test suites for each of the 20 subject system instances, and each instance has to be trained on training data before it can be executed. As a result, this script takes a very long time to run.

This script populates the Figure1/Scripts directory. (Recall that the replication package ships with this directory already populated, so you can skip this script to save time.)

Figure1/GenerateFigure1.sh

This script processes the data in the Figure1/Scripts directory (which either ships with the replication package or is generated by Figure1/GenerateFigure1TestSuites.sh) to produce the tabulated data in Figure 1.

Figure2/GenerateFigure2TestSuites.sh

This script produces the test suites needed for Figure 2. Again, there are multiple test suites for each of the 20 subject system instances, and each instance has to be trained on training data before it can be executed. As a result, this script takes a very long time to run.

This script populates the Figure2/Scripts directory. (Recall that the replication package ships with this directory already populated, so you can skip this script to save time.)

Figure2/GenerateFigure2.sh

This script processes the data in the Figure2/Scripts directory (which either ships with the replication package or is generated by Figure2/GenerateFigure2TestSuites.sh) to produce the tabulated data in Figure 2.
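For convenience, the two-step workflow can be scripted. The hypothetical Python driver below runs only the fast post-processing scripts by default and, when asked, first runs the slow test suite generation scripts. It assumes it is invoked from the repository root and that each script is a bash script that can be run from its own figure directory; if the scripts expect a different working directory or shell, adjust `run` accordingly. `plan` and `run` are illustrative names, not part of the replication package.

```python
from __future__ import print_function

import os
import subprocess

# Script paths as listed in this README, relative to the repository root.
GENERATE_TEST_SUITES = [
    "Figure1/GenerateFigure1TestSuites.sh",
    "Figure2/GenerateFigure2TestSuites.sh",
]
POST_PROCESS = [
    "Figure1/GenerateFigure1.sh",
    "Figure2/GenerateFigure2.sh",
]


def plan(regenerate=False):
    """Return scripts in run order; step 1 is included only if `regenerate` is True."""
    scripts = list(GENERATE_TEST_SUITES) if regenerate else []
    return scripts + list(POST_PROCESS)


def run(regenerate=False):
    """Run the planned scripts, stopping at the first one that fails."""
    for path in plan(regenerate):
        # Assumption: each script is a bash script meant to run from its figure directory.
        directory, script = os.path.split(path)
        print("Running", path)
        subprocess.check_call(["bash", script], cwd=directory)
```

From the repository root, `run()` reproduces the data for both figures from the shipped test suites, while `run(regenerate=True)` first rebuilds the test suites, which takes a very long time.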

Note on nondeterminism

Our replication package goes to great lengths to eliminate sources of nondeterminism. Although Themis uses randomness in its test suite generation, it takes a seed parameter that makes this randomness deterministic, and these seeds are encoded in the scripts above. However, the underlying subject systems are also nondeterministic, and, as is typical of real-world, off-the-shelf software, we cannot control that nondeterminism. Because Themis is adaptive, meaning its test suite generation depends on the underlying system's outputs for the inputs Themis generates, the subject systems' nondeterminism affects the test suites. As a result, running Figure1/GenerateFigure1TestSuites.sh and Figure2/GenerateFigure2TestSuites.sh produces slightly different test suites each time, and these differences may lead to small differences in the final, processed data.
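The following toy sketch illustrates why a fixed seed is not enough when generation is adaptive. The generator's own random choices come from a private, seeded `random.Random`, so with a deterministic system the suite is identical on every run; with a system that behaves nondeterministically, its outputs steer later samples differently and the suites can diverge after the first input. `generate_suite`, `deterministic_system`, and `flaky_system` are hypothetical stand-ins, not Themis's actual generator or any of the subject systems.

```python
from __future__ import print_function

import random


def generate_suite(system, seed, size=10):
    """Hypothetical adaptive generator: each new input depends on earlier outputs."""
    rng = random.Random(seed)  # private, seeded RNG: the generator's own draws are reproducible
    low, high = 0, 100
    suite = []
    for _ in range(size):
        x = rng.randint(low, high)
        suite.append(x)
        # Adaptive step: narrow the sampling range based on the system's answer.
        if system(x):
            high = x
        else:
            low = x
    return suite


def deterministic_system(x):
    return x > 50


def flaky_system(x):
    # Simulates a system whose behavior we do not control: it uses the global,
    # unseeded RNG and occasionally flips its answer.
    answer = x > 50
    return (not answer) if random.random() < 0.2 else answer


if __name__ == "__main__":
    # Same seed, deterministic system: the two suites are always identical.
    print(generate_suite(deterministic_system, seed=1) == generate_suite(deterministic_system, seed=1))
    # Same seed, flaky system: the suites can differ after the first input.
    print(generate_suite(flaky_system, seed=1))
    print(generate_suite(flaky_system, seed=1))
```

The first input is always the same for a given seed, because it is drawn before the system has produced any output; every later input depends on what the system answered.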