If you have any questions about this repository, please feel free to open an issue or send me an email at bast@cs.uni-freiburg.de with subject "ESA 2018 experiment".
The ESA 2018 Experiment was an in-depth analysis of two parallel program committees reviewing the complete set of submissions independently. This repository provides the (anonymized) data behind the experiment, as well as a Python script to analyze and visualize the data in various ways. It also contains the blog post published at BLOG@CACM: https://github.com/ad-freiburg/esa2018-experiment/blob/master/BLOGPOST.md
The slides from the report presented at the business meeting of the conference can be found here: http://ad-publications.informatik.uni-freiburg.de/ESA_experiment_Bast_2018.pdf
The anonymized data is given in six files, one for each PC and each reviewing phase. For example, `scores-phase1-pc2.tsv` contains a snapshot of the scores from PC2 after Phase 1. There is one line per submission, and a given submission has the same line number in all files. The first eight columns are pairs of review score and confidence score. Most submissions received three reviews, in which case the seventh and eighth columns are empty. The voting results after Phase 3 are recorded in an additional ninth and tenth column (the average score and confidence from the votes).
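For concreteness, here is a minimal sketch of how one of these files could be read in Python. It is not part of the repository (the `analyze.py` script is the authoritative reader), and the helper name `read_scores` is made up for illustration:

```python
import csv

def read_scores(filename):
    """Read one score file, e.g. scores-phase1-pc2.tsv.

    Returns a list with one entry per submission (in file order, so the
    i-th entry refers to the same submission in all six files). Each
    entry is a list of (review score, confidence score) pairs; the
    seventh and eighth column are empty if a submission received only
    three reviews.
    """
    submissions = []
    with open(filename) as f:
        for row in csv.reader(f, delimiter="\t"):
            pairs = []
            for i in range(0, 8, 2):
                if i + 1 < len(row) and row[i] != "":
                    pairs.append((float(row[i]), float(row[i + 1])))
            submissions.append(pairs)
    return submissions
```

For the Phase 3 files, the additional ninth and tenth column (`row[8]` and `row[9]`) hold the average vote score and confidence.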
Here are explanations of a few details from the blog post. They also show by example how the `analyze.py` script works. Some of the explanations refer to the slides linked above.
- If a fraction p_i of the papers is accepted with probability a_i, then the expected overlap is Σ_i p_i a_i² / Σ_i p_i a_i (see the first sketch after this list). For the simple model, where each paper is accepted independently of the others with the same fixed acceptance rate, the expected overlap is simply that acceptance rate.
- See slide 5 for the exact semantics of the scores. See slide 9 for the detailed scores of the 9 papers that were a "clear accept" in at least one PC.
- Run `python3 analyze.py` to see the definition of the `l5` score for each paper (a single rule-based score from -2, -1, 0, +1, +2). It is also explained on slide 10. To see the confusion matrix between the two PCs after each phase, run `python3 analyze.py l5 --confusion-pcs`. To see how often which score was given by which PC, run `python3 analyze.py l5 --print` and execute the produced gnuplot script. The bars on the left show the clear rejects after each reviewing phase.
- To compute the Kendall tau correlation of the upper part of the ranking of the two PCs, run `python3 analyze.py avt`. The script also explains the `avt` score (the `l5` score, but set to zero unless at least one reviewer gave the paper a +2). To compute the p-values of an R-test, run `python3 analyze.py avt --rtest`. A sketch of the Kendall tau computation follows after this list.
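To make the overlap formula from the first item above concrete, here is a small sketch. It is not code from the repository, and the numbers in the mixed model are hypothetical:

```python
def expected_overlap(groups):
    """Expected overlap between two independent PCs.

    groups is a list of (p_i, a_i) pairs: a fraction p_i of the papers
    is accepted with probability a_i. The expected overlap is then
    sum_i p_i * a_i^2 / sum_i p_i * a_i.
    """
    num = sum(p * a * a for p, a in groups)
    den = sum(p * a for p, a in groups)
    return num / den

# Simple model: every paper is accepted independently with rate 0.25,
# so the expected overlap equals the acceptance rate itself.
print(expected_overlap([(1.0, 0.25)]))  # 0.25

# A hypothetical mixed model: 10% clear accepts (a = 0.9), 30%
# borderline papers (a = 0.4), 60% clear rejects (a = 0.05).
print(expected_overlap([(0.1, 0.9), (0.3, 0.4), (0.6, 0.05)]))  # ~0.54
```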
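And here is a minimal sketch of the Kendall tau computation referenced in the last item, using `scipy.stats.kendalltau`. The `avt` values below are made up for illustration, and `analyze.py` may compute the statistic differently (in particular, in how it restricts the comparison to the upper part of the ranking):

```python
from scipy.stats import kendalltau

# Hypothetical per-paper avt scores from the two PCs, in the same
# paper order (a submission has the same line number in all files).
# The real values are computed by analyze.py from the score files.
avt_pc1 = [2, 0, 1, 0, 2, -1, 0, 1]
avt_pc2 = [1, 0, 2, 0, 2, 0, -1, 1]

tau, p_value = kendalltau(avt_pc1, avt_pc2)
print(f"Kendall tau = {tau:.3f}, p-value = {p_value:.3g}")
```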