UNIST-LOFT/poracle

What this repository is about

In our paper titled Poracle: Testing Patches Under Preservation Conditions To Combat the Overfitting Problem of Program Repair, we propose a new patch validation approach. Our approach involves (1) generalizing an existing failing test to add a preservation condition (the condition under which program behavior should be preserved between a buggy version and its patched version) and (2) performing differential fuzzing, whose goal is to find a witness input for which the buggy version and its patched version produce different outputs even though the provided preservation condition is satisfied.
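
The following is a minimal sketch of this idea (ours, not poracle's implementation; Program, the preservation predicate, and the input stream are all assumed types):

import java.util.Iterator;
import java.util.Optional;
import java.util.function.Predicate;

// Program stands for one version (buggy or patched) of the subject.
interface Program { String run(String input); }

class DiffFuzzSketch {
    // Search the input stream for a "witness": an input that satisfies the
    // preservation condition yet makes the two versions produce different
    // outputs. Finding one leads to a REJECT recommendation.
    static Optional<String> findWitness(Program buggy, Program patched,
                                        Predicate<String> preservationCond,
                                        Iterator<String> inputs) {
        while (inputs.hasNext()) {
            String in = inputs.next();
            if (!preservationCond.test(in)) continue;    // condition must hold
            if (!buggy.run(in).equals(patched.run(in)))  // outputs differ?
                return Optional.of(in);                  // witness found
        }
        return Optional.empty();                         // none found: accept
    }
}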

In this repository, we share the artifacts we used in our study. Our artifacts consist of a Docker container image for reproducing our experiments, the patch benchmark and its configuration files, the generalized tests, evidence for the mislabeled patches, and our user study materials.

Downloading and accessing the container

Our Docker container image can be downloaded as follows:

$ docker pull poracle100/poracle-reproduce:latest

Once the image is downloaded, a container can be started as follows.

$ docker run -p 1000:22 --name poracle-reproduce --rm -d poracle100/poracle-reproduce:latest

Note that our container runs sshd, and it can be accessed through port 1000 as follows (you can replace port 1000 with any other port).

$ ssh root@localhost -p 1000

The password of the container is "poracle".

Basic usage

Our main script (i.e., poracle) is available in the /poracle-experiments directory.

cd /poracle-experiments/

The following shows an example usage of poracle.

$ poracle configs/Patch7.json --duration 10m

where the time limit for fuzzing is set to 10 minutes, and the information about the target patch is given through configs/Patch7.json, which contains the following:

{"ID": "Patch7", "tool": "Nopol2015", "correctness": "Incorrect", "bug_id": "5", "project": "Chart",
  "target": ["org/jfree/data/xy/XYSeries.java:563"]}
  • ID: patch ID
  • tool: the tool that produced the patch
  • correctness: the label of the patch (either Correct or Incorrect)
  • bug_id: bug ID as defined in Defects4J
  • project: project as defined in Defects4J
  • target: patch line(s)

Once the poracle script terminates normally, the following line will be printed (we internally use a fuzzer, so the result may occasionally differ):

INFO     poracle         Results: [(<Judge.REJECT: 1>, <Validate.MATCH: 0>)]

Judge.REJECT denotes that the patch is recommended to be rejected (ignore the number 1, which is simply the internal ID for Judge.REJECT). Validate.MATCH is explained later. Note that we reject a patch when an output difference is observed between the patched version and its original version while the preservation condition holds. To see which input causes the output difference, type the following (the .poracle directory is created when the poracle script runs):

$ ls .poracle/fuzz-results/test1/1/diff_out/

When a patch is recommended to be rejected, the diff_out directory contains one input file. Let's say the diff_out directory contains id_000000005. The id_000000005 file is in a binary format, and its corresponding text-format file is available at the following path (replace id_000000005 with the actual ID you obtained):

.poracle/log/test1/1/ORG/id_000000005/IN.log

At the time of writing this document, the following IN.log file was obtained (you may obtain different values due to the randomness of fuzzing).

-1;-2;1;0.8926526466663729;0.9953236612191507

The five values delimited by semicolons represent the values of the five parameters of the following test:

public void testBug1955483(@InRange(minInt=-4, maxInt=6) int i1,
                           @InRange(minInt=-4, maxInt=6) int i2,
                           @InRange(minInt=-4, maxInt=6) int i3,
                           @InRange(minDouble=0, maxDouble=1) double ch1,
                           @InRange(minDouble=0, maxDouble=1) double ch2) {
    ...
    series.addOrUpdate(i1, i2);
    ...
}

where the @InRange annotation is used to define the range of each of the five random variables. We assigned a range based on the original value of each random variable. For example, i1 is used in series.addOrUpdate(i1, i2), which corresponds to series.addOrUpdate(1.0, 1.0) in the original Defects4J test. Notice that we replace the value 1.0 with the range [-4, 6]. Note that the specified ranges of the parameters are only initial ones; the actual ranges are adjusted during the fuzzing process, as described in the paper.
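
For concreteness, the following sketch (ours, not part of the repository) shows how such a semicolon-delimited IN.log line maps onto the five test parameters:

class WitnessInputSketch {
    public static void main(String[] args) {
        // The IN.log line shown above, split into the five parameter values.
        String[] v = "-1;-2;1;0.8926526466663729;0.9953236612191507".split(";");
        int i1 = Integer.parseInt(v[0]);        // -> parameter i1
        int i2 = Integer.parseInt(v[1]);        // -> parameter i2
        int i3 = Integer.parseInt(v[2]);        // -> parameter i3
        double ch1 = Double.parseDouble(v[3]);  // -> parameter ch1
        double ch2 = Double.parseDouble(v[4]);  // -> parameter ch2
        System.out.printf("i1=%d, i2=%d, i3=%d, ch1=%f, ch2=%f%n",
                          i1, i2, i3, ch1, ch2);
    }
}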

When input id_000000005 is used, a different output is produced between the original version and its patched version. To see the difference, do the following:

$ diff .poracle/log/test1/1/ORG/id_000000005/OUT.log .poracle/log/test1/1/PATCH/id_000000005/OUT.log

You may see a result similar to the following:

1c1
< -2.0;-1.0;-1.0;-2.0;1.0;-2.0
\ No newline at end of file
---
> -1.0;-2.0;-2.0;-1.0;1.0;-2.0
\ No newline at end of file

Our poracle script also validates its patch validation result by making use of the ground-truth correct version available through Defects4J. By running the correct version with the obtained difference-witnessing input (id_000000005 in our running example), it can be checked whether the patched version also produces an output that differs from that of the ground-truth correct version.

Recall that the poracle script produces the following result.

INFO     poracle         Results: [(<Judge.REJECT: 1>, <Validate.MATCH: 0>)]

Validate.MATCH in the output denotes that an output difference is indeed also observed between the patched version and its (developer-provided) correct version; in other words, our recommendation matches the ground truth. The following table describes the 4 possible cases. Note that when no difference-witnessing input is found, poracle recommends ACCEPT, and its validity is checked against the ground-truth label available in our benchmark.

Judge         Validate           Description
Judge.REJECT  Validate.MATCH     A rejection recommendation is correct
Judge.REJECT  Validate.MISMATCH  A rejection recommendation is incorrect
Judge.ACCEPT  Validate.MATCH     An acceptance recommendation is correct
Judge.ACCEPT  Validate.MISMATCH  An acceptance recommendation is incorrect
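
As a minimal sketch of these four cases (ours, not poracle's code; the parameter names are assumptions):

enum Judge { ACCEPT, REJECT }
enum Validate { MATCH, MISMATCH }

class ValidationSketch {
    // For REJECT, the witness input is replayed on the ground-truth fixed
    // version; for ACCEPT, the benchmark's ground-truth label is consulted.
    static Validate check(Judge verdict,
                          boolean patchedDiffersFromFixed,  // REJECT evidence
                          boolean labeledCorrect) {         // benchmark label
        if (verdict == Judge.REJECT)
            return patchedDiffersFromFixed ? Validate.MATCH : Validate.MISMATCH;
        return labeledCorrect ? Validate.MATCH : Validate.MISMATCH;
    }
}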

How to reproduce the experimental results

Our experimental results can be reproduced by running the runexp_all script as follows:

$ cd /poracle-experiments/ && ./runexp_all reproduce 1

The runexp_all script enumerates over all patches in our benchmark and stores the generated files in the poracle_results/reproduce/1 directory (the 1 refers to the experiment ID).

The obtained experimental results can be printed out by running the summary script.

$ cd /poracle-experiments
$ summary poracle_results

The following shows a snippet of the output of summary (Table 3 of our paper is extracted from this output).

[screenshot: output of the summary script]

The "label" column shows the labels (either Correct or Incorrect) available in our benchmark extracted from the existing work. The description of the "verdict" and "consistency" columns follow the description of "Judge" and "Validate" provided earlier. "A" and "R" represent "Judge.ACCEPT" and "Judge.REJECT", respectively, and "C" and "I" represent "Validate.MATCH" and "Validate.MISMATCH", respectively.

How to reproduce the JAID experiment results

We used the JAID tool (with its "Autofix" option) to generate patches for all the bugs in our benchmark and stored all the JAID patches in the patches directory. We then ran Poracle with the JAID patches to see whether it rejects many incorrect patches while keeping the correct ones. Our JAID experiment results can be reproduced by running the runexp_all_jaid script as follows:

$ cd /poracle-experiments/ && ./runexp_all_jaid math105 1 --rank 265 --bug-version math_105

Here, --rank 265 specifies the original rank of the correct patch; the script uses it to run all the patches ranked before this number (we also checked whether the correct patch itself is rejected). --bug-version math_105 indicates that the patches for the Math105 bug are run. The runexp_all_jaid script enumerates over all JAID patches for Math105 (the subject buggy version) and stores the generated results in the poracle_results/1 directory (the 1 refers to the experiment ID).

The obtained experimental results can be printed out on the screen by running the summary script.

$ cd /poracle-experiments
$ summary poracle_results

How to obtain the latest experimental results

Our latest experimental results are available in the /poracle-experiments/latest_results directory. Running the summary script will show the results on the screen.

$ cd /poracle-experiments
$ summary latest_results

Evidence for mislabeled patches

As a byproduct of our approach, we found that 4 patches in our dataset, obtained from the existing study, are mislabeled. While these patches are labeled correct in the original dataset, Poracle found an input that causes an output difference between the buggy version and its patched version. We validated these results with the ground-truth correct versions available in the Defects4J benchmark: indeed, the correct and patched versions produce different results when the same inputs found by Poracle are used. In this repository, we provide concrete evidence for those mislabeled patches, i.e., witness tests whose executions result in different behavior between patched versions and their ground-truth correct versions. By using these witness tests, the behavioral discrepancy can be easily observed without having to install and run Poracle.

Recall that Poracle takes as input a parameterized test, as shown in the following.

public void testBug1955483(@InRange(minInt=-4, maxInt=6) int i1,
                           @InRange(minInt=-4, maxInt=6) int i2,
                           @InRange(minInt=-4, maxInt=6) int i3,
                           @InRange(minDouble=0, maxDouble=1) double ch1,
                           @InRange(minDouble=0, maxDouble=1) double ch2) {
    ...
    series.addOrUpdate(i1, i2);
    ...
}

We earlier showed that Poracle generated the following input:

-1;-2;1;0.8926526466663729;0.9953236612191507

Based on this input, we prepared the following JUnit test:

public void testBug1955483() {
    ...
    series.addOrUpdate(-1, -2);
    ...
}

Notice that we replace i1 and i2 with their corresponding difference-witnessing input values -1 and -2. The obtained witness test is available under /poracle-experiments/mislabeled_patches/Patch7/tests/.

We now describe how to reproduce the behavioral discrepancy between Patch7 (an incorrect patch that is labeled correct) and its ground-truth correct version. The following commands check out Chart 5b (a buggy version), overwrite XYSeries.java with its patched version, copy in the witness test JQF_XYSeriesTests.java, and build the project.

$ bash -c "cd /tmp && \
  defects4j checkout -p Chart -v 5b -w Chart5p && \
  cp /poracle-experiments/mislabeled_patches/Patch7/source/org/jfree/data/xy/XYSeries.java Chart5p/source/org/jfree/data/xy/ && \
  cp /poracle-experiments/mislabeled_patches/Patch7/tests/org/jfree/data/xy/junit/JQF_XYSeriesTests.java Chart5p/tests/org/jfree/data/xy/junit/ && \
  defects4j compile -w Chart5p"

Similarly, the following commands check out Chart 5f (the ground-truth correct version), copy in the same witness test, and build the project.

$ bash -c "cd /tmp && \
  defects4j checkout -p Chart -v 5f -w Chart5f && \
  cp /poracle-experiments/mislabeled_patches/Patch7/tests/org/jfree/data/xy/junit/JQF_XYSeriesTests.java Chart5f/tests/org/jfree/data/xy/junit/ && \
  defects4j compile -w Chart5f"

The output of testBug1955483 for the patched version can be obtained using the following commands.

$ bash -c "cd /tmp/Chart5p && \
  ant -f /poracle-experiments/benchmark/defects4j/framework/projects/defects4j.build.xml \
      -Dd4j.home=/poracle-experiments/benchmark/defects4j \
      -Dbasedir=/tmp/Chart5p \
      -DOUTFILE=/tmp/Chart5p/failing_tests -Dtest.entry.class=org.jfree.data.xy.junit.JQF_XYSeriesTests \
      -Dtest.entry.method=testBug1955483 run.dev.tests && \
  cd -"

The output will include the following line:

[junit] [-1.0, -2.0, -2.0, -1.0, 1.0, -2.0]

Meanwhile, the result of the same test can be obtained from the ground-truth correct version using the following command.

$ bash -c "cd /tmp/Chart5f && \\
  ant -f /poracle-experiments/benchmark/defects4j/framework/projects/defects4j.build.xml \
      -Dd4j.home=/poracle-experiments/benchmark/defects4j \
      -Dbasedir=/tmp/Chart5f -DOUTFILE=/tmp/Chart5f/failing_tests \
      -Dtest.entry.class=org.jfree.data.xy.junit.JQF_XYSeriesTests \
      -Dtest.entry.method=testBug1955483 run.dev.tests && \\
  cd -"

The output will include the following line:

[junit] [-2.0, -1.0, -1.0, -2.0, 1.0, -2.0]

Notice the difference between the two outputs.

The following table lists the witness test and the patched file for each of the 4 mislabeled patches.

Patch ID  Project  Bug ID  Witness Test                Patched file
Patch7    Chart    5       JQF_XYSeriesTests.java      XYSeries.java
Patch26   Lang     58      JQF_NumberUtilsTest.java    NumberUtils.java
Patch54   Math     73      JQF_BrentSolverTest.java    BrentSolver.java
Patch192  Lang     35      JQF_ArrayUtilsAddTest.java  ArrayUtils.java

Generalized tests

The generalized tests we used in our experiments are available in the deltas directory. Each sub-directory follows the naming convention [Project]_bug[bug_id], e.g., Math/Math_bug2. We provide a table (shown below) that lists the generalized tests, along with the following additional columns:

  • The #Patches column shows the number of patches that can be validated with a given generalized test.
  • The Pattern column shows the pattern we used to prepare a preservation condition. We use the following 4 patterns, whose descriptions are available in our paper (see Section 4.1); a sketch of the UE and CC patterns follows this list.
    • UE (Unexpected Exception)
      • We enclose the original test code inside a try-catch block to ignore abnormal termination of the buggy version.
    • CC (Complementary Cases)
      • Behavioral preservation is enforced for the inputs complementary to the original test input.
    • EGA (Existing Assertion)
      • We use an existing assertion condition as a preservation condition.
    • RI (Reference Implementation)
      • We use a reference implementation that performs the same functionality as part of the preservation condition.
  • The Complexity column shows the complexity of the preservation condition, defined as the number of relational operators used (e.g., >, >=) plus the number of Boolean connectives (&& and ||). We do not count the negation operator (!); note that "!(x > y)" can be expressed as "x <= y" without using !. For example, "x > 0 && x <= y" has complexity 3 (two relational operators and one connective).
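
As an illustration, the following sketch (ours, not taken from the deltas directory) shows the UE and CC patterns in the abbreviated JQF-style test form used earlier; computeUnderTest and record are hypothetical helpers standing in for the method under test and the output logging, respectively.

// UE (Unexpected Exception): the original test body is wrapped in
// try-catch so that abnormal termination of the buggy version is ignored.
public void testUE(@InRange(minInt = -4, maxInt = 6) int i) {
    try {
        record(computeUnderTest(i));  // log the output for comparison
    } catch (Exception e) {
        // ignore abnormal termination
    }
}

// CC (Complementary Cases): assuming the original test used input 1,
// preservation is enforced only for the complementary inputs.
public void testCC(@InRange(minInt = -4, maxInt = 6) int i) {
    if (i != 1) {
        record(computeUnderTest(i));  // log the output for comparison
    }
}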

Table of generalized tests

Project  ID   #Patches  Pattern  Complexity
Math     2    10        CC       1
Math     3    4         UE       0
Math     4    14        UE       0
Math     5    10        CC       1
Math     7    3         UE       0
Math     8    9         UE       0
Math     22   5         EGA      2
Math     24   2         EGA      1
Math     25   3         CC       1
Math     28   15        UE       0
Math     32   6         UE       0
Math     33   9         RI       5
Math     34   3         CC       2
Math     35   8         CC       3
Math     39   2         UE       0
Math     40   8         UE       0
Math     41   6         EGA      5
Math     42   5         EGA      2
Math     44   2         UE       0
Math     49   7         UE       0
Math     50   23        CC       0
Math     53   7         CC       1
Math     57   9         CC       1
Math     58   12        UE       0
Math     61   3         CC       1
Math     69   4         EGA      1
Math     70   11        UE       0
Math     71   5         EGA      3
Math     73   10        CC       10
Math     78   5         UE       0
Math     80   19        RI       1
Math     81   28        UE       0
Math     82   20        RI       7
Math     84   15        RI       1
Math     85   24        UE       0
Math     87   4         RI       1
Math     88   14        RI       5
Math     89   7         CC       1
Math     90   4         CC       1
Math     93   4         EGA      5
Math     95   15        UE       0
Math     97   7         UE       0
Math     99   4         CC       7
Math     104  3         CC       1
Math     105  5         EGA      1
Chart    1    3         EGA      1
Chart    3    2         CC       5
Chart    5    4         UE       0
Chart    7    2         CC       1
Chart    9    2         UE       0
Chart    13   5         UE       0
Chart    14   2         UE       0
Chart    15   3         UE       0
Chart    17   2         UE       0
Chart    19   2         CC       1
Chart    21   3         CC       1
Chart    25   5         UE       0
Chart    26   4         UE       0
Time     4    6         CC       1
Time     7    7         UE       0
Time     11   7         EGA      1
Time     12   2         EGA      1
Time     14   3         UE       0
Time     15   4         CC       15
Time     16   2         EGA      2
Time     18   3         UE       1
Time     19   8         EGA      1
Lang     24   7         RI       1
Lang     35   2         EGA      9
Lang     39   9         RI       0
Lang     44   6         RI       1
Lang     46   7         EGA      1
Lang     51   3         RI       1
Lang     53   5         RI       1
Lang     55   5         EGA      1
Lang     57   2         EGA      1
Lang     58   15        RI       1
Average       6.87               1.58

User Study

The user study materials are available in this repository. Check out the user-study folder.
