This repository contains a collection of 20 problems taken from Leetcode for the 1st edition of the Automated Program Repair Competition (APR-COMP 2024). Each problem has 5 solutions generated by OpenAI's GPT-3.5 Turbo and GPT-4 chat models. The solutions are located in the respective problem folders.
The benchmark has a difficulty distribution of 1:2:1 (Easy:Medium:Hard).
The problems are set up as Python projects running on Python 3.9 and use the pytest package for testing.
The (in)correctness of each solution has been manually evaluated, ensuring that each solution passes at least one test case and fails at least one in Leetcode's judging system.
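As a hedged illustration of what a pytest-style test for one of these problems could look like, here is a minimal sketch; the `two_sum` function and the test case are illustrative stand-ins, not taken from the benchmark itself:

```python
# Illustrative stand-in for a benchmark solution; real solutions are
# GPT-generated and live in the problem folders.
def two_sum(nums, target):
    """Return indices of the two numbers that sum to target."""
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return []

def test_public_case():
    # A public test case in the style of Leetcode's examples.
    assert two_sum([2, 7, 11, 15], 9) == [0, 1]
```

Running `pytest` in a problem folder would then execute such tests against each generated solution.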
The test suite used for evaluation is generated from the public test cases provided by Leetcode and from a generator-based fuzzer (built on the source code of the Fuzzing Book) run over a reference solution. Every problem directory contains a `reference.py` file, a reference solution collected from Leetcode's forums. The test suite generation is implemented in the `testcases` folder; to generate the public and private test suites, execute the `generate_tests.sh` script. For further information, `testcases/<PROBLEM>/fuzzer.py` and `testcases/<PROBLEM>/bug.py` implement the fuzzer and the test harness over the corresponding `testcases/<PROBLEM>/reference.py` file.
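The generator-based fuzzing idea described above can be sketched as follows: random inputs are fed to both a candidate solution and the reference oracle, and any divergence yields a failing test case. All names here are illustrative assumptions; the actual `fuzzer.py`/`bug.py` implementations differ:

```python
import random

def reference(nums):
    """Stand-in for reference.py: the trusted oracle implementation."""
    return sorted(nums)

def candidate(nums):
    """Stand-in for a generated solution under test (buggy on duplicates)."""
    return sorted(set(nums))

def fuzz(oracle, subject, trials=100, seed=0):
    """Generate random inputs; return the first input where outputs diverge."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(0, 5) for _ in range(rng.randint(0, 8))]
        if oracle(nums) != subject(list(nums)):
            return nums  # a failing test case to add to the suite
    return None

failing = fuzz(reference, candidate)
```

Inputs on which the candidate diverges from the oracle can then be serialized as additional (private) test cases.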
- To regenerate the metadata, execute the `metadata-generator.py` script. It traverses all problems, inserts the subject scripts (`run_test`, `setup_subject`, `install_deps`), and generates metadata entries with all required information. The script requires Java 11 and Maven to be installed, and this repository must be located in the user's home directory because relative paths are used. To change this requirement, modify line 6 in `run_test_local`.
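The traversal-and-metadata step can be sketched roughly as below, assuming a layout of one directory per problem; the field names, file patterns, and function are assumptions for illustration, not the actual `metadata-generator.py`:

```python
import json
from pathlib import Path

# Subject scripts assumed to be copied into each problem directory.
SUBJECT_SCRIPTS = ["run_test", "setup_subject", "install_deps"]

def generate_metadata(root):
    """Walk problem directories and build one metadata entry per problem."""
    entries = []
    for problem in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        entries.append({
            "id": len(entries) + 1,
            "subject": problem.name,
            "scripts": SUBJECT_SCRIPTS,
            # Hypothetical naming scheme for the generated solutions.
            "solutions": sorted(f.name for f in problem.glob("solution_*.py")),
        })
    return entries

# Example: json.dumps(generate_metadata("problems"), indent=2)
```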
The `crawler` folder contains the code used to generate the solutions with GPT-3.5 and GPT-4. To execute the crawler, add an OpenAI API key on line 44 of `crawler.py`; the key must have access to GPT-4.
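A rough sketch of how such a crawler could request a solution from OpenAI's chat completions endpoint is shown below; the prompt, helper names, and request structure are assumptions for illustration, and the actual `crawler.py` may differ:

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(problem_statement, model="gpt-4"):
    """Build the JSON payload asking the model for a Python solution (illustrative prompt)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a competitive programmer."},
            {"role": "user",
             "content": f"Solve this Leetcode problem in Python:\n{problem_statement}"},
        ],
    }

def fetch_solution(problem_statement, api_key, model="gpt-4"):
    """POST the request to the chat completions endpoint and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(problem_statement, model)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```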
To reproduce the results from APR-COMP, ensure that the benchmark is on the `apr-comp-<YEAR>` branch. Then invoke Cerberus with all tool configs, running `valkyrie.cerberus.config` last so that it validates the generated patches. After all runs have finished, execute the `process_results.py` script to collect the final data into a file called `aggregated.json`.
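The aggregation step can be sketched as follows, assuming each tool run leaves behind a JSON result file; the directory layout and keys are assumptions for illustration, not the actual `process_results.py`:

```python
import json
from pathlib import Path

def aggregate(results_dir, out_file="aggregated.json"):
    """Merge per-run result JSON files into a single aggregated.json (illustrative)."""
    aggregated = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with open(path) as f:
            aggregated[path.stem] = json.load(f)  # keyed by tool/run name
    with open(out_file, "w") as f:
        json.dump(aggregated, f, indent=2)
    return aggregated
```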