An Auto-GPT-style agent that automatically fixes code based on failing tests.
Tests can be a reliable description of the expected behavior of a program. When structured well, failing test results can be analysed by GPT-4 to determine possible fixes to the code. GPT-4 can then automatically implement fixes and verify they work by running the tests again.
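The test-fix-verify loop described above can be sketched roughly as follows. This is an illustrative sketch, not autoheal's actual internals: the test runner and the model call are stubbed out, and all names (`runTests`, `askModelForFix`, `applyPatch`) are hypothetical.

```javascript
// Illustrative sketch of the test -> fix -> verify loop.
// The runner and model call are stubs; autoheal's real internals differ.

// Stubbed test runner: reports a failure until a "fix" is applied.
let fixed = false;
function runTests() {
  return fixed
    ? { passing: true, output: "" }
    : { passing: false, output: "Expected 'hello-world', got 'hello world'" };
}

// Stand-ins for the GPT step: propose a patch from the failure output,
// then apply it to the project files.
function askModelForFix(failureOutput) {
  return { file: "slugify.js", edit: "replace spaces with '-'" };
}
function applyPatch(patch) {
  fixed = true; // pretend the edit made the tests pass
}

function healLoop(maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runTests();
    if (result.passing) return true; // tests green: done
    applyPatch(askModelForFix(result.output));
  }
  return runTests().passing; // give up after maxAttempts
}

console.log(healLoop()); // prints true once the loop converges
```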
In your project directory, run:
npx autoheal
Uses OpenAI's GPT-3.5-turbo or GPT-4 APIs. Requires an OpenAI API key.
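A typical way to supply the key, assuming autoheal follows the common `OPENAI_API_KEY` environment-variable convention used by OpenAI tooling (the exact configuration may differ):

```shell
# Assumes the conventional OPENAI_API_KEY environment variable;
# replace the placeholder with your own key.
export OPENAI_API_KEY="sk-..."
npx autoheal
```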
You can press [Enter] during the run to pause the process and provide a hint to better guide autoheal.
autoheal will modify files in your project
Be sure to commit any unsaved changes before running
autoheal will run tests with file modifications made by GPT
It may not be wise to run autoheal if your test suite has potentially destructive side effects (e.g. modifying a database or connecting to remote services).
This project is still very experimental and may not always produce good results, so run with caution. The following factors can influence the effectiveness of autoheal:
Simpler bugs or features that can be resolved with changes to a single file will have the most success.
Test failures that provide enough information (diffs, stack traces, etc.) to determine possible fixes will produce the best results. Running tests in a mode that only outputs failing tests may improve results.
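For example, with Jest (these flags are Jest-specific; check your own runner's documentation for equivalents):

```shell
# Re-run only the tests that failed on the previous run, and
# suppress console output so the failure diffs dominate.
npx jest --onlyFailures --silent
```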
Projects with smaller, well-named files get better results. Autoheal's strategy is limited by OpenAI's token limit, so it infers details from file names.
You can provide a freeform hint to autoheal to provide more specific details (e.g., specific files, or possible ways to fix the bug). This can be useful when the test failure output is not enough to determine a fix.
Using GPT-4 is much more reliable than GPT-3.5-turbo: it generally produces better results and has a larger token limit. I do not have access yet, but I suspect OpenAI's 32k-token model will enable much more effective strategies in the near future.
GPT-4 is very capable at writing code; however, it can be challenging to describe the specifics of the software you want to develop to GPT-4, and to verify that the software behaves in the intended way without subtle bugs. Automated tests can serve as a way to precisely describe the specifications of software and to automatically verify intended functionality. TDD can be used to more precisely steer GPT-4's development power.