This is the Redemption of False Positives project.
'Redemption' Automated Code Repair Tool Copyright 2023, 2024 Carnegie Mellon University. NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN 'AS-IS' BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. Licensed under a MIT (SEI)-style license, please see License.txt or contact permission@sei.cmu.edu for full terms. [DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution. This Software includes and/or makes use of Third-Party Software each subject to its own license. DM23-2165The Redemption tool makes repairs to C/C++ source code based on alerts produced by certain static-analysis tools.
For more details of the background and capabilities of the Redemption tool, see this presentation from the SEI Research Review 2023.
Redemption is currently able to identify alerts from the following static-analysis tools:
Tool | Version | License | Container | Origin |
---|---|---|---|---|
Clang-tidy | 16.0.6 | LLVM Release | silkeh/clang:latest | https://clang.llvm.org/extra/clang-tidy |
Cppcheck | 2.4.1 | GPL v3 | facthunder/cppcheck:latest | https://cppcheck.sourceforge.io/ |
Rosecheckers | CMU (OSS) | ghcr.io/cmu-sei/cert-rosecheckers/rosebud:latest | https://github.com/cmu-sei/cert-rosecheckers |
You may run other tools, or create alerts manually; however Redemption has not been tested with alerts produced by other tools.
Redemption can currently repair the following categories of alerts. These alerts will often have a MITRE CWE number associated with them, or a rule in the SEI CERT C Coding Standard.
Category | CERT Rule ID | CWE ID |
---|---|---|
Null Pointer Dereference | EXP34-C | 476 |
Uninitialized Value Read | EXP33-C | 908 |
Ineffective Code | MSC12-C | 561 |
We hope to add more categories soon.
The code is designed to run inside a Docker container, which means you will need Docker. To build the Docker container:
docker build -f Dockerfile.prereq -t docker.cc.cert.org/redemption/prereq .
docker build -f Dockerfile.distrib -t docker.cc.cert.org/redemption/distrib .
This image contains all the dependencies needed to run the repair tool. (It does not actually contain the repair tool itself.)
This command starts a Bash shell in the container:
docker run -it --rm docker.cc.cert.org/redemption/distrib bash
Note that this container contains the entire Redemption code...this is useful if you intend to run Redemption, but have no plans to modify or tweak the code. If you plan to modify the code, use the prereq
container instead:
docker run -it --rm -v ${PWD}:/host -w /host docker.cc.cert.org/redemption/prereq bash
Unlike the distrib
container, the prereq
container does not contain the Redemption code...it just contains dependencies necessary to run Redemption. But the Redemption code lives outside the container, on a shared volume. This allows you to modify the Redemption code on the host, while accessing it within the container.
The tool has a simple sanity test that you can run in the distrib
container. It uses pytest to run all the tests in the /host/code/acr/test
directory of the container (all in functions with names starting with test_
) After launching the distrib
container, the following commands will run a few sanity tests:
pushd /host/code/acr/test
pytest
All tests should pass.
There is a test
Docker container that you can build and test with. It builds git
and zeek
. Building zeek takes about 40 minutes on one machine. To build and run the test
container:
docker build -f Dockerfile.test -t docker.cc.cert.org/redemption/test .
docker run -it --rm -v ${PWD}:/host -w /host docker.cc.cert.org/redemption/test bash
In the doc/examples
directory, there are several demos, each living in its own directory. The following table lists each demo; its title is the same as the folder containing the demo. The demos differ in the properties of the code they repair, and this is reflected in the Codebase
column:
| Demo Title | Codebase |
|------------------+---------------------------------------------------|
| simple
| Simple C source file |
| codebase
| Multi-file OSS codebase |
| separate_build
| OSS codebase requiring separate build environment |
If you are new to the Redemption tool, we recommend going through at least one of these demos. If you wish to repair a single C file, you should study the simple
demo. If you wish to repair a multi-file codebase, and you can build the codebase in the Redemption container, you should study the codebase
demo. If your code does not build in the Redemption container, you should study the separate_build
demo.
Inputs to the Redemption tool include:
- Codebase: This is a path to one or more C/C++ source code files. For multiple source files, the
compile_commands.json
file (frombear
, discussed more below) identifies the path for each file, while the-b
or--base-dir
argument tosup.py
(discussed more below) specifies the base directory of the project). - Build command: This can be a compile command or a build system like "make". Redemption needs this in order to learn which macros are defined and what switches are necessary to let Clang parse a source code file). Each alert contains a CERT coding rule or CWE ID, location, message and more - see doc/alert_data_format.md.
- SA tool alerts file: The Redemption tool produces outputs (for each SA alert from input, either a patch to repair the alert OR explanation in text why it cannot be repaired).
The command line tool takes the inputs listed above. The command-line tool's ear
module creates an enhanced abstract syntax tree (AST), its brain
module augments information about each alert and creates a more-enhanced AST and applies alerts and patches, and its glove
module produces repaired source code. Depending on arguments provided to the end-to-end script or superscript (discussed below), some or all of the steps are done and the way that is done can be specified per the arguments.
The base_dir
sets the base directory of the project being analyzed. When repairing a whole codebase, the base_dir
is the directory where the source code to be repaired is located. (When repairing a single file, specifying the base_dir
is not necessary.)
Our tool has three top-level Python scripts:
end_to_end_acr.py
: Runs on a single translation unit (a single.c
or.cpp
file, perhaps also repairing#include
-ed header files)sup.py
: Runs on all the C/C++ source files in a whole codebase (That is, it runs on a whole project, with all translation units specified in acompile_commands.json
file generated by thebear
OSS tool.)make_run_clang.py
: Runs on a single translation unit or all the files in a codebase. Produces a shell script containing Clang commands that Redemption would need to repair it. See the Codebases that cannot be built within the Redemption container section for more information.
This script runs automated code repair or parts of it, depending on the arguments specified.
Since the script and its arguments are subject to change, the best way to identify the current arguments and way to run it involves running the script with no arguments or with --help
as an argument. E.g., in a bash terminal, in the directory code/acr
, run:
./end_to_end_acr.py --help
The STEP_DIR
is a directory the tool uses to put intermediate files of the steps of the process, including output from brain
with enhanced AST and enhanced alerts, plus repairs that may be applied. This information is useful for understanding more about a particular repair, for instance if the repair result is not as you expected. The *.nulldom.json
intermediate files are stored there even if environment variable pytest_keep
is set to false
. Output of other individual modules (ear, brain, etc.) are stored in STEP_DIR
only if environment variable pytest_keep
is set to true
.
This script runs automated code repair or parts of it, depending on the arguments specified.
Since the script and its arguments are subject to change, the best way to identify the current arguments and way to run it involves running the script with no arguments or with --help
as an argument. E.g., in a bash terminal, in the directory code/acr
, run:
./sup.py --help
As specified above, the STEP_DIR
is a directory the tool uses to put intermediate files of the steps of the process, including output from brain
with enhanced AST and enhanced alerts, plus repairs that may be applied. This information is useful for understanding more about a particular repair, for instance if the repair result is not as you expected. The *.nulldom.json
intermediate files are stored there even if environment variable pytest_keep
is set to false
. Output of other individual modules (ear, brain, etc.) are stored in STEP_DIR
only if environment variable pytest_keep
is set to true
.
export acr_default_lang_std=foo # Adds "--std=foo" to the beginning of the arguments given to Clang.
export acr_emit_invocation=true # Show subprogram invocation information
export acr_gzip_ear_out=true # Compresses the AST (compresses the output of the ear module)
export acr_ignore_ast_id=true # Tests pass even if AST IDs are different
export acr_parser_cache=/host/code/acr/test/cache/ # Cache the output of the ear module, else set to ""
export acr_parser_cache_verbose=true # Print messages about the cache, for debugging/troubleshooting
export acr_show_progress=true # Show progress and timing
export acr_skip_dom=false # Skip dominator analysis
export acr_warn_unlocated_alerts=true # Warn when alerts cannot be located in AST
export pytest_keep=true # Keep output of individual modules (ear, brain, etc.). Regardless, the *.nulldom.json intermediate file is kept.
export pytest_no_catch=true # Break into debugger with "-m pdb" instead of catching exception
export REPAIR_MSC12=true # Repair MSC12-C alerts (By default, the system DOES do the repair. The system does not do this repair if this variable is set to `false`)
The Redemption Tool presumes that you have static-analysis (SA) tool output. It currently supports three SA tools: clang_tidy_oss
or cppcheck_oss
or rosecheckers_oss
. Each SA tool should produce a file with the alerts it generated. If $TOOL
represents your tool, instructions for generating the alerts file live in data/$TOOL/$TOOL.md
. We will assume you have run the tool, and created the alerts file, which we will call alerts.txt
. (The actual file need not be a text file). Finally, when you produced your SA output, the code you ran was in a directory which we'll call the $BASE_DIR
.
Next, you must convert the alerts.txt
format into a simple JSON format that the redemption tool understands. The alerts2input.py
file produces suitable JSON files. So you must run this script first; it will create the alerts.json
file with the alerts you will use.
python3 /host/code/analysis/alerts2input.py $BASE_DIR clang_tidy_oss alerts.txt alerts.json
For example, the test
Docker container contains the source code for Git, as well as Cppcheck output. So you can convert Cppcheck's output to alerts using this script:
python3 /host/code/analysis/alerts2input.py /oss/git cppcheck_oss /host/data/cppcheck/git/cppcheck.xml ./alerts.json
The alerts.json
file is a straightforward JSON file, and one can be created manually. This file consists of a list of alerts, each list element describes one alert. Each alert is a map with the following keys:
tool
: The static-analysis tool reporting the alertfile
: The source file to repair. May be a header (.h
) fileline
: Line number reported by toolcolumn
: (Optional) Column number reported by toolmessage
: String message reported by toolchecker
: The checker (that is, component of the SA tool) that reported the alertrule
: The CERT rule or CWE that the alert indicates (alert about a rule being violated or a weakness instance in the code)
The file, line, and rule fields are the only strictly required fields. However, the column and message fields are helpful if the file and line are not sufficient to identify precisely which code is being reported. Be warned that due to inconsistencies between the way different SA tools report alerts, our tool may misinterpret a manually-generated alert. The best way to ensure your alert is interpreted correctly is to fashion it to be as similar as possible to an alert generated by clang_tidy or cppcheck.
See the Alert Data Format document for more details about fields that may be used when describing alerts.
Next, you must indicate the compile commands that are used to build your project. The bear
command can be used to do this; it takes your build command and builds the project, recording the compile commands in a local compile_commands.json
file.
The following command, when run in the test
container, creates the compile_commands.json
file for git. (Note that this file already exists in the container, running this command would overwrite the file.)
cd /oss/git
bear -- make
To enable the Redemption container to run repairs on your local code directories, you should volume-share them with -v
when you launch the container. For example, to volume-share a local directory /code
:
docker run -it --rm -v /code:/myCode docker.cc.cert.org/redemption/distrib bash
See https://docs.docker.com/storage/volumes/ for more information on volume-sharing.
See https://docs.docker.com/reference/cli/docker/container/run/ for more information about options using the docker run
command.
Here is an example of how to run a built-in end-to-end automated code repair test, within the container (you can change the out
directory location or directory name, but you must create that directory before running the command):
pushd /host/code/acr
python3 ./end_to_end_acr.py /oss/git/config.c /host/data/compile_commands.git.json \
--alerts /host/data/test/sample.alerts.json --repaired-src test/out \
--base-dir /oss/git --repair-includes true
You can see the repairs made using this command:
diff -u /oss/git/hash.h /host/code/acr/test/out/hash.h
To test a single C file that needs no fancy compile commands, you can use the autogen
keyword instead of a compile_commands.json
file:
pushd /host/code/acr
python3 ./end_to_end_acr.py test/test_errors.c autogen \
--alerts test/test_errors.alerts.json \
--base-dir test --repaired-src test/out
You may need to share a volume, as discussed above.
Our tool requires that you have a single command to build the entire codebase. (There are some exceptions involving single files and autogen
, as mentioned above.) This command could be a call to clang
or some other compiler. It could be a build command like make
or ninja
. It could even be a shell script. To be precise, this can be any command that can be passed to bear
, a tool for generating a compilation database for clang
tooling. For more info on bear
, see: https://github.com/rizsotto/Bear ).
Run bear
on the single-command build system (e.g., run bear
on the makefile). Then, run the superscript code/acr/sup.py
, giving it the compile_commands.json
file created by bear
and specifying code, alerts, etc. as discussed in the above section on sup.py
.
Normally, Clang is invoked from within the Redemption tool. For codebases that cannot be built within the Redemption container, we also provide an alternative method: The Redemption tool can be split into three phases: The first phase is managed by the make_run_clang.py
program, which generates a shell script that uses Clang to generate syntactic data about the code. In the second phase, you then run this script on the platform where your code can be built. After running the script, you can run the ACR process in the final phase to use the files generated by Clang to repair the code.
This does presume that your platform that builds the code can run Clang with Redemption's patch. It can also run bear
and that you can produce static-analysis alerts for the code (independently of running Redemption).
See the Separate_Build Instructions for an example of repairing such a codebase.
This is a short example that demonstrates the command-line arguments used for running Clang separately: The call of make_run_clang.py
produces a shell script that runs Clang to generate the data necessary for Redemption to repair the source code. The call to end_to_end_acr.py
does the actual repair, using the data generated by Clang.
mkdir -p /host/code/acr/test/cache
cd /host/code/acr
./make_run_clang.py -c autogen -s test/macros_near_null_checks.c \
--output-clang-script test/cache/run_clang.sh
bash test/cache/run_clang.sh test/cache
./end_to_end_acr.py --repaired-src test/out test/macros_near_null_checks.c autogen \
--alerts test/macros_near_null_checks.alerts.json --raw-ast-dir test/cache
-
On the host (i.e., outside the Redemption docker container): Run bear to generate
compile_commands.json
. -
Mount the codebase in the Redemption container so that the base directory in the container is the same as it is in the host. (Making a symbolic link (
ln -s
) instead of directly mounting in the right place probably won't work, becauseos.path.realpath
resolves symlinks.) Then, inside the Redemption container:
mkdir -p /host/code/acr/test/cache
cd /host/code/acr
./make_run_clang.py -c autogen -s test/macros_near_null_checks.c \
--output-clang-script test/cache/run_clang.sh
- Copy
test/cache/run_clang.sh
to the host and run:
ast_out_dir=/tmp/ast_out
mkdir -p $ast_out_dir
bash run_clang.sh $ast_out_dir
cd $ast_out_dir
tar czf raw_ast_files.tar.gz *
- Copy
raw_ast_files.tar.gz
to the Redemption container and then:
cd /host/code/acr/test/cache
tar xf raw_ast_files.tar.gz
cd /host/code/acr
./end_to_end_acr.py --repaired-src test/out test/macros_near_null_checks.c autogen \
--alerts test/macros_near_null_checks.alerts.json --raw-ast-dir test/cache
Documentation detail about useful environment variables and further testing is in doc/regression_tests.md. Also, a lot more detail about our system that may be of interest to others interested in extending it or just understanding it better is in the doc directory.
When building a Docker container, if you get an error message such as:
error running container: from /usr/bin/crun creating container for [/bin/sh -c apt-get update]: sd-bus call: Transport endpoint is not connected: Transport endpoint is not connected : exit status 1
then try:
unset XDG_RUNTIME_DIR
unset DBUS_SESSION_BUS_ADDRESS
For more information, see containers/buildah#3887 (comment)
Contributions to the Redemption codebase are welcome! If you have code repairs or other contributions, we welcome that - you could submit a pull request via GitHub, or contact us if you'd prefer a different way.