Reads-From Fuzzer (RFF) is a tool for concurrency testing. See our paper from ASPLOS 2024 for details!
If you use our work for academic research, please cite our paper:
@inproceedings{10.1145/3620665.3640389,
author = {Wolff, Dylan and Shi, Zheng and Duck, Gregory J. and Mathur, Umang and Roychoudhury, Abhik},
title = {Greybox Fuzzing for Concurrency Testing},
year = {2024},
isbn = {9798400703850},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3620665.3640389},
doi = {10.1145/3620665.3640389},
booktitle = {Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
pages = {482–498},
numpages = {17},
location = {La Jolla, CA, USA},
series = {ASPLOS '24}
}
Please note that all scripts should be run from the base directory of the repo, as they may contain relative paths.
You will need Docker and Python 3, e.g. on Ubuntu:
```
sudo snap install docker
sudo groupadd docker
sudo usermod -aG docker $USER
sudo apt install -y python3 python3-venv
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
sudo ./afl-complain.sh
```
The AFL system configuration steps may not work on WSL. If this is the case, you can set AFL_SKIP_CPUFREQ=1 in all docker containers (e.g. using the -e option with docker run).
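For example, to pass it to the sum-fuzzer container built below (a sketch; any other image from this repo is run the same way):
```
docker run -t -e AFL_SKIP_CPUFREQ=1 -v $(pwd)/out:/opt/out sum-fuzzer
```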
GNU parallel is also required:
```
sudo apt install -y parallel
```
A recent Ubuntu version is recommended for the host system to ensure compatibility with the various commands and convenience scripts used to run experiments.
The RFF tool itself is completely set up in Docker.
To run outside of Docker, see the dockerfiles/Dockerfile.base
for the general system setup and dependencies.
Docker is used to build the benchmarks and real-world programs and to run schedfuzz. Running ./build_ck.sh builds all PERIOD benchmarks (SCTBench and ConVul) for RFF (this may take up to 30 minutes).
To build and run a fuzzing container individually:
```
docker build -f dockerfiles/Dockerfile.base -t schedfuzz-base .
docker build -f dockerfiles/Dockerfile.sum -t sum-fuzzer .
mkdir out
sudo ./afl-complain.sh
docker run -t -v $(pwd)/out:/opt/out sum-fuzzer
```
There is also a run_one.sh convenience script that runs a container with a volume mounted in and forwards some environment variables for the PERIOD benchmarks.
To re-run schedules from the fuzzing run, or to do other more interactive exploration, pop a shell in the container (if the container is already running, use docker exec with the container name instead of docker run with the image):
docker run -it -v $(pwd)/out:/opt/out sum-fuzzer /bin/bash
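For example, to attach to an already-running container (substitute the container name reported by docker ps):
```
docker exec -it <container name> /bin/bash
```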
Another convenience script ./run_dev.sh
will start a container with a mounted volume and pop an interactive shell in it.
Once you are inside the container, you can manually instrument and fuzz programs.
- To instrument a program, use ./instrument.sh <path to compiled prog> in the container.
- To fuzz the instrumented program, manually set any necessary environment variables and use: ./fuzz.sh -i afl-in -o afl-out -d -- <instrumented program> <program arguments>
See the various dockerfiles in ./scalability for examples, and the sketch below.
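As a minimal sketch of the manual workflow inside the container (the target path and arguments are hypothetical placeholders, and the exact name of the instrumented binary may differ depending on how instrument.sh writes its output):
```
# instrument a compiled target (hypothetical path)
./instrument.sh /opt/targets/my-prog

# set any environment variables the target needs, then fuzz it
./fuzz.sh -i afl-in -o afl-out -d -- /opt/targets/my-prog <program arguments>
```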
To run the scheduler without instrumentation:
LD_PRELOAD=$(pwd)/libsched.so <path to compiled prog.> <program arguments>
By passing the SCHEDULE=<path to schedule>
environment variable, you can replay a schedule generated by RFF.
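Combining the two, a replay invocation looks like the following (substitute a schedule file produced by a previous fuzzing run):
```
SCHEDULE=<path to schedule> LD_PRELOAD=$(pwd)/libsched.so <path to compiled prog> <program arguments>
```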
- sched-fuzz -- contains the latest version of the RFF tool
- sched.c -- the binary instrumentation hook
- libsched.cpp -- the dynamic library that wraps pthread to do schedule serialization and control; does most of the heavy lifting
- AFL-2.57b -- our modified version of AFL which drives the "schedule fuzzing" (fuzzing loop and mutations)
- alternative-tools -- the PCT and QLearning implementations are on different branches of the main repo, but are included in this directory for convenience of the artifact evaluation
Before running the PERIOD benchmarks, make sure to build the docker container, set up your system for AFL, and disable ASLR:
```
./build_ck.sh
sudo ./afl-complain.sh
```
For these benchmarks, you can use the environment variables FUZZERS
and TARGET_KEY
to select the fuzzers and target programs within that benchmark.
FUZZERS should be a comma-separated list (no spaces).
These environment variables can be used in conjunction with the run_one.sh script, e.g.:
FUZZERS=pos-only-schedfuzz,schedfuzz TARGET_KEY=CS/reorder_5 ./run_one.sh period
Other parameters, such as AFL_TIMEOUT and TIME_BUDGET, can be set in the container directly using the -e option for docker run (when not using the run_one.sh script).
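For example (the image name and the numeric values here are illustrative placeholders; check the dockerfiles and scripts for the expected image names, units, and defaults):
```
docker run -t -e AFL_TIMEOUT=5000 -e TIME_BUDGET=300 -v $(pwd)/out:/opt/out <benchmark image>
```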
Before conducting the experiments, build the images for RFF, PCT and QLearning by running ./build_all.sh.
Also make sure to clean up any stray volumes created by prior runs or experimentation.
To start the experiment, run the ./e1.sh
script from the base directory of the project.
By default this will run at the parallelism of your system, but the bash script can be edited to restrict the number of cores used.
Note that this script does require sudo to change ownership of the output files, because the docker user is set to root.
The experiment will generate many docker volumes containing result.json
files with the run statistics.
These are then aggregated by scripts/period/parse-agg.py
into a new full_data.csv
file on your host system with the raw experimental results.
The scripts/data-analysis/analyze.py script will then post-process this data, printing a LaTeX table to stdout and outputting a PNG file in assets/cum-scheds-to-bug.png.
This experiment will take roughly (54 programs / N CPUs) * 5 minutes * 20 trials * 4 tools.
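For example, with 8 CPUs this works out to roughly (54 / 8) * 5 * 20 * 4 ≈ 2700 minutes, i.e. about 45 hours of wall-clock time.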
Again, make sure to clean up any stray volumes before running, and make sure the base RFF image is built (./build_ck.sh).
Run the ./e2.sh
script from the root directory of the project.
This will copy the raw read-from pair and read-from sequence hash data to the scripts/data-analysis/freq-data
directory.
Note that this script also needs sudo
to change ownership of output files from docker.
It will then run scripts/data-analysis/bar-freq.py to process the data and generate plots in assets/bar.png.
This experiment should take only a few minutes (<15).
The behavior of the fuzzer can be changed by environment variables.
There are also convenient "abbreviations" for the PERIOD benchmarks that can be appended to the fuzzer name in the FUZZERS environment variable (e.g. in run_one.sh).
Some are below, see scripts/period/one.py
for a more exhaustive list.
| abbreviation | environment var + value | desc |
|-----------------------|-------------------------|----------------------------------------------------------------------|
| no-afl-cov | NO_AFL_COV=1 | No control flow feedback |
| no-pos | NO_POS=1 | No partial order sampling |
| pos-only | POS_ONLY=1 | Only partial order sampling |
| depth-3 | MAX_DEPTH=3 | Only RF schedules of length 3 or less |
| always-rand | ALWAYS_RAND=1 | Always change random seed on schedule mutation |
| max-sp | SCORE_PATTERN=max | Race predictor scoring |
| avg-sp | SCORE_PATTERN=avg | Race predictor scoring |
| thread-affinity-200 | THREAD_AFFINITY=200 | bias somewhat towards not switching threads |
| thread-affinity--200 | THREAD_AFFINITY=-200 | bias somewhat towards switching threads |
| thread-affinity-500 | THREAD_AFFINITY=500 | bias more towards not switching threads |
| thread-affinity--500 | THREAD_AFFINITY=-500 | bias more towards switching threads |
| thread-affinity-800 | THREAD_AFFINITY=800 | bias heavily towards not switching threads |
| thread-affinity--800 | THREAD_AFFINITY=-800 | bias heavily towards switching threads |
| max-multi-mutations-2 | MAX_MULTI_MUTATIONS=2 | allow insertion, deletion etc. of two RF's at once |
| max-multi-mutations-3 | MAX_MULTI_MUTATIONS=3 | allow insertion, deletion etc. of three RF's at once |
| all-pairs | ALL_PAIRS=1 | don't take pairs to flip in schedule from race predictor |
| all-rff                | ALL_RFF=1               | get RF feedback from all observed RF's (not just those in the schedule) |
| power-coe | POWER_COE=1 | give extra weight to rare RF's observed |
| stage-max-128 | SCHED_STAGE_MAX=128 | increase number of schedules explored in each "stage" of fuzzing |
| | JSON_SCHEDULE=1 | use a human readable JSON file to record each schedule (slow!) |
Some common configurations:
- power-coe-always-rand-schedfuzz -- i.e. our approach
- pos-only-schedfuzz -- partial order sampling
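These configuration names plug directly into the FUZZERS list; e.g., to compare our approach against partial order sampling on a single PERIOD target:
```
FUZZERS=power-coe-always-rand-schedfuzz,pos-only-schedfuzz TARGET_KEY=CS/reorder_5 ./run_one.sh period
```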
To run on some load/store heavy real world programs, naive binary instrumentation of all memory operations can be very expensive.
In most cases, only a very small subset of these operations are accessing memory that is shared across multiple threads.
To filter out the unnecessary instrumentation, there is a sched-fuzz/selective-instrument.sh script which takes in a subset of instruction offsets in the binary to instrument.
This subset can be obtained from analyzing a full trace of the program, which can take an extremely long time to gather for large, load/store heavy programs (many hours).
Note that this step is not optimized and not used for any of the experiments in the paper, as the programs provided in the benchmarks are not as load/store heavy.
For the SQLite program in the scalability directory, built with python ./build_scale.py, this has already been done.
Alternatively, you can skip the instrumentation step altogether, as in the x264
example in the scalability
directory.
RFF will still be able to serialize and test the program, just not at the granularity of individual memory operations (preemptions will only occur at pthread functions).
The binary instrumentation is not guaranteed to succeed for every instruction, so typically about 97-99% of loads and stores are actually instrumented.
Right now malloc / free and other dynamic calls are not used as scheduling points, but this can be easily remedied.
See scripts/data-analysis/full-data.csv for data from our tool / framework from E1.
See scripts/data-analysis/freq-data/*.csv for data from our tool / framework from E2.
Additional data and analysis scripts for PERIOD etc. are all in scripts/data-analysis.
e.g.:
```
rm: cannot remove 'SafeStack.afl': No such file or directory
rm: cannot remove 'afl-in': No such file or directory
```