Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] A developer- and user-friendly AFLGo #131

Merged
merged 10 commits into from
Sep 24, 2023

Conversation

SonicStark
Copy link

I did some research based on AFLGo in the past few weeks, and also met a bunch of headache problems because of AFLGo. So I made it a bit more friendly.

Lots of big changes here compared to the original...

  • Front-port AFLGo to the last release of AFL with brand-new clear structure:
    • The AFLGo Graph Extractor (GE) and the AFLGo Instrumentor locates in ./instrument/.
    • The AFLGo Distance Calculator (DC) locates in ./distance/.
    • The AFLGo Fuzzer locates in ./afl-2.57b/.
    • The fuzzing scripts all locates in ./examples/.
    • The building script of AFLGo is ./build.sh.
  • Wrapper implementation of AFLGo Fuzzer with macro AFLGO_IMPL, making it easier for developers to track and understand.
  • A brief README of AFLGo instrumentation methods.
  • AFLGo Distance Calculator (DC) is divided into 3 parts: scripts, backends and tools.
    • scripts: They are main entrances of DC and face users directly. gen_distance_orig.sh is the original genDistance.sh and gen_distance_fast.py is the pythonic version (faster). The two now behaves consistently-binaries-directory wouldn't be searched recursively.
    • backends: Consisted of merge_callgraphs.py, distance.py and distance.bin. distance.bin is the C++ port of distance.py (built from distance.bin.cc, libboost and cmake used). gen_distance_orig.py calls distance.py while gen_distance_fast.py can call either of them (use -p/--python-only option to call the pythonic one).
    • tools: Current only add_edges.py itself.
  • Update the outdated scripts. Force pip to install networkx<3.0, switch to an older workable commit for DARPA CGC targets, add set -euo pipefail into fuzzing scripts for easier and safer usage, and...
  • More organized Readme.md.

 - Merge *config.h*, *afl-tmin.c* and *afl-fuzz.c* and wrapper aflgo's implementation with macro `AFLGO_IMPL`
 - Modify *Makefile* to define `AFLGO_IMPL` as 1
 - Make instrumentation method standalone (./instrument)
 - Move all distance calculating related to ./distance/
 - Move original fuzzing scripts to ./examples/
 - Move *aflgo-build.sh* to project root
 - Issue LICENSE of AFL
 - Switch to aflgo-* workaround
 - AFLGO_TRACING now can be enabled by Makefile
 - Add test-instr.c for testing
 - New README.md based on README.llvm
Only 3 scripts need to be directly exposed to users:
 - *add_edges.py* (has been re-written to fit python3)
 - *gen_distance_orig.sh* (Original *genDistance.sh*)
 - *gen_distance_fast.py*

Others have been moved to *./distance_calculator*
 - Use more meaningful new name *distance.bin* for the C++ ported *distance.py*.
 - *distance.py* is undoubtedly a Distance Calculator, so it should also be placed here.
 - *merge_callgraphs.py* helps the distance calculators, so it's here, too.
New implementation had been tested with all fuzz scripts in *./examples* since the last commit.

Bugs found and fixed in the new implementation:

build.sh
 - `info` method has been [removed from networkx 3](https://networkx.org/documentation/stable/release/release_3.0.html). *add_edges.py* and *distance.py* give `AttributeError: module 'networkx' has no attribute 'info'`.
 - Incorrect work directory when compling aflgo.

Readme.md
 - Should also install older versions of networkx.
 - Should remind users for libxml2 in another level-1 title.
 - Should info users that the dependencies aren't installed by fuzzing scripts.

afl-2.57b/Makefile
 - Missing link of libm (for math.h)

instrument/aflgo-clang.c
 - Incorrect construction of obj_path

instrument/aflgo-runtime.c
 - Incorrect relative include path

instrument/Makefile
 - Missing "./"

distance/distance_calculator/CMakeLists.txt
 - Incorrect file extension

distance/gen_distance_orig.sh
 - Redundant `set -euo pipefail`. The script has its own error handling processes such as `|| FAIL=1`.
 - Inconsistent behavior with *gen_distance_fast.py* because of `find`. The python script uses `Path.glob`, which won't search recursively without "**/". However GNU find searches the whole directory tree. So use `-maxdepth` to constrain it.

examples
 - All scripts should fit new *gen_distance_orig.sh* (consistent with *gen_distance_fast.py*) and stop abusing $SUBJECT everywhere :(
 - *KTY_Pretty_Printer.sh*, *LMS.sh* and *Palindrome.sh*: trailofbits/cb-multios#86 force it to use clang in cmake/32.cmake and cmake/64.cmake, making $CC and $CXX useless.
 - *libming-CVE-2018-8962.sh* and *libming-CVE-2018-8962.sh*: Without "-fcommon" the linker called by clang will fail. See squaresLab/security-repair-benchmarks#19
 - *libxml2-ef709ce2.sh*: Should undo `set -e` because `cp $SUBJECT/test/dtd* in` needs to ignore directory *$SUBJECT/test/dtds* as normal.
@SonicStark SonicStark changed the title A developer- and user-friendly AFLGo [RFC] A developer- and user-friendly AFLGo Jul 19, 2023
@mboehme
Copy link
Collaborator

mboehme commented Aug 1, 2023

Thanks @SonicStark! Is this mostly a refactoring (moving files around, upgrading to recent AFL, but no changes to the source files), or did you also change the source files? If the latter, can you split the PR into refactoring and changes (in whichever order you prefer)?

Copy link
Collaborator

@mboehme mboehme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through all 8 commits and the changes look great! It would have been nice to see the AFL upgrade as a sync with upstream google/AFL to be able to review the other changes independently and otherwise trust the sync, but no worries. Thanks also for the instrument/Readme.md!

@strongcourage: Could you help running the new version, testing the examples, and testing the docker container?

distance/gen_distance_orig.sh Outdated Show resolved Hide resolved
examples/libming-CVE-2018-8962.sh Show resolved Hide resolved
examples/libxml2-ef709ce2.sh Show resolved Hide resolved

If your compilation crashes in this step, have a look at Issue [#4](https://github.com/aflgo/aflgo/issues/4#issuecomment-333947041).
3) Compile AFLGo fuzzer, LLVM-instrumentation pass and the distance calculator
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to set the environment variable. Copying from above:

# Checkout source code
git clone https://github.com/aflgo/aflgo.git
export AFLGO=$PWD/aflgo

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in 7a204ca.

Readme.md Outdated Show resolved Hide resolved
@SonicStark
Copy link
Author

SonicStark commented Aug 12, 2023

Is this mostly a refactoring (moving files around, upgrading to recent AFL, but no changes to the source files), or did you also change the source files?

All 8 commits actually did "a refactoring with many source files changed" :-)

If the latter, can you split the PR into refactoring and changes (in whichever order you prefer)?

Is it necessary for tracking and reviewing those commits? Currently they are organized by major functional changes with detailed commit msgs. If each commit does either changing the source files or not, it would be logically difficult to understand and more confusing, BME.

running the new version, testing the examples, and testing the docker container

I have run the 14 scripts in ./examples/ and they all seemed to be working well.

Looking forward to good news in further testing 👀

Readme.md
 - Solve aflgo#131 (comment).
 - Solve aflgo#131 (comment).
 - Warn user about side effects of installing LLVM 11.0.0.
 - Update link of LLVM 11.0.0.
 - Remove link of checkout_build_install_llvm.sh since we already have build.sh.
 - Remove compilation of `./afl-2.57b/llvm_mode`. It's useless. AFLGo has its own version in `./instrument`.

build.sh:
 - Download and build LLVM in `./instrument/llvm_tools`. Original `~/build` may corrupt user's data unexpectedly.
 - Not compile useless `./afl-2.57b/llvm_mode`.

.gitignore
 - Ignore llvm_tools generated by build.sh.
@mboehme
Copy link
Collaborator

mboehme commented Aug 31, 2023

Thanks @SonicStark!! LGTM. (Sorry the merge of PR #132 introduced a merge conflict for this PR).

@strongcourage: Can you take this version for a quick spin, incl. your docker setup? Let me know if you don't have the bandwidth.

@strongcourage
Copy link
Collaborator

@mboehme: ok I will test it this weekend

build.sh Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @SonicStark,

Which ubuntu version did you use to build aflgo? I tried to run your script build.sh on the vanilla ubuntu 18.04, but got some issues.

cmake -G "Ninja" \
      -DLIBCXX_ENABLE_SHARED=OFF -DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
      -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86" \
      -DLLVM_BINUTILS_INCDIR=/usr/include ~/build/llvm_tools/llvm-11.0.0.src/tools/clang/clang-11.0.0.src/

I was not able to compile successfully the llvm yet :D. Could you please recheck again the building script? Thank you.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@strongcourage build.sh has been updated in af4af16. I also tested it in docker images with tag 20.04, 18.04 and 16.04 pulled from dockerhub.

FYI:

  • Installing python3-distutils according to UBUNTU_VERSION is unexplained. So I replace it with python3, which is enough for building LLVM and running distance_calculator.
  • Compiling build-llvm/msan and missing clang are also unexplained, and have been replaced with normal stuffs.
  • Finding clang/clang++ by which is more reasonable, since I found that ninja install makes them located in /usr/local/bin compared to an APT installation.
  • It seems that minimum cmake version for the C++ port of distance.py could be as low as 3.4.3, same as LLVM 11.0.0.

If build.sh runs successfully, you will see

image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for python3-distutils is here: #85 (comment)

But I still wonder that why a MSan-enabled libcxx in /usr/msan/ is necessary, since the mess of python3-distutils was caused by libcxx compilation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SonicStark: Thank you for your updates and great effort. I built and fuzzed the target libxml2 using the current version 2.57b and it worked well. Merged already! Thanks again.

Screenshot 2023-09-25 at 10 28 07

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SonicStark: If you have a working Docker file that you'd like to share with us, I'll add it to the README for others. Many thanks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@strongcourage This looks simple but actually works for me:

FROM ubuntu:20.04

RUN apt-get update && \
    apt-get install -y git && \
    git clone https://github.com/aflgo/aflgo.git

RUN /aflgo/build.sh

ENV AFLGO /aflgo

And I think there's no need to expand the content of build.sh in the dockerfile, since more inflated and fragile layers would be introduced.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SonicStark I completely agree, given the building and fuzzing scripts, the docker file should be as simple as yours. Thanks.

build.sh
 - Handle missing *sudo* for *apt-get*. Docker images often use root as default, but general linux distributions doesn't.
 - Jump over interactions when running *apt-get*.
 - Re-arrange prerequisites for LLVM. Python interpreter seems necessary for compiling LLVM even with test suite disabled, although the [documentation](https://releases.llvm.org/11.0.0/docs/GettingStarted.html#software) says opposite.
 - Re-write building clang toolchain:
    - To get a normal building, *llvm*, *clang* and *compiler-rt* are enough for LLVM source tree.
    - *libcxx* & *libcxxabi* with *msan* seems stray, so removed out.
 - Remove installing *python-bs4*. So weird it was!
 - Install *gawk* and *pkg-config*. They are frequently-used.
 - Do not upgrade *pip* after installed from APT, because latest pip drops support for some old python versions, such as pypa/get-pip#83.
 - Handle different versions of NetworkX:
    - aflgo#132 removed `nx.info()` due to deprecation, so *networkx<3.0* isn't necessary.
    - NetworkX doesn't always support all Python 3.x, which may cause compatibility issue with python installed from APT.
 - Find clang binaries by *which*, since sometimes they are installed into */usr/local/bin*.
 - Prompt user that the build is done.

Readme.md
 - Keep consistent with *build.sh*.
 - Tell readers about networkx version selection.

distance/distance_calculator/CMakeLists.txt
Use 3.4.3 instead of 3.10 for `cmake_minimum_required` because
 - Ver 3.10 is difficult to be satisfied for cmake installation from APT on old linux distributions.
 - Ver 3.4.3 is the minimum required for LLVM 11 (see https://releases.llvm.org/11.0.0/docs/CMake.html).
 - It still works well, tested on Ubuntu 16.04 LTS, 18.04 LTS and 20.04 LTS.
@strongcourage strongcourage merged commit 20370f5 into aflgo:master Sep 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants