test: Build fuzz targets into seperate executables #15043

Merged
merged 2 commits into from Jan 30, 2019

Conversation

maflcko
Member

@maflcko maflcko commented Dec 27, 2018

Currently our fuzzer is a single binary that decides from the first few bits of the buffer which target to pick. This is ineffective, as the fuzzer needs to "learn" how the fuzz targets are organized and could easily get confused. Not to mention that the (seed) corpus cannot be categorized by target, since targets might "leak" into each other. Also, the corpus would potentially become invalid if we ever wanted to remove a target...

Solve that by building each fuzz target into its own executable.
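
To make the "leak" concern concrete, here is a minimal sketch of single-binary dispatch. It is not the project's actual harness: the `Target` enum, the `select_target` helper, and the choice of exactly one selector byte are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical target ids, for illustration only.
enum class Target : uint8_t { BlockDeserialize = 0, TransactionDeserialize = 1, Count = 2 };

// Single-binary style: the first byte of the input picks the target and the
// rest is the payload. Returns false if no valid target can be selected.
bool select_target(const std::vector<uint8_t>& buffer, Target& target, std::vector<uint8_t>& payload)
{
    if (buffer.empty()) return false;
    if (buffer[0] >= static_cast<uint8_t>(Target::Count)) return false;
    target = static_cast<Target>(buffer[0]);
    payload.assign(buffer.begin() + 1, buffer.end());
    assert(payload.size() + 1 == buffer.size());
    return true;
}
```

In this sketch, flipping a single bit in the first byte sends the same payload to a different target, so one corpus file can end up exercising code it was never meant for. With one executable per target there is no selector byte, and every byte of the input belongs to the target being fuzzed.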

@practicalswift
Contributor

Concept ACK

Thanks for doing this!

@maflcko maflcko force-pushed the Mf1812-buildFuzzTargets branch 2 times, most recently from 53ea64f to 0293c5d Compare December 27, 2018 18:19
@maflcko
Member Author

maflcko commented Dec 28, 2018

Would be nice if someone could look at the build system changes, since the code is mostly just moved around.

@DrahtBot
Contributor

DrahtBot commented Dec 28, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #14912 ([WIP] External signer support (e.g. hardware wallet) by Sjors)
  • #13989 (add avx512 instrinsic by fingera)
  • #10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@laanwj
Member

laanwj commented Jan 2, 2019

Concept ACK

I think this should be behind a configure flag. Building so many executables is going to be slow when statically linking or on slow filesystems, and it contributes nothing to testing unless the user is planning to do fuzzing.

@cjdelisle

As a non-stakeholder, feel free to ignore, but as someone who is using the same test methodology, I would like to understand why you're proposing to change.

I have a number of reasons why I tend to prefer a single fuzz entry point, but I would like to know if there is any evidence that putting the test-case in the data can harm the fuzzer's ability to efficiently find paths.

My reasoning is as follows:

  1. Personally, I find it better to let someone with a large machine run a simple command and begin fuzzing with minimal effort. If each case is fuzzed separately, this is not really practical.
  2. I am a big fan of maximum fuzz coverage, which means any test that can plausibly make use of randomized data should be accessible from the fuzzer. So I would not want to require people to launch each test separately.
  3. If someone would like to fuzz a particular test case, that need can easily be supported by letting them pass an argument to the runner that makes it return 0 for any test case other than the one specified, so the fuzzer will quickly prune attempts that touch other test cases.
  4. If there is a large corpus of generated test data, a simple Python script could rename the test files to contain the name of the test case they touch, solving the concern about organization.

Thanks,
Caleb

@Crypt-iQ
Contributor

Crypt-iQ commented Jan 2, 2019

Hey @cjdelisle,

I think the fuzz tests should be split up. AFL (and other fuzzers) can splice test cases with one another, and for AFL this can discover 20% additional execution paths. From my understanding, splicing can cause efficiency problems if the fuzzer is not fuzzing just one target.

If we have two inputs, input A meant for address_deserialize and input B meant for banentry_deserialize, and they are spliced together to create input C (a nonsense input), input C could be selected for the queue for further mutation if it provides the same edge coverage as input A or B (depending on its prefix), causing even more of an efficiency problem. The splicing mutation is really meant to be used on similar, but diverse test cases.
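
The A/B/C scenario can be sketched with a simplified splice. This is not AFL's actual mutation code; `splice_at` is a hypothetical helper that joins the head of one input to the tail of another at a fixed offset, which is the general shape of the operation described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified splice (illustrative only): head of `a` up to `split`,
// followed by the tail of `b` from `split` onward.
std::vector<uint8_t> splice_at(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b, std::size_t split)
{
    // Clamp the split point so it is valid for both inputs.
    split = std::min(split, std::min(a.size(), b.size()));
    assert(split <= a.size() && split <= b.size());
    const auto offset = static_cast<std::ptrdiff_t>(split);
    std::vector<uint8_t> out(a.begin(), a.begin() + offset);
    out.insert(out.end(), b.begin() + offset, b.end());
    return out;
}
```

If the first byte of each input selects the target, the spliced result keeps A's selector but carries B's tail, so data shaped for one target gets fed to the other. When all inputs in a corpus belong to the same target, the same operation recombines structurally similar inputs, which is the case it is meant for.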

Also I think seeding the fuzzer with "bad" corpus inputs in general isn't a good idea because of efficiency and because of the splicing issue, even if we are only running one test and all others are disabled with a flag.

Anyways I'm pretty much a noob to fuzzing, but this is what I could gather from reading the AFL technical_notes, lcamtuf's blog, and the discussion on #11045

@practicalswift
Contributor

@cjdelisle See @kcc's comment in #11045 (comment) :-)

@cjdelisle

Thanks @Crypt-iQ and @practicalswift. For those who weren't following carefully, my understanding is that Google's OSS-Fuzz project recommends breaking up fuzz targets ( https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md ) because:

  • Some of the fuzzers have O(executable_size) performance
  • Forcing all fuzzing to enter through one entry-point multiplies the number of possible search paths and fuzzers have limited memory

So I hereby reverse my opinion, because any kind of ease of management is negated by the fact that it's always better to do what works.

@Crypt-iQ
Contributor

Crypt-iQ commented Jan 3, 2019

I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it's not currently included? Maybe this can be done in another PR.

See @kcc comment: #11045 (comment)

OSS Fuzz recommendations:
https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md#regression-testing

@laanwj
Member

laanwj commented Jan 8, 2019

I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it's not currently included?

The idea is to have a separate repository with the corpus. The problem with including it in the main repository, besides taking up space, is that e.g. all changes have to go through the bottleneck of maintainers here.

@Crypt-iQ
Contributor

Crypt-iQ commented Jan 8, 2019

@laanwj Ok that makes sense. I think the corpus should be split up into directories by message type / fuzzing target to avoid erroneous feedback while fuzzing.

@maflcko
Member Author

maflcko commented Jan 25, 2019

Ah, I see. Done and removed 400 lines of boilerplate and headers.

The return value is always 0 and not used, so might as well return void
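
As a rough sketch of what a `void` target body looks like behind a libFuzzer entry point: `LLVMFuzzerTestOneInput` is libFuzzer's documented entry-point signature, while the body of `test_one_input` and the `g_last_input_size` counter are placeholders invented here so the call is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder state so the sketch has an observable effect (hypothetical).
static std::size_t g_last_input_size = 0;

// Target body returns void: there is no status for the caller to inspect.
void test_one_input(const std::vector<uint8_t>& buffer)
{
    g_last_input_size = buffer.size();
}

// libFuzzer entry point: forwards the raw bytes to the target body and
// always returns 0.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    test_one_input(std::vector<uint8_t>(data, data + size));
    return 0;
}
```

Only the entry point carries the constant `0` in this arrangement, so the target bodies do not each need their own `return 0;`.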
@maflcko maflcko force-pushed the Mf1812-buildFuzzTargets branch 2 times, most recently from b3f2679 to 0260683 Compare January 26, 2019 00:08
@maflcko
Member Author

maflcko commented Jan 26, 2019

Now split into two commits, where the top commit is some move-only:

git diff 2ca632e5b44a8385989c8539cc4e30e60fdee16c~ 2ca632e5b44a8385989c8539cc4e30e60fdee16c --color-moved=dimmed-zebra src/test

@@ -147,6 +146,11 @@ AC_ARG_ENABLE([extended-functional-tests],
[use_extended_functional_tests=$enableval],
[use_extended_functional_tests=no])

AC_ARG_ENABLE([fuzz],
AS_HELP_STRING([--enable-fuzz],[enable building of fuzz targets (default no)]),
Contributor

Bikeshedding perhaps but --enable-fuzzing feels more natural to me than --enable-fuzz.

@bitcoin bitcoin deleted a comment from DrahtBot Jan 30, 2019
Contributor

@ryanofsky ryanofsky left a comment

utACK 2ca632e

test_fuzz_block_deserialize_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) -DBLOCK_DESERIALIZE=1
test_fuzz_block_deserialize_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS)
test_fuzz_block_deserialize_LDFLAGS = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
test_fuzz_block_deserialize_LDADD = \
Contributor

You could deduplicate some boilerplate by defining common variables like:

fuzz_common_ldflags = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
fuzz_common_ldadd = $(LIBUNIVALUE) $(LIBBITCOIN_SERVER) $(LIBBITCOIN_COMMON)

and then individual targets could be shortened to:

test_fuzz_transaction_deserialize_LDFLAGS = $(fuzz_common_ldflags)
test_fuzz_transaction_deserialize_LDADD = $(fuzz_common_ldadd)
...

test_fuzz_block_deserialize_LDFLAGS = $(fuzz_common_ldflags)
test_fuzz_block_deserialize_LDADD = $(fuzz_common_ldadd)
...

At least this is what I did in 2060a30 for #10102. One catch is that the LDFLAGS, LDADD, etc. suffixes can't be capitalized, or the variables will be treated specially by automake.

switch(test_id) {
case CBLOCK_DESERIALIZE:
{
#if BLOCK_DESERIALIZE
Contributor

I think it would be less complicated to just use separate source files rather than these preprocessor defines. The defines don't seem to really decrease duplication, but I guess they do have the advantage of making it easy to see the different test cases all in one file.

Member Author

This was my initial solution, but I changed it back to minimize the diff.
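
For readers unfamiliar with the pattern being discussed, the following is a rough sketch of one source file that is compiled once per target with a different define. The macro names mirror those visible in this thread, but the `run_target` function, its string return value, and the fallback `#define` (added only so the sketch compiles without a `-D` flag) are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Normally supplied by the build system, e.g. -DBLOCK_DESERIALIZE=1.
// Defaulted here only so this sketch builds standalone.
#ifndef BLOCK_DESERIALIZE
#define BLOCK_DESERIALIZE 1
#endif

// Runs the one target body that was compiled in and reports its name.
std::string run_target(const std::vector<uint8_t>& buffer)
{
    std::string name;
#if BLOCK_DESERIALIZE
    name = "block_deserialize"; // a real harness would deserialize a block from `buffer`
#elif TRANSACTION_DESERIALIZE
    name = "transaction_deserialize";
#else
#error "no fuzz target selected"
#endif
    (void)buffer; // unused in this placeholder body
    assert(!name.empty()); // exactly one branch above was compiled in
    return name;
}
```

The alternative raised in the review, one source file per target, drops the `#if` ladder entirely at the cost of spreading the cases over several files.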

@laanwj laanwj merged commit 2ca632e into bitcoin:master Jan 30, 2019
laanwj added a commit that referenced this pull request Jan 30, 2019
2ca632e test: Build fuzz targets into seperate executables (MarcoFalke)
fab4bed [test] fuzz: make test_one_input return void (MarcoFalke)

Pull request description:

  Currently our fuzzer is a single binary that decides from the first few bits of the buffer which target to pick. This is ineffective, as the fuzzer needs to "learn" how the fuzz targets are organized and could easily get confused. Not to mention that the (seed) corpus cannot be categorized by target, since targets might "leak" into each other. Also, the corpus would potentially become invalid if we ever wanted to remove a target...

  Solve that by building each fuzz target into its own executable.

Tree-SHA512: a874febc85a3c5e6729199542b65cad10640553fba6f663600c827fe144543744dd0f844fb62b4c95c6a04c670bfce32cdff3d5f26de2dfc25f10b258eda18ab
@maflcko maflcko deleted the Mf1812-buildFuzzTargets branch January 30, 2019 20:22
random-zebra added a commit to PIVX-Project/PIVX that referenced this pull request May 28, 2021
d059544 [Build] fuzz target, change LIBBITCOIN_ZEROCOIN link order. (furszy)
2396e6b [fuzz] Add ContextualCheckTransaction call to transaction target. (furszy)
f0887a0 Fuzzing documentation "PIVX-fication" (furszy)
9631f46 [doc] add sanitizers documentation in developer-notes.md (furszy)
70a0ace tests: Test serialisation as part of deserialisation fuzzing. Test round-trip equality where possible. Avoid code repetition. (practicalswift)
e1b92b6 ignore new fuzz targets gitignore (furszy)
d058d8c tests: Add deserialization fuzzing harnesses (furszy)
e1f666c tests: Remove TRANSACTION_DESERIALIZE (replaced by transaction fuzzer) (practicalswift)
b5f291c tests: Add fuzzing harness for CheckTransaction(...), IsStandardTx(...) and other CTransaction related functions (furszy)
3205871 fuzz: Remove option --export_coverage from test_runner (MarcoFalke)
52693ee fuzz: Add option to merge input dir to test runner (MarcoFalke)
2b4f8aa doc: Remove --disable-ccache from docs (MarcoFalke)
b54b1d6 tests: Improve test runner output in case of target errors (practicalswift)
cd6134f test: Log output even if fuzzer failed (MarcoFalke)
48cd0c8 doc: Improve fuzzing docs for macOS users (Fabian Jahr)
d642b67 [Build] Do not disable wallet when fuzz is enabled. (furszy)
c3447b5 Update doc and CI config (qmma)
1266d3e Disable other targets when enable-fuzz is set (qmma)
f28ac9a build: Allow to configure --with-sanitizers=fuzzer (MarcoFalke)
425742c fuzz: test_runner: Better error message when built with afl (MarcoFalke)
541f442 qa: Add test/fuzz/test_runner.py (MarcoFalke)
89fe5b2 Add missing LIBBITCOIN_ZMQ to test target (furszy)
58dbe79 add fuzzing binaries to gitignore. (furszy)
393a126 fuzz: Move deserialize tests to test/fuzz/deserialize.cpp (MarcoFalke)
a568df5 test: Build fuzz targets into separate executables (furszy)
d5dddde [test] fuzz: make test_one_input return void (MarcoFalke)
2e4ec58 [fuzzing] initialize chain params by default. (furszy)
08d8ebe [tests] Add libFuzzer support. (practicalswift)
84f72da [test] Speed up fuzzing by ~200x when using afl-fuzz (practicalswift)
faf2be6 Init ECC context for test_bitcoin_fuzzy. (Gregory Maxwell)
11150df Make fuzzer actually test CTxOutCompressor (Pieter Wuille)
d6f6a85 doc: Add bare-bones documentation for fuzzing (Wladimir J. van der Laan)
5c3b550 Simple fuzzing framework (pstratem)

Pull request description:

  As the title says, adding fuzzing framework support so we can start getting serious on this area as well.

  Adapted the following PRs:

  * bitcoin#9172.
  * bitcoin#9354.
  * bitcoin#9691.
  * bitcoin#10415.
  * bitcoin#10440.
  * bitcoin#15043.
  * bitcoin#15047.
  * bitcoin#15295.
  * bitcoin#15399 (fabcfa5 only).
  * bitcoin#16338.
  * bitcoin#17051.
  * bitcoin#17076.
  * bitcoin#17225.
  * bitcoin#17942.
  * bitcoin#16236 (only fa35c42).
  * bitcoin#18166 (only f2472f6).
  * bitcoin#18300.
  * And.. probably will go further and continue adapting more PRs..

ACKs for top commit:
  random-zebra:
    utACK d059544 and merging...

Tree-SHA512: c0b05bca47bf99bafd8abf1453c5636fe05df75f16d0e9c750368ea2aed8142f0b28d28af1d23468b8829188412a80fd3b7bdbbda294b940d78aec80c1c7d03a
kwvg added a commit to kwvg/dash that referenced this pull request Aug 2, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Aug 5, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Aug 5, 2021
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Aug 6, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Aug 8, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Aug 11, 2021
5tefan pushed a commit to 5tefan/dash that referenced this pull request Aug 12, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021
9 participants