test: Build fuzz targets into separate executables #15043
Branch updated from 9e02074 to 90986ff.
Concept ACK. Thanks for doing this!
Branch updated from 53ea64f to 0293c5d.
It would be nice if someone could look at the build system changes, since the code is mostly just moved around.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Branch updated from cb0b9d0 to f0d7531.
Concept ACK. I think this should be behind a
As a non-stakeholder, feel free to ignore this, but as someone who is using the same test methodology, I would like to understand why you're proposing this change. I have a number of reasons for preferring a single fuzz entry point, but I would like to know whether there is any evidence that putting the test-case selection in the data can harm the fuzzer's ability to find paths efficiently. My reasoning is as follows:
Thanks,
Hey @cjdelisle, I think the fuzz tests should be split up. AFL (and other fuzzers) can splice test cases with one another, and for AFL this can discover 20% additional execution paths. From my understanding, this can cause efficiency problems if the fuzzer is not fuzzing just one target. If we have two inputs, input A meant for
Also, I think seeding the fuzzer with "bad" corpus inputs isn't a good idea in general, both for efficiency and because of the splicing issue, even if we are only running one test and all others are disabled with a flag. Anyway, I'm pretty much a noob to fuzzing, but this is what I could gather from reading the AFL technical_notes, lcamtuf's blog, and the discussion on #11045.
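To make the splicing concern concrete, here is a minimal C++ sketch assuming a hypothetical single-binary design in which the first input byte selects the target. The `Target` enum, `route()` and the simplified `splice()` (head of one input joined to the tail of another, only an approximation of AFL's splice stage) are illustrative names, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical targets in a single fuzz binary.
enum class Target : uint8_t { BlockDeserialize = 0, TxDeserialize = 1 };

// Hypothetical dispatcher: the first byte picks the target, the rest is payload.
Target route(const std::vector<uint8_t>& input) {
    return static_cast<Target>(input.at(0) % 2);
}

// Simplified splice: bytes [0, cut) of `a` followed by bytes [cut, end) of `b`.
// Assumes cut <= a.size() and cut <= b.size().
std::vector<uint8_t> splice(const std::vector<uint8_t>& a,
                            const std::vector<uint8_t>& b, std::size_t cut) {
    const auto off = static_cast<std::ptrdiff_t>(cut);
    std::vector<uint8_t> out(a.begin(), a.begin() + off);
    out.insert(out.end(), b.begin() + off, b.end());
    return out;
}
```

Splicing a block input with a transaction input at offset 1 keeps the block selector but hands the transaction payload to the block target. With one executable per target there is no selector byte, so a splice can only mix two inputs meant for the same target.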
@cjdelisle See @kcc's comment in #11045 (comment) :-)
Thanks @Crypt-iQ and @practicalswift. For those who weren't following carefully, my understanding is that Google's OSS-Fuzz project recommends breaking fuzz targets up (https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md) because:
So I hereby reverse my opinion, because any ease-of-management benefit is negated by the fact that it's always better to do what works.
I think the fuzz targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it's not currently included? Maybe this can be done in another PR. See @kcc's comment: #11045 (comment)
OSS Fuzz recommendations:
The idea is to have a separate repository with the corpus. The problem with including it in the main repository, besides taking up space, is that e.g. all changes have to go through the bottleneck of the maintainers here.
@laanwj OK, that makes sense. I think the corpus should be split into directories by message type/fuzz target to avoid erroneous feedback while fuzzing.
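A sketch of replaying one target over its own corpus directory as a regression test, assuming a hypothetical layout with one directory per target (for example `corpus/block_deserialize/`). `replay_corpus()` is an illustrative helper, not Bitcoin Core's test runner; it needs C++17 for `<filesystem>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <vector>

namespace fs = std::filesystem;

// Feeds every regular file directly inside `corpus_dir` to `target` once and
// returns how many inputs were replayed. A crash or failed assertion inside
// `target` is what would flag a regression.
std::size_t replay_corpus(const fs::path& corpus_dir,
                          const std::function<void(const std::vector<uint8_t>&)>& target) {
    std::size_t count = 0;
    for (const auto& entry : fs::directory_iterator(corpus_dir)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream in(entry.path(), std::ios::binary);
        // Read the whole file into a byte buffer.
        std::vector<uint8_t> data((std::istreambuf_iterator<char>(in)),
                                  std::istreambuf_iterator<char>());
        target(data);
        ++count;
    }
    return count;
}
```

Because each directory only holds inputs for one target, a failure points directly at that target, and retiring a target just means deleting its directory.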
Ah, I see. Done, and removed 400 lines of boilerplate and headers.
Branch updated from a233df0 to d8a68c4.
The return value is always 0 and never used, so we might as well return void.
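A minimal sketch of the signature change, assuming the per-target function receives the input as a `std::vector<uint8_t>` (the exact parameter type is an assumption here). `LLVMFuzzerTestOneInput` is libFuzzer's documented entry point, which by convention returns 0; the wrapper keeps that `int`, so the target itself no longer needs one. The call-counting target is a hypothetical placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-target body: returns void because the old int result was always 0.
void test_one_input(const std::vector<uint8_t>& buffer);

// libFuzzer entry point: copies the raw bytes and returns 0 as libFuzzer expects.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, std::size_t size) {
    test_one_input(std::vector<uint8_t>(data, data + size));
    return 0;
}

// Hypothetical placeholder target so the sketch links: it only counts calls.
static int g_calls = 0;
void test_one_input(const std::vector<uint8_t>& buffer) {
    (void)buffer;
    ++g_calls;
}
```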
Branch updated from b3f2679 to 0260683.
Now split into two commits, where the top commit is move-only:
@@ -147,6 +146,11 @@ AC_ARG_ENABLE([extended-functional-tests],
    [use_extended_functional_tests=$enableval],
    [use_extended_functional_tests=no])

AC_ARG_ENABLE([fuzz],
    AS_HELP_STRING([--enable-fuzz],[enable building of fuzz targets (default no)]),
Bikeshedding perhaps, but --enable-fuzzing feels more natural to me than --enable-fuzz.
Branch updated from 0260683 to 2ca632e.
utACK 2ca632e
test_fuzz_block_deserialize_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) -DBLOCK_DESERIALIZE=1
test_fuzz_block_deserialize_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS)
test_fuzz_block_deserialize_LDFLAGS = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
test_fuzz_block_deserialize_LDADD = \
You could deduplicate some boilerplate by defining common variables like:
fuzz_common_ldflags = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
fuzz_common_ldadd = $(LIBUNIVALUE) $(LIBBITCOIN_SERVER) $(LIBBITCOIN_COMMON)
and then individual targets could be shortened to:
test_fuzz_transaction_deserialize_LDFLAGS = $(fuzz_common_ldflags)
test_fuzz_transaction_deserialize_LDADD = $(fuzz_common_ldadd)
...
test_fuzz_block_deserialize_LDFLAGS = $(fuzz_common_ldflags)
test_fuzz_block_deserialize_LDADD = $(fuzz_common_ldadd)
...
At least, this is what I did in 2060a30 for #10102. One catch is that the LDFLAGS, LDADD, etc. suffixes can't be capitalized, or automake will treat the variables specially.
    switch(test_id) {
        case CBLOCK_DESERIALIZE:
        {
#if BLOCK_DESERIALIZE
I think it would make things less complicated just to use separate source files, rather than these preprocessor defines. The defines don't really seem to decrease duplication, but I guess they do have the advantage of making it easy to see all the different test cases in one file.
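For comparison, a sketch of the preprocessor approach used in this PR: one source file compiled once per executable, each build passing its own define such as `-DBLOCK_DESERIALIZE=1` (the flag from the Makefile snippet above). The function body and the fallback default are hypothetical, added only so the sketch builds without extra flags. With separate source files, each file would simply define its own target and no `#if` would be needed.

```cpp
#include <cassert>
#include <string>

// Fallback so this sketch compiles standalone; a real build would pass
// exactly one of these macros per executable via CPPFLAGS.
#if !defined(BLOCK_DESERIALIZE) && !defined(TRANSACTION_DESERIALIZE)
#define BLOCK_DESERIALIZE 1
#endif

// Reports which single target this executable was compiled for.
std::string compiled_target() {
#if BLOCK_DESERIALIZE
    return "block_deserialize";
#elif TRANSACTION_DESERIALIZE
    return "transaction_deserialize";
#else
    return "none";
#endif
}
```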
This was my initial solution, but I changed it back to minimize the diff.
2ca632e test: Build fuzz targets into seperate executables (MarcoFalke)
fab4bed [test] fuzz: make test_one_input return void (MarcoFalke)

Pull request description:

Currently our fuzzer is a single binary that uses the first few bits of the buffer to decide which target to pick. This is ineffective, as the fuzzer needs to "learn" how the fuzz targets are organized and could easily get confused. Not to mention that the (seed) corpus cannot be categorized by target, since targets might "leak" into each other. Also, the corpus would potentially become invalid if we ever wanted to remove a target...

Solve that by building each fuzz target into its own executable.

Tree-SHA512: a874febc85a3c5e6729199542b65cad10640553fba6f663600c827fe144543744dd0f844fb62b4c95c6a04c670bfce32cdff3d5f26de2dfc25f10b258eda18ab
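The point that removing a target could invalidate the corpus can be illustrated with a sketch assuming a hypothetical single-binary dispatcher that indexes a target table with the first byte of each corpus file. The table contents and `route()` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical target tables before and after removing one target.
const std::vector<std::string> kTargetsBefore = {"block", "transaction", "address"};
const std::vector<std::string> kTargetsAfter = {"block", "address"}; // "transaction" removed

// Name of the target a stored corpus input is routed to, based on its leading ID byte.
std::string route(const std::vector<std::string>& table, const std::vector<uint8_t>& input) {
    if (input.empty() || static_cast<std::size_t>(input[0]) >= table.size()) return "<rejected>";
    return table[input[0]];
}
```

After "transaction" is removed, a saved transaction input is silently routed to the address target and a saved address input is rejected. Separate executables avoid this, because a corpus file no longer carries a target ID at all.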
d059544 [Build] fuzz target, change LIBBITCOIN_ZEROCOIN link order. (furszy)
2396e6b [fuzz] Add ContextualCheckTransaction call to transaction target. (furszy)
f0887a0 Fuzzing documentation "PIVX-fication" (furszy)
9631f46 [doc] add sanitizers documentation in developer-notes.md (furszy)
70a0ace tests: Test serialisation as part of deserialisation fuzzing. Test round-trip equality where possible. Avoid code repetition. (practicalswift)
e1b92b6 ignore new fuzz targets gitignore (furszy)
d058d8c tests: Add deserialization fuzzing harnesses (furszy)
e1f666c tests: Remove TRANSACTION_DESERIALIZE (replaced by transaction fuzzer) (practicalswift)
b5f291c tests: Add fuzzing harness for CheckTransaction(...), IsStandardTx(...) and other CTransaction related functions (furszy)
3205871 fuzz: Remove option --export_coverage from test_runner (MarcoFalke)
52693ee fuzz: Add option to merge input dir to test runner (MarcoFalke)
2b4f8aa doc: Remove --disable-ccache from docs (MarcoFalke)
b54b1d6 tests: Improve test runner output in case of target errors (practicalswift)
cd6134f test: Log output even if fuzzer failed (MarcoFalke)
48cd0c8 doc: Improve fuzzing docs for macOS users (Fabian Jahr)
d642b67 [Build] Do not disable wallet when fuzz is enabled. (furszy)
c3447b5 Update doc and CI config (qmma)
1266d3e Disable other targets when enable-fuzz is set (qmma)
f28ac9a build: Allow to configure --with-sanitizers=fuzzer (MarcoFalke)
425742c fuzz: test_runner: Better error message when built with afl (MarcoFalke)
541f442 qa: Add test/fuzz/test_runner.py (MarcoFalke)
89fe5b2 Add missing LIBBITCOIN_ZMQ to test target (furszy)
58dbe79 add fuzzing binaries to gitignore. (furszy)
393a126 fuzz: Move deserialize tests to test/fuzz/deserialize.cpp (MarcoFalke)
a568df5 test: Build fuzz targets into separate executables (furszy)
d5dddde [test] fuzz: make test_one_input return void (MarcoFalke)
2e4ec58 [fuzzing] initialize chain params by default. (furszy)
08d8ebe [tests] Add libFuzzer support. (practicalswift)
84f72da [test] Speed up fuzzing by ~200x when using afl-fuzz (practicalswift)
faf2be6 Init ECC context for test_bitcoin_fuzzy. (Gregory Maxwell)
11150df Make fuzzer actually test CTxOutCompressor (Pieter Wuille)
d6f6a85 doc: Add bare-bones documentation for fuzzing (Wladimir J. van der Laan)
5c3b550 Simple fuzzing framework (pstratem)

Pull request description:

As the title says, this adds fuzzing framework support so we can start getting serious in this area as well.

Adapted the following PRs:
* bitcoin#9172.
* bitcoin#9354.
* bitcoin#9691.
* bitcoin#10415.
* bitcoin#10440.
* bitcoin#15043.
* bitcoin#15047.
* bitcoin#15295.
* bitcoin#15399 (fabcfa5 only).
* bitcoin#16338.
* bitcoin#17051.
* bitcoin#17076.
* bitcoin#17225.
* bitcoin#17942.
* bitcoin#16236 (only fa35c42).
* bitcoin#18166 (only f2472f6).
* bitcoin#18300.
* And probably more: I will continue adapting further PRs.

ACKs for top commit:
random-zebra: utACK d059544 and merging...

Tree-SHA512: c0b05bca47bf99bafd8abf1453c5636fe05df75f16d0e9c750368ea2aed8142f0b28d28af1d23468b8829188412a80fd3b7bdbbda294b940d78aec80c1c7d03a
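Commit 70a0ace in the list above tests serialization round-trips as part of deserialization fuzzing. Below is a minimal sketch of that idea, not the code from that commit, using a hypothetical `Toy` type with a fixed 4-byte little-endian encoding in place of a real Bitcoin type. It compares re-deserialized objects rather than raw bytes, since a decoder that ignores trailing input would otherwise report false mismatches.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical serializable type standing in for e.g. a block header.
struct Toy {
    uint32_t value = 0;
    bool operator==(const Toy& o) const { return value == o.value; }
};

// Decodes the first 4 bytes (little-endian); trailing bytes are ignored.
std::optional<Toy> deserialize(const std::vector<uint8_t>& in) {
    if (in.size() < 4) return std::nullopt; // too short: reject the input
    Toy t;
    t.value = uint32_t(in[0]) | uint32_t(in[1]) << 8 | uint32_t(in[2]) << 16 | uint32_t(in[3]) << 24;
    return t;
}

std::vector<uint8_t> serialize(const Toy& t) {
    return {uint8_t(t.value), uint8_t(t.value >> 8), uint8_t(t.value >> 16), uint8_t(t.value >> 24)};
}

// Round-trip target: whatever deserializes must survive serialize + deserialize unchanged.
void test_one_input(const std::vector<uint8_t>& buffer) {
    const std::optional<Toy> obj = deserialize(buffer);
    if (!obj) return;
    const std::optional<Toy> again = deserialize(serialize(*obj));
    assert(again && *again == *obj);
}
```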