Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Build fuzz targets into seperate executables #15043

Merged
merged 2 commits into from Jan 30, 2019

Conversation

Projects
None yet
9 participants
@MarcoFalke
Copy link
Member

commented Dec 27, 2018

Currently our fuzzer is a single binary that decides on the first few bits of the buffer what target to pick. This is ineffective as the fuzzer needs to "learn" how the fuzz targets are organized and could get easily confused. Not to mention that the (seed) corpus can not be categorized by target, since targets might "leak" into each other. Also the corpus would potentially become invalid if we ever wanted to remove a target...

Solve that by building each fuzz target into their own executable.

@MarcoFalke MarcoFalke force-pushed the MarcoFalke:Mf1812-buildFuzzTargets branch 2 times, most recently Dec 27, 2018

@practicalswift

This comment has been minimized.

Copy link
Member

commented Dec 27, 2018

Concept ACK

Thanks for doing this!

@MarcoFalke MarcoFalke force-pushed the MarcoFalke:Mf1812-buildFuzzTargets branch 2 times, most recently Dec 27, 2018

Show resolved Hide resolved src/test/test_bitcoin_fuzzy-blockheader_deserialize.cpp Outdated
Show resolved Hide resolved src/test/test_bitcoin_fuzzy-transaction_deserialize.cpp Outdated
Show resolved Hide resolved src/test/test_bitcoin_fuzzy.hpp Outdated
@MarcoFalke

This comment has been minimized.

Copy link
Member Author

commented Dec 28, 2018

Would be nice if someone could look at the build system changes, since the code is mostly just moved around.

@DrahtBot

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #14912 ([WIP] External signer support (e.g. hardware wallet) by Sjors)
  • #13989 (add avx512 instrinsic by fingera)
  • #10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@MarcoFalke MarcoFalke force-pushed the MarcoFalke:Mf1812-buildFuzzTargets branch 8 times, most recently Dec 31, 2018

@laanwj

This comment has been minimized.

Copy link
Member

commented Jan 2, 2019

Concept ACK

I think this should be behind a configure flag, building so many executables is going to be slow when statically linking, or with slow filesystems, and it contributes nothing to testing unless the user is planning to do fuzzing.

@cjdelisle

This comment has been minimized.

Copy link

commented Jan 2, 2019

As a non-stakeholder, feel free to ignore, but as someone who is using the same test methodology, I would like to understand why you're proposing to change.

I have a number of reasons why I tend to prefer a single fuzz entry point, but I would like to know if there is any evidence that putting the test-case in the data can harm the fuzzer's ability to efficiently find paths.

My reasoning is as follows:

  1. Personally, I find it better to allow someone with a large machine to be able to run a simple command and begin fuzzing with minimal effort, if each case is fuzzed separately then this is not really practical.
  2. Secondly, I personally am a big fan of maximum fuzz coverage, which means any test that can plausibly make use of randomized data should be accessible from the fuzzer. So clearly I would not want to require people to launch each test separately.
  3. It seems like if one would like to fuzz a particular test case, their need can be easily supported by allowing them to pass an argument to the running which causes it to return 0 if the test is specified to anything else, thus the fuzzer will quickly prune attempts which touch other test cases.
  4. If there is a large corpus of generated test data, it seems that one could use a simple python script to rename the test files to contain the name of the test case which they're touching, solving the concern about organization.

Thanks,
Caleb

@Crypt-iQ

This comment has been minimized.

Copy link

commented Jan 2, 2019

Hey @cjdelisle,

I think the fuzz tests should be split up. AFL (and other fuzzers) can splice test cases with one another and for AFL, this can discover 20% additional execution paths. From my understanding, this can cause efficiency problems if the fuzzer is not fuzzing just one target.

If we have two inputs, input A meant for address_deserialize and input B meant for banentry_deserialize and they are spliced together to create input C (which is a nonsense input), input C could be selected to be in the queue for further mutation if it provides the same edge coverage as input A or B (depending on its prefix) and cause even more of an efficiency problem. The splicing mutation is really meant to be used on similar, but diverse test cases.

Also I think seeding the fuzzer with "bad" corpus inputs in general isn't a good idea because of efficiency and because of the splicing issue, even if we are only running one test and all others are disabled with a flag.

Anyways I'm pretty much a noob to fuzzing, but this is what I could gather from reading the AFL technical_notes, lcamtuf's blog, and the discussion on #11045

@practicalswift

This comment has been minimized.

Copy link
Member

commented Jan 2, 2019

@cjdelisle See @kcc:s comment in #11045 (comment) :-)

@cjdelisle

This comment has been minimized.

Copy link

commented Jan 3, 2019

Thanks @Crypt-iQ and @practicalswift , for those who weren't following carefully, my understanding is that it is the recommendation of Google's OSS-Fuzz project that fuzz targets should be broken up ( https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md ) because:

  • Some of the fuzzers have O(executable_size) performance
  • Forcing all fuzzing to enter through one entry-point multiplies the number of possible search paths and fuzzers have limited memory

So I hereby reverse my opinion, because any kind of ease of management is negated by the fact that it's always better to do what works.

@Crypt-iQ

This comment has been minimized.

Copy link

commented Jan 3, 2019

I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it's not currently included? Maybe this can be done in another PR.

See @kcc comment: #11045 (comment)

OSS Fuzz recommendations:
https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md#regression-testing

Show resolved Hide resolved doc/fuzzing.md Outdated
@laanwj

This comment has been minimized.

Copy link
Member

commented Jan 8, 2019

I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it's not currently included?

The idea is to have a separate repository with the corpus. The problem with including it in the main repository, besides taking up space, is that e.g. all changes have to go through the bottleneck of maintainers here.

@Crypt-iQ

This comment has been minimized.

Copy link

commented Jan 8, 2019

@laanwj Ok that makes sense. I think the corpus should be split up into directories by message type / fuzzing target to avoid erroneous feedback while fuzzing.

@MarcoFalke MarcoFalke force-pushed the MarcoFalke:Mf1812-buildFuzzTargets branch 4 times, most recently to fae86e6 Jan 15, 2019

@MarcoFalke

This comment has been minimized.

Copy link
Member Author

commented Jan 16, 2019

@laanwj The previous corpus can still be used by stripping the first few (4?) bytes off of the seeds. I will do that (and move them to a separate repo) after this pull has been merged.

@MarcoFalke

This comment has been minimized.

Copy link
Member Author

commented Jan 24, 2019

This hasn't received any review, but I am going to merge it tomorrow unless someone objects to that.

@sipa

This comment has been minimized.

Copy link
Member

commented Jan 25, 2019

Wouldn't it be easier to accomplish this by having a since source file for all fuzz target, but with a macro define (-D... argument) to select which fuzzer is invoked by main?

@MarcoFalke

This comment has been minimized.

Copy link
Member Author

commented Jan 25, 2019

@sipa Wouldn't that require to pass in and compile for each fuzz target? That seems to move the verbosity to the user/ci script/fuzz runner.

Also having a single source file makes it harder to see what each single fuzz test does (and depends on).

@sipa

This comment has been minimized.

Copy link
Member

commented Jan 25, 2019

@MarcoFalke The build system would automatically build all of them, supplying the necessary -D arguments for each binary.

This would avoid all the boilerplate in all those source files.

@MarcoFalke MarcoFalke force-pushed the MarcoFalke:Mf1812-buildFuzzTargets branch 2 times, most recently from 8ddbfe3 to a233df0 Jan 25, 2019

@MarcoFalke

This comment has been minimized.

Copy link
Member Author

commented Jan 25, 2019

Ah, I see. Done and removed 400 lines of boilerplate and headers.

@MarcoFalke MarcoFalke force-pushed the MarcoFalke:Mf1812-buildFuzzTargets branch from a233df0 to d8a68c4 Jan 25, 2019

[test] fuzz: make test_one_input return void
The return value is always 0 and not used, so might as well return void

@MarcoFalke MarcoFalke force-pushed the MarcoFalke:Mf1812-buildFuzzTargets branch 2 times, most recently from b3f2679 to 0260683 Jan 26, 2019

@MarcoFalke

This comment has been minimized.

Copy link
Member Author

commented Jan 26, 2019

Now split into two commits, where the top commit is some move-only:

git diff 2ca632e5b44a8385989c8539cc4e30e60fdee16c~ 2ca632e5b44a8385989c8539cc4e30e60fdee16c --color-moved=dimmed-zebra src/test
@@ -147,6 +146,11 @@ AC_ARG_ENABLE([extended-functional-tests],
[use_extended_functional_tests=$enableval],
[use_extended_functional_tests=no])

AC_ARG_ENABLE([fuzz],
AS_HELP_STRING([--enable-fuzz],[enable building of fuzz targets (default no)]),

This comment has been minimized.

Copy link
@practicalswift

practicalswift Jan 28, 2019

Member

Bikeshedding perhaps but --enable-fuzzing feels more natural to me than --enable-fuzz.

@MarcoFalke MarcoFalke force-pushed the MarcoFalke:Mf1812-buildFuzzTargets branch from 0260683 to 2ca632e Jan 30, 2019

@bitcoin bitcoin deleted a comment from DrahtBot Jan 30, 2019

@ryanofsky
Copy link
Contributor

left a comment

utACK 2ca632e

test_fuzz_block_deserialize_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) -DBLOCK_DESERIALIZE=1
test_fuzz_block_deserialize_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS)
test_fuzz_block_deserialize_LDFLAGS = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
test_fuzz_block_deserialize_LDADD = \

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Jan 30, 2019

Contributor

You could deduplicate some boilerplate by defining common variables like:

fuzz_common_ldflags = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
fuzz_common_ldadd = $(LIBUNIVALUE) $(LIBBITCOIN_SERVER) $(LIBBITCOIN_COMMON)

and then individual targets could be shortened to:

test_fuzz_transaction_deserialize_LDFLAGS = $(fuzz_common_ldflags)
test_fuzz_transaction_deserialize_LDADD = $(fuzz_common_ldadd)
...

test_fuzz_block_deserialize_LDFLAGS = $(fuzz_common_ldflags)
test_fuzz_block_deserialize_LDADD = $(fuzz_common_ldadd)
...

At least this is what I did in 2060a30 for #10102. One catch is that LDFLAGS, LDADD, etc suffix can't be capitalized or the variables will be treated specially by automake.

switch(test_id) {
case CBLOCK_DESERIALIZE:
{
#if BLOCK_DESERIALIZE

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Jan 30, 2019

Contributor

I think it would make things less complicated just to use separate source files, rather than using these preprocessor defines. These defines don't seem to really decrease duplication, but I guess they they do have the advantage of making it easy to see the different test cases all in one file.

This comment has been minimized.

Copy link
@MarcoFalke

MarcoFalke Jan 30, 2019

Author Member

This was my initial solution, but I changed it back to minimize the diff

@laanwj laanwj merged commit 2ca632e into bitcoin:master Jan 30, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Jan 30, 2019

Merge #15043: test: Build fuzz targets into seperate executables
2ca632e test: Build fuzz targets into seperate executables (MarcoFalke)
fab4bed [test] fuzz: make test_one_input return void (MarcoFalke)

Pull request description:

  Currently our fuzzer is a single binary that decides on the first few bits of the buffer what target to pick. This is ineffective as the fuzzer needs to "learn" how the fuzz targets are organized and could get easily confused. Not to mention that the (seed) corpus can not be categorized by target, since targets might "leak" into each other. Also the corpus would potentially become invalid if we ever wanted to remove a target...

  Solve that by building each fuzz target into their own executable.

Tree-SHA512: a874febc85a3c5e6729199542b65cad10640553fba6f663600c827fe144543744dd0f844fb62b4c95c6a04c670bfce32cdff3d5f26de2dfc25f10b258eda18ab

@MarcoFalke MarcoFalke deleted the MarcoFalke:Mf1812-buildFuzzTargets branch Jan 30, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.