Fuzz: a more efficient descriptor parsing target #27888

darosior · 2023-06-14T13:13:09Z

The current descriptor parsing fuzz target requires valid public or private keys to be provided. This is unnecessary, as we are only interested in fuzzing the descriptor parsing logic here (other targets focus on fuzzing key serialization). It is also pretty inefficient, especially for formats that need a checksum (xpub, xprv, WIF).

This introduces a new target that mocks the keys as an index in a list of precomputed keys. Keys are represented as 2 hex characters in the descriptor. The key type (private, public, extended, ..) is deterministically based on this one-byte value. Keys are deterministically generated at target initialization. This is much more efficient and also largely reduces the size of the seeds.
TL;DR: for instance instead of requiring the fuzzer to generate a pk(xpub6DdBu7pBoyf7RjnUVhg8y6LFCfca2QAGJ39FcsgXM52Pg7eejUHLBJn4gNMey5dacyt4AjvKzdTQiuLfRdK8rSzyqZPJmNAcYZ9kVVEz4kj) to parse a valid descriptor, it just needs to generate a pk(03).

Note we only mock the keys themselves, not the entire descriptor key expression, as we want to fuzz the real code that parses the rest of the key expression (origin, derivation paths, ..).

This is a target i used for reviewing #17190 and #27255, and figured it was worth PR'ing on its own since the added complexity for mocking the keys is minimal and it could help prevent introducing bugs to the descriptor parsing logic much more efficiently.
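As a rough illustration of the "index into a list of precomputed keys" idea described above, here is a minimal sketch. Everything in it is an assumption made for illustration: the `MockKeyType` categories, the `idx % 4` mapping, and the placeholder strings are hypothetical and are not taken from the PR, which generates real keys at target initialization.

```cpp
#include <array>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical key categories; the real target may split them differently.
enum class MockKeyType { Public, Private, ExtendedPublic, ExtendedPrivate };

struct MockKey {
    MockKeyType type;
    std::string encoded; // placeholder text, NOT a real key encoding
};

// Assumed mapping: the type depends only on the index, so it is deterministic.
MockKeyType KeyTypeForIndex(uint8_t idx)
{
    return static_cast<MockKeyType>(idx % 4);
}

// Build the whole table once, e.g. when the fuzz target is initialized.
std::array<MockKey, 256> BuildMockKeys()
{
    std::array<MockKey, 256> keys;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const auto idx = static_cast<uint8_t>(i);
        keys[i] = MockKey{KeyTypeForIndex(idx), "<mock-key-" + std::to_string(i) + ">"};
    }
    return keys;
}

// Parse exactly two hex characters (e.g. "03") into a table index.
std::optional<uint8_t> IndexFromHex(std::string_view hex)
{
    if (hex.size() != 2) return std::nullopt;
    unsigned int value = 0;
    const char* first = hex.data();
    const char* last = hex.data() + hex.size();
    const auto [ptr, ec] = std::from_chars(first, last, value, 16);
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return static_cast<uint8_t>(value);
}
```

With such a table, a two-character token like `03` resolves to one fixed entry on every run, so the fuzzer never has to produce a valid checksum itself.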

DrahtBot · 2023-06-14T13:13:11Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	MarcoFalke, achow101
Concept ACK	dergoegge

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#26573 (Wallet: don't underestimate the fees when spending a Taproot output by darosior)
#26567 (Wallet: estimate the size of signed inputs using descriptors by darosior)
#22838 (descriptors: Be able to specify change and receiving in a single descriptor string by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

maflcko · 2023-06-14T14:09:08Z

Looks like this touches non-fuzz code? No opinion if people want this, but a much simpler implementation would be to just create a copy of the buffer and inject the pre-generated static string into it. (This is basically what a fuzz engine does automatically, with the difference that you can now provide the string dictionary yourself, directly in the fuzz target).

the fuzz target would look like:

auto str{fdp.ConsumeString()};
if (fdp.ConsumeBool()) {
  str = MockDescriptor(str);
}
const auto desc = Parse(str);

With MockDescriptor doing a search of pk(xx) and then replacing it with pk(yy), where yy=raw_pubkeys[int(xx) % raw_pubkeys.size()] (already hex encoded).

This has the benefits of not mocking out the parsing logic, which for GetXOnlyPubKey is actually worthy to fuzz? Also, it allows to print the fuzz input, if needed, and use it over RPC for debugging. Finally, it doesn't touch real code, only test code.
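A minimal sketch of what such a `MockDescriptor` helper could look like, assuming `xx` is read as two hex characters and that `RAW_PUBKEYS` holds already hex-encoded keys. The table contents below are placeholders, not real keys, and the regex-based search is just one possible way to do the replacement.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical placeholder table; a real target would hold hex-encoded pubkeys.
const std::vector<std::string> RAW_PUBKEYS{"PUBKEY_0", "PUBKEY_1", "PUBKEY_2"};

// Replace every "pk(xx)" (xx = two hex characters) with "pk(<key>)",
// where <key> = RAW_PUBKEYS[xx % RAW_PUBKEYS.size()].
std::string MockDescriptor(const std::string& str)
{
    static const std::regex pk_re{R"(pk\(([0-9a-fA-F]{2})\))"};
    std::string out;
    std::size_t last = 0;
    for (auto it = std::sregex_iterator(str.begin(), str.end(), pk_re);
         it != std::sregex_iterator(); ++it) {
        const std::smatch& m = *it;
        const auto pos = static_cast<std::size_t>(m.position(0));
        out.append(str, last, pos - last); // copy text before the match
        const unsigned long idx = std::stoul(m[1].str(), nullptr, 16);
        out += "pk(" + RAW_PUBKEYS[idx % RAW_PUBKEYS.size()] + ")";
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    out.append(str, last, std::string::npos); // copy the remaining tail
    return out;
}
```

Note that this sketch only recognizes keys inside `pk(...)`. An input such as `wpkh(01)` is returned unchanged, so covering every position where a key may appear would need more matching logic.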

darosior · 2023-06-14T14:20:03Z

With what you suggest, MockDescriptor would basically have to reimplement all of the descriptor parsing logic to be able to detect when a key is expected and replace the hex-encoded byte with an actual key. I figured the slight modification to the descriptor code to make key parsing mockable was preferable. (Note that's what we have already in Miniscript, which allows us to have an efficient miniscript_string fuzz target.)

To be clear this is not only about mocking pk() expressions but anywhere we'd expect a key.

maflcko · 2023-06-14T14:49:15Z

Ok, I see. I guess another alternative would be to randomly inject random pre-generated keys at random positions in the string, absent of any logic. Then let the fuzz engine figure out the right positions via coverage feedback.

No strong opinion, just leaving random ideas that can be implemented with less code.

dergoegge · 2023-06-14T14:50:26Z

Concept ACK

Mocking seems fine to me, but I wonder if we could achieve something similar by placing (in)valid encoded keys in a fuzz dictionary (e.g. bitcoin-core/qa-assets#122), which is also almost the same as Marco's suggestion. This would be less efficient compared to what is in this PR but the inputs would still be available for debugging over RPC. We can of course also just do both since adding to the dictionaries is very easy.

darosior · 2023-06-14T15:16:18Z

Thanks both for throwing in ideas. Note i don't have strong opinions here either, it's just some review code that i figured could be helpful having in too. However i still don't think the approaches suggested here would be better:

Inserting valid keys at random positions in the input. I'm assuming you describe something that looks like 1) get the number of keys from the fuzzer output 2) get the positions to insert each at from the fuzzer output. I may be underestimating the fuzzer's capabilities but it sounds much less efficient.
Including valid keys in a dictionary. Again, i'm not very familiar with the internals of fuzzing engines but i suspect this would make the fuzzer grind the key with virtually no chance of finding another valid one, wasting a lot of cycles.
I don't think keeping the raw seed human readable for debugging over RPC should generally be a goal, especially not at the expense of efficiency. To get something readable you can simply run ./src/test/fuzz/fuzz ./crash-XXXXX with a ToString() printed to stdout.

sipa · 2023-06-14T15:19:12Z

Another alternative may be to just perform search-replace in the fuzz-read string before handing it to the parser? Eg anything of the form "%XX" where XX is two hex characters, is replaced by a lookup in a table.
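A minimal sketch of this search-and-replace idea, under stated assumptions: the `MOCK_KEYS` table holds placeholder strings rather than real keys, the index wraps around the table size, and a `%` not followed by two hex characters makes the whole input invalid. None of these choices are confirmed by the thread; they are only one plausible way to do it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical lookup table; a real target would store valid encoded keys.
constexpr std::array<std::string_view, 4> MOCK_KEYS{"KEY_A", "KEY_B", "KEY_C", "KEY_D"};

// Decode a single hex digit, or return nullopt if it is not one.
std::optional<uint8_t> HexDigit(char c)
{
    if (c >= '0' && c <= '9') return static_cast<uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<uint8_t>(c - 'A' + 10);
    return std::nullopt;
}

// Replace each "%XX" with MOCK_KEYS[XX % MOCK_KEYS.size()].
// Assumption: a '%' without two hex digits after it rejects the input.
std::optional<std::string> ExpandMockedKeys(std::string_view input)
{
    std::string out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] != '%') {
            out += input[i];
            continue;
        }
        if (i + 2 >= input.size()) return std::nullopt; // marker is truncated
        const auto hi = HexDigit(input[i + 1]);
        const auto lo = HexDigit(input[i + 2]);
        if (!hi || !lo) return std::nullopt;
        const std::size_t idx = static_cast<std::size_t>(*hi) * 16 + *lo;
        out += MOCK_KEYS[idx % MOCK_KEYS.size()];
        i += 2; // skip the two hex characters just consumed
    }
    return out;
}
```

Because the marker character is distinguishable from the rest of the descriptor syntax, the helper needs no knowledge of where a key is expected: `wsh(multi(2,%00,%05))` expands the same way as `pk(%00)`, and the expanded string can then be handed to the real parser.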

darosior · 2023-06-14T15:21:58Z

Hmm good point. Using a distinguishable character fixes the issue of having to "basically reimplement descriptor parsing logic".

maflcko · 2023-06-14T15:31:45Z

> I may be underestimating the fuzzer's capabilities but it sounds much less efficient.

I am happy to run a bench, if you happen to have a bug laying around :)

darosior · 2023-06-14T15:35:06Z

I don't, but i think Pieter settled the debate anyways. :)

darosior · 2023-06-15T14:42:42Z

Updated following @sipa's suggestion of using a marker to be able to search and replace in the descriptor string directly.

The diff is now half the size, and this does not touch the descriptor parsing logic anymore. After running the new mocked_descriptor_parse target for half an hour on my laptop i get more branch coverage for descriptor.cpp than with the existing corpus for descriptor_parse.

`descriptor_parse`

`mocked_descriptor_parse`


luke-jr · 2023-06-24T14:11:17Z

Suggest putting "Fuzz" in title, and labelling.


maflcko · 2023-07-21T10:42:35Z

For testing I've injected a bug in currently uncovered code:

diff --git a/src/script/descriptor.cpp b/src/script/descriptor.cpp
index 09ded5fc61..b69db182ab 100644
--- a/src/script/descriptor.cpp
+++ b/src/script/descriptor.cpp
@@ -509,7 +509,7 @@ public:
         out = "[" + origin_str + "]" + EncodeExtPubKey(xpub) + FormatHDKeypath(end_path);
         if (IsRange()) {
             out += "/*";
-            assert(m_derive == DeriveType::UNHARDENED);
+            assert(m_derive != DeriveType::UNHARDENED); // Injected BUG!! (bad)
         }
         return true;
     }

...

But it looks like the fuzz target just crashed immediately anyway (see CI)

darosior · 2023-07-21T12:44:07Z

Thanks for testing. Fixed the issue. I also introduced the bug you shared and it does make the target crash.

maflcko · 2023-07-21T16:44:28Z

Did a quick check to see how many iterations it would take to find my injected bug with libFuzzer:

-use_value_profile=1:
-use_value_profile=0:

maflcko

nice, lgtm ACK 84dee4f 🦄


Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: nice, lgtm ACK 84dee4fe690e08a5adaad1c78530666da07075d8 🦄
ag2blJR9VoWl2NoAXwUZSDU54s5hPaFz7IJt+iLlaSMLYtYwbr4/XMw4CW9xm1A+BXAn9ZXtWQxrwFSRr/COBg==


This new target focuses on fuzzing the actual descriptor parsing logic by not requiring the fuzzer to produce valid keys (nor a valid checksum for that matter). This should make it much more efficient to find bugs we could introduce moving forward. Using a character as a marker (here '%') to be able to search and replace in the string without having to mock the actual descriptor parsing logic was an insight from Pieter Wuille.

Once a descriptor is successfully parsed, execute more of its methods. There is probably still room for improvement by checking for some invariants, but this is low-hanging fruit that significantly increases the code coverage of these targets.

maflcko · 2023-07-21T17:18:27Z

re-ACK 131314b 🐓


Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: re-ACK 131314b62e899f95d1863083d303b489b3212b16  🐓
PiOVYYPtzobuHwHpzUHpihKOhhWkacXf+ZxTpk/y7EwMzhRSE9kfEk+pz/XQl8QIUv2W4fuwNKvY2pFyASt8Bg==

achow101 · 2023-07-27T17:47:34Z

ACK 131314b

maflcko · 2023-07-28T08:11:30Z

src/test/fuzz/descriptor_parse.cpp

+    std::optional<uint8_t> IdxFromHex(std::string_view hex_characters) const {
+        if (hex_characters.size() != 2) return {};
+        auto idx = ParseHex(hex_characters);
+        if (idx.size() != 1) return {};
+        return idx[0];
+    }


Could just use the raw (single) byte here, but that would interfere with libFuzzer -only_ascii=1, which makes me wonder if this is the first ascii fuzz target and whether we should set the option somewhere somehow during input generation?

Funny how -only_ascii=1 performs worse than -only_ascii=0 (#27888 (comment)).

(or at least, not significantly better)

Maybe the target returning early on non-ASCII has basically the same effect?

maflcko · 2023-07-28T10:46:57Z

-mutate_depth=3 seems to be the best so far:

maflcko · 2023-07-28T11:41:10Z

With -mutate_depth=14 being worse (be aware that the y-axis no longer matches the previous plots)

131314b fuzz: increase coverage of the descriptor targets (Antoine Poinsot)
90a2474 fuzz: add a new, more efficient, descriptor parsing target (Antoine Poinsot)
d60229e fuzz: make the parsed descriptor testing into a function (Antoine Poinsot)

Pull request description:

The current descriptor parsing fuzz target requires valid public or private keys to be provided. This is unnecessary as we are only interested in fuzzing the descriptor parsing logic here (other targets are focused on fuzzing keys serializations). And it's pretty inefficient, especially for formats that need a checksum (`xpub`, `xprv`, WIF).

This introduces a new target that mocks the keys as an index in a list of precomputed keys. Keys are represented as 2 hex characters in the descriptor. The key type (private, public, extended, ..) is deterministically based on this one-byte value. Keys are deterministically generated at target initialization. This is much more efficient and also largely reduces the size of the seeds.

TL;DR: for instance instead of requiring the fuzzer to generate a `pk(xpub6DdBu7pBoyf7RjnUVhg8y6LFCfca2QAGJ39FcsgXM52Pg7eejUHLBJn4gNMey5dacyt4AjvKzdTQiuLfRdK8rSzyqZPJmNAcYZ9kVVEz4kj)` to parse a valid descriptor, it just needs to generate a `pk(03)`.

Note we only mock the keys themselves, not the entire descriptor key expression. As we want to fuzz the real code that parses the rest of the key expression (origin, derivation paths, ..).

This is a target i used for reviewing bitcoin#17190 and bitcoin#27255, and figured it was worth PR'ing on its own since the added complexity for mocking the keys is minimal and it could help prevent introducing bugs to the descriptor parsing logic much more efficiently.

ACKs for top commit:
MarcoFalke: re-ACK 131314b 🐓
achow101: ACK 131314b

Tree-SHA512: 485a8d6a0f31a3a132df94dc57f97bdd81583d63507510debaac6a41dbbb42fa83c704ff3f2bd0b78c8673c583157c9a3efd79410e5e79511859e1470e629118

fa3a410 fuzz: Set -rss_limit_mb=8000 for generate as well (MarcoFalke)
fa4e396 fuzz: Generate with random libFuzzer settings (MarcoFalke)

Pull request description:

Sometimes a libFuzzer setting like `-use_value_profile=1` helps [0], sometimes it hurts [1].

[0] #20789 (comment)
[1] #27888 (comment)

By picking a random value, it is ensured that at least some of the runs will have the beneficial configuration set.

Also, set `-max_total_time` to prevent slow fuzz targets from getting a larger time share, or possibly peg to a single core for a long time and block the python script from exiting for a long time. This can be improved in the future. For example, the python script can exit after some time (#20752 (comment)). Alternatively, it can measure if coverage progress was made and run for less time if no progress has been made recently anyway, so that more time can be spent on targets that are new or still make progress.

ACKs for top commit:
murchandamus: utACK fa3a410
dergoegge: utACK fa3a410
brunoerg: light ACK fa3a410

Tree-SHA512: bfd04a76ca09aec612397bae5f3f263a608faa7087697169bd4c506c8195c4d2dd84ddc7fcd3ebbc75771eab618fad840af819114968ca3668fc730092376768

darosior force-pushed the efficient_desc_target branch from 9cdb8c7 to 001b169 Compare June 14, 2023 13:41

DrahtBot added the CI failed label Jun 14, 2023

DrahtBot removed the CI failed label Jun 14, 2023

darosior force-pushed the efficient_desc_target branch 2 times, most recently from 7f384ae to 3def962 Compare June 15, 2023 13:39

DrahtBot added the CI failed label Jun 15, 2023

darosior force-pushed the efficient_desc_target branch from 3def962 to 8edac3d Compare June 15, 2023 14:02

DrahtBot removed the CI failed label Jun 15, 2023

achow101 reviewed Jun 15, 2023


darosior force-pushed the efficient_desc_target branch from 8edac3d to fad60fc Compare June 16, 2023 07:54

DrahtBot mentioned this pull request Jun 16, 2023

Wallet: estimate the size of signed inputs using descriptors #26567

Merged

achow101 reviewed Jun 16, 2023


darosior force-pushed the efficient_desc_target branch from fad60fc to b23c2e2 Compare June 18, 2023 12:16

darosior changed the title ~~A more efficient descriptor parsing target~~ Fuzz: a more efficient descriptor parsing target Jun 27, 2023

DrahtBot mentioned this pull request Jul 11, 2023

fuzz: Flatten all FUZZ_TARGET macros into one #28065

Merged

DrahtBot added the Needs rebase label Jul 17, 2023

fuzz: make the parsed descriptor testing into a function

d60229e

We'll be reusing it in the new target.

darosior force-pushed the efficient_desc_target branch from b23c2e2 to 7349861 Compare July 21, 2023 08:50

DrahtBot added CI failed and removed Needs rebase labels Jul 21, 2023

darosior force-pushed the efficient_desc_target branch from 7349861 to 84dee4f Compare July 21, 2023 12:33

DrahtBot removed the CI failed label Jul 21, 2023

maflcko approved these changes Jul 21, 2023


darosior force-pushed the efficient_desc_target branch 2 times, most recently from cfc6a6e to f00c62e Compare July 21, 2023 17:11

darosior added 2 commits July 21, 2023 19:14

darosior force-pushed the efficient_desc_target branch from f00c62e to 131314b Compare July 21, 2023 17:15

DrahtBot mentioned this pull request Jul 21, 2023

Wallet: don't underestimate the fees when spending a Taproot output #26573

Open

darosior requested a review from achow101 July 26, 2023 12:51

DrahtBot removed the request for review from achow101 July 27, 2023 17:47

achow101 merged commit cbf3850 into bitcoin:master Jul 27, 2023
15 checks passed

maflcko reviewed Jul 28, 2023


darosior deleted the efficient_desc_target branch July 28, 2023 08:31

maflcko mentioned this pull request Jul 28, 2023

fuzz: Generate with random libFuzzer settings #28178

Merged
