meta: Isolated fuzzing of net processing #27502
Comments
I don't think that #27499 is going to increase fuzz performance. Parsing should currently only be done once, at the beginning, before any fuzzing happens. Even if parsing were in the hot loop, it probably wouldn't be noticeable. And if you want to construct a PeerManager for each fuzzing iteration, it seems too slow either way, unless you find a way to skip the memory allocations (as you said yourself).
I think the separation from gArgs makes sense either way, but yea not really a performance improvement. I added the args stuff because I was investigating performance for a target that creates a new TestingSetup each iteration. The argsman setup with all arguments actually was one of the slower things (among block tree db, chainman setup and blockfile creation).
Afaict the allocations are mostly due to one or two large bloom filters. I think we could lazily initialize the filters on demand, so executions that don't reach code that uses the filters would benefit. Or alternatively, we could allow setting the size of these filters when creating a peerman (I think that would be fine for tests).
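A minimal sketch of the lazy-initialization idea, for illustration only. `BigFilter` and `LazyFilter` are hypothetical stand-ins and not the filter classes used by the actual PeerManager; the point is only that the large allocation is deferred until the first insert, and that the size can be passed in at construction time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a large filter. The allocation cost is paid in the
// constructor, which is what the lazy wrapper below tries to avoid.
class BigFilter
{
public:
    explicit BigFilter(std::size_t n_bits) : m_bits(n_bits, false) { assert(n_bits > 0); }
    void Insert(std::uint64_t key) { m_bits[key % m_bits.size()] = true; }
    bool MaybeContains(std::uint64_t key) const { return m_bits[key % m_bits.size()]; }

private:
    std::vector<bool> m_bits;
};

// Hypothetical wrapper: the size is configurable (tests can pass a small one)
// and the underlying filter is only allocated on the first Insert().
class LazyFilter
{
public:
    explicit LazyFilter(std::size_t n_bits) : m_n_bits(n_bits) {}

    bool IsAllocated() const { return m_filter != nullptr; }

    void Insert(std::uint64_t key)
    {
        if (!m_filter) m_filter = std::make_unique<BigFilter>(m_n_bits);
        m_filter->Insert(key);
    }

    // A filter that was never written to cannot contain anything, so a lookup
    // does not need to trigger the allocation.
    bool MaybeContains(std::uint64_t key) const
    {
        return m_filter && m_filter->MaybeContains(key);
    }

private:
    std::size_t m_n_bits;
    std::unique_ptr<BigFilter> m_filter;
};
```

With a wrapper like this, a fuzz iteration that never inserts into the filter never pays for the allocation, and a test can construct it with a small size.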
What is this going to fuzz exactly? The possible behaviour of an individual p2p message in isolation? The p2p protocol state machine for a single peer? Is this intended to be abstracted away from validation/mempool behaviours (ie, stubbing them out)?
Mostly all code in I'm not saying that we shouldn't fuzz the other modules, just that it makes more sense to do in isolation for the type of fuzzing we currently utilize. We can also fuzz the integration of them all but snapshot fuzzing is probably much better for that due to quite heavy initial state requirements (I have been working on this as well using https://nyx-fuzz.com/).
Yes and yes, each would be its own target. Another example would be the processing of a sequence of messages, e.g. a target per protocol flow we are interested in testing: version handshake, compact block relay, tx relay, etc. All of this can happen for a single peer or multiple.
I guess the thing I'm wondering is if/how you get coverage on behaviours like "tx is missing witness data", "we got a tx via txid so can cancel out requests via wtxid", "compact block is invalid", "header chain does/doesn't have sufficient work", "parent is (not) in mempool", etc? I wonder if it would make sense to have a ...
Yea that is what I had in mind as well and what I was alluding to by "net processing / validation split" in the PR description. Alternatively we could work on making the actual mempool/chainman interfaces mockable/more abstract but that might be more work and would also interfere with kernel. For the I can code up a parent PR/branch (similar to #28252) for this (I already had some of this done while working on a block download module). |
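To make the stubbing idea concrete, here is a rough sketch of what a narrow, mockable boundary could look like. Everything in it is hypothetical: `MempoolView`, `StubMempool`, `TxAction` and `OnTxAnnouncement` are names invented for illustration and do not correspond to existing interfaces. It only shows how a fuzz target could drive a behaviour such as "parent is (not) in mempool" directly, without setting up a real mempool.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical narrow interface: only what the message-handling code needs
// to know about the mempool.
class MempoolView
{
public:
    virtual ~MempoolView() = default;
    virtual bool HasTx(const std::string& txid) const = 0;
};

// Stub whose contents are chosen by the test or fuzz input.
class StubMempool final : public MempoolView
{
public:
    void Add(std::string txid)
    {
        assert(!txid.empty());
        m_txids.insert(std::move(txid));
    }
    bool HasTx(const std::string& txid) const override { return m_txids.count(txid) > 0; }

private:
    std::set<std::string> m_txids;
};

enum class TxAction { Ignore, Request };

// Hypothetical slice of message handling: request an announced transaction
// only if the mempool does not already have it.
TxAction OnTxAnnouncement(const MempoolView& mempool, const std::string& txid)
{
    return mempool.HasTx(txid) ? TxAction::Ignore : TxAction::Request;
}
```

Because the logic under test only sees `MempoolView`, both branches are reachable by flipping what the stub reports, which is cheap for a fuzzer to explore.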
I think even just having it as documentation of what parts of mempool/validation are used by net would be interesting -- I don't think the boundary there is entirely clean; eg the |
Efficient isolated fuzzing of our message processing code (net processing) would be very valuable. However, to make that deterministic, fast and fuzzer friendly, it appears that extensive refactoring is required. There are three main blockers: module separation (net/net processing/validation split), determinism, and performance.
I am gonna use this issue to track and motivate the work that needs to be done. It would be great if we can achieve the same outcome with less refactoring but I don't see how at the moment. Open to suggestions and feedback.
Module separation
Once we have a clean separation between our net, net processing and validation modules, we can fuzz/test them in isolation to maximize the bug yield (🐛, 🪲, 🪳). By design, most fuzzing engines are not great at finding bugs when the scope of the targets is too large (e.g. fuzzing net, net processing and validation all at once).
cs_main usage in net processing
Determinism
Non-deterministic fuzz targets are less efficient at finding bugs and debugging non-reproducible bugs is annoying/costly.
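One possible shape for mocking a randomness source such as GetRand() is to inject it as a callable, so a fuzz target can supply a seeded, reproducible generator. The sketch below is an assumption about how that could look, not the project's API: `RandFn`, `DeterministicRand` and `PickPeer` are hypothetical names, and the generator is a plain splitmix64 step that is not suitable for anything security-relevant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical injection point: returns a value in [0, max).
using RandFn = std::function<std::uint64_t(std::uint64_t)>;

// Seeded generator for tests and fuzzing: the same seed yields the same sequence.
// Uses a splitmix64 step; the modulo reduction is slightly biased, which is
// acceptable for a test double but not for production randomness.
class DeterministicRand
{
public:
    explicit DeterministicRand(std::uint64_t seed) : m_state(seed) {}

    std::uint64_t operator()(std::uint64_t max)
    {
        m_state += 0x9e3779b97f4a7c15ULL;
        std::uint64_t z = m_state;
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
        z ^= z >> 31;
        return max == 0 ? 0 : z % max;
    }

private:
    std::uint64_t m_state;
};

// Hypothetical consumer: picks a peer index using whatever source was injected.
std::size_t PickPeer(const RandFn& rand, std::size_t n_peers)
{
    assert(n_peers > 0);
    return static_cast<std::size_t>(rand(n_peers));
}
```

Production code would pass a real randomness source, while a fuzz target would pass `DeterministicRand` seeded from the fuzz input, so a crashing input replays identically.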
GetRand() (mock it?)
Performance
Fuzzing is a search: the faster the search, the better.
Misc
Please let me know if you think there are PRs that aren't listed here but should be.