Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implemented a structure-aware fuzzing approach for the ClickHouse SELECT statement parser.
Detailed description / Documentation draft:
Structure-aware fuzzing has previously been used in many projects, where it has demonstrated its effectiveness. This approach was observed in Google's V8 fuzzing and is implemented here for ClickHouse parser fuzzing. The idea behind codegen_fuzzer is very simple: with the help of libFuzzer and libprotobuf-mutator we can fuzz language parsers with coverage guidance. Technically, it is a grammar-based fuzzer where the grammar is simply `TOKEN*`, but it is driven by coverage feedback, which should be very useful for finding interesting paths without much manual work.

If this approach shows results, I would like to add similar fuzzers for other parts of the project.
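To make the `TOKEN*` idea concrete, here is a minimal sketch in Python. It is not the fuzzer from this PR: the token set, `render`, and `mutate` are hypothetical names chosen for illustration, and a plain random mutation stands in for libprotobuf-mutator. There is also no coverage feedback here, which in the real setup comes from libFuzzer.

```python
import random
from enum import Enum


class Token(Enum):
    """A tiny, hypothetical token set; the real list is far larger."""
    SELECT = "SELECT"
    FROM = "FROM"
    WHERE = "WHERE"
    STAR = "*"
    IDENT = "t"
    NUMBER = "1"
    EQ = "="


def render(sentence):
    """Turn a token sequence (the TOKEN* 'sentence') into a query string."""
    return " ".join(tok.value for tok in sentence)


def mutate(sentence, rng):
    """Apply one structural mutation: insert, delete, or replace a token.

    The input list is left untouched; a mutated copy is returned.
    """
    tokens = list(Token)
    result = list(sentence)
    op = rng.choice(["insert", "delete", "replace"])
    if op == "insert" or not result:
        # randint is inclusive on both ends, so appending is possible too.
        result.insert(rng.randint(0, len(result)), rng.choice(tokens))
    elif op == "delete":
        del result[rng.randrange(len(result))]
    else:
        result[rng.randrange(len(result))] = rng.choice(tokens)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = [Token.SELECT, Token.STAR, Token.FROM, Token.IDENT]
    for _ in range(5):
        sentence = mutate(sentence, rng)
        print(render(sentence))
```

Because mutations operate on whole tokens rather than raw bytes, every generated input is still a sequence of valid lexemes, so the parser spends its time on grammar-level decisions instead of rejecting input in the lexer.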
- `gen.py` -- script to generate C++ string generation routines and .proto messages
- `clickhouse-template.g` -- human-generated list of tokens
- `clickhouse.g` -- full list of tokens generated by a call to the script `update.sh`
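As a rough illustration of the kind of generation step described for `gen.py`, the sketch below turns a token list into a `.proto` definition and a C++ `switch` fragment. This is an assumption-laden sketch, not the actual script: the function names, the `TOKEN_<n>` naming scheme, the `Sentence` message, and the shape of the emitted C++ are all hypothetical, and the real format of `clickhouse.g` is not shown in this description.

```python
def generate_proto(tokens):
    """Emit a proto3 enum with one value per token, plus a message
    holding a repeated sequence of them (hypothetical layout)."""
    lines = ['syntax = "proto3";', "", "enum Token {"]
    for i, _ in enumerate(tokens):
        # proto3 requires the first enum value to be zero.
        lines.append(f"  TOKEN_{i} = {i};")
    lines += ["}", "", "message Sentence {", "  repeated Token tokens = 1;", "}"]
    return "\n".join(lines)


def cpp_escape(text):
    """Escape backslashes and double quotes for a C++ string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def generate_cpp_cases(tokens):
    """Emit a C++ switch fragment that appends each token's text to `out`."""
    lines = ["switch (token) {"]
    for i, text in enumerate(tokens):
        lines.append(f'    case TOKEN_{i}: out += "{cpp_escape(text)} "; break;')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed input: one token per entry; the real list comes from clickhouse.g.
    tokens = ["SELECT", "FROM", "WHERE", "*"]
    print(generate_proto(tokens))
    print(generate_cpp_cases(tokens))
```

Generating both artifacts from one token list keeps the protobuf schema and the string-rendering code in sync, which matters because the mutator only knows about the enum values while the parser only sees the rendered text.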