Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fuzz test to ensure venial parses everything that syn does #4

Closed
wants to merge 1 commit into from

Conversation

loiclec
Copy link
Contributor

@loiclec loiclec commented Mar 18, 2022

Hello :)

This is the PR adding fuzzing to venial. It is not perfect, but on my machine it works well enough to find some differences. I will continue to improve it if you are happy with the general idea/implementation.

Please let me know if anything fails to compile on your machine. I have had a few problems recently due to new rust toolchains and different OSes handling code coverage instrumentation differently. Usually it can be fixed somewhat easily though.

How to use it

The fuzzer uses an unreleased version of fuzzcheck. It is recommended to first install the cargo-fuzzcheck command line tool, as follows:

cargo install --git "https://github.com/loiclec/fuzzcheck-rs"

And it is also necessary to use the nightly version of the Rust compiler.

 # install the nightly toolchain, then:
rustup override set nightly

Then you can run:

cargo fuzzcheck fuzz::fuzz_parse --profile fuzzing

Updates about the progress are printed in the terminal, for example:

5972ms  242234 simplest_cov(609 cov: 2720/4456 cplx: 63.19) failures(1) iter/s 40438

Some files will be printed inside the fuzz/ folder. If a test failure is found, it will be located inside fuzz/artifacts/. You can stop the fuzzer at any point by pressing Ctrl+C.

If you find a test failure but you think it is too large to really understand what is going on, you can run the following to minify the test failure:

cargo fuzzcheck fuzz::fuzz_parse --command minify --input-file "path/to/artifact.rs"

That will run the fuzzer repeatedly, each time with a reduced upper bound on the complexity of the generated token streams. The minified artifacts will be located in path/to/artifact.rs.minified/{cplx}-{hash}.rs. When it seems like no progress is being made anymore, press Ctrl+C and look for the simplest artifact in the folder.

About the code

There is only one property being tested so far, which checks that venial can parse everything that syn does. This code is located in fn test_parse_declaration(..). It also only parses DeriveInput and not ItemFn, because as far as I understand function parsing isn't as polished yet in venial.

The code that actually launches the fuzz test is inside #[test] fn fuzz_parse(). I have commented it lightly, but let me know if you have any questions about it. You can also read about how fuzzcheck works in general at fuzzcheck.neocities.org . The main difficulty was to observe the code coverage of both venial and syn. So instead of using the convenience functions offered by fuzzcheck::builder, I have created a “sensor” and a “pool” manually. In the future, I will improve fuzzcheck’s APIs so that it isn't necessary.

Limitations of the fuzzer

There are a few limitations of the fuzzer, some of which can be fixed easily, and some of which are just the results of a compromise between “diversity of generated token streams” and performance. The limitations are:

  • Just a few hardcoded identifiers are generated. These include all Rust keywords as well as some simples ones like a, b, c. The complexity of all identifiers is the same. That is, fuzzcheck considers auto and a to be equally complex. So it won't minify e.g. struct auto {} into struct a {}

  • No raw identifiers are generated.

  • Similarly, just a few hardcoded literals are generated as well. That is because I don't think venial cares about the content of the literals. If that is not true, let me know.

  • No groups with Delimiter::None are generated. That is because I don't yet have a grasp on what the purpose of Delimiter::None is. Furthermore, allowing Delimiter::None to be generated means that we have to use a more verbose serialisation format for the artifacts. That is because there is no visible difference between e.g. an identifier: ident and an identifier in a group without delimiters: ident.

  • Almost no awareness of the Rust grammar is used (yet). The fuzzer doesn't try to generate valid Rust syntax. That is partly by design, and partly because it is additional work. It would be nice to prioritise combination of punctuations that we know have a meaning in Rust (e.g. ->, ..=, ..., ::, etc.). I will work on that at some point later.

  • Some of the more advanced functionality of fuzzcheck isn't used. It is possible to optimise for different goals, such as finding a set of N test cases that cover the most code, or finding the test case that allocates the most, or has the highest time complexity, etc.

That's it, I think?

Let me know if it works for you. If you try it now, you should quickly (i.e. in 2 minutes at most) find a failing test case:

enum a { a = () > () }

@PoignardAzur
Copy link
Owner

Let me know if it works for you. If you try it now, you should quickly (i.e. in 2 minutes at most) find a failing test case:

enum a { a = () > () }

I need to kill whoever thought using angle brackets for generics in Rust was a good idea.

@loiclec
Copy link
Contributor Author

loiclec commented Mar 18, 2022

For what it's worth, I think it would be completely fine to only allow literals, identifiers, or expressions within braces as the discriminant of an enum variant, since wrapping an expression inside curly braces is always possible and never changes the meaning of the code. So the type above can always be rewritten as:

enum a { a = { () > () } }

This would also get rid of the closure expression problem.

@PoignardAzur
Copy link
Owner

I don't think that's a satisfying solution. It breaks the "accept any code that compiles" invariant.

In any case, I want venial to parse expressions at some point, so I'll have to handle these cases anyway.

@PoignardAzur
Copy link
Owner

Codes LGTM.

Quick question: instead of generating variations of TokenStream and checking if syn parses them, have you considered generating variations of syn::DeriveInput instead? Although I guess orphan rules mean you can't implement your trait on syn types.

@loiclec
Copy link
Contributor Author

loiclec commented Mar 18, 2022

Quick question: instead of generating variations of TokenStream and checking if syn parses them, have you considered generating variations of syn::DeriveInput instead? Although I guess orphan rules mean you can't implement your trait on syn types.

Yes, that's also possible and I have considered it. The orphan rule wouldn't be a problem since in order to fuzz-test a type T, we need a separate type that implements Mutator<T>, which can be done outside the original crate.

But it's more work and would only produce syntactically correct Rust code. For this property test in venial, it seems like this is indeed what we want, but I thought that was not the case in general. For example, the infinite loop bug may not have been found by generating a valid syn::DeriveInput and then converting it to a token stream.

If I find the time and motivation, I will also write a Mutator<syn::DeriveInput>. Then we can have two fuzz-tests: one for syntactically correct Rust code, and one for quasi-random token streams.

@PoignardAzur
Copy link
Owner

Thanks, that's helpful =)

Quick warning: I've been working non-stop on venial for two weeks, a lot more than I expected for something that was supposed to be a side project. I'm probably not going to have much time to read PRs or fix bugs in the coming weeks. They're still very much appreciated.

@loiclec
Copy link
Contributor Author

loiclec commented Mar 18, 2022

Oh, no worries at all! I meant to write most of this code anyway, and venial was a very convenient target to test the fuzzer, it even helped me fix some bugs in fuzzcheck. So don't feel any pressure to review/fix/merge anything :-)

inherits = "release"
codegen-units = 1
lto = "thin"
debug = false
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the fuzzing profile have debug = true instead, to catch more potential panics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. Not sure why it's set to false, I must have been experimenting with different values and then forgot to turn it back to true.

@PoignardAzur
Copy link
Owner

I've rebased and merged the PR.

@PoignardAzur
Copy link
Owner

PoignardAzur commented Mar 30, 2022

I may have been too hasty.

When trying to compile fuzz tests, I get this error:

test fuzz::fuzz_parse ... thread 'main' panicked at 'failed to properly link the different LLVM coverage sections: CannotFindFunctionRecordAssociatedWithPrfData("This can sometimes happen when the crate is incrementally compiled with more than one codegen-unit. Try adding codegen-units = 1 or incremental = false to the appropriate profile (release or fuzzing) in Cargo.toml")', /home/olivier-faure/.cargo/git/checkouts/fuzzcheck-rs-531eb3f95f61a5dd/b65b30b/fuzzcheck/src/code_coverage_sensor/mod.rs:71:14

@PoignardAzur PoignardAzur reopened this Mar 30, 2022
@loiclec
Copy link
Contributor Author

loiclec commented Mar 30, 2022 via email

@PoignardAzur
Copy link
Owner

Any luck?

@loiclec
Copy link
Contributor Author

loiclec commented Apr 13, 2022

Ah, yes, sorry for not getting back to you. I couldn't reproduce the bug unfortunately :(

There are a couple things to try:

  • did you try running cargo clean and then re-build from scratch again?
  • are you using the fuzzing profile? (i.e. run cargo fuzzcheck fuzz::parse --profile fuzzing) ?
  • Could you provide more information on your setup? (OS, architecture, rust version)?

@PoignardAzur
Copy link
Owner

Running the command cargo fuzzcheck fuzz::fuzz_parse --profile fuzzing after a full clean gives me the same panic as above (CannotFindFunctionRecordAssociatedWithPrfData).

A good first step might be to include more info in that panic message.

  • Current target: nightly-x86_64-unknown-linux-gnu
  • Compiler: rustc 1.62.0-nightly (1f7fb6413 2022-04-10)
  • OS: Pop_OS 21.10

@PoignardAzur
Copy link
Owner

@loiclec

I couldn't reproduce the bug unfortunately :(

To be clear, did you try to reproduce it on MacOS, Linux, or both?

@PoignardAzur
Copy link
Owner

Hi, any progress?

@PoignardAzur
Copy link
Owner

Alright, since I want to get done with venial, I'm going to close this PR and remove the fuzz code for now. I may re-add the code if it's shown to work on Linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants