Fuzz tests for Arrow/Parquet #5332

Open
emkornfield opened this issue Jan 25, 2024 · 2 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@emkornfield
Contributor

I'm fairly new to the code base, but it looks like the existing fuzz tests generate random data and check that it can be read back; we don't have any fuzz tests for malformed data. In the context of Rust, I think the goal would be to avoid panics?

If this is accurate, I'd propose creating fuzz tests that succeed as long as there is no panic, with a starting corpus for each file type taken from the arrow-testing repo.
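
A minimal sketch of such a "pass unless it panics" test, using only the standard library. `read_footer_len` is a hypothetical stand-in for a real reader that only checks the Parquet trailer (leading `PAR1`, then a 4-byte little-endian footer length and a trailing `PAR1`); the seed, the xorshift generator, and `fuzz_no_panic` are illustrative assumptions, not existing arrow-rs code. Every truncation plus a batch of single-byte mutations of each seed is fed to the reader; both `Ok` and `Err` count as success, and any panic fails the run.

```rust
/// Hypothetical stand-in for a real reader: validates only the Parquet framing,
/// i.e. leading "PAR1", then a 4-byte little-endian footer length and a trailing "PAR1".
fn read_footer_len(data: &[u8]) -> Result<usize, String> {
    const MAGIC: &[u8] = b"PAR1";
    if data.len() < 12 {
        return Err("file too short".to_string());
    }
    if &data[..4] != MAGIC || &data[data.len() - 4..] != MAGIC {
        return Err("missing PAR1 magic".to_string());
    }
    let n = data.len();
    let footer_len = u32::from_le_bytes([data[n - 8], data[n - 7], data[n - 6], data[n - 5]]) as usize;
    // The footer must fit between the leading magic and the 8-byte trailer.
    if footer_len > n - 12 {
        return Err("footer length exceeds file size".to_string());
    }
    Ok(footer_len)
}

/// Deterministic xorshift64 PRNG so any failing input is reproducible.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Feeds every truncation and `flips` single-byte mutations of each seed to the
/// reader. Ok and Err are both fine; a panic propagates and fails the test.
/// Returns the number of inputs tried.
fn fuzz_no_panic(seeds: &[Vec<u8>], flips: usize) -> usize {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    let mut cases = 0;
    for seed in seeds {
        for end in 0..=seed.len() {
            let _ = read_footer_len(&seed[..end]);
            cases += 1;
        }
        for _ in 0..flips {
            let mut input = seed.clone();
            if input.is_empty() {
                continue;
            }
            let pos = (rng.next() as usize) % input.len();
            input[pos] = rng.next() as u8;
            let _ = read_footer_len(&input);
            cases += 1;
        }
    }
    cases
}

fn main() {
    // Hand-made seed; a real corpus would come from the arrow-testing files.
    let mut seed = b"PAR1".to_vec();
    seed.extend_from_slice(&[0xAA; 3]); // 3-byte "footer"
    seed.extend_from_slice(&3u32.to_le_bytes()); // footer length
    seed.extend_from_slice(b"PAR1");
    assert_eq!(read_footer_len(&seed), Ok(3));
    let cases = fuzz_no_panic(&[seed], 1000);
    println!("ran {} cases without a panic", cases);
}
```

A real target would swap `read_footer_len` for the actual Arrow or Parquet reader and the toy mutator for a coverage-guided engine.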

A second step would be to integrate with OSS-Fuzz (Arrow C++ already does so).

If this is of interest, and I'm correct that this isn't already done, I can try to prototype something.

emkornfield added the enhancement label on Jan 25, 2024
@tustvold
Contributor

IMO it isn't an issue if the reader panics on malformed data; that is perfectly safe and well-defined behaviour. We should try to avoid it, but it's not like undefined behaviour (UB), which would indicate a bug. Panics are just exceptions.
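
To illustrate the distinction: in safe Rust, slicing past the end of a buffer is bounds-checked and panics, whereas the equivalent out-of-bounds read in C or C++ would be undefined behaviour. In this sketch, the two helpers are hypothetical: `header_panicking` slices directly and panics on short input, while `header_checked` uses `slice::get` to turn the same condition into an `Err`. Recovering with `catch_unwind` assumes the default `panic = "unwind"` strategy.

```rust
use std::panic;

/// Hypothetical helper that panics on short input: the slice is bounds-checked,
/// so it never reads memory outside `data`.
fn header_panicking(data: &[u8]) -> u32 {
    let b = &data[..4];
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Hypothetical non-panicking variant: `get` yields None, mapped to an error.
fn header_checked(data: &[u8]) -> Result<u32, String> {
    let b = data
        .get(..4)
        .ok_or_else(|| format!("need 4 bytes, got {}", data.len()))?;
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn main() {
    let short = [1u8, 2];
    // Silence the default panic message for this demo, then restore it.
    panic::set_hook(Box::new(|_| {}));
    let caught = panic::catch_unwind(|| header_panicking(&short));
    let _ = panic::take_hook();
    // The panic unwound and was caught; the program continues normally.
    assert!(caught.is_err());
    assert_eq!(header_checked(&short), Err("need 4 bytes, got 2".to_string()));
    assert_eq!(header_checked(&[1, 0, 0, 0, 9]), Ok(1));
}
```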

The bigger issue with untrusted or malicious inputs is preventing the reader from getting stuck in infinite loops or blowing up its memory usage. I'm not sure how easy such things are to catch using a fuzz testing framework.
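
One common mitigation, sketched for a hypothetical length-prefixed format (this is not how the parquet crate is structured): never allocate based on a length read from the input until it has been checked against the bytes actually available, and make every loop consume input so it must terminate. On the fuzzing side, libFuzzer reports inputs that exceed its `-rss_limit_mb` memory limit or its per-input `-timeout` as failures, which is one way such problems can surface.

```rust
/// Hypothetical format: a u32 LE element count followed by that many u16 LE
/// values. The count is attacker-controlled.
fn decode_u16_list(data: &[u8]) -> Result<Vec<u16>, String> {
    let header = data.get(..4).ok_or("missing count")?;
    let declared = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let body = &data[4..];
    // Each element needs 2 bytes, so a count above body.len() / 2 is impossible;
    // reject it *before* allocating instead of trusting it.
    if declared > body.len() / 2 {
        return Err(format!(
            "declared {} values but only {} bytes remain",
            declared,
            body.len()
        ));
    }
    let mut out = Vec::with_capacity(declared);
    // `chunks_exact` advances 2 bytes per step, so the loop always terminates.
    for chunk in body.chunks_exact(2).take(declared) {
        out.push(u16::from_le_bytes([chunk[0], chunk[1]]));
    }
    Ok(out)
}

fn main() {
    // Claims ~4 billion values but carries only 2 bytes: rejected with no huge allocation.
    let hostile = [0xFF, 0xFF, 0xFF, 0xFF, 0x01, 0x00];
    assert!(decode_u16_list(&hostile).is_err());
    let ok = [2, 0, 0, 0, 0x01, 0x00, 0x02, 0x00];
    assert_eq!(decode_u16_list(&ok), Ok(vec![1, 2]));
}
```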

With regard to Parquet, I can't help feeling the format is sufficiently complex that supporting untrusted input is essentially a fool's errand, though...

All that said, adding fuzzing support would be a nice addition. I'm not too familiar with the Rust ecosystem's support for it, but @crepererum may know more.


@crepererum
Contributor

Fuzzing is a good thing, even when you accept panics as an outcome. The fuzzer then has to wrap the method call accordingly.
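
A sketch of that wrapping in a plain std-only harness, assuming the default `panic = "unwind"` strategy: `std::panic::catch_unwind` turns a panic into a recorded outcome, so the run continues and panics can be counted or tolerated separately from clean rejections. `parse` and `Outcome` are hypothetical; a real target would call the Parquet or Arrow reader instead.

```rust
use std::panic;

#[derive(Debug, PartialEq)]
enum Outcome {
    Parsed,
    Rejected,
    Panicked,
}

/// Hypothetical target: panics on inputs shorter than 4 bytes (a bug we choose
/// to tolerate), returns Err on bad magic, Ok otherwise.
fn parse(data: &[u8]) -> Result<(), String> {
    if &data[..4] != b"PAR1" {
        return Err("bad magic".to_string());
    }
    Ok(())
}

/// Wraps one call so a panic becomes an outcome instead of ending the run.
fn run_target(data: &[u8]) -> Outcome {
    match panic::catch_unwind(|| parse(data)) {
        Ok(Ok(())) => Outcome::Parsed,
        Ok(Err(_)) => Outcome::Rejected,
        Err(_) => Outcome::Panicked,
    }
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // keep the demo output quiet
    let inputs: [&[u8]; 3] = [b"PAR1data", b"XXXXdata", b"PA"];
    let outcomes: Vec<Outcome> = inputs.iter().map(|&d| run_target(d)).collect();
    let _ = panic::take_hook();
    assert_eq!(outcomes, vec![Outcome::Parsed, Outcome::Rejected, Outcome::Panicked]);
}
```

Whether `Outcome::Panicked` then counts as a failure is a policy decision for the harness, which matches the point that panics can be accepted while still fuzzing for other problems.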

Regarding the toolchain: we should use cargo-fuzz, which gives us the option to use multiple fuzzers. Then you need to choose a fuzzer. I would suggest libFuzzer, which comes with LLVM, since it is the least invasive one; however, it has entered maintenance mode. I see the following options:

That said, the choice can easily be changed later, since the fuzzer effectively just does "run some code on this black-box `[u8]` input" (like "parse Parquet from bytes").
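
To illustrate why the engine is easy to swap: the target is only a function over `&[u8]`, so any driver that produces bytes can call it. Both drivers in this sketch are hypothetical toys (a corpus replay and a pseudo-random generator built on a 64-bit LCG), not libFuzzer or AFL, and `counting_target` is an assumed stand-in for "parse Parquet from bytes".

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// The only contract between a target and any engine: take bytes, run code.
type Target = fn(&[u8]);

static CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in target: counts invocations and inspects the input without panicking.
fn counting_target(data: &[u8]) {
    CALLS.fetch_add(1, Ordering::Relaxed);
    let _ = data.starts_with(b"PAR1");
}

/// Hypothetical engine 1: replay a fixed corpus (e.g. seed files).
fn replay(corpus: &[Vec<u8>], target: Target) -> usize {
    for input in corpus {
        target(input.as_slice());
    }
    corpus.len()
}

/// Hypothetical engine 2: pseudo-random inputs of length 0..=max_len.
fn random(iterations: usize, max_len: usize, target: Target) -> usize {
    let mut state: u64 = 42;
    let mut next = || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (state >> 33) as usize
    };
    for _ in 0..iterations {
        let len = next() % (max_len + 1);
        let input: Vec<u8> = (0..len).map(|_| next() as u8).collect();
        target(&input);
    }
    iterations
}

fn main() {
    let corpus = vec![b"PAR1".to_vec(), Vec::new()];
    // The same target runs unchanged under either engine.
    let total = replay(&corpus, counting_target) + random(100, 64, counting_target);
    println!("ran {} inputs", total);
}
```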

Footnotes

  1. Note that AFL was abandoned for a while, but development is now open and active under the AFL++ project.
