Fuzz testing for parquet errors vs panic

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

In general, this crate should return an error on invalid data rather than panic:

https://github.com/apache/arrow-rs?tab=readme-ov-file#guidelines-for-panic-vs-result

> For those caused by invalid user input, however, we prefer to report that invalidity gracefully as an error result instead of panicking. In general, invalid input should result in an Error as soon as possible.


However, we keep hitting code paths in parquet that panic, for example:
- https://github.com/apache/arrow-rs/pull/9725

Because these paths are only reached with a corrupt / invalid data source, it is hard to write tests for them.

For example, here is a test that @xuzifu666 added for one such error: https://github.com/apache/arrow-rs/pull/9725/commits/0bb99427f62f36fbeaf680c62265b200881c1549

However, I think it would be hard to maintain over the long run, as the programmatic generation of bad data will be brittle (if we change how the thrift is written, for example, the truncation may go down a different path).

**Describe the solution you'd like**

I think we should consider some sort of parquet fuzzer that generates randomly corrupted data and ensures that the reader returns an error (rather than `panic`ing). It would be nice if it made some parquet files and then applied common kinds of data corruption:
1. Truncate the data (remove bytes from the end of the file)
2. Truncate the data (remove bytes from the start of the file)
3. Flip a random bit
4. Set a random range of the file to all zeros

There are probably other good ones we can do
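
To make the idea concrete, here is a minimal sketch of the four corruptions above, using only the standard library. Everything in it (`XorShift64`, `Corruption`, `corrupt`, `panics_on`) is a hypothetical name made up for illustration, not existing arrow-rs code, and the hand-rolled PRNG is just an assumption to keep the example dependency-free and replayable from a seed.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Tiny deterministic PRNG (xorshift64) so a failing case can be replayed from its seed.
/// Hypothetical helper; a real fuzzer would likely use `rand` or a fuzzing framework.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `0..bound` (`bound` must be > 0); modulo bias is acceptable here.
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// The corruption strategies listed above (hypothetical enum).
#[derive(Debug, Clone, Copy)]
enum Corruption {
    TruncateEnd,
    TruncateStart,
    FlipBit,
    ZeroRange,
}

/// Return a corrupted copy of `data`; empty input is returned unchanged.
fn corrupt(data: &[u8], kind: Corruption, rng: &mut XorShift64) -> Vec<u8> {
    let mut out = data.to_vec();
    if out.is_empty() {
        return out;
    }
    let len = out.len();
    match kind {
        Corruption::TruncateEnd => {
            let n = 1 + rng.below(len); // drop 1..=len bytes from the end
            out.truncate(len - n);
        }
        Corruption::TruncateStart => {
            let n = 1 + rng.below(len); // drop 1..=len bytes from the start
            out.drain(..n);
        }
        Corruption::FlipBit => {
            let i = rng.below(len);
            out[i] ^= 1u8 << rng.below(8);
        }
        Corruption::ZeroRange => {
            let start = rng.below(len);
            let end = start + 1 + rng.below(len - start); // non-empty, end <= len
            out[start..end].fill(0);
        }
    }
    out
}

/// Returns `true` if `read` panicked on `bytes`. `read` stands in for
/// whatever reader entry point is under test; its return value is ignored.
fn panics_on<F, T>(read: F, bytes: &[u8]) -> bool
where
    F: FnOnce(&[u8]) -> T,
{
    panic::catch_unwind(AssertUnwindSafe(|| {
        read(bytes);
    }))
    .is_err()
}
```

A test could then write a few small valid parquet files, call `corrupt` with many seeds and each `Corruption` variant, and assert that `panics_on` returns `false` for a closure that runs the reader over the corrupted bytes and returns its `Result`. Logging the seed and variant on failure would make any panic reproducible.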

**Describe alternatives you've considered**


**Additional context**
Related to 
- https://github.com/apache/arrow-rs/issues/5332
