Parquet Fuzz Tests #1053

tustvold · 2021-12-17T17:02:36Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Whilst working on #1037 I've introduced bugs that have then been caught by the arrow array benchmarks.

It would therefore appear that these benchmarks exercise code paths not covered by the other tests, so we could increase test coverage by including some variant of them as tests.

Describe the solution you'd like

A set of fuzz tests that create various types of PageIterator with multiple column chunks, and multiple pages per column chunk. This can likely reuse much of the fuzz plumbing found in the arrow_array_reader benchmarks.

The tests would then use the ArrayReader abstractions to read this data and verify it is what was written.
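Below is a rough, std-only Rust sketch of the round trip described above. It deliberately avoids the parquet crate: `Page`, `ColumnChunk`, `write_chunks`, `read_batches` and the `XorShift` generator are hypothetical stand-ins for PageIterator, ArrayReader and the benchmarks' data generation, not their real APIs. The sketch shows the shape of such a test. Data is split into randomly sized column chunks, and each chunk into randomly sized pages. The data is then read back in a randomly chosen batch size, so batch boundaries fall at arbitrary points inside pages and chunks, and the result is compared with what was written.

```rust
// Hypothetical round-trip fuzz sketch. Page, ColumnChunk, XorShift,
// write_chunks and read_batches are illustrative stand-ins, NOT the
// parquet crate's PageIterator / ArrayReader API.

/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `lo..=hi`; modulo bias is acceptable for a test.
    fn range(&mut self, lo: usize, hi: usize) -> usize {
        lo + (self.next_u64() % (hi - lo + 1) as u64) as usize
    }
}

/// A page holds a contiguous run of values for one column.
struct Page {
    values: Vec<i32>,
}

/// A column chunk is one row group's worth of pages for a single column.
struct ColumnChunk {
    pages: Vec<Page>,
}

/// Split `values` into randomly sized column chunks, and each chunk into
/// randomly sized pages, so page/chunk boundaries land at arbitrary offsets.
fn write_chunks(values: &[i32], rng: &mut XorShift) -> Vec<ColumnChunk> {
    let mut chunks = Vec::new();
    let mut rest = values;
    while !rest.is_empty() {
        let chunk_len = rng.range(1, 100).min(rest.len());
        let (chunk, tail) = rest.split_at(chunk_len);
        rest = tail;

        let mut pages = Vec::new();
        let mut page_rest = chunk;
        while !page_rest.is_empty() {
            let page_len = rng.range(1, 20).min(page_rest.len());
            let (page, page_tail) = page_rest.split_at(page_len);
            pages.push(Page { values: page.to_vec() });
            page_rest = page_tail;
        }
        chunks.push(ColumnChunk { pages });
    }
    chunks
}

/// Read all values back in batches of `batch_size` (must be >= 1), letting
/// each batch span page and chunk boundaries as an array reader would.
fn read_batches(chunks: &[ColumnChunk], batch_size: usize) -> Vec<Vec<i32>> {
    let mut stream = chunks
        .iter()
        .flat_map(|c| c.pages.iter())
        .flat_map(|p| p.values.iter().copied());
    let mut batches = Vec::new();
    loop {
        let batch: Vec<i32> = stream.by_ref().take(batch_size).collect();
        if batch.is_empty() {
            break;
        }
        batches.push(batch);
    }
    batches
}

/// One fuzz iteration: generate data, "write" it, read it back, compare.
fn fuzz_round_trip(seed: u64) -> bool {
    let mut rng = XorShift(seed);
    let len = rng.range(0, 1000);
    let expected: Vec<i32> = (0..len).map(|_| rng.next_u64() as i32).collect();
    let chunks = write_chunks(&expected, &mut rng);
    let batch_size = rng.range(1, 64);
    let actual: Vec<i32> = read_batches(&chunks, batch_size).into_iter().flatten().collect();
    actual == expected
}

fn main() {
    for seed in 1..=100 {
        assert!(fuzz_round_trip(seed), "round trip failed for seed {seed}");
    }
    println!("all round trips passed");
}
```

Seeding each iteration makes any failure reproducible from its seed. A real test would replace the stand-ins with the crate's writer, PageIterator and ArrayReader.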

Describe alternatives you've considered

We could choose not to add fuzz tests, but that would increase the likelihood of regressions.


chadbrewbaker · 2021-12-19T20:37:41Z

After thinking about this for a week, I'm inclined to start by driving tests from Arrow Python/Hypothesis and the Python Parquet tests, then gradually add Proptest. AWS Labs has the best proptest examples.

Zooming out a bit more, DataFusion needs to be integrated into the squirrel and sqlancer cross-SQL-engine tests. We can use sqlsmith to reduce large queries.

We also want to work like AWS Redshift, where a query written in Python/SQL is compiled to native code that is sent to worker nodes; in our case, the generated code would be Rust.

It seems we might need thin LTO even on dev builds to reduce false positives: https://github.com/awslabs/rust-smt-ir/blob/551565ea5e97f502269d74d189e2e2c1e6b52f40/Cargo.toml#L11

tustvold · 2021-12-29T18:36:19Z

FYI, I'm experimenting with extending the existing fuzz tests to support nulls, dictionaries, etc.
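The following std-only Rust sketch shows what covering nulls and dictionaries adds to such a round-trip test. It is simplified to a single non-nested optional column (maximum definition level 1). `EncodedColumn`, `encode`, `decode` and `XorShift` are hypothetical names that do not mirror the parquet crate's API. The writer stores one definition level per slot (1 for a value, 0 for a null) plus dictionary indices for the non-null values only. The reader must consume exactly one index per level of 1 to reconstruct the column.

```rust
// Hypothetical sketch: round-trip a flat optional i32 column through
// definition levels and dictionary encoding. EncodedColumn, encode, decode
// and XorShift are illustrative, NOT the parquet crate's API.
use std::collections::HashMap;

/// Tiny deterministic PRNG (xorshift64); the seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Writer-side layout for a non-nested optional column (max definition
/// level 1): one level per slot (1 = present, 0 = null), plus dictionary
/// indices for the non-null values only.
struct EncodedColumn {
    def_levels: Vec<i16>,
    dictionary: Vec<i32>,
    indices: Vec<u32>,
}

fn encode(values: &[Option<i32>]) -> EncodedColumn {
    let mut dictionary = Vec::new();
    let mut lookup: HashMap<i32, u32> = HashMap::new();
    let mut def_levels = Vec::with_capacity(values.len());
    let mut indices = Vec::new();
    for v in values {
        match v {
            Some(x) => {
                def_levels.push(1);
                // First occurrence appends to the dictionary; repeats reuse its index.
                let idx = *lookup.entry(*x).or_insert_with(|| {
                    dictionary.push(*x);
                    (dictionary.len() - 1) as u32
                });
                indices.push(idx);
            }
            None => def_levels.push(0),
        }
    }
    EncodedColumn { def_levels, dictionary, indices }
}

/// Reader side: walk the definition levels, pulling one dictionary index
/// for each non-null slot.
fn decode(col: &EncodedColumn) -> Vec<Option<i32>> {
    let mut indices = col.indices.iter();
    col.def_levels
        .iter()
        .map(|&level| {
            if level == 1 {
                let idx = *indices.next().expect("fewer indices than non-null slots");
                Some(col.dictionary[idx as usize])
            } else {
                None
            }
        })
        .collect()
}

/// One fuzz iteration with a random null density and a small value domain,
/// so the dictionary sees many repeated values.
fn fuzz_nullable_dictionary(seed: u64) -> bool {
    let mut rng = XorShift(seed);
    let len = (rng.next_u64() % 500) as usize;
    let null_percent = rng.next_u64() % 101; // 0..=100
    let values: Vec<Option<i32>> = (0..len)
        .map(|_| {
            if rng.next_u64() % 100 < null_percent {
                None
            } else {
                Some((rng.next_u64() % 16) as i32)
            }
        })
        .collect();
    let encoded = encode(&values);
    encoded.indices.len() == values.iter().filter(|v| v.is_some()).count()
        && decode(&encoded) == values
}

fn main() {
    for seed in 1..=100 {
        assert!(fuzz_nullable_dictionary(seed), "round trip failed for seed {seed}");
    }
    println!("all nullable/dictionary round trips passed");
}
```

Drawing the null density per iteration means the test sees sparse, dense and mixed null patterns rather than one fixed ratio.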


tustvold added the enhancement label Dec 17, 2021

This was referenced Dec 29, 2021

Interval Support in Dyn Comparison Kernels #1106

Closed

Add comparison kernels for BinaryArray #1108

Closed

Add native comparison kernel support for FixedSizeBinaryArray #1109

Closed

tustvold added a commit to tustvold/arrow-rs that referenced this issue Dec 29, 2021

Parquet fuzz tests (apache#1053)

b7649ac

tustvold added a commit to tustvold/arrow-rs that referenced this issue Dec 29, 2021

Parquet fuzz tests (apache#1053)

c5597b2

tustvold mentioned this issue Dec 29, 2021

Extends parquet fuzz tests to also tests nulls, dictionaries and row groups with multiple pages (#1053) #1110

Merged

tustvold added a commit to tustvold/arrow-rs that referenced this issue Dec 29, 2021

Parquet fuzz tests (apache#1053)

7838585

tustvold added a commit to tustvold/arrow-rs that referenced this issue Dec 29, 2021

Parquet fuzz tests (apache#1053)

0baa151

tustvold mentioned this issue Jan 1, 2022

Generify ColumnReaderImpl and RecordReader (#1040) #1041

Merged

alamb closed this as completed in #1110 Jan 11, 2022

alamb pushed a commit that referenced this issue Jan 11, 2022

Extends parquet fuzz tests to also tests nulls, dictionaries and row groups with multiple pages (#1053) (#1110)

* Parquet fuzz tests (#1053)
* Test multiple WriterVersions
* Revert array_reader change

f8ff7fe

tustvold mentioned this issue Jan 11, 2022

Fuzz test different parquet encodings #1156

Merged
