Support Configuring Parquet Column Specific Options via SQL Statement Options #7466
Which issue does this PR close?
Closes #7442
Closes #7463 (this PR implements a different syntax from #7463, which turned out to be much easier to implement. I am leaving the #7463 draft open to show a possible alternative.)
Rationale for this change
Extends the syntax and the set of allowed SQL statement options so that Parquet column-level options can be configured (e.g. a different compression codec for each column, including nested columns).
What changes are included in this PR?
Implements new parsing utilities and options for Parquet column-specific options. Example:
The example sets snappy as the default compression for all columns, then overrides col1 with zstd level 5 and the nested column col2.nested with zstd level 10.
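As a rough illustration of the fallback behaviour described above (a statement-wide default plus per-column overrides), the following Rust sketch resolves the codec for a dotted column path. It is a minimal sketch under stated assumptions, not the PR's implementation: `Codec`, `parse_codec`, and `codec_for_column` are hypothetical names, the value format ("snappy", "zstd(5)") mirrors the example, and the accepted zstd level range 1..=22 is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical codec type for illustration only; it is NOT parquet-rs's
/// `Compression` enum or any DataFusion type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Codec {
    Uncompressed,
    Snappy,
    Zstd(i32),
}

/// Parses values such as "snappy" or "zstd(5)", ignoring case.
pub fn parse_codec(value: &str) -> Result<Codec, String> {
    let v = value.trim().to_ascii_lowercase();
    // Split "name(level)" into the codec name and an optional numeric level.
    let (name, level) = match v.find('(') {
        Some(open) => {
            if !v.ends_with(')') {
                return Err(format!("missing ')' in {value:?}"));
            }
            let level = v[open + 1..v.len() - 1]
                .trim()
                .parse::<i32>()
                .map_err(|e| format!("bad level in {value:?}: {e}"))?;
            (&v[..open], Some(level))
        }
        None => (v.as_str(), None),
    };
    match (name, level) {
        ("uncompressed", None) => Ok(Codec::Uncompressed),
        ("snappy", None) => Ok(Codec::Snappy),
        // Assumed valid range for this sketch.
        ("zstd", Some(l)) if (1..=22).contains(&l) => Ok(Codec::Zstd(l)),
        ("zstd", Some(l)) => Err(format!("zstd level {l} out of range 1..=22")),
        ("zstd", None) => Err("zstd requires a level, e.g. zstd(3)".to_string()),
        _ => Err(format!("unsupported compression {value:?}")),
    }
}

/// Returns the codec for a dotted column path (e.g. "col2.nested"),
/// falling back to the statement-wide default when no override exists.
pub fn codec_for_column(
    column_path: &str,
    default: &str,
    overrides: &HashMap<String, String>,
) -> Result<Codec, String> {
    let raw = overrides
        .get(column_path)
        .map(String::as_str)
        .unwrap_or(default);
    parse_codec(raw)
}
```

With a default of "snappy" and overrides "col1" → "zstd(5)" and "col2.nested" → "zstd(10)", any column without an override (for example col3) resolves to `Codec::Snappy`.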
Using "::" to separate the setting name from the column path is inspired by Rust syntax. Any other separator would work if there is a more natural one for SQL. A dot (.) would have been the natural choice, but dots are already used here to separate the segments of nested column paths.
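The key-splitting step can be sketched as follows, assuming option keys of the form setting::dotted.column.path. `split_option_key` is a hypothetical helper, not the PR's actual parsing code. It splits on the first "::" and then splits the remainder on ".", which is why the dot cannot also serve as the setting/path separator.

```rust
/// Hypothetical helper: splits an option key such as
/// "compression::col2.nested" into ("compression", Some(["col2", "nested"])).
/// A key without "::" (e.g. "compression") applies to all columns.
pub fn split_option_key(key: &str) -> Result<(String, Option<Vec<String>>), String> {
    match key.split_once("::") {
        // No separator: a statement-wide setting; the name is normalized to lowercase.
        None => Ok((key.trim().to_ascii_lowercase(), None)),
        Some((setting, path)) => {
            let setting = setting.trim().to_ascii_lowercase();
            if setting.is_empty() {
                return Err(format!("missing setting name in {key:?}"));
            }
            // Dots separate nested column path segments; column names keep their case.
            let parts: Vec<String> = path.split('.').map(|p| p.to_string()).collect();
            if parts.iter().any(|p| p.is_empty()) {
                return Err(format!("empty column path segment in {key:?}"));
            }
            Ok((setting, Some(parts)))
        }
    }
}
```

Because only the first "::" is treated as the separator, a setting name cannot itself contain "::", and a column whose name contains a literal dot would be ambiguous under this scheme.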
Are these changes tested?
Yes, a new unit test verifies that the column-level settings are actually applied as expected.
Are there any user-facing changes?
Yes, new syntax for specifying column-level options. Unlike #7463, this PR is backward compatible and introduces no breaking changes.