@devinjdangelo
Contributor

Which issue does this PR close?

None

Rationale for this change

I noticed that with the current default Parquet writer settings, writing a single row to a Parquet file produces a file over 4 MB in size. Some of the config values are set explicitly (e.g. plain encoding), whereas the parquet crate's default is to leave these values as None.

What changes are included in this PR?

This PR makes the session-level default for these values None, matching the behavior of the parquet crate. With this change, Parquet files have a minimum size of 4 KB rather than 4 MB.
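To illustrate the idea of a None default, here is a minimal, std-only sketch. It assumes a session option set to None should simply not be forwarded to the writer, so the writer library's own default applies. `SessionParquetOptions`, `WriterPropsBuilder`, and `build_writer_props` are hypothetical names made up for this example; they are not the DataFusion or parquet crate APIs.

```rust
// Hypothetical stand-ins for illustration only; these are not the real
// DataFusion or parquet crate types.

/// Session-level writer options. `None` means "no opinion": defer to the
/// writer library's own default instead of forcing a value.
#[derive(Debug, Default, Clone, PartialEq)]
struct SessionParquetOptions {
    encoding: Option<String>,
    dictionary_enabled: Option<bool>,
}

/// Records only the overrides that were explicitly requested.
#[derive(Debug, Default, PartialEq)]
struct WriterPropsBuilder {
    encoding: Option<String>,
    dictionary_enabled: Option<bool>,
}

impl WriterPropsBuilder {
    fn set_encoding(mut self, value: &str) -> Self {
        self.encoding = Some(value.to_string());
        self
    }

    fn set_dictionary_enabled(mut self, value: bool) -> Self {
        self.dictionary_enabled = Some(value);
        self
    }
}

/// Forward a session option to the builder only when it is `Some`.
fn build_writer_props(opts: &SessionParquetOptions) -> WriterPropsBuilder {
    let mut builder = WriterPropsBuilder::default();
    if let Some(encoding) = &opts.encoding {
        builder = builder.set_encoding(encoding);
    }
    if let Some(enabled) = opts.dictionary_enabled {
        builder = builder.set_dictionary_enabled(enabled);
    }
    builder
}

fn main() {
    // All-None session defaults: nothing is overridden.
    let defaults = SessionParquetOptions::default();
    println!("{:?}", build_writer_props(&defaults));

    // An explicit session value is the only thing passed through.
    let explicit = SessionParquetOptions {
        encoding: Some("plain".to_string()),
        ..Default::default()
    };
    println!("{:?}", build_writer_props(&explicit));
}
```

In this sketch, an all-None session configuration yields an untouched builder, which is the behavior the description attributes to the parquet crate's defaults.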

Are these changes tested?

Yes, by existing tests.

Are there any user-facing changes?

Better default Parquet writer settings.

@github-actions bot added the core (Core DataFusion crate) and sqllogictest (SQL Logic Tests (.slt)) labels on Aug 16, 2023
Contributor

@alamb left a comment

Thank you @devinjdangelo -- I added some documentation suggestions, but I think we could also do them in a follow-on PR.

devinjdangelo and others added 3 commits August 16, 2023 19:23
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
clarify meaning of null in session default writer settings

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb alamb merged commit 354d4ff into apache:main Aug 17, 2023
@alamb
Contributor

alamb commented Aug 17, 2023

Thanks again @devinjdangelo

2 participants