[SPARK-25356][SQL] Add Parquet block size option to SparkSQL configuration #22350
What changes were proposed in this pull request?
I think the Parquet block (row group) size should be configurable when writing in Parquet format. On HDFS, `dfs.block.size` is configurable, so we sometimes want the Parquet block size to be consistent with it. Likewise, `spark.sql.files.maxPartitionBytes` arguably works best when it matches the Parquet block size when reading Parquet files. We may also want to shrink the Parquet block size in some tests.
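For illustration, here is a minimal sketch of how the block size can be controlled today without this patch, by setting the parquet-mr Hadoop configuration key `parquet.block.size` directly (the object name and output path are just examples):

```scala
import org.apache.spark.sql.SparkSession

object ParquetBlockSizeExample {
  def main(args: Array[String]): Unit = {
    // Keep the read-side split size consistent with the write-side block size.
    val spark = SparkSession.builder()
      .appName("ParquetBlockSizeExample")
      .master("local[*]")
      .config("spark.sql.files.maxPartitionBytes", 128L * 1024 * 1024)
      .getOrCreate()

    // parquet-mr reads the row-group ("block") size from the Hadoop
    // configuration key "parquet.block.size"; without this patch there is
    // no dedicated spark.sql.parquet.* option for it.
    spark.sparkContext.hadoopConfiguration
      .setInt("parquet.block.size", 128 * 1024 * 1024)

    val df = spark.range(1000000).toDF("id")
    // The writer starts a new row group roughly every 128 MB of output.
    df.write.mode("overwrite").parquet("/tmp/parquet-block-size-demo")

    spark.stop()
  }
}
```

A dedicated SQL option would make this tunable per session via `SET`, rather than through the shared Hadoop configuration.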
How was this patch tested?
N/A