After #15596, file size control for Parquet is improved.
However, when there are many threads, blocks are likely to end up spread across the writer threads, which results in relatively small files.
A grouping processor groups small blocks up to MAX_FILE_SIZE before they are distributed to the writer threads. But it is based on uncompressed size, so it may produce files of size MAX_FILE_SIZE/compress_ratio.
Users can change the max_threads setting, but this affects the whole plan.
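To make the size mismatch above concrete, here is a small arithmetic sketch. The function name and the 256 MiB / 4x figures are hypothetical values chosen for illustration, not actual settings or measurements.

```python
def expected_file_size(max_file_size: int, compress_ratio: float) -> float:
    """On-disk size of a file when blocks are grouped up to
    `max_file_size` bytes of UNCOMPRESSED data.

    `compress_ratio` is uncompressed_size / compressed_size.
    """
    if compress_ratio <= 0:
        raise ValueError("compress_ratio must be positive")
    return max_file_size / compress_ratio


MIB = 1024 * 1024
MAX_FILE_SIZE = 256 * MIB  # hypothetical target, for illustration only

# With an assumed 4x compression ratio, the written file is a quarter
# of the intended size.
print(expected_file_size(MAX_FILE_SIZE, 4.0) / MIB)  # 64.0
```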
compress ratio estimator
Another, automated approach is to enhance the grouping processor with a compress ratio estimator. Two caveats:
The compress ratio may differ from block to block.
Grouping a larger amount of block memory costs more temporary memory.
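One possible shape for such an estimator is sketched below. It is not the existing grouping processor: `CompressRatioEstimator`, `Grouper`, the `alpha` smoothing factor, and the choice of an exponential moving average are all assumptions made for illustration. The moving average is one way to cope with ratios that vary from block to block.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompressRatioEstimator:
    """Hypothetical estimator: exponential moving average of observed ratios."""
    alpha: float = 0.5   # weight of the newest sample (assumed value)
    ratio: float = 1.0   # start at 1.0, i.e. assume no compression

    def observe(self, uncompressed: int, compressed: int) -> None:
        # Feed back the sizes of a file that was actually written.
        if uncompressed <= 0 or compressed <= 0:
            return
        sample = uncompressed / compressed
        self.ratio = self.alpha * sample + (1 - self.alpha) * self.ratio


@dataclass
class Grouper:
    """Hypothetical grouper that targets the estimated COMPRESSED size."""
    max_file_size: int
    estimator: CompressRatioEstimator
    pending: List[int] = field(default_factory=list)  # uncompressed block sizes
    pending_bytes: int = 0

    def push(self, block_size: int) -> Optional[List[int]]:
        self.pending.append(block_size)
        self.pending_bytes += block_size
        estimated_compressed = self.pending_bytes / self.estimator.ratio
        if estimated_compressed >= self.max_file_size:
            # Emit the group and reset the buffer.
            group, self.pending, self.pending_bytes = self.pending, [], 0
            return group
        return None
```

With an initial ratio of 1.0 this sketch behaves like grouping by uncompressed size until the first written file is observed. Note that the buffer holds roughly `max_file_size * ratio` bytes of uncompressed data before a group is emitted, which is the temporary memory cost mentioned above.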