Even Oldridge:
RAPIDS.AI has recently implemented a Parquet reader to load data onto the GPU. According to the developers, the optimal page size for GPUs is much smaller than the default of 1 MiB and should be set closer to 256 KiB. My current workflow uses pyarrow to do the Parquet write, and I'd love to be able to specify this.
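For reference, a minimal sketch of that kind of pyarrow write (the table contents and file name are illustrative). The row group size is already exposed from Python, but there is no corresponding knob for the data page size:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A small table standing in for the real workflow.
table = pa.table({"x": list(range(100000))})

# The row group size is already configurable from Python, but the data
# page size within each column chunk is not; parquet-cpp's 1 MiB default
# applies regardless.
pq.write_table(table, "example.parquet", row_group_size=50000)
```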
Even Oldridge:
I'm not confident that I could implement this myself; I'm new to Parquet and not comfortable enough with the C++ required, but I'll bring it up with the team that developed the Parquet reader.
It would be useful to be able to set the size of data pages (within Parquet column chunks) from Python. The current default is set to 1 MiB at https://github.com/apache/parquet-cpp/blob/0875e43010af485e1c0b506d77d7e0edc80c66cc/src/parquet/properties.h#L81. It might be useful in some situations to lower this for more granular access.
We should provide this value as a parameter to `pyarrow.parquet.write_table`.
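A sketch of what the proposed call could look like; the `data_page_size` keyword name is an assumption here, not an existing pyarrow option:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"x": list(range(100000))})

# Hypothetical data_page_size keyword: the intent is to forward this
# value to parquet-cpp's WriterProperties so that data pages are capped
# near 256 KiB rather than the 1 MiB default.
pq.write_table(table, "example.parquet", data_page_size=256 * 1024)
```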
Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm
Note: This issue was originally created as ARROW-2057. Please see the migration documentation for further details.