@adriangb has made some great contributions to the Rust Parquet writer to support several use cases where Parquet writers typically struggle:
- Pluggable page spilling API for the Parquet ArrowWriter (PageStore) #10020
I think it would be amazing to write a blog post explaining the challenges and the solutions:
Challenge 1: Oversized pages for large binary/string columns
Challenge 2: RAM buffering requirements when writing large row groups
The idea is to write a blog post for https://arrow.apache.org/blog/ (source in https://github.com/apache/arrow-site) that explains the challenges and how we solved them with software engineering rather than a new file format.