-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
Browse the repository at this point in the history
### Rationale for this change The FLBA implementation of RecordReader is suboptimal: * it doesn't preallocate the output array * it reads the decoded validity bitmap one bit at a time and recreates it, one bit at a time ### What changes are included in this PR? Optimize the FLBA implementation of RecordReader so as to avoid the aforementioned inefficiencies. I did a quick-and-dirty benchmark on a Parquet file with two columns: * column 1: uncompressed, PLAIN-encoded, FLBA<3> with no nulls * column 2: uncompressed, PLAIN-encoded, FLBA<3> with 25% nulls With git main, the file can be read at 465 MB/s. With this PR, the file can be read at 700 MB/s. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: #39122 Lead-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Antoine Pitrou <antoine@python.org>
- Loading branch information
Showing
1 changed file
with
50 additions
and
20 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters