GH-36773: [C++][Parquet] Avoid calculating prebuffer column bitmap multiple times #36774
Also cc @jp0317
I wonder whether https://github.com/apache/arrow/pull/36774/files#r1268338192 is possible. I guess that if there is no selection or projection, all columns are prebuffered. Should I add an optimization for the all-buffered case, using a
@pitrou would you mind taking a look? This patch is small and simple.
Thanks for noticing this, @mapleFU.
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit b31977f. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
…map multiple times (apache#36774) ### Rationale for this change As discussed in apache#36192 and apache#36649, RowGroupReader uses a bitmap to control column-level prebuffering. However, when all columns are selected, building this bitmap multiple times adds significant overhead. ### What changes are included in this PR? Build the `Prebuffer` bitmap once and reuse it. ### Are these changes tested? No. ### Are there any user-facing changes? No. * Closes: apache#36773 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
As discussed in #36192 and #36649, RowGroupReader uses a bitmap to control column-level prebuffering.
However, when all columns are selected, building this bitmap multiple times adds significant overhead.
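To make the cost concrete, here is a standalone C++ sketch of a prebuffer column bitmap and of building it repeatedly. It assumes (the PR text does not spell this out) that an identical bitmap was rebuilt for every prebuffered row group. `Bitmap`, `SetBit`, `GetBit`, `BuildColumnBitmap` and `PrebufferPerRowGroup` are hypothetical names, and a plain `std::vector<std::uint8_t>` stands in for Arrow's buffer types; this is a model, not the Arrow code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: one bit per column in the file, set when that
// column's chunk is selected for prebuffering.
using Bitmap = std::vector<std::uint8_t>;

inline void SetBit(Bitmap& bits, int i) {
  bits[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
}

inline bool GetBit(const Bitmap& bits, int i) {
  return ((bits[i / 8] >> (i % 8)) & 1u) != 0;
}

// Allocates a zeroed bitmap of num_columns bits and sets the selected columns.
Bitmap BuildColumnBitmap(int num_columns, const std::vector<int>& column_indices) {
  Bitmap bits(static_cast<std::size_t>((num_columns + 7) / 8), std::uint8_t{0});
  for (int col : column_indices) {
    assert(col >= 0 && col < num_columns);
    SetBit(bits, col);
  }
  return bits;
}

// Assumed "before" pattern: a fresh bitmap is allocated and filled for every
// prebuffered row group, although every copy has the same contents.
std::map<int, Bitmap> PrebufferPerRowGroup(const std::vector<int>& row_groups,
                                           int num_columns,
                                           const std::vector<int>& column_indices) {
  std::map<int, Bitmap> prebuffered;
  for (int rg : row_groups) {
    prebuffered[rg] = BuildColumnBitmap(num_columns, column_indices);
  }
  return prebuffered;
}
```

With R row groups and C selected columns, this performs R allocations and sets R × C bits, even though all R bitmaps are identical.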
What changes are included in this PR?
Build the `Prebuffer` bitmap once and reuse it.
Are these changes tested?
No.
Are there any user-facing changes?
No.
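The change (build the bitmap once and reuse it) can be sketched as follows. As before, this is a simplified standalone model rather than the Arrow implementation: `Bitmap`, `GetBit`, `PrebufferShared` and `IsPrebuffered` are hypothetical names, and `std::vector<std::uint8_t>` replaces Arrow buffers. The bitmap is filled once, and every prebuffered row group holds a `std::shared_ptr` to the same immutable copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model of the prebuffer column bitmap.
using Bitmap = std::vector<std::uint8_t>;

inline bool GetBit(const Bitmap& bits, int i) {
  return ((bits[i / 8] >> (i % 8)) & 1u) != 0;
}

// Assumed "after" pattern: build the bitmap a single time, then let every
// prebuffered row group share it through a shared_ptr to const data.
std::map<int, std::shared_ptr<const Bitmap>> PrebufferShared(
    const std::vector<int>& row_groups, int num_columns,
    const std::vector<int>& column_indices) {
  std::map<int, std::shared_ptr<const Bitmap>> prebuffered;
  if (row_groups.empty()) return prebuffered;  // nothing to prebuffer: skip allocation

  auto bits = std::make_shared<Bitmap>(
      static_cast<std::size_t>((num_columns + 7) / 8), std::uint8_t{0});
  for (int col : column_indices) {
    assert(col >= 0 && col < num_columns);
    (*bits)[col / 8] |= static_cast<std::uint8_t>(1u << (col % 8));
  }

  std::shared_ptr<const Bitmap> shared = std::move(bits);
  for (int rg : row_groups) {
    prebuffered[rg] = shared;  // copies the pointer, not the bitmap
  }
  return prebuffered;
}

// Hypothetical lookup: was column `col` of row group `rg` prebuffered?
bool IsPrebuffered(const std::map<int, std::shared_ptr<const Bitmap>>& prebuffered,
                   int rg, int col) {
  auto it = prebuffered.find(rg);
  return it != prebuffered.end() && GetBit(*it->second, col);
}
```

Per-row-group lookups stay the same, while the work drops to a single allocation and a single pass over the selected columns. The `const` element type also prevents one row group from mutating the bitmap that the others see.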