[C++] Support concatenating record batches. #37895
Comments
bkietz pushed a commit that referenced this issue on Oct 13, 2023
### Rationale for this change

User scenario: when we run an Acero plan, aggregation and hash join can generate many small batches. In addition, an MPP database distributes data across segments, so when there are many segments, each segment's data is comparatively small. To improve performance, we want to merge multiple fragmented small batches into one large batch and compute on it together.

### What changes are included in this PR?

record_batch.cc, record_batch.h, record_batch_test.cc

### Are these changes tested?

Yes, see record_batch_test.cc.

### Are there any user-facing changes?

Yes.

* Closes: #37895

Authored-by: light-city <455954986@qq.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Describe the enhancement requested
User scenario: when we run an Acero plan, aggregation and hash join can generate many small batches. In addition, an MPP database distributes data across segments, so when there are many segments, each segment's data is comparatively small. To improve performance, we want to merge multiple fragmented small batches into one large batch and compute on it together.
Therefore, similar to the existing concatenate operation for arrays, record batches also need a concatenate operation.
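To make the request concrete, here is a minimal sketch of the idea using only the C++ standard library. It does not use Arrow: `ToyBatch` and `ConcatenateToyBatches` are hypothetical names invented for illustration, columns are plain `int64_t` vectors, and the actual change in record_batch.cc is not reproduced here. It only shows the shape of the operation: check that all batches share the same columns, then append each column's values in order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a record batch: named, equal-length int64 columns.
// This is NOT arrow::RecordBatch; it only models the shape of the problem.
struct ToyBatch {
  std::vector<std::string> names;             // one name per column
  std::vector<std::vector<int64_t>> columns;  // columns[i] holds the values of names[i]

  std::size_t num_rows() const { return columns.empty() ? 0 : columns[0].size(); }
};

// Merge many small batches into one large batch, column by column.
// Throws std::invalid_argument if the input is empty, if the column names
// differ between batches, or if a batch has columns of unequal length.
ToyBatch ConcatenateToyBatches(const std::vector<ToyBatch>& batches) {
  if (batches.empty()) {
    throw std::invalid_argument("need at least one batch");
  }
  const std::vector<std::string>& names = batches[0].names;

  // First pass: validate every batch and count rows, so that each output
  // column can be allocated once instead of growing repeatedly.
  std::size_t total_rows = 0;
  for (const ToyBatch& b : batches) {
    if (b.names != names || b.columns.size() != names.size()) {
      throw std::invalid_argument("all batches must share the same columns");
    }
    for (const std::vector<int64_t>& col : b.columns) {
      if (col.size() != b.num_rows()) {
        throw std::invalid_argument("columns in a batch must have equal length");
      }
    }
    total_rows += b.num_rows();
  }

  // Second pass: append each batch's values to the matching output column.
  ToyBatch out;
  out.names = names;
  out.columns.resize(names.size());
  for (std::size_t c = 0; c < names.size(); ++c) {
    out.columns[c].reserve(total_rows);
    for (const ToyBatch& b : batches) {
      out.columns[c].insert(out.columns[c].end(), b.columns[c].begin(),
                            b.columns[c].end());
    }
    assert(out.columns[c].size() == total_rows);
  }
  return out;
}
```

Counting rows before copying is a deliberate choice in this sketch: when the input is many tiny batches, reserving each output column once avoids repeated reallocation, which is the kind of overhead the request is trying to reduce.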
Component(s)
C++