
[C++] Support concatenating record batches #37895

Closed
Light-City opened this issue Sep 27, 2023 · 0 comments · Fixed by #37896

@Light-City
Contributor

Describe the enhancement requested

User scenario: When we run an Acero plan, the aggregation and hash join nodes may produce many small batches. In addition, because our MPP database distributes data across segments, each segment holds relatively little data when there are many segments. To improve performance, we would like to merge multiple fragmented small batches into one large batch and process them together.

Therefore, similar to the existing array concatenate operation, RecordBatch also needs a concatenate operation.
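The requested operation can be illustrated without Arrow itself. The sketch below is a minimal, hypothetical model: `MiniBatch` and `ConcatenateBatches` are illustrative names, columns are plain `std::vector<int64_t>` rather than Arrow arrays, and the schema is reduced to an ordered list of column names. It shows the core idea: each output column is the concatenation of the same column from every input batch, so all inputs must share one schema and the row counts add up. In Arrow, the per-column step would presumably reuse the existing array concatenation (`arrow::Concatenate`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a record batch (NOT arrow::RecordBatch):
// the schema is the ordered list of column names, and every column is int64.
struct MiniBatch {
  std::vector<std::string> schema;            // column names, in order
  std::vector<std::vector<int64_t>> columns;  // one vector per schema field
  int64_t num_rows = 0;                       // row count of this batch
};

// Hypothetical helper: concatenates batches row-wise. Output column i is the
// concatenation of column i from every input batch, in input order, which is
// the per-column equivalent of concatenating arrays.
MiniBatch ConcatenateBatches(const std::vector<MiniBatch>& batches) {
  if (batches.empty()) {
    throw std::invalid_argument("at least one batch is required");
  }
  MiniBatch out;
  out.schema = batches.front().schema;
  out.columns.resize(out.schema.size());

  for (const MiniBatch& batch : batches) {
    // Every input must share the same schema and be internally consistent.
    if (batch.schema != out.schema || batch.columns.size() != out.schema.size()) {
      throw std::invalid_argument("schema mismatch between batches");
    }
    for (std::size_t i = 0; i < batch.columns.size(); ++i) {
      const std::vector<int64_t>& col = batch.columns[i];
      if (static_cast<int64_t>(col.size()) != batch.num_rows) {
        throw std::invalid_argument("column length differs from num_rows");
      }
      out.columns[i].insert(out.columns[i].end(), col.begin(), col.end());
    }
    // Batches with zero columns still contribute their row count.
    out.num_rows += batch.num_rows;
  }

  // Invariant: every output column holds exactly num_rows values.
  for (const std::vector<int64_t>& col : out.columns) {
    assert(static_cast<int64_t>(col.size()) == out.num_rows);
  }
  return out;
}
```

For example, concatenating a 2-row batch and a 1-row batch that both have columns `x` and `y` yields one 3-row batch whose `x` column holds the inputs' `x` values back to back. Rejecting mismatched schemas up front keeps the sketch simple; a real implementation would also have to handle nulls and non-int64 types.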

Component(s)

C++

bkietz pushed a commit that referenced this issue Oct 13, 2023
### Rationale for this change

User scenario: When we run an Acero plan, the aggregation and hash join nodes may produce many small batches. In addition, because our MPP database distributes data across segments, each segment holds relatively little data when there are many segments. To improve performance, we would like to merge multiple fragmented small batches into one large batch and process them together.

### What changes are included in this PR?

* `record_batch.cc`
* `record_batch.h`
* `record_batch_test.cc`

### Are these changes tested?

Yes, see `record_batch_test.cc`.

### Are there any user-facing changes?

Yes.

* Closes: #37895

Authored-by: light-city <455954986@qq.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
bkietz added this to the 15.0.0 milestone Oct 13, 2023
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024