Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Acero] union node output batches should be unordered #39045

Closed
niyue opened this issue Dec 2, 2023 · 0 comments · Fixed by #39046
Closed

[C++][Acero] union node output batches should be unordered #39045

niyue opened this issue Dec 2, 2023 · 0 comments · Fixed by #39046
Assignees
Milestone

Comments

@niyue
Copy link
Contributor

niyue commented Dec 2, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Acero's union node may have multiple input nodes that have ordered output, so the union node's input batches may contain batch index from previous nodes. However, the union node output doesn't guarantee any order, so it should clear the batch index if it has multiple input nodes, so that downstream node won't be confused by its input when ordering is a concern.

According to the doc for batch index property:

/// This index must be strictly monotonic starting at 0 without gaps or
/// it can be set to kUnsequencedIndex if there is no meaningful order
int64_t index = kUnsequencedIndex;

The downstream is expected to receive only strictly monotonic starting at 0 without gaps, but for a union node with multiple ordered input nodes, it will produce duplicated batch indexes, which is not expected.

Component(s)

C++

@niyue niyue added the Type: bug label Dec 2, 2023
westonpace pushed a commit that referenced this issue Dec 5, 2023
…39046)

### Rationale for this change
Acero's union node produce duplicated batch index if having multiple ordered input nodes, which makes down stream nodes unable to process these batches if ordering is a concern. This PR tries to address this issue.

### What changes are included in this PR?
This PR fixes this issue by setting the index to unsequenced if the order cannot be guaranteed.

### Are these changes tested?
Yes

### Are there any user-facing changes?
No
* Closes: #39045

Authored-by: Yue Ni <niyue.com@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
@westonpace westonpace added this to the 15.0.0 milestone Dec 5, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…dered (apache#39046)

### Rationale for this change
Acero's union node produce duplicated batch index if having multiple ordered input nodes, which makes down stream nodes unable to process these batches if ordering is a concern. This PR tries to address this issue.

### What changes are included in this PR?
This PR fixes this issue by setting the index to unsequenced if the order cannot be guaranteed.

### Are these changes tested?
Yes

### Are there any user-facing changes?
No
* Closes: apache#39045

Authored-by: Yue Ni <niyue.com@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants