Describe the bug, including details regarding any error messages, version, and platform.
Acero's union node may have multiple input nodes that each produce ordered output, so the batches arriving at the union node may carry batch indexes assigned by those upstream nodes. However, the union node's output does not guarantee any order. When it has multiple input nodes, it should therefore clear the batch index so that downstream nodes are not misled by their input when ordering is a concern.
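According to the doc for the batch index property (arrow/cpp/src/arrow/compute/exec.h, lines 223 to 225 in 47dadb0):

```cpp
/// This index must be strictly monotonic starting at 0 without gaps or
/// it can be set to kUnsequencedIndex if there is no meaningful order
int64_t index = kUnsequencedIndex;
```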
A downstream node therefore expects to receive only indexes that are strictly monotonic, starting at 0 and without gaps. A union node with multiple ordered input nodes instead forwards duplicated batch indexes, which violates that expectation.
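To make the duplication concrete, here is a minimal sketch that does not use Arrow at all. `ToyBatch`, `OrderedSource`, `NaiveUnion`, and `IsGaplessSequence` are hypothetical names invented for illustration; they only model the index contract quoted above and are not Acero's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a batch; only the index matters here. Not Arrow's ExecBatch.
struct ToyBatch {
  int64_t index;
};

// Emits `count` batches with indexes 0..count-1, as an ordered source would.
std::vector<ToyBatch> OrderedSource(int64_t count) {
  assert(count >= 0);
  std::vector<ToyBatch> out;
  for (int64_t i = 0; i < count; ++i) out.push_back(ToyBatch{i});
  return out;
}

// Forwards batches from both inputs unchanged, alternating between them.
// This models a union that leaves the upstream indexes in place.
std::vector<ToyBatch> NaiveUnion(const std::vector<ToyBatch>& a,
                                 const std::vector<ToyBatch>& b) {
  std::vector<ToyBatch> out;
  for (std::size_t i = 0; i < a.size() || i < b.size(); ++i) {
    if (i < a.size()) out.push_back(a[i]);
    if (i < b.size()) out.push_back(b[i]);
  }
  return out;
}

// True if every index in 0..n-1 appears exactly once (arrival order may vary).
bool IsGaplessSequence(const std::vector<ToyBatch>& batches) {
  std::vector<bool> seen(batches.size(), false);
  for (const ToyBatch& b : batches) {
    if (b.index < 0 || static_cast<std::size_t>(b.index) >= seen.size()) {
      return false;  // out of range, so there must be a gap
    }
    const std::size_t pos = static_cast<std::size_t>(b.index);
    if (seen[pos]) return false;  // duplicate index
    seen[pos] = true;
  }
  return true;
}
```

Each source satisfies the contract on its own, but `NaiveUnion(OrderedSource(3), OrderedSource(2))` carries the indexes 0, 0, 1, 1, 2, so `IsGaplessSequence` rejects it.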
Component(s)
C++
…39046)
### Rationale for this change
Acero's union node produces duplicated batch indexes when it has multiple ordered input nodes, which leaves downstream nodes unable to process these batches when ordering is a concern. This PR addresses the issue.
### What changes are included in this PR?
This PR fixes this issue by setting the index to unsequenced if the order cannot be guaranteed.
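The idea can be sketched as follows. This is an illustration only and not the code from the PR: `ToyBatch`, `kToyUnsequencedIndex`, and `ForwardThroughUnion` are hypothetical names, the sentinel value is a placeholder rather than Arrow's actual `kUnsequencedIndex`, and treating "more than one input" as the condition under which order cannot be guaranteed is an assumption taken from the issue description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Placeholder sentinel for "no meaningful order". Arrow defines its own
// kUnsequencedIndex; its value is not assumed here.
constexpr int64_t kToyUnsequencedIndex = -1;

// Toy stand-in for a batch; only the index matters here.
struct ToyBatch {
  int64_t index;
};

// Forwards one batch through a hypothetical union node with `num_inputs` inputs.
// With several inputs the upstream indexes can collide, so the index is cleared;
// with a single input the upstream order is still meaningful and is kept.
ToyBatch ForwardThroughUnion(ToyBatch batch, std::size_t num_inputs) {
  assert(num_inputs >= 1);
  if (num_inputs > 1) {
    batch.index = kToyUnsequencedIndex;
  }
  return batch;
}
```

Under these assumptions, a downstream node that cares about ordering sees either a valid sequence (single input) or an explicit "unsequenced" marker (multiple inputs), and never a duplicated index.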
### Are these changes tested?
Yes
### Are there any user-facing changes?
No
* Closes: #39045
Authored-by: Yue Ni <niyue.com@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>