-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] Simplify IPC writer for dense unions #32613
Milestone
Comments
Antoine Pitrou / @pitrou: for i, c := range codes {
if offsets[c] == -1 {
// offsets are guaranteed to be increasing according to the spec
// so the first offset we find for a child is the initial offset
// and will become the "0" for this child.
offsets[c] = unshiftedOffsets[i]
shiftedOffsets[i] = 0
} else {
shiftedOffsets[i] = unshiftedOffsets[i] - offsets[c]
}
lengths[c] = maxI32(lengths[c], shiftedOffsets[i]+1)
} |
Hi, I'd like to work on this if it is still available. Thanks. |
kou
pushed a commit
that referenced
this issue
Feb 2, 2023
JIRA: https://issues.apache.org/jira/browse/ARROW-17339 Closes: #32613 ### Rationale for this change Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved. ### What changes are included in this PR? Just a change to eliminate looping over the size of a `DenseUnionArray` twice. ### Are these changes tested? I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice. ### Are there any user-facing changes? There are no user facing changes for this. * Closes: #32613 Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com> Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
sjperkins
pushed a commit
to sjperkins/arrow
that referenced
this issue
Feb 10, 2023
) JIRA: https://issues.apache.org/jira/browse/ARROW-17339 Closes: apache#32613 ### Rationale for this change Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved. ### What changes are included in this PR? Just a change to eliminate looping over the size of a `DenseUnionArray` twice. ### Are these changes tested? I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice. ### Are there any user-facing changes? There are no user facing changes for this. * Closes: apache#32613 Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com> Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
gringasalpastor
pushed a commit
to gringasalpastor/arrow
that referenced
this issue
Feb 17, 2023
) JIRA: https://issues.apache.org/jira/browse/ARROW-17339 Closes: apache#32613 ### Rationale for this change Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved. ### What changes are included in this PR? Just a change to eliminate looping over the size of a `DenseUnionArray` twice. ### Are these changes tested? I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice. ### Are there any user-facing changes? There are no user facing changes for this. * Closes: apache#32613 Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com> Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
fatemehp
pushed a commit
to fatemehp/arrow
that referenced
this issue
Feb 24, 2023
) JIRA: https://issues.apache.org/jira/browse/ARROW-17339 Closes: apache#32613 ### Rationale for this change Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved. ### What changes are included in this PR? Just a change to eliminate looping over the size of a `DenseUnionArray` twice. ### Are these changes tested? I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice. ### Are there any user-facing changes? There are no user facing changes for this. * Closes: apache#32613 Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com> Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
In ARROW-10580 we fixed the Arrow C++ implementation so that dense union offsets are always (non-strictly) monotonic for a given child, as mandated by the spec.
The IPC writer implementation, however, still assumes that dense union offsets may be in any order:
arrow/cpp/src/arrow/ipc/writer.cc
Lines 476 to 485 in 5719576
This can probably be simplified, making it slightly less costly to emit a sliced union array.
Reporter: Antoine Pitrou / @pitrou
Related issues:
Note: This issue was originally created as ARROW-17339. Please see the migration documentation for further details.
The text was updated successfully, but these errors were encountered: