-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C#] Slice value buffers when writing slices of list arrays and binary arrays #41225
Comments
CurtHagenlocher
pushed a commit
that referenced
this issue
Apr 16, 2024
… arrays in IPC format (#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: #41225 Authored-by: Adam Reeve <adreeve@gmail.com> Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
Issue resolved by pull request 41230 |
raulcd
pushed a commit
that referenced
this issue
Apr 29, 2024
… arrays in IPC format (#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: #41225 Authored-by: Adam Reeve <adreeve@gmail.com> Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
tolleybot
pushed a commit
to tmct/arrow
that referenced
this issue
May 2, 2024
…binary arrays in IPC format (apache#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: apache#41225 Authored-by: Adam Reeve <adreeve@gmail.com> Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
tolleybot
pushed a commit
to tmct/arrow
that referenced
this issue
May 4, 2024
…binary arrays in IPC format (apache#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: apache#41225 Authored-by: Adam Reeve <adreeve@gmail.com> Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
rok
pushed a commit
to tmct/arrow
that referenced
this issue
May 8, 2024
…binary arrays in IPC format (apache#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: apache#41225 Authored-by: Adam Reeve <adreeve@gmail.com> Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
rok
pushed a commit
to tmct/arrow
that referenced
this issue
May 8, 2024
…binary arrays in IPC format (apache#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: apache#41225 Authored-by: Adam Reeve <adreeve@gmail.com> Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
vibhatha
pushed a commit
to vibhatha/arrow
that referenced
this issue
May 25, 2024
…binary arrays in IPC format (apache#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: apache#41225 Authored-by: Adam Reeve <adreeve@gmail.com> Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the enhancement requested
This is a follow up to #40517, which made round-tripping of sliced arrays via the IPC format work. When writing list arrays and binary/string arrays the full values buffer is written and the value offsets and validity buffer are sliced, but to reduce file sizes the value buffer should be sliced and the value offsets adjusted. Eg. see the C++ implementation.
Component(s)
C#
The text was updated successfully, but these errors were encountered: