Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support slice view data on writing IPC #5610

Closed
wants to merge 1 commit into from

Conversation

ariesdevil
Copy link
Contributor

Which issue does this PR close?

Part of #5506.

Rationale for this change

Support slice view data on writing IPC.

What changes are included in this PR?

Support slice view data on writing IPC.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 9, 2024
@tustvold
Copy link
Contributor

tustvold commented Apr 9, 2024

I believe the intention of this code is to avoid writing out data buffers that aren't referenced by any views, perhaps as a result of slicing. I think this is probably better handled as a more general "compact" operation, that also handles pruning the data buffers themselves. I believe this is tracked by #5513

I'm not sure whether the IPC writer should be doing anything smart here?

Perhaps @alamb has thoughts on this

@ariesdevil
Copy link
Contributor Author

Maybe we can just slice views here and hand over the data buffer to gc for processing. cc @alamb

@alamb
Copy link
Contributor

alamb commented Apr 9, 2024

I believe the intention of this code is to avoid writing out data buffers that aren't referenced by any views, perhaps as a result of slicing.

That is my interpretation too -- @ariesdevil can you confirm?

I think this is probably better handled as a more general "compact" operation, that also handles pruning the data buffers themselves. I believe this is tracked by #5513

Yes, that is my suggestion. The rationale for a separate gc operation is:

  1. Since it is potentially expensive I think the user should decide when to do the check, rather than on each IPC call. I think some applications might have different tradeoffs between compacting / pruning and saving IPC space.
  2. gc is more generally useful for operations other than IPC (consolidating memory down after a filter, for example)

Having a variation of gc that simply prunes unreferenced buffers (but not copy) might also be interesting

@ariesdevil
Copy link
Contributor Author

Make sense, thanks @alamb @tustvold

@ariesdevil ariesdevil closed this Apr 10, 2024
@ariesdevil ariesdevil deleted the ipc branch April 10, 2024 04:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants