Copy-On-Write Support to Support Mutating/Updating Arrays in Place #1981

Closed
tustvold opened this issue Jun 30, 2022 · 9 comments · Fixed by #3326

Labels
enhancement Any new improvement worthy of a entry in the changelog help wanted

Comments

@tustvold
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Certain use-cases may benefit from being able to mutate arrays in place, without copying them. Fortunately, we have most of the pieces to support this; it just requires some plumbing work.

Describe the solution you'd like

A high-level outline can be found below:

It should then be possible to do something like the following (likely encapsulated in a nicer interface):

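fn cow_builder(col: ArrayRef) -> Int32Builder {
    // Downcast to the concrete array type (helper as sketched in this proposal)
    let col: Arc<Int32Array> = downcast_array(col).unwrap();
    // Take ownership if this is the only reference, otherwise clone the still-shared array
    let col: Int32Array = Arc::try_unwrap(col).unwrap_or_else(|shared| shared.as_ref().clone());
    col.into_builder()
}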
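
The ownership check at the heart of the snippet above can be illustrated without any arrow types. The following is a minimal sketch using only the standard library, with `Vec<i32>` standing in for an array; `into_mutable` and `push_cow` are hypothetical names chosen for illustration and are not part of arrow-rs.

```rust
use std::sync::Arc;

/// Hypothetical helper: turn a shared Vec into an owned, mutable one.
/// If `shared` is the only reference, the allocation is reused (no copy);
/// otherwise the contents are cloned and other holders are unaffected.
fn into_mutable(shared: Arc<Vec<i32>>) -> Vec<i32> {
    // `try_unwrap` hands the Arc back in the error case, so the closure
    // must use that returned value rather than the moved original.
    Arc::try_unwrap(shared).unwrap_or_else(|still_shared| (*still_shared).clone())
}

/// Hypothetical helper: the same idea through `Arc::make_mut`, which
/// clones the inner value only when other references exist.
fn push_cow(shared: &mut Arc<Vec<i32>>, value: i32) {
    Arc::make_mut(shared).push(value);
}
```

Calling `into_mutable` on a uniquely held `Arc` returns the original allocation, while calling it on an `Arc` that has been cloned elsewhere returns a fresh copy, which is the copy-on-write behaviour the proposal aims for.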

Additional context

Arrow2 recently merged a form of support for this.

@tustvold tustvold added enhancement Any new improvement worthy of a entry in the changelog help wanted labels Jun 30, 2022
@alamb
Contributor

alamb commented Jul 1, 2022

Related arrow2 code: Example https://github.com/jorgecarleitao/arrow2/blob/main/examples/cow.rs#L23

Looks like it was added by @jorgecarleitao in jorgecarleitao/arrow2#1042 (which was a bit hard to find as it is described in terms of the implementation rather than the end user API change) 👍

@alamb alamb changed the title Copy-On-Write Support Copy-On-Write Support to Support Mutating/Updating Arrays in Place Jul 1, 2022
@avantgardnerio
Contributor

avantgardnerio commented Nov 15, 2022

Instead of copying-on-write (or maybe in addition to), how would we feel about an AppendOnlyArrayBuilder that never calls any method that would cause MutableBuffer to realloc() and move any pointers?

If we had that, we could have AppendableRecordBatchs, and a borrow() -> &RecordBatch (as opposed to .finish()) that would allow us to append (but not mutate) data, while still being able to do all the fun stuff we can do with RecordBatches today (query in DataFusion, export to parquet, etc).
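
To make the "never realloc" invariant concrete, here is a minimal standard-library sketch. `AppendOnlyBuffer` is a hypothetical type invented for illustration; it is not the proposed `AppendOnlyArrayBuilder`, does not use `MutableBuffer`, and does not address how a shared borrow could coexist with appends.

```rust
/// Hypothetical fixed-capacity, append-only buffer (not an arrow-rs type).
struct AppendOnlyBuffer {
    values: Vec<i32>,
    // Upper bound on the number of values; never exceeds the reserved capacity.
    limit: usize,
}

impl AppendOnlyBuffer {
    /// Reserves space for `limit` values up front.
    fn with_capacity(limit: usize) -> Self {
        Self { values: Vec::with_capacity(limit), limit }
    }

    /// Appends a value, or hands it back if the buffer is full,
    /// so the backing allocation is never grown or moved.
    fn try_append(&mut self, value: i32) -> Result<(), i32> {
        if self.values.len() == self.limit {
            return Err(value);
        }
        self.values.push(value);
        Ok(())
    }

    /// Read-only view of everything appended so far.
    fn as_slice(&self) -> &[i32] {
        &self.values
    }
}
```

Because appends are refused once the limit is reached, the pointer returned by `as_slice().as_ptr()` stays the same for the lifetime of the buffer after the first append.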

@tustvold
Contributor Author

I think that is a separate feature; would you mind creating a new issue along with the intended use-case? I suspect it concerns implementing some sort of append-only memtable, something I have thoughts on, having implemented something similar for IOx (spoiler: mem copies are far from the bottleneck).

@avantgardnerio
Contributor

@tustvold @alamb @andygrove I filed an issue with a PoC implementation: #3142

@viirya
Member

viirya commented Dec 5, 2022

For COW, I think we now have corresponding implementations for the high-level outline. Is there anything I missed?

@tustvold
Contributor Author

tustvold commented Dec 6, 2022

Theoretically it might be nice to provide into_buffer for ByteArray, but otherwise I think we can close this

@viirya
Member

viirya commented Dec 7, 2022

For into_buffer, do you mean a MutableBuffer of a ByteArray so we can write ByteArrayType data into it?

@tustvold
Contributor Author

tustvold commented Dec 7, 2022

Sorry, I meant into_builder for ByteArray

@alamb
Contributor

alamb commented Jan 1, 2023

🎉 thank you @viirya -- cc @avantgardnerio
