-
Notifications
You must be signed in to change notification settings - Fork 662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move BufferBuilder
to arrow-buffer
#4630
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, I have been toying with the idea of deprecating this in favour of Vec FWIW
Makes sense. Until then, I'm probably going to do some follow-up PRs for additional trait implementations for this builder. |
Could I possibly inquire as to your use-case? Is there some reason you wish to stick with using BufferBuilder and MutableBuffer? |
I'm working on an Arrow implementation for use-cases where array schemas are known at compile time. This allows deriving statically typed arrays for Rust types that provide efficient array-of-structs to struct-of-arrays conversion - and vice versa.
No. The arrays are generic over their inner buffer types, so to provide interop with |
Very cool, this sort of use-case is definitely quite cumbersome currently, having both will be very nice 👌 I am curious if one could collect directly into the arrow-rs builders, i.e. a derive macro that builds something akin to this. Mainly wondering if I can tempt you into implementing such a thing 😅 |
That's exactly what I'm building. The following test already passes: #[derive(ArrayType, Default)]
struct MyRow {
i32: i32,
optional_i32: Option<i32>,
string: Option<String>,
i32_list: Option<Vec<Option<i32>>>,
}
#[test]
fn my_row() {
// non-nullable
let input = [
MyRow {
i32: 1234,
optional_i32: None,
string: Some("hello world".to_string()),
i32_list: Some(vec![Some(4321), None]),
},
MyRow {
i32: 1234,
optional_i32: Some(42),
string: Some("hello world".to_string()),
i32_list: Some(vec![Some(1), None, Some(42)]),
},
];
let array = input.into_iter().collect::<StructArray<MyRow, false>>();
assert_eq!(array.len(), 2);
assert_eq!(array.0.i32.0.as_slice(), &[1234, 1234]);
assert_eq!(array.0.optional_i32.0.as_ref().as_slice(), &[0, 42]);
assert_eq!(
array
.0
.optional_i32
.0
.bitmap_ref()
.into_iter()
.collect::<Vec<_>>(),
&[false, true]
);
assert_eq!(
array.0.string.0.data.0.as_slice(),
"hello worldhello world".as_bytes()
);
assert_eq!(array.0.i32_list.0.data.0.as_ref(), &[4321, 0, 1, 0, 42]);
assert_eq!(array.0.i32_list.0.offsets.as_ref(), &[0, 2, 5]);
// nullable
let input = [
Some(MyRow {
i32: 1234,
optional_i32: None,
string: Some("hello world".to_string()),
i32_list: Some(vec![Some(4321), None]),
}),
None,
Some(MyRow {
i32: 1234,
optional_i32: None,
string: Some("hello world".to_string()),
i32_list: Some(vec![Some(4321), None]),
}),
];
let array = input.into_iter().collect::<StructArray<MyRow, true>>();
assert_eq!(array.len(), 3);
} Please note that I didn't work on access methods yet ( |
Oh ok, the repository description seemed to imply it was looking to define a new set of arrays, as opposed to just using the existing arrow-rs abstractions as in the linked docs. |
Yes, it is defining different abstractions for generic arrays to support fully statically typed arrays (no trait objects). However, the underlying buffers follow the Arrow physical memory layouts, so it's possible (and I'm planning) to add methods for zero-copy conversion to |
Which issue does this PR close?
None.
Rationale for this change
BufferBuilder
builds buffers, so it makes sense to move it to thearrow-buffer
crate. It also only requires types from that crate.What changes are included in this PR?
Moved
BufferBuilder
fromarrow_array::builder
toarrow_buffer::builder
. Add re-export inarrow_array::builder
to prevent a breaking change.Are there any user-facing changes?
Yes,
BufferBuilder
is now available to users of thearrrow-buffer
crate.