Make FFI support optional, change APIs to be safe (#2302) #2303
/// Creates a new array from two FFI pointers. Used to import arrays from the C Data Interface
/// # Safety
/// Assumes that these pointers represent valid C Data Interfaces, both in memory
/// representation and lifetime via the `release` mechanism.
pub unsafe fn make_array_from_raw(
    array: *const ffi::FFI_ArrowArray,
    schema: *const ffi::FFI_ArrowSchema,
) -> Result<ArrayRef> {
    let array = ffi::ArrowArray::try_from_raw(array, schema)?;
    let data = ArrayData::try_from(array)?;
    Ok(make_array(data))
}

/// Exports an array to raw pointers of the C Data Interface provided by the consumer.
/// # Safety
/// Assumes that these pointers represent valid C Data Interfaces, both in memory
/// representation and lifetime via the `release` mechanism.
///
/// This function copies the content of two FFI structs [ffi::FFI_ArrowArray] and
/// [ffi::FFI_ArrowSchema] in the array to the location pointed by the raw pointers.
/// Usually the raw pointers are provided by the array data consumer.
pub unsafe fn export_array_into_raw(
    src: ArrayRef,
    out_array: *mut ffi::FFI_ArrowArray,
    out_schema: *mut ffi::FFI_ArrowSchema,
) -> Result<()> {
    let data = src.data();
    let array = ffi::FFI_ArrowArray::new(data);
    let schema = ffi::FFI_ArrowSchema::try_from(data.data_type())?;

    std::ptr::write_unaligned(out_array, array);
    std::ptr::write_unaligned(out_schema, schema);

    Ok(())
}
These are moved to the `ffi` module to make the feature flags easier to manage.
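For readers less familiar with the export pattern used in `export_array_into_raw`, here is a minimal, std-only sketch of the same idea: the consumer provides an uninitialized slot, the producer writes a struct into it with `std::ptr::write_unaligned`, and the consumer later calls the `release` callback. `MockFfiArray`, `release_mock`, `export_into_raw`, and `round_trip` are hypothetical names made up for illustration; this is not the arrow-rs implementation or its struct layout.

```rust
use std::mem::MaybeUninit;
use std::os::raw::c_void;
use std::ptr;

/// Hypothetical stand-in for an FFI struct: a payload pointer plus a
/// `release` callback that frees it. Not the real `FFI_ArrowArray` layout.
#[repr(C)]
pub struct MockFfiArray {
    pub length: i64,
    pub private_data: *mut c_void,
    pub release: Option<unsafe extern "C" fn(*mut MockFfiArray)>,
}

/// Release callback: frees the boxed payload and marks the struct as released.
unsafe extern "C" fn release_mock(array: *mut MockFfiArray) {
    if array.is_null() {
        return;
    }
    unsafe {
        let array = &mut *array;
        if !array.private_data.is_null() {
            drop(Box::from_raw(array.private_data as *mut Vec<i32>));
            array.private_data = ptr::null_mut();
        }
        array.release = None;
    }
}

/// Moves `values` into a `MockFfiArray` and writes it to the consumer's slot.
///
/// # Safety
/// `out` must be valid for writes of one `MockFfiArray`.
pub unsafe fn export_into_raw(values: Vec<i32>, out: *mut MockFfiArray) {
    let exported = MockFfiArray {
        length: values.len() as i64,
        private_data: Box::into_raw(Box::new(values)) as *mut c_void,
        release: Some(release_mock),
    };
    // `write_unaligned` neither reads nor drops whatever was in `out` before.
    unsafe { ptr::write_unaligned(out, exported) };
}

/// Consumer side: allocate the slot, let the producer fill it, read the
/// payload, then call `release`. Returns the sum of the exported values.
pub fn round_trip(values: Vec<i32>) -> i32 {
    let mut slot = MaybeUninit::<MockFfiArray>::uninit();
    unsafe {
        export_into_raw(values, slot.as_mut_ptr());
        let mut array = slot.assume_init();
        let data = &*(array.private_data as *const Vec<i32>);
        let sum: i32 = data.iter().sum();
        if let Some(release) = array.release {
            release(&mut array);
        }
        // After release, the callback slot is cleared.
        assert!(array.release.is_none());
        sum
    }
}
```

Calling `round_trip(vec![1, 2, 3])` exercises the full export, read, and release cycle. The point of the sketch is that ownership of the payload travels with the struct, which is why the caller of the real functions has to uphold the pointer validity and `release` contract described in the `# Safety` sections.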
#[deprecated(
    note = "use from_custom_allocation instead which makes it clearer that the allocation is in fact owned"
)]
pub unsafe fn from_unowned(
This method was already deprecated, so I opted to just remove it.
/// # Memory Leaks
/// This method releases `buffers`. Consumers of this struct *must* call `release` before
/// releasing this struct, or contents in `buffers` leak.
pub fn try_new(data: ArrayData) -> Result<Self> {
This method wasn't actually unsafe, and was previously being called by safe methods without any additional checks (#2301).
@@ -216,15 +214,6 @@ pub trait Array: fmt::Debug + Send + Sync + JsonEqual {
        self.data_ref().get_array_memory_size() + std::mem::size_of_val(self)
            - std::mem::size_of::<ArrayData>()
    }

    /// returns two pointers that represent this array in the C Data Interface (FFI)
    fn to_raw(
I didn't see a compelling reason for this API to exist. Ultimately, the use of `std::any::Any` means users can't implement `Array` for custom types, and I can't see why arrow-rs would ever make use of this extension point.
I think for this `to_raw` usage, `ArrowArray` provides `into_raw`, which should be enough.
We only use `ArrowArray.into_raw`, actually.
The idea of an optional flag looks good to me. I don't feel comfortable reviewing the FFI changes, as I am not familiar with that code.
# force_validate runs full data validation for all arrays that are created
# this is not enabled by default as it is too computationally expensive
# but is run as part of our CI checks
force_validate = []
# Enable ffi support
ffi = []
🤔 I was going to recommend adding this to the "list of features of this crate" but then it turns out we don't seem to have any docs for them 🤦
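As a rough illustration of what an optional `ffi` feature means on the Rust side, here is a minimal sketch of feature gating with `cfg`. The module contents and the helper names `describe`, `ffi_enabled`, and `ffi_status` are hypothetical and only show the mechanism; the actual layout in this PR may differ. A downstream crate would opt in by listing `ffi` in the `features` of its dependency entry.

```rust
/// Compiled only when the crate is built with the `ffi` feature enabled.
#[cfg(feature = "ffi")]
pub mod ffi {
    /// Placeholder for the import/export functions that would live here.
    pub fn describe() -> &'static str {
        "ffi support compiled in"
    }
}

/// Reports whether this build included the `ffi` feature.
pub fn ffi_enabled() -> bool {
    cfg!(feature = "ffi")
}

/// Returns a status string that compiles with or without the feature.
/// Exactly one of the two `return` statements survives conditional compilation.
pub fn ffi_status() -> &'static str {
    #[cfg(feature = "ffi")]
    return ffi::describe();
    #[cfg(not(feature = "ffi"))]
    return "ffi support not compiled in";
}
```

With an empty feature definition like `ffi = []`, a default build skips the gated module entirely, so code that references `ffi::...` has to be gated as well.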
Looks like a good direction to me. I will take a close look later.
Benchmark runs are scheduled for baseline = 4b15b7e and contender = d87f6a4. d87f6a4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #2301
Closes #2302
Rationale for this change
Most applications don't need FFI support, and putting this behind a feature flag allows them to reduce their dependency footprint
What changes are included in this PR?
Are there any user-facing changes?
Yes, `ffi` is no longer enabled by default, and `Array::to_raw` is removed.