ARROW-10692: [Rust] Removed undefined behavior derived from null pointers #8997
Codecov Report
@@           Coverage Diff           @@
##           master    #8997   +/-   ##
=======================================
  Coverage   82.61%   82.61%
=======================================
  Files         202      202
  Lines       50048    50055    +7
=======================================
+ Hits        41347    41354    +7
  Misses       8701     8701
Continue to review full report at Codecov.
@vertexclique, would you mind taking a look at this? You are an expert around these. :)
Nice work @jorgecarleitao. I can't say I am an expert in this code / area of Rust, but I read this PR carefully and I think I understand all the changes and they make sense to me.
In your PR description, you are basically saying this code is bringing `MutableBuffer` closer to the implementation of `Vec`... I wonder then, how is the work in this PR related to #8796 (aka actually using `Vec<u8>` as the underlying memory source)?
/// This function panics if:
/// * `ptr` is null
/// * `ptr` is not aligned to a slice of type `T`. This is guaranteed if it was built from a slice of type `T`.
pub(super) unsafe fn new(ptr: *const u8) -> Self {
This is a nice change: moving the alignment assertion into `RawPtrBox::new` makes the call sites clearer, and they can no longer forget to ensure alignment. 👍
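For readers following the review, here is a minimal sketch of a constructor that checks both invariants from the doc comment above (non-null and aligned for `T`). The struct layout, the panic messages, and the body of `new` are assumptions made for illustration; only the signature and the documented panics come from the diff.

```rust
use std::mem::align_of;
use std::ptr::NonNull;

/// Hypothetical stand-in for the `RawPtrBox` discussed in this review.
struct RawPtrBox<T> {
    ptr: NonNull<T>,
}

impl<T> RawPtrBox<T> {
    /// # Safety
    /// `ptr` must be valid for reads of `T` for as long as the box is used.
    ///
    /// # Panics
    /// Panics if `ptr` is null or not aligned for `T`.
    unsafe fn new(ptr: *const u8) -> Self {
        // Reject null pointers up front instead of storing them.
        let ptr = NonNull::new(ptr as *mut u8).expect("pointer cannot be null");
        // Check alignment once here so that call sites cannot forget it.
        assert_eq!(
            (ptr.as_ptr() as usize) % align_of::<T>(),
            0,
            "pointer is not aligned for T"
        );
        Self { ptr: ptr.cast() }
    }

    fn as_ptr(&self) -> *const T {
        self.ptr.as_ptr()
    }
}

fn main() {
    let values: [u32; 3] = [1, 2, 3];
    // A pointer derived from a `[u32]` array is aligned for `u32`.
    let boxed = unsafe { RawPtrBox::<u32>::new(values.as_ptr() as *const u8) };
    let first = unsafe { *boxed.as_ptr() };
    assert_eq!(first, 1);
}
```

Keeping `new` marked `unsafe` still leaves the validity of the pointed-to memory to the caller; the assertions only cover what can be checked at runtime.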
@@ -919,24 +962,18 @@ mod tests {

#[test]
fn test_from_raw_parts() {
let buf = unsafe { Buffer::from_raw_parts(null_mut(), 0, 0) };
why was this test removed? Because it is no longer possible to create a buffer from a null pointer?
Exactly :)
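To illustrate why such a test can no longer be written, here is a small sketch in which the constructor takes a `NonNull<u8>` instead of a raw pointer. `RawBuffer` and its two-argument `from_raw_parts` are hypothetical and simplified; they are not the Arrow `Buffer` API.

```rust
use std::ptr::{null_mut, NonNull};

/// Hypothetical buffer view used only for this illustration.
struct RawBuffer {
    ptr: NonNull<u8>,
    len: usize,
}

impl RawBuffer {
    /// # Safety
    /// `ptr` must be valid for reads of `len` bytes while the value is alive.
    unsafe fn from_raw_parts(ptr: NonNull<u8>, len: usize) -> Self {
        Self { ptr, len }
    }

    fn as_slice(&self) -> &[u8] {
        // Relies on `ptr` being non-null and on the caller's contract above.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

fn main() {
    // A null pointer cannot be converted into a `NonNull`,
    // so it never reaches the constructor.
    assert!(NonNull::new(null_mut::<u8>()).is_none());

    let bytes = [1u8, 2, 3];
    let ptr = NonNull::new(bytes.as_ptr() as *mut u8).expect("array pointers are never null");
    let buf = unsafe { RawBuffer::from_raw_parts(ptr, bytes.len()) };
    assert_eq!(buf.as_slice(), &[1u8, 2, 3][..]);
}
```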
@alamb , thanks a lot for taking the time to review this. This one is challenging to review. wrt to the Basically, we have a performance problem in the reallocation code. The following is the result of 4 runs:
The fact that there is no difference between 1 and 2 but a 3.5x difference between 3 and 4 shows that we are doing something wrong.
I am going to check out this code and have a look today @jorgecarleitao
The full set of Rust CI tests did not run on this PR :( Can you please rebase this PR against apache/master to pick up the changes in #9056 so that they do? I apologize for the inconvenience.
@Dandandan @alamb @nevi-me @jhorstmann: this was merged and had some changes to the names of public methods of the Buffer and MutableBuffer (they are now similar to the ones in
// * The pointers are non-null by construction
// * alignment asserted above
// Unsoundness
// * There is no guarantee that the memory regions are non-overlapping, but `memcpy` requires this.
Since `buffer` is freshly allocated, it should not be possible for the memory regions to overlap.
Good point. This was copied from a previous version and it slipped. :/
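The point about fresh allocations can be sketched as follows: memory that was just returned by the allocator cannot overlap a live source slice, so the non-overlap precondition of `std::ptr::copy_nonoverlapping` (the `memcpy` equivalent) holds. The helper name is hypothetical, and handing ownership to a `Vec<u8>` is a choice made for this sketch rather than what the PR does.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::ptr::{copy_nonoverlapping, NonNull};

/// Hypothetical helper: copies `src` into freshly allocated memory.
fn copy_to_fresh_allocation(src: &[u8]) -> Vec<u8> {
    if src.is_empty() {
        // `alloc` must not be called with a zero-sized layout.
        return Vec::new();
    }
    // Fails if the requested size cannot be represented.
    let layout = Layout::array::<u8>(src.len()).expect("capacity overflow");
    unsafe {
        let raw = alloc(layout);
        // Never continue with a null pointer from a failed allocation.
        let dst = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        // `dst` was just allocated, so it cannot overlap `src`.
        copy_nonoverlapping(src.as_ptr(), dst.as_ptr(), src.len());
        // The `Vec` takes ownership: same allocator, byte layout, and capacity.
        Vec::from_raw_parts(dst.as_ptr(), src.len(), src.len())
    }
}

fn main() {
    let copied = copy_to_fresh_allocation(&[10, 20, 30]);
    assert_eq!(copied, vec![10u8, 20, 30]);
}
```

When the two regions may alias, `std::ptr::copy` (the `memmove` equivalent) is the call that tolerates overlap.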
…ters

Currently, our allocation code does not verify that `std::alloc::alloc` succeeded by checking that the returned pointer is not null. Passing null pointers to buffers is dangerous, especially given that Buffers currently expose them without any checks.

This PR is a series of modifications that removes the possibility of having null pointers:

* Made most of our pointers `NonNull` and panic whenever a null pointer tries to sneak into a buffer (either via FFI or a failed allocation)
* Guard against overflow of a pointer address during allocations (relevant for 32-bit systems)
* Removed the possibility of a null pointer in `RawPtrBox`, flagged `RawPtrBox::new` as `unsafe`, and documented the invariants necessary for a sound usage of `RawPtrBox`
* Made all methods in `memory` expect and output a `NonNull`

All these changes were highly motivated by the code in Rust's `std::alloc`, and how it deals with these edge cases.

The main consequence of these changes is that our buffers no longer hold null pointers, which allows us to implement `Deref<Target = [u8]>` (done in this PR), and treat `Buffer` as very similar to an immutable `Vec<u8>` (and `MutableBuffer` closer to `Vec<u8>`). In this direction, this PR renames a bunch of methods:

* `MutableBuffer::data_mut -> MutableBuffer::as_slice_mut`
* `MutableBuffer::data -> MutableBuffer::as_slice`
* `Buffer::data -> Buffer::as_slice`
* `Buffer::raw_data -> Buffer::as_ptr`
* `RawPtrBox::get -> RawPtrBox::as_ptr`

The rationale for these names comes from `Vec::as_mut_slice`, `Vec::as_slice`, `Vec::as_ptr` and `NonNull::as_ptr` respectively.

Closes apache#8997 from jorgecarleitao/clean_buffer

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
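A rough sketch of how the pieces in this description fit together: check the allocator's result, reject layouts whose size overflows, and expose the bytes through `Deref`. The `Bytes` type, the `ALIGNMENT` value, and the use of `handle_alloc_error` (which is what the standard collections do on allocation failure, rather than a plain panic) are assumptions for illustration and not the code in this PR.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ops::Deref;
use std::ptr::NonNull;

/// Assumed alignment for this sketch.
const ALIGNMENT: usize = 64;

/// Hypothetical owned byte buffer; not the Arrow `Buffer`.
struct Bytes {
    ptr: NonNull<u8>,
    len: usize,
}

impl Bytes {
    fn zeroed(len: usize) -> Self {
        if len == 0 {
            // Dangling but non-null, as `Vec` does for empty allocations.
            return Self { ptr: NonNull::dangling(), len: 0 };
        }
        // Returns an error if the rounded-up size overflows.
        let layout = Layout::from_size_align(len, ALIGNMENT).expect("capacity overflow");
        let raw = unsafe { alloc_zeroed(layout) };
        // A failed allocation yields null; never store it.
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        Self { ptr, len }
    }
}

impl Deref for Bytes {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        // Sound because `ptr` is non-null and covers `len` initialized bytes.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for Bytes {
    fn drop(&mut self) {
        if self.len != 0 {
            // Same layout as the one used at allocation time.
            let layout = Layout::from_size_align(self.len, ALIGNMENT).unwrap();
            unsafe { dealloc(self.ptr.as_ptr(), layout) };
        }
    }
}

fn main() {
    let buf = Bytes::zeroed(8);
    // Slice methods are available through `Deref`.
    assert_eq!(buf.len(), 8);
    assert!(buf.iter().all(|b| *b == 0));
}
```

Because the pointer is never null, `deref` does not need a runtime check before building the slice.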