-
Notifications
You must be signed in to change notification settings - Fork 385
Switch BackingBuffer::VirtualMemory
to Safe Box<[u8]>
Backing
#392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Switch BackingBuffer::VirtualMemory
to Safe Box<[u8]>
Backing
#392
Conversation
# Motivation This PR addresses concerns raised in [microsoft#359](microsoft#359) about the safety and maintainability of manually managed virtual memory using `NonNull<u8>` and pointer arithmetic. By switching to `Box<[u8]>`, we maintain similar memory semantics while gaining safety and reducing code complexity. ## Main Changes - Used a `Box<[u8]>` for the contents of `BackingBuffer::VirtualMemory` instead of a `NonNull`. - Unsafe blocks are now limited to system-level allocation/deallocation and bounds-checked access is the default. - Removed most manual pointer arithmetic and unsafe dereferencing in favor of slice indexing. - Switching to `Box<[u8]>` also allowed me to remove the `text` field from `GapBuffer` which reduces the size of the struct. ## Miscellaneous changes - Pinned down the toolchain to `nightly-2025-04-15` as not all features required to build this crate are available on all versions of nightly. - Added line to clarify `virtual_release` returns memory which will error if accessed before [committed](https://learn.microsoft.com/en-us/windows/win32/Memory/page-state). # Results Benchmarks were obtained by running `cargo bench --bench lib -- --save-baseline before` on main and `cargo bench --bench lib -- --save-baseline box-raw-alloc` on my branch. Results were then exported to JSON and compared using `critcmp before.json box-raw-alloc.json --list`. See [before.json (https://drive.google.com/file/d/1krzmBTmidI8jgXu08EAo_FelmFmKP2i-/view?usp=drive_link) and [box-raw-alloc.json](https://drive.google.com/file/d/1rwud47HCDTiGTh9jqEPtWSChIXiVJqst/view?usp=sharing). - Benchmarks show no statistically significant regression in performance, despite the shift to safe abstractions. - Binary size is 1kb smaller than the current release build on [my machine](https://drive.google.com/file/d/1Ey73e8h9G1L4fmR0cyWgGshcr7QfyGWX/view?usp=drive_link) and 20kb smaller than the download. - All existing tests pass. No regressions were found during local testing.
@microsoft-github-policy-service agree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm rather skeptical of this change. The largest benefit seems to stem from the match &self.buffer
blocks which are responsible for removing most of the unsafe
calls. I think it may be worth exploring that further.
...but doing it like that is not safe. The gap in the gap buffer is akin to the uninitialized end in a Vec
. While accessing it is safe (the memory is committed in the entire buf
range), this has to be done carefully. It's completely uninitialized memory in release builds and contains previous buffer contents. This seems like an improvement, but I'm skeptical if it's a safe improvement.
That aside, there's a bug in the implementation. This PR prompted me to add benchmarks for the GapBuffer
which should help in the future. BTW: There are no buffer benchmarks and there is a 3% perf regression in my testing. But a 3% change is completely fine, because this is still 600% faster than ropey.
rust-toolchain.toml
Outdated
@@ -1,2 +1,3 @@ | |||
[toolchain] | |||
channel = "nightly" | |||
# Doesn't have to be this toolchain but I pinned it down to a concrete one because not all features required to build this crate are availible on all versions of nightly | |||
channel = "nightly-2025-04-15" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd prefer not pinning the nightly version as of yet.
src/lib.rs
Outdated
os_string_truncate, | ||
box_vec_non_null, | ||
ptr_metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you opened the issue, Linux package maintainers have come forward asking us to build on stable Rust. This means we need to remove all nightly features sooner or later and it would be preferable if this PR didn't introduce new dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can easily remove box_vec_non_null
and ptr_metadata
. However, removing os_string_truncate
causes the path
to error because os_string_truncate
is required for the truncate method of OS string.
I can remove it from my final submission but I am unsure if this issue is present in other versions of the repo/for other people.
I was also unaware it constituted a new dependency since it is present in the binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant it as in "taking dependency on a nightly feature", not as an actual cargo dependency. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks for clarifying. In this case, I was referring to this feature being enabled in the binary's
root.
At any rate, I am happy to remove the dependency from my next update as my code has no reliance on it. I only included it because otherwise the build fails for me due to path
's reliance on the feature.
src/buffer/gap_buffer.rs
Outdated
BackingBuffer::VirtualMemory(buf) => { | ||
for i in start..end { | ||
buf[i] = 0; | ||
} | ||
} | ||
BackingBuffer::Vec(buf) => { | ||
for i in start..end { | ||
buf[i] = 0; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is unacceptable from a debug performance perspective. It needs to continue using a memset fill. It also doesn't set the contents to 0xCD.
src/buffer/gap_buffer.rs
Outdated
let ptr = unsafe { | ||
VALLOCATOR | ||
.allocate(Layout::from_size_align_unchecked(LARGE_CAPACITY, align_of::<u8>())) | ||
.unwrap() | ||
}; | ||
let buf = unsafe { Box::from_non_null_in(ptr, VALLOCATOR) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Box<[T]>
only provides new_uninit_slice
too... I wonder how you're supposed to do this. It seems kind of sketchy if I'm honest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is a good point. As I understand it, Box
provides uninit_slice
because it has no guarantees the memory will be initialized. But since by the time this memory is accessed it has been committed (so it is at least zero initialized this should be ok). MaybeUninit
is mostly for the compiler so it knows not to do certain optimizations but as long as the memory is initialized to some value it should be fine. From the 'nomicon:
Safe Rust doesn't permit you to partially initialize an array. When you initialize an array, you can either set every value to the same thing with let x = [val; N], or you can specify each member individually with let x = [val1, val2, val3]. Unfortunately this is pretty rigid, especially if you need to initialize your array in a more incremental or dynamic way.
My only question/concern is if it needs to be initialized when the memory is allocated or when the array is initialized. As I understand it this array is not really initialized or accessed until the underlying allocation is committed so it should be ok to do this because we never actually access data which is possibly uninitialized in the sense the Rust abstract machine means?
This concern could be alleviated by committing the page alongside allocation as this would guarantee it was initialized at the time the [u8]
was created however my understanding was this was avoided for resource management/performance reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it this array is not really initialized or accessed until the underlying allocation is committed so it should be ok to do this because we never actually access data which is possibly uninitialized in the sense the Rust abstract machine means?
In my reading this means the array is partially initialized, because the uncommitted parts have undetermined value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example, where this may theoretically break (even if contrived), is if Rust implements copy_within
by reading beyond the bounds of the specified range, as long as it ensures the reads are within the slice bounds. Since the slice bounds are partially uncommitted, that may segfault.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, my mistake, I was under the impression the memory for the entire buffer was committed. Yes this change would break if the array is only partially committed and an attempt was made to access beyond the committed bounds.
In that case, I could implement an additional check the memory is committed before attempts to access it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we added checks, I believe that would defeat the purpose of the gap buffer managing its own gap. Rather than trying to build it around slices and Box
, it's possible that an entirely different approach may yield a better abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this makes sense, I will think on this to see if there is a better solution more guaranteed to be sound. Very likely, I can't come up with something better than you had originally.
In the meantime, I'm experimenting with a smart pointer similar to Box<[u8]>
but with bounds checking altered to check against self.committed
instead of self.len
.
Is there a way to access the GapBuffer
bench you mentioned further up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just opened #403 which should merge some time soon.
src/buffer/gap_buffer.rs
Outdated
unsafe { slice::from_raw_parts_mut(self.text.add(self.gap_off).as_ptr(), self.gap_len) } | ||
|
||
match &mut self.buffer { | ||
BackingBuffer::VirtualMemory(buf) => &mut buf[off..off + self.gap_len], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to use self.gap_off
here.
…`VirtualMemory` smart pointer.
# Motivation This PR addresses concerns raised in [microsoft#359](microsoft#359) about the safety and maintainability of manually managed virtual memory using `NonNull<u8>` and pointer arithmetic. By switching to `Box<[u8]>`, we maintain similar memory semantics while gaining safety and reducing code complexity. ## Main Changes - Used a `Box<[u8]>` for the contents of `BackingBuffer::VirtualMemory` instead of a `NonNull`. - Unsafe blocks are now limited to system-level allocation/deallocation and bounds-checked access is the default. - Removed most manual pointer arithmetic and unsafe dereferencing in favor of slice indexing. - Switching to `Box<[u8]>` also allowed me to remove the `text` field from `GapBuffer` which reduces the size of the struct. ## Miscellaneous changes - Pinned down the toolchain to `nightly-2025-04-15` as not all features required to build this crate are available on all versions of nightly. - Added line to clarify `virtual_release` returns memory which will error if accessed before [committed](https://learn.microsoft.com/en-us/windows/win32/Memory/page-state). # Results Benchmarks were obtained by running `cargo bench --bench lib -- --save-baseline before` on main and `cargo bench --bench lib -- --save-baseline box-raw-alloc` on my branch. Results were then exported to JSON and compared using `critcmp before.json box-raw-alloc.json --list`. See [before.json (https://drive.google.com/file/d/1krzmBTmidI8jgXu08EAo_FelmFmKP2i-/view?usp=drive_link) and [box-raw-alloc.json](https://drive.google.com/file/d/1rwud47HCDTiGTh9jqEPtWSChIXiVJqst/view?usp=sharing). - Benchmarks show no statistically significant regression in performance, despite the shift to safe abstractions. - Binary size is 1kb smaller than the current release build on [my machine](https://drive.google.com/file/d/1Ey73e8h9G1L4fmR0cyWgGshcr7QfyGWX/view?usp=drive_link) and 20kb smaller than the download. - All existing tests pass. No regressions were found during local testing.
…m/Barca545/edit into backing-buffer-virtual-mem-safety
The only relevant changes for my PR are in I created a |
Only changes which needed to be tracked were these 3. os_string_truncate was removed from the `lib.rs` file on `main` but building fails on this commit because it was forked from an earlier version of `main`.
…m/Barca545/edit into backing-buffer-virtual-mem-safety
Motivation
This PR addresses concerns raised in #359 about the safety and maintainability of manually managed virtual memory using
NonNull<u8>
and pointer arithmetic. By switching toBox<[u8]>
, we maintain similar memory semantics while gaining safety and reducing code complexity.Changes
Main Changes
Box<[u8]>
for the contents ofBackingBuffer::VirtualMemory
instead of aNonNull
.Box<[u8]>
also allowed me to remove thetext
field fromGapBuffer
which reduces the size of the struct.Miscellaneous changes
nightly-2025-04-15
as not all features required to build this crate are available on all versions of nightly.virtual_release
returns memory which will error if accessed before committed.Results
Benchmarks were obtained by running
cargo bench --bench lib -- --save-baseline before
on main andcargo bench --bench lib -- --save-baseline box-raw-alloc
on my branch. Results were then exported to JSON and compared usingcritcmp before.json box-raw-alloc.json --list
. See before.json and box-raw-alloc.json.