Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer: a safe utility collection for allocating memory. #103

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
#[cfg(test)] extern crate test;
extern crate core;
extern crate traverse;
extern crate alloc;



Expand Down Expand Up @@ -85,5 +86,6 @@ pub mod trie_set {



pub mod util;
pub mod proto;

284 changes: 284 additions & 0 deletions src/util/buffer.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,284 @@
//! A memory buffer with resizing.

use core::nonzero::NonZero;
use std::num::Int;
use std::ops::Deref;
use std::rt::heap;
use std::mem;

/// A safe wrapper around a heap allocated buffer of Ts, tracking capacity only.
///
/// Buffer makes no promises about the actual contents of this memory, that's up
/// to the user of the structure and can be manipulated using the standard pointer
/// utilities, accessible through the impl of `Deref<Target=*mut T>` for `Buffer<T>`.
///
/// As a result of this hands-off approach, `Buffer`s destructor does not attempt
/// to drop any of the contained elements; the destructor simply frees the contained
/// memory.
///
/// You can think of `Buffer<T>` as an approximation for `Box<[T]>` where the elements
/// are not guaranteed to be valid/initialized. It is meant to be used as a building
/// block for other collections, so they do not have to concern themselves with the
/// minutiae of allocating, reallocating, and deallocating memory.
pub struct Buffer<T> {
buffer: NonZero<*mut T>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make this NonZero<Unique<T>>, you can get rid of the unsafe impls.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I tried originally, but there is no impl of Zeroable for Unique<T>, so you can't put it in a NonZero.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a libstd bug.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be happy to make this change when it becomes possible, and also change all the underlying allocating utilities to take NonZero<Unique<T>> instead of NonZero<*mut T>, seems like a really excellent use of using the type system to maintain invariants.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm making a PR with the impl, by the way. Running tests now.

cap: usize
}

unsafe impl<T: Send> Send for Buffer<T> { }
unsafe impl<T: Sync> Sync for Buffer<T> { }

impl<T> Buffer<T> {
/// Create a new, empty Buffer.
///
/// ```
/// # use collect::util::buffer::Buffer;
///
/// let buffer: Buffer<usize> = Buffer::new();
/// assert_eq!(buffer.capacity(), 0);
/// ```
pub fn new() -> Buffer<T> {
Buffer {
buffer: empty(),
cap: 0
}
}

/// Create a new buffer with space for cap Ts.
///
/// Unlike `std::rt::heap::allocate`, cap == 0 is allowed.
///
/// ```
/// # use collect::util::buffer::Buffer;
///
/// let buffer: Buffer<usize> = Buffer::allocate(128);
/// assert_eq!(buffer.capacity(), 128);
/// ```
pub fn allocate(cap: usize) -> Buffer<T> {
if cap == 0 { return Buffer::new() }

Buffer {
buffer: unsafe { allocate(NonZero::new(cap)) },
cap: cap
}
}

/// Reallocate this buffer to fit a new number of Ts.
///
/// Unlike `std::rt::heap::reallocate`, cap == 0 is allowed.
///
/// ```
/// # use collect::util::buffer::Buffer;
///
/// let mut buffer: Buffer<usize> = Buffer::allocate(128);
/// assert_eq!(buffer.capacity(), 128);
///
/// buffer.reallocate(1024);
/// assert_eq!(buffer.capacity(), 1024);
/// ```
pub fn reallocate(&mut self, cap: usize) {
if self.cap == 0 || cap == 0 {
// Safe to drop the old buffer because either it never
// allocated or we're getting rid of the allocation.
*self = Buffer::allocate(cap)
} else {
// We need to set the capacity to 0 because if the capacity
// overflows unwinding is triggered, which if we don't
// change the capacity would try to free empty().
let old_cap = mem::replace(&mut self.cap, 0);
let buffer = mem::replace(&mut self.buffer, empty());

self.buffer = unsafe {
reallocate(buffer, NonZero::new(old_cap),
NonZero::new(cap))
};
self.cap = cap;
}
}

/// Get the current capacity of the Buffer.
///
/// ```
/// # use collect::util::buffer::Buffer;
///
/// let buffer: Buffer<usize> = Buffer::allocate(128);
/// assert_eq!(buffer.capacity(), 128);
/// ```
pub fn capacity(&self) -> usize {
self.cap
}
}

#[unsafe_destructor]
impl<T> Drop for Buffer<T> {
/// Safety Note:
///
/// The Buffer will *only* deallocate the contained memory. It will
/// *not* run any destructors on data in that memory.
fn drop(&mut self) {
if self.cap == 0 { return }
unsafe { deallocate(self.buffer, NonZero::new(self.cap)) }
}
}

impl<T> Deref for Buffer<T> {
type Target = *mut T;

fn deref(&self) -> &*mut T {
&*self.buffer
}
}

// Typed Allocation Utilities
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be pub? My intuition is no - since you should just use Buffer, but maybe there is utility in having them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think definitely no now.

//
// Unlike std::rt::heap these check for zero-sized types, capacity overflow,
// oom etc. and calculate the appropriate size and alignment themselves.

unsafe fn allocate<T>(cap: NonZero<usize>) -> NonZero<*mut T> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm... not sure this is what NonZero is intended to be used for.

Like, you can, obviously.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It provides the invariant that the pointer returned by allocate is never null, i.e. allocate can never fail except by aborting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or did you mean cap? That's just encoding the invariant that capacity must never be 0 in the type system.

if mem::size_of::<T>() == 0 { return empty() }

// Allocate
let ptr = heap::allocate(allocation_size::<T>(cap), mem::align_of::<T>());

// Check for allocation failure
if ptr.is_null() { ::alloc::oom() }

NonZero::new(ptr as *mut T)
}

unsafe fn reallocate<T>(ptr: NonZero<*mut T>,
old_cap: NonZero<usize>,
new_cap: NonZero<usize>) -> NonZero<*mut T> {
if mem::size_of::<T>() == 0 { return empty() }

let old_size = unchecked_allocation_size::<T>(old_cap);
let new_size = allocation_size::<T>(new_cap);

// Reallocate
let new = heap::reallocate(*ptr as *mut u8, old_size, new_size, mem::align_of::<T>());

// Check for allocation failure
if new.is_null() {
// Cleanup old buffer.
deallocate(ptr, old_cap);

::alloc::oom()
}

NonZero::new(new as *mut T)
}

unsafe fn deallocate<T>(ptr: NonZero<*mut T>, cap: NonZero<usize>) {
if mem::size_of::<T>() == 0 { return }

let old_size = unchecked_allocation_size::<T>(cap);

heap::deallocate(*ptr as *mut u8, old_size, mem::align_of::<T>())
}

fn allocation_size<T>(cap: NonZero<usize>) -> usize {
mem::size_of::<T>().checked_mul(*cap).expect("Capacity overflow")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is kinda a Problem Rust Has. Technically overflowing a uint isn't sufficient. Anything bigger than int::MAX is going to cause some UB because pointer offsets will overflow. Technically possible to create such a thing without OOM on e.g. x32 (32-bit mode binary compiled for a 64-bit platform).

This isn't something you really need to address. It just continues to distress me. 😿

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just check for > isize::MAX and reject that.

}

fn unchecked_allocation_size<T>(cap: NonZero<usize>) -> usize {
mem::size_of::<T>() * (*cap)
}

fn empty<T>() -> NonZero<*mut T> {
unsafe { NonZero::new(heap::EMPTY as *mut T) }
}

#[cfg(test)]
mod test {
use std::ptr;

use util::buffer::Buffer;
use super::empty;

#[test]
fn test_empty() {
let buffer: Buffer<usize> = Buffer::new();
assert_eq!(buffer.cap, 0);
assert_eq!(buffer.buffer, empty());
}

#[test]
fn test_allocate() {
let buffer: Buffer<usize> = Buffer::allocate(8);

assert_eq!(buffer.cap, 8);

unsafe {
ptr::write(buffer.offset(0), 8);
ptr::write(buffer.offset(1), 4);
ptr::write(buffer.offset(3), 5);
ptr::write(buffer.offset(5), 3);
ptr::write(buffer.offset(7), 6);

assert_eq!(ptr::read(buffer.offset(0)), 8);
assert_eq!(ptr::read(buffer.offset(1)), 4);
assert_eq!(ptr::read(buffer.offset(3)), 5);
assert_eq!(ptr::read(buffer.offset(5)), 3);
assert_eq!(ptr::read(buffer.offset(7)), 6);
};

// Try a large buffer
let buffer: Buffer<usize> = Buffer::allocate(1024 * 1024);

unsafe {
ptr::write(buffer.offset(1024 * 1024 - 1), 12);
assert_eq!(ptr::read(buffer.offset(1024 * 1024 - 1)), 12);
};
}

#[test]
fn test_reallocate() {
let mut buffer: Buffer<usize> = Buffer::allocate(8);
assert_eq!(buffer.cap, 8);

buffer.reallocate(16);
assert_eq!(buffer.cap, 16);

unsafe {
// Put some data in the buffer
ptr::write(buffer.offset(0), 8);
ptr::write(buffer.offset(1), 4);
ptr::write(buffer.offset(5), 3);
ptr::write(buffer.offset(7), 6);
};

// Allocate so in-place fails.
let _: Buffer<usize> = Buffer::allocate(128);

buffer.reallocate(32);

unsafe {
// Ensure the data is still there.
assert_eq!(ptr::read(buffer.offset(0)), 8);
assert_eq!(ptr::read(buffer.offset(1)), 4);
assert_eq!(ptr::read(buffer.offset(5)), 3);
assert_eq!(ptr::read(buffer.offset(7)), 6);
};
}

#[test]
#[should_fail = "Capacity overflow."]
fn test_allocate_capacity_overflow() {
let _: Buffer<usize> = Buffer::allocate(10_000_000_000_000_000_000);
}

#[test]
#[should_fail = "Capacity overflow."]
fn test_fresh_reallocate_capacity_overflow() {
let mut buffer: Buffer<usize> = Buffer::new();
buffer.reallocate(10_000_000_000_000_000_000);
}

#[test]
#[should_fail = "Capacity overflow."]
fn test_reallocate_capacity_overflow() {
let mut buffer: Buffer<usize> = Buffer::allocate(128);
buffer.reallocate(10_000_000_000_000_000_000);
}
}

3 changes: 3 additions & 0 deletions src/util/mod.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
/// Utilities for implementing other data structures.

pub mod buffer;