Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer: a safe utility collection for allocating memory. #103

Closed
wants to merge 4 commits into from

Conversation

reem
Copy link
Collaborator

@reem reem commented Jan 27, 2015

Buffer provides an entirely safe interface for managing memory within other
collections. It provides allocate, reallocate, and deallocate functionality
and obeys RAII principles (deallocate on drop).

This can be used to simplify the implementation of any data structure which
does allocation not through another structure like Vec.

From the docs:

A safe wrapper around a heap allocated buffer of Ts, tracking capacity only.

Buffer makes no promises about the actual contents of this memory, that's up
to the user of the structure and can be manipulated using the standard pointer
utilities, accessible through the impl of Deref<Target=*mut T> for Buffer<T>.

As a result of this hands-off approach, Buffers destructor does not attempt
to drop any of the contained elements; the destructor simply frees the contained
memory.

You can think of Buffer<T> as an approximation for Box<[T]> where the elements
are not guaranteed to be valid/initialized. It is meant to be used as a building
block for other collections, so they do not have to concern themselves with the
minutiae of allocating, reallocating, and deallocating memory.

Buffer provides an entirely safe interface for managing memory within other
collections. It provides allocate, reallocate, and deallocate functionality
and obeys RAII principles (deallocate on drop).

This can be used to simplify the implementation of any data structure which
does allocation not through another structure like Vec.

From the docs:

A safe wrapper around a heap allocated buffer of Ts, tracking capacity only.

Buffer makes no promises about the actual contents of this memory, that's up
to the user of the structure and can be manipulated using the standard pointer
utilities, accessible through the impl of `Deref<Target=*mut T>` for `Buffer<T>`.

As a result of this hands-off approach, `Buffer`s destructor does not attempt
to drop any of the contained elements; the destructor simply frees the contained
memory.

You can think of `Buffer<T>` as an approximation for `Box<[T]>` where the elements
are not guaranteed to be valid/initialized. It is meant to be used as a building
block for other collections, so they do not have to concern themselves with the
minutiae of allocating, reallocating, and deallocating memory.
@reem
Copy link
Collaborator Author

reem commented Jan 27, 2015

I've noticed that this logic gets strewn all over the place in various collections both here and in std, it would be much better to bundle it in one typed, checked place, then just re-use that.

/// assert_eq!(buffer.capacity(), 1024);
/// ```
pub fn reallocate(&mut self, cap: usize) {
*self = if self.cap == 0 || cap == 0 {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this reassigning to self in the reallocate case actually safe with the drop impl? From tests it appears yes, but I'm not actually confident about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this should be incorrect (?). This should drop self before reassigning it.

@reem
Copy link
Collaborator Author

reem commented Jan 27, 2015

Sidenote: I really enjoyed using NonZero to "check my work" when writing this, especially for all the allocation procedures, even if we forget the optimization benefits. I'm convinced it should be publicized in stable soon after 1.0, as it's pretty useful.

@reem
Copy link
Collaborator Author

reem commented Jan 27, 2015

Other sidenote: this also creates and publicizes a util module for utilities useful in building other data structures.

@reem
Copy link
Collaborator Author

reem commented Jan 27, 2015

Receiving a bizarre error on travis that won't show up locally. @gankro does this look familiar?

@Gankra
Copy link
Owner

Gankra commented Jan 27, 2015

Woaaah that's crazy.

/// block for other collections, so they do not have to concern themselves with the
/// minutiae of allocating, reallocating, and deallocating memory.
pub struct Buffer<T> {
buffer: NonZero<*mut T>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make this NonZero<Unique<T>>, you can get rid of the unsafe impls.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I tried originally, but there is no impl of Zeroable for Unique<T>, so you can't put it in a NonZero.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a libstd bug.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be happy to make this change when it becomes possible, and also change all the underlying allocating utilities to take NonZero<Unique<T>> instead of NonZero<*mut T>, seems like a really excellent use of using the type system to maintain invariants.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm making a PR with the impl, by the way. Running tests now.

@cgaebel
Copy link
Collaborator

cgaebel commented Jan 27, 2015

I like this, but want to grab it without pulling in collect-rs. Can it go in another package instead?

@reem
Copy link
Collaborator Author

reem commented Jan 27, 2015

@cgaebel it could go elsewhere if that is desired - I'm not really sure what our policy on such things should be.

These checks aren't needed since the old size was checked when
it was originally allocated.
reem added a commit to reem/rust that referenced this pull request Jan 27, 2015
This allows the use of `NonZero<Unique<T>>` for owned,
non-null raw pointers.

cc Gankra/collect-rs#103
@reem
Copy link
Collaborator Author

reem commented Jan 28, 2015

This could be used in #100 to massively simplify and safe-ify a lot of the code.

@reem
Copy link
Collaborator Author

reem commented Jan 29, 2015

I'm doing some fixing on this but getting allocation errors... Don't merge yet.

@reem
Copy link
Collaborator Author

reem commented Jan 29, 2015

Ok, should be fixed and ready to go. That was a tricky to track down bug.

Also provide forward-compatibility for the move to `NonZero<Unique<T>>`
in the future and tweak some docs.
…location.

By only swapping out the buffer and not also changing the capacity
to 0 during reallocation, a capacity overflow could cause the drop
impl of Buffer to run when the buffer was empty() and the capacity
was non-zero, causing an incorrect call to deallocate.

We now also swap the capacity to 0, so that when reallocate is called
(the site of a possible panic) the buffer is in a consistent empty
state.

Also, in order to avoid leaking memory on oom (ya kind of useless,
but still) reallocate now deallocates the old ptr if reallocation
fails.
// Unlike std::rt::heap these check for zero-sized types, capacity overflow,
// oom etc. and calculate the appropriate size and alignment themselves.

unsafe fn allocate<T>(cap: NonZero<usize>) -> NonZero<*mut T> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm... not sure this is what NonZero is intended to be used for.

Like, you can, obviously.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It provides the invariant that the pointer returned by allocate is never null, i.e. allocate can never fail except by aborting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or did you mean cap? That's just encoding the invariant that capacity must never be 0 in the type system.

@Gankra
Copy link
Owner

Gankra commented Jan 29, 2015

With respect to simplifying the logic used by collections, to my knowledge there are 4 collections that handle allocation directly: Vec, RingBuf, HashMap, and BTree. HashMap and BTree do a triple-array-as-one-allocation trick, so this utility isn't applicable.

RingBuf and Vec would potentially benefit from this design, but Vec is actually pretty delicate pref-wise and does a few fancy things. Not sure if you could just drop this in. RingBuf can benefit, as I demonstrated in my faux allocator PR to rust-lang.

@reem
Copy link
Collaborator Author

reem commented Jan 29, 2015

You could use this for any allocated buffer, including HashMap and BTree. It's just a wrapper around *mut T (even derefs to it) with a capacity for allocation. That said, I haven't actually tried it... so...

@Gankra
Copy link
Owner

Gankra commented Jan 29, 2015

Hash and BTree basically make an ([A; n], [B; n], [C; n]). There's no one T that will work here other than u8, but then you're just doing the all the computations anyway.

I believe @gereeter had some interesting ideas about using a kind of builder pattern for this kind of thing,

@Gankra
Copy link
Owner

Gankra commented Jan 29, 2015

Also of interest is that Hash stores only the "head" pointer and re-computes the other array offsets at every query, while BTree stores all 3 pointers forever. Both of these choices were based on empirical performance data. But BTree might not want to cache the pointers when B is a known constant.

So even if you abstract away the triple-vec logic, you need to provide a mechanism for retrieving the relative offsets.

@reem
Copy link
Collaborator Author

reem commented Jan 29, 2015

Sounds more complex than I assumed. I was mostly targeting Vec, HybridVec, RingBuf and similar structures with this anyway.

@gereeter
Copy link
Collaborator

Regarding making all allocation easier, I had 2 APIs in mind. The first one is simply aimed at avoiding all the nasty calculations involved in allocating space for things like parallel arrays and BTree nodes. This involves creating a MemDescriptor type (see #51 for an implementation) containing a size and alignment and functions for composing those objects. This allows, for example, HashMap's allocation calculations to be simplified to

let desc = MemDescriptor::from_ty::<SafeHash>().array(cap)
     .then(MemDescriptor::from_ty::<K>().array(cap))
     .then(MemDescriptor::from_ty::<V>().array(cap));

The second API idea uses a trait abstracting over different types of arrays to generically implement things like Vec over many different layouts:

trait Data {
    type Target;
    unsafe fn read(&self) -> T;
    unsafe fn write(&mut self, value: T);
    unsafe fn offset(&self, offset: usize) -> Self;

    fn array_descriptor(num_elements: usize) -> MemDescriptor;
}

struct Flat<T> { ptr: NonZero<Unique<T>>}
impl<T> Data for Flat<T> {
    type Target = T;
    unsafe fn read(&self) -> T { mem::read(self.ptr) }
    unsafe fn write(&mut self,  value: T) { mem::write(self.ptr, value) }
    unsafe fn offset(&self, offset: usize) { Flat { ptr: self.ptr.offset(offset) } }
    fn array_descriptor(num_elements: usize) -> MemDescriptor { MemDescriptor::from_ty::<T>().array(num_elements) }
}

struct Parallel<A, B> { first: A, second: B }
impl<A: Data, B: Data> Data for Parallel<A, B> {
    type Target = (A::Target, B::Target);
    // ...
}

With such a trait, HashMap could just be backed by a Vec<Parallel<Flat<SafeHash>, Parallel<Flat<K>, Flat<V>>>. This would also support SmallVec things by having a Small<D> that implemented Data and using Vec<Small<...>>.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Jan 30, 2015
This allows the use of `NonZero<Unique<T>>` for owned,
non-null raw pointers.

cc Gankra/collect-rs#103
@abonander
Copy link
Contributor

@reem I was just going to refactor #100's heap logic to use a regular old Vec, since they behave equivalently after switching from stack-based storage.

@Gankra
Copy link
Owner

Gankra commented Feb 7, 2015

I'm actually kind of interested in moving this into std. We could put the can't-get-bigger-than-int-max logic in it and share it between ringbuf and vec.

For reference I think the check only needs to be performed when size_of::<isize>() <= 4 && size_of::<T>() == 1

@tbu-
Copy link
Contributor

tbu- commented Feb 7, 2015

@gankro Shouldn't the "can't get larger than isize" be done in the allocator?

@Gankra
Copy link
Owner

Gankra commented Feb 7, 2015

The allocator API is already unsafe, and I'm not clear that there's anything fundamentally unsound about making really huge allocations. It just causes trouble when you try to treat it as an array of bytes and ptr offset across the whole range.

Maybe that's a problem for Box<[T]> or Box<StructWithHugeArray>, though.

@tbu-
Copy link
Contributor

tbu- commented Feb 7, 2015

I thought such large allocations were always unsound due to some LLVM stuff, but I don't remember anymore.

@tbu-
Copy link
Contributor

tbu- commented Mar 5, 2015

@gankro Is there a plan for this?

@reem
Copy link
Collaborator Author

reem commented Mar 5, 2015

I can rebase if we want to land this, or remake the PR to std, or move to another package. Thoughts?

@Gankra
Copy link
Owner

Gankra commented Mar 18, 2015

RIP Collect; can be its own crate.

@Gankra Gankra closed this Mar 18, 2015
@abonander
Copy link
Contributor

@reem I still think the Buffer struct has potential. Are you going to be doing anything with it?

@reem
Copy link
Collaborator Author

reem commented Mar 20, 2015

Yes. I was debating whether it belongs in contain-rs or as a standalone repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants