Improve std.typecons.Unique #3139

Merged: 16 commits, Apr 24, 2015

@mrkline mrkline commented Apr 2, 2015

Whenever D is brought up to the general programming public, the garbage collector is quickly raised as a point of contention. Regardless of how legitimate or well-informed these concerns are or are not, it would be a massive public relations boon — and great for the language, to boot — if we could trot out a solid set of RAII-based smart pointers for those who prefer to use them. We have a solid start in std.typecons.Unique and std.typecons.RefCounted. Unfortunately, these seem to be victims of bit rot and compiler bugs of days long gone.

This is the first of several pull requests that attempt to clean these up. An overview of the changes in this request is as follows (updated 2015-04-21):

  • Unique uses malloc and free instead of the GC for backing memory storage. Unfortunately this rules out nested types for the time being (as emplace cannot set the frame pointer for closures).
  • std.algorithm.move is used instead of a special release member function. Whether by design or by happy accident, move transfers ownership between Unique pointers in a very similar manner to C++'s std::move with std::unique_ptr. Along with being a familiar paradigm to C++ users, using move to transfer ownership makes more intuitive sense and builds consistency with the rest of Phobos.
  • With std.algorithm.move transferring ownership, release is deprecated.
  • Unique.create has transformed into a freestanding unique function. Regardless of whether or not there is language support for checking uniqueness, a utility function that creates a Unique, taking the same arguments as the underlying type's constructor, is extremely useful, as demonstrated by the addition of make_unique to C++14.
  • Constructors taking a pointer have been removed, since we now control allocation ourselves.
  • A new method, get, returns the underlying pointer, for use in functions and code that do not play a role in the life cycle of the object. Smart pointers are as much about ownership semantics as they are about allocating and freeing memory, and non-owning code should continue to refer to data using a raw pointer or a reference.
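
The ownership model in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: `Owner` and `makeOwner` are hypothetical stand-ins for the proposed `Unique` and `unique` (they are not the Phobos API), the sketch handles structs only, and it does not free the memory if the constructor throws.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;
import std.algorithm.mutation : move;
import std.conv : emplace;

// Hypothetical stand-in for the proposed Unique (structs only); not the Phobos type.
struct Owner(T) if (is(T == struct))
{
    private T* _p;

    @disable this(this); // copying is disabled, so ownership can only be moved

    ~this()
    {
        if (_p is null) return;
        destroy(*_p); // run T's destructor, if it has one
        free(_p);     // memory came from malloc, so the GC is not involved
        _p = null;
    }

    // Non-owning access for code that plays no role in the object's lifetime.
    inout(T)* get() inout { return _p; }
}

// Hypothetical factory in the spirit of the freestanding `unique` function.
Owner!T makeOwner(T, Args...)(auto ref Args args)
{
    void* raw = malloc(T.sizeof);
    if (raw is null) onOutOfMemoryError();
    Owner!T owner;
    owner._p = emplace!T(raw[0 .. T.sizeof], args);
    return owner;
}

unittest
{
    static struct Point { int x, y; }

    auto a = makeOwner!Point(3, 4);
    assert(a.get().x == 3);

    Owner!Point b = move(a); // ownership transfers; `a` is reset to Owner.init
    assert(a.get() is null);
    assert(b.get().y == 4);
}
```

Because the postblit is disabled, `move` is the only way to hand the object to another owner, which is the behaviour the description compares to `std::move` with `std::unique_ptr`.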

This pull request is not meant to be merged prima facie, but to initiate a dialogue. I strongly believe this is a good step in the right direction and would love to hear commentary from the core devs. Perhaps some of the implementation details need to be hashed out some more, but we can all agree that a stronger showing from the smart pointers in std.typecons will only bolster D and Phobos.

Whenever D is brought up to the general programming public,
the garbage collector is quickly raised as a point of contention.
Regardless of how legitimate or well-informed these concerns are,
it would be a massive public relations boon --- and great for the language,
to boot --- if we could trot out a solid set of RAII-based smart pointers
for those who prefer to use them. We have a solid start in
std.typecons.Unique and std.typecons.RefCounted.
Unfortunately, these classes seem to be victims of bit rot and
compiler bugs of days long gone.

An overview of the changes in this commit is as follows:

- Unique's underlying data now uses malloc and free
  instead of the garbage collector. Given that many people use RAII
  smart pointers to escape the GC, it seems to make more sense to
  avoid it here. On a related note, isn't delete deprecated?
  The current destructor uses it.

- std.algorithm.move is used instead of a special release
  member function. Whether by design or by happy accident,
  move transfers ownership between Unique pointers in a very
  similar manner to C++'s std::move with std::unique_ptr.
  Along with being a familiar paradigm to C++ users,
  using move to transfer ownership makes more intuitive sense
  and builds consistency with the rest of Phobos.

- With std.algorithm.move transferring ownership, release now just
  frees the underlying pointer and nulls the Unique.

- Unique.create is no longer compiled out using version(None).
  Regardless of whether or not there is language support for
  checking uniqueness, a utility function that creates a Unique,
  taking the same arguments as the underlying type's constructor,
  is extremely useful, as demonstrated by the addition of
  make_unique to C++14.

- Because Unique.create is now in place and Unique is backed with
  malloc, constructors taking a pointer have been removed.
  This encourages the use of create as the idiomatic,
  consistent method to, well, create Unique objects.
  If one can only get a Unique by calling create or moving another
  into it, we also ensure uniqueness in one fell swoop.

- A new method, get, returns the underlying pointer, for use in
  functions and code that do not play a role in the life cycle
  of the object. Smart pointers are as much about ownership
  semantics as they are about allocating and freeing memory,
  and non-owning code should continue to refer to data using a raw
  pointer or a reference.
immutable size_t allocSize = T.sizeof;

void* rawMemory = enforce(malloc(allocSize), "malloc returned null");
u._p = cast(RefT)rawMemory;
Contributor Author

I appear to be having a segfault on classes that have a reference to their context, such as the one on the third unit test, on line 281. When I step through create in GDB, rawMemory is correctly set to our newly allocated memory, but after stepping past this line, u._p is some bad address, such as 0x14. It's like the assignment is ignored or the cast messes with the address somehow. Is this some failing of my understanding? The D site says that "Casting a pointer type to and from a class type is done as a type paint (i.e. a reinterpret cast)." so I'm not sure how this is happening.

Contributor Author

Breakpoint 1, std.typecons.Unique!(std.typecons.__unittestL276_3().C).Unique.create!().create() (
    __HID19=0x7fffffffe0d8) at std/typecons.d:101
101         u._p = cast(RefT)rawMemory;
(gdb) print rawMemory
$1 = (void *) 0x92d860
(gdb) n
111             void[] init = typeid(T).init[];
(gdb) print u._p
$2 = (struct std.typecons.__unittestL276_3.C *) 0x7fffffffe060

ಠ_ಠ

@mihails-strasuns

Great to see some effort here. One quick thing that caught my eye is "Unique's underlying data now uses malloc and free instead of the garbage collector", which is sub-optimal, because it is perfectly legal to use Unique with GC-managed data. Ideally it should be possible to define a different initializer and finalizer for Unique!T for each T, so that any custom allocator could be used. I am not sure this belongs in the scope of this PR, though.

@mrkline
Contributor Author

mrkline commented Apr 2, 2015

My rationale behind doing so was that if we remove constructors accepting RefT and promote create as the way to make a Unique, we can completely control the life cycle of the underlying object T from start to finish. This is the entire point of Unique, is it not? And if we completely control the life cycle, then there is no reason to rely on the GC. If you think that's too ambitious to start with or don't agree with the idea, I can certainly shelve it for now.

I agree that making Unique into something that can act as a general scope guard and take arbitrary allocation and freeing functions like std::unique_ptr may be useful, though it seems less needed than it is in C++ given that we have scope (exit).

else
immutable size_t allocSize = T.sizeof;

void* rawMemory = enforce(malloc(allocSize), "malloc returned null");
Member

enforce allocates in order to throw its exception, so it can't be used to report a failed malloc.

Do:

void* rawMemory = malloc(allocSize);
if(!rawMemory)
    onOutOfMemoryError();

(See also #3031)
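
A minimal sketch of that check wrapped in a reusable function. The name `checkedMalloc` is hypothetical, and the sketch assumes a non-zero `size`, since `malloc(0)` is allowed to return null.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;

// Hypothetical helper name. `size` is assumed to be non-zero, since
// malloc(0) may legitimately return null.
void[] checkedMalloc(size_t size) nothrow @nogc
{
    void* raw = malloc(size);
    if (raw is null)
        onOutOfMemoryError(); // raises OutOfMemoryError without a GC allocation
    return raw[0 .. size];
}

unittest
{
    void[] block = checkedMalloc(64);
    scope (exit) free(block.ptr);
    assert(block.length == 64);
}
```

Marking the helper `@nogc` lets the compiler confirm that the failure path does not itself depend on the garbage collector.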

Contributor Author

Thanks! For now I'll do this, and once both this and #3031 are hopefully merged, we can move over to that.

Contributor Author

@JakobOvrum After a bit of review, I recall that I was following a pattern noticed elsewhere in Phobos:

~/s/d/phobos [v2.067.0 ?] % ag 'enforce\(malloc'
std/container/array.d
485:            auto p = enforce(malloc(sz));

std/regex/package.d
565:        _memory = (enforce(malloc(size))[0..size]);
641:            _memory = (enforce(malloc(size))[0..size]);
671:    void[] memory = enforce(malloc(size))[0..size];

std/stdio.d
346:        _p = cast(Impl*) enforce(malloc(Impl.sizeof), "Out of memory");

std/typecons.d
4053:            _store = cast(Impl*) enforce(malloc(Impl.sizeof));

std/uni.d
1745:        auto ptr = cast(T*)enforce(malloc(T.sizeof*size), "out of memory on C heap");
6875:            auto p = cast(ubyte*)enforce(malloc(raw_cap));
6917:        ubyte* p = cast(ubyte*)enforce(malloc(3*(grow+1)));

Should these be corrected in another PR?

Member

#3031 fixes a few of them but focuses on more important changes, but yes, we'll fix all of them eventually.

I picked up the trick of converting a pointer into an array
using the [0 .. size] syntax from std/regex/package.d.

Unique.create is still segfaulting, but this seems to be an issue
with the class version of emplace regardless of this Unique work.
The bug can be found here:
https://issues.dlang.org/show_bug.cgi?id=14402
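
For reference, the pointer-to-slice trick combined with the class overload of emplace looks roughly like this. It is a sketch only: `Counter` and `emplaceDemo` are hypothetical names, and the class is deliberately a plain module-level class, so it does not touch the nested-class problem reported in the bug.

```d
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

class Counter
{
    int value;
    this(int start) { value = start; }
}

// Builds a Counter in malloc'ed memory, uses it, then tears it down by hand.
int emplaceDemo()
{
    enum size = __traits(classInstanceSize, Counter);
    void* raw = malloc(size);
    assert(raw !is null);
    scope (exit) free(raw);

    // raw[0 .. size] turns the bare pointer into a void[] that emplace accepts.
    Counter c = emplace!Counter(raw[0 .. size], 41);
    scope (exit) destroy(c); // runs before free: scope guards unwind in reverse order

    c.value++;
    return c.value;
}

unittest
{
    assert(emplaceDemo() == 42);
}
```

Note the use of `__traits(classInstanceSize, ...)`: for a class type, `T.sizeof` is only the size of the reference, not of the instance.
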
@mrkline
Contributor Author

mrkline commented Apr 3, 2015

The segfaults I've been getting seem to be caused by a separate issue in std.conv.emplace. I opened a bug here: https://issues.dlang.org/show_bug.cgi?id=14402

@mihails-strasuns

> This is the entire point of Unique, is it not?

Not the entire point, at the very least. For me personally it is not even an important point. The way I see it, the best thing that can happen with Unique is integration with std.concurrency, to allow sending unique mutable messages between threads without shared qualification. See also vibe.d's Isolated (https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/core/concurrency.d#L306)

It is still rather far away but I'd prefer to avoid any hard requirements about internal memory management. Though that does seem a bit out of scope of this PR so no real objections as long as you keep that in mind in the long term.

It is currently impossible (or so it seems) to use malloc and
emplace to create a nested class or struct, so we'll return to
using the GC for now. We'll also restore the constructors that take
a RefT, as using new _inside_ the context of the nested class or
struct is apparently the only way to create one currently.
@mrkline
Contributor Author

mrkline commented Apr 5, 2015

Alright. So, I've done a bit of exploring and realized that context-aware objects cannot be created with emplace or new outside their context. Because of this, I've switched back to using the GC and added the constructors that take a pointer from new (since new seems to be the only valid way to currently create a context-aware object). This PR should now pass unit tests.

So we're all on the same page, this PR now makes the following changes:

  • std.algorithm.move becomes the canonical way to transfer ownership of a resource from one Unique to another. The docs and unit tests have been updated to reflect this.
  • Given the previous point, release just destroys the owned object and nulls the pointer.
  • create is no longer compiled out via version(None).
  • Releasing the owned object is now done with destroy followed by GC.free, as suggested by the "corrective action" listed for delete on the deprecated features page: http://dlang.org/deprecate.html#delete
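
The destroy-then-GC.free sequence from the last bullet can be sketched as follows. `Resource` and `releaseNow` are hypothetical names used only for illustration, and the sketch assumes nothing else still refers to the object being released.

```d
import core.memory : GC;

class Resource
{
    static int live; // counts constructed-but-not-destroyed instances (thread-local)
    this() { ++live; }
    ~this() { --live; }
}

// Hypothetical helper: deterministic teardown of a GC-allocated object.
void releaseNow(ref Resource r)
{
    if (r is null) return;
    destroy(r);              // run the destructor now instead of at collection time
    GC.free(cast(void*) r);  // return the memory to the GC immediately
    r = null;                // do not leave a dangling reference behind
}

unittest
{
    auto r = new Resource;
    assert(Resource.live == 1);
    releaseNow(r);
    assert(Resource.live == 0);
    assert(r is null);
}
```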

@JakobOvrum
Member

> Because of this, I've switched back to using the GC and added the constructors that take a pointer from new (since new seems to be the only valid way to currently create a context-aware object).

Instead of hiding this dangerous operation in an unassuming, anonymous constructor, we should probably formalize it in a named, greppable constructor function. Ideally it would be named assumeUnique... although that might be an unpopular choice due to std.exception.assumeUnique.

(See also http://acehreli.org/AliCehreli_assumptions.pdf)

@mrkline
Contributor Author

mrkline commented Apr 6, 2015

@JakobOvrum - I couldn't agree more. I only left those constructors because they were there already, and I was going under the assumption that fewer breaking changes would mean less debate to get this PR accepted. I'm more than happy to remove those constructors and work them into something more... explicit in its stated intentions.

@JakobOvrum
Member

Alright, this is a breaking change but Unique is one of those fundamental types we need to get right. LGTM.

@mihails-strasuns

Needs an entry in changelog with migration instructions in that case

@mrkline
Contributor Author

mrkline commented Apr 6, 2015

I can provide those, along with some better DDoc documentation in typecons.d itself. I'll implement the changes proposed by @JakobOvrum ASAP (hopefully tonight, possibly tomorrow).

@mihails-strasuns

btw I'd really love assumeUnique to return Unique!T instead of immutable but can't imagine good non-breaking migration path for that :(

From the related pull request
(#3139),
there seems to be a general consensus that it is more important to
do Unique "right", even if that means breaking changes, so long as
there is a clean migration path. With that in mind, I have made the
following additional changes:

- Instead of constructors that take a RefT, Uniques can now be
  created one of two ways: via .create or .fromNested.
  See the DDocs of both for details.

- opDot is replaced with "alias _p this". A cursory Google search
  indicates that opDot is deprecated and that alias this is the
  preferred method. Like C++'s unique_ptr, Unique now enjoys
  pointer-like operations (such as dereferencing),
  but cannot be set to null or assigned from a different pointer
  due to opAssign and the disabled postblit constructor.

- Consequently, isEmpty has been removed. Instead, just use
  is null as you would with a pointer.

- Removal of redundant unit tests

- Various comment and unit test cleanup
@mrkline
Contributor Author

mrkline commented Apr 7, 2015

There seems to be a general consensus that it is more important to do Unique right, even if that means breaking changes, so long as there is a clean migration path. With that in mind, I have made the following additional changes:

  • Instead of constructors that take a RefT, Uniques can now be created one of two ways: via .create or .fromNested. The latter takes a pointer fresh from new, since that seems to be the only way at time of writing to create a reference to a nested object.
  • opDot is replaced with alias _p this. A cursory Google search indicates that opDot is deprecated and that alias this is the preferred method. Like C++'s unique_ptr, Unique now enjoys pointer-like operations (such as dereferencing).
  • Consequently, isEmpty has been removed. Instead, just use is null as you would with a pointer. Since a pointer implicitly casts to a boolean value, if (myUnique) is also valid.

@MartinNowak
Member

Why create? We don't have any create in Phobos. The common pattern is to use a normal constructor and maybe add a lowercase unique function for inference.

@MartinNowak
Member

I just made a related patch for RefCounted, #3171.

@MartinNowak
Member

But assumeUnique does something very different, it casts to immutable. It was named after the precondition for converting something to immutable.


To ensure uniqueness, be sure to provide the direct result of $(D new).

Example:
Member

You can make this example a documented unittest.

Contributor Author

Member

OK, but if you make it a documented unittest, you don't have to copy the code.

Contributor Author

Oh, fantastic! I wasn't aware of this feature. Thanks much.

@mihails-strasuns

> But assumeUnique does something very different, it casts to immutable

Which is exactly my main problem with assumeUnique. Quite often the uniqueness axiom is stated much earlier in the code than the decision about the mutability of the result can be made. The compiler can nicely infer it for pure functions without using assumeUnique at all, but in the general case it is an unsolved problem.

@mrkline
Contributor Author

mrkline commented Apr 10, 2015

I used create because the function was already there. If you would like me to change that to a constructor, I'm happy to do so.

As @Dicebot discussed, I'm not touching assumeUnique with this PR. It's a pretty bad misnomer given that it doesn't return a Unique, but changing it would open a huge can of worms.

@mrkline
Contributor Author

mrkline commented Apr 10, 2015

@MartinNowak A thought just occurred to me - create exists because it allows us to build a Unique for an underlying T whose constructor takes no arguments, no?

Allows you to dereference the underlying $(D RefT)
and treat it like a $(D RefT) in other ways (such as comparing to null)
*/
alias _p this;
Member

Both of those functions can escape references making Unique unsafe.

Contributor Author

You're right, they can, and perhaps I should add a comment saying as much. However, this is not inherently bad. Unique conveys ownership, and non-owning functions shouldn't pass around smart pointers if they aren't going to play a role in the underlying object's lifetime.

See this bit (starting at the 14 minute mark or so) of a recent Herb Sutter talk. Yes, it's about C++, but the exact same semantics are at play here and he discusses it extensively. He makes these points much more eloquently than I do, so I hope you take the time to watch it, but to summarize, doing this bakes the ownership semantics of calls directly into the function signatures. Observe:

void consume(Unique!T u) // Takes ownership of u

void poke(ref T u) // Uses u but has no impact on its lifetime

void opt(T* u) // u is optional; has no impact on its lifetime

If you're still not convinced, another important point is that a non-owning function shouldn't care how ownership of an object is being handled.

void antipatternIfNotOwning(ref Unique!T u)

is just fine to tell the user "this function may reseat the ownership of the T". But it's just bad if the function doesn't affect ownership. It demands a certain means of ownership (as opposed to GC, or RefCounted, or stack-allocated) despite not actually caring, it's more verbose, and its intentions aren't as clear.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ is hardly a decent example to follow. The stuff they are forced to use here is all about using convention in the absence of a working compile-verified solution. I believe D can and should do better in this regard.

Allowing @safe escaping of unique reference is absolutely unacceptable.

Contributor Author

Did you take the time to watch the linked section of the talk? D can (unlike C++) do compile-time checks for safety using @safe, but since it can't verify life cycles at compile time (a la Rust), that doesn't get around the inherent ownership issues at hand.

If we want to mark this as unsafe, that's fine, but I still strongly believe it's necessary. What's the alternative? Passing ref Unique!T around? That's problematic for the reasons listed above.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and I have been using that approach in C++ for ages. It is different in D though as we actually promise @safe code to be always memory-safe if it compiles. No conventions, no style rules - if it compiles, it must be good to go. This can't be compromised no matter what, even simply banning all borrowing of Unique would be more practical approach.

And making Unique @system is just delaying another full rewrite of the utility.

Also please note there are actually various proposals about enforcing borrowing / lifetime semantics at compile-time in a way similar to Rust, with http://wiki.dlang.org/DIP69 being primary candidate for getting implemented. I haven't checked it in a while but it should allow adding scope ref T borrow() { return _p; } method while keeping it all @safe.

Contributor Author

Point taken. My apologies if my tone was somewhat... irreverent.

So, what's the plan? Drop get and just pass Uniques by ref? And what should we do about opDot? I only moved away from it because there was no documentation for it anywhere, and several posts said this was because it was deprecated for alias this.

Member

What Dicebot says. I know that talk btw.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No offense, I simply wanted to stress how important the guarantees of @safe are.

Another alternative for opDot is using std.typecons.Proxy : https://github.com/D-Programming-Language/phobos/blob/master/std/typecons.d#L4821-L4829
I don't know how robust it is in practice, but it is supposed to provide wrappers for all members without allowing implicit conversion (like alias this does).

As for get - yes, I am tempted to just drop it completely for now and add better facility once DIP69 or similar is implemented. But this is a delicate topic, I wonder what Phobos maintainers think about it.

Member

Looking at DIP25 we should implement this like so.

ref T get() return { return *_p; }
alias get this;

That should be safe.
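
A self-contained sketch of that pattern, using a hypothetical `Box` wrapper around an `int*` in place of Unique. It only shows that the signature compiles and how `alias get this` behaves; it does not demonstrate what the compiler rejects.

```d
struct Box
{
    private int* _p;

    this(int* p) { _p = p; }

    // `return` marks the result as tied to `this` (DIP25), so the
    // reference is not meant to outlive the Box it came from.
    ref int get() return { return *_p; }

    alias get this; // lets a Box be used where an int is expected
}

unittest
{
    int x = 5;
    auto b = Box(&x);

    int copy = b;   // implicit conversion through `alias get this`
    assert(copy == 5);

    b.get() = 7;    // writes through the returned reference
    assert(x == 7);
}
```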

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@MartinNowak
Member

We could make Unique and RefCounted use the same internal storage struct, to encapsulate common construct/move/memory code.

@mihails-strasuns

Yeah making get() @system for classes and @safe for value types makes sense

@JakobOvrum
Member

It's not an issue of classes vs anything else, it's an issue of types with indirection.

@mihails-strasuns

@JakobOvrum is it? Indirection on its own should be fine as long as that data is @safe on its own. Issue with classes is related to their inherent reference semantics which are not covered by DIP25 (it requires usage of ref)

@JakobOvrum
Member

@Dicebot, right, I don't think I've wrapped my head around DIP25 completely yet.


Regarding C heap vs GC heap; if we want to allow transferring Unique!T (where T is unshared and mutable/const) to other threads, using the C heap might be a big advantage, as there are GC improvement ideas floating around involving thread-local GC heaps (@deadalnix?). It's best if we can decide the sharedness of a GC-allocated chunk at allocation-time.

@JakobOvrum
Member

Another observation regarding cross-thread moves - the uniqueness of Unique!T only applies to T, not references embedded as T's fields. That is, the uniqueness is not transitive, it only applies for one level of indirection - the Unique!T reference itself (as has been observed by others before me).

That means not just any Unique instance can be moved to a different thread, only instances where all of T's fields pass isUnique!U || !hasUnsharedAliasing!U, where U is the type of the field.
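
The field-wise check could be sketched with existing Phobos traits roughly as below. The names `isSendableField` and `canCrossThreads` are hypothetical, and the `isUnique!U` half of the condition is left out because no such trait is assumed to exist.

```d
import std.meta : allSatisfy;
import std.traits : Fields, hasUnsharedAliasing;

// Hypothetical names. The `isUnique!U` half of the condition is left out,
// because no such trait is assumed to exist.
enum bool isSendableField(U) = !hasUnsharedAliasing!U;
enum bool canCrossThreads(T) = allSatisfy!(isSendableField, Fields!T);

struct Plain   { int id; string name; } // immutable data only
struct Aliased { int* counter; }        // mutable, unshared pointer

static assert( canCrossThreads!Plain);
static assert(!canCrossThreads!Aliased);
```

The check runs entirely at compile time, so a type that fails it could be rejected before any message is sent.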

Does this make sense?

@MartinNowak
Member

Can we please keep that highly speculative GC discussion out of here?
The idea of using shared to separate thread-local from global allocations is pretty old and has many pitfalls.

> That means not just any Unique instance can be moved to a different thread, only instances where all of T's fields pass isUnique!U || !hasUnsharedAliasing!U, where U is the type of the field.

Makes sense, and once we merge this we should make std.concurrency aware of Unique.

- Unique!C uc = new C;
- Unique!Object uo = uc.release;
+ Unique!C uc = unique!C();
+ Unique!Object uo = move(uc);
Member

I don't think that polymorphic conversion works; we should turn this into a documented unittest.

Member

Oh, it actually is supposed to work, nice.

@MartinNowak
Member

Looks pretty good already, just a few more details.

@mrkline
Contributor Author

mrkline commented Apr 24, 2015

Alright, I added the suggested docs and made get a template function (seemed like the simplest solution). I agree that a move constructor for a T rvalue would be nice, but perhaps we can save that for another PR that also lifts your moveEmplace functionality from #3171 into its own function.

They don't go well together.
@MartinNowak
Member

Auto-merge toggled on

MartinNowak added a commit that referenced this pull request Apr 24, 2015
@MartinNowak MartinNowak merged commit 8f4a85b into dlang:master Apr 24, 2015
@MartinNowak
Member

Thanks, I'll follow up with RefCounted support for classes and factoring out moveEmplace.

@MartinNowak MartinNowak added this to the 2.068 milestone Jun 30, 2015
MartinNowak added a commit to MartinNowak/phobos that referenced this pull request Jul 21, 2015
This reverts commit 8f4a85b, reversing
changes made to d74e4d7.

Delay unfinished feature until after 2.068.x.
quickfur pushed a commit that referenced this pull request Jul 21, 2015
Revert "Merge pull request #3139 from mrkline/better-unique"