Mark some GC functions pure and/or nothrow. #198

Merged
merged 1 commit into from Jun 1, 2012

Projects

None yet

4 participants

@alexrp
D Programming Language member

A few things worth noting:

  • These functions actually can be nothrow, because the only exceptions they can throw are Errors.
  • Yes, marking the functions with these modifiers means future implementations have to follow them. I don't think this is a problem, however, and the current situation (no modifiers at all) is very annoying.
  • I haven't marked the actual functions inside the GC with these modifiers. I didn't feel that cascading them all the way to the GC implementation was necessary; the GC is a low-level component of the runtime and thus deserves as much freedom as it can get. That said, I did verify that none of the functions marked nothrow can throw.
  • I'm not sure if the application of pure is entirely correct. I think it makes sense in terms of weak purity, but please correct me if I'm wrong.
@alexrp
D Programming Language member

Ping?

@alexrp
D Programming Language member

I've been thinking about the application of pure to some of the functions here. While the getAttr, clrAttr, and setAttr functions (and the likes) do rely on the GC's global state, I think it could be argued that this state is an implicit parameter to all functions in the core.memory module. Anyone who calls these functions will know that they rely on fetching information from the garbage collector instance. As such, marking those functions pure is fine IMHO, since their purity only depends on two inputs: The GC itself and the memory address.

Besides, there's no good alternative to doing it this way, and this module is currently not very usable in pure/nothrow code. We need to fix that silly situation.

Thoughts?

@schveiguy
D Programming Language member

I think most of these can be marked pure as well.

As long as it's weak purity, it can be marked pure. What we don't want is someone for instance doing:

void *p1 = gc_malloc(10);
void *p2 = gc_malloc(10);

and having the compiler rewrite as:

void *p2 = p1;

Given that most of these return mutable data, they should be weakly pure, right?

@alexrp
D Programming Language member

Makes sense. But this brings up the question: Should free be pure? I find that one tough, since it may seem weakly pure, but it can have all sorts of arbitrary side-effects. For example, a GC implementation might choose to run the finalizer when the data is freed, or whatever.

@schveiguy
D Programming Language member
@alexrp
D Programming Language member

Speaking of which, we probably should fix the docs: http://dlang.org/phobos/core_memory.html#free

But I'm honestly not sure what's the preferred construct these days.

Anyway, as long as free is guaranteed to not call arbitrary functions with side-effects, then it's fine I guess.

What about the enable, disable, minimize, and collect functions? I think making the latter two pure would be very, very wrong, since they on the other hand can result in finalizers being enqueued.

@schveiguy
D Programming Language member
@alexrp
D Programming Language member

a) running a collect cycle or not running one shouldn't alter the logical execution of the program. So if a call to collect is optimized out, no big deal.

I think this is a big deal: It is very common to force collections in order to reduce memory use and force finalizers to be run ASAP. We need to account for this case, and under no circumstances optimize it out. Also, keep in mind that finalizers can run arbitrary code, so if a collection causes finalizers to be fired off, this operation is by no means pure. Ideally, finalizers should be programmed to not access state outside of the object they belong to, but making this assumption is not pragmatic IMHO (just because finalization order is undefined doesn't mean accessing global state is useless - think a simple scenario like atomically decreasing a global counter).

b) the collection cycle runs in its own context, it's almost like it's on its own thread, but it's "borrowing" the current running thread's stack for execution. It really doesn't affect the current-running thread's data whatsoever (except for things the pure function isn't legally allowed to see, such as global data).

That depends. In order to implement weak references, I've used a very dirty trick where I store a GC pointer into a NO_SCAN area. This means that a collection actually can have side-effects in that it'll render weak references collected. You could argue that this is how weak references are supposed to work, but it's a side-effect no less.

As far as enable/disable, I think they should not be pure, because you don't want to optimize out a call to one of those! However, my gut tells me we should be able to build a way to call these in pairs inside a pure scope.

Yeah, it could probably be useful for certain optimized algorithms. Unfortunately, I can't think of a good way to do this.

@schveiguy
D Programming Language member
@schveiguy
D Programming Language member

Speaking of which, we probably should fix the docs: http://dlang.org/phobos/core_memory.html#free

From that page:

The block will not be finalized regardless of whether the FINALIZE attribute is set.

Seems correct to me...

@alexrp
D Programming Language member

I was getting at it recommending using delete.

@schveiguy
D Programming Language member
@alexrp
D Programming Language member

Well, anyway, you make a good point: We already allow pure functions to cause GC allocations, so this should be no different. But I have to admit I don't like it... it seems... dirty.

Well, I'll make the changes and rebase.

@schveiguy
D Programming Language member

Well, don't mark gc_enable and gc_disable pure, because they have no parameters/return value, they will be interpreted as strong-pure and might be optimized out. We need to find a way to make these weak-pure. I'll post a NG message querying on how to do it.

I guess same thing for gc_collect and gc_minimize.

@alexrp
D Programming Language member

Rebased. I'm not sure what to do about the range management functions, though. Do we want those to be pure?

@schveiguy
D Programming Language member

I'm not sure what to do about the range management functions, though. Do we want those to be pure?

I think these should be pure, but we can hold off for now. Adding ranges and roots is done when you allocate outside of the GC, and I don't think any of those functions (i.e. C's malloc and free) are marked pure yet.

@alexrp
D Programming Language member

It will have to be eventually, otherwise writing a pure allocator interface is going to suck...

@schveiguy
D Programming Language member

Reviewing your new version:

gc_setAttr, gc_getAttr, gc_clrAttr cannot be marked pure as-is, because they could be interpreted as strong-pure if called with immutable pointer. I'm pretty sure the current compiler does not optimize this, but we don't want to leave a bomb there for when it does optimize it.

It would make sense, actually, for those to be marked without 'in' (since a GC implementation might actually store the bits inside the block), but that would likely break some existing code.

gc_reserve does not have any mutable reference parameters or return values, so it will be interpreted as strong pure. It cannot be marked pure without a solution as we are discussing in the NG, but I think this is no big loss for now.

Continuing review...

@schveiguy
D Programming Language member

OK, so other than those, I think it looks good.

@alexrp
D Programming Language member

It would make sense, actually, for those to be marked without 'in' (since a GC implementation might actually store the bits inside the block), but that would likely break some existing code.

Then we should do that breakage sooner rather than later. I agree that these pointers really should not be marked with 'in' at all.

gc_reserve does not have any mutable reference parameters or return values, so it will be interpreted as strong pure. It cannot be marked pure without a solution as we are discussing in the NG, but I think this is no big loss for now.

I think for both gc_reserve and the attribute functions, we can just mark them pure for now and and add a TODO comment as a reminder that it should be fixed when we get proper weak purity (or whatever we want to call it).

@alexrp
D Programming Language member

Is this good to go?

@schveiguy
D Programming Language member

gc_reserve will be marked strong pure today. I think we can't make it pure, or it will not work properly.

The attribute functions I think should not be marked pure, but add a TODO: these should be pure when forced weak purity is possible at the top of the function list.

@alexrp
D Programming Language member

Why can't the attribute ones be pure? We just make the arguments not 'in' (as they really should be) and they will be weakly pure by definition.

@schveiguy
D Programming Language member

Because it would break code. The bar has to be set very high to be break existing code, and there just isn't enough benefit, seeing as how the adverse behavior won't happen without an improved optimizer in the compiler.

We have two choices:

  1. mark them as pure and leave in specified on the void*. This should work today, as I don't think the compiler will optimize out any calls based on this. Then "remember" to fix it if the compiler does start optimizing.
  2. Do not mark them as pure, and change it properly when it is possible to mark something as explicitly weak pure.

My vote is for option 2, because the chances we remember some time in the future to come back and fix it for option 1 is pretty slim.

What is the fallout if we use option 2? That is, if we can't mark those functions pure, what other functions now cannot be marked pure?

@alexrp
D Programming Language member

There is the fallout that writing pure allocators against the GC interface would basically be impossible (in particular, I suspect that setting NO_SCAN will be very common).

@schveiguy
D Programming Language member
@alexrp
D Programming Language member

I really wish more people would weigh in here, but they seem busier arguing about the language's purity design than giving input here. ;)

Create setAttrPure which does exactly the same thing, but just doesn't mark the parameter as in (and of course, is marked pure). When we can force weak purity, we can replace it with an alias.

Can't we simply overload it? Keep in mind, it's a D linkage function, not C. So, we overload it and deprecate the old overload?

@schveiguy
D Programming Language member
@alexrp
D Programming Language member

gc_getAttr is a completely internal function. We just change it to accept void*. Then we make an extra set of overloads inside the GC struct, and, in the ones that take an in void*, we just use a cast hack (this isn't problematic because we're dealing with const, not immutable).

@schveiguy
D Programming Language member
@klickverbot
D Programming Language member

@schveiguy: Cue Walter complaining that his private DLL tests don't work any longer… ;)

@alexrp
D Programming Language member

if you are dealing with const you are dealing with possibly immutable.

I posted a long rant on the NG on why this really doesn't matter in real world compilers: http://forum.dlang.org/thread/jooo4k$shb$2@digitalmars.com?page=4#post-joq5hk:24bit:241:40digitalmars.com

But we can make pure versions if you prefer.

@schveiguy
D Programming Language member

Cue Walter complaining that his private DLL tests don't work any longer… ;)

ooooh good point! @alexrp any changes we make to the private functions have to be made to the gcstub directory as well.

@schveiguy
D Programming Language member

if you are dealing with const you are dealing with possibly immutable.

I posted a long rant on the NG on why this is really doesn't matter in real world compilers

No, that's not what I'm talking about. But in any case, I think actually, we are easily able to cast away const for the prototype because the actual function is const. So here's now what I think we can do:

gc_getAttr(in void *) -> gc_getAttr(void *) pure
GC.getAttr(in void *) -> leave as is (but insert a cast to void * for passing to gc_getAttr)
add GC.getAttr(void *) pure

Then when weakpure is markable, fix things back, we won't need to deprecate anything.

@alexrp
D Programming Language member

@schveiguy I've rebased with the changes we discussed, and have added FIXME notes that we should attend to once weak purity can be properly marked. Can you please review?

@schveiguy
D Programming Language member

addrOf can be pure outright, it returns a mutable void * (though this might not be the best idea, maybe it should be fixed to return const(void)* )

Don't forget to update gcstub!

@alexrp
D Programming Language member

addrOf can be pure outright, it returns a mutable void * (though this might not be the best idea, maybe it should be fixed to return const(void)* )

Too restrictive IMO; addrOf is often used to get from an interior pointer in an array or structure to the base, so forcing it to be const would be unreasonable.

Don't forget to update gcstub!

What is there to update? The functions I modified have C linkage, so all these attribute and storage class changes shouldn't affect linking (I think... doesn't on Linux, anyway).

Note also that I didn't change the GC implementation either.

@schveiguy
D Programming Language member
@alexrp
D Programming Language member

Should be inout then.

Can you clarify how it should be declared? I'm not too familiar with inout yet, to be honest.

No idea, all I know is that Walter goes on a rampage whenever It doesn't get updated ;) Maybe that's only when functions are added/removed...

I think so. For C linkage functions, this stuff really shouldn't matter, since all the D 'annotations' are removed from the mangling.

@schveiguy
D Programming Language member

Can you clarify how it should be declared? I'm not too familiar with inout yet, to be honest.

inout(void)* addrOf(inout(void)* p)

I think so. For C linkage functions, this stuff really shouldn't matter, since all the D 'annotations' are removed from the mangling.

Just checked, yes you need to update gcstub, because it duplicates the GC struct, which is D linkage.

@alexrp
D Programming Language member

Just checked, yes you need to update gcstub, because it duplicates the GC struct, which is D linkage.

I don't see it... only occurrence of struct GC is in core.memory.

@schveiguy
D Programming Language member

Yes, you are right, I am wrong. Sorry.

@alexrp
D Programming Language member

Well, just pushed the inout change. All good?

@schveiguy
D Programming Language member

Almost right. We still need the overloads, because inout is like const, so could potentially be treated as optimizable on a pure function if called with immutable arguments.

Sorry :( should have mentioned that when I said it could be pure outright!

@alexrp
D Programming Language member

Done.

@schveiguy
D Programming Language member

Still not right.

Should be like this:

static inout(void)* addrOf( inout(void)* p ) /*pure*/ nothrow
{
   return cast(inout(void)*)addrOf(cast()p);
}
@alexrp
D Programming Language member

OK, how about this time? :)

@schveiguy
D Programming Language member

OK, looks good! Let's wait and see how the pull tester does with it.

@schveiguy
D Programming Language member

Indeed, it looks good to merge.

@schveiguy schveiguy merged commit ef8b2a6 into dlang:master Jun 1, 2012
@denis-sh

WTF?! Sorry, but these breakes everything I know about pure functions in D. They are compiler checkable, not user checkable. E.g. strlen can't be pure because it's argument is a pointer, not an array (or we have to consider that if a pure function accepts a pointer the function result depends on all process memory).

All gc_* functions depends on global GC state and can't be pure unless this state is passed as an argument (such change will break binary compatibility but we can also leave old functions and add new ones).

Do you understand that you don't understand what is argument value and what is returned value of a pure function when we are dealing with pointers? I just filled Issue 8185 - Pure functions and pointers.

Revert it as soon as possible, please!

@klickverbot
D Programming Language member

@denis-sh: You might want to get your facts straight first – pure functions in D can take/return pointers just fine. See the spec, and I recently wrote an article about it. As for the GC state, managing garbage collected memory is allowed in pure functions (making it possible to e.g. use new). This change gives some low-level functions the same treatment.

@denis-sh

@klickverbot:

You might want to get your facts straight first – pure functions in D can take/return pointers just fine. See the spec, and I recently wrote an article about it.

OK, I'm a human and can make a mistake. But I still doesn't see this. If my mistake is obvious, just quote me some sentences from specs etc.

As for the GC state, managing garbage collected memory is allowed in pure functions (making it possible to e.g. use new).

According to specs new is clearly stated as an exception. An arbitrary C function isn't stated as an exception (even a C function from a specific module).

@klickverbot, have you seen my Issue 8185 - Pure functions and pointers?

@alexrp
D Programming Language member

@denis-sh As discussed above in this pull request, we decided to do what we did here because the GC is a special case. Yes, in an ideal world, all purity would be compiler-checkable, but then pure would be next to useless. You would be unable to write a pure generic allocator using the GC as allocation mechanism for example. These changes are made because we've taken a pragmatic stance on the matter. Even Haskell, for example, has to call low-level functions in its runtime that aren't quite pure.

You are correct in saying that in reality, the GC functions operate on global state. But conceptually they operate on the GC class instance inside the gc.gcx module, and that can be considered a hidden parameter. Yes, it could be made an actual parameter, but there's no point. Further, there's no point in trying to actually enforce pure and nothrow inside the GC implementation; in fact, it would be impossible due to some of the functions it uses.

The GC (or more generally, druntime) is a very low-level aspect of the language. What we've done here is a faith-based approach where we assume people who hack on the GC know what they're doing and don't break the guarantees made in the core.memory module.

@denis-sh

@alexrp I'm not against any hacking with hidden global state in pure functions. My point is that a compiler can do some unexpected things if a function is pure because it assumes that pure functions produces the same results for the same arguments and doesn't read/write global state. And I doesn't see anywhere a definition of same arguments/same results for pointers (see Comment 3 on Issue 8185).

But at least the following example has to be considered:

Example 1

int res = gc_reserve(100);
  • Expectations: Reserve some memory.
  • Reality: Will (may) not be actually called if gc_reserve have ever been called with this argument.
  • Note: This even violates gc_reserve documentation, because it can return any amount of memory for the same argument.

Example 2

if(!gc_reserve(400 * MiB))
    return false; // This is assumed to be a common case
  • Expectations: We will tell the user to stop allocating such big buffers.
  • Reality: Same as above.

Example 3

gc_reserve(100);
  • Expectations: Reserve some memory.
  • Reality: Will (may) produce a compilation error/warning and never ever be called.

Example 4

immutable(void)*[] pArr;
for(; ;) {
    immutable p = gc_malloc(100 * MiB);
    if(!p) break;
    pArr ~= p;
    size_t random = 0;
    foreach(b; cast(immutable(ubyte)[])p[0 .. 100 * MiB])
        random += b;
    writeln("random: ", random);
}
  • Expectations: Prints memory hashes until break.
  • Reality: Will (may) call gc_malloc once so the random will always be the same and it will be an infinite loop.

P.S.

I hope this my post is constructive enough.

@alexrp
D Programming Language member

Example 1
Example 2
Example 3

You are right. gc_reserve should only be nothrow until we can mark weak purity. @schveiguy Do you agree?

Example 4

Wrong. gc_malloc returns a pointer with mutable indirection, so it is weakly pure. The compiler is not allowed to optimize it. It may only use the purity information for the type system.

@denis-sh

I forgot to mention that Example 2 leads to OOM error because one can think a system just gave him lots of memory and than try to use it with gc_malloc.

@denis-sh

Example 4

Wrong.

Have you noticed there is no cast(immutable)? You probably have to look at @klickverbot's Purity in D article "pure and immutable – again?" section

@schveiguy
D Programming Language member
@alexrp
D Programming Language member

I actually did (if you look at the diff) - either of us must have missed fixing/reviewing that at some point. Anyway, I'll make a pull request to revert it to just nothrow.

@alexrp
D Programming Language member

See #237.

@klickverbot
D Programming Language member

@denis-sh: Sorry if my earlier post sounded like a personal attack, it certainly wasn't intended that way. I thought you were implying that pure functions taking pointers are a problem in general, and Alex/Steven were stupid for not seeing that – even though the issues were discussed at length before the request was merged. ;)

@schveiguy
D Programming Language member
@denis-sh

That doesn't matter. The fact that it returns mutable makes it weak pure (the optimizer cannot remove any calls to gc_malloc)

Than I have no idea how does pure work. Why this

immutable int[] arr1 = pureFunction(1);
...
immutable int[] arr2 = pureFunction(1);

can't be optimized to this

immutable int[] arr1 = pureFunction(1);
...
immutable int[] arr2 = arr1;

? The only answer I see is that compiler knows that pureFunction can depend on some global state but I've never heard about such "feature" of pure functions and it looks inconsistent.

@schveiguy
D Programming Language member

It's a new concept D has embraced. Since such a function that takes or returns mutable data cannot be pure optimized, it normally cannot be pure. But, if it doesn't access global variables, or shared data, then it still can't be optimized, but it can be called by an optimizable pure function!

So a pure function that created an immutable(int)[] with the numbers 1 to n could be pure and call malloc with no issues. The call to malloc could not be optimized, but the call to the number generating function could.

Of course, malloc DOES modify global shared state, but in an independent and thread safe way. So we must force malloc to be weakpure.

In fact , it might be reasonable to detect the implicit immutable cast, and optimize out calls to malloc, since you cannot change the data returned anyway. So the example might not work like you expect, but that's because your expectations are wrong.

@denis-sh

It's a new concept D has embraced. ....

I completely agree and thank you for doing this. My point is that some functions you calls pure aren't logically pure.

In fact , it might be reasonable to detect the implicit immutable cast, and optimize out calls to malloc, since you cannot change the data returned anyway. So the example might not work like you expect, but that's because your expectations are wrong.

Looks like it contradicts to your previous post. Does it? Doesn't matter. Anyway, at least I'm still very confused with pure and request participation in Issue 8185 - Pure functions and pointers understanding and than converting the conclusion into docs and/or @klickverbot's article.

@schveiguy
D Programming Language member
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment