Improve pure functions spec #2627
base: master
Thanks for your pull request, @andralex!
Bugzilla references
Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

N.B.: this is best viewed as a split diff.
spec/function.dd (outdated)
pure int fun(S* object);
---

$(LI The function returns `void`. Example:)
Why?
Currently such functions are already accepted (this diff attempts to clarify the spec without altering it). They don't make sense as strongly pure functions because then they would simply never be called.
One use case is a deallocation function.
Deallocation functions are not a good match for weak purity, because "this will at most modify what is reachable through the parameters" and "this needs to be called to reclaim memory" are not compatible. One is a guarantee, the other is a restriction.
Note that if we require exceptions and non-termination to be preserved, then strongly pure functions returning void may not be elided unless they follow an identical function call.
@tgehr I just want to make sure odd cases like that are put in the weakly pure category. That means no moving calls around, no elision, and no nasty surprises. Recall that "strongly pure" is the elite and "weakly pure" is the rest. We need a good argument to put something in the strongly pure category. Weakly pure is the default.
That does not make a lot of sense. Why is a void return an "odd case" and returning an empty struct instance is not an "odd case"? The only way this frivolous special casing is acceptable is by noting that we can remove it in the future, but I fear that you might try to double down on it by coupling it with some unrelated feature to save on syntax.
More generally, I am very much against creating categories of pure functions based on the function signature such that the semantics in one category are more relaxed than in another category even though they can call each other. If function inlining can make the set of allowed reorderings smaller, then your semantics is broken.
@tgehr I'm just keeping things simple. Going with void is simple and obvious. Going with "all data types without state" would work but is more complicated with no increase in expressive power. Simplicity is something to be appreciated.
> More generally, I am very much against creating categories of pure functions based on the function signature such that the semantics in one category are more relaxed than in another category even though they can call each other. If function inlining can make the set of allowed reorderings smaller, then your semantics is broken.
My perception is there's a lot of good experience with distinguishing between weak and strong purity. Initially we only had strong purity, which was very difficult to use. When we added weak purity things got suddenly a ton better. We could of course eliminate them and just go with one definition and let compilers optimize as they find fit. But you'd need a good amount of convincing that these are useless.
So what's your take?
> @tgehr I'm just keeping things simple. Going with `void` is simple and obvious. Going with "all data types without state" would work but is more complicated with no increase in expressive power. Simplicity is something to be appreciated.
> ...
Adding an unnecessary special case is not keeping it simple. It is the opposite of that.
We do not even need a way to say: this function is only weakly pure even though it cannot mutate any state. And making it implicit with a void return is not even a good way to specify that, even if you do want that feature for some reason. Especially now, when you have allowed out parameters on strongly pure functions (as well as pure factory functions).
> More generally, I am very much against creating categories of pure functions based on the function signature such that the semantics in one category are more relaxed than in another category even though they can call each other. If function inlining can make the set of allowed reorderings smaller, then your semantics is broken.
> My perception is there's a lot of good experience with distinguishing between weak and strong purity. Initially we only had strong purity, which was very difficult to use. When we added weak purity things got suddenly a ton better. We could of course eliminate them and just go with one definition and let compilers optimize as they find fit. But you'd need a good amount of convincing that these are useless.
> So what's your take?
My take is that your fluffy paragraph fails to address my point while setting up a straw man. (It also wasted a good bit of my time, spent verifying that this is indeed the case.)
Sorry for wasting your time. I don't know what I'm doing.
What I want here is to give people a chance to do systemy things like this:
void deallocate(immutable[] array);
At some point there's going to be a need for such in a systems programming language. And it should be well defined and work well - no elision, no reordering, just straight call.
Without the special rule, the function above is strongly pure. Right? Then an implementation looks at the rules and figures there's no need to ever call it.
I don't know how to do this another way. Please advise.
spec/function.dd (outdated)
pure S* fun(int);
---

$(LI The function takes zero arguments. Example:)
Why?
Such functions are already accepted and considering them strongly pure would not be useful - they'd be constants. So I put them in the weakly pure category.
Weakly pure means "somewhat restricted, can be called by any pure function, but not reordered at will".
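To make the "can be called by any pure function" part concrete, here is a minimal sketch (the names `fill` and `sumOfFilled` are hypothetical, not taken from the spec). `fill` is weakly pure because it writes through a mutable parameter; `sumOfFilled` has only value parameters, so under the current rules its signature makes it strongly pure, and it can still use `fill` on memory it allocated itself.

```d
// Weakly pure: mutates the caller's array through a mutable parameter,
// so its calls cannot be elided or freely reordered.
pure void fill(int[] buf, int value)
{
    foreach (ref x; buf)
        x = value;
}

// Strongly pure by signature: value parameters only, no mutable indirections.
// It may still call the weakly pure fill() on memory it allocated itself.
pure int sumOfFilled(size_t n, int value)
{
    auto buf = new int[n];
    fill(buf, value);
    int total = 0;
    foreach (x; buf)
        total += x;
    return total;
}

void main()
{
    // Evaluated at compile time via CTFE.
    static assert(sumOfFilled(4, 3) == 12);
    assert(sumOfFilled(0, 7) == 0);
}
```

Nothing outside `sumOfFilled` can observe the writes made by `fill`, which is why the combination is sound.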
Constants or factory functions.
@tgehr cool! So those should be weakly pure, correct?
No, they should be (potentially) strongly pure. Note that your pull request says they can't be factory functions.
(Here, I'm ignoring issues like exceptions and termination etc. If we don't do that, then they would not necessarily be constants.)
I think there may be the idea that strong and weak purity mean "can optimize" and "cannot optimize", but this is not true. (For example, a weakly pure function that may only modify data that is not accessed later can be elided. There can also be optimizations based on pure functions accessing only

I have no idea why there would be a special case for functions with zero arguments. This does not exist now and it contradicts existing compiler behavior:

int[] foo()pure{
    return [1,2,3];
}
void main(){
    immutable a=foo(); // ok
}
Also, I think it is a bit weird that the spec defines "weakly pure" and then "strongly pure" as "pure but not weakly pure".
I'm trying to define strong purity as "definitely have freedom to optimize in functional tradition, even when the definition is not available". Weak purity is "may or may not, depending on information at hand". Also: the weakly pure definition is more conservative than it could be. Consider:

pure void fun(int input, out int output);

This should be categorized as a pure function, because it's just an awkward way of writing:

pure int fun(int input);

By this PR, however, it's categorized as weakly pure. Making it strongly pure would complicate the definition while not allowing additional useful cases.
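The claimed equivalence between the two signatures can be illustrated with a small sketch; `squareOut` and `square` are hypothetical names chosen for this example. D resets an `out` parameter to its type's `.init` on entry, so whatever the caller stored in the argument beforehand cannot influence the result.

```d
// Result delivered through an `out` parameter.
pure void squareOut(int input, out int output)
{
    output = input * input;
}

// The same computation written as an ordinary return value.
pure int square(int input)
{
    return input * input;
}

void main()
{
    int r = -1;          // discarded: `out` resets r to int.init (0) on entry
    squareOut(7, r);
    assert(r == square(7));
}
```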
> int[] foo()pure{
>     return [1,2,3];
> }
> void main(){
>     immutable a=foo(); // ok
> }

Can you please explain that further? By this PR the function would need to return fresh memory every time, which is the right thing to do.
What would be a better way? The way I went for maximum clarity was to define weakly pure first. Then to make clear that a function can be either weakly pure or strongly pure (but not neither or both), I mention that a strongly pure function is a pure function that is not weakly pure. THEN just to make things crystal clear the doc goes ahead and enumerates the conditions for strong purity. Are you thinking it's better to define strongly pure first and then just say "if it's pure but not strongly pure, it's weakly pure"?
@tgehr Consider:

int[] foo()pure{
    return [1,2,3];
}
void main(){
    immutable a=foo(); // ok
    auto b=foo();
    assert(b == a);
    assert(b !is a);
}

This currently passes, which is good. If
Direct link to html for ease of viewing: http://dtest.dlang.io/artifact/website-008142779c13e1c98c82d000287b2780f73fb9eb-2c9b19a3e85d4bd5b9ca85d217459224/web/spec/function.html#pure-functions |
Besides not being necessary for that, it is also not sufficient (with the current language definition). E.g., Haskell has non-deterministic exception semantics, to allow more rewrites. Furthermore, Haskell does not expose reference identities for immutable values.
Strong vs weak purity isn't really about optimizations. I think it's mostly a marketing thing to avoid the claim that D does not have "true" functional purity (which it really does not because e.g., you can't have an immutable list data type with value semantics due to the

The important distinction is between what you call weakly pure functions and pure factory functions, because it affects implicit conversions. Currently, the pull request says pure factory functions must have at least one argument.

It's weakly pure because it modifies a parameter by reference. That does not mean the compiler should not optimize

By this PR the implicit conversion to immutable should not compile because

Yes, I think it is, but of course that's fully up to you, as it does not influence correctness.

"Might", not "would". Also, you are justifying why you think strongly pure functions should not have indirections in their return values, not why they must have arguments. It really does not make sense to require them to have arguments. (The single argument could be of an empty struct type.)
spec/function.dd (outdated)
struct S; // defined elsewhere
// These functions are weakly pure:
pure int fun(S object);
pure int fun(immutable S object);
Why is this not strongly pure? Clearly it does not have mutable indirections because of transitivity of immutability.
@tgehr this is to leave us room for __metadata.
It contradicts existing behavior:

struct S;
int[] foo(immutable S* s)pure;
immutable(int[]) bar(immutable S* s){
    return foo(s); // ok
}

__metadata shouldn't influence strong purity anyway. Otherwise we can't actually implement lazy evaluation correctly.
@tgehr I don't understand the example. Can you please clarify?
> __metadata shouldn't influence strong purity anyway. Otherwise we can't actually implement lazy evaluation correctly.
I don't understand this either, please clarify. Thanks!
> @tgehr I don't understand the example. Can you please clarify?
> ...

The function foo computes an int[] from a pointer to an immutable instance of an opaque datatype. The result converts to immutable implicitly. The PR says this is not the case.

> __metadata shouldn't influence strong purity anyway. Otherwise we can't actually implement lazy evaluation correctly.
> I don't understand this either, please clarify. Thanks!
Simplest example is you have an immutable data type and you change some field from being eagerly initialized to being lazily initialized. Factory functions that take an immutable instance of your type will break, because their results suddenly cannot be implicitly converted to immutable anymore.
@tgehr thanks, I see. The short of it is I don't know how to define __metadata and pure so as to work together properly without blowing complexity up. If you have any ideas, they'd be welcome.
The longer of it is I'm willing to impose additional restrictions on unknown data types if that makes matters simple, even if we need to break code that works in obscure cases. Optimizing strongly pure functions is low-impact anyway (we have zero such optimizations right now and there's no blood in the streets), and it's very rare that types are incompletely defined in D to start with.
I'm not that concerned about marketing as much as about making decisions that keep things simple and meaningful. In that sense allowing optimizations is a litmus test - if we can't do it we might have taken the wrong turn someplace. Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure.

I think I have a response to the paren - revised version upcoming which restricts what "equivalent parameters" means.
@tgehr eliminated the no-args requirement in the upcoming revision
@tgehr followed your advice by placing the definition of strongly pure first. Then made the weakly pure definition its complement. Things seem a lil cleaner. Thx.
I was concerned about conflating strongly pure, which originated as a marketing concept used to justify weak purity, with language semantics. The PR makes more sense with your latest edits. (Or I missed the part restricting reordering before. I have no way to tell as the previous version disappeared after a force push.)
That's fine, as long as it is with an understanding that the set of specified optimizations is a small subset of what is possible, and not a comprehensive list.
Neither do I, but that might make it necessary to specify the complete set of allowed rewrites. (Explicitly or implicitly.) Some possible bad outcomes that I really want to avoid:
pure double gun(double);
void hun(double x)
{
    auto y = fun(x) + gun(x); // evaluate fun(x) and gun(x) in any order
What about the case where fun is not nothrow, gun is nothrow, and fun throws an exception exactly when gun does not terminate?
spec/function.dd (outdated)
not carry to the thrown object.)

$(P Note: a strongly pure function may still have behavior inconsistent with memoization by e.g.
using `cast`s or by changing behavior depending on the address of its parameters. An implementation
I think you defined away memoization for equal immutable arguments with different addresses? (Not that I like it, but this seems to eliminate this concern.)
Yes, that was the intent. Here's what I had in mind:

pure string identity(string s) { return s; }
auto s1 = "abc", s2 = s1.idup;
auto s3 = identity(s1), s4 = identity(s2);

If we allow s1 to be considered equivalent to s2, identity will always return s1.
Also: when in doubt, make it weakly pure.
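A runnable version of the `identity` snippet (a sketch; the assertions are added here for illustration) shows why memoizing on equal-but-distinct immutable arguments would be observable: the two results compare equal by value, but each keeps the identity of its own argument.

```d
pure string identity(string s) { return s; }

void main()
{
    auto s1 = "abc", s2 = s1.idup;       // same contents, different memory
    auto s3 = identity(s1), s4 = identity(s2);

    assert(s3 == s4);                    // equal by value
    assert(s3 is s1 && s4 is s2);        // each result aliases its own argument
    assert(s3 !is s4);                   // so substituting s3 for s4 would be detectable
}
```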
I see, you're saying I should eliminate that text. Will do.
@tgehr will still keep it because:
pure int* hmmm(int x) { return &x; }
@tgehr simplified the text to, well, simplify
Yes, my comment was purely about the parameter addresses. (By which I assumed you meant the actual address of references/pointers within the arguments, otherwise it should have said address of local variables.)
Yeah, I switched to no-amend commits going forward.
Cool!
That would be acceptable. And don't forget - fewer is not a disaster.
Allow me to put on my pointy-haired manager hat (or wig) by interspersing the top-level view with these:
Very low impact. Pure void functions will be very rare if any, and don't forget we currently do zero optimizations based on purity. Yet nobody holds a gun to our head.
Low impact.
Low impact.
Low impact. What is high impact:
These are what matters. I would very happily give away any and all items on your list to get these two. And it would be great if you did, too.
The point of the rewrites is to define the meaning of
It was you who said: "Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure."
That's not the point. Currently,
I guess you mean nominally
I can do lazy initialization of members today, even within a weakly pure function.
I thought you wanted to support general manual memory management within
I'm just not very excited to have merely nominal immutability in the type system, and I am certainly not the only one. It means whenever there is an
Yes, but that's not a hard requirement. In fact if we drop strongly pure entirely and stay with weakly pure that's acceptable if there's no other way to get work done on reference counting.
Maybe we have too much with
This might be the case with complex data structures, but it is not with strings and slices. Currently

Can you lead such an effort?
Not in an
I'd be happy with just reference counting if that's all that's possible.
NO it's not a lie! The point is to allow controlled use of metadata that is transparent to the user of the immutable data structure. Is tracing GC a lie? Because it does make mutable memory into immutable memory (construction) and immutable memory back into modifiable memory (reclamation). It is also easy to circumvent in any number of ways. Yet nobody is spending nights losing sleep over it.
That is working just fine for C++. A good success story to learn from.
I don't understand this.
I understand, and I don't want to be there either. But I also reckon that without workable reference counting we're dead in the water. Sadly we have re-entered an unhelpful pattern.
How can we get from this pattern to a pattern whereby you actually do steer things to the positive? What steps can we take to get reference counted data structures while at the same time preserving most of the good qualities of
You can't elide strongly pure calls to zero, because they can throw exceptions; such a function must be called at least once. So a void function is fine as strongly pure, and a use case for it is data validation. A void function that does nothing - that's low impact. Maybe you can reorder two pure nothrow functions with respect to each other, but not anything else, again because of exceptions. Lazy initialization doesn't seem to conflict with purity. Reference counting can be done as weakly pure:

void* addRef() pure immutable
{
    _counter++;
    return null;
}

Also postblit should be weakly pure too.
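The data-validation use case can be sketched as follows; `validatePercentages` is a hypothetical name. Its only parameter is immutable and it returns `void`, so by signature it has no mutable indirections, yet dropping the call would change observable behavior, because the call is the only place the exception can come from.

```d
import std.exception : assertNotThrown, assertThrown;

// Only immutable input and a void return: the sole observable effect
// of a call is whether it throws.
pure void validatePercentages(immutable(int)[] values)
{
    foreach (v; values)
        if (v < 0 || v > 100)
            throw new Exception("percentage out of range");
}

void main()
{
    assertNotThrown(validatePercentages([10, 50, 100]));
    assertThrown(validatePercentages([10, 150]));
}
```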
Good insight, thanks. Can throw errors or abort the program even if
You can't update the counter of an immutable structure. (It's actually a pointer to a counter.)
Good news - we don't need to worry about postblit :o).
Yes to
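The point that an immutable structure cannot update its counter follows from transitive immutability and can be checked at compile time. A minimal sketch with a hypothetical `Counted` type that stores a pointer to a separately allocated counter: through a mutable instance the counter can be bumped by a weakly pure function, but through an `immutable` instance the pointee is `immutable(int)` as well, so the increment is rejected.

```d
struct Counted
{
    int* count; // hypothetical: points to a separately allocated reference count
}

// Fine on a mutable instance: mutates only through its parameter (weakly pure).
pure void addRef(ref Counted c)
{
    ++*c.count;
}

// Immutability is transitive: the pointee of an immutable Counted is immutable(int)...
static assert(is(typeof(*immutable(Counted).init.count) == immutable(int)));
// ...so incrementing through it does not compile, while the mutable case does.
static assert(!is(typeof(++*immutable(Counted).init.count)));
static assert(is(typeof(++*Counted.init.count)));

void main()
{
    auto c = Counted(new int);
    addRef(c);
    addRef(c);
    assert(*c.count == 2);
}
```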
Eliminated the
@andralex I'd say just do the ref counting. Escape the type system. Improve later.
@SSoulaimane we have that already... dlang/druntime#2348
Mutable data can be stored like this:

struct __mutable(T)
{
    private size_t _ref;
    this(T val){ _ref=cast(size_t)val; }
    T unwrap() const
    {
        return cast(T)_ref;
    }
}
struct A
{
    __mutable!(int*) rc;
    this(int) immutable
    {
        rc=new int;
    }
    int inc() immutable
    {
        return ++*rc.unwrap;
    }
}
int main()
{
    immutable A b=0;
    assert(b.inc==1);
    assert(b.inc==2);
    return 0;
}
@anon17 thanks for the suggestion. It's in line with some of what @edi33416 and myself have tried over time. It has a number of small issues (not safe, the GC may free the memory prematurely) that can be fixed as follows:

@safe:
struct __mutable(T)
{
    private union { T _unused; size_t _ref; }
    this(T val) pure immutable { _ref=cast(size_t)val; }
    T unwrap() const pure @system
    {
        return cast(T)_ref;
    }
}
struct A
{
    __mutable!(int*) rc;
    this(int) immutable pure
    {
        rc=new int;
    }
    int inc() immutable pure @trusted
    {
        return ++*rc.unwrap;
    }
}
int main() pure
{
    immutable A b=0;
    assert(b.inc==1);
    assert(b.inc==2);
    return 0;
}

This code compiles and runs. The large problem is that A.inc is a strongly pure function (pure operating on immutable data). However, that's a trick - unbeknownst to the type system, the code ends up changing immutable data. Per the spec (with or without this PR), the call to
Weakly pure variant:

struct A
{
    __mutable!(int*) rc;
    this(int) immutable pure
    {
        rc=...; //PrefixAllocator is fine
    }
    private void* addRef() pure const @trusted
    {
        ++*rc.unwrap;
        return null;
    }
    private void* release() pure const @trusted
    {
        int counter=--*rc.unwrap;
        if(counter==0)deallocate();
        return null;
    }
    private void deallocate() pure const
    {
        ...
    }
    ~this() pure const
    {
        release();
    }
}

Also, this way the change is not visible outside, so the type system can't really tell whether it changes or not; it only needs to call weakly pure functions.
}
pure S* f4(int);
---
Maybe have an example for ref return?
{
    if (i == n) break;
    result += __t;
}
Given that fun defined above always returns, I think the above code should actually be a valid lowering. So the example is confusing.
I've redone the pure functions section to improve definitions. I'm aiming for a narrow and precise definition of strongly pure functions; following that, aggressive optimizations can be applied. cc @WalterBright @tgehr @dnadlinger @JohanEngelen @redstar