Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Const opSlice, upperBound, lowerBound and equalRange for rbtree #3501

Merged
merged 1 commit into from Aug 13, 2015

Conversation

Groterik
Copy link
Contributor

Const versions of opSlice, upperBound, lowerBound and equalRange functions are added. They return ConstRange that is a range of const type elements.

@schveiguy
Copy link
Member

I think this is the wrong approach. First, you should not be casting away const to run an algorithm which is actually const. You should do it the other way around (implement the const version, then cast within the mutable one).

Second, this is missing the immutable equivalent (easy to add when you have the boilerplate for const). This also will not work with inout unfortunately.

Third, the best way to fix this is to fix the language. Although, we can likely include a reasonable compromise here and then remove the compromise when the language is fixed.

@Groterik
Copy link
Contributor Author

Thank you, I agree, more work is needed here. I'll try to fix it.

@aG0aep6G
Copy link
Contributor

First, you should not be casting away const to run an algorithm which is actually const. You should do it the other way around (implement the const version, then cast within the mutable one).

That would mean call the const opSlice on a mutable this, cast the result back to mutable, and return it, right? I think that's no good. Casting away const and then mutating is undefined behaviour.

The other way around (cast this from const to mutable, call opSlice on it, return as const) doesn't have that problem, as there's no mutating going on when this is briefly painted mutable.

-> I think it's correct the way it is, and it wouldn't be correct the way you suggest.

However, simply duplicating the function bodies is an option, too. opSlice, upperBound, and lowerBound are one-liners. Just repeating them with different range types would be shorter and possibly less bug-prone than casting cleverness. equalRange is more complex, I'd avoid duplicating that one.

Third, the best way to fix this is to fix the language. Although, we can likely include a reasonable compromise here and then remove the compromise when the language is fixed.

We did the very same ConstRange/ImmutableRange thing with Array (#2631). No one likes it, but no one came up with something better either.

@schveiguy
Copy link
Member

Casting away const and then mutating is undefined behaviour.

Casting away immutable and then mutating is undefined behavior. Const is OK as long as the actual data is mutable. Obviously this is true from a mutable version of the function.

@schveiguy
Copy link
Member

However, simply duplicating the function bodies is an option, too. opSlice, upperBound, and lowerBound are one-liners. Just repeating them with different range types would be shorter and possibly less bug-prone than casting cleverness.

This is an excellent point, can we just do this? I'd rather avoid casting wherever possible, even if it means not all the implementations use the same "tricks"

@aG0aep6G
Copy link
Contributor

Casting away immutable and then mutating is undefined behavior. Const is OK as long as the actual data is mutable.

I have my doubts about this. I'm pretty sure it's wrong in the general case, and I think it may be wrong in this specific case, too.

Surely the following is not ok, is it?

void f(const int* c) pure {* cast(int*) c = 2;}
void main()
{
    int x = 1;
    const int* c = &x;
    f(c);
}

x is mutable, but the compiler should be allowed to assume that f doesn't change it.

When we inline f, we get this:

void main()
{
    int x = 1;
    const int* c = &x;
    * cast(int*) c = 2;
}

Which you're saying is ok, right? It looks less problematic, but unless I'm mistaken, which I well may be, the two snippets should be equivalent, i.e. either both ok or both bad.

Anyway, what's wrong with casting the other way around?

@schveiguy
Copy link
Member

I'm pretty sure it's wrong in the general case

Yes, definitely if you aren't sure if the const data is immutable or mutable, you cannot mutate.

the compiler should be allowed to assume that f doesn't change it.

I think in this case, yes, it should be able to assume that.

But that is not the same here. The casting occurs locally, not wrapped inside a function.

This is more of the equivalent:

const int *f(const int *c) pure { return c;}

void main()
{
   int x;
   auto y = f(&x);
   *(cast(int *)y) = 2;
}

Now, can the compiler assume that y doesn't point at x? I don't see how it can assume that.

what's wrong with casting the other way around?

Because this turns off the compiler checks for const. One is free to mutate inside the function without realizing you are compromising the const/immutable versions, and the compiler won't complain. If you do it the other way, the dangerous code is not the implementation, it's simply plumbing.

@Groterik
Copy link
Contributor Author

This is an excellent point, can we just do this?
Sure, I will try to manage without castings.

@aG0aep6G
Copy link
Contributor

Yes, definitely if you aren't sure if the const data is immutable or mutable, you cannot mutate.

I meant that the data being mutable is not enough to allow casting away const and then mutating. You're agreeing on the example then, so I think we're on the same page here.

But that is not the same here. The casting occurs locally, not wrapped inside a function.

I'm not completely sure if it's correct, but I'm under the impression that when a line of code itself is ok, then turning it into a function should be ok, too. Just like the other way around, inlining any function call should be ok (ignoring recursion for a moment).

This is more of the equivalent:

const int *f(const int *c) pure { return c;}

Fixed: const(int*) f(const int *c) pure { return c;}

void main()
{
int x;
auto y = f(&x);
*(cast(int *)y) = 2;
}

Now, can the compiler assume that y doesn't point at x? I don't see how it can assume that.

I agree that it can't assume that *y is immutable, but I think it should be allowed to assume that x won't be mutated through y, as that's the const promise.

And when the compiler can rely on the const promise, then I think it could put x into immutable storage here:
There are two references to the data: x is a mutable reference, y is a const reference. x is only ever used to initialize y, i.e. x is never used to change the data. y can't change the data, says the const promise. => The data won't change, and can be put into immutable storage.

what's wrong with casting the other way around?

Because this turns off the compiler checks for const. One is free to mutate inside the function without realizing you are compromising the const/immutable versions, and the compiler won't complain. If you do it the other way, the dangerous code is not the implementation, it's simply plumbing.

I see. Yes, ensuring correctness and maintainability are problems. I still think that the alternative is outright invalid, though.

@schveiguy
Copy link
Member

I agree that it can't assume that *y is immutable, but I think it should be allowed to assume that x won't be mutated through y, as that's the const promise.

const promises go away when you cast. If the compiler sees the cast, it has to assume the worst.

And in any case, I need to refine the analogous example. Because the function that does the cast isn't actually mutating anything. So really it looks more like this:

const(int) *f1(const(int)* x) { return x;}
int *f2(int *x) { return cast(int *)f1(x);}

void main()
{
   int x;
   auto y = f2(&x);
   *y = 2;
}

It seems to me there can't be any invalid assumptions made by the compiler due to const in this case when it examines each function.

I still think that the alternative is outright invalid

I don't see in the spec where that is so. Only modifying immutable data is invalid that I can find.

@jmdavis
Copy link
Member

jmdavis commented Jul 23, 2015

const promises go away when you cast. If the compiler sees the cast, it has to assume the worst.

Once you cast away const, it's up to you to guarantee that those promises aren't violated, and as I understand it, casting away const does not stop the compiler from making assumptions about any const references to the same data. So, if you cast away const and mutate the object, even if the object is actually mutable underneath the hood, you risk running afoul of assumptions made by the compiler. As long as the compiler can't do anything based on the assumption that a const variable isn't going to change, then it won't matter, but if it can do something based on that, then as I understand it, it's going to. So, if it's doing something based on the assumption that the variable's value didn't change, and you changed it by casting away const and mutating it anyway, then you're going to have problems. What makes it so that you can usually get away with it is the fact that there are very few assumptions that can be made based on const (since to be able to assume that the value wasn't changed, it has to be sure that there are no mutable references to the same data which are getting accessed at the same time, which can sometimes be known thanks to pure, but usually, it can't be known). But if you cast away const and mutate the value, you do run the risk of the compiler getting smarter later and making more assumptions based on const than it does now. So, even if it works now, that doesn't guarantee that it will later.

So, personally, my take on it is that you just don't cast away const and mutate. It's too risky. Yes, you can theoretically get away with it if you're careful (particularly since it is very hard to do much based on const as opposed to immutable), but I wouldn't risk it.

Casting away const without mutating is fine, if you're careful, though it's ugly, so I really wouldn't do that either unless I had no choice, but it doesn't run the same risks. You just have to make sure that you're not actually mutating anything.

@schveiguy
Copy link
Member

casting away const does not stop the compiler from making assumptions about any const references to the same data

Const provides no guarantees of immutability to the compiler. A const variable can easily be mutated via a mutable reference. The spec rightly says it's UB to cast away const/immutable and mutate immutable data.

Casting away const without mutating is fine, if you're careful, though it's ugly

What if you KNOW that the underlying data is mutable (as in this case)? And in this case actually, the code that knows the data is mutable is not actually mutating, but just restoring the mutable attribute via a cast.

Really, what you and @aG0aep6G are asking is that the compiler be intelligent enough to prove that one variable can only be accessed through const references, but to willfully ignore the casting of such a variable. I think that's quite a silly proposition. IMO, the cast should disable any such optimizations.

The one case where I think this is a problem is if a pure function can accept a const reference, with no other mutable references that could possibly lead back to that const reference, casts away that const and modifies. Other than that, I think it's difficult to imagine a compiler that is so highly intelligent in flow analysis, but ignores one very important aspect of it.

I ask again, is there any part of the spec that says you cannot cast away const and modify if the underlying data is mutable? If not, I can't see how this is even worth discussing. The overwhelmingly logical choice here is to let the compiler prove the implementation is valid for const, and then do the casting outside where you know the cast is valid.

@jmdavis
Copy link
Member

jmdavis commented Jul 23, 2015

Really, what you and @aG0aep6G are asking is that the compiler be intelligent enough to prove that one variable can only be accessed through const references, but to willfully ignore the casting of such a variable. I think that's quite a silly proposition. IMO, the cast should disable any such optimizations.

Even if it is smart enough to not make assumptions around the point where the cast is made (which I would hope that it would be that smart), as soon as other function calls get involved, you're asking it to be that smart across function boundaries, and optimizations don't generally work that way.

I ask again, is there any part of the spec that says you cannot cast away const and modify if the underlying data is mutable? If not, I can't see how this is even worth discussing. The overwhelmingly logical choice here is to let the compiler prove the implementation is valid for const, and then do the casting outside where you know the cast is valid.

I don't know what the spec says, but it has come up before - including in discussions that included Walter - that the compiler can make assumptions based on const not mutating data and that it is undefined behavior to mutate an object that was const - precisely because the compiler is free to assume that data is not mutated via a const reference. e.g.

const myFoo = getFoo();
pureFunc(myFoo);
// It should be impossible for the value of myFoo to have changed at this point

If pureFunc were not pure, then there the compiler can't know that myFoo hasn't changed, because pureFunc would then have been able to access data that wasn't passed to it and thus might have been able to access the same data that myFoo refers to via another reference, but since it is pure, it does know that no other references to what myFoo refers to can have been accessed and thus knows that its state has not changed.

Now, whether the compiler currently does much based on that knowledge, I don't know. But I'm 100% certain that casting away const to mutate an object is undefined behavior, even if you know that the underlying object can't be immutable, because this has been discussed quite a bit in the past - particularly with regards to logical const. And if the spec isn't clear on that, then I would consider it a bug in the spec.

@aG0aep6G
Copy link
Contributor

const(int) f1(const(int) x) { return x;}
int *f2(int *x) { return cast(int *)f1(x);}

void main()
{
int x;
auto y = f2(&x);
*y = 2;
}

It seems to me there can't be any invalid assumptions made by the compiler due to const in this case when it examines each function.

I don't think this changes anything. If the compiler is allowed to look at all the code, it's the same situation as before, isn't it? My take on it is that once you've gone const you can't go back to mutable (from the const reference) and mutate.

Really, what you and @aG0aep6G are asking is that the compiler be intelligent enough to prove that one variable can only be accessed through const references, but to willfully ignore the casting of such a variable. I think that's quite a silly proposition. IMO, the cast should disable any such optimizations.

It's a silly, unrealistic scenario, yes. But I think a hypothetical, silly compiler like that should be allowed.

I ask again, is there any part of the spec that says you cannot cast away const and modify if the underlying data is mutable? If not, I can't see how this is even worth discussing.

You agree that there are cases where casting away const and then mutating is not ok, even when the data is mutable (see the first example in #3501 (comment) and your answer to that). If that's not in the spec, I hope you agree that we should add it.

Now, your case is different from that one, but the spec also doesn't say that it's ok to cast away const and then mutate when the data is mutable. And adding that would be bad, given the example above.

So we can either go the simple route and disallow casting away const and then mutating. Or we can go a more complex route that allows your use case while forbidding my example. I'd vote for the simple one as it's easier to remember, easier to verify to be correct, and easier to apply correctly.

The overwhelmingly logical choice here is to let the compiler prove the implementation is valid for const, and then do the casting outside where you know the cast is valid.

I can't see how your take on this is the "overwhelmingly logical choice". I think the key point where we're apart is that you think that the compiler should recognize un-const-ing casts and not rely on the const promise accordingly, whereas Jonathan and I think that the const promise still applies and the programmer is then responsible for maintaining it.

@schveiguy
Copy link
Member

Actually, what I'm saying is exactly how inout should work if it were allowed to be applied here. I think this is simply a forced implementation of inout, and should be allowed.

What you have for the conditions is:

  1. Inside a function, the data is treated as const.
  2. Once the function is finished, the data is returned to the same state it was before const was applied.

If you have this, I think it should be, and is, defined and valid. What is not valid is casting away const (and mutating) while inside the function (which is not aware of the mutability of the data inside the function).

@schveiguy
Copy link
Member

I don't think this changes anything. If the compiler is allowed to look at all the code, it's the same situation as before, isn't it?

No. If the compiler can look at (and inline) all the code, it becomes:

auto y = &x;

And it works fine.

@schveiguy
Copy link
Member

If that's not in the spec, I hope you agree that we should add it.

I agree the spec should identify that if the the compiler can prove that a variable cannot be modified inside a pure function (and it is allowed to assume no casts occur inside the function), then it can assume the variable has not been modified by the function.

@schveiguy
Copy link
Member

But I'm 100% certain that casting away const to mutate an object is undefined behavior, even if you know that the underlying object can't be immutable

Can you back this up besides anecdotal evidence? I can't see at all why this is UB:

int x;
const int *y = &x;
*(cast(int *)y) = 5;

const only exists in the compiler, it doesn't survive to generated code, the CPU has no concept of it. Any compiler that doesn't just compile the above and set x to 5 is wrong.

@aG0aep6G
Copy link
Contributor

No. If the compiler can look at (and inline) all the code, it becomes:

auto y = &x;

And it works fine.

You've omitted the type changes. With them:

auto y = cast(int*) cast(const int*) &x;

Your position (as I understand it) is that the compiler should either be dumb and not see a const promise in that cast to const, or if it's smart, it must recognize that the promise is taken back by the cast to mutable. You're disallowing silly/ridiculous compilers that would recognize the const promise but not the cast to mutable.

I agree the spec should identify that if the the compiler can prove that a variable cannot be modified inside a pure function (and it is allowed to assume no casts occur inside the function), then it can assume the variable has not been modified by the function.

That is, you're restricting the const promise to function parameters.

Actually, what I'm saying is exactly how inout should work if it were allowed to be applied here. I think this is simply a forced implementation of inout, and should be allowed.

What you have for the conditions is:

Inside a function, the data is treated as const.
Once the function is finished, the data is returned to the same state it was before const was applied.

If you have this, I think it should be, and is, defined and valid. What is not valid is casting away const (and mutating) while inside the function (which is not aware of the mutability of the data inside the function).

I'm having trouble understanding these conditions, but I guess it comes down to restricting the const promise to function parameters as above.

I don't think const is defined that way. It may actionable to do so. That is, allow casting away const and then mutating as long you don't cross function boundaries. Or something more precise along these lines.

I can't say that I understand all the implications that may have, and I think it's significantly more complex (and therefore prone to be subtly wrong) than simply saying "don't cast away const and then mutate". So I'm not a fan.

I understand that the spec doesn't actually say "don't cast away const and then mutate" either. All it says (which I could find) is "you can't mutate through a const reference".

I think we may have reached a point in the discussion where everyone understands the other side's points, yet neither of us has been convinced by them.

Shall we take it to a broader audience? I made a thread on the forum:
http://forum.dlang.org/post/riiehqozpkyluhhifwha@forum.dlang.org

@jmdavis
Copy link
Member

jmdavis commented Jul 23, 2015

I would point out that regardless of the issues of whether casting away const and mutating is ever defined behavior, I think that doing anything with const ranges at this point is a bad idea. We need to come up with a general solution for this before we start trying to add it to the containers. const and ranges simply do not mix as it stands. You have to go to great lengths to make it work. It's simpler to get a range over const elements from a container than it is to get a tail-const range, but that's really what needs to be solved, and how that is solved could affect how we get ranges over const elements from a container.

@schveiguy
Copy link
Member

It's simpler to get a range over const elements from a container than it is to get a tail-const range

That's what this PR does.

@jmdavis
Copy link
Member

jmdavis commented Jul 23, 2015

That's what this PR does.

Yes. But I don't think that we should be doing even that without solving the more general problem, because how that is solved could affect how this should be done.

@schveiguy
Copy link
Member

Not really, although I'd love to solve the general problem. I think whatever we came up with would allow a drop-in replacement. All that happens is magically mutable ranges can now be implicitly cast to const ranges. We can leave the ConstRange alias to whatever a tail-const version of the range is.

@quickfur
Copy link
Member

Are you guys sure the current language is unable to express a const member function that returns a tail-const mutable range? I didn't look too closely at the details, but I remember writing similar code before, where it's just a matter of explicitly specifying the types of your iteration pointers so that they are tail-const, which the compiler allows assigning const pointers to without casting:

struct MyContainer
{
    Data* root;
    auto getRange() inout
    {
        struct Range
        {
            inout(Data)* currentElem; // N.B. this pointer is tail-const
            ...
            void popFront()
            {
                // AFAIK this is allowed, it's assigning const(Data*)
                // to const(Data)*
                currentElem = currentElem.next;
            }
        }
        return Range(root);
    }
}

@schveiguy
Copy link
Member

Are you guys sure the current language is unable to express a const member function that returns a tail-const mutable range?

Yeah, you can. The issue is, we need inout to run the algorithm (so it goes back to the constancy of this when the algorithm is done). Otherwise, you have to copy/paste the whole thing. And you can't make a tail-inout type, since inout can't be used on a struct member (and the compiler wouldn't know what to do with it if you did).

@schveiguy
Copy link
Member

@quickfur I didn't read your code sample closely enough. what you wrote will not work, because you cannot make a struct with an inout member. You can make const or immutable versions of the range, but they won't implicitly cast right, which is what this PR is about.

@schveiguy
Copy link
Member

@jmdavis @aG0aep6G I concede the UB argument. It's clear that we could make certain cases of behavior defined, but the more I think about it, the more this seems like a horrible hack fix to cover over the lack of tail-const.

We can fix this anyway with @aG0aep6G's suggestion of copying the implementation to const/mutable/immutable for all 3 functions (upperBound lowerBound and equalRange). However, one thing is needed. We need to make _firstGreater and _firstGreaterEqual inout, since those actually deal in Node (a raw pointer which we can tail-ify).

So essentially (and I think you can just do this without changing any code, it's already done for _find), signatures become:

inout(RBNode)* _firstGreater(Elem e) inout
inout(RBNode)* _firstGreaterEqual(Elem e) inout

And then your implementations for all three constancies of each higher-level function can be (mostly) identical. It's repeated code, but you can't template based on the constancy of this. Hopefully these plumbing functions will be replaceable with one function once we can do tail-inout properly.

Sorry for the friction.

@Groterik Groterik force-pushed the const_rbtree branch 2 times, most recently from 8aaa48e to edaa65a Compare July 27, 2015 00:37
@Groterik
Copy link
Contributor Author

Unnecessary castings are eliminated. Current implementation is similar to this one that @schveiguy suggested above.

{
// can't use _find, because we cannot return null
auto cur = _end.left;
auto result = _end;
PointerTarget!(typeof(_end))* result = _end;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That type is rather ugly. How about just inout(RBNode)*?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it is much better. I missed it.

@aG0aep6G
Copy link
Contributor

We could get rid of the duplication with template this parameters:

    private alias QualifiedRange(This : immutable RedBlackTree) = ImmutableRange;
    private alias QualifiedRange(This : const RedBlackTree) = ConstRange;
    private alias QualifiedRange(This : RedBlackTree) = Range;

    auto opSlice(this This)()
    {
        return QualifiedRange!This(_begin, _end);
    }

    /* ... analogous with upperBound, lowerBound, equalRange ... */

This would turn those methods into templates, meaning they're not virtual. But the class is final, so all methods aren't virtual anyway.

There may be other downsides I fail to see right now.

I don't think the duplication is too bad, though. So this looks good to me either way. (Aside from the PointerTarget ugliness mentioned above.)

@Groterik
Copy link
Contributor Author

Thank you, I fixed PointerTarget ugliness.
I like this approach with template this parameter.
Actually I missed the fact that template this parameterized method gets the same qualifier as this. It should be described in documentation explicitly (like in changelog). I also changed _equalRange helper to template this parameterized method.

@schveiguy
Copy link
Member

Nice. LGTM.

@aG0aep6G
Copy link
Contributor

One more thing: When we did this for Array (#2631) we moved RangeT out of the container to reduce template bloat. We should probably do the same here.

I.e., before:

class RedBlackTree(T)
{
    struct RangeT(V, N) {}
}

after:

class RedBlackTree(T) {}
struct RangeT(V, N) {}

@schveiguy
Copy link
Member

to reduce template bloat

This will do more than just reduce the name bloat, because RedBlackTree has some additional parameters. For example the following 2 ranges could theoretically use the same external range type:

RedBlackTree!int rbt1;
RedBlackTree!(int, "a > b") rbt2;

Since the comparison is only used for insertion and searching, not iteration.

I noticed this given the new possible improvement -- why does RangeT need 2 parameters? One can be constructed from the other I think.

Also, don't call it RangeT if it's outside the class, call it RBRange or something like that, to avoid conflict.

@Groterik Groterik force-pushed the const_rbtree branch 2 times, most recently from dc474dd to fb364b2 Compare August 3, 2015 00:51
@Groterik
Copy link
Contributor Author

Groterik commented Aug 3, 2015

I moved RangeT out of the container and renamed it as RBRange. I also made the range depending on 1 parameter.

@DmitryOlshansky
Copy link
Member

Otherwise, you have to copy/paste the whole thing. And you can't make a tail-inout type, since inout can't be used on a struct member (and the compiler wouldn't know what to do with it if you did).

Actually I believe this is strange limitation of the curent inout that I never understood completely. To me the fact that with templates on T you invariably end up with scope-level inout variables is obvious yet it is not implemented.

It should be fairly trivial to treat inout like const in every aspect on the scope-level isn't it? I mean scope is not returned anywhere so we can even ignore the exact modifier inout there implies.

@DmitryOlshansky
Copy link
Member

Oh and on topic - LGTM

}
else
{
// no sense in doing a full search, no duplicates are allowed,
// so we just get the next node.
return Range(beg, beg.next);
return tuple(beg, beg.next);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can redo this function to avoid the additional template machinery below:

auto equalRange(This this)(Elem e)
{
   alias RangeType = RBRange!(typeof(*_begin)*);
   ...
   return RangeType(beg, _firstGreater(e));
   ... // etc
}

@DmitryOlshansky
Copy link
Member

@Groterik ping reviewers after pushing new stuff there is no "source changed" notification from GitHub.

@DmitryOlshansky
Copy link
Member

LGTM - @schveiguy ?

@schveiguy
Copy link
Member

Auto-merge toggled on

schveiguy added a commit that referenced this pull request Aug 13, 2015
Const opSlice, upperBound, lowerBound and equalRange for rbtree
@schveiguy schveiguy merged commit de76ea0 into dlang:master Aug 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants