Const opSlice, upperBound, lowerBound and equalRange for rbtree #3501
I think this is the wrong approach. First, you should not be casting away const to run an algorithm which is actually const. You should do it the other way around (implement the const version, then cast within the mutable one). Second, this is missing the immutable equivalent (easy to add when you have the boilerplate for const). This also will not work with inout unfortunately. Third, the best way to fix this is to fix the language. Although, we can likely include a reasonable compromise here and then remove the compromise when the language is fixed. |
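The ordering suggested here (const implementation first, cast only in the mutable wrapper) can be sketched on a toy container. This is a minimal illustration, not the rbtree code: `Node`, `List`, and `find` are hypothetical names, and a singly linked list stands in for the tree.

```d
struct Node
{
    int value;
    Node* next;
}

struct List
{
    Node* head;

    // The algorithm is written once against const, so the compiler
    // verifies that the search itself never mutates the list.
    const(Node)* find(int v) const
    {
        for (const(Node)* n = head; n !is null; n = n.next)
            if (n.value == v)
                return n;
        return null;
    }

    // Mutable overload: plumbing only. The cast is the single spot where
    // the const check is bypassed, and `this` is known to be mutable here.
    Node* find(int v)
    {
        const(List)* self = &this;
        return cast(Node*) self.find(v);
    }
}

void main()
{
    auto third = new Node(3, null);
    auto list = List(new Node(1, new Node(2, third)));

    // Mutable list: the mutable overload is chosen and yields a mutable node.
    Node* hit = list.find(3);
    assert(hit is third);

    // Const view: only the const overload applies.
    const List view = list;
    static assert(is(typeof(view.find(3)) == const(Node)*));
}
```

With this layout a bug that mutates inside the search is rejected at compile time, because the shared code is type-checked as const. Whether the result of such a cast may later be written through is the question the rest of this thread debates.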
Thank you, I agree, more work is needed here. I'll try to fix it. |
That would mean call the const opSlice on a mutable The other way around (cast -> I think it's correct the way it is, and it wouldn't be correct the way you suggest. However, simply duplicating the function bodies is an option, too. opSlice, upperBound, and lowerBound are one-liners. Just repeating them with different range types would be shorter and possibly less bug-prone than casting cleverness. equalRange is more complex, I'd avoid duplicating that one.
We did the very same ConstRange/ImmutableRange thing with |
Casting away immutable and then mutating is undefined behavior. Const is OK as long as the actual data is mutable. Obviously this is true from a mutable version of the function. |
This is an excellent point, can we just do this? I'd rather avoid casting wherever possible, even if it means not all the implementations use the same "tricks" |
I have my doubts about this. I'm pretty sure it's wrong in the general case, and I think it may be wrong in this specific case, too. Surely the following is not ok, is it?
void f(const int* c) pure { *cast(int*) c = 2; }
void main()
{
    int x = 1;
    const int* c = &x;
    f(c);
}
When we inline the call, we get:
void main()
{
    int x = 1;
    const int* c = &x;
    *cast(int*) c = 2;
}
Which you're saying is ok, right? It looks less problematic, but unless I'm mistaken, which I well may be, the two snippets should be equivalent, i.e. either both ok or both bad. Anyway, what's wrong with casting the other way around? |
Yes, definitely if you aren't sure if the const data is immutable or mutable, you cannot mutate.
I think in this case, yes, it should be able to assume that. But that is not the same here. The casting occurs locally, not wrapped inside a function. This is more of the equivalent:
const int *f(const int *c) pure { return c;}
void main()
{
    int x;
    auto y = f(&x);
    *(cast(int *)y) = 2;
}
Now, can the compiler assume that y doesn't point at x? I don't see how it can assume that.
Because this turns off the compiler checks for const. One is free to mutate inside the function without realizing you are compromising the const/immutable versions, and the compiler won't complain. If you do it the other way, the dangerous code is not the implementation, it's simply plumbing. |
|
I meant that the data being mutable is not enough to allow casting away const and then mutating. You're agreeing on the example then, so I think we're on the same page here.
I'm not completely sure if it's correct, but I'm under the impression that when a line of code itself is ok, then turning it into a function should be ok, too. Just like the other way around, inlining any function call should be ok (ignoring recursion for a moment).
Fixed:
const(int*) f(const int *c) pure { return c;}
I agree that it can't assume that And when the compiler can rely on the const promise, then I think it could put
I see. Yes, ensuring correctness and maintainability are problems. I still think that the alternative is outright invalid, though. |
const promises go away when you cast. If the compiler sees the cast, it has to assume the worst. And in any case, I need to refine the analogous example, because the function that does the cast isn't actually mutating anything. So really it looks more like this:
const(int) *f1(const(int)* x) { return x;}
int *f2(int *x) { return cast(int *)f1(x);}
void main()
{
    int x;
    auto y = f2(&x);
    *y = 2;
}
It seems to me there can't be any invalid assumptions made by the compiler due to const in this case when it examines each function.
I don't see in the spec where that is so. Only modifying immutable data is invalid that I can find. |
Once you cast away So, personally, my take on it is that you just don't cast away Casting away |
Const provides no guarantees of immutability to the compiler. A const variable can easily be mutated via a mutable reference. The spec rightly says it's UB to cast away const/immutable and mutate immutable data.
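The first point, that a const reference is only a read-only view and not a promise that the data never changes, can be shown in a few lines. This is a small sketch; the function name `readThroughConstView` is made up for illustration.

```d
// Returns the value seen through a const view after the underlying
// variable was changed through its original, mutable name.
int readThroughConstView()
{
    int x = 1;
    const(int)* view = &x; // const view of mutable data
    x = 2;                 // legal: no cast, mutation via the mutable name
    return *view;          // the const view observes the new value
}

void main()
{
    assert(readThroughConstView() == 2);
}
```

No cast is involved here, so this part is uncontroversial. The disagreement in the thread is only about writing through a pointer obtained by casting const away.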
What if you KNOW that the underlying data is mutable (as in this case)? And in this case actually, the code that knows the data is mutable is not actually mutating, but just restoring the mutable attribute via a cast. Really, what you and @aG0aep6G are asking is that the compiler be intelligent enough to prove that one variable can only be accessed through const references, but to willfully ignore the casting of such a variable. I think that's quite a silly proposition. IMO, the cast should disable any such optimizations. The one case where I think this is a problem is if a pure function can accept a const reference, with no other mutable references that could possibly lead back to that const reference, casts away that const and modifies. Other than that, I think it's difficult to imagine a compiler that is so highly intelligent in flow analysis, but ignores one very important aspect of it. I ask again, is there any part of the spec that says you cannot cast away const and modify if the underlying data is mutable? If not, I can't see how this is even worth discussing. The overwhelmingly logical choice here is to let the compiler prove the implementation is valid for const, and then do the casting outside where you know the cast is valid. |
Even if it is smart enough to not make assumptions around the point where the cast is made (which I would hope that it would be that smart), as soon as other function calls get involved, you're asking it to be that smart across function boundaries, and optimizations don't generally work that way.
I don't know what the spec says, but it has come up before - including in discussions that included Walter - that the compiler can make assumptions based on
If Now, whether the compiler currently does much based on that knowledge, I don't know. But I'm 100% certain that casting away |
I don't think this changes anything. If the compiler is allowed to look at all the code, it's the same situation as before, isn't it? My take on it is that once you've gone const you can't go back to mutable (from the const reference) and mutate.
It's a silly, unrealistic scenario, yes. But I think a hypothetical, silly compiler like that should be allowed.
You agree that there are cases where casting away const and then mutating is not ok, even when the data is mutable (see the first example in #3501 (comment) and your answer to that). If that's not in the spec, I hope you agree that we should add it. Now, your case is different from that one, but the spec also doesn't say that it's ok to cast away const and then mutate when the data is mutable. And adding that would be bad, given the example above. So we can either go the simple route and disallow casting away const and then mutating. Or we can go a more complex route that allows your use case while forbidding my example. I'd vote for the simple one as it's easier to remember, easier to verify to be correct, and easier to apply correctly.
I can't see how your take on this is the "overwhelmingly logical choice". I think the key point where we're apart is that you think that the compiler should recognize un-const-ing casts and not rely on the const promise accordingly, whereas Jonathan and I think that the const promise still applies and the programmer is then responsible for maintaining it. |
Actually, what I'm saying is exactly how inout should work if it were allowed to be applied here. I think this is simply a forced implementation of inout, and should be allowed. What you have for the conditions is:
If you have this, I think it should be, and is, defined and valid. What is not valid is casting away const (and mutating) while inside the function (which is not aware of the mutability of the data inside the function). |
No. If the compiler can look at (and inline) all the code, it becomes:
auto y = &x;
And it works fine. |
I agree the spec should state that if the compiler can prove that a variable cannot be modified inside a pure function (and it is allowed to assume no casts occur inside the function), then it can assume the variable has not been modified by the function. |
Can you back this up besides anecdotal evidence? I can't see at all why this is UB:
int x;
const int *y = &x;
*(cast(int *)y) = 5;
const only exists in the compiler, it doesn't survive to generated code, the CPU has no concept of it. Any compiler that doesn't just compile the above and set x to 5 is wrong. |
You've omitted the type changes. With them:
auto y = cast(int*) cast(const int*) &x;
Your position (as I understand it) is that the compiler should either be dumb and not see a const promise in that cast to const, or if it's smart, it must recognize that the promise is taken back by the cast to mutable. You're disallowing silly/ridiculous compilers that would recognize the const promise but not the cast to mutable.
That is, you're restricting the const promise to function parameters.
I'm having trouble understanding these conditions, but I guess it comes down to restricting the const promise to function parameters as above. I don't think const is defined that way. It may be actionable to do so. That is, allow casting away const and then mutating as long as you don't cross function boundaries. Or something more precise along these lines. I can't say that I understand all the implications that may have, and I think it's significantly more complex (and therefore prone to be subtly wrong) than simply saying "don't cast away const and then mutate". So I'm not a fan. I understand that the spec doesn't actually say "don't cast away const and then mutate" either. All it says (which I could find) is "you can't mutate through a const reference". I think we may have reached a point in the discussion where everyone understands the other side's points, yet neither of us has been convinced by them. Shall we take it to a broader audience? I made a thread on the forum: |
I would point out that regardless of the issues of whether casting away |
That's what this PR does. |
Yes. But I don't think that we should be doing even that without solving the more general problem, because how that is solved could affect how this should be done. |
Not really, although I'd love to solve the general problem. I think whatever we came up with would allow a drop-in replacement. All that happens is magically mutable ranges can now be implicitly cast to const ranges. We can leave the |
Are you guys sure the current language is unable to express a const member function that returns a tail-const mutable range? I didn't look too closely at the details, but I remember writing similar code before, where it's just a matter of explicitly specifying the types of your iteration pointers so that they are tail-const, which the compiler allows assigning const pointers to without casting:
|
Yeah, you can. The issue is, we need inout to run the algorithm (so it goes back to the constancy of |
@quickfur I didn't read your code sample closely enough. What you wrote will not work, because you cannot make a struct with an inout member. You can make const or immutable versions of the range, but they won't implicitly cast right, which is what this PR is about. |
@jmdavis @aG0aep6G I concede the UB argument. It's clear that we could make certain cases of behavior defined, but the more I think about it, the more this seems like a horrible hack fix to cover over the lack of tail-const. We can fix this anyway with @aG0aep6G's suggestion of copying the implementation to const/mutable/immutable for all 3 functions ( So essentially (and I think you can just do this without changing any code, it's already done for
inout(RBNode)* _firstGreater(Elem e) inout
inout(RBNode)* _firstGreaterEqual(Elem e) inout
And then your implementations for all three constancies of each higher-level function can be (mostly) identical. It's repeated code, but you can't template based on the constancy of Sorry for the friction. |
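A rough sketch of that layout follows: one inout helper holds the search, and three near-identical one-line wrappers differ only in the range type they return. A sorted singly linked list stands in for the red-black tree, and `SortedList`, `NodeRange`, `insertFront`, and `firstGreater` are illustrative names, not the names used in std.container.rbtree.

```d
struct Node
{
    int value;
    Node* next;
}

// Range over nodes of qualified type N: Node, const(Node) or immutable(Node).
struct NodeRange(N)
{
    N* cur;
    N* end;

    @property bool empty() const { return cur is end; }
    @property auto front() const { return cur.value; }
    void popFront() { cur = cur.next; }
}

final class SortedList
{
    private Node* head;

    // Caller must insert in descending order to keep the list sorted.
    void insertFront(int v) { head = new Node(v, head); }

    // The search is written once; inout carries the caller's qualifier
    // through to the returned pointer, so no cast is needed.
    private inout(Node)* firstGreater(int v) inout
    {
        inout(Node)* n = head;
        while (n !is null && n.value <= v)
            n = n.next;
        return n;
    }

    // Three one-liners, identical except for the range type.
    NodeRange!Node upperBound(int v)
    {
        return NodeRange!Node(firstGreater(v), null);
    }

    NodeRange!(const Node) upperBound(int v) const
    {
        return NodeRange!(const Node)(firstGreater(v), null);
    }

    NodeRange!(immutable Node) upperBound(int v) immutable
    {
        return NodeRange!(immutable Node)(firstGreater(v), null);
    }
}

void main()
{
    auto list = new SortedList;
    list.insertFront(3);
    list.insertFront(2);
    list.insertFront(1);

    const SortedList view = list;
    auto r = view.upperBound(1);
    static assert(is(typeof(r) == NodeRange!(const Node)));
    assert(!r.empty && r.front == 2);
}
```

The duplication is limited to the wrappers. The part that could hide a bug, the traversal, exists once and is checked by the compiler for all three qualifiers.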
8aaa48e
to
edaa65a
Compare
Unnecessary castings are eliminated. Current implementation is similar to this one that @schveiguy suggested above. |
  {
      // can't use _find, because we cannot return null
      auto cur = _end.left;
-     auto result = _end;
+     PointerTarget!(typeof(_end))* result = _end;
That type is rather ugly. How about just inout(RBNode)*?
Sure, it is much better. I missed it.
We could get rid of the duplication with template this parameters:
private alias QualifiedRange(This : immutable RedBlackTree) = ImmutableRange;
private alias QualifiedRange(This : const RedBlackTree) = ConstRange;
private alias QualifiedRange(This : RedBlackTree) = Range;
auto opSlice(this This)()
{
    return QualifiedRange!This(_begin, _end);
}
/* ... analogous with upperBound, lowerBound, equalRange ... */
This would turn those methods into templates, meaning they're not virtual. But the class is final, so none of its methods are virtual anyway. There may be other downsides I fail to see right now. I don't think the duplication is too bad, though. So this looks good to me either way. (Aside from the PointerTarget ugliness mentioned above.) |
Thank you, I fixed the PointerTarget ugliness. |
Nice. LGTM. |
One more thing: When we did this for I.e., before:
class RedBlackTree(T)
{
    struct RangeT(V, N) {}
}
after:
class RedBlackTree(T) {}
struct RangeT(V, N) {} |
This will do more than just reduce the name bloat, because RedBlackTree has some additional parameters. For example, the following 2 ranges could theoretically use the same external range type:
RedBlackTree!int rbt1;
RedBlackTree!(int, "a > b") rbt2;
Since the comparison is only used for insertion and searching, not iteration. I noticed this given the new possible improvement -- why does Also, don't call it |
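The sharing argument can be checked at compile time with a stripped-down model. In this sketch `Tree`, `RBNode`, and `NestedRange` are placeholders for illustration, and `RBRange` is modeled only as far as its type parameters matter.

```d
// Node and range types declared outside the container: they depend
// only on the element type, not on the comparison.
struct RBNode(T)
{
    T value;
    RBNode* left, right, parent;
}

struct RBRange(N)
{
    N* begin;
    N* end;
}

final class Tree(T, alias less = "a < b")
{
    alias Node = RBNode!T;
    alias Range = RBRange!Node;
    alias ConstRange = RBRange!(const Node);

    // For contrast: a range type declared inside the class template
    // is a distinct type for every (T, less) combination.
    struct NestedRange
    {
        Node* begin;
        Node* end;
    }
}

// The external range type is shared across comparators...
static assert(is(Tree!(int).Range == Tree!(int, "a > b").Range));
// ...while the nested one is not.
static assert(!is(Tree!(int).NestedRange == Tree!(int, "a > b").NestedRange));

void main() {}
```

Besides shorter symbol names, the shared type means code that accepts a range from one tree instantiation also accepts ranges from trees with a different ordering.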
dc474dd
to
fb364b2
Compare
I moved |
Actually, I believe this is a strange limitation of the current inout that I never completely understood. To me it is obvious that with templates on T you invariably end up with scope-level inout variables, yet this is not implemented. It should be fairly trivial to treat inout like const in every aspect at the scope level, shouldn't it? I mean, a scope-level variable is not returned anywhere, so we can even ignore the exact modifier that inout implies there. |
Oh and on topic - LGTM |
  }
  else
  {
      // no sense in doing a full search, no duplicates are allowed,
      // so we just get the next node.
-     return Range(beg, beg.next);
+     return tuple(beg, beg.next);
I think you can redo this function to avoid the additional template machinery below:
auto equalRange(this This)(Elem e)
{
    alias RangeType = RBRange!(typeof(*_begin)*);
    ...
    return RangeType(beg, _firstGreater(e));
    ... // etc
}
@Groterik ping reviewers after pushing new stuff; there is no "source changed" notification from GitHub. |
LGTM - @schveiguy ? |
Auto-merge toggled on |
Const opSlice, upperBound, lowerBound and equalRange for rbtree
Const versions of opSlice, upperBound, lowerBound and equalRange functions are added. They return ConstRange that is a range of const type elements.