Skip to content
This repository has been archived by the owner on Jun 20, 2019. It is now read-only.

Fix Bug 286: Returning AA value whose key is a RefCounted type gets wrong value. #640

Merged
merged 1 commit into from Mar 18, 2018

Conversation

ibuclaw
Copy link
Member

@ibuclaw ibuclaw commented Mar 11, 2018

Though waiting for the first test result, as this code-gen rewrite was added for a reason, but the test case is hidden somewhere in #571.

Copy link
Contributor

@jpf91 jpf91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but:

  • No Changelog?
  • Testcase is not completely reduced. Is it possible the test case may no longer test this issue correctly if phobos changes?
  • Could you summarize what this changes exactly? In which cases do we need to stabilize the LHS, in which should we not do that?

@ibuclaw
Copy link
Member Author

ibuclaw commented Mar 16, 2018

  1. Yep, I'll add it in. Just wanted opinion first (also makes it easy to rebase as changelog entries always conflict).

  2. Well I hope phobos doesn't change, as its only importing a primitive and a constant. :-)

I'll see if it could be reduced further but I chose to be as close to the original code where the problem was found.

  1. The bug here is this line.
return (matcherCache[set] = S286(set));

Where set is a recounted type and having the codegen rewrite in stabilize_expr meant that the side effect was not saved. And so matcherCache was accessed twice by two different keys - first for assignment and second for return.

@ibuclaw
Copy link
Member Author

ibuclaw commented Mar 16, 2018

I'll open a bug report anyway about these routines, as they could be better organized as to what should handle performing an unary operation on a comprex expression and what should remove all side effects. Currently there's some conflation going on between the two that was quickly done to support struct types that are never copied to temp.

Preferably we'll agree something this weekend as its affecting dub in ubuntu 18.04, and so would like it in before the freeze.

@jpf91
Copy link
Contributor

jpf91 commented Mar 17, 2018

Well I hope phobos doesn't change, as its only importing a primitive and a constant. :-) I'll see if it could be reduced further but I chose to be as close to the original code where the problem was found.

I understand the motivation, but what if CodePointSet gets changed into a non-RC struct some time? Then the test would silently test something different.

And so matcherCache was accessed twice by two different keys

So the key differs depending on the current reference count? Isn't that already a phobos bug by itself?

Preferably we'll agree something this weekend as its affecting dub in ubuntu 18.04, and so would like it in before the freeze.

OK. The current solution is fine by me, feel free to merge at any time. I'm just trying to understand this in detail to ensure the fix really is general enough to fix similar problems.

So if we have (a = b) += c then we need to explicitly rewrite to a = b + c, a in binop_assignment and we can't do that in stabilize_expr because then return (matcherCache[set] = S286(set)) gets rewritten to (matcherCache[set] = S286(set), matcherCache[set]) which evaluates side effect accessing set twice? I think I kinda understand this now, but the whole stabilize_expr/reference stuff is still quite a mystery to me.

So we do we need that special case for (a = b) += c, is this simply because the backend does not handle such assignment chains? In other words, is there any possibility there are other code paths where we have to do something similar and our tests just don't catch it right now, or is this approach completely specific to binop_assignment?

@ibuclaw
Copy link
Member Author

ibuclaw commented Mar 17, 2018

So the key differs depending on the current reference count? Isn't that already a phobos bug by itself?

It's a bug on our side because.

return (matcherCache[set] = S286(set));

Should lower to (pseudo-code).

tmp = set.postblit(); // increments ref count.
slot = matcherCache[set];
*slot = S286(tmp);
tmp.dtor();  // decrements ref count.
return *slot;

Whereas we were instead generating.

tmp = set.postblit(); // increments ref count.
slot = matcherCache[set];
*slot = S286(tmp);
tmp.dtor();  // decrements ref count.
return matcherCache[set];

The rewrite erroneously removed the side effect, so slot was never saved for re-use in the return, we just called _aaGetY again.

@ibuclaw
Copy link
Member Author

ibuclaw commented Mar 17, 2018

I think I kinda understand this now, but the whole stabilize_expr/reference stuff is still quite a mystery to me.

Yes, it's a bit of a mess, in part related to just getting rvo working. I think stabilize_expr should really force a temporary, returning a TARGET_EXPR slot should do.

Then what the function is currently used for renamed to something like complex_unary_op where it generically handles any operation passed to it.

e.g:

// (a, b, c)  ->  (a, b, &c);
exp = complex_unary_op(ADDR_EXPR, exp);
// (a, b, c)  ->  (a, b, *c);
exp = complex_unary_op(INDIRECT_REF, exp);

Then go though the painful process of splitting out what really needs a side-effect stabilized, and what just wants to apply an operation to the relevant part of the expression.

@ibuclaw
Copy link
Member Author

ibuclaw commented Mar 17, 2018

this approach completely specific to binop_assignment?

This approach is specific to binop_assignment, putting it in as a generic rewrite was a mistake. But I think things could be simplified as per previous comment.

@jpf91
Copy link
Contributor

jpf91 commented Mar 17, 2018

So the key differs depending on the current reference count? Isn't that already a phobos bug by itself?

It's a bug on our side because.
The rewrite erroneously removed the side effect, so slot was never saved for re-use in the return, we just called _aaGetY again.

I get that part and that's a bug in GDC for sure.
But why does matcherCache[set] even return different values? As far as I understand only the reference count of set is different, but it still isn't 0 in the second call, correct? So we pass two valid instances to _aaGetY with the only difference being the reference count value. But I don't think the AA key hash should take the reference count into account, that does not make any sense and seems like a possible source of bugs. So the set type should probably overload the toHash function.

This approach is specific to binop_assignment, putting it in as a generic rewrite was a mistake. But I think things could be simplified as per previous comment.

Ok thanks for explaining. So this fix should really be fine for now.

@ibuclaw
Copy link
Member Author

ibuclaw commented Mar 18, 2018

But why does matcherCache[set] even return different values? As far as I understand only the reference count of set is different, but it still isn't 0 in the second call, correct? So we pass two valid instances to _aaGetY with the only difference being the reference count value. But I don't think the AA key hash should take the reference count into account, that does not make any sense and seems like a possible source of bugs. So the set type should probably overload the toHash function.

Just to clarify, this is the actual generated code in release mode.

__aaval2414 = {.filter={}, .this=0B};
__fieldPostblit (&(__copytmp2412 = *set));
*__ctor (&__aaval2414, (struct InversionList &) &__copytmp2412);
return <retval> = *(struct CharMatcher *) _aaGetY (&matcherCache, &_D99TypeInfo_HS3std3uni38__T13InversionListTS3std3uni8GcPolicyZ13InversionListS3reg4mainFZ11CharMatcher6__initZ, 24, (struct InversionList *) set) = __aaval2414;

As it's release mode there is no SAVE_EXPR generated for the call to _aaGetY (bounds checking requires multiple evaluation).

And because all the rewrite does is look for e1 = e2 and returns e1, with disregard to whether or not e1 has a side effect, e1 ends up being evaluated multiple times.

And so the bad codegen does:

<retval> = *_aaGetY(set) = aavalue;
// -- rewritten to -> 
*_aaGetY(set) = aavalue;
<retval> = *_aaGetY(set);

In the first call of _aaGetY(set), the value of set is [..., 1], in the second call, [..., 2].

Why does it increment inside every call to _aaGetY? Because the key has a postblit.
https://github.com/D-Programming-GDC/GDC/blob/31c6fcdd6bcd2e68a7ee5f4d48622238a38e4b2b/libphobos/libdruntime/rt/aaA.d#L397-L403

@ibuclaw ibuclaw force-pushed the brokenregex branch 2 times, most recently from 88736f4 to a1c4b11 Compare March 18, 2018 12:38
@ibuclaw
Copy link
Member Author

ibuclaw commented Mar 18, 2018

OK, replaced test with one that doesn't depend on phobos.

@jpf91
Copy link
Contributor

jpf91 commented Mar 18, 2018

Just to clarify, this is the actual generated code in release mode [...]

Thanks for explaining again in detail. I understand that part, but I question that the reference count should even be considered when treating a CodepointSet as a key for AAs. This causes other problems, such as this: https://run.dlang.io/is/ubKJT5 Don't you think that's a bug worth filing against phobos?

OK, replaced test with one that doesn't depend on phobos.

Looks good, this should be ready for merging 👍

@ibuclaw ibuclaw merged commit faab4ac into D-Programming-GDC:master Mar 18, 2018
@ibuclaw ibuclaw deleted the brokenregex branch March 18, 2018 21:48
@ibuclaw
Copy link
Member Author

ibuclaw commented Mar 18, 2018

@jpf91 it may be worth asking about. There might be some explanation for it that I can't think of.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants