Check call has one predecessor before considering eliminating guard #6619

hzongaro · 2022-07-22T18:02:01Z

When applying constraints for a non-overridden call guard, constrainIfcmpeqne looks at the guarded call node to see whether the guard can be eliminated. However, if the call block has more than one predecessor, not all definitions that reach the call block might have been visited on the current walk. That might result in incorrect constraints being applied to the references on the call because of the incomplete information.

This change adds an additional check of whether the block containing the guarded call has only one predecessor before checking whether the guard can be eliminated. That is a simple way of ensuring all the predecessors of the guarded call must already have been walked.

This change fixes eclipse-openj9/openj9#14489 and eclipse-openj9/openj9#15311

hzongaro · 2022-07-22T18:03:34Z

Devin @jdmpapin, may I ask you to review this change? I won't ask Vijay to review, as he advised me on how I should try to fix the problem.

Vijay @vijaysun-omr, FYI.

hzongaro · 2022-07-22T18:22:15Z

Jenkins build all

hzongaro · 2022-07-23T00:23:30Z

Jenkins build plinux,zos

hzongaro · 2022-07-24T02:33:31Z

Jenkins build plinux

jdmpapin · 2022-07-27T22:57:55Z

Putting into my own words what went wrong in eclipse-openj9/openj9#15311 to make sure that I've properly understood...

There was a cold virtual call block with multiple predecessors (probably copies of some original), each with its own nonoverridden guard node. When GVP processes such a guard node, it looks at the call to determine whether the guard can be eliminated, and in particular it finds a constraint for the receiver. But GVP generally expects a node not to be constrained until processing has reached its point of evaluation, and here we are constraining the node before processing has even reached its containing block.

To constrain a load - edit: rather, to 'get the constraint' for a load - we use mergeDefConstraints(), which assumes that each def that hasn't yet been seen can reach the use only via a back-edge. That assumption usually holds because blocks are processed in some topological order ignoring back-edges, so if there is a path from the def to the use that does not include a back-edge, then the def block is processed before the use block (or the def and use are in the same block, with the def appearing first).

Because the nonoverridden guard logic constrains (edit: gets the constraint for) the load out of turn, mergeDefConstraints() gets confused when it encounters unseen defs. It ignores them, expecting their contributions to come from the back-edge constraints, which it proceeds to also ignore because the auto appears to be guaranteed to have been stored since entry into the most recent loop iteration. As a result, it produces a constraint based solely on the defs that have been seen as of the end of the predecessor.

Furthermore, whenever all the defs reaching a use have global constraints, the merged constraint also applies globally to the value number of the use. This is because a use with a single def shares the value number of the def, and a use with multiple defs has a fresh value number, i.e. anything with that value number comes from the same use.

Now to put it all together. While processing the nonoverridden guard in one predecessor, mergeDefConstraints() found only one seen def. That def had a global constraint, and the other defs were (mistakenly) ignored, resulting in a global constraint for the load. Later, GVP processed another nonoverridden guard in a second predecessor, and the same thing happened again, but with a different def, and therefore a different constraint. Because the constraints are both global and both constrain the same value number, GVP intersected them, but they happened to be contradictory, so there was an unexpected intersection failure. The intersection failure was a very noticeable symptom, but each global constraint created this way was already incorrect.

I'm not fully clear on the reason why in the second predecessor VP did not consider the def from the first predecessor to be have been seen, but I assume it's because def constraints are treated like local constraints and propagated along edges.

@vijaysun-omr, @hzongaro, please confirm whether this is the right understanding of the problem.

Then, supposing (optimistically) that I do understand properly, I expect the fix to work (at least when callNode is in the target block, which it might not be, since it looks like getVirtualCallNodeForGuard() follows gotos). When there is only one predecessor, it won't be possible to incorrectly ignore defs reaching from elsewhere simply because there won't be any of those.

I was worried about the case where the target block itself defines the auto before the call (e.g. due to priv arg remat), but I think it will be fine. In that case, the load will have only that def, and since it won't have been seen, there will be no seen defs, and mergeDefConstraints() should just return null, indicating that the load is unconstrained.

However, as it stands I think this will unnecessarily prevent the guard from being remembered in _callNodeToGuardNodes. If it's added to _callNodeToGuardNodes, then constrainCall() might be able to eliminate the guards later, when GVP reaches the call node and is able to properly constrain the receiver.

I also have a few notes about the comment introduced just above the code change.

First, the remat (or similar) case means that we can process all of the predecessors without seeing all definitions that reach from within the same loop iteration.

Second, I don't believe it would be correct to (only) check that all predecessors have been processed. When mergeDefConstraints() checks whether an auto has been stored on all paths, it consults _curDefinedOnAllPaths, which (in this scenario) is accurate at the guard node, rather than at the load itself. I suspect that GVP's notion of which defs have been seen is also (again in this scenario) accurate at the guard node rather than at the load. These are both appropriate even, since we only want to decide whether to eliminate the guard currently under consideration - for that decision it's irrelevant whether the constraint is accurate when coming from a different predecessor. But if that's so (especially w.r.t. the seen defs), then I think it would still be possible to inappropriately create global constraints on the load in the exact way described above. I think that to make it work, we would need a way to suppress the creation of global constraints, but if we had that, we'd be able to check the constraint without regard for other predecessors. If OTOH we want to ensure that all possible paths are taken into account at once, that seems redundant with the logic in constrainCall() using _callNodeToGuardNodes.

Since all the above is so long and can't easily be "resolved" on a point-by-point basis, I'll list my concerns compactly here:

My understanding of the problem and the proposed solution (assumed good in the following points)
Possible merge points at the targets of gotos that getVirtualCallNodeForGuard() has followed
Unnecessary withholding of the guard from _callNodeToGuardNodes
The claim about considering all definitions in the code comment
Viability/utility of the possible future improvement mentioned in the code comment

hzongaro · 2022-07-28T14:54:47Z

Thanks for reviewing, Devin @jdmpapin! I'm just poring over your detailed comments and questions.

jdmpapin · 2022-07-28T17:48:19Z

I was worried about the case where the target block itself defines the auto before the call (e.g. due to priv arg remat), but I think it will be fine. In that case, the load will have only that def, and since it won't have been seen, there will be no seen defs, and mergeDefConstraints() should just return null, indicating that the load is unconstrained.

On second thought, I think it might be possible to get a non-null constraint in this case, when it's the second pass through a block in a loop and there is a store in the target block but not elsewhere, e.g. because of priv arg remat followed by dead store elimination when the inlined body does not use the argument. In that case, there might be a constraint on the back-edge, and if so, I think we would end up with that constraint. I think that would still be okay though, because any such constraint from the back-edge would have originally come from the def constraint created at the same def by VP's first pass over the loop. Constraints created in the first pass are known to hold on every iteration, and when the corresponding constraints differ in the second pass, they should only be strengthened.

hzongaro · 2022-07-29T19:26:43Z

I'm not fully clear on the reason why in the second predecessor VP did not consider the def from the first predecessor to be have been seen, but I assume it's because def constraints are treated like local constraints and propagated along edges.

Yes - having discussed this off-line with Vijay @vijaysun-omr, while processing the second predecessor, the definition from the first predecessor is considered to have not been seen because that definition was on a path that was not part of the path that leads to the second predecessor.

please confirm whether this is the right understanding of the problem.

Yes, my understanding of the problem is the same as yours. Thank you for describing it so thoroughly and clearly!

However, as it stands I think this will unnecessarily prevent the guard from being remembered in _callNodeToGuardNodes. If it's added to _callNodeToGuardNodes, then constrainCall() might be able to eliminate the guards later, when GVP reaches the call node and is able to properly constrain the receiver.

Just so I'm clear - are you suggesting I should revise this pull request so that, if the number of predecessors for the target block is greater than one, an entry will be added to _callNodeToGuardNodes without calling canFoldNonOverriddenGuard?

I'll revise the comment to account for your insights and observations. Thank you, again!

jdmpapin · 2022-07-29T21:40:43Z

the definition from the first predecessor is considered to have not been seen because that definition was on a path that was not part of the path that leads to the second predecessor.

👍

Yes, my understanding of the problem is the same as yours. Thank you for describing it so thoroughly and clearly!

👍

are you suggesting I should revise this pull request so that, if the number of predecessors for the target block is greater than one, an entry will be added to _callNodeToGuardNodes without calling canFoldNonOverriddenGuard?

Yes, I think that any time we leave the nonoverridden guard in place, it should be added to _callNodeToGuardNodes, since it might be possible to eliminate it later when we actually process its block. The later removal attempt won't be able to trigger this bug because it won't happen until VP has already constrained the receiver at the usual time, and therefore with the expected state wrt defs

hzongaro · 2022-08-04T03:52:47Z

Devin @jdmpapin, I've addressed your comments in commit 5740eae. Once you're OK with the changes, I will squash the commits.

jdmpapin · 2022-08-05T17:17:37Z

Thanks, that's an improvement! However, a couple of my earlier points are still outstanding. In particular:

I think that the comment is still misleading. In the abstract, to determine whether a guard G can be eliminated, defs of the receiver that reach only via other predecessors are irrelevant. When we get to the call from G, the receiver is guaranteed to come from a def that reaches G, or from one that occurs between G and the call. So the problem isn't that this determination needs to take other defs into account. Rather, the problem is that the way we find a constraint based on the defs happens to be able to create a global constraint as a side-effect, and that side-effect is incorrect in this case. So I half retract the part of my description above where I said that it was a mistake for those other defs to be ignored. Additionally, since it's possible for a def to occur in target, the single-predecessor condition doesn't even guarantee that all defs have been seen anyway.
It's probably possible for callNode to be in a block other than target, since the search follows goto. It's hard to say whether we have a way to actually generate such IL, but it seems like it should be perfectly legal. At the least, I expect this situation to be very rare, so it should be fine to just check that it's in target, e.g. by using getVirtualCallTreeForGuard() (note: Tree rather than Node), and then using the tree to determine both callNode and getEnclosingBlock() == target.

The reason to require (2) that callNode is in target is that there could be a merge point in between. This matters partly just because the original situation could arise, but it also matters for another reason related to (1). Earlier I said:

I was worried about the case where the target block itself defines the auto before the call (e.g. due to priv arg remat), but I think it will be fine. In that case, the load will have only that def, [...]

This is the case where a def occurs between G and the call. But if the call is in a block other than target, there's space for a def between G and the call that does not prevent all other defs from reaching the call, e.g.

Here, the call node can still be found as long as target (r=b) and the merge block both end with goto. The target has only one predecessor, and so does the cold call block (so it doesn't suffice to check that one's predecessors instead). Both defs a and b reach the call. At G, a has been seen, but b has not, so we'd ignore b and get the constraint for a, even though when coming from G the receiver is necessarily b.

In a sense, we'd have an incorrect constraint precisely because we considered a def that only reaches the call from elsewhere.

hzongaro · 2022-08-08T00:29:13Z

Thanks, Devin @jdmpapin! I've revised things in commit 98a034a.

Again, once you're OK with things, I can squish the commits down.

klangman · 2022-08-08T14:47:17Z

In case some other similar issue exists, I was wondering if it would make sense to have the code that removes the dead-code and inserts a 'return' to also generate code that would crash (i.e. store to 0). Allowing the dummy return to execute can have far more sinister results (like unexpected behaviour or incorrect calculations). I know this is not directly related to this fix, but maybe this is an opportunity to introduce a change like this?

jdmpapin · 2022-08-08T15:49:39Z

@klangman, mustTakeException() is not involved in the bug that this PR fixes. Were you thinking of #6630?

jdmpapin · 2022-08-08T15:55:40Z

Jenkins build all

klangman · 2022-08-08T15:58:21Z

@klangman, mustTakeException() is not involved in the bug that this PR fixes. Were you thinking of #6630?

I'll defer to @hzongaro on this (since he looked at my case as well), but I believe the issue I was debugging which Henry confirmed was a duplicate resulted in the insertion of a dummy return. I agree that the link to mustTakeException() is not a direct link. I also don't want to slow down the delivery of this fix.

jdmpapin · 2022-08-08T16:12:31Z

Oh, I wasn't aware that an interaction had been seen in another duplicate

jdmpapin · 2022-08-08T16:19:22Z

The ppc64le Linux failure looks like #6571

Jenkins build plinux

hzongaro · 2022-08-08T17:05:41Z

In case some other similar issue exists, I was wondering if it would make sense to have the code that removes the dead-code and inserts a 'return' to also generate code that would crash (i.e. store to 0). Allowing the dummy return to execute can have far more sinister results (like unexpected behaviour or incorrect calculations). I know this is not directly related to this fix, but maybe this is an opportunity to introduce a change like this?

Thanks, Kevin @klangman - that's a very good idea. However, I'm inclined to handle that as a separate improvement as most of the distinct symptoms that are ultimately rooted in this bug do not involve mustTakeException(), and it's possible that other bugs in Value Propagation could trigger incorrect use of mustTakeException(). I would also want to make sure I implement that improvement correctly, and I would be worried about how quickly I could implement it.

jdmpapin · 2022-08-08T18:30:37Z

Jenkins build plinux

jdmpapin · 2022-08-08T19:00:37Z

Everything has passed except for the known pp64le Linux failure

jdmpapin · 2022-08-08T19:01:25Z

@hzongaro, you can squash now

In applying constraints for a non-overridden call guard, constrainIfcmpeqne looks at the guarded call node to see whether the guard can be eliminated. That's done using canFoldNonOverriddenGuard which calls mergeDefConstrants. That can have the side effect of creating global constraints. However, if there are paths to the call block that do not go through the block that contains the if the information used to create those global constraints might be incomplete. This change adds an additional check of whether the block containing the guarded call has only one predecessor and whether the block containing the call is actually the target of the branch before checking whether the guard can be eliminated. That avoids the possibility of other merge points between the block containing the if and the block containing the call, and hence that definitions from paths that do not include the if might reach the call.

hzongaro · 2022-08-08T19:31:22Z

you can squash now

Thanks, Devin @jdmpapin - squorshed into commit f247823

jdmpapin · 2022-08-08T20:22:55Z

Squashing introduced no new changes, and the PR builds ran beforehand

hzongaro requested a review from vijaysun-omr as a code owner July 22, 2022 18:02

hzongaro requested review from jdmpapin and removed request for vijaysun-omr July 22, 2022 18:05

hzongaro added the comp:compiler label Jul 22, 2022

jdmpapin approved these changes Aug 8, 2022

View reviewed changes

hzongaro force-pushed the safeguard-mergeDefConstraints-guarded-nonoverridden-call branch from 98a034a to f247823 Compare August 8, 2022 19:29

jdmpapin merged commit 7fd8722 into eclipse:master Aug 8, 2022

hzongaro mentioned this pull request Aug 8, 2022

JDK8 Segmentation error: vmState=0x000514ff eclipse-openj9/openj9#15311

Closed

hzongaro deleted the safeguard-mergeDefConstraints-guarded-nonoverridden-call branch August 9, 2022 15:03

0xdaryl mentioned this pull request Aug 9, 2022

Prevent addition of null global VP constraint #6356

Closed

hzongaro mentioned this pull request Jul 25, 2023

Assertion failure at omr/compiler/optimizer/OMRValuePropagation.cpp:659: c when PROD_WITH_ASSUMES is enabled eclipse-openj9/openj9#15458

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check call has one predecessor before considering eliminating guard #6619

Check call has one predecessor before considering eliminating guard #6619

hzongaro commented Jul 22, 2022 •

edited

hzongaro commented Jul 22, 2022

hzongaro commented Jul 22, 2022

hzongaro commented Jul 23, 2022

hzongaro commented Jul 24, 2022

jdmpapin commented Jul 27, 2022 •

edited

hzongaro commented Jul 28, 2022

jdmpapin commented Jul 28, 2022

hzongaro commented Jul 29, 2022

jdmpapin commented Jul 29, 2022

hzongaro commented Aug 4, 2022

jdmpapin commented Aug 5, 2022 •

edited

hzongaro commented Aug 8, 2022

klangman commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

klangman commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

hzongaro commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

hzongaro commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

Check call has one predecessor before considering eliminating guard #6619

Check call has one predecessor before considering eliminating guard #6619

Conversation

hzongaro commented Jul 22, 2022 • edited

hzongaro commented Jul 22, 2022

hzongaro commented Jul 22, 2022

hzongaro commented Jul 23, 2022

hzongaro commented Jul 24, 2022

jdmpapin commented Jul 27, 2022 • edited

hzongaro commented Jul 28, 2022

jdmpapin commented Jul 28, 2022

hzongaro commented Jul 29, 2022

jdmpapin commented Jul 29, 2022

hzongaro commented Aug 4, 2022

jdmpapin commented Aug 5, 2022 • edited

hzongaro commented Aug 8, 2022

klangman commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

klangman commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

hzongaro commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

hzongaro commented Aug 8, 2022

jdmpapin commented Aug 8, 2022

hzongaro commented Jul 22, 2022 •

edited

jdmpapin commented Jul 27, 2022 •

edited

jdmpapin commented Aug 5, 2022 •

edited