Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check call has one predecessor before considering eliminating guard #6619

Conversation

hzongaro
Copy link
Member

@hzongaro hzongaro commented Jul 22, 2022

When applying constraints for a non-overridden call guard, constrainIfcmpeqne looks at the guarded call node to see whether the guard can be eliminated. However, if the call block has more than one predecessor, not all definitions that reach the call block might have been visited on the current walk. That might result in incorrect constraints being applied to the references on the call because of the incomplete information.

This change adds an additional check of whether the block containing the guarded call has only one predecessor before checking whether the guard can be eliminated. That is a simple way of ensuring all the predecessors of the guarded call must already have been walked.

This change fixes eclipse-openj9/openj9#14489 and eclipse-openj9/openj9#15311

@hzongaro
Copy link
Member Author

Devin @jdmpapin, may I ask you to review this change? I won't ask Vijay to review, as he advised me on how I should try to fix the problem.

Vijay @vijaysun-omr, FYI.

@hzongaro hzongaro requested review from jdmpapin and removed request for vijaysun-omr July 22, 2022 18:05
@hzongaro
Copy link
Member Author

Jenkins build all

@hzongaro
Copy link
Member Author

Jenkins build plinux,zos

@hzongaro
Copy link
Member Author

Jenkins build plinux

@jdmpapin
Copy link
Contributor

jdmpapin commented Jul 27, 2022

Putting into my own words what went wrong in eclipse-openj9/openj9#15311 to make sure that I've properly understood...

There was a cold virtual call block with multiple predecessors (probably copies of some original), each with its own nonoverridden guard node. When GVP processes such a guard node, it looks at the call to determine whether the guard can be eliminated, and in particular it finds a constraint for the receiver. But GVP generally expects a node not to be constrained until processing has reached its point of evaluation, and here we are constraining the node before processing has even reached its containing block.

To constrain a load - edit: rather, to 'get the constraint' for a load - we use mergeDefConstraints(), which assumes that each def that hasn't yet been seen can reach the use only via a back-edge. That assumption usually holds because blocks are processed in some topological order ignoring back-edges, so if there is a path from the def to the use that does not include a back-edge, then the def block is processed before the use block (or the def and use are in the same block, with the def appearing first).

Because the nonoverridden guard logic constrains (edit: gets the constraint for) the load out of turn, mergeDefConstraints() gets confused when it encounters unseen defs. It ignores them, expecting their contributions to come from the back-edge constraints, which it proceeds to also ignore because the auto appears to be guaranteed to have been stored since entry into the most recent loop iteration. As a result, it produces a constraint based solely on the defs that have been seen as of the end of the predecessor.

Furthermore, whenever all the defs reaching a use have global constraints, the merged constraint also applies globally to the value number of the use. This is because a use with a single def shares the value number of the def, and a use with multiple defs has a fresh value number, i.e. anything with that value number comes from the same use.

Now to put it all together. While processing the nonoverridden guard in one predecessor, mergeDefConstraints() found only one seen def. That def had a global constraint, and the other defs were (mistakenly) ignored, resulting in a global constraint for the load. Later, GVP processed another nonoverridden guard in a second predecessor, and the same thing happened again, but with a different def, and therefore a different constraint. Because the constraints are both global and both constrain the same value number, GVP intersected them, but they happened to be contradictory, so there was an unexpected intersection failure. The intersection failure was a very noticeable symptom, but each global constraint created this way was already incorrect.

I'm not fully clear on the reason why in the second predecessor VP did not consider the def from the first predecessor to be have been seen, but I assume it's because def constraints are treated like local constraints and propagated along edges.

@vijaysun-omr, @hzongaro, please confirm whether this is the right understanding of the problem.


Then, supposing (optimistically) that I do understand properly, I expect the fix to work (at least when callNode is in the target block, which it might not be, since it looks like getVirtualCallNodeForGuard() follows gotos). When there is only one predecessor, it won't be possible to incorrectly ignore defs reaching from elsewhere simply because there won't be any of those.

I was worried about the case where the target block itself defines the auto before the call (e.g. due to priv arg remat), but I think it will be fine. In that case, the load will have only that def, and since it won't have been seen, there will be no seen defs, and mergeDefConstraints() should just return null, indicating that the load is unconstrained.

However, as it stands I think this will unnecessarily prevent the guard from being remembered in _callNodeToGuardNodes. If it's added to _callNodeToGuardNodes, then constrainCall() might be able to eliminate the guards later, when GVP reaches the call node and is able to properly constrain the receiver.

I also have a few notes about the comment introduced just above the code change.

First, the remat (or similar) case means that we can process all of the predecessors without seeing all definitions that reach from within the same loop iteration.

Second, I don't believe it would be correct to (only) check that all predecessors have been processed. When mergeDefConstraints() checks whether an auto has been stored on all paths, it consults _curDefinedOnAllPaths, which (in this scenario) is accurate at the guard node, rather than at the load itself. I suspect that GVP's notion of which defs have been seen is also (again in this scenario) accurate at the guard node rather than at the load. These are both appropriate even, since we only want to decide whether to eliminate the guard currently under consideration - for that decision it's irrelevant whether the constraint is accurate when coming from a different predecessor. But if that's so (especially w.r.t. the seen defs), then I think it would still be possible to inappropriately create global constraints on the load in the exact way described above. I think that to make it work, we would need a way to suppress the creation of global constraints, but if we had that, we'd be able to check the constraint without regard for other predecessors. If OTOH we want to ensure that all possible paths are taken into account at once, that seems redundant with the logic in constrainCall() using _callNodeToGuardNodes.


Since all the above is so long and can't easily be "resolved" on a point-by-point basis, I'll list my concerns compactly here:

  • My understanding of the problem and the proposed solution (assumed good in the following points)
  • Possible merge points at the targets of gotos that getVirtualCallNodeForGuard() has followed
  • Unnecessary withholding of the guard from _callNodeToGuardNodes
  • The claim about considering all definitions in the code comment
  • Viability/utility of the possible future improvement mentioned in the code comment

@hzongaro
Copy link
Member Author

Thanks for reviewing, Devin @jdmpapin! I'm just poring over your detailed comments and questions.

@jdmpapin
Copy link
Contributor

I was worried about the case where the target block itself defines the auto before the call (e.g. due to priv arg remat), but I think it will be fine. In that case, the load will have only that def, and since it won't have been seen, there will be no seen defs, and mergeDefConstraints() should just return null, indicating that the load is unconstrained.

On second thought, I think it might be possible to get a non-null constraint in this case, when it's the second pass through a block in a loop and there is a store in the target block but not elsewhere, e.g. because of priv arg remat followed by dead store elimination when the inlined body does not use the argument. In that case, there might be a constraint on the back-edge, and if so, I think we would end up with that constraint. I think that would still be okay though, because any such constraint from the back-edge would have originally come from the def constraint created at the same def by VP's first pass over the loop. Constraints created in the first pass are known to hold on every iteration, and when the corresponding constraints differ in the second pass, they should only be strengthened.

@hzongaro
Copy link
Member Author

I'm not fully clear on the reason why in the second predecessor VP did not consider the def from the first predecessor to be have been seen, but I assume it's because def constraints are treated like local constraints and propagated along edges.

Yes - having discussed this off-line with Vijay @vijaysun-omr, while processing the second predecessor, the definition from the first predecessor is considered to have not been seen because that definition was on a path that was not part of the path that leads to the second predecessor.

please confirm whether this is the right understanding of the problem.

Yes, my understanding of the problem is the same as yours. Thank you for describing it so thoroughly and clearly!

However, as it stands I think this will unnecessarily prevent the guard from being remembered in _callNodeToGuardNodes. If it's added to _callNodeToGuardNodes, then constrainCall() might be able to eliminate the guards later, when GVP reaches the call node and is able to properly constrain the receiver.

Just so I'm clear - are you suggesting I should revise this pull request so that, if the number of predecessors for the target block is greater than one, an entry will be added to _callNodeToGuardNodes without calling canFoldNonOverriddenGuard?

I'll revise the comment to account for your insights and observations. Thank you, again!

@jdmpapin
Copy link
Contributor

the definition from the first predecessor is considered to have not been seen because that definition was on a path that was not part of the path that leads to the second predecessor.

👍

Yes, my understanding of the problem is the same as yours. Thank you for describing it so thoroughly and clearly!

👍

are you suggesting I should revise this pull request so that, if the number of predecessors for the target block is greater than one, an entry will be added to _callNodeToGuardNodes without calling canFoldNonOverriddenGuard?

Yes, I think that any time we leave the nonoverridden guard in place, it should be added to _callNodeToGuardNodes, since it might be possible to eliminate it later when we actually process its block. The later removal attempt won't be able to trigger this bug because it won't happen until VP has already constrained the receiver at the usual time, and therefore with the expected state wrt defs

@hzongaro
Copy link
Member Author

hzongaro commented Aug 4, 2022

Devin @jdmpapin, I've addressed your comments in commit 5740eae. Once you're OK with the changes, I will squash the commits.

@jdmpapin
Copy link
Contributor

jdmpapin commented Aug 5, 2022

Thanks, that's an improvement! However, a couple of my earlier points are still outstanding. In particular:

  1. I think that the comment is still misleading. In the abstract, to determine whether a guard G can be eliminated, defs of the receiver that reach only via other predecessors are irrelevant. When we get to the call from G, the receiver is guaranteed to come from a def that reaches G, or from one that occurs between G and the call. So the problem isn't that this determination needs to take other defs into account. Rather, the problem is that the way we find a constraint based on the defs happens to be able to create a global constraint as a side-effect, and that side-effect is incorrect in this case. So I half retract the part of my description above where I said that it was a mistake for those other defs to be ignored. Additionally, since it's possible for a def to occur in target, the single-predecessor condition doesn't even guarantee that all defs have been seen anyway.
  2. It's probably possible for callNode to be in a block other than target, since the search follows goto. It's hard to say whether we have a way to actually generate such IL, but it seems like it should be perfectly legal. At the least, I expect this situation to be very rare, so it should be fine to just check that it's in target, e.g. by using getVirtualCallTreeForGuard() (note: Tree rather than Node), and then using the tree to determine both callNode and getEnclosingBlock() == target.

The reason to require (2) that callNode is in target is that there could be a merge point in between. This matters partly just because the original situation could arise, but it also matters for another reason related to (1). Earlier I said:

I was worried about the case where the target block itself defines the auto before the call (e.g. due to priv arg remat), but I think it will be fine. In that case, the load will have only that def, [...]

This is the case where a def occurs between G and the call. But if the call is in a block other than target, there's space for a def between G and the call that does not prevent all other defs from reaching the call, e.g.

6619-nonoverridden-guard-example

Here, the call node can still be found as long as target (r=b) and the merge block both end with goto. The target has only one predecessor, and so does the cold call block (so it doesn't suffice to check that one's predecessors instead). Both defs a and b reach the call. At G, a has been seen, but b has not, so we'd ignore b and get the constraint for a, even though when coming from G the receiver is necessarily b.

In a sense, we'd have an incorrect constraint precisely because we considered a def that only reaches the call from elsewhere.

@hzongaro
Copy link
Member Author

hzongaro commented Aug 8, 2022

Thanks, Devin @jdmpapin! I've revised things in commit 98a034a.

Again, once you're OK with things, I can squish the commits down.

@klangman
Copy link
Contributor

klangman commented Aug 8, 2022

In case some other similar issue exists, I was wondering if it would make sense to have the code that removes the dead-code and inserts a 'return' to also generate code that would crash (i.e. store to 0). Allowing the dummy return to execute can have far more sinister results (like unexpected behaviour or incorrect calculations). I know this is not directly related to this fix, but maybe this is an opportunity to introduce a change like this?

@jdmpapin
Copy link
Contributor

jdmpapin commented Aug 8, 2022

@klangman, mustTakeException() is not involved in the bug that this PR fixes. Were you thinking of #6630?

@jdmpapin
Copy link
Contributor

jdmpapin commented Aug 8, 2022

Jenkins build all

@klangman
Copy link
Contributor

klangman commented Aug 8, 2022

@klangman, mustTakeException() is not involved in the bug that this PR fixes. Were you thinking of #6630?

I'll defer to @hzongaro on this (since he looked at my case as well), but I believe the issue I was debugging which Henry confirmed was a duplicate resulted in the insertion of a dummy return. I agree that the link to mustTakeException() is not a direct link. I also don't want to slow down the delivery of this fix.

@jdmpapin
Copy link
Contributor

jdmpapin commented Aug 8, 2022

Oh, I wasn't aware that an interaction had been seen in another duplicate

@jdmpapin
Copy link
Contributor

jdmpapin commented Aug 8, 2022

The ppc64le Linux failure looks like #6571

Jenkins build plinux

@hzongaro
Copy link
Member Author

hzongaro commented Aug 8, 2022

In case some other similar issue exists, I was wondering if it would make sense to have the code that removes the dead-code and inserts a 'return' to also generate code that would crash (i.e. store to 0). Allowing the dummy return to execute can have far more sinister results (like unexpected behaviour or incorrect calculations). I know this is not directly related to this fix, but maybe this is an opportunity to introduce a change like this?

Thanks, Kevin @klangman - that's a very good idea. However, I'm inclined to handle that as a separate improvement as most of the distinct symptoms that are ultimately rooted in this bug do not involve mustTakeException(), and it's possible that other bugs in Value Propagation could trigger incorrect use of mustTakeException(). I would also want to make sure I implement that improvement correctly, and I would be worried about how quickly I could implement it.

@jdmpapin
Copy link
Contributor

jdmpapin commented Aug 8, 2022

Jenkins build plinux

@jdmpapin
Copy link
Contributor

jdmpapin commented Aug 8, 2022

Everything has passed except for the known pp64le Linux failure

@jdmpapin
Copy link
Contributor

jdmpapin commented Aug 8, 2022

@hzongaro, you can squash now

In applying constraints for a non-overridden call guard,
constrainIfcmpeqne looks at the guarded call node to see whether the
guard can be eliminated.  That's done using canFoldNonOverriddenGuard
which calls mergeDefConstrants.  That can have the side effect of
creating global constraints.  However, if there are paths to the call
block that do not go through the block that contains the if the
information used to create those global constraints might be incomplete.

This change adds an additional check of whether the block containing the
guarded call has only one predecessor and whether the block containing
the call is actually the target of the branch before checking whether
the guard can be eliminated.  That avoids the possibility of other merge
points between the block containing the if and the block containing the
call, and hence that definitions from paths that do not include the if
might reach the call.
@hzongaro hzongaro force-pushed the safeguard-mergeDefConstraints-guarded-nonoverridden-call branch from 98a034a to f247823 Compare August 8, 2022 19:29
@hzongaro
Copy link
Member Author

hzongaro commented Aug 8, 2022

you can squash now

Thanks, Devin @jdmpapin - squorshed into commit f247823

@jdmpapin
Copy link
Contributor

jdmpapin commented Aug 8, 2022

Squashing introduced no new changes, and the PR builds ran beforehand

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Eclipse 2021-12 crashes very often after switching from jdk 11 to jdk 17
3 participants