JIT: make global morph its own phase, morph blocks in RPO #86822

AndyAyersMS · 2023-05-26T23:05:44Z

Some refactoring in antcipation of enabling cross-block local assertion prop during global morph.

Process blocks in RPO. Try and verify that no newly added blocks can alter the set of assertions that flow into the pre-existing ones.

Some refactoring in antcipation of enabling cross-block local assertion prop during global morph. Process blocks in RPO. Try and verify that no newly added blocks can alter the set of assertions that flow into the pre-existing ones.

ghost · 2023-05-26T23:05:54Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Some refactoring in antcipation of enabling cross-block local assertion prop during global morph.

Process blocks in RPO. Try and verify that no newly added blocks can alter the set of assertions that flow into the pre-existing ones.

Author:	AndyAyersMS
Assignees:	AndyAyersMS
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2023-05-26T23:06:56Z

A bunch of mostly neutral diffs, looks mostly like frame offset churn because temps are created in different orders (which reminds me, we should try and sort locals based on weight or something).

No new functionality actually enabled yet.

AndyAyersMS · 2023-05-26T23:07:29Z

@jakobbotsch PTAL
cc @dotnet/jit-contrib

AndyAyersMS · 2023-05-27T01:18:10Z

Not surprisingly x86 has some special behavior here I'll have to chase down. The assert I added to try and verify that morph only makes "harmless" edits to flow is firing. I suspect it suffers from both false positives and false negatives.

The main purpose behind the assert is to try and validate that any cross-block AP we might do on the blocks that exist at the start of morph (not yet happening here, but in some future PR) remain valid even if new blocks get introduced and flow is altered during morph.

So far the edits I have seen are the following:

Simplify a conditional branch. Main impact is on the (former) target of the branch, it will now have fewer preds. For forward branches (in RPO) this may let us propagate a cleaner set of assertions; for backwards branches we've already processed the block with a too-conservative set, so we're ok but sub-optimal (such is the nature of a one pass flow algorithm).
Add a single throw block, and possibly rewire a fallthrough, and remove outbound flow from a now throwing block. In addition to effects similar to the case above, this will introduce a block with no successors (so also doesn't alter forward flow and is fine to visit any time after the initial block set is processed) or an empty block that just jumps to the former fall through (has no impact on forward prop, and no impact from forward prop, so can be safely visited "out of order").
Change a tail call to branch back to method entry, possibly also introducing an empty scratch bb. There should be / will be no assertions live into the entry block, so altering the flow into it or nearby should be harmless.
For x86 ( and maybe elsewhere) we are invoking fgCreateGCPoll in morph which can introduce control flow. Again it should be harmless to process these blocks out of RPO order but I currently have no way of tracking why new blocks appear and don't want to have to parse their contents to figure out if they are harmless.

Another wrinkle we might consider here is to aggressively prune (or at least mark) unreachable blocks rather than morph them, doing so will also start to trim down the pred sets for blocks and have similar impact as the above. In fact we could do a dynamic evolution of the RPO by turning this into a worklist driven traversal (similar to how the importer runs).

AndyAyersMS · 2023-05-27T02:20:09Z

TP Impact is currently a bit high, but the hope is we can recoup that by reducing the volume of IR leaving morph., at least when optimizations are enabled, or else justify tje TP hit with better CQ.

But likely we need to iterate in normal block order if we're not optimizing since RPO will offer no benefit.

AndyAyersMS · 2023-05-27T02:28:56Z

For the time being we can relax or remove the assert, avoid using RPO unless optimized, and look for better ways to track the set of introduced blocks (so we don't have to search the bblist for them later).

EgorBo · 2023-05-27T14:42:54Z

The diffs are likely just not representative due to too many of missing contexts (MISSED contexts: base: 313,570 (25.86%), diff: 313,581 (25.86%) I've kicked off a new collection

AndyAyersMS · 2023-05-27T21:31:14Z

Locally I see fewer missed contexts (or perhaps just am not paying attention).

At any rate not doing RPO unless we're optimizing knocked down the diffs considerably and should remove the minopts TP impact. I also optimized the new block visits for the optimized case and gave up (for now) trying to detect if the flow alterations and possible code motion could pose problems later on.

I am now seeing diffs where there is an order dependence for block morphing. Block morph is sensitive to a local var's DNER state, and this gets set during global morph as needed, so depending on the relative ordering of the block morph and DNER setting we can get different expansions. I don't think it is a correctness issue since we already have this order dependence now.

jakobbotsch · 2023-05-28T16:17:02Z

src/coreclr/jit/morph.cpp

+            for (BasicBlock* const block : *fgNewBBs)
            {
-                fgMergeBlockReturn(block);
+                fgMorphBlock(block);
            }


Is this a significant chunk of throughput cost? What is the improvement of this tracking mechanism alone?

jakobbotsch · 2023-05-28T16:24:00Z

src/coreclr/jit/morph.cpp

+        fgRenumberBlocks();
+        EnsureBasicBlockEpoch();
+        fgComputeEnterBlocksSet();
+        fgDfsReversePostorder();


I'm curious why we need to do so much work to set up for the RPO traversal. SSA's TopologicalSort does not do any of this work; it simply starts at fgFirstBB (well, the BB root, but then the original fgFirstBB very soon after that). Would it be incorrect to do the same here?

Another idea: can we avoid the two step iteration process, and instead have a version of fgDfsReversePostorder that just invokes a callback instead of recording the order into ambient state? Would it make any difference? I'm not totally sure where the throughput cost comes from in this change.

I'm curious why we need to do so much work to set up for the RPO traversa

We can try and do all this on the fly, that's more or less what I was suggesting above:

we could do a dynamic evolution of the RPO by turning this into a worklist driven traversal (similar to how the importer runs).

I'm not totally sure where the throughput cost comes from in this change.

I don't understand it either, seems like this latest version should (aside from building the RPO) be fairly efficient. Will have to take a look.

Another idea: can we avoid the two step iteration process, and instead have a version of fgDfsReversePostorder that just invokes a callback instead of recording the order into ambient state? Would it make any difference? I'm not totally sure where the throughput cost comes from in this change.

I suspect this is tricky for phases like morph that also can alter the flow graph.

AndyAyersMS · 2023-07-07T18:13:23Z

Note value numbering (via ValueNumberState) does a sort of on the fly RPO to try and maximize forward propagation of VNs. I don't think this approach would be usable in Morph because of the modifications morph can make to the flow graph, and I actually wonder if VN might be better off with a precomputed schedule or callback style processing during a DFS.

AndyAyersMS · 2023-07-07T22:33:40Z

Via pin analysis it looks like the computation of the RPO is the main culprit for the ~0.25% TP increase.

It doesn't seem like it would be substantially cheaper to try and do this on the fly, we still have to enumerate each block's successors which seems like the bulk of the cost here, and which morph currently doesn't need to do.

Presumably assertion state maintenance will add another layer of TP impact, but then the additional assertions could simplify morphing and so recover some of this. Hard to know how much without actually doing the work, but at that point we'd also potentially be seeing CQ wins so could possibly justify a modest TP increase.

AndyAyersMS · 2023-07-13T23:50:14Z

Not going to happen in .NET 8, so will close for now.

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 26, 2023

ghost assigned AndyAyersMS May 26, 2023

AndyAyersMS requested a review from jakobbotsch May 26, 2023 23:07

build-analysis bot mentioned this pull request May 27, 2023

Assert failure in GC/API/NoGCRegion/Callback_Svr test #86612

Closed

AndyAyersMS added 2 commits May 27, 2023 08:40

only do RPO when optimizing to mitigate TP impact

9ba3db0

track new blocks

4467a67

jakobbotsch reviewed May 28, 2023

View reviewed changes

This was referenced May 29, 2023

Infra improvements for Helix #68176

Closed

Methodical_others test JIT/Methodical/Coverage/copy_prop_byref_to_native_int crashing #69832

Open

Long Running Test: Interop/MonoAPI/MonoMono/PInvokeDetach/PInvokeDetach.sh #73040

Closed

AndyAyersMS mentioned this pull request Jun 6, 2023

JIT: Constant is not propagated through a struct #87072

Closed

AndyAyersMS closed this Jul 13, 2023

ghost locked as resolved and limited conversation to collaborators Aug 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: make global morph its own phase, morph blocks in RPO #86822

JIT: make global morph its own phase, morph blocks in RPO #86822

AndyAyersMS commented May 26, 2023

ghost commented May 26, 2023

AndyAyersMS commented May 26, 2023

AndyAyersMS commented May 26, 2023

AndyAyersMS commented May 27, 2023 •

edited

Loading

AndyAyersMS commented May 27, 2023

AndyAyersMS commented May 27, 2023

EgorBo commented May 27, 2023

AndyAyersMS commented May 27, 2023

jakobbotsch May 28, 2023

jakobbotsch May 28, 2023

AndyAyersMS May 29, 2023

AndyAyersMS Jul 7, 2023

AndyAyersMS commented Jul 7, 2023

AndyAyersMS commented Jul 7, 2023

AndyAyersMS commented Jul 13, 2023

JIT: make global morph its own phase, morph blocks in RPO #86822

JIT: make global morph its own phase, morph blocks in RPO #86822

Conversation

AndyAyersMS commented May 26, 2023

ghost commented May 26, 2023

AndyAyersMS commented May 26, 2023

AndyAyersMS commented May 26, 2023

AndyAyersMS commented May 27, 2023 • edited Loading

AndyAyersMS commented May 27, 2023

AndyAyersMS commented May 27, 2023

EgorBo commented May 27, 2023

AndyAyersMS commented May 27, 2023

jakobbotsch May 28, 2023

Choose a reason for hiding this comment

jakobbotsch May 28, 2023

Choose a reason for hiding this comment

AndyAyersMS May 29, 2023

Choose a reason for hiding this comment

AndyAyersMS Jul 7, 2023

Choose a reason for hiding this comment

AndyAyersMS commented Jul 7, 2023

AndyAyersMS commented Jul 7, 2023

AndyAyersMS commented Jul 13, 2023

AndyAyersMS commented May 27, 2023 •

edited

Loading