Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: handle interaction of OSR, PGO, and tail calls #62263

Merged
merged 3 commits into from Dec 2, 2021

Conversation

AndyAyersMS
Copy link
Member

When both OSR and PGO are enabled, the jit will add PGO probes to OSR methods.
And if the OSR method also has a tail call, the jit must take care to not add
block probes to any return block reachable from possible tail call blocks.

Instead, instrumentation should create copies of the return block probe in each
return block predecessor (possibly splitting critical edges to make this viable).

Because all this happens early on, there are no pred lists. The analysis leverages
cheap preds instead, which means it needs to handle cases where a given pred has
multiple pred list entries. And it must also be aware that the OSR method's actual
flowgraph is a subgraph of the full initial graph.

This came up while scouting what it would take to enable OSR by default.
See #61934.

When both OSR and PGO are enabled, the jit will add PGO probes to OSR methods.
And if the OSR method also has a tail call, the jit must take care to not add
block probes to any return block reachable from possible tail call blocks.

Instead, instrumentation should create copies of the return block probe in each
return block predecessor (possibly splitting critical edges to make this viable).

Because all this happens early on, there are no pred lists. The analysis leverages
cheap preds instead, which means it needs to handle cases where a given pred has
multiple pred list entries. And it must also be aware that the OSR method's actual
flowgraph is a subgraph of the full initial graph.

This came up while scouting what it would take to enable OSR by default.
See dotnet#61934.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 2, 2021
@ghost
Copy link

ghost commented Dec 2, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

When both OSR and PGO are enabled, the jit will add PGO probes to OSR methods.
And if the OSR method also has a tail call, the jit must take care to not add
block probes to any return block reachable from possible tail call blocks.

Instead, instrumentation should create copies of the return block probe in each
return block predecessor (possibly splitting critical edges to make this viable).

Because all this happens early on, there are no pred lists. The analysis leverages
cheap preds instead, which means it needs to handle cases where a given pred has
multiple pred list entries. And it must also be aware that the OSR method's actual
flowgraph is a subgraph of the full initial graph.

This came up while scouting what it would take to enable OSR by default.
See #61934.

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member Author

cc @dotnet/jit-contrib.

Should be no SPMI diff since it only impacts OSR + PGO, and we don't have any collections with that combination today.

@@ -552,7 +552,8 @@ enum BasicBlockFlags : unsigned __int64
BBF_PATCHPOINT = MAKE_BBFLAG(36), // Block is a patchpoint
BBF_HAS_CLASS_PROFILE = MAKE_BBFLAG(37), // BB contains a call needing a class profile
BBF_PARTIAL_COMPILATION_PATCHPOINT = MAKE_BBFLAG(38), // Block is a partial compilation patchpoint
BBF_HAS_ALIGN = MAKE_BBFLAG(39), // BB ends with 'align' instruction
BBF_HAS_ALIGN = MAKE_BBFLAG(39), // BB ends with 'align' instruction
BBF_TAILCALL_SUCCESSOR = MAKE_BBFLAG(40), // BB has pred that has potential tail call
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you meant BB has successor ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No -- it marks blocks that come after tail calls.

Maybe a picture will help? Here's a fragment of an OSR method flow graph before we add instrumentation. We want to count how often R is executed, but we can't put probes in R because it is marked with BBF_TAILCALL_SUCCESSOR -- it needs to remain empty since the tail call preds won't execute R.

Also pictured are some non-tail call blocks A and B that conditionally share the return, and an OSR-unreachable block Z. And the blue edge is a fall-through edge. A has degenerate flow, which is rare, but possible.

image - 2021-12-02T085013 213

To handle this we need to put copies of R's probes in the tail call blocks, and create an intermediary block that all the other preds flow through to get to R. So we end up with 3 separate copies of R's pgo probe that collectively give us the right count for R, and R remains empty so the tail calls work as expected.

image - 2021-12-02T085126 253

We also take pains not to instrument Z, since there are debug checks that verify that un-imported blocks remain empty and can be removed. And we take pains not to double-count A.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah it makes sense now for me, thanks for detailed response 🙂 not going to mark it as resolved to keep it.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A couple comments on comments.

@@ -579,7 +579,6 @@ void Compiler::fgReplaceJumpTarget(BasicBlock* block, BasicBlock* newTarget, Bas
if (jumpTab[i] == oldTarget)
{
jumpTab[i] = newTarget;
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, this is safe and won't cause diffs. However, the header comment specifically says:

// 2. Only the first target found is updated. If there are multiple ways for a block
//    to reach 'oldTarget' (e.g., multiple arms of a switch), only the first one found is changed.

so that should be updated.

One caller, fgNormalizeEHCase2() specifically expects the old behavior:

// Now change the branch. If it was a BBJ_NONE fall-through to the top block, this will
// do nothing. Since cheap preds contains dups (for switch duplicates), we will call
// this once per dup.

but it's ok, because subsequent calls will just do nothing.

Unrelated, I also note the comment says:

 4. The switch table "unique successor" cache is invalidated.

Although we don't call InvalidateUniqueSwitchSuccMap(), so that comment is a little misleading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, let me update this documentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the invalidation and updated the comments.

// Build cheap preds.
//
m_comp->fgComputeCheapPreds();
m_comp->NewBasicBlockEpoch();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would EnsureBasicBlockEpoch be sufficient? If not, it's useful to comment why. E.g., see the long comment in fgRenumberBlocks about one versus the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters this early. This is going to be the first time we've done anything epoch related. But happy to change it (there's one other use like this nearby, for edge instrumentation).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it over.

There is a subtle issue here using BlockSet and similar this early, if you're in an inlinee compiler you need to make sure to base all these on the root compiler instance, as we share the flow graph across the two. So, added a comment and an assert that we're not in an inlinee compiler.

@AndyAyersMS AndyAyersMS merged commit 3810633 into dotnet:main Dec 2, 2021
@dotnet dotnet locked as resolved and limited conversation to collaborators Jan 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants