JIT: handle interaction of OSR, PGO, and tail calls #62263

Merged: 3 commits, Dec 2, 2021
Changes from 2 commits
3 changes: 2 additions & 1 deletion src/coreclr/jit/block.h
@@ -552,7 +552,8 @@ enum BasicBlockFlags : unsigned __int64
BBF_PATCHPOINT = MAKE_BBFLAG(36), // Block is a patchpoint
BBF_HAS_CLASS_PROFILE = MAKE_BBFLAG(37), // BB contains a call needing a class profile
BBF_PARTIAL_COMPILATION_PATCHPOINT = MAKE_BBFLAG(38), // Block is a partial compilation patchpoint
BBF_HAS_ALIGN = MAKE_BBFLAG(39), // BB ends with 'align' instruction
BBF_TAILCALL_SUCCESSOR = MAKE_BBFLAG(40), // BB has pred that has potential tail call
Member: I assume you meant "BB has successor"?

Member (Author): No -- it marks blocks that come after tail calls.

Maybe a picture will help? Here's a fragment of an OSR method flow graph before we add instrumentation. We want to count how often R is executed, but we can't put probes in R because it is marked with BBF_TAILCALL_SUCCESSOR -- it needs to remain empty since the tail call preds won't execute R.

Also pictured are some non-tail-call blocks A and B that conditionally share the return, and an OSR-unreachable block Z. The blue edge is a fall-through edge. A has degenerate flow, which is rare but possible.

[Figure: fragment of an OSR method flow graph before instrumentation]

To handle this we need to put copies of R's probes in the tail call blocks, and create an intermediary block that all the other preds flow through to get to R. So we end up with 3 separate copies of R's pgo probe that collectively give us the right count for R, and R remains empty so the tail calls work as expected.

[Figure: the same flow graph after adding probe copies in the tail call blocks and an intermediary block ahead of R]

We also take pains not to instrument Z, since there are debug checks that verify that un-imported blocks remain empty and can be removed. And we take pains not to double-count A.
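Since the original images don't reproduce here, a rough ASCII sketch of the transformation described above (my reconstruction from the text, not from the actual figures; T1/T2 for the tail-call preds and I for the intermediary are labels I've added):

```
Before:                            After:

  T1: tail call ──┐                  T1: tail call [probe copy] ──┐
  T2: tail call ──┤                  T2: tail call [probe copy] ──┤
  A: cond ────────┼──> R: return     A: cond ──┐                  ├──> R: return (empty)
  B: cond ────────┘                  B: cond ──┴──> I: [probe] ───┘

  Z: unreachable (never instrumented)
```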

Member: Oh yeah, it makes sense to me now, thanks for the detailed response 🙂. Not going to mark this as resolved, so it stays visible.


// The following are sets of flags.

1 change: 1 addition & 0 deletions src/coreclr/jit/compiler.h
@@ -7343,6 +7343,7 @@ class Compiler
#define OMF_NEEDS_GCPOLLS 0x00000200 // Method needs GC polls
#define OMF_HAS_FROZEN_STRING 0x00000400 // Method has a frozen string (REF constant int), currently only on CoreRT.
#define OMF_HAS_PARTIAL_COMPILATION_PATCHPOINT 0x00000800 // Method contains partial compilation patchpoints
#define OMF_HAS_TAILCALL_SUCCESSOR 0x00001000 // Method has potential tail call in a non BBJ_RETURN block

bool doesMethodHaveFatPointer()
{
1 change: 0 additions & 1 deletion src/coreclr/jit/fgbasic.cpp
@@ -579,7 +579,6 @@ void Compiler::fgReplaceJumpTarget(BasicBlock* block, BasicBlock* newTarget, BasicBlock* oldTarget)
if (jumpTab[i] == oldTarget)
{
jumpTab[i] = newTarget;
-                break;
Member: AFAICT, this is safe and won't cause diffs. However, the header comment specifically says:

// 2. Only the first target found is updated. If there are multiple ways for a block
//    to reach 'oldTarget' (e.g., multiple arms of a switch), only the first one found is changed.

so that should be updated.

One caller, fgNormalizeEHCase2(), specifically expects the old behavior:

// Now change the branch. If it was a BBJ_NONE fall-through to the top block, this will
// do nothing. Since cheap preds contains dups (for switch duplicates), we will call
// this once per dup.

but it's ok, because subsequent calls will just do nothing.

Unrelated, I also note the comment says:

 4. The switch table "unique successor" cache is invalidated.

However, we don't call InvalidateUniqueSwitchSuccMap(), so that comment is a little misleading.

Member (Author): Thanks, let me update this documentation.

Member (Author): Added the invalidation and updated the comments.
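For reference, a sketch of the shape the switch arm plausibly takes after that follow-up commit (not shown in this two-commit diff; the `changed` flag and exact layout are illustrative assumptions, only InvalidateUniqueSwitchSuccMap() is named in the thread):

```cpp
case BBJ_SWITCH:
{
    unsigned const     jumpCnt = block->bbJumpSwt->bbsCount;
    BasicBlock** const jumpTab = block->bbJumpSwt->bbsDstTab;
    bool               changed = false;

    // Update every arm that targets oldTarget -- no early break,
    // so duplicate switch arms are all redirected.
    for (unsigned i = 0; i < jumpCnt; i++)
    {
        if (jumpTab[i] == oldTarget)
        {
            jumpTab[i] = newTarget;
            changed    = true;
        }
    }

    // Keep the switch "unique successor" cache consistent, per the header comment.
    if (changed)
    {
        InvalidateUniqueSwitchSuccMap();
    }
    break;
}
```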

}
}
break;
186 changes: 184 additions & 2 deletions src/coreclr/jit/fgprofile.cpp
@@ -367,6 +367,151 @@ void BlockCountInstrumentor::Prepare(bool preImport)
return;
}

// If this is an OSR method, look for potential tail calls in
// blocks that are not BBJ_RETURN.
//
// If we see any, we need to adjust our instrumentation pattern.
//
if (m_comp->opts.IsOSR() && ((m_comp->optMethodFlags & OMF_HAS_TAILCALL_SUCCESSOR) != 0))
{
JITDUMP("OSR + PGO + potential tail call --- preparing to relocate block probes\n");

// Build cheap preds.
//
m_comp->fgComputeCheapPreds();
m_comp->NewBasicBlockEpoch();
Member: Would EnsureBasicBlockEpoch be sufficient? If not, it's useful to comment why. E.g., see the long comment in fgRenumberBlocks about one versus the other.

Member (Author): I don't think it matters this early. This is going to be the first time we've done anything epoch-related. But happy to change it (there's one other use like this nearby, for edge instrumentation).

Member (Author): Changed it over.

There is a subtle issue here using BlockSet and similar this early: if you're in an inlinee compiler, you need to make sure to base all of these on the root compiler instance, as we share the flow graph across the two. So I added a comment and an assert that we're not in an inlinee compiler.
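A minimal sketch of what that follow-up presumably looks like (the comment wording and assert placement are assumptions; EnsureBasicBlockEpoch and compIsForInlining are the APIs named in this thread and diff):

```cpp
// Build cheap preds.
//
// Block sets and epochs must be based on the root compiler instance,
// since the flow graph is shared with any inlinee compilers.
assert(!m_comp->compIsForInlining());

m_comp->fgComputeCheapPreds();
m_comp->EnsureBasicBlockEpoch();
```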


// Keep track of return blocks needing special treatment.
// We also need to keep track of duplicate preds.
//
JitExpandArrayStack<BasicBlock*> specialReturnBlocks(m_comp->getAllocator(CMK_Pgo));
BlockSet predsSeen = BlockSetOps::MakeEmpty(m_comp);

// Walk blocks looking for BBJ_RETURNs that are successors of potential tail calls.
//
// If any such has a conditional pred, we will need to reroute flow from those preds
// via an intermediary block. That block will subsequently hold the relocated block
// probe for the return for those preds.
//
// Scrub the cheap pred list for these blocks so that each pred appears at most once.
//
for (BasicBlock* const block : m_comp->Blocks())
{
// Ignore blocks that we won't process.
//
if (!ShouldProcess(block))
{
continue;
}

if ((block->bbFlags & BBF_TAILCALL_SUCCESSOR) != 0)
{
JITDUMP("Return " FMT_BB " is successor of possible tail call\n", block->bbNum);
assert(block->bbJumpKind == BBJ_RETURN);
bool pushed = false;
BlockSetOps::ClearD(m_comp, predsSeen);
for (BasicBlockList* predEdge = block->bbCheapPreds; predEdge != nullptr; predEdge = predEdge->next)
{
BasicBlock* const pred = predEdge->block;

// If pred is not to be processed, ignore it and scrub from the pred list.
//
if (!ShouldProcess(pred))
{
JITDUMP(FMT_BB " -> " FMT_BB " is dead edge\n", pred->bbNum, block->bbNum);
predEdge->block = nullptr;
continue;
}

BasicBlock* const succ = pred->GetUniqueSucc();

if (succ == nullptr)
{
// Flow from pred -> block is conditional, and will require updating.
//
JITDUMP(FMT_BB " -> " FMT_BB " is critical edge\n", pred->bbNum, block->bbNum);
if (!pushed)
{
specialReturnBlocks.Push(block);
pushed = true;
}

// Have we seen this pred before?
//
if (BlockSetOps::IsMember(m_comp, predsSeen, pred->bbNum))
{
// Yes, null out the duplicate pred list entry.
//
predEdge->block = nullptr;
}
}
else
{
// We should only ever see one reference to this pred.
//
assert(!BlockSetOps::IsMember(m_comp, predsSeen, pred->bbNum));

// Ensure flow from non-critical preds is BBJ_ALWAYS as we
// may add a new block right before block.
//
if (pred->bbJumpKind == BBJ_NONE)
{
pred->bbJumpKind = BBJ_ALWAYS;
pred->bbJumpDest = block;
}
assert(pred->bbJumpKind == BBJ_ALWAYS);
}

BlockSetOps::AddElemD(m_comp, predsSeen, pred->bbNum);
}
}
}

// Now process each special return block.
// Create an intermediary that falls through to the return.
// Update any critical edges to target the intermediary.
//
// Note we could also route any non-tail-call pred via the
// intermediary. Doing so would cut down on probe duplication.
//
while (specialReturnBlocks.Size() > 0)
{
bool first = true;
BasicBlock* const block = specialReturnBlocks.Pop();
BasicBlock* const intermediary = m_comp->fgNewBBbefore(BBJ_NONE, block, /* extendRegion*/ true);

intermediary->bbFlags |= BBF_IMPORTED;
intermediary->inheritWeight(block);

for (BasicBlockList* predEdge = block->bbCheapPreds; predEdge != nullptr; predEdge = predEdge->next)
{
BasicBlock* const pred = predEdge->block;

if (pred != nullptr)
{
BasicBlock* const succ = pred->GetUniqueSucc();

if (succ == nullptr)
{
// This will update all branch targets from pred.
//
m_comp->fgReplaceJumpTarget(pred, intermediary, block);

// Patch the pred list. Note we only need one pred list
// entry pointing at intermediary.
//
predEdge->block = first ? intermediary : nullptr;
first = false;
}
else
{
assert(pred->bbJumpKind == BBJ_ALWAYS);
}
}
}
}
}

#ifdef DEBUG
// Set schema index to invalid value
//
@@ -449,7 +594,37 @@ void BlockCountInstrumentor::Instrument(BasicBlock* block, Schema& schema, uint8_t* profileMemory)
GenTree* lhsNode = m_comp->gtNewIndOfIconHandleNode(typ, addrOfCurrentExecutionCount, GTF_ICON_BBC_PTR, false);
GenTree* asgNode = m_comp->gtNewAssignNode(lhsNode, rhsNode);

m_comp->fgNewStmtAtBeg(block, asgNode);
if ((block->bbFlags & BBF_TAILCALL_SUCCESSOR) != 0)
{
// We should have built and updated cheap preds during the prepare stage.
//
assert(m_comp->fgCheapPredsValid);

// Instrument each predecessor.
//
bool first = true;
for (BasicBlockList* predEdge = block->bbCheapPreds; predEdge != nullptr; predEdge = predEdge->next)
{
BasicBlock* const pred = predEdge->block;

// We may have scrubbed cheap pred list duplicates during Prepare.
//
if (pred != nullptr)
{
JITDUMP("Placing copy of block probe for " FMT_BB " in pred " FMT_BB "\n", block->bbNum, pred->bbNum);
if (!first)
{
asgNode = m_comp->gtCloneExpr(asgNode);
}
m_comp->fgNewStmtAtBeg(pred, asgNode);
first = false;
}
}
}
else
{
m_comp->fgNewStmtAtBeg(block, asgNode);
}

m_instrCount++;
}
@@ -1612,7 +1787,7 @@ PhaseStatus Compiler::fgPrepareToInstrumentMethod()
else
{
JITDUMP("Using block profiling, because %s\n",
-                    (JitConfig.JitEdgeProfiling() > 0)
+                    (JitConfig.JitEdgeProfiling() == 0)
? "edge profiles disabled"
: prejit ? "prejitting" : osrMethod ? "OSR" : "tier0 with patchpoints");

@@ -1793,6 +1968,13 @@ PhaseStatus Compiler::fgInstrumentMethod()
fgCountInstrumentor->InstrumentMethodEntry(schema, profileMemory);
fgClassInstrumentor->InstrumentMethodEntry(schema, profileMemory);

// If we needed to create cheap preds, we're done with them now.
//
if (fgCheapPredsValid)
{
fgRemovePreds();
}

return PhaseStatus::MODIFIED_EVERYTHING;
}

61 changes: 38 additions & 23 deletions src/coreclr/jit/importer.cpp
@@ -9634,34 +9634,49 @@ var_types Compiler::impImportCall(OPCODE opcode,
}

     // A tail recursive call is a potential loop from the current block to the start of the method.
-    if ((tailCallFlags != 0) && canTailCall && gtIsRecursiveCall(methHnd))
+    if ((tailCallFlags != 0) && canTailCall)
     {
-        assert(verCurrentState.esStackDepth == 0);
-        BasicBlock* loopHead = nullptr;
-        if (opts.IsOSR())
+        // If a root method tail call candidate block is not a BBJ_RETURN, it should have a unique
+        // BBJ_RETURN successor. Mark that successor so we can handle it specially during profile
+        // instrumentation.
+        //
+        if (!compIsForInlining() && (compCurBB->bbJumpKind != BBJ_RETURN))
         {
-            // We might not have been planning on importing the method
-            // entry block, but now we must.
-
-            // We should have remembered the real method entry block.
-            assert(fgEntryBB != nullptr);
-
-            JITDUMP("\nOSR: found tail recursive call in the method, scheduling " FMT_BB " for importation\n",
-                    fgEntryBB->bbNum);
-            impImportBlockPending(fgEntryBB);
-            loopHead = fgEntryBB;
+            BasicBlock* const successor = compCurBB->GetUniqueSucc();
+            assert(successor->bbJumpKind == BBJ_RETURN);
+            successor->bbFlags |= BBF_TAILCALL_SUCCESSOR;
+            optMethodFlags |= OMF_HAS_TAILCALL_SUCCESSOR;
         }
-        else
+
+        if (gtIsRecursiveCall(methHnd))
         {
-            // For normal jitting we'll branch back to the firstBB; this
-            // should already be imported.
-            loopHead = fgFirstBB;
-        }
+            assert(verCurrentState.esStackDepth == 0);
+            BasicBlock* loopHead = nullptr;
+            if (opts.IsOSR())
+            {
+                // We might not have been planning on importing the method
+                // entry block, but now we must.
+
+                // We should have remembered the real method entry block.
+                assert(fgEntryBB != nullptr);
+
+                JITDUMP("\nOSR: found tail recursive call in the method, scheduling " FMT_BB " for importation\n",
+                        fgEntryBB->bbNum);
+                impImportBlockPending(fgEntryBB);
+                loopHead = fgEntryBB;
+            }
+            else
+            {
+                // For normal jitting we'll branch back to the firstBB; this
+                // should already be imported.
+                loopHead = fgFirstBB;
+            }
 
-        JITDUMP("\nFound tail recursive call in the method. Mark " FMT_BB " to " FMT_BB
-                " as having a backward branch.\n",
-                loopHead->bbNum, compCurBB->bbNum);
-        fgMarkBackwardJump(loopHead, compCurBB);
+            JITDUMP("\nFound tail recursive call in the method. Mark " FMT_BB " to " FMT_BB
+                    " as having a backward branch.\n",
+                    loopHead->bbNum, compCurBB->bbNum);
+            fgMarkBackwardJump(loopHead, compCurBB);
+        }
     }

// Note: we assume that small return types are already normalized by the managed callee
Expand Down
57 changes: 57 additions & 0 deletions src/tests/JIT/opt/OSR/tailpgo.cs
@@ -0,0 +1,57 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Runtime.CompilerServices;

class X
{
static int s;
static int N;

public static void F(int[] a)
{
for (int j = 0; j < N; j++)
{
for (int i = 0; i < a.Length; i++)
{
s -= a[i];
}
}
}

// OSR method that makes a tail call.
//
// If we're also adding PGO probes,
// we need to relocate the ones for
// the return to happen before the
// tail calls.
//
public static void T(bool p, int[] a)
{
if (p)
{
for (int j = 0; j < N; j++)
{
for (int i = 0; i < a.Length; i++)
{
s += a[i];
}
}

F(a);
}
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Main()
{
int[] a = new int[1000];
N = 100;
s = 100;
a[3] = 33;
a[997] = 67;
T(true, a);
return s;
}
}
24 changes: 24 additions & 0 deletions src/tests/JIT/opt/OSR/tailpgo.csproj
@@ -0,0 +1,24 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<DebugType />
<Optimize>True</Optimize>
</PropertyGroup>
<ItemGroup>
<Compile Include="$(MSBuildProjectName).cs" />
</ItemGroup>
<PropertyGroup>
<CLRTestBatchPreCommands><![CDATA[
$(CLRTestBatchPreCommands)
set COMPlus_TieredCompilation=1
set COMPlus_TC_QuickJitForLoops=1
set COMPlus_TC_OnStackReplacement=1
]]></CLRTestBatchPreCommands>
<BashCLRTestPreCommands><![CDATA[
$(BashCLRTestPreCommands)
export COMPlus_TieredCompilation=1
export COMPlus_TC_QuickJitForLoops=1
export COMPlus_TC_OnStackReplacement=1
]]></BashCLRTestPreCommands>
</PropertyGroup>
</Project>