JIT: handle interaction of OSR, PGO, and tail calls #62263
Changes from 2 commits
@@ -579,7 +579,6 @@ void Compiler::fgReplaceJumpTarget(BasicBlock* block, BasicBlock* newTarget, Bas
                 if (jumpTab[i] == oldTarget)
                 {
                     jumpTab[i] = newTarget;
-                    break;

AFAICT, this is safe and won't cause diffs. However, the header comment specifically says:

so that should be updated. One caller, fgNormalizeEHCase2(), specifically expects the old behavior:

but it's ok, because subsequent calls will just do nothing. Unrelated, I also note the comment says:

Although we don't call

Thanks, let me update this documentation.

Added the invalidation and updated the comments.

                 }
             }
             break;
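With the inner break removed, fgReplaceJumpTarget now redirects every switch entry that targets oldTarget, not just the first one. This matters for the probe relocation in this PR because a switch can reach the same return block through several cases. The sketch below is a minimal illustration of the before and after behavior. It uses a std::vector<int> of block ids as a hypothetical stand-in for the JIT's jump table. ReplaceFirstMatch and ReplaceAllMatches are illustrative names, not JIT code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in: a switch jump table modeled as a list of target block ids.
// Argument order mirrors fgReplaceJumpTarget(block, newTarget, oldTarget).

// Old behavior: stop after redirecting the first matching entry.
std::size_t ReplaceFirstMatch(std::vector<int>& jumpTab, int newTarget, int oldTarget)
{
    for (std::size_t i = 0; i < jumpTab.size(); i++)
    {
        if (jumpTab[i] == oldTarget)
        {
            jumpTab[i] = newTarget;
            return 1;
        }
    }
    return 0;
}

// New behavior: redirect every matching entry; returns how many were changed.
std::size_t ReplaceAllMatches(std::vector<int>& jumpTab, int newTarget, int oldTarget)
{
    std::size_t replaced = 0;
    for (std::size_t i = 0; i < jumpTab.size(); i++)
    {
        if (jumpTab[i] == oldTarget)
        {
            jumpTab[i] = newTarget;
            replaced++;
        }
    }
    return replaced;
}
```

Take cases {7, 3, 7} with block 7 being redirected to block 9. The first-match version leaves case 2 jumping straight to 7. In the relocation scheme, that case would then bypass the intermediary and its probe.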
@@ -367,6 +367,151 @@ void BlockCountInstrumentor::Prepare(bool preImport)
         return;
     }
 
+    // If this is an OSR method, look for potential tail calls in
+    // blocks that are not BBJ_RETURN.
+    //
+    // If we see any, we need to adjust our instrumentation pattern.
+    //
+    if (m_comp->opts.IsOSR() && ((m_comp->optMethodFlags & OMF_HAS_TAILCALL_SUCCESSOR) != 0))
+    {
+        JITDUMP("OSR + PGO + potential tail call --- preparing to relocate block probes\n");
+
+        // Build cheap preds.
+        //
+        m_comp->fgComputeCheapPreds();
+        m_comp->NewBasicBlockEpoch();

Would

I don't think it matters this early. This is going to be the first time we've done anything epoch related. But happy to change it (there's one other use like this nearby, for edge instrumentation).

Changed it over. There is a subtle issue here using

+
+        // Keep track of return blocks needing special treatment.
+        // We also need to keep track of duplicate preds.
+        //
+        JitExpandArrayStack<BasicBlock*> specialReturnBlocks(m_comp->getAllocator(CMK_Pgo));
+        BlockSet predsSeen = BlockSetOps::MakeEmpty(m_comp);
+
+        // Walk blocks looking for BBJ_RETURNs that are successors of potential tail calls.
+        //
+        // If any such has a conditional pred, we will need to reroute flow from those preds
+        // via an intermediary block. That block will subsequently hold the relocated block
+        // probe for the return for those preds.
+        //
+        // Scrub the cheap pred list for these blocks so that each pred appears at most once.
+        //
+        for (BasicBlock* const block : m_comp->Blocks())
+        {
+            // Ignore blocks that we won't process.
+            //
+            if (!ShouldProcess(block))
+            {
+                continue;
+            }
+
+            if ((block->bbFlags & BBF_TAILCALL_SUCCESSOR) != 0)
+            {
+                JITDUMP("Return " FMT_BB " is successor of possible tail call\n", block->bbNum);
+                assert(block->bbJumpKind == BBJ_RETURN);
+                bool pushed = false;
+                BlockSetOps::ClearD(m_comp, predsSeen);
+                for (BasicBlockList* predEdge = block->bbCheapPreds; predEdge != nullptr; predEdge = predEdge->next)
+                {
+                    BasicBlock* const pred = predEdge->block;
+
+                    // If pred is not to be processed, ignore it and scrub from the pred list.
+                    //
+                    if (!ShouldProcess(pred))
+                    {
+                        JITDUMP(FMT_BB " -> " FMT_BB " is dead edge\n", pred->bbNum, block->bbNum);
+                        predEdge->block = nullptr;
+                        continue;
+                    }
+
+                    BasicBlock* const succ = pred->GetUniqueSucc();
+
+                    if (succ == nullptr)
+                    {
+                        // Flow from pred -> block is conditional, and will require updating.
+                        //
+                        JITDUMP(FMT_BB " -> " FMT_BB " is critical edge\n", pred->bbNum, block->bbNum);
+                        if (!pushed)
+                        {
+                            specialReturnBlocks.Push(block);
+                            pushed = true;
+                        }
+
+                        // Have we seen this pred before?
+                        //
+                        if (BlockSetOps::IsMember(m_comp, predsSeen, pred->bbNum))
+                        {
+                            // Yes, null out the duplicate pred list entry.
+                            //
+                            predEdge->block = nullptr;
+                        }
+                    }
+                    else
+                    {
+                        // We should only ever see one reference to this pred.
+                        //
+                        assert(!BlockSetOps::IsMember(m_comp, predsSeen, pred->bbNum));
+
+                        // Ensure flow from non-critical preds is BBJ_ALWAYS as we
+                        // may add a new block right before block.
+                        //
+                        if (pred->bbJumpKind == BBJ_NONE)
+                        {
+                            pred->bbJumpKind = BBJ_ALWAYS;
+                            pred->bbJumpDest = block;
+                        }
+                        assert(pred->bbJumpKind == BBJ_ALWAYS);
+                    }
+
+                    BlockSetOps::AddElemD(m_comp, predsSeen, pred->bbNum);
+                }
+            }
+        }
+
+        // Now process each special return block.
+        // Create an intermediary that falls through to the return.
+        // Update any critical edges to target the intermediary.
+        //
+        // Note we could also route any non-tail-call pred via the
+        // intermediary. Doing so would cut down on probe duplication.
+        //
+        while (specialReturnBlocks.Size() > 0)
+        {
+            bool first = true;
+            BasicBlock* const block = specialReturnBlocks.Pop();
+            BasicBlock* const intermediary = m_comp->fgNewBBbefore(BBJ_NONE, block, /* extendRegion*/ true);
+
+            intermediary->bbFlags |= BBF_IMPORTED;
+            intermediary->inheritWeight(block);
+
+            for (BasicBlockList* predEdge = block->bbCheapPreds; predEdge != nullptr; predEdge = predEdge->next)
+            {
+                BasicBlock* const pred = predEdge->block;
+
+                if (pred != nullptr)
+                {
+                    BasicBlock* const succ = pred->GetUniqueSucc();
+
+                    if (succ == nullptr)
+                    {
+                        // This will update all branch targets from pred.
+                        //
+                        m_comp->fgReplaceJumpTarget(pred, intermediary, block);
+
+                        // Patch the pred list. Note we only need one pred list
+                        // entry pointing at intermediary.
+                        //
+                        predEdge->block = first ? intermediary : nullptr;
+                        first = false;
+                    }
+                    else
+                    {
+                        assert(pred->bbJumpKind == BBJ_ALWAYS);
+                    }
+                }
+            }
+        }
+    }
+
 #ifdef DEBUG
     // Set schema index to invalid value
     //
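The Prepare hunk above can be modeled on a toy flow graph. This is a hypothetical, simplified sketch: the Block struct, JumpKind, HasUniqueSucc and PrepareTailCallReturns are illustrative and are not JIT types or functions. For each tail-call-successor return, the sketch does the following:

- scrubs dead and duplicate cheap-pred entries;
- turns fall-through preds into unconditional jumps;
- creates one intermediary and redirects every conditional edge to it.

In the JIT, the intermediary is placed right before the return, so it also picks up fall-through edges. The model simply rewrites matching successor ids.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy model of the flow graph; these are not the JIT's types.
enum class JumpKind { None, Always, Cond, Return };

struct Block
{
    JumpKind kind = JumpKind::None;
    std::vector<int> succs;         // successor ids; a degenerate Cond may list the same id twice
    bool processed = true;          // stands in for ShouldProcess(block)
    bool tailCallSuccessor = false; // stands in for BBF_TAILCALL_SUCCESSOR
    std::vector<int> preds;         // cheap preds: one entry per edge, -1 once scrubbed
};

// Mirrors GetUniqueSucc() being non-null only for fall-through or unconditional flow.
static bool HasUniqueSucc(const Block& b)
{
    return b.kind == JumpKind::None || b.kind == JumpKind::Always;
}

// Returns the ids of the intermediary blocks it appends to `blocks`.
std::vector<int> PrepareTailCallReturns(std::vector<Block>& blocks)
{
    std::vector<int> special;
    for (int r = 0; r < static_cast<int>(blocks.size()); r++)
    {
        if (!blocks[r].processed || !blocks[r].tailCallSuccessor)
        {
            continue;
        }
        assert(blocks[r].kind == JumpKind::Return);
        bool pushed = false;
        std::set<int> seen;
        for (int& entry : blocks[r].preds)
        {
            const int pred = entry;
            if (!blocks[pred].processed)
            {
                entry = -1; // dead edge: scrub it
                continue;
            }
            if (!HasUniqueSucc(blocks[pred]))
            {
                // Conditional flow into r: it will be rerouted through an intermediary.
                if (!pushed)
                {
                    special.push_back(r);
                    pushed = true;
                }
                if (seen.count(pred) != 0)
                {
                    entry = -1; // duplicate entry from degenerate flow
                }
            }
            else
            {
                assert(seen.count(pred) == 0);
                // In the JIT the intermediary goes right before r, so fall-through
                // preds become explicit jumps to keep bypassing it.
                if (blocks[pred].kind == JumpKind::None)
                {
                    blocks[pred].kind = JumpKind::Always;
                }
            }
            seen.insert(pred);
        }
    }

    std::vector<int> intermediaries;
    for (int r : special)
    {
        const int inter = static_cast<int>(blocks.size());
        Block ib;
        ib.succs = {r}; // kind None: falls through to r
        blocks.push_back(ib);
        intermediaries.push_back(inter);

        bool first = true;
        for (int& entry : blocks[r].preds)
        {
            if (entry == -1 || HasUniqueSucc(blocks[entry]))
            {
                continue;
            }
            for (int& s : blocks[entry].succs)
            {
                if (s == r)
                {
                    s = inter; // every matching edge, like fgReplaceJumpTarget after this PR
                }
            }
            entry = first ? inter : -1; // keep a single pred entry for the intermediary
            first = false;
        }
    }
    return intermediaries;
}
```

As an example, take a return R whose preds are:

- a degenerate conditional A, with both edges to R;
- a conditional B;
- a fall-through tail call block T;
- an unprocessed block Z.

R ends up with just two live pred entries: the intermediary and T.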
@@ -449,7 +594,37 @@ void BlockCountInstrumentor::Instrument(BasicBlock* block, Schema& schema, uint8
     GenTree* lhsNode = m_comp->gtNewIndOfIconHandleNode(typ, addrOfCurrentExecutionCount, GTF_ICON_BBC_PTR, false);
     GenTree* asgNode = m_comp->gtNewAssignNode(lhsNode, rhsNode);
 
-    m_comp->fgNewStmtAtBeg(block, asgNode);
+    if ((block->bbFlags & BBF_TAILCALL_SUCCESSOR) != 0)
+    {
+        // We should have built and updated cheap preds during the prepare stage.
+        //
+        assert(m_comp->fgCheapPredsValid);
+
+        // Instrument each predecessor.
+        //
+        bool first = true;
+        for (BasicBlockList* predEdge = block->bbCheapPreds; predEdge != nullptr; predEdge = predEdge->next)
+        {
+            BasicBlock* const pred = predEdge->block;
+
+            // We may have scrubbed cheap pred list duplicates during Prepare.
+            //
+            if (pred != nullptr)
+            {
+                JITDUMP("Placing copy of block probe for " FMT_BB " in pred " FMT_BB "\n", block->bbNum, pred->bbNum);
+                if (!first)
+                {
+                    asgNode = m_comp->gtCloneExpr(asgNode);
+                }
+                m_comp->fgNewStmtAtBeg(pred, asgNode);
+                first = false;
+            }
+        }
+    }
+    else
+    {
+        m_comp->fgNewStmtAtBeg(block, asgNode);
+    }
 
     m_instrCount++;
 }
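In the Instrument hunk above, a tail-call-successor block never receives its own probe. Instead, the probe is prepended to each surviving cheap pred. The first pred gets the original tree and the others get a gtCloneExpr copy, since IR trees are not meant to be shared between statements. The sketch below is a hypothetical illustration of that placement rule. Probe, Block and InstrumentBlock are illustrative names, and a std::shared_ptr<Probe> stands in for the assignment tree.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins; not JIT types.
struct Probe
{
    int countedBlock; // the block whose counter this probe increments
};

struct Block
{
    std::vector<std::shared_ptr<Probe>> stmts; // statement list; front() runs first
    std::vector<int> preds;                    // cheap preds after Prepare; -1 marks a scrubbed entry
    bool tailCallSuccessor = false;            // stands in for BBF_TAILCALL_SUCCESSOR
};

void InstrumentBlock(std::vector<Block>& blocks, int b)
{
    std::shared_ptr<Probe> probe = std::make_shared<Probe>(Probe{b});
    if (!blocks[b].tailCallSuccessor)
    {
        // Ordinary block: the probe goes at the start of the block itself.
        blocks[b].stmts.insert(blocks[b].stmts.begin(), probe);
        return;
    }
    // Tail call successor: the block stays empty and each surviving pred gets its own copy.
    bool first = true;
    for (int p : blocks[b].preds)
    {
        if (p == -1)
        {
            continue; // scrubbed during Prepare
        }
        if (!first)
        {
            probe = std::make_shared<Probe>(*probe); // distinct copy, like gtCloneExpr
        }
        blocks[p].stmts.insert(blocks[p].stmts.begin(), probe);
        first = false;
    }
}
```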
@@ -1612,7 +1787,7 @@ PhaseStatus Compiler::fgPrepareToInstrumentMethod()
     else
     {
         JITDUMP("Using block profiling, because %s\n",
-                (JitConfig.JitEdgeProfiling() > 0)
+                (JitConfig.JitEdgeProfiling() == 0)
                     ? "edge profiles disabled"
                     : prejit ? "prejitting" : osrMethod ? "OSR" : "tier0 with patchpoints");
 
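The hunk above flips the comparison so that "edge profiles disabled" is reported only when JitEdgeProfiling() is 0. Assuming a nonzero value means edge profiling is enabled, the old > 0 test printed that reason in exactly the wrong case. The sketch below is a hypothetical standalone mirror of the corrected selection. BlockProfilingReason is an illustrative name, and its edgeProfiling parameter stands in for the config value.

```cpp
#include <string>

// Hypothetical mirror of the reason string chosen in the JITDUMP above;
// edgeProfiling stands in for JitConfig.JitEdgeProfiling().
std::string BlockProfilingReason(int edgeProfiling, bool prejit, bool osrMethod)
{
    // Corrected test: only blame disabled edge profiles when edge profiling is off.
    return (edgeProfiling == 0) ? "edge profiles disabled"
                                : prejit ? "prejitting" : osrMethod ? "OSR" : "tier0 with patchpoints";
}
```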
@@ -1793,6 +1968,13 @@ PhaseStatus Compiler::fgInstrumentMethod()
     fgCountInstrumentor->InstrumentMethodEntry(schema, profileMemory);
     fgClassInstrumentor->InstrumentMethodEntry(schema, profileMemory);
 
+    // If we needed to create cheap preds, we're done with them now.
+    //
+    if (fgCheapPredsValid)
+    {
+        fgRemovePreds();
+    }
+
     return PhaseStatus::MODIFIED_EVERYTHING;
 }
 
@@ -0,0 +1,57 @@
+// Licensed to the .NET Foundation under one or more agreements.
+// The .NET Foundation licenses this file to you under the MIT license.
+
+using System;
+using System.Runtime.CompilerServices;
+
+class X
+{
+    static int s;
+    static int N;
+
+    public static void F(int[] a)
+    {
+        for (int j = 0; j < N; j++)
+        {
+            for (int i = 0; i < a.Length; i++)
+            {
+                s -= a[i];
+            }
+        }
+    }
+
+    // OSR method that makes a tail call.
+    //
+    // If we're also adding PGO probes,
+    // we need to relocate the ones for
+    // the return to happen before the
+    // tail calls.
+    //
+    public static void T(bool p, int[] a)
+    {
+        if (p)
+        {
+            for (int j = 0; j < N; j++)
+            {
+                for (int i = 0; i < a.Length; i++)
+                {
+                    s += a[i];
+                }
+            }
+
+            F(a);
+        }
+    }
+
+    [MethodImpl(MethodImplOptions.NoInlining)]
+    public static int Main()
+    {
+        int[] a = new int[1000];
+        N = 100;
+        s = 100;
+        a[3] = 33;
+        a[997] = 67;
+        T(true, a);
+        return s;
+    }
+}
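Main is arranged so that T adds the array sum (33 + 67 = 100) N times and the tail call to F subtracts it N times. This leaves s at its starting value of 100, the exit code CoreCLR tests conventionally use to report a pass. The block below is a plain C++ mirror of just that arithmetic. RunTestArithmetic is a hypothetical helper, and it involves no OSR, PGO, or tail calls.

```cpp
#include <vector>

// Hypothetical helper mirroring only the arithmetic in Main, T(true, a), and F(a).
int RunTestArithmetic()
{
    std::vector<int> a(1000, 0);
    const int N = 100;
    int s = 100;
    a[3] = 33;
    a[997] = 67;

    // T(true, a): N passes that add every element (the array sums to 100).
    for (int j = 0; j < N; j++)
    {
        for (int x : a)
        {
            s += x;
        }
    }

    // F(a), reached via the tail call: N passes that subtract the same elements.
    for (int j = 0; j < N; j++)
    {
        for (int x : a)
        {
            s -= x;
        }
    }
    return s; // back to 100
}
```

If F were never reached, s would instead end at 10100.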
@@ -0,0 +1,24 @@
+<Project Sdk="Microsoft.NET.Sdk">
+  <PropertyGroup>
+    <OutputType>Exe</OutputType>
+    <DebugType />
+    <Optimize>True</Optimize>
+  </PropertyGroup>
+  <ItemGroup>
+    <Compile Include="$(MSBuildProjectName).cs" />
+  </ItemGroup>
+  <PropertyGroup>
+    <CLRTestBatchPreCommands><![CDATA[
+$(CLRTestBatchPreCommands)
+set COMPlus_TieredCompilation=1
+set COMPlus_TC_QuickJitForLoops=1
+set COMPlus_TC_OnStackReplacement=1
+]]></CLRTestBatchPreCommands>
+    <BashCLRTestPreCommands><![CDATA[
+$(BashCLRTestPreCommands)
+export COMPlus_TieredCompilation=1
+export COMPlus_TC_QuickJitForLoops=1
+export COMPlus_TC_OnStackReplacement=1
+]]></BashCLRTestPreCommands>
+  </PropertyGroup>
+</Project>
I assume you meant BB has successor?

No, it marks blocks that come after tail calls.

Maybe a picture will help? Here's a fragment of an OSR method flow graph before we add instrumentation. We want to count how often R is executed, but we can't put probes in R because it is marked with BBF_TAILCALL_SUCCESSOR: it needs to remain empty, since the tail call preds won't execute R.

Also pictured are some non-tail call blocks A and B that conditionally share the return, and an OSR-unreachable block Z. The blue edge is a fall-through edge. A has degenerate flow, which is rare but possible.

To handle this, we need to put copies of R's probes in the tail call blocks, and create an intermediary block that all the other preds flow through to get to R. So we end up with 3 separate copies of R's PGO probe that collectively give us the right count for R, and R remains empty so the tail calls work as expected.

We also take pains not to instrument Z, since there are debug checks that verify that un-imported blocks remain empty and can be removed. And we take pains not to double-count A.

Oh, yeah, it makes sense to me now. Thanks for the detailed response 🙂 Not going to mark it as resolved, to keep it around.
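The counting argument in the reply above can be checked with a small replay. This is a hypothetical sketch, not JIT code:

- Trace, TrueCount and ProbeCount are illustrative names.
- Each trace lists the blocks executed on one run.
- probeCopies says how many copies of R's probe each block holds.
- X is a made-up second successor of B.

With the relocated placement (one copy in each tail call block T1 and T2, and one in the intermediary I), the counter matches the true number of visits to R. Placing one copy per cheap-pred entry without rerouting would count degenerate A twice and would also count B's trips to X.

```cpp
#include <map>
#include <string>
#include <vector>

using Trace = std::vector<std::string>; // blocks executed on one run, in order

// How many times the block named `target` actually runs across all traces.
long TrueCount(const std::vector<Trace>& traces, const std::string& target)
{
    long count = 0;
    for (const Trace& t : traces)
    {
        for (const std::string& b : t)
        {
            if (b == target)
            {
                count++;
            }
        }
    }
    return count;
}

// What R's counter reads if probeCopies[b] copies of R's probe sit in block b.
long ProbeCount(const std::map<std::string, int>& probeCopies, const std::vector<Trace>& traces)
{
    long count = 0;
    for (const Trace& t : traces)
    {
        for (const std::string& b : t)
        {
            auto it = probeCopies.find(b);
            if (it != probeCopies.end())
            {
                count += it->second;
            }
        }
    }
    return count;
}
```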