Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: verify full profile consistency after importation #100869

Merged
merged 7 commits into from
Apr 13, 2024

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Apr 10, 2024

Move the full profile check down past the importer. Attempt local repair
for cases where the importer alters BBJ_COND. If that is unable to guarantee
consistency, mark the PGO data as inconsistent.

If the importer alters BBJ_SWITCH don't attempt repair, just mark the profile
as inconsistent.

If in an OSR method the original method entry is a loop header, and that is
not the loop that triggered OSR, mark the profile as inconsistent.

If the importer re-imports a LEAVE, there are still orphaned blocks left from
the first importation, these can mess up profiles. In that case, mark the
profile as inconsistent.

Exempt blocks with EH preds (catches, etc) from inbound checking, as
profile data propagation along EH edges is not modelled.

Modify the post-phase checks to allow either small relative errors or small
absolute errors, so that flow out of EH regions though intermediaries (say
step blocks) does not trip the checker.

Ensure the initial pass of likelihood adjustments pays attention to
throws. And only mark throws as rare in the importer if we have not
synthesized profile data (which may in fact tell us the throw is not cold).

Contributes to #93020

Move the full profile check down past the importer. Attempt local repair
in the one place the importer may alter flow. If that is unable to guarantee
consistency, mark the PGO data as inconsistent.

Exempt blocks with EH preds (catches, etc) from imbound checking, as
profile data propagation along EH edges is not modelled.

Ensure the initial pass of likelihood adjustments pays attention to
throws. And only mark throws as rare in the importer if we have not
synthesized profile data (which may in fact tell us the throw is not cold).

Contributes to dotnet#93020
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 10, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Copy link
Member Author

@amanasifkhalid PTAL
cc @dotnet/jit-contrib

This causes a bit more churn than I expected... not always marking throws as cold in the importer alters inlining decisions in some cases.

The change in fgLinkBasicBlocks is related to another issue we need to address (as a follow-up) -- the "raw" unsynthesized profile weights can sometimes make it through and be used during jitting. In the case I looked at the callee had a profile schema but all profile data was zero. The likelihoods were consistent, so the repair phase left them as is, and the results converged, so the solver was happy.

@AndyAyersMS
Copy link
Member Author

Some stats from asp.net collection

202,728 methods have   consistent profiles after profile incorporation (roots and inlinees).
     32 methods have inconsistent profiles after profile incorporation
   5446 methods have inconsistent profiles after importation (not including the 32)

So the vast majority of methods remain consistent through importation (not as impressive as it sounds, because many methods, especially inlinees, are single basic block...).

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Apr 10, 2024

Realized this is not doing as much checking as it should, so there could be more changes needed.. let me fix that.

Copy link
Member

@amanasifkhalid amanasifkhalid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial pass LGTM. I'll take another look once you update this.

//
BasicBlock* const target = block->GetTarget();
assert(target->hasProfileWeight());
target->setBBProfileWeight(target->bbWeight + weight);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is premature, but do you think we should consider changing the block weight API surface sooner rather than later? It might be useful as we do these profile fixups to have something like incrementBBProfileWeight that asserts the increment amount is positive, and clears the BBF_RUN_RARELY flag (until we decide to decouple the meaning of this flag from BB_ZERO_WEIGHT) -- and maybe doesn't touch the BBF_PROF_WEIGHT flag?

I'm happy to add some new helpers like this, if you think they'd be useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should do that -- want to get a bit more experience with the updates first, so we can see what patterns are common.

{
assert(fgProfileWeightsEqual(alternateNewWeight, 0));
}
alternate->setBBProfileWeight(max(0.0, alternateNewWeight));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto my above comment -- maybe something like decrementBBProfileWeight that does these underflow checks.

@AndyAyersMS
Copy link
Member Author

@amanasifkhalid take anther look when you can. Changes since your last review:

  • reenable proper checking
  • allow small absolute count discrepancies (nominally, from EH) as well as small relative ones. This is the checking counterpart to JIT: revise how synthesis treats handler regions #100899
  • handle folded switches (by just bailing out)
  • handle OSR special case (where original method entry was loop header) (by just bailing out)

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo, runtime-coreclr jitstress, runtime-coreclr pgostress, runtime-coreclr pgo

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

Copy link
Member

@amanasifkhalid amanasifkhalid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM -- thanks for summarizing them! I'll take another look once you get to the stress failures.

if (!fgPgoSynthesized)
{
// Any block with a throw is rarely executed.
block->bbSetRunRarely();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on the planned BasicBlock interface changes, I imagine this pattern will come up enough that we'll want to move the check into bbSetRunRarely (or do some similar cleanup).

@AndyAyersMS
Copy link
Member Author

Looks like some optional CI failures to sort through.

@AndyAyersMS
Copy link
Member Author

jitstress failure is timeout in BinderTracingTest.ResolutionFlow (~ ##97735)

libraries-pgo failures look unrelated (~ #98292 ?)

pgostress is related, looks like one test case failing across a number of configs, with fully synthesized profile data.

19:19:22.229 Running test: JIT/Regression/JitBlue/GitHub_9651/GitHub_9651/GitHub_9651.dll

Assert failure(PID 3120 [0x00000c30], Thread: 4296 [0x10c8]): Assertion failed '!"Inconsistent profile data"' in 'FinallyReimp:MainX(System.String[]):int' during 'Importation' (IL size 29; hash 0xd0116466; Tier0)

    File: D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:5524
    Image: C:\h\w\B54D0997\p\corerun.exe

@AndyAyersMS
Copy link
Member Author

In that test we end up re-importing a leave, which creates a dead callfinally/finallyret pair, but we don't clean that up and so downstream code sees some extra bogus preds with profile data after importation:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1  0                        100    100 [000..008)-> BB03(0.48),BB02(0.52)   ( cond ) T0      try {       i IBC keep
BB02 [0001]  1  0    BB01                 52     52 [008..00F)-> BB04(1)                 (always) T0                  i IBC
BB03 [0002]  1  0    BB01                 48     48 [00F..010)-> BB04(1)                 (always) T0                  i IBC
BB04 [0003]  2  0    BB02,BB03           100    100 [010..016)-> BB09(1)                 (always) T0      }           i IBC
BB09 [0008]  1       BB04                100    100 [???..???)-> BB05(1)                 (callf )                     i IBC internal
BB10 [0009]  1       BB05                100    100 [???..???)-> BB06(1)                 (callfr)                     i IBC internal
BB07 [0006]  0                           100    100 [???..???)-> BB05(1)                 (callf )                     i IBC internal
BB08 [0007]  1       BB05                100    100 [???..???)-> BB06(1)                 (callfr)                     i IBC internal
BB05 [0004]  3     0 BB07,BB09           100.00 100 [016..01A)-> BB08(0.5),BB10(0.5)     (finret)    H0   finally { } i IBC keep
BB06 [0005]  2       BB08,BB10           100    100 [01A..01D)                           (return)                     i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Here BB06 sees 200 flow incoming, 100 weight, hence the assert.

Since reimportation of a BBJ_LEAVE is rare, I will just have it bail out as well (that is, set fgPgoConsistent=false). Ideally perhaps we'd undo the extra flow. It gets cleaned up eventually.

Copy link
Member

@amanasifkhalid amanasifkhalid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, assuming CI passes. Thanks!

@AndyAyersMS
Copy link
Member Author

I don't see any good reason to rerun the optional CI modes.

@AndyAyersMS AndyAyersMS merged commit c9bb3e7 into dotnet:main Apr 13, 2024
110 checks passed
matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
Move the full profile check down past the importer. Attempt local repair
for cases where the importer alters BBJ_COND. If that is unable to guarantee
consistency, mark the PGO data as inconsistent.

If the importer alters BBJ_SWITCH don't attempt repair, just mark the profile
as inconsistent.

If in an OSR method the original method entry is a loop header, and that is
not the loop that triggered OSR, mark the profile as inconsistent.

If the importer re-imports a LEAVE, there are still orphaned blocks left from
the first importation, these can mess up profiles. In that case, mark the
profile as inconsistent.

Exempt blocks with EH preds (catches, etc) from inbound checking, as
profile data propagation along EH edges is not modelled.

Modify the post-phase checks to allow either small relative errors or small
absolute errors, so that flow out of EH regions though intermediaries (say
step blocks) does not trip the checker.

Ensure the initial pass of likelihood adjustments pays attention to
throws. And only mark throws as rare in the importer if we have not
synthesized profile data (which may in fact tell us the throw is not cold).

Contributes to dotnet#93020
@github-actions github-actions bot locked and limited conversation to collaborators May 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants