Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profile Synthesis Work Items #82964

Closed
7 of 21 tasks
Tracked by #74873
AndyAyersMS opened this issue Mar 3, 2023 · 7 comments
Closed
7 of 21 tasks
Tracked by #74873

Profile Synthesis Work Items #82964

AndyAyersMS opened this issue Mar 3, 2023 · 7 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:3 Work that is nice to have
Milestone

Comments

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Mar 3, 2023

This issue captures work items and other notes related to profile synthesis in RyuJit.

Overview

Profile synthesis is an algorithm to estimate plausible block and edge profile data. It can be used to guide optimizations in the absence of true profile data, fill in gaps in profile data, or repair damaged (inconsistent data).

See the section on synthesis in the main Dynamic PGO for more context.

Completed for .NET 8

Opportunistic work for .NET 8

Note a number of the above were handled in .NET 9, see #93020

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 3, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 3, 2023
@AndyAyersMS AndyAyersMS self-assigned this Mar 3, 2023
@ghost
Copy link

ghost commented Mar 3, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

This issue captures work items and other notes related to profile synthesis in RyuJit.

Overview

Profile synthesis is an algorithm to estimate plausible block and edge profile data. It can be used to guide optimizations in the absence of true profile data, fill in gaps in profile data, or repair damaged (inconsistent data).

See the section on synthesis in the main Dynamic PGO for more context.

Proposed work

  • implement core reconstruction algorithm (JIT: initial implementation of profile synthesis #82926). This turns local likelihoods into global counts.
  • extend to handle inlinee flow graphs (see Fatal error. Internal CLR error. (0x80131506) in .NET 7 during GC #82753)
  • consistency check and fix issues
  • come up with some strategy for irredicible loops (see notes below)
  • assess how often $cp$ gets capped in real programs (see notes below)
  • faster way of doing fgGetPredForBlock -- we often want to go from a block to the successor edges
  • vet heuristics against some real data -- maybe leverage MIBC compare capabilities
  • IR based heuristics (perhaps)
  • PGO blend modes, random, etc
  • Repair mode
  • When computing $cp$ avoid repeatedly propagating through nested loops
  • Fake BB0 or always force scratch BB or just tolerate fgFirstBB being special
  • Reconcile with our other loop finding stuff
  • Stop the upweight/downweight of loop blocks in rest of jit
  • Durable edge properties (entry, exit, back)
  • Proper weight comp for finallies
  • Tweak RunRarely to mean at or near zero
  • OSR entry weight still seems off
  • Special handling for deep nests?
Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2023
@AndyAyersMS AndyAyersMS added this to the 8.0.0 milestone Mar 3, 2023
@AndyAyersMS AndyAyersMS added this to Needs Triage in .NET Core CodeGen via automation Mar 3, 2023
@AndyAyersMS AndyAyersMS moved this from Needs Triage to PGO in .NET Core CodeGen Mar 3, 2023
@AndyAyersMS
Copy link
Member Author

Some initial data using aspnet.run collection:

  • excluding methods with irreducible or capped profiles, synthesis results pass the profile checker, so are generally consistent
  • 8 methods get capped. They all have loops where the loop exit is a throw. One such example:

https://github.com/aspnet/Security/blob/26d27d871b7992022c082dc207e3d126e1d9d278/shared/Microsoft.AspNetCore.ChunkingCookieManager.Sources/ChunkingCookieManager.cs#L111-L121

We can get rid of the capping by fixing the heuristic, but then we have a throw block with nonzero profile weight.

  • Around 1000 methods have irreducible loops. Many of these are MoveNext async state machine methods. For example:
    image - 2023-03-03T150450 733

Note how there is a loop complex involving BB12, BB06, BB04, BB07, and BB11 but no single entry point.

We can try and handle these during synthesis, but the heuristics might not be great and it would require iteration. Or perhaps we can look into fixing the async codegen so the loops it produces are reducible.

@AndyAyersMS
Copy link
Member Author

@VSadov who is the best person to ask these days about async method IL generation in Roslyn? I am wondering if the irreducible flow I am seeing (like in the example just above) is something that could easily be fixed with a bit of code duplication.

As is, the jit will not recognize the loops in these methods as loops and so a lot of the classic loop optimizations (say hoisting) won't fire. It may not matter in practice as these methods are tough to optimize anyways and maybe not that CPU intensive. But perhaps worth a closer look.

One crude way to view this is that if there were a source gen option for the async state machines, and that code spit had to resort to goto to express the desired control flow, it would quite possibly indicate irreducible flow.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Mar 7, 2023
Add the ability to verify that the profile counts produced by synthesis are
self-consistent, and optionally to assert if they're not. Disable checking
if we know profile flow will be inconsistent (because of irreducible loops
and/or capped cyclic probabilities).

Consistently ignore the likely flow out of handlers. Generally we model
handlers as isolated subgraphs that do not contribute flow to the main flow
graph. This is generally acceptable.

The one caveat is for flow into finallies. The plan here is to fix the weights
for finallies up in a subsequent pass via simple scaling once callfinallies
are introduced. Flow weights out of finallies will be ignored as the
callfinally block will be modeled as always flowing to its pair tail.

Also add the ability to propagate the synthesized counts into the live profile
data. This is mainly to facilitate using the MIBC comparison tools we have
to assess how closely the synthesiszed data comes to actual PGO data.

Finally, enable the new synthesized plus checked modes in a few of our PGO
pipelines.

Contributes to dotnet#82964.
@BruceForstall
Copy link
Member

@AndyAyersMS Is the reference to #82753 incorrect?

AndyAyersMS added a commit that referenced this issue Mar 7, 2023
Add the ability to verify that the profile counts produced by synthesis are
self-consistent, and optionally to assert if they're not. Disable checking
if we know profile flow will be inconsistent (because of irreducible loops
and/or capped cyclic probabilities).

Consistently ignore the likely flow out of handlers. Generally we model
handlers as isolated subgraphs that do not contribute flow to the main flow
graph. This is generally acceptable.

The one caveat is for flow into finallies. The plan here is to fix the weights
for finallies up in a subsequent pass via simple scaling once callfinallies
are introduced. Flow weights out of finallies will be ignored as the
callfinally block will be modeled as always flowing to its pair tail.

Also add the ability to propagate the synthesized counts into the live profile
data. This is mainly to facilitate using the MIBC comparison tools we have
to assess how closely the synthesiszed data comes to actual PGO data.

Finally, enable the new synthesized plus checked modes in a few of our PGO
pipelines.

Contributes to #82964.
@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Mar 7, 2023

@AndyAyersMS Is the reference to #82753 incorrect?

Thanks -- fixed that above: the right issue is #82755.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Mar 9, 2023
…ropagation

Once synthesis arrives on the scene, we're not going to want phases in the JIT
to arbitrarily modifing block weights. There is already a guard of this sort
for regular profile data, so it makes sense to extend that to synthesized data
as well.

When synthesizing counts, propagate counts to finallies from the associated
trys. This needs to be done carefully as we have make sure not to visit the
finally until the count in the try is set. We rely on some of the properties
of DFS pre and post number bracketing to do this efficiently, without needing
to track extra state.

Contributes to dotnet#82964.
AndyAyersMS added a commit that referenced this issue Mar 9, 2023
…83185)

Once synthesis arrives on the scene, we're not going to want phases in the JIT
to arbitrarily modifying block weights. There is already a guard of this sort
for regular profile data, so it makes sense to extend that to synthesized data
as well.

When synthesizing counts, propagate counts to finallies from the associated
trys. This needs to be done carefully as we have make sure not to visit the
finally until the count in the try is set. We rely on some of the properties
of DFS pre and post number bracketing to do this efficiently, without needing
to track extra state.

Contributes to #82964.
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Mar 17, 2023
Implement blend and repair modes for synthesis. Blend merges a bit of
synthesized data into an existing PGO data set; repair tries to fix any
local inconsistencies (via heuristics). Both run count construction
afterwards.

Trust blended data like we trust dynamic data. Probably will want more
nuance here (eg trust dynamic blend, but not static blend) but this is
sufficent for now.

Also implement random and reverse modes; these will ultimately be used
for stress testing (not called anywhere yet).

Parameterize some of the magic constants that have cropped up.

Add blend mode as a new weekend pgo stress mode; fix the other synthesis
mode I added recently to pgo stress to set the config properly.

Contributes to dotnet#82964.
AndyAyersMS added a commit that referenced this issue Mar 17, 2023
Implement blend and repair modes for synthesis. Blend merges a bit of
synthesized data into an existing PGO data set; repair tries to fix any
local inconsistencies (via heuristics). Both run count construction
afterwards.

Trust blended data like we trust dynamic data. Probably will want more
nuance here (eg trust dynamic blend, but not static blend) but this is
sufficent for now.

Also implement random and reverse modes; these will ultimately be used
for stress testing (not called anywhere yet).

Parameterize some of the magic constants that have cropped up.

Add blend mode as a new weekend pgo stress mode; fix the other synthesis
mode I added recently to pgo stress to set the config properly.

Contributes to #82964.
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Mar 17, 2023
Start numbering inlinee blocks from 1 instead of 1 + the root compiler's
max BB num. Update inlinee block bbNums when they are inserted into the
root compiler's graph.

Adjust computations in various places that knew about the old approach
and looked from inlinee compiler to root compiler for bitset, epochs and
the like.

Enable synthesis for inlinees, now that regular bitsets on inlinee compiler
instances behave sensibly.

There is still some messiness around inlinees inheriting root compiler
EH info which requires special checks. I will clean this up separately.

Fixes dotnet#82755.
Contributes to dotnet#82964.

enable synthesis for inlinees
AndyAyersMS added a commit that referenced this issue Mar 20, 2023
…83610)

Start numbering inlinee blocks from 1 instead of 1 + the root compiler's
max BB num. Update inlinee block bbNums when they are inserted into the
root compiler's graph.

Adjust computations in various places that knew about the old approach
and looked from inlinee compiler to root compiler for bitset, epochs and
the like.

Enable synthesis for inlinees, now that regular bitsets on inlinee compiler
instances behave sensibly.

There is still some messiness around inlinees inheriting root compiler
EH info which requires special checks. I will clean this up separately.

Fixes #82755.
Contributes to #82964.
@AndyAyersMS
Copy link
Member Author

Roslyn issue was closed as a dup of dotnet/roslyn#39814

@AndyAyersMS AndyAyersMS added the Priority:3 Work that is nice to have label Jun 5, 2023
@AndyAyersMS
Copy link
Member Author

At this point we've wrapped up all the planned work on Profile Synthesis for .NET 8.

So closing this; we'll revisit some of the open items here in the next release's planning.

.NET Core CodeGen automation moved this from PGO to Done Jul 5, 2023
@dotnet dotnet locked as resolved and limited conversation to collaborators Aug 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:3 Work that is nice to have
Projects
Development

No branches or pull requests

2 participants