Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: it is difficult to do flow analysis on inlinees #82755

Closed
AndyAyersMS opened this issue Feb 28, 2023 · 1 comment · Fixed by #83610
Closed

JIT: it is difficult to do flow analysis on inlinees #82755

AndyAyersMS opened this issue Feb 28, 2023 · 1 comment · Fixed by #83610
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

Inlinee compilers allocate their blocks in the root compiler's flow graph pool, and use the root compiler's block numbering.

Because of this, running analysis on the inlinee compiler's flowgraph "fragment" proves tricky. We got a taste of this in #80958, where I had to introduce fgBBNumMin, whose value is only 1 for the root compiler.

For example, to use a BlockSet one must either use the root compiler's epoch (in which case the bit vectors and scratch storage based on fgBBNumMax are much too large) or else remember to consistently things downward (like bbID) by fgBBNumMin.

This impacts a lot of flow utility methods that might be useful to run on inlinee flowgraph fragments, eg fgDfsReversePostorder (see #82752).

It would be helpful to come up with a better model so that code can be written "naturally" but properly handle cases where it's invoked on inlinee graphs.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 28, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 28, 2023
@ghost
Copy link

ghost commented Feb 28, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Inlinee compilers allocate their blocks in the root compiler's flow graph pool, and use the root compiler's block numbering.

Because of this, running analysis on the inlinee compiler's flowgraph "fragment" proves tricky. We got a taste of this in #80958, where I had to introduce fgBBNumMin, whose value is only 1 for the root compiler.

For example, to use a BlockSet one must either use the root compiler's epoch (in which case the bit vectors and scratch storage based on fgBBNumMax are much too large) or else remember to consistently things downward (like bbID) by fgBBNumMin.

This impacts a lot of flow utility methods that might be useful to run on inlinee flowgraph fragments, eg fgDfsReversePostorder (see #82752).

It would be helpful to come up with a better model so that code can be written "naturally" but properly handle cases where it's invoked on inlinee graphs.

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS AndyAyersMS added this to the 8.0.0 milestone Feb 28, 2023
@AndyAyersMS AndyAyersMS self-assigned this Feb 28, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Feb 28, 2023
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Mar 3, 2023
Implements a profile synthesis algorithm based on the classic Wu-Larus
paper (Static branch frequency and program profile analysis, Micro-27,
1994), with a simple set of heuristics.

First step is construction of a depth-first spanning tree (DFST) for the
flowgraph, and corresponding reverse postorder (RPO). Together these drive
loop recognition; currently we only recognize reducible loops. We use DFST
(non-) ancestry as a proxy for (non-) domination: the dominator of a node
is required to be a DFST ancestor. So no explicit dominance computation is
needed. Irreducible loops are noted but ignored. Loop features like entry,
back, and exit edges, body sets, and nesting are computed and saved.

Next step is assignment of edge likelihoods. Here we use some simple local
heuristics based on loop structure, returns, and throws. A final heuristic
gives slight preference to conditional branches that fall through to the
next IL offset.

After that we use loop nesting to compute the "cyclic probability" $cp$ for
each loop, working inner to outer in loops and RPO within loops. $cp$ summarizes
the effect of flow through the loop and around loop back edges. We cap $cp$ at
no more than 1000. When finding $cp$ for outer loops we use $cp$ for inner
loops.

Once all $cp$ values are known, we assign "external" input weights to method
and EH entry points, and then a final RPO walk computes the expected weight
of each block (and, via edge likelihoods, each edge).

We use the existing DFS code to build the DFST and the RPO, augmented by
some fixes to ensure all blocks (even ones in isolated cycles) get numbered.

This initial version is intended to establish the right functionality, enable
wider correctness testing, and to provide a foundation for refinement of the
heuristics. It is not yet as efficient as it could be.

The loop recognition and recording done here overlaps with similar code
elsewhere in the JIT. The version here is structural and not sensitive to IL
patterns, so is arguably more general and I believe a good deal simpler than
the lexically driven recognition we use for the current loop optimizer.
I aspire to reconcile this somehow in future work.

All this is disabled by default; a new config option either enables using
synthesis to set block weights for all root methods or just those without PGO
data.

Synthesis for inlinees is not yet enabled; progress here is blocked by dotnet#82755.
AndyAyersMS added a commit that referenced this issue Mar 6, 2023
Implements a profile synthesis algorithm based on the classic Wu-Larus
paper (Static branch frequency and program profile analysis, Micro-27,
1994), with a simple set of heuristics.

First step is construction of a depth-first spanning tree (DFST) for the
flowgraph, and corresponding reverse postorder (RPO). Together these drive
loop recognition; currently we only recognize reducible loops. We use DFST
(non-) ancestry as a proxy for (non-) domination: the dominator of a node
is required to be a DFST ancestor. So no explicit dominance computation is
needed. Irreducible loops are noted but ignored. Loop features like entry,
back, and exit edges, body sets, and nesting are computed and saved.

Next step is assignment of edge likelihoods. Here we use some simple local
heuristics based on loop structure, returns, and throws. A final heuristic
gives slight preference to conditional branches that fall through to the
next IL offset.

After that we use loop nesting to compute the "cyclic probability" $cp$ for
each loop, working inner to outer in loops and RPO within loops. $cp$ summarizes
the effect of flow through the loop and around loop back edges. We cap $cp$ at
no more than 1000. When finding $cp$ for outer loops we use $cp$ for inner
loops.

Once all $cp$ values are known, we assign "external" input weights to method
and EH entry points, and then a final RPO walk computes the expected weight
of each block (and, via edge likelihoods, each edge).

We use the existing DFS code to build the DFST and the RPO, augmented by
some fixes to ensure all blocks (even ones in isolated cycles) get numbered.

This initial version is intended to establish the right functionality, enable
wider correctness testing, and to provide a foundation for refinement of the
heuristics. It is not yet as efficient as it could be.

The loop recognition and recording done here overlaps with similar code
elsewhere in the JIT. The version here is structural and not sensitive to IL
patterns, so is arguably more general and I believe a good deal simpler than
the lexically driven recognition we use for the current loop optimizer.
I aspire to reconcile this somehow in future work.

All this is disabled by default; a new config option either enables using
synthesis to set block weights for all root methods or just those without PGO
data.

Synthesis for inlinees is not yet enabled; progress here is blocked by #82755.
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Mar 17, 2023
Start numbering inlinee blocks from 1 instead of 1 + the root compiler's
max BB num. Update inlinee block bbNums when they are inserted into the
root compiler's graph.

Adjust computations in various places that knew about the old approach
and looked from inlinee compiler to root compiler for bitset, epochs and
the like.

Enable synthesis for inlinees, now that regular bitsets on inlinee compiler
instances behave sensibly.

There is still some messiness around inlinees inheriting root compiler
EH info which requires special checks. I will clean this up separately.

Fixes dotnet#82755.
Contributes to dotnet#82964.

enable synthesis for inlinees
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 17, 2023
AndyAyersMS added a commit that referenced this issue Mar 20, 2023
…83610)

Start numbering inlinee blocks from 1 instead of 1 + the root compiler's
max BB num. Update inlinee block bbNums when they are inserted into the
root compiler's graph.

Adjust computations in various places that knew about the old approach
and looked from inlinee compiler to root compiler for bitset, epochs and
the like.

Enable synthesis for inlinees, now that regular bitsets on inlinee compiler
instances behave sensibly.

There is still some messiness around inlinees inheriting root compiler
EH info which requires special checks. I will clean this up separately.

Fixes #82755.
Contributes to #82964.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 20, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant