Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow removal of HWIntrinsic nodes that do not have side effects #84110

Merged
merged 8 commits into from
Apr 11, 2023

Conversation

tannergooding
Copy link
Member

This resolves #9626

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 30, 2023
@ghost ghost assigned tannergooding Mar 30, 2023
@ghost
Copy link

ghost commented Mar 30, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

This resolves #9626

Author: tannergooding
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding tannergooding marked this pull request as ready for review March 30, 2023 17:18
@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib

Diffs from before the "minimize TP impact" commit

For that set of diffs, pin reports:

Base: 234350195620, Diff: 234700500131, +0.1495%

?OperRequiresAsgFlag@GenTree@@QEAA_NXZ                                                            : 383318315  : +1415503.38% : 47.92% : +0.1636%
?fgPerNodeLocalVarLiveness@Compiler@@QEAAXPEAUGenTree@@@Z                                         : 120570453  : +7.86%       : 15.07% : +0.0514%
?fgComputeLifeLIR@Compiler@@QEAAXAEAPEA_KPEAUBasicBlock@@AEBQEA_K@Z                               : 64498612   : +7.38%       : 8.06%  : +0.0275%
?fgTryRemoveDeadStoreLIR@Compiler@@QEAA_NPEAUGenTree@@PEAUGenTreeLclVarCommon@@PEAUBasicBlock@@@Z : 5670973    : +47727.43%   : 0.71%  : +0.0024%
?OperIsMemoryLoadOrStore@GenTreeHWIntrinsic@@QEBA_NXZ                                             : -970803    : -69.24%      : 0.12%  : -0.0004%
?lvNormalizeOnStore@LclVarDsc@@QEBA_NXZ                                                           : -6283380   : -70.15%      : 0.79%  : -0.0027%
?optVNBasedDeadStoreRemoval@Compiler@@QEAA?AW4PhaseStatus@@XZ                                     : -7743420   : -6.74%       : 0.97%  : -0.0033%
memset                                                                                            : -17610669  : -0.96%       : 2.20%  : -0.0075%
?fgMorphCall@Compiler@@AEAAPEAUGenTree@@PEAUGenTreeCall@@@Z                                       : -23334518  : -7.60%       : 2.92%  : -0.0100%
GenTreeVisitor<`Compiler::gtUpdateStmtSideEffects'::`2'::UpdateSideEffectsWalker>::WalkTree       : -27366887  : -5.81%       : 3.42%  : -0.0117%
?gtUpdateNodeOperSideEffects@Compiler@@QEAAXPEAUGenTree@@@Z                                       : -34793868  : -14.82%      : 4.35%  : -0.0148%
?fgMorphSmpOp@Compiler@@AEAAPEAUGenTree@@PEAU2@PEAUMorphAddrContext@1@PEA_N@Z                     : -104406812 : -2.12%       : 13.05% : -0.0446%

However, if you measure with PGO for both the base and diff, you instead get:

Base: 239970483681, Diff: 240131550057, +0.0671%

?OperRequiresAsgFlag@GenTree@@QEAA_NXZ                                  : 106093786 : +37.64%  : 53.57% : +0.0442%
?OperExceptions@GenTree@@QEAA?AW4ExceptionSetFlags@@PEAVCompiler@@@Z    : 45327466  : +9.67%   : 22.89% : +0.0189%
memset                                                                  : 17608535  : +0.94%   : 8.89%  : +0.0073%
?optComputeLoopSideEffectsOfBlock@Compiler@@AEAA_NPEAUBasicBlock@@@Z    : 2310389   : +2.75%   : 1.17%  : +0.0010%
?gtUpdateNodeOperSideEffects@Compiler@@QEAAXPEAUGenTree@@@Z             : 2178327   : +3.19%   : 1.10%  : +0.0009%
?fgCheckRemoveStmt@Compiler@@QEAA_NPEAUBasicBlock@@PEAUStatement@@@Z    : 1432493   : +3.76%   : 0.72%  : +0.0006%
?HasSpecialSideEffect@HWIntrinsicInfo@@SA_NW4NamedIntrinsic@@@Z         : 695928    : NA       : 0.35%  : +0.0003%
?HasSpecialSideEffect_Barrier@HWIntrinsicInfo@@SA_NW4NamedIntrinsic@@@Z : 539694    : NA       : 0.27%  : +0.0002%
`Compiler::fgValueNumberHWIntrinsic'::`18'::<lambda_1>::operator()      : 520260    : NA       : 0.26%  : +0.0002%
?OperIsMemoryStore@GenTreeHWIntrinsic@@QEBA_NPEAPEAUGenTree@@@Z         : 493811    : +13.18%  : 0.25%  : +0.0002%
?fgTryRemoveNonLocal@Compiler@@QEAA_NPEAUGenTree@@PEAVRange@LIR@@@Z     : 459316    : +34.22%  : 0.23%  : +0.0002%
?fgPerNodeLocalVarLiveness@Compiler@@QEAAXPEAUGenTree@@@Z               : 377665    : +0.03%   : 0.19%  : +0.0002%
?VNPUnpackExc@ValueNumStore@@QEAAXUValueNumPair@@PEAU2@1@Z              : 345024    : +0.31%   : 0.17%  : +0.0001%
?fgComputeLifeLIR@Compiler@@QEAAXAEAPEA_KPEAUBasicBlock@@AEBQEA_K@Z     : 294062    : +0.03%   : 0.15%  : +0.0001%
?IsOptimizingRetBufAsLocal@GenTreeCall@@QEBA_NXZ                        : 289904    : +2.98%   : 0.15%  : +0.0001%
?IsVectorZero@GenTree@@QEBA_NXZ                                         : -242487   : -51.75%  : 0.12%  : -0.0001%
?OperIsMemoryLoadOrStore@GenTreeHWIntrinsic@@QEBA_NXZ                   : -767586   : -69.09%  : 0.39%  : -0.0003%
`Compiler::fgValueNumberHWIntrinsic'::`14'::<lambda_1>::operator()      : -865511   : -100.00% : 0.44%  : -0.0004%
??0NodeInfo@AliasSet@@QEAA@PEAVCompiler@@PEAUGenTree@@@Z                : -16359281 : -2.66%   : 8.26%  : -0.0068%

So most of the impact is already "resolved" from PGO.


With the most recent commit I got this down to:

Base: 234350195620, Diff: 234448824677, +0.0421%

?OperRequiresAsgFlag@GenTree@@QEAA_NXZ                                                            : 383238609  : +1415209.04% : 47.30% : +0.1635%
?fgComputeLifeLIR@Compiler@@QEAAXAEAPEA_KPEAUBasicBlock@@AEBQEA_K@Z                               : 64498612   : +7.38%       : 7.96%  : +0.0275%
?fgTryRemoveDeadStoreLIR@Compiler@@QEAA_NPEAUGenTree@@PEAUGenTreeLclVarCommon@@PEAUBasicBlock@@@Z : 5670973    : +47727.43%   : 0.70%  : +0.0024%
?OperIsMemoryStore@GenTreeHWIntrinsic@@QEBA_NPEAPEAUGenTree@@@Z                                   : -908068    : -43.39%      : 0.11%  : -0.0004%
?OperIsMemoryLoadOrStore@GenTreeHWIntrinsic@@QEBA_NXZ                                             : -970803    : -69.24%      : 0.12%  : -0.0004%
?lvNormalizeOnStore@LclVarDsc@@QEBA_NXZ                                                           : -6283380   : -70.15%      : 0.78%  : -0.0027%
?optVNBasedDeadStoreRemoval@Compiler@@QEAA?AW4PhaseStatus@@XZ                                     : -7743420   : -6.74%       : 0.96%  : -0.0033%
memset                                                                                            : -17603017  : -0.96%       : 2.17%  : -0.0075%
?fgMorphCall@Compiler@@AEAAPEAUGenTree@@PEAUGenTreeCall@@@Z                                       : -23334518  : -7.60%       : 2.88%  : -0.0100%
GenTreeVisitor<`Compiler::gtUpdateStmtSideEffects'::`2'::UpdateSideEffectsWalker>::WalkTree       : -27366887  : -5.81%       : 3.38%  : -0.0117%
?gtUpdateNodeOperSideEffects@Compiler@@QEAAXPEAUGenTree@@@Z                                       : -34793868  : -14.82%      : 4.29%  : -0.0148%
?fgMorphSmpOp@Compiler@@AEAAPEAUGenTree@@PEAU2@PEAUMorphAddrContext@1@PEA_N@Z                     : -104406812 : -2.12%       : 12.89% : -0.0446%
?fgPerNodeLocalVarLiveness@Compiler@@QEAAXPEAUGenTree@@@Z                                         : -130462334 : -8.50%       : 16.10% : -0.0557%

or with PGO enabled for both base and diff:

Base: 239970483681, Diff: 239836802769, -0.0557%

?OperExceptions@GenTree@@QEAA?AW4ExceptionSetFlags@@PEAVCompiler@@@Z : 45327466   : +9.67%   : 18.37% : +0.0189%
?optComputeLoopSideEffectsOfBlock@Compiler@@AEAA_NPEAUBasicBlock@@@Z : 2310389    : +2.75%   : 0.94%  : +0.0010%
?gtUpdateNodeOperSideEffects@Compiler@@QEAAXPEAUGenTree@@@Z          : 2178327    : +3.19%   : 0.88%  : +0.0009%
?fgCheckRemoveStmt@Compiler@@QEAA_NPEAUBasicBlock@@PEAUStatement@@@Z : 1432493    : +3.76%   : 0.58%  : +0.0006%
?fgPerNodeLocalVarLiveness@Compiler@@QEAAXPEAUGenTreeHWIntrinsic@@@Z : 1148181    : NA       : 0.47%  : +0.0005%
?OperRequiresAsgFlag@GenTreeHWIntrinsic@@QEBA_NXZ                    : 601795     : NA       : 0.24%  : +0.0003%
`Compiler::fgValueNumberHWIntrinsic'::`18'::<lambda_1>::operator()   : 520260     : NA       : 0.21%  : +0.0002%
?OperIsMemoryStore@GenTreeHWIntrinsic@@QEBA_NPEAPEAUGenTree@@@Z      : 512317     : +13.68%  : 0.21%  : +0.0002%
?fgTryRemoveNonLocal@Compiler@@QEAA_NPEAUGenTree@@PEAVRange@LIR@@@Z  : 459316     : +34.22%  : 0.19%  : +0.0002%
?HasSpecialSideEffect@HWIntrinsicInfo@@SA_NW4NamedIntrinsic@@@Z      : 362232     : NA       : 0.15%  : +0.0002%
?VNPUnpackExc@ValueNumStore@@QEAAXUValueNumPair@@PEAU2@1@Z           : 345024     : +0.31%   : 0.14%  : +0.0001%
?fgComputeLifeLIR@Compiler@@QEAAXAEAPEA_KPEAUBasicBlock@@AEBQEA_K@Z  : 294398     : +0.03%   : 0.12%  : +0.0001%
?IsOptimizingRetBufAsLocal@GenTreeCall@@QEBA_NXZ                     : 289904     : +2.98%   : 0.12%  : +0.0001%
?OperIsMemoryLoadOrStore@GenTreeHWIntrinsic@@QEBA_NXZ                : -767586    : -69.09%  : 0.31%  : -0.0003%
`Compiler::fgValueNumberHWIntrinsic'::`14'::<lambda_1>::operator()   : -865511    : -100.00% : 0.35%  : -0.0004%
?gtCloneExpr@Compiler@@QEAAPEAUGenTree@@PEAU2@W4GenTreeFlags@@IHIH@Z : -2657566   : -0.67%   : 1.08%  : -0.0011%
??0NodeInfo@AliasSet@@QEAA@PEAVCompiler@@PEAUGenTree@@@Z             : -16359281  : -2.66%   : 6.63%  : -0.0068%
?OperRequiresAsgFlag@GenTree@@QEAA_NXZ                               : -42958596  : -15.24%  : 17.41% : -0.0179%
?fgPerNodeLocalVarLiveness@Compiler@@QEAAXPEAUGenTree@@@Z            : -125972196 : -8.58%   : 51.06% : -0.0525%

Copy link
Contributor

@SingleAccretion SingleAccretion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem with modeling "special" side effects for intrinsics: it doesn't work for all cases.

Consider:

COMMA
    HWI_Prefetch (or other "special" intrinsic"
    NODE

Today, morph will simplify this down to just NODE, because side effect extraction doesn't know about the "special" side effects. Same problem exists anywhere code does gtFlags & GTF_SIDE_EFFECT (a lot of places). The only reason this isn't a problem currently is because the problematic nodes are top-level. If we ever add a "special" node that produces values, that will cease to be the case.

Can we get away, CQ-wise, with labeling the special side effect with GTF_ASG? It would make the model self-consistent, simpler and more robust.

src/coreclr/jit/compiler.h Show resolved Hide resolved
src/coreclr/jit/fgdiagnostic.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/fgdiagnostic.cpp Show resolved Hide resolved
src/coreclr/jit/fgdiagnostic.cpp Show resolved Hide resolved
src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/liveness.cpp Outdated Show resolved Hide resolved
@tannergooding
Copy link
Member Author

The main reason for the TP impact difference between PGO vs non-PGO is that the PGO data is still mostly correct and it still has information for things like OperRequiresAsgFlag saying that the oper == GT_HWINTRINSIC path is cold.

Without PGO, MSVC sees that the hwintrinsic path exists, is the only usage of the call, and doesn't bloat the method too much so it inlines it.

With PGO, MSVC sees that the hwintrinsic path is cold and so it opts to not inline the code. That in turn allows other inlining and optimizations making it overall much cheaper.

-- GT_CALL is also moved earlier in the checks w/ PGO due to it being hotter, which also improves TP overall.

@tannergooding
Copy link
Member Author

Today, morph will simplify this down to just NODE, because side effect extraction doesn't know about the "special" side effects. Same problem exists anywhere code does gtFlags & GTF_SIDE_EFFECT (a lot of places). The only reason this isn't a problem currently is because the problematic nodes are top-level. If we ever add a "special" node that produces values, that will cease to be the case.

If we add such nodes, that's potentially something we'll need to handle at that time. As is, we're already modeling the things accurately with the main flags. We're really only account for the desire to not remove the small handful of very-special void nodes.

Can we get away, CQ-wise, with labeling the special side effect with GTF_ASG? It would make the model self-consistent, simpler and more robust.

We're not doing that for other cases, like GT_MEMORYBARRIER, so I don't think we need to do that here. If we did, then it should be a separate change to all the similar scenarios.

@SingleAccretion
Copy link
Contributor

We're not doing that for other cases, like GT_MEMORYBARRIER, so I don't think we need to do that here. If we did, then it should be a separate change to all the similar scenarios.

I am not sure I understand. GT_MEMORYBARRIER is indeed labeled with ASG.

There aren't any other cases today where the side effects go outside what is supported by the flags.

@tannergooding
Copy link
Member Author

I am not sure I understand. GT_MEMORYBARRIER is indeed labeled with ASG.

Sorry, was mixing up my examples. For the barriers, we are marking with GTF_ASG, yes. For Prefetch* they are effectively non-faulting loads, but which don't actually produce a value. For Pause/Yield they are just GTF_ORDER_SIDEEFF because we don't want to reorder them and then the minimal special handling to ensure they aren't removed in the common case (as otherwise they're viewed as removable).

It would probably be a better long term investment to have a different side effect kind for things like barriers or other node types which aren't actually assignments but which also shouldn't be reordered or removed.

@SingleAccretion
Copy link
Contributor

It would probably be a better long term investment to have a different side effect kind for things like barriers or other node types which aren't actually assignments but which also shouldn't be reordered or removed.

Well, it's a question of what to do now. I think it is unlikely to have any real downside to have the special nodes be a bit more pessimistic about their side effects than they strictly need to be, but having a consistent side effects model.

We have prior art in this area: GT_KEEPALIVE - it shouldn't move and shouldn't be removed. It is labeled as GTF_GLOB_REF | GTF_CALL today.

@tannergooding
Copy link
Member Author

We have prior art in this area: GT_KEEPALIVE - it shouldn't move and shouldn't be removed. It is labeled as GTF_GLOB_REF | GTF_CALL today.

I guess we can mark it as a call or assignment, but in general I don't think doing so is "good practice". Ideally we'd simply mark this appropriately so that small optimizations can still happen as appropriate.

@tannergooding
Copy link
Member Author

Responded to all the feedback and resolved everything except for the overload resolution, which I gave a reason why not above. Should be ready for another review pass

@SingleAccretion SingleAccretion self-requested a review April 10, 2023 20:24
@BruceForstall BruceForstall self-requested a review April 10, 2023 22:25
Copy link
Contributor

@SingleAccretion SingleAccretion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo comments. Thank you for addressing the feedback!

src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/liveness.cpp Outdated Show resolved Hide resolved
@tannergooding
Copy link
Member Author

SPMI failures are #84631 and #84536

Previous run was passing and only minor merge conflicts were resolved.

@tannergooding tannergooding merged commit bf81beb into dotnet:main Apr 11, 2023
@tannergooding tannergooding deleted the fix-9626 branch April 11, 2023 19:40
@ghost ghost locked as resolved and limited conversation to collaborators May 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update the HWIntrinsic nodes to allow dead code optimization.
3 participants