Codegen: Checking for ISA support affects inline codegen #12791

john-h-k · 2019-06-02T14:35:36Z

Writing and comparing some intrinsic code for vectors compared to Sys.Numerics.Vector4 equivalents, I noticed a significant slowdown (~20%) in a tight loop of vectorised addition. I verified the native for the actual add methods is identical, and it is, with the if (Sse.IsSupported) check. However, when it is inlined as part of the loop method (as desired), the codegen is not identical, and

vaddps xmm0, xmm0, xmm0

is transformed to

vmovaps xmm1, xmm0
vaddps xmm0, xmm1, xmm0

, which I think is a completely pointless op, as it doesn't have an effect on the end result (and it slows it down, so clearly not for optimization or anything)

Tested on .NET Core preview 5 as well as on whatever preview version SharpLab is running

Image for easy visualisation of issue

category:cq
theme:vector-codegen
skill-level:expert
cost:large

The text was updated successfully, but these errors were encountered:

See https://github.com/dotnet/coreclr/issues/24912

CarolEidt · 2019-06-03T14:34:31Z

@johnkellyoxford - I looked briefly at the PR from which you referenced this issue, but it's pretty large. If you were able to come up with a small example where you see this, it would be easier to analyze.

I doubt that this is directly related to the check for Sse.IsSupported, as that is eliminated by the JIT as a compile-time constant. FWIW, Sse2 is the baseline requirement for coreclr, so you might want to see what happens if you just eliminate the check (or, if you prefer, you could add an assertion in your debug build that asserts that Sse.IsSupported is true).

My suspicion is that it is a register allocator issue, which may make it tricky to address.

EgorBo · 2019-06-03T14:43:50Z

@CarolEidt a bit less verbose sample: https://gist.github.com/EgorBo/70e879212440003d948ead6c1b824538

john-h-k · 2019-06-03T18:02:05Z

Yeah, @EgorBo 's code is exactly representative of it. In a benchmark, it resulted in a slowdown from 1.4s -> 1.7s, which is not insignificant.

FWIW, Sse2 is the baseline requirement for coreclr

That's useful to know thanks 😄 - however, they function as an implicit IsWindows test too, as opposed to eventual NEON ARM pathways, currently not written (which is why software fallbacks still needed)

CarolEidt · 2019-06-06T15:20:53Z

Quick update: looking at this further, you were indeed correct that it is fundamentally the check that causes the code to be worse. This is because, although the branch is eliminated, the blocks aren't merged yet, so the fact that the two parameters are both equal to vector is not propagated in the local optimization pass.
I still need to analyze why these copies are eliminated later, but I suspect that it is some remaining conservatism in optimizing value types.

CarolEidt · 2019-06-07T14:14:21Z

The reason we don't catch this in optVnCopyProp() is that by the time we reach the uses of left and right in the add intrinsic, vector is no longer live. It is unfortunate that we are using "liveness" here rather than "availability". That is, although vector is no longer live (its last use is when it is assigned to right), it is still available (i.e. the value we're interested in propagating has not been overwritten).
It seems to me (probably missing something here; I haven't spent much time in this code) that we could avoid pruning last-use variables from the temporary liveness set used in copyprop, and check value numbers to make sure that the current value number for the source is the same as the value number for the local we want to propagate to.
However, there appears to be a further problem in that value numbering doesn't apply value numbers to defs, apparently because we don't ever consider them as CSE candidates. However, this means that when it calls GetUseAsgDefVNOrTreeVN on a def, there is no VN and it can't do the copy prop even if the variable is still live.

I've done a little experimentation with extending the local assertion prop to preserve assertions across a fall-through edge with no other incoming edge (i.e. an edge between blocks that could/will be merged), and it handles this case. But that is unlikely to be something we'd consider until post-3.0.

cc @dotnet/jit-contrib

mikedn · 2019-06-07T15:42:50Z

The reason we don't catch this in optVnCopyProp() is that by the time we reach the uses of left and right in the add intrinsic, vector is no longer live. It is unfortunate that we are using "liveness" here rather than "availability". That is, although vector is no longer live (its last use is when it is assigned to right), it is still available (i.e. the value we're interested in propagating has not been overwritten).

Hmm, perhaps I should a PR to add comments to VN copyprop since I starred a lot at that in the past (and AFAIR one of the existing comments is wrong/misleading).

The short version is that:

it needs liveness because we only insert PHIs for live variables so you can't substitute forward past a last-user without risking also going past a missing PHI.
it also needs liveness as a heuristic, without it you risk extending the live range of some variables without also eliminating/shortening the live range of other variables to compensate.

CarolEidt · 2019-06-07T15:52:54Z

it needs liveness because we only insert PHIs for live variables so you can't substitute forward past a last-user without risking also going past a missing PHI.

Right - since we have pruned (semi-pruned?) SSA, we need to rely on liveness at block boundaries to ensure that we don't propagate any lclVars that may no longer be valid. That said, there's no correctness requirement to remove last-uses from the live/available set within a BasicBlock (that's what I was describing above), and I'm sure there must be a better heuristic than never extending those live ranges. Especially since b = a (last use); c = b is a very common pattern that we'd like to be able to propagate.

mikedn · 2019-06-07T16:03:26Z

That said, there's no correctness requirement to remove last-uses from the live/available set within a BasicBlock (that's what I was describing above), and I'm sure there must be a better heuristic than never extending those live ranges.

Yes, there are some improvements that can be made. One thing I tried was to make SSA mark single use defs (which is pretty much always the case with these pesky copy chains) and use that to override the liveness based heuristic:

x = y;
...
z = x;

If x is single use then we now that substituting x with y will result in x's live range going away, compensating for the potential extension of y's own live range.

AFAIR that resulted in a 10k diff but there were regressions too. That coupled with the fact that VN copyprop code is also pretty ugly and inefficient made me move away to other things. I ended up using the SSA single-use thing in forward substitution experiment that too can eliminate such copy chains but can do other things as well.

mikedn · 2019-06-07T16:19:55Z

since we have pruned (semi-pruned?) SSA

BTW, maybe it's worth trying non pruned SSA, perhaps it's not too costly. And maybe that would also eliminate the need for the liveness pass before SSA (yes, it's pruned, not semi). Though there's still the pesky issue of the VN copyprop heuristic.

Ultimately the only good way to do copyprop is allow going into the non-conventional SSA form. But then that requires an out of SSA pass and with the current LclVar design that's probably not going to be easy.

- Within a block, don't remove last uses from the live set, enabling us to catch the `b = a (last use); c = b` case. - We don't label defs VNs, so lookup the VN for DEF, not just USASG. Fix #24912

Fix #24912

AndyAyersMS · 2022-01-13T01:45:56Z

Seems like this is fixed now?

Sharplab 6.0 codegen is identical between AddNoCheckInLoop and AddCheckInLoop.

tannergooding · 2022-01-13T15:48:47Z

This is likely resolved due to the work @EgorBo had done to take advantage of the token resolution during inlining support I added.

It means we can actually see its an IsSupported call and take that into account when upping the "inline profitability".

BruceForstall · 2022-10-30T16:52:38Z

Since it appears to be fixed, I'm closing.

john-h-k referenced this issue in john-h-k/MathSharp Jun 2, 2019

Manual inlining to fix poor codegen

5a952ea

See https://github.com/dotnet/coreclr/issues/24912

CarolEidt self-assigned this Jun 3, 2019

CarolEidt referenced this issue in CarolEidt/coreclr Sep 19, 2019

[WIP] Use extended blocks for assertion prop

098f4b0

Fix #24912

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

CarolEidt removed their assignment Dec 4, 2020

BruceForstall closed this as completed Oct 30, 2022

BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Oct 30, 2022

ghost locked as resolved and limited conversation to collaborators Nov 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Codegen: Checking for ISA support affects inline codegen #12791

Codegen: Checking for ISA support affects inline codegen #12791

john-h-k commented Jun 2, 2019

CarolEidt commented Jun 3, 2019

EgorBo commented Jun 3, 2019

john-h-k commented Jun 3, 2019

CarolEidt commented Jun 6, 2019 •

edited

Loading

CarolEidt commented Jun 7, 2019

mikedn commented Jun 7, 2019

CarolEidt commented Jun 7, 2019

mikedn commented Jun 7, 2019

mikedn commented Jun 7, 2019

AndyAyersMS commented Jan 13, 2022

tannergooding commented Jan 13, 2022

BruceForstall commented Oct 30, 2022

Codegen: Checking for ISA support affects inline codegen #12791

Codegen: Checking for ISA support affects inline codegen #12791

Comments

john-h-k commented Jun 2, 2019

CarolEidt commented Jun 3, 2019

EgorBo commented Jun 3, 2019

john-h-k commented Jun 3, 2019

CarolEidt commented Jun 6, 2019 • edited Loading

CarolEidt commented Jun 7, 2019

mikedn commented Jun 7, 2019

CarolEidt commented Jun 7, 2019

mikedn commented Jun 7, 2019

mikedn commented Jun 7, 2019

AndyAyersMS commented Jan 13, 2022

tannergooding commented Jan 13, 2022

BruceForstall commented Oct 30, 2022

CarolEidt commented Jun 6, 2019 •

edited

Loading