Codegen: Checking for ISA support affects inline codegen #12791
Comments
@johnkellyoxford - I looked briefly at the PR from which you referenced this issue, but it's pretty large. If you were able to come up with a small example where you see this, it would be easier to analyze. I doubt that this is directly related to the check for My suspicion is that it is a register allocator issue, which may make it tricky to address. |
@CarolEidt a bit less verbose sample: https://gist.github.com/EgorBo/70e879212440003d948ead6c1b824538 |
Yeah, @EgorBo's code is exactly representative of it. In a benchmark, it resulted in a slowdown from 1.4s to 1.7s, which is not insignificant.
That's useful to know thanks 😄 - however, they function as an implicit |
Quick update: looking at this further, you were indeed correct that it is fundamentally the check that causes the code to be worse. This is because, although the branch is eliminated, the blocks aren't merged yet, so the fact that the two parameters are both equal to |
The reason we don't catch this in I've done a little experimentation with extending the local assertion prop to preserve assertions across a fall-through edge with no other incoming edge (i.e. an edge between blocks that could/will be merged), and it handles this case. But that is unlikely to be something we'd consider until post-3.0. cc @dotnet/jit-contrib |
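A minimal sketch of the idea described above, carrying local assertions across a fall-through edge when the successor has no other incoming edge. This is a toy model and not RyuJIT code: `Block`, `predFallsThrough`, `gen`, and `propagateLocalAssertions` are hypothetical names, assertions are plain strings, blocks are assumed to be in layout order, and killing of assertions is not modeled.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy block, not the JIT's BasicBlock.
struct Block {
    std::vector<std::size_t> preds;  // indices of predecessor blocks
    bool predFallsThrough = false;   // sole predecessor reaches this block by fall-through
    std::set<std::string> gen;       // assertions created in this block, e.g. "a == b"
};

// Returns the set of assertions available at the exit of each block.
// A block inherits its predecessor's assertions only when that predecessor
// is its single incoming edge and the edge is a fall-through, i.e. the two
// blocks could be merged. Otherwise it starts from an empty set, as purely
// local assertion propagation would.
std::vector<std::set<std::string>> propagateLocalAssertions(const std::vector<Block>& blocks) {
    std::vector<std::set<std::string>> out(blocks.size());
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        const Block& b = blocks[i];
        std::set<std::string> available;
        // Assumption: layout order, so the predecessor was already processed.
        if (b.preds.size() == 1 && b.predFallsThrough && b.preds[0] < i) {
            available = out[b.preds[0]];
        }
        available.insert(b.gen.begin(), b.gen.end());
        out[i] = available;
    }
    return out;
}
```

In this model a block with two predecessors starts with no inherited assertions, which keeps the scheme as conservative as a purely block-local pass at join points.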
Hmm, perhaps I should open a PR to add comments to VN copyprop, since I stared at that a lot in the past (and AFAIR one of the existing comments is wrong/misleading). The short version is that:
|
Right - since we have pruned (semi-pruned?) SSA, we need to rely on liveness at block boundaries to ensure that we don't propagate any lclVars that may no longer be valid. That said, there's no correctness requirement to remove last-uses from the live/available set within a BasicBlock (that's what I was describing above), and I'm sure there must be a better heuristic than never extending those live ranges. Especially since |
Yes, there are some improvements that can be made. One thing I tried was to make SSA mark single use defs (which is pretty much always the case with these pesky copy chains) and use that to override the liveness based heuristic:
AFAIR that resulted in a 10k diff, but there were regressions too. That, coupled with the fact that the VN copyprop code is also pretty ugly and inefficient, made me move on to other things. I ended up using the SSA single-use thing in a forward substitution experiment, which can also eliminate such copy chains but can do other things as well. |
BTW, maybe it's worth trying non-pruned SSA; perhaps it's not too costly. And maybe that would also eliminate the need for the liveness pass before SSA (yes, it's pruned, not semi). Though there's still the pesky issue of the VN copyprop heuristic. Ultimately the only good way to do copyprop is to allow going into the non-conventional SSA form. But then that requires an out-of-SSA pass, and with the current LclVar design that's probably not going to be easy. |
- Within a block, don't remove last uses from the live set, enabling us to catch the `b = a (last use); c = b` case. - We don't label defs VNs, so lookup the VN for DEF, not just USASG. Fix #24912
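The `b = a (last use); c = b` case can be illustrated with a toy block-local copy propagation. This is only a sketch under stated assumptions, not the JIT's VN copyprop: `Copy`, `propagateCopies`, and the `dropLastUses` flag are hypothetical names, statements are plain copies between named variables, and the only rule modeled is that a use may be rewritten to the copy's source while that source is still in the live set.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical statement form: dst = src, with a flag marking the last use of src.
struct Copy {
    std::string dst;
    std::string src;
    bool srcLastUse;
};

// Rewrites `x = b` to `x = a` when `b` is a known copy of `a` and `a` is live.
// With dropLastUses == true, a variable leaves the live set at its last use,
// which blocks the rewrite of `c = b` after `b = a (last use)`.
std::vector<Copy> propagateCopies(std::vector<Copy> stmts,
                                  std::set<std::string> live,
                                  bool dropLastUses) {
    std::map<std::string, std::string> copyOf;  // dst -> variable it was copied from
    for (Copy& s : stmts) {
        const std::string usedVar = s.src;  // variable named in the original statement
        auto it = copyOf.find(s.src);
        if (it != copyOf.end() && live.count(it->second) != 0) {
            s.src = it->second;  // propagate the original source
        }
        if (dropLastUses && s.srcLastUse) {
            live.erase(usedVar);
        }
        // Redefining dst invalidates every recorded copy that mentions it.
        for (auto e = copyOf.begin(); e != copyOf.end();) {
            if (e->first == s.dst || e->second == s.dst) {
                e = copyOf.erase(e);
            } else {
                ++e;
            }
        }
        if (s.dst != s.src) {
            copyOf[s.dst] = s.src;
        }
        live.insert(s.dst);
    }
    return stmts;
}
```

Calling it on `{{"b", "a", true}, {"c", "b", true}}` with `a` initially live leaves the second statement as `c = b` when `dropLastUses` is `true` and rewrites it to `c = a` when it is `false`, which mirrors the first bullet above.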
Seems like this is fixed now? Sharplab 6.0 codegen is identical between |
This is likely resolved due to the work @EgorBo had done to take advantage of the token resolution during inlining support I added. It means we can actually see it's an |
Since it appears to be fixed, I'm closing. |
SharpLab
Writing some intrinsic code for vectors and comparing it to the `System.Numerics.Vector4` equivalents, I noticed a significant slowdown (~20%) in a tight loop of vectorised addition. I checked whether the native code for the actual add methods is identical, and it is, with the `if (Sse.IsSupported)` check. However, when it is inlined as part of the loop method (as desired), the codegen is not identical, and is transformed to , which I think is a completely pointless op, as it doesn't have an effect on the end result (and it slows things down, so it is clearly not there as an optimization).
Tested on .NET Core preview 5 as well as on whatever preview version SharpLab is running
Image for easy visualisation of issue
category:cq
theme:vector-codegen
skill-level:expert
cost:large