-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not use the INC or DEC instructions in x86 or x64 CodeGen #7697
Comments
So? Last time I checked Intel's own compiler didn't follow this particular rule. In general, INC/DEC are more likely to cause problems if the flags are used after such an instruction. This isn't usually the case in JIT generated code. |
@mikedn, regardless of whether their own compiler does it. This is overall a fairly trivial fix that is listed as having a medium impact (essentially it is removing a check in the code gen that says use inc if the addend is a constant of 1). |
Given that the fix is trivial to make, it should be fairly simple to make it and see if it has an visible perf impact. |
Yes, it is trivial, I looked at this many months ago. The problem is that |
Well, we have a plethora of perf tests now that would make this much easier to validate 😄 |
Well, have fun 😄. Note that this might be more effective after/together with #6794 |
@russellhadley @fiigii @litian2025 Do you want to weigh in on this? My impression is that "avoid inc/dec" is outdated advice, but I'm not sure. @dotnet/jit-contrib |
@RussKeldorph, this is coming from the June 2016 copy of the SOG. It might be important to note that this case was improved with the Sandy Bridge architecture and newer, but they still recommend avoiding the instructions where possible. |
We should not avoid emitting inc/dec anymore: check out VC++ default codegen and you will see we use it liberally there. -Juan (Intel) |
..and you should see that in the case where VC++ can detect the consumer of a flag generated by a partial flag writer, it should be emitting add/sub instructions -- the point is that you should not see blanket substitution of inc/dec by add/sub everywhere. -Juan (Intel) |
Yes, I agree with what Juan's points above, beside VC++, in Chakra, here is what we do it, first check if there's partial flag dependency, then make the changes only when necessary: https://github.com/Microsoft/ChakraCore/blob/master/lib/Backend/Peeps.cpp#L160_#L172 |
@RussKeldorph, do you know if CoreCLR has the appropriate logic to detect the partial flag dependency (and uses add/sub in those cases)? If so, I'm fine with closing the bug; otherwise, we should probably keep it open to track adding such logic. |
@tannergooding @RussKeldorph @fiigii @jarodrig we will take a look at it and report back. |
@litian2025 It's not clear what Chakra does and why but there's nothing like this in JIT. |
ChakraCore does this in peephole optimization with the rules:
When the peephole optimizer encounters an RyuJIT does not have peephole optimization, but I think that we can implement this in code generator by checking previous/later instructions through |
Yes, a saw that comment but it's not clear where did those rules come from. For example, one variant is JCC followed by INC/DEC, what's that for? And the Chakra peep seems to be targeted solely at Atoms...
Yes, that should work for INC/DEC followed by JCC. It won't work the other way around because JCC followed by INC/DEC implies different basic blocks. |
@mikedn Yes, this ChakraCore peep only works for Atom. According to Intel® 64 and IA-32 Architectures
The partial flags stall only happens on older CPUs (P4) and Atom, for new Intel CPUs are able to rename each flag bit separately. |
So we can close this issue as optimizing for the ancient P4 architectural is not currently necessary Pentium 4 was a line of single-core central processing units (CPUs) for desktops, laptops and entry-level servers introduced by Intel on November 20, 2000 And the Atom versions were introduced in 2008-2011 for budget PC buyers. |
[Correction] |
In any case the impacted CPU's are all fairly old processors: Nehalem was replaced with the Sandy Bridge microarchitecture, released in January 2011. |
@briansull, that is barely older (less than a month) than Windows 7 SP1 which still has some 48% of the market share (and which still has extended supported until Jan 14, 2020). |
And Yes, it is totally supported, with an occasional 2 cycle stall when an inc or dec instruction is encountered. |
@mikedn for chakra code peep optimization JCC followed by INC/DEC is in case of backward branch operation like a loop |
@briansull, I would think we would want to optimize this scenario as the actual optimization seems fairly trivial and it hits a fair bit of code (most |
@litian2025 I'm not familiar with Chakra's IR but it looks very much like linear IR so it's not clear how being next in linear order has anything to do with loops. It looks more like it is detecting an INC that is outside of the loop. |
@mikedn maybe an example can help here: Some Compare Instruction the purpose of the code is to catch pattern like this between JCC and INC/DEC |
Thanks for the example. That doesn't look like typical codegen for a loop, most of the time compilers turn Well, it's a peep, it does what it can. |
When running on old processors.
Not really. Most G_M55886_IG03:
8BCE mov ecx, esi
E81AFCFFFF call System.Console:WriteLine(int)
FFC6 inc esi
83FE05 cmp esi, 5
7CF2 jl SHORT G_M55886_IG03 The flags used by |
@mikedn Just out of curiosity, why can't the JIT transform count-up loops into count-down loops? FWIW, I tried manually transforming a count-up loop to a count-down loop and got Here I'd say the 'test eax,eax' was unnecessary. |
It doesn't seem likely we will address this, and it's even more out-of-date now, so closing. |
We are currently emitting the
INC
andDEC
instructions for x86 and x64 CodeGen. For example, the following loop:generates the following assembly:
However, the
Intel 64 and IA-32 Architectures Optimization Reference Manual
recommends the following:category:cq
theme:basic-cq
skill-level:expert
cost:small
The text was updated successfully, but these errors were encountered: