Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: implement forward substitution #4655

Closed
gafter opened this issue Nov 8, 2015 · 12 comments
Closed

JIT: implement forward substitution #4655

gafter opened this issue Nov 8, 2015 · 12 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Milestone

Comments

@gafter
Copy link
Member

gafter commented Nov 8, 2015

This is moved from dotnet/roslyn#6542

Commenting out the first two lines of this for loop and uncommenting the third results in a 42% speedup:

int count = 0;
for (uint i = 0; i < 1000000000; ++i) {
    var isMultipleOf16 = i % 16 == 0;
    count += isMultipleOf16 ? 1 : 0;
    //count += i % 16 == 0 ? 1 : 0;
}

Behind the timing is vastly different assembly code: 13 vs. 7 instructions in the loop. The platform is Windows 7 running .NET 4.0 x64. Code optimization is enabled, and the test app was run outside VS2010.

Repro project
StackOverflow question on this topic

/cc @breyed

Note that this issue has also been demonstrated in VS2015.

category:cq
theme:forward-substitution
skill-level:expert
cost:extra-large

@mikedn
Copy link
Contributor

mikedn commented Nov 8, 2015

The title is a bit misleading as the local variable is actually eliminated, there's no memory load/store anywhere in the generated code.

Related to #4207 as the problem is really the ceq/brtrue sequence and not local variable elimination. Also related to #4399 as both versions generate a less than ideal bool to 0/1 conversion. Also related to #4353 because both versions generate 2 useless instructions for i % 16.

The mod to and conversion has a problem that I haven't seen reported before - it generates a and, test sequence instead of a single test.

/cc @cmckinsey

@cmckinsey
Copy link
Contributor

Thanks for the report Neil. In general I'd like the JIT to be less sensitive to minor code-shape/structure issues such as whether or not a single-def/single-use named variable is used rather than embedded in the consuming tree. While users can work-around this in limited fashion, in the presence of inlining this becomes impossible. This transformation proves crucial in other MS compilers that have extensive tree-optimization capability to create bigger trees.

@mikedn Thanks for the analysis. While there are other things going on here (like missed setcc transformation even in the original version), the largest issue is lack of forward-substitution. We should ideally change the name of this issue to reflect missing forward-substitution transformation. However feel free to open an issue for the and/test->test combining.

Thanks.

@mikedn
Copy link
Contributor

mikedn commented Nov 16, 2015

@cmckinsey

However feel free to open an issue for the and/test->test combining.

No need, if the div/mod PR is merged this particular issue is solved because the and/test -> test transform is done in LowerCmp.

@papaslavik
Copy link
Contributor

@cmckinsey

This transformation proves crucial in other MS compilers that have extensive tree-optimization capability to create bigger trees.

Are there any plans for contributing such a feature to coreclr?

@GSPP
Copy link

GSPP commented Jul 13, 2016

Are there any plans to move to an SSA-based internal representation? LLVM has that and I believe GCC as well. The Hotspot JIT has it, too. V8 does this, too. AFAIK this is the most modern and flexible way to represent code inside of a compiler. All these issues that are caused by specific tree shapes should go away.

@SunnyWar
Copy link
Contributor

SunnyWar commented Jul 13, 2016

@GSPP Latest MS Visual C++ compiler has SSA back end also: [https://blogs.msdn.microsoft.com/vcblog/2016/05/04/new-code-optimizer/] The link provides some examples of code the JIT handles pretty poorly.

@GSPP
Copy link

GSPP commented Jul 13, 2016

So the old VC compiler was tree based (because it was not SSA based). Some .NET JITs are based on the VC codebase according to public information. I don't recall which ones. Maybe RyuJIT inherited this undesirable tree-based structure.

@mikedn
Copy link
Contributor

mikedn commented Jul 13, 2016

@GSPP @SunnyWar Guys, seriously, you can be sure that the JIT team knows what SSA is and many other things. RyuJIT uses SSA (which doesn't exclude being tree based) and VC++ has been using SSA for a long time (and I doubt that it's tree based). This kind of posts do not contribute anything to this issue or any other issues.

@GSPP
Copy link

GSPP commented Jul 13, 2016

@mikedn maybe you're right.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@AndyAyersMS AndyAyersMS changed the title JIT does not eliminate local variables JIT: implement forward substitution Feb 24, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 18, 2020
@BruceForstall
Copy link
Member

@AndyAyersMS Can we close this and just keep #6973?

@AndyAyersMS
Copy link
Member

This one has more cross-links, so if we close it, maybe we should enumerate the related issues in the survivor?

@BruceForstall
Copy link
Member

This one has more cross-links, so if we close it, maybe we should enumerate the related issues in the survivor?

I generally assume that someone addressing such a big work item will look at the linked items, even if closed.

@ghost ghost locked as resolved and limited conversation to collaborators Jan 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

9 participants