Partly mitigate bad Clang inlining decision #66
Because a visitor is wrapped several times during visitation, it causes extra temporaries and useless stores and loads that can only be optimized away if `visitation_impl` is inlined into the function that creates the wrapper. The Clang inliner decides not to inline functions even when they contain only small switches, which results in poor visitation code. Marking those internal functions force-inline noticeably improves the situation, though it does not fully mitigate it. LLVM ticket: https://bugs.llvm.org/show_bug.cgi?id=41491
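To make the mechanism concrete, here is a minimal C++14 sketch of the pattern described above. It is not the project's actual code. A public entry point builds a wrapper around the user's visitor and passes it to an internal switch-based dispatch function, and only that internal function is marked force-inline. All names (`SKETCH_FORCEINLINE`, `tiny_variant`, `invoke_wrapper`, `visitation_impl_sketch`, `apply_visitor_sketch`, `doubler`) are hypothetical. The macro assumes `__forceinline` on MSVC and `__attribute__((always_inline))` on GCC/Clang.

```cpp
#include <cassert>

// Hypothetical portable force-inline macro (name is an assumption for this sketch).
#if defined(_MSC_VER)
#  define SKETCH_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#else
#  define SKETCH_FORCEINLINE inline
#endif

// Hypothetical two-alternative "variant": holds either an int or a double.
struct tiny_variant {
    int which;  // 0 -> int, 1 -> double
    union { int i; double d; };
};

// Wrapper layer holding the user visitor by reference. Every such layer is
// another object whose stores/loads the optimizer has to see through.
template <class Visitor>
struct invoke_wrapper {
    Visitor& v;
    template <class T>
    auto operator()(T& x) const { return v(x); }
};

// Internal switch-based dispatch. Only this internal function is forced
// inline, so the wrapper built by the caller becomes visible to the optimizer.
template <class Wrapper>
SKETCH_FORCEINLINE auto visitation_impl_sketch(tiny_variant& var, Wrapper w) {
    switch (var.which) {
    case 0:  return w(var.i);
    default: return w(var.d);
    }
}

// Public entry point: creates the wrapper and calls the internal dispatch.
// It is left as a plain function template, so the compiler still decides
// whether to inline it into user code.
template <class Visitor>
auto apply_visitor_sketch(Visitor& vis, tiny_variant& var) {
    invoke_wrapper<Visitor> w{vis};
    return visitation_impl_sketch(var, w);
}

// Example visitor: both overloads return double so the switch's return
// statements deduce the same type.
struct doubler {
    double operator()(int x) const { return 2.0 * x; }
    double operator()(double x) const { return 2.0 * x; }
};
```

Whether the extra stores and loads actually disappear depends on the compiler and optimization level; comparing the generated assembly (for example on godbolt) is the way to check a given toolchain.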
I'm not a big fan of
Nope, as you can see I have literally replaced
Clang has heuristics on explicit
I do not like force inline, as it is too often misused. I think the compiler should decide when to inline and when not to. If the compiler makes the wrong decision, then the compiler should be fixed, not the code. Fixing codegen with force inline on one platform could make the codegen on another platform worse.
I did not know that; it seems to be true. https://godbolt.org/z/q8JUle
This will not change anything. The thing is that Clang seems to decide whether to inline based solely on function size. Explicit
That is not the case. The functions I have added force inline to are exclusively internal. That's why I did not add force inline to
In an ideal world, yes. But you know, "zero-cost abstractions" are not always zero-cost in reality.
There is no wrong decision on the compiler side. It did not produce perfect inlining, and I do not think inlining is something that can be done perfectly. GCC seems to simply be much more aggressive, and that is just the other side of the coin. The code is not compiler-friendly; optimizing through memory pointers is a very tough problem. If the visitor were stored and passed by value, the situation would probably be much better.
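The suggestion to store and pass the visitor by value can be sketched as follows. This is a hypothetical illustration, not code from the project: `by_value_wrapper`, `wrap_by_value` and `counting_visitor` are invented names. The wrapper owns a copy of the visitor instead of holding a reference, so the optimizer may have an easier time, since it does not need to reason about other pointers to the same visitor object.

```cpp
#include <cassert>
#include <utility>

// Hypothetical wrapper that owns a copy of the visitor instead of a reference.
template <class Visitor>
struct by_value_wrapper {
    Visitor v;  // owned copy, not Visitor&
    template <class T>
    auto operator()(T& x) { return v(x); }
};

// Builds the wrapper, moving the (already copied) visitor into it.
template <class Visitor>
by_value_wrapper<Visitor> wrap_by_value(Visitor v) {
    return by_value_wrapper<Visitor>{std::move(v)};
}

// Example stateful visitor that counts how often it was called.
struct counting_visitor {
    int calls = 0;
    int operator()(int x) { ++calls; return x + 1; }
};
```

The trade-off is visible with a stateful visitor: any state it updates lives in the wrapper's copy, so the caller's original object is not modified.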
Many thanks!